MedQA-Llama3.1-8B-SFT-Small
QLoRA fine-tune of Llama-3.1-8B-Instruct on 50,000 medical Q&A pairs with
context from BrainHealthAI/MedQA_mutilangual. Companion to the larger
trilingual-trained MedQA-Llama3.1-8B-SFT-Big.
Part of the BRAIN HEALTH / Operation HELIX-FT project.
Output format: the model wraps its final answer in
<answer>...</answer> tags and replies in the language of the question.
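Downstream code usually needs only the tagged answer, not the full generation. The following is a minimal sketch of one way to pull it out, assuming the tags appear literally as `<answer>` and `</answer>` in the decoded text. `extract_answer` is a hypothetical helper written for illustration; it is not part of the model repository.

```python
import re
from typing import Optional

# Non-greedy match that also spans newlines; the tag spelling is assumed
# to match the card's description exactly.
_ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def extract_answer(generated_text: str) -> Optional[str]:
    """Return the content of the last <answer>...</answer> span, or None."""
    matches = _ANSWER_RE.findall(generated_text)
    if not matches:
        return None
    # Keep the last span: if the decoded text still contains a system prompt
    # that mentions the tags, the model's own answer comes after it.
    return matches[-1].strip()
```

Taking the last match rather than the first is a deliberate choice here: when the decoded output includes a prompt that itself mentions the tags, the first match would be the prompt's placeholder rather than the model's reply.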
Training data
- Source: BrainHealthAI/MedQA_mutilangual
- Sampled: 50,000 rows (out of 76,382 train+test)
- Schema: (context_question, question, answer, language, speciality)
- Training prompt: f"{context_question}, {question}"
- No KG augmentation (per design: the purpose is to compare against the KG-augmented SFT-Big)
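The prompt construction listed above can be sketched as a small function over one dataset row. This is an illustrative sketch only: `build_training_prompt` is a hypothetical name, the example row is invented, and the actual preprocessing script is not shown in this card.

```python
from typing import Mapping


def build_training_prompt(row: Mapping[str, str]) -> str:
    """Join context and question with ", " as in the card's f-string."""
    return f"{row['context_question']}, {row['question']}"


# Hypothetical row following the listed schema; values invented for illustration.
example_row = {
    "context_question": "Patient on anticoagulant therapy",
    "question": "What are the risks of stopping abruptly?",
    "answer": "...",
    "language": "en",
    "speciality": "cardiology",
}
prompt = build_training_prompt(example_row)

# Share of the available rows that were sampled (50,000 of 76,382).
sample_fraction = 50_000 / 76_382
```

With the figures above, roughly two thirds of the available rows end up in the sample.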
Training recipe (QLoRA)
| Setting | Value |
|---|---|
| Base model | meta-llama/Llama-3.1-8B-Instruct |
| Quantization | 4-bit NF4 + double quant |
| LoRA rank / α / dropout | 64 / 128 / 0.1 |
| Effective batch size | 16 (per_device 2 × grad_accum 8) |
| Learning rate | 2e-4 cosine, warmup 0.03 |
| Epochs | 3 (interrupted at intermediate checkpoint) |
| Max sequence length | 2048 |
| Hardware | RunPod L40S 48 GB |
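
The quantities in the table can be cross-checked with a little arithmetic. The sketch below makes several assumptions that the card does not state: a single GPU, all 50,000 sampled rows used for training (no held-out split), "warmup 0.03" meaning a warmup ratio over total optimizer steps, and standard LoRA scaling of alpha / r.

```python
import math

per_device_batch = 2
grad_accum_steps = 8
num_devices = 1  # assumption: one L40S GPU
effective_batch = per_device_batch * grad_accum_steps * num_devices

lora_rank, lora_alpha = 64, 128
# Standard LoRA multiplies the low-rank update by alpha / r.
lora_scaling = lora_alpha / lora_rank

train_rows = 50_000  # assumption: every sampled row is a training row
planned_epochs = 3
steps_per_epoch = math.ceil(train_rows / effective_batch)
total_steps = steps_per_epoch * planned_epochs

warmup_ratio = 0.03  # assumption: the table's "warmup 0.03" is a ratio
warmup_steps = math.ceil(warmup_ratio * total_steps)

print(effective_batch, lora_scaling, steps_per_epoch, total_steps, warmup_steps)
```

Under these assumptions a full run would take 9,375 optimizer steps, so the published intermediate checkpoint corresponds to some step count below that.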
Training note
Training was interrupted when the storage volume ran out of disk space, before the final epoch could complete its save. The published adapter is the latest valid intermediate checkpoint automatically saved by the Hugging Face trainer. Eval performance is expected to be near-final but slightly under what a complete 3-epoch run would have achieved.
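After a disk-full failure, the most recent checkpoint directory may be only partially written, so it is worth checking which one is actually usable. The following is a minimal sketch of that check, not the procedure the authors used. It assumes the trainer's `checkpoint-<step>` directory naming and assumes adapter files named `adapter_config.json` and `adapter_model.safetensors`; `latest_valid_checkpoint` is a hypothetical helper.

```python
import re
from pathlib import Path
from typing import Optional

_CKPT_RE = re.compile(r"^checkpoint-(\d+)$")
# Assumed adapter file names; some older PEFT versions wrote adapter_model.bin.
_REQUIRED = ("adapter_config.json", "adapter_model.safetensors")


def latest_valid_checkpoint(output_dir: str) -> Optional[Path]:
    """Return the highest-step checkpoint dir whose adapter files are non-empty."""
    candidates = []
    for path in Path(output_dir).iterdir():
        match = _CKPT_RE.match(path.name)
        if path.is_dir() and match:
            candidates.append((int(match.group(1)), path))
    # Walk from the newest step downwards and stop at the first complete one.
    for _, path in sorted(candidates, key=lambda c: c[0], reverse=True):
        files = [path / name for name in _REQUIRED]
        if all(f.is_file() and f.stat().st_size > 0 for f in files):
            return path
    return None
```

A non-empty file is only a weak signal of validity; loading the adapter weights is the real test, and this sketch does not attempt it.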
Usage
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

# Load the base model in bfloat16, then attach the LoRA adapter on top of it.
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    torch_dtype=torch.bfloat16, device_map="auto",
)
model = PeftModel.from_pretrained(base, "BrainHealthAI/MedQA-Llama3.1-8B-SFT-Small")
tok = AutoTokenizer.from_pretrained("BrainHealthAI/MedQA-Llama3.1-8B-SFT-Small")

msgs = [
    {"role": "system", "content": "You are a careful medical assistant. Provide your final answer between <answer>...</answer>."},
    {"role": "user", "content": "Question: My wife started Pradaxa a week ago. What happens if she stops it abruptly?"},
]
# Build the chat prompt and move the token IDs to the model's device.
inputs = tok.apply_chat_template(msgs, return_tensors="pt", add_generation_prompt=True).to(model.device)
# Greedy decoding; the decoded text contains the prompt followed by the reply.
out = model.generate(inputs, max_new_tokens=512, do_sample=False)
print(tok.decode(out[0], skip_special_tokens=True))
Companion model
🔗 Williamsanderson/MedQA-Llama3.1-8B-SFT-Big: trained on trilingual data (50K stratified samples) with Dorosz KG augmentation. Best eval loss: 0.768.
Limitations
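- Prototype for R&D only; not a certified medical device.
- Early termination: training was interrupted before the planned 3-epoch run finished, so the adapter is slightly under-trained relative to the spec.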
- Limited language coverage (mostly EN + some FR).
- No knowledge-graph grounding for drug facts.
References
- QLoRA: Dettmers et al. (2023). arXiv:2305.14314
- Llama-3.1: Grattafiori et al. (2024). arXiv:2407.21783
Citation
@misc{medqa_sft_small_2026,
title = { MedQA-Llama3.1-8B-SFT-Small: Medical QA via QLoRA SFT on Llama-3.1-8B },
author = { BRAIN HEALTH project — Operation HELIX-FT },
year = { 2026 },
url = { https://huggingface.co/BrainHealthAI/MedQA-Llama3.1-8B-SFT-Small }
}