MedQA-Llama3.1-8B-SFT-Small

QLoRA fine-tune of Llama-3.1-8B-Instruct on 50,000 medical Q&A pairs, each with its accompanying context, from BrainHealthAI/MedQA_mutilangual. Companion to the larger, trilingual-trained MedQA-Llama3.1-8B-SFT-Big. Part of the BRAIN HEALTH / Operation HELIX-FT project.

Output format: the model wraps its final answer in <answer>...</answer> and replies in the language of the question.
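Downstream code usually needs only the tagged answer. Below is a minimal sketch, assuming the model emits well-formed `<answer>...</answer>` pairs; `extract_answer` is a hypothetical helper, not part of this repository. It returns the last match because decoding the full generated sequence also echoes the system prompt, which itself contains the literal `<answer>...</answer>` instruction.

```python
import re

# Matches the <answer>...</answer> wrapper; DOTALL lets an answer span several lines.
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def extract_answer(text):
    """Return the stripped content of the last <answer> block, or None if no tags are present."""
    matches = ANSWER_RE.findall(text)
    return matches[-1].strip() if matches else None
```

Returning `None` instead of raising makes it easy to count how often the model omits the tags during evaluation.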

Training data

  • Source: BrainHealthAI/MedQA_mutilangual
  • Sampled: 50,000 rows out of 76,382 in the combined train and test splits
  • Schema: (context_question, question, answer, language, speciality)
  • Training prompt: f"{context_question}, {question}"
  • No knowledge-graph (KG) augmentation, by design: this model is the baseline for comparison against the KG-augmented SFT-Big
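The card does not say how the 50,000 rows were drawn. The sketch below assumes uniform random sampling without replacement from the pooled train and test rows (for example, loaded with `datasets.load_dataset("BrainHealthAI/MedQA_mutilangual")` and converted to a list of dicts); the seed and the helper names are hypothetical. Only the column names and the prompt format come from the list above.

```python
import random


def build_training_prompt(row):
    """Format the user prompt as stated in the card: f"{context_question}, {question}"."""
    return f"{row['context_question']}, {row['question']}"


def sample_rows(rows, n=50_000, seed=42):
    """Uniformly sample n rows without replacement; the default seed is a hypothetical choice."""
    if n > len(rows):
        raise ValueError(f"requested {n} rows but only {len(rows)} are available")
    # A dedicated Random instance keeps the sample reproducible without touching global state.
    return random.Random(seed).sample(rows, n)
```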

Training recipe (QLoRA)

| Setting | Value |
|---|---|
| Base model | meta-llama/Llama-3.1-8B-Instruct |
| Quantization | 4-bit NF4 + double quantization |
| LoRA rank / α / dropout | 64 / 128 / 0.1 |
| Effective batch size | 16 (per-device batch 2 × gradient accumulation 8) |
| Learning rate | 2e-4, cosine schedule, warmup ratio 0.03 |
| Epochs | 3 planned (run interrupted; intermediate checkpoint published) |
| Max sequence length | 2048 tokens |
| Hardware | RunPod L40S (48 GB) |
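For readers reproducing the recipe, the table maps roughly onto the configuration objects below. This is a sketch against the transformers 4.x and PEFT APIs, not the project's training script: the compute dtype, `target_modules`, the `bf16` flag and `output_dir` are assumptions. The 2048-token maximum sequence length would be passed to whichever trainer wraps these objects (for example TRL's SFT trainer); its argument name varies across TRL versions, so it is omitted here.

```python
import torch
from peft import LoraConfig
from transformers import BitsAndBytesConfig, TrainingArguments

# 4-bit NF4 with double quantization, as in the table.
# bfloat16 compute dtype is an assumption; the card does not state it.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA rank 64, alpha 128, dropout 0.1.
# target_modules is a hypothetical choice: all attention and MLP projections in Llama.
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

# Effective batch size 16 = 2 per device x 8 accumulation steps on a single GPU.
training_args = TrainingArguments(
    output_dir="sft-small-checkpoints",  # hypothetical path
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    num_train_epochs=3,
    bf16=True,  # assumption; the L40S supports bfloat16
)
```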

Training note

Training was interrupted when the volume disk filled up, before the checkpoint at the end of the final epoch could be saved. The published adapter is the latest valid intermediate checkpoint saved automatically by the Hugging Face Trainer. Eval performance is expected to be close to final, but slightly below what a complete 3-epoch run would have achieved.
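A save cut short by a full disk can leave a `checkpoint-N` directory with missing or zero-byte files, so the latest valid checkpoint is not always the highest-numbered one. `transformers.trainer_utils.get_last_checkpoint` returns the highest-numbered checkpoint without inspecting its contents; the hypothetical helper below instead walks backwards until it finds one whose adapter files exist and are non-empty. The required file names assume PEFT's default safetensors serialization.

```python
import re
from pathlib import Path

# Files a usable PEFT adapter checkpoint is assumed to contain (hypothetical validity rule).
REQUIRED_FILES = ("adapter_config.json", "adapter_model.safetensors")
CKPT_PATTERN = re.compile(r"^checkpoint-(\d+)$")


def latest_valid_checkpoint(output_dir):
    """Return the highest-step checkpoint-N directory whose required files are non-empty, or None."""
    candidates = []
    for path in Path(output_dir).iterdir():
        match = CKPT_PATTERN.match(path.name)
        if path.is_dir() and match:
            candidates.append((int(match.group(1)), path))
    # Newest step first; skip checkpoints whose save was cut short.
    for _, path in sorted(candidates, key=lambda item: item[0], reverse=True):
        files = [path / name for name in REQUIRED_FILES]
        if all(f.is_file() and f.stat().st_size > 0 for f in files):
            return path
    return None
```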

[Figure: SFT-Small training curves]

Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

# Load the base model in bfloat16, then attach the LoRA adapter from this repo.
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    torch_dtype=torch.bfloat16, device_map="auto",
)
model = PeftModel.from_pretrained(base, "BrainHealthAI/MedQA-Llama3.1-8B-SFT-Small")
tok = AutoTokenizer.from_pretrained("BrainHealthAI/MedQA-Llama3.1-8B-SFT-Small")

msgs = [
    {"role": "system", "content": "You are a careful medical assistant. Provide your final answer between <answer>...</answer>."},
    {"role": "user",   "content": "Question: My wife started Pradaxa a week ago. What happens if she stops it abruptly?"},
]
inputs = tok.apply_chat_template(msgs, return_tensors="pt", add_generation_prompt=True).to(model.device)
out = model.generate(inputs, max_new_tokens=512, do_sample=False)
# Decode only the newly generated tokens so the prompt is not echoed back.
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Companion model

🔗 Williamsanderson/MedQA-Llama3.1-8B-SFT-Big: trained on 50K stratified trilingual samples with Dorosz KG augmentation. Best eval loss: 0.768.

Limitations

  • Prototype for R&D only; not a certified medical device.
  • Early termination: slightly under-trained relative to the planned 3-epoch recipe.
  • Limited language coverage (mostly English, with some French).
  • No knowledge-graph grounding for drug facts.


Citation

```bibtex
@misc{medqa_sft_small_2026,
  title  = { MedQA-Llama3.1-8B-SFT-Small: Medical QA via QLoRA SFT on Llama-3.1-8B },
  author = { BRAIN HEALTH project — Operation HELIX-FT },
  year   = { 2026 },
  url    = { https://huggingface.co/BrainHealthAI/MedQA-Llama3.1-8B-SFT-Small }
}
```