MedQA-Llama3.1-8B-SFT-Small

A QLoRA fine-tune of Llama-3.1-8B-Instruct on 50,000 medical Q&A pairs, each with an accompanying context, sampled from BrainHealthAI/MedQA_mutilangual. It is the companion to the larger, trilingually trained MedQA-Llama3.1-8B-SFT-Big. Part of the BRAIN HEALTH / Operation HELIX-FT project.

Output format: the model wraps its final answer in <answer>...</answer> and replies in the language of the question.
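Downstream code usually needs only the text inside the tags. A minimal sketch of that extraction, assuming the tags appear literally in the decoded output; the helper name `extract_answer` and the fallback to the full text are illustrative choices, not part of this model card:

```python
import re

# Hypothetical helper: pulls the final answer out of a decoded generation.
_ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL | re.IGNORECASE)


def extract_answer(generated: str) -> str:
    """Return the text of the last <answer>...</answer> block.

    Falls back to the whole string (stripped) if the model did not emit
    the tags, so callers always get something to display.
    """
    matches = _ANSWER_RE.findall(generated)
    if matches:
        return matches[-1].strip()
    return generated.strip()
```

Taking the last match rather than the first is a design choice here: if the model restates the tags while reasoning, the final block is the one treated as the answer.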

Training data

Source: BrainHealthAI/MedQA_mutilangual
Sampled: 50,000 rows (out of 76,382 train+test)
Schema: (context_question, question, answer, language, speciality)
Training prompt: f"{context_question}, {question}"
No knowledge-graph (KG) augmentation, by design: this model serves as the comparison point for the KG-augmented SFT-Big.
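The data preparation described above can be sketched in plain Python. The prompt template is the one given in this card; the uniform sampling strategy, the seed, and the helper names `build_prompt` and `sample_rows` are assumptions made for illustration, since the card does not say how the 50,000 rows were drawn:

```python
import random


def build_prompt(row: dict) -> str:
    # Mirrors the card's template: f"{context_question}, {question}"
    return f"{row['context_question']}, {row['question']}"


def sample_rows(rows: list, k: int = 50_000, seed: int = 0) -> list:
    # Assumption: uniform sampling without replacement; the seed is illustrative.
    rng = random.Random(seed)
    return rng.sample(rows, min(k, len(rows)))


# Toy row using the schema fields (the values are made up for the example).
row = {
    "context_question": "Patient on an oral anticoagulant",
    "question": "Can the drug be stopped abruptly?",
    "answer": "...",
    "language": "en",
    "speciality": "cardiology",
}
print(build_prompt(row))
```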

Training recipe (QLoRA)

Setting	Value
Base model	`meta-llama/Llama-3.1-8B-Instruct`
Quantization	4-bit NF4 + double quant
LoRA rank / α / dropout	64 / 128 / 0.1
Effective batch size	16 (per_device 2 × grad_accum 8)
Learning rate	2e-4, cosine schedule, warmup ratio 0.03
Epochs	3 planned (run interrupted; adapter taken from an intermediate checkpoint)
Max sequence length	2048
Hardware	RunPod L40S 48 GB
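To make the table's numbers concrete, the following sketch derives the step counts a full run would have had. It assumes that all 50,000 sampled rows were used for training on a single GPU and that the warmup value is a ratio of total steps, rounded up; none of these is stated explicitly in the card, and the real trainer's step accounting may differ slightly:

```python
import math

# Values taken from the table above.
PER_DEVICE_BATCH = 2
GRAD_ACCUM = 8
EPOCHS = 3
WARMUP_RATIO = 0.03

# Assumptions: every sampled row is a training example, and one GPU is used.
NUM_EXAMPLES = 50_000
NUM_GPUS = 1

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM * NUM_GPUS
steps_per_epoch = math.ceil(NUM_EXAMPLES / effective_batch)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)

print(effective_batch)   # 16
print(steps_per_epoch)   # 3125
print(total_steps)       # 9375
print(warmup_steps)      # 282
```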

Training note

Training was interrupted when the storage volume filled up, before the final epoch's checkpoint could be saved. The published adapter is the latest valid intermediate checkpoint saved automatically by the Hugging Face Trainer. Evaluation performance is expected to be close to, but slightly below, what a complete 3-epoch run would have achieved.
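Recovering from an interruption like this comes down to finding the newest checkpoint directory that was written completely. The sketch below relies on the Trainer's `checkpoint-<step>` directory naming; the validity test (both adapter files present and non-empty) and the helper name `latest_valid_checkpoint` are assumptions for illustration, not the procedure actually used for this model:

```python
import re
from pathlib import Path
from typing import Optional

_CKPT_RE = re.compile(r"^checkpoint-(\d+)$")

# Assumption: a checkpoint counts as "valid" if these adapter files exist
# and are non-empty. File names follow PEFT's usual safetensors layout.
REQUIRED_FILES = ("adapter_config.json", "adapter_model.safetensors")


def _is_complete(ckpt_dir: Path) -> bool:
    return all(
        (ckpt_dir / name).is_file() and (ckpt_dir / name).stat().st_size > 0
        for name in REQUIRED_FILES
    )


def latest_valid_checkpoint(output_dir: str) -> Optional[Path]:
    """Return the highest-step checkpoint directory that looks complete."""
    best_step, best_path = -1, None
    for child in Path(output_dir).iterdir():
        match = _CKPT_RE.match(child.name)
        if not (match and child.is_dir()):
            continue
        if not _is_complete(child):
            continue  # skip checkpoints cut short by the failed save
        step = int(match.group(1))
        if step > best_step:
            best_step, best_path = step, child
    return best_path
```

Comparing step numbers as integers matters: a plain string sort would rank `checkpoint-900` above `checkpoint-1000`.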

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

# Load the base model in bf16 and let device_map place it on the available device(s).
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    torch_dtype=torch.bfloat16, device_map="auto",
)
# Attach the LoRA adapter on top of the base model.
model = PeftModel.from_pretrained(base, "BrainHealthAI/MedQA-Llama3.1-8B-SFT-Small")
tok = AutoTokenizer.from_pretrained("BrainHealthAI/MedQA-Llama3.1-8B-SFT-Small")

msgs = [
    {"role": "system", "content": "You are a careful medical assistant. Provide your final answer between <answer>...</answer>."},
    {"role": "user",   "content": "Question: My wife started Pradaxa a week ago. What happens if she stops it abruptly?"},
]
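# Apply the chat template; return_dict=True also returns the attention mask.
inputs = tok.apply_chat_template(
    msgs, add_generation_prompt=True, return_tensors="pt", return_dict=True
).to(model.device)
out = model.generate(**inputs, max_new_tokens=512, do_sample=False)
# Decode only the newly generated tokens, not the echoed prompt.
new_tokens = out[0][inputs["input_ids"].shape[-1]:]
print(tok.decode(new_tokens, skip_special_tokens=True))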

Companion model

🔗 Williamsanderson/MedQA-Llama3.1-8B-SFT-Big: trained trilingually on 50K stratified samples plus the Dorosz KG. Best eval loss: 0.768.

Limitations

Prototype for R&D only; not a certified medical device.
Early termination: slightly under-trained relative to the planned 3 epochs.
Limited language coverage (mostly EN + some FR).
No knowledge-graph grounding for drug facts.

References

QLoRA: Dettmers et al. (2023). arXiv:2305.14314
Llama-3.1: Grattafiori et al. (2024). arXiv:2407.21783

Citation

@misc{medqa_sft_small_2026,
  title  = { MedQA-Llama3.1-8B-SFT-Small: Medical QA via QLoRA SFT on Llama-3.1-8B },
  author = { BRAIN HEALTH project — Operation HELIX-FT },
  year   = { 2026 },
  url    = { https://huggingface.co/BrainHealthAI/MedQA-Llama3.1-8B-SFT-Small }
}
