⚠️ Conference talk demo, not production weights.

This model accompanies a conference keynote on local on-device AI. Published as a reference for the fine-tuning patterns shown on stage, not a deployable artefact. No security audit, no SLA, pinned to the talk's state.


Gemma3-4B FT (f16): RAG Synthesis (+ Vision)

Base model: google/gemma-3-4b-it (4.3B params, multimodal: text + vision via mmproj)
License: Gemma Terms of Use
Training script: finetune/train_gemma3_4b.py
Method: LoRA r=16, α=32, 3 epochs, lr=5e-5
Training data: data/training-data/gemma3_4b_synthesis_{scenario}.jsonl (RAG passages + grounded answers)
Hardware tested: RTX PRO 6000 (CUDA). MPS works but is slow; QLoRA via --qlora for ≤24GB VRAM
Intended use: RAG response synthesis. Given retrieved passages and a user question, produce a grounded, source-faithful answer. The vision channel (mmproj) remains base-only.
Out of scope: Tool calling (delegated to Qwen3.5-4B FT). Free-form chat without retrieved context.
Reference eval (Nextera): RAG keyword grounding of 96% on a 25-query holdout. See docs/benchmarks/EVAL_RESULTS_*.md.
Known failure modes: Will occasionally synthesise across documents that share lexical overlap but come from different domains. This is mitigated by the rewrite-query step that pre-filters retrieval.
Format: GGUF
Model size: 4B params
Architecture: gemma3
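
The intended use above (retrieved passages plus a user question in, a grounded answer out) can be illustrated with a minimal prompt-assembly sketch. The `Passage` type, the `build_synthesis_prompt` helper, and the prompt wording are all hypothetical and chosen for illustration; the template actually used in finetune/train_gemma3_4b.py is not documented here and may differ.

```python
from dataclasses import dataclass


@dataclass
class Passage:
    source: str  # hypothetical identifier, e.g. a file name
    text: str


def build_synthesis_prompt(question: str, passages: list[Passage]) -> str:
    """Assemble a grounded-answer prompt from retrieved passages.

    The layout (instructions, numbered passages, question) is an
    illustrative assumption, not the template this model was trained on.
    """
    if not passages:
        # Free-form chat without retrieved context is listed as out of scope.
        raise ValueError("at least one retrieved passage is required")
    numbered = "\n\n".join(
        f"[{i}] ({p.source})\n{p.text.strip()}"
        for i, p in enumerate(passages, start=1)
    )
    return (
        "Answer the question using only the passages below. "
        "Cite passages by number, and say so if they do not contain the answer.\n\n"
        f"Passages:\n{numbered}\n\n"
        f"Question: {question.strip()}"
    )
```

Refusing an empty passage list mirrors the out-of-scope note: without retrieved context there is nothing for the answer to be grounded in.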
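
For readers who want to run a comparable check on their own holdout set, here is a minimal sketch of one plausible reading of "keyword grounding": an answer passes if it contains every expected keyword, and the score is the share of passing answers. This definition, and the names `is_grounded` and `keyword_grounding_rate`, are assumptions; the metric behind the reported figure is the one described in docs/benchmarks/EVAL_RESULTS_*.md.

```python
def is_grounded(answer: str, expected_keywords: list[str]) -> bool:
    """True if every expected keyword occurs in the answer (case-insensitive).

    Assumed pass criterion, for illustration only.
    """
    text = answer.casefold()
    return all(keyword.casefold() in text for keyword in expected_keywords)


def keyword_grounding_rate(results: list[tuple[str, list[str]]]) -> float:
    """Fraction of (answer, expected_keywords) pairs that pass is_grounded."""
    if not results:
        raise ValueError("no results to score")
    hits = sum(is_grounded(answer, keywords) for answer, keywords in results)
    return hits / len(results)
```

Under this reading, 96% on a 25-query holdout corresponds to 24 of the 25 answers passing.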