TokForge Mobile Draft Models
Collection: Small MNN draft models and speculative-decoding bundles for TokForge on Android. Includes practical Qwen3 0.6B drafts plus experimental variants.
Qwen3-0.6B draft model exported for TokForge + MNN speculative decoding, trained with an LK Alpha objective instead of standard KL.
This bundle is aimed at people who want to experiment with acceptance-oriented draft training on mobile.
This is the LK-loss variant of our 20K Qwen3 draft lane:
- Qwen3-0.6B student
- Qwen3-8B teacher
- 20K teacher dataset
- LK Alpha training objective
- MNN draft bundle

Runtime setup, matching the configs further down:
- Draft backend: CPU
- Draft threads: 2
- Draft length: d=3
- Target: Qwen3-8B in TokForge

On RedMagic SM8850 with Qwen3-8B target:
- 13.9 tok/s baseline
- 17.7 tok/s with this draft
- +27%

Training acceptance (alpha) at the final logged epoch:
- 0.7350

Files:
- llm.mnn
- llm.mnn.weight
- llm_config.json
- config.json
- config_cpu.json

This bundle is meant for TokForge / MNN, not standard HF Inference.
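As a sanity check on the benchmark and acceptance figures above, the short Python sketch below recomputes the relative speedup and gives a rough estimate of how many tokens one verification pass might yield at d=3. The function names are illustrative, and the estimate assumes each drafted token is accepted independently with probability alpha. That is a simplification, and the training alpha reported here is not guaranteed to match on-device acceptance.

```python
def speedup_percent(baseline_tps: float, speculative_tps: float) -> float:
    """Relative throughput gain over the baseline, in percent."""
    return (speculative_tps / baseline_tps - 1.0) * 100.0


def expected_tokens_per_step(alpha: float, draft_len: int) -> float:
    """Expected tokens emitted per target verification pass.

    Assumption: every drafted token is accepted independently with
    probability alpha. The result is 1 + alpha + ... + alpha**draft_len,
    i.e. the accepted draft tokens plus the one token the target always adds.
    """
    if alpha == 1.0:
        return float(draft_len + 1)
    return (1.0 - alpha ** (draft_len + 1)) / (1.0 - alpha)


if __name__ == "__main__":
    # Numbers taken from this card: 13.9 and 17.7 tok/s, alpha 0.7350, d=3.
    print(f"speedup: {speedup_percent(13.9, 17.7):.1f}%")          # 27.3%
    print(f"tokens per step: {expected_tokens_per_step(0.7350, 3):.2f}")  # 2.67
```

The measured +27% is well below what roughly 2.67 tokens per pass would suggest in isolation, which is expected: the estimate ignores the cost of running the draft model and of the longer verification pass.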
Typical TokForge recipe:
{
  "backend_type": "opencl",
  "thread_num": 4,
  "precision": "low",
  "memory": "low",
  "sampler_type": "greedy",
  "speculative_type": "draftmodel",
  "draft_predict_length": 3,
  "draft_config_path": "/path/to/config_cpu.json"
}
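To illustrate what `draft_predict_length: 3` with a greedy sampler controls in the recipe above, here is a toy Python sketch of greedy draft-and-verify decoding. It is not TokForge or MNN code: `toy_target` and `toy_draft` are made-up stand-ins for the two models, and a real engine verifies all drafted tokens in one batched target pass rather than one call per token.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]  # maps a token context to the next token


def speculative_greedy(target: NextToken, draft: NextToken, prompt: List[int],
                       max_new: int, draft_len: int = 3) -> List[int]:
    """Greedy draft-and-verify decoding; returns only the new tokens."""
    out: List[int] = []
    while len(out) < max_new:
        ctx = prompt + out
        # 1) The draft model proposes draft_len tokens autoregressively.
        proposal: List[int] = []
        for _ in range(draft_len):
            proposal.append(draft(ctx + proposal))
        # 2) The target checks the proposal position by position.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(ctx + accepted)
            if tok != expected:
                accepted.append(expected)  # first mismatch: keep target's token, stop
                break
            accepted.append(tok)
        else:
            # Every drafted token matched, so the target adds one bonus token.
            accepted.append(target(ctx + accepted))
        out.extend(accepted)
    return out[:max_new]


def target_only(target: NextToken, prompt: List[int], max_new: int) -> List[int]:
    """Plain greedy decoding with the target model alone, for comparison."""
    out: List[int] = []
    for _ in range(max_new):
        out.append(target(prompt + out))
    return out


def toy_target(ctx: List[int]) -> int:
    return (ctx[-1] * 3 + 1) % 7


def toy_draft(ctx: List[int]) -> int:
    # Agrees with the target except after token 4, to force some rejections.
    return 0 if ctx[-1] == 4 else toy_target(ctx)


if __name__ == "__main__":
    spec = speculative_greedy(toy_target, toy_draft, [1], max_new=12, draft_len=3)
    print(spec == target_only(toy_target, [1], max_new=12))
```

With greedy sampling the output is identical to target-only decoding; the draft only changes how many target passes are needed to produce it.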
Known-good draft-side config:
{
  "backend_type": "cpu",
  "thread_num": 2,
  "precision": "low",
  "memory": "low",
  "sampler_type": "greedy"
}
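If you script your setup, the two configs above can be linked programmatically. The Python sketch below is one hedged way to do it: `link_draft` is a hypothetical helper, not part of TokForge or MNN, and it only touches the three speculative keys shown in the recipe (`speculative_type`, `draft_predict_length`, `draft_config_path`). It assumes the draft bundle directory contains the `config_cpu.json` shipped with this bundle.

```python
import json
import tempfile
from pathlib import Path


def link_draft(target_config: dict, draft_bundle_dir: Path, draft_len: int = 3) -> dict:
    """Return a copy of target_config that points at the bundle's config_cpu.json."""
    draft_cfg = draft_bundle_dir / "config_cpu.json"
    if not draft_cfg.is_file():
        raise FileNotFoundError(f"missing draft config: {draft_cfg}")
    if draft_len < 1:
        raise ValueError("draft_len must be at least 1")
    linked = dict(target_config)  # shallow copy; the caller's dict is untouched
    linked.update(
        speculative_type="draftmodel",
        draft_predict_length=draft_len,
        draft_config_path=str(draft_cfg.resolve()),
    )
    return linked


if __name__ == "__main__":
    # Demo in a temporary directory that stands in for the real bundle folder.
    with tempfile.TemporaryDirectory() as tmp:
        bundle = Path(tmp)
        draft_side = {"backend_type": "cpu", "thread_num": 2, "precision": "low",
                      "memory": "low", "sampler_type": "greedy"}
        (bundle / "config_cpu.json").write_text(json.dumps(draft_side, indent=2))
        target_side = {"backend_type": "opencl", "thread_num": 4, "precision": "low",
                       "memory": "low", "sampler_type": "greedy"}
        print(json.dumps(link_draft(target_side, bundle), indent=2))
```

Using an absolute path for `draft_config_path` avoids depending on the app's working directory, which is a design choice of this sketch rather than a documented requirement.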
Compatibility:
- Qwen3 targets inside TokForge
- Tested target: Qwen3-8B

If you benchmark this on your own device, feel free to share results in Discord.