Qwen3-0.6B-lk-alpha-20k-MNN

Qwen3-0.6B draft model exported for TokForge + MNN speculative decoding, trained with an LK Alpha objective instead of standard KL.

This bundle is aimed at people who want to experiment with acceptance-oriented draft training on mobile.

Why this repo exists

This is the LK-loss variant of our 20K Qwen3 draft lane:

  • Qwen3-0.6B student
  • Qwen3-8B teacher
  • 20K teacher dataset
  • LK Alpha training objective
  • exported as a ready-to-use MNN draft bundle

Best-known use

  • Draft model backend: CPU
  • Draft threads: 2
  • Draft predict length: d=3
  • Target pairing: usually Qwen3-8B in TokForge

Benchmark snapshot

On RedMagic SM8850 with Qwen3-8B target:

  • AR baseline: 13.9 tok/s
  • This draft model: 17.7 tok/s
  • Uplift: about +27%
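
The uplift figure follows directly from the two throughput numbers above; a minimal check:

```python
ar = 13.9    # autoregressive baseline, tok/s
spec = 17.7  # with this draft model, tok/s

# Relative throughput gain, as a percentage.
uplift = (spec / ar - 1) * 100
print(f"uplift: +{uplift:.1f}%")  # ~ +27%
```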

Training acceptance (alpha) at the final logged epoch:

  • 0.7350
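
As a rough sanity check (not a measurement from this repo), the standard speculative-decoding estimate for an i.i.d. per-token acceptance rate α and draft length d puts the expected tokens produced per target forward pass at (1 − α^(d+1)) / (1 − α). Plugging in the final-epoch α and the recommended d=3:

```python
alpha = 0.7350  # final-epoch training acceptance
d = 3           # recommended draft_predict_length

# Idealized expected tokens accepted per verification step under
# the i.i.d.-acceptance model from the speculative decoding
# literature; real on-device behavior will differ.
expected = (1 - alpha ** (d + 1)) / (1 - alpha)
print(f"{expected:.2f} tokens per target pass")  # ~2.67
```

This is only an upper-bound-style estimate: verification overhead, draft latency, and non-i.i.d. acceptance all eat into it, which is why the measured uplift is ~+27% rather than ~2.7x.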

Included files

  • llm.mnn
  • llm.mnn.weight
  • llm_config.json
  • config.json
  • config_cpu.json
  • tokenizer files
  • ONNX export artifact for reference

Usage

This bundle is meant for TokForge / MNN, not standard HF Inference.

Typical TokForge recipe:

{
  "backend_type": "opencl",
  "thread_num": 4,
  "precision": "low",
  "memory": "low",
  "sampler_type": "greedy",
  "speculative_type": "draftmodel",
  "draft_predict_length": 3,
  "draft_config_path": "/path/to/config_cpu.json"
}

Known-good draft-side config:

{
  "backend_type": "cpu",
  "thread_num": 2,
  "precision": "low",
  "memory": "low",
  "sampler_type": "greedy"
}
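
To wire the two configs together, the target-side recipe's `draft_config_path` must point at the draft-side JSON file. A minimal sketch that writes both files (field names are taken from the blocks above; the file layout and output directory are assumptions, not a TokForge requirement):

```python
import json
import os
import tempfile

# Draft-side config, copied from the "known-good" block above.
draft_cfg = {
    "backend_type": "cpu",
    "thread_num": 2,
    "precision": "low",
    "memory": "low",
    "sampler_type": "greedy",
}

out_dir = tempfile.mkdtemp()
draft_path = os.path.join(out_dir, "config_cpu.json")
with open(draft_path, "w") as f:
    json.dump(draft_cfg, f, indent=2)

# Target-side recipe; draft_config_path points at the file just written.
target_cfg = {
    "backend_type": "opencl",
    "thread_num": 4,
    "precision": "low",
    "memory": "low",
    "sampler_type": "greedy",
    "speculative_type": "draftmodel",
    "draft_predict_length": 3,
    "draft_config_path": draft_path,
}
with open(os.path.join(out_dir, "config.json"), "w") as f:
    json.dump(target_cfg, f, indent=2)
```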

Notes

  • In our testing, the LK Alpha objective improved training acceptance over the KL baseline.
  • On short device benchmarks, the runtime win was in the same general band as the KL model.
  • This makes it a good experimental alternative, but not a guaranteed universal replacement.

Limitations and Intended Use

  • Intended for speculative decoding with larger Qwen3 targets inside TokForge.
  • Training acceptance improved over the KL baseline, but device throughput gains stayed in a similar band on short runs.
  • Best current evidence is strongest on Qwen3-8B.
  • This is a specialized runtime artifact, not a general-purpose pretrained release.

Collection

TokForge

If you benchmark this on your own device, feel free to share results in Discord.
