prism-coder-8b / README.md
dcostenco's picture
Upload README.md with huggingface_hub
9ae524d verified
metadata
language: en
license: apache-2.0
tags:
  - tool-routing
  - function-calling
  - prism-aac
  - qwen3
  - gguf
base_model: Qwen/Qwen3-8B

prism-coder:8b β€” Tool Routing Model (iOS / Edge Tier)

Fine-tuned Qwen3-8B for 6-tool routing in the Prism AAC system. Primary deployment: iOS and edge devices via llama.cpp GGUF.

BFCL Routing Benchmark β€” v36 (Current)

Mean: 100.0% (3-seed average, seeds 2027/2028/2029, 102 cases each)

Category Count Description Accuracy
aac 12 AAC phrase requests β†’ plain text 100%
cmpct 6 Ledger compaction 100%
edge 6 Multi-step / compound requests 100%
hand 8 Agent handoff / relay 100%
info 5 General facts β†’ plain text 100%
irrel 10 Irrelevant / live queries β†’ plain text 100%
know 7 Knowledge base search 100%
load 9 Session context loading 100%
pred 8 Factual / knowledge queries β†’ plain text 100%
save 13 Session ledger save 100%
smem 12 Session memory search 100%
tran 6 Translation requests β†’ plain text 100%

Eval: MLX inference + thinking, temperature=0, 3-seed mean. Gate: β‰₯90% = deploy.

Cascade Benchmark (May 2026)

Full desktop cascade: 14b β†’ 32b β†’ Claude Opus (102 cases Γ— 3 seeds)

Metric Result
Cascade accuracy 100.0% (mean, 3 seeds)
Opus-solo etalon 98.3%
Ξ” vs Opus +1.7%
Traffic served by 14b 99% (101/102 cases avg)
Traffic escalated to 32b 1% (1/102 avg)
Traffic reaching Opus API 0%

Fine-tuned cascade outperforms Claude Opus on edge (+16.7%) and know (+14.3%).

Version History

Version BFCL Notes
v36 100.0% Fixed: smem "BFCL v4 notes" and "training loss" β†’ session_search_memory
v35 98.0% Proper safetensors merge β€” fixes mlx_lm.fuse LoRA loss
v32 98.0% Routing corpus v32_8b, direct safetensors merge
v31 95.1% Surgical smem/know boundary fix
v30 ~93% Baseline 8B routing

Tools

The model routes to exactly 6 tools:

Tool Trigger
session_load_context Load/resume project context
session_save_ledger Note/log/record/remember something
session_save_handoff Pass state to next agent/session
session_compact_ledger Shrink/prune ledger (no relay)
session_search_memory Recall prior session discussions
knowledge_search Search stored knowledge base

Plain text (no tool) for: AAC phrases, translations, weather, general facts, code, math.

Model Details

  • Base: Qwen/Qwen3-8B
  • Format: GGUF Q4_K_M (~4.9 GB)
  • Context: 32,768 tokens
  • Training: MLX LoRA, rank=16, 16 layers, 1000 iters, LR=2e-6, v36 corpus (806 examples)
  • Merge: mlx_lm.fuse β†’ llama.cpp convert β†’ Q4_K_M quantization

Usage

ollama pull dcostenco/prism-coder-8b
ollama run prism-coder:8b

Or in the Prism Coder IDE β€” set model to prism-coder:8b in Settings.