Instructions to use OsaurusAI/Hy3-preview-JANGTQ_K with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- MLX
How to use OsaurusAI/Hy3-preview-JANGTQ_K with MLX:
```python
# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("OsaurusAI/Hy3-preview-JANGTQ_K")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- LM Studio
- Pi
How to use OsaurusAI/Hy3-preview-JANGTQ_K with Pi:
Start the MLX server
```shell
# Install MLX LM:
uv tool install mlx-lm

# Start a local OpenAI-compatible server:
mlx_lm.server --model "OsaurusAI/Hy3-preview-JANGTQ_K"
```
Configure the model in Pi
```shell
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
```
Add to ~/.pi/agent/models.json:
```json
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "OsaurusAI/Hy3-preview-JANGTQ_K" }
      ]
    }
  }
}
```
Run Pi
```shell
# Start Pi in your project directory:
pi
```
- Hermes Agent
How to use OsaurusAI/Hy3-preview-JANGTQ_K with Hermes Agent:
Start the MLX server
```shell
# Install MLX LM:
uv tool install mlx-lm

# Start a local OpenAI-compatible server:
mlx_lm.server --model "OsaurusAI/Hy3-preview-JANGTQ_K"
```
Configure Hermes
```shell
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup

# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default OsaurusAI/Hy3-preview-JANGTQ_K
```
Run Hermes
```shell
hermes
```
- MLX LM
How to use OsaurusAI/Hy3-preview-JANGTQ_K with MLX LM:
Generate or start a chat session
```shell
# Install MLX LM
uv tool install mlx-lm

# Interactive chat REPL
mlx_lm.chat --model "OsaurusAI/Hy3-preview-JANGTQ_K"
```
Run an OpenAI-compatible server
```shell
# Install MLX LM
uv tool install mlx-lm

# Start the server (listens on port 8080 unless --port is given)
mlx_lm.server --model "OsaurusAI/Hy3-preview-JANGTQ_K"

# Call the OpenAI-compatible server with curl
curl -X POST "http://localhost:8080/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "OsaurusAI/Hy3-preview-JANGTQ_K",
    "messages": [
      {"role": "user", "content": "Hello"}
    ]
  }'
```
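The same chat-completions call can be made from Python. The sketch below uses only the standard library and assumes the server is listening on port 8080, the port used in the Pi and Hermes configurations above; change `BASE_URL` if the server was started with a different `--port`. The helper names `build_chat_request` and `chat` are illustrative and not part of mlx-lm.

```python
import json
import urllib.request

MODEL_ID = "OsaurusAI/Hy3-preview-JANGTQ_K"
# Assumption: mlx_lm.server is reachable here (matches the Pi/Hermes configs above).
BASE_URL = "http://localhost:8080/v1"


def build_chat_request(prompt: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request for the chat completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str, base_url: str = BASE_URL) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, base_url)) as resp:
        body = json.load(resp)
    # OpenAI-style chat completion responses carry the reply at this path.
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(chat("Hello"))` sends the same payload as the curl example.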

Hy3-preview-JANGTQ_K
Tencent Hy3-preview at 102 GB on disk (down from ~557 GB for the BF16
source), using mixed-bit JANGTQ_K quantization on the routed experts and
8-bit affine quantization elsewhere. It is ~30% larger than
Hy3-preview-JANGTQ (2-bit on all routed-expert projections); the extra
bits go to down_proj, the most quantization-sensitive projection, for a
measurable quality improvement, especially on long-output generation.
- Source: tencent/Hy3-preview (Hy3 architecture, 295 B total / 21 B active, BF16 native, 256 K context, 80 transformer layers + 1 MTP, 192 routed experts top-8 + 1 shared)
- Quantization: mixed-bit MXTQ on routed experts:
  - down_proj: 4-bit (4096-out, residual-stream sensitive)
  - gate_proj: 2-bit (gated by SwiGLU)
  - up_proj: 2-bit (multiplied with gate)
  - attention / shared expert / dense layer-0 / embed / lm_head / MTP matmuls: 8-bit affine
  - RMSNorms / router gate / expert_bias: fp16 / fp32 passthrough
- MTP: layer 80 weights preserved (mtp_mode=preserved_disabled); decode is one-token-per-forward until the accept/reject speculative loop ships.
- Bundle size: 102 GB on disk across 109 shards
- Runs on: M4 Max 128 GB / M5 Max 128 GB / Mac Studio 192 GB+
What's in the bundle
| Module | Source dtype | Bundle dtype |
|---|---|---|
| Routed experts (192 × 3 mats × 79 sparse layers, per-expert layout) | BF16 | JANGTQ_K: down 4-bit, gate/up 2-bit |
| Attention q/k/v/o + q/k norms | BF16 | 8-bit affine g=64 |
| Shared expert (gate/up/down) | BF16 | 8-bit affine g=64 |
| Dense layer-0 MLP | BF16 | 8-bit affine g=64 |
| embed_tokens / lm_head | BF16 | 8-bit affine g=64 |
| MTP layer matmuls | BF16 | 8-bit affine g=64 (preserved_disabled) |
| RMSNorms / router.gate.weight / expert_bias | BF16 / F32 | fp16 passthrough |
The bundle also includes a jangtq_runtime.safetensors sidecar (~22 KB)
for Swift runtimes. It holds the (in=1536, bits=4) and (in=4096, bits=2)
codebooks plus the sign-flip vectors; two codebooks are needed because
Hy3 routed projections have asymmetric [4096↔1536] dimensions.
Why mixed-bit?
Hy3 uses top-8 routing, so JANGTQ (uniform 2-bit) already averages
codebook noise across 8 experts per token and produces coherent output.
JANGTQ_K spends extra bits on down_proj, the projection whose output
enters the residual stream, to give long-output generation more headroom
before residual noise compounds. This is the same scheme that
ZAYA1-8B-JANGTQ_K uses for a strictly harder top-1 routing setup.
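The size cost of the mixed-bit scheme can be estimated from the figures on this card. The sketch below is a back-of-the-envelope calculation, not a measurement. It assumes every routed-expert matrix is 4096 × 1536, that all non-routed weights (295 B total minus the routed experts) sit at 8 bits, and it ignores codebooks, per-group scales, and file metadata.

```python
from fractions import Fraction

# Figures taken from this card.
N_EXPERTS = 192
N_SPARSE_LAYERS = 79
HIDDEN, EXPERT_DIM = 4096, 1536
TOTAL_PARAMS = 295e9  # rounded total from the card

BITS_JANGTQ_K = {"down_proj": 4, "gate_proj": 2, "up_proj": 2}
BITS_JANGTQ = {"down_proj": 2, "gate_proj": 2, "up_proj": 2}

# gate/up map 4096 -> 1536 and down maps 1536 -> 4096, so all three
# matrices hold the same number of weights.
PARAMS_PER_MATRIX = HIDDEN * EXPERT_DIM


def routed_params() -> int:
    """Total number of routed-expert weights across all sparse layers."""
    return N_EXPERTS * N_SPARSE_LAYERS * len(BITS_JANGTQ_K) * PARAMS_PER_MATRIX


def mean_bits(bits: dict) -> Fraction:
    """Average bits per routed weight (plain mean, since matrices are equal-sized)."""
    return Fraction(sum(bits.values()), len(bits))


def routed_gb(bits: dict) -> float:
    """Routed-expert payload in GB (1e9 bytes), ignoring quantization overhead."""
    return float(routed_params() * mean_bits(bits) / 8) / 1e9


def other_gb() -> float:
    """Assumption: every non-routed weight is stored at 8 bits (1 byte)."""
    return (TOTAL_PARAMS - routed_params()) / 1e9


def size_ratio() -> float:
    """Estimated JANGTQ_K bundle size relative to uniform 2-bit JANGTQ."""
    return (routed_gb(BITS_JANGTQ_K) + other_gb()) / (routed_gb(BITS_JANGTQ) + other_gb())


print(f"routed weights: {routed_params() / 1e9:.1f} B")
print(f"mean bits: JANGTQ_K {float(mean_bits(BITS_JANGTQ_K)):.2f} vs JANGTQ {float(mean_bits(BITS_JANGTQ)):.2f}")
print(f"routed payload: {routed_gb(BITS_JANGTQ_K):.1f} GB vs {routed_gb(BITS_JANGTQ):.1f} GB")
print(f"estimated size ratio: {size_ratio():.2f}")
```

Under these assumptions the routed experts average 8/3 bits per weight instead of 2, a 4/3 increase on the routed portion. Once the unchanged 8-bit tensors are included, the estimated ratio is in line with the ~30% size difference quoted above.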
Loading (Python)
```shell
pip install jang-tools mlx-lm
```
```python
from jang_tools.load_jangtq import load_jangtq_model

model, tokenizer = load_jangtq_model("OsaurusAI/Hy3-preview-JANGTQ_K")

# Build the chat prompt as a string; reasoning_effort accepts
# "no_think", "low", or "high" (see "Reasoning + tools").
chat = tokenizer.apply_chat_template(
    [{"role": "user", "content": "What is 2 + 2? Answer briefly."}],
    tokenize=False,
    add_generation_prompt=True,
    reasoning_effort="no_think",
)
```
load_jangtq_model auto-registers model_type=hy_v3 via
jang_tools.hy3 before building the MLX skeleton. The loader also
applies the standard patches automatically: SwitchGLU fused gate+up,
P15 router compile, and P18 QKV fusion.
Reasoning + tools
- Reasoning parser: qwen3 (extracts <think>...</think> blocks)
- Tool parser: hunyuan (Tencent XML-like: <tool_calls><tool_call>name<tool_sep><arg_key>k</arg_key><arg_value>v</arg_value></tool_call></tool_calls>)
- Reasoning effort: no_think (default) | low | high; pass via apply_chat_template(..., reasoning_effort="…")
- Cache: kv (standard GQA cache)
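To make the two output formats above concrete, here is a minimal regex-based sketch that separates <think> blocks from the answer and pulls tool calls out of the XML-like format. It is an illustration only, not the qwen3 or hunyuan parser used by any runtime. The function names and the get_weather tool are hypothetical, argument values are kept as raw strings, and it assumes <tool_sep> always follows the tool name.

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)<tool_sep>(.*?)</tool_call>", re.DOTALL)
ARG_RE = re.compile(
    r"<arg_key>(.*?)</arg_key>\s*<arg_value>(.*?)</arg_value>", re.DOTALL
)


def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer): the <think> contents and the text outside them."""
    reasoning = "\n".join(block.strip() for block in THINK_RE.findall(text))
    answer = THINK_RE.sub("", text).strip()
    return reasoning, answer


def parse_tool_calls(text: str) -> list[dict]:
    """Extract tool calls as {"name": ..., "arguments": {key: value}} dicts."""
    calls = []
    for name, body in TOOL_CALL_RE.findall(text):
        # Each argument is an <arg_key>/<arg_value> pair inside the call body.
        args = {key.strip(): value.strip() for key, value in ARG_RE.findall(body)}
        calls.append({"name": name.strip(), "arguments": args})
    return calls


# Hypothetical model outputs, for illustration only.
reasoning, answer = split_reasoning("<think>2 + 2 is 4.</think>The answer is 4.")
calls = parse_tool_calls(
    "<tool_calls><tool_call>get_weather<tool_sep>"
    "<arg_key>city</arg_key><arg_value>Seoul</arg_value>"
    "</tool_call></tool_calls>"
)
print(reasoning, "|", answer)
print(calls)
```

A production parser would also need to handle streaming output and unterminated tags, which this sketch does not attempt.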
Runtime support matrix
| Surface | Status |
|---|---|
| jang-tools Python (load_jangtq_model) | ✅ working (this README's load snippet) |
| vmlx-swift-lm Swift | ✅ working: Libraries/MLXLLM/Models/Hy3.swift + JANGTQ dispatch. Same family path that ships ZAYA and Bailing/Ling. |
| vmlx_engine Python re-export | pending |
| MTP speculative decode | preserved-disabled: weights present in bundle, accept/reject loop not yet implemented |
Credits
- Quantization + MLX runtime: Jinho Jang (eric@osaurus.ai)
- Source model: Tencent Hy3-preview team
- License: Tencent Hy Community License — non-commercial, EU/UK/SK excluded; consult the LICENSE for full terms