Instructions to use AtomicChat/Qwen3.6-27B-UDT-MTP-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use AtomicChat/Qwen3.6-27B-UDT-MTP-GGUF with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="AtomicChat/Qwen3.6-27B-UDT-MTP-GGUF",
	filename="Qwen3.6-27B-UDT-Q3_K_XL_MTP.gguf",
)

llm.create_chat_completion(
	messages = "No input example has been defined for this model task."
)

Notebooks
Google Colab
Kaggle
Local Apps

llama.cpp

How to use AtomicChat/Qwen3.6-27B-UDT-MTP-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf AtomicChat/Qwen3.6-27B-UDT-MTP-GGUF:Q4_K_XL_MTP
# Run inference directly in the terminal:
llama-cli -hf AtomicChat/Qwen3.6-27B-UDT-MTP-GGUF:Q4_K_XL_MTP

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf AtomicChat/Qwen3.6-27B-UDT-MTP-GGUF:Q4_K_XL_MTP
# Run inference directly in the terminal:
llama-cli -hf AtomicChat/Qwen3.6-27B-UDT-MTP-GGUF:Q4_K_XL_MTP

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf AtomicChat/Qwen3.6-27B-UDT-MTP-GGUF:Q4_K_XL_MTP
# Run inference directly in the terminal:
./llama-cli -hf AtomicChat/Qwen3.6-27B-UDT-MTP-GGUF:Q4_K_XL_MTP

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf AtomicChat/Qwen3.6-27B-UDT-MTP-GGUF:Q4_K_XL_MTP
# Run inference directly in the terminal:
./build/bin/llama-cli -hf AtomicChat/Qwen3.6-27B-UDT-MTP-GGUF:Q4_K_XL_MTP

Use Docker

docker model run hf.co/AtomicChat/Qwen3.6-27B-UDT-MTP-GGUF:Q4_K_XL_MTP

LM Studio
Jan
Ollama
How to use AtomicChat/Qwen3.6-27B-UDT-MTP-GGUF with Ollama:
```
ollama run hf.co/AtomicChat/Qwen3.6-27B-UDT-MTP-GGUF:Q4_K_XL_MTP
```

Unsloth Studio new

How to use AtomicChat/Qwen3.6-27B-UDT-MTP-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for AtomicChat/Qwen3.6-27B-UDT-MTP-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for AtomicChat/Qwen3.6-27B-UDT-MTP-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for AtomicChat/Qwen3.6-27B-UDT-MTP-GGUF to start chatting

Pi new

How to use AtomicChat/Qwen3.6-27B-UDT-MTP-GGUF with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf AtomicChat/Qwen3.6-27B-UDT-MTP-GGUF:Q4_K_XL_MTP

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "AtomicChat/Qwen3.6-27B-UDT-MTP-GGUF:Q4_K_XL_MTP"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent new

How to use AtomicChat/Qwen3.6-27B-UDT-MTP-GGUF with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf AtomicChat/Qwen3.6-27B-UDT-MTP-GGUF:Q4_K_XL_MTP

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default AtomicChat/Qwen3.6-27B-UDT-MTP-GGUF:Q4_K_XL_MTP

Run Hermes

hermes

Docker Model Runner
How to use AtomicChat/Qwen3.6-27B-UDT-MTP-GGUF with Docker Model Runner:
```
docker model run hf.co/AtomicChat/Qwen3.6-27B-UDT-MTP-GGUF:Q4_K_XL_MTP
```

Lemonade

How to use AtomicChat/Qwen3.6-27B-UDT-MTP-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull AtomicChat/Qwen3.6-27B-UDT-MTP-GGUF:Q4_K_XL_MTP

Run and chat with the model

lemonade run user.Qwen3.6-27B-UDT-MTP-GGUF-Q4_K_XL_MTP

List all available models

lemonade list

Qwen 3.6 27B — UDT MTP GGUF

UDT (UD-Turbo) dynamic-imatrix quants of Qwen 3.6 27B (dense), built on top of atomic-llama-cpp-turboquant — a llama.cpp fork with TurboQuant WHT-rotated KV cache + shared-model NextN speculative decoding.

These are combined *_MTP.gguf files: the NextN auxiliary head ships inside the same GGUF as the target weights, so you point --model-draft at the same file and the server reuses the loaded llama_model (no second mmap, no second tokenizer).

UDT is not the same artifact as Unsloth's UD-* — it denotes our mask line on top of the same public MTP-aware imatrix.

Files

Quantized weights (combined `_MTP.gguf`, NextN head included)

File	Bits	Size	wikitext-2 PPL
`Qwen3.6-27B-UDT-Q3_K_XL_MTP.gguf`	~3.4	15.7 GiB	7.056 ± 0.047
`Qwen3.6-27B-UDT-Q4_K_XL_MTP.gguf`	~4.5	17.7 GiB	6.972 ± 0.046
`Qwen3.6-27B-UDT-Q5_K_XL_MTP.gguf`	~5.5	19.6 GiB	6.896 ± 0.045
`Qwen3.6-27B-UDT-Q6_K_MTP.gguf`	~6.5	21.6 GiB	6.929 ± 0.046
`Qwen3.6-27B-UDT-Q8_K_XL_MTP.gguf`	~8.0	25.5 GiB	(≈ BF16 reference)

PPL measured with llama-perplexity over wikitext-2-raw/wiki.test.raw, 580 chunks, n_ctx=512, NVIDIA H100. Recommended quant: Q4_K_XL — best PPL/size + smallest acceptable size to fit on a 24–32 GB GPU with TurboQuant3 KV. Use Q8_K_XL for near-lossless quality.

Multimodal projector (vision) — pass via `--mmproj`

File	Size	Notes
`mmproj-F16.gguf`	0.87 GiB	recommended default
`mmproj-BF16.gguf`	0.87 GiB	identical accuracy, BF16 storage

The projector is mirrored verbatim from unsloth/Qwen3.6-27B-MTP-GGUF — no changes from this repo, re-hosted for convenience so you can grab everything in one -hf line.

Importance matrix

File	Size	Source
`imatrix_unsloth.gguf_file`	13 MiB	`unsloth/Qwen3.6-27B-MTP-GGUF` (MTP-aware, 77 chunks) — re-hosted for reproducibility, all credit to Unsloth

What's special

UDT applies three layers on top of plain llama-quantize -tt:

MTP-aware imatrix — the imatrix_unsloth.gguf_file from Unsloth's Qwen3.6-27B-MTP-GGUF repo, calibrated with the NextN head active.
NEXTN-preserve mask — every blk.*.nextn.* and mtp.* tensor pinned to Q8_0. Cost: ~0 PPL, ~10 MB; gain: higher draft acceptance with speculative decoding.
TurboQuant3-friendly mask — attention Q/K (attn_q/attn_k) bumped to Q6_K to absorb the noise introduced by 3-bit KV compression (-ctk turbo3 -ctv turbo3).

The combined mask is scripts/quantize-masks/qwen36-ud-v3-combined.txt in the repo. Variant tags -V1, -V2, -base correspond to NEXTN-only, TurboQuant3-only, and the imatrix-only baseline; release files use the combined V3 mask.

Bench (MacBook Pro M4 Max, 40-core GPU, 48 GB, Metal, single slot)

A/B against unsloth/Qwen3.6-27B-MTP-GGUF UD-Q4_K_XL. Median TPS over 2 runs, --draft-max 2 --draft-min 1.

mode	reference `UD-Q4_K_XL`	UDT-Q4_K_XL	Δ tps	accept
`f16-base` (n=128 / n=512)	21.38 / 20.85	21.09 / 20.86	~equal	—
`turbo3-base`	20.18 / 20.02	19.57 / 19.20	−3%	—
`f16-nextn`	25.00 / 23.22	23.49 / 22.89	−3 / −1%	91.0 / 87.5 %
`turbo3-nextn`	21.93 / 20.60	23.32 / 21.78	+6 / +6%	95.4 / 84.8 %

UDT wins the recommended turbo3-nextn mode by +6 % tps and +10 pp acceptance (short) — the combination this mask was designed for.

Quick start

# llama.cpp build needs TurboQuant + NextN patches:
#   https://github.com/AtomicBot-ai/atomic-llama-cpp-turboquant

# 1) recommended: NextN + TurboQuant3 KV
llama-server \
  -hf  AtomicChat/Qwen3.6-27B-UDT-MTP-GGUF:Q4_K_XL \
  -hfd AtomicChat/Qwen3.6-27B-UDT-MTP-GGUF:Q4_K_XL \
  --spec-type nextn --draft-max 2 --draft-min 1 \
  -c 8192 -ngl 99 -ngld 99 -fa on \
  -ctk turbo3 -ctv turbo3

# 2) or with a local file (point -m and -md at the same GGUF)
llama-server \
  -m  ./Qwen3.6-27B-UDT-Q4_K_XL_MTP.gguf \
  -md ./Qwen3.6-27B-UDT-Q4_K_XL_MTP.gguf \
  --spec-type nextn --draft-max 2 --draft-min 1 \
  -c 8192 -ngl 99 -ngld 99 -fa on \
  -ctk turbo3 -ctv turbo3

Helper script in the repo: scripts/run-qwen36-27b-nextn-server.sh.

Vision (multimodal)

Pass the projector with --mmproj:

llama-server \
  -m  ./Qwen3.6-27B-UDT-Q4_K_XL_MTP.gguf \
  --mmproj ./mmproj-F16.gguf \
  -c 8192 -ngl 99 -fa on

Reproduce

From a Unsloth BF16 MTP shard + Unsloth imatrix:

git clone https://github.com/AtomicBot-ai/atomic-llama-cpp-turboquant
cd atomic-llama-cpp-turboquant
cmake -B build -DGGML_CUDA=1 && cmake --build build -j --target llama-quantize

# download BF16 + imatrix to .scratch/qwen-ud-sources/27b/
bash scripts/qwen-udt/hf-download-sources.sh

# quantize Q4_K_XL with the V3 (release) mask
./scripts/quantize-qwen-udt.sh 27b Q4_K_M v3

Mask files: scripts/quantize-masks/qwen36-ud-{base,v1-nextn,v2-turbo3,v3-combined}.txt. Full runbook: docs/qwen-udt/RUNBOOK.md.

Credits & license

Qwen team (Qwen/Qwen3.6-27B) — base weights, Apache-2.0.
Unsloth (unsloth/Qwen3.6-27B-MTP-GGUF) — MTP-aware imatrix_unsloth.gguf_file and BF16 MTP source GGUFs. Huge thanks to the Unsloth team for releasing these public artifacts that made UDT possible.
@TheTom (TheTom/llama-cpp-turboquant) — original TurboQuant WHT-rotated quantization design.
AtomicChat — UDT mask recipes, NextN shared-model integration, benches, packaging. Repo: AtomicBot-ai/atomic-llama-cpp-turboquant.

License: Apache-2.0 (inherits from the upstream Qwen 3.6 weights).

Downloads last month: 2,392

GGUF

Model size

27B params

Architecture

qwen35

Hardware compatibility

3-bit

4-bit

5-bit

6-bit

8-bit

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for AtomicChat/Qwen3.6-27B-UDT-MTP-GGUF

Base model

Qwen/Qwen3.6-27B

Quantized

(344)

this model

Collection including AtomicChat/Qwen3.6-27B-UDT-MTP-GGUF

Qwen 3.6 UDT MTP

Collection

Dynamic-imatrix GGUF quants of Qwen 3.6 27B & 35B-A3B. TurboQuant3 KV + shared-model NextN ready. • 2 items • Updated 6 days ago • 3