# Qwen3.5-27B-Claude-Opus-4.6-Distilled-heretic
A Qwen3.5-27B with Claude Opus 4.6 reasoning distillation, abliterated via Heretic to remove safety refusals while preserving reasoning quality.
## What is this?
This model combines two things that didn't exist together before:
- Claude Opus 4.6 chain-of-thought reasoning distilled into the Qwen3.5-27B architecture
- Clean abliteration via Heretic, removing corporate safety theater without lobotomizing the model
The result is a 27B model that reasons like Claude (structured analysis, committing to answers, showing its work) without refusing to engage with sensitive topics.
## Abliteration Stats
- Tool: Heretic v1.2.0
- Refusals after abliteration: 13/100
- KL Divergence: Low (model capabilities preserved)
- Ablation targets: `attn.out_proj`, `mlp.down_proj`
## Architecture
This is not a standard transformer. Qwen3.5 uses a hybrid Gated DeltaNet + conventional attention architecture:
- 64 layers in a 3:1 pattern (three DeltaNet linear-attention layers followed by one full softmax-attention layer)
- DeltaNet layers use fixed-size recurrent state (O(1) memory per layer regardless of context)
- Attention layers serve as precision checkpoints every 4th layer
- 262K native context, extensible to 1M+
- Native multimodal: vision built into the architecture, not bolted on
This means dramatically lower VRAM for KV cache compared to pure transformer models of the same size.
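To make that cache saving concrete, here is a back-of-the-envelope sketch. The KV head count and head dimension below are illustrative assumptions, not the model's published config; only the 64-layer count and 3:1 pattern come from this card.

```python
# Rough KV-cache comparison: hybrid 3:1 DeltaNet/attention vs. a pure
# transformer of the same depth. Head counts/dims are assumptions.
N_LAYERS = 64
ATTN_LAYERS = N_LAYERS // 4            # every 4th layer is full softmax attention
KV_HEADS, HEAD_DIM, BYTES = 8, 128, 2  # assumed GQA KV heads, head dim, bf16

def kv_cache_gb(n_attn_layers: int, ctx: int) -> float:
    # 2x for the key and value tensors
    return 2 * n_attn_layers * KV_HEADS * HEAD_DIM * BYTES * ctx / 1e9

pure = kv_cache_gb(N_LAYERS, 64_000)       # every layer keeps a growing KV cache
hybrid = kv_cache_gb(ATTN_LAYERS, 64_000)  # only the 16 attention layers do
print(f"pure: {pure:.1f} GB, hybrid: {hybrid:.1f} GB, ratio: {pure/hybrid:.0f}x")
```

Only the 16 softmax-attention layers grow with context; the 48 DeltaNet layers keep a fixed-size recurrent state, so under these assumptions the context-dependent cache is roughly 4x smaller than a pure transformer's.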
## VRAM Requirements
| Quantization | VRAM (16K ctx) | VRAM (64K ctx) |
|---|---|---|
| BF16 (this repo) | ~54 GB | ~56 GB |
| Q8_0 GGUF | ~28 GB | ~30 GB |
| Q4_K_M GGUF | ~18 GB | ~20 GB |
Q4_K_M fits on a single RTX 3090/3090 Ti with room for 16K+ context.
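The weight footprints in the table follow from simple arithmetic. The bits-per-weight figures for the GGUF quants below are approximations (real files carry metadata and mixed-precision tensors), and KV cache plus activations come on top:

```python
# Approximate weight-only memory for a 27B-parameter model at several
# precisions. Bits-per-weight for the quants are rough averages.
PARAMS = 27e9

def weights_gb(bits_per_weight: float) -> float:
    return PARAMS * bits_per_weight / 8 / 1e9

bf16 = weights_gb(16)   # 54.0 GB, matching the table's BF16 row
q8 = weights_gb(8.5)    # Q8_0 stores a scale per block, ~8.5 bpw
q4 = weights_gb(4.85)   # Q4_K_M averages roughly 4.85 bpw
print(f"BF16 {bf16:.1f} GB | Q8_0 {q8:.1f} GB | Q4_K_M {q4:.1f} GB")
```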
## Usage

### With transformers
```python
from transformers import AutoProcessor, Qwen3_5ForConditionalGeneration

# Load in bfloat16 and shard across available GPUs
model = Qwen3_5ForConditionalGeneration.from_pretrained(
    "ghost-actual/Qwen3.5-27B-Claude-Opus-4.6-Distilled-heretic",
    torch_dtype="bfloat16",
    device_map="auto",
    trust_remote_code=True,
)
# The processor handles both text and image inputs (native multimodal)
processor = AutoProcessor.from_pretrained(
    "ghost-actual/Qwen3.5-27B-Claude-Opus-4.6-Distilled-heretic",
    trust_remote_code=True,
)
```
### With llama.cpp (GGUF)

GGUF quants are coming soon. In the meantime, you can convert the weights yourself:
```bash
python convert_hf_to_gguf.py \
  ghost-actual/Qwen3.5-27B-Claude-Opus-4.6-Distilled-heretic \
  --outfile heretic-27b-F16.gguf --outtype f16
llama-quantize heretic-27b-F16.gguf heretic-27b-Q4_K_M.gguf Q4_K_M
```
### Recommended inference settings

```yaml
temperature: 0.6
top_p: 0.95
top_k: 20
presence_penalty: 1.5
repetition_penalty: 1.05
```
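As a sketch, the same settings expressed as a request body for an OpenAI-compatible server (e.g. llama-server or vLLM). The model name and message are illustrative; `top_k` and `repetition_penalty` are server extensions, not part of the base OpenAI API, and field support varies by server:

```python
# Recommended sampling settings as a chat-completions payload for an
# OpenAI-compatible endpoint; extension fields may be ignored by some servers.
payload = {
    "model": "heretic-27b-Q4_K_M",  # illustrative model name
    "messages": [{"role": "user", "content": "Explain the 3:1 layer pattern."}],
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 20,                     # extension (llama-server, vLLM)
    "presence_penalty": 1.5,
    "repetition_penalty": 1.05,      # extension (llama-server, vLLM)
}
```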
## Base Model

ZonoDilu/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled: Claude Opus 4.6 reasoning distilled into Qwen3.5-27B using the nohurry/Opus-4.6-Reasoning-3000x-filtered dataset.
## Why this exists
The base model had Claude-quality reasoning but still carried Qwen's default safety restrictions. Existing abliterated Qwen3.5 models (like DavidAU's) use Gemini reasoning distillation instead of Claude. If you want Claude-style chain-of-thought without the corporate leash, this is it.
## Made by

Ghost (ghost-actual)
Built with Heretic by p-e-w.
## Model tree

- Base model: Qwen/Qwen3.5-27B