Qwen3.5-27B-Claude-Opus-4.6-Distilled-heretic

A Qwen3.5-27B with Claude Opus 4.6 reasoning distillation, abliterated via Heretic to remove safety refusals while preserving reasoning quality.

What is this?

This model combines two things that didn't exist together before:

  • Claude Opus 4.6 chain-of-thought reasoning distilled into the Qwen3.5-27B architecture
  • Clean abliteration via Heretic, removing corporate safety theater without lobotomizing the model

The result is a 27B model that reasons like Claude (structured analysis, committing to answers, showing its work) without refusing to engage with topics.

Abliteration Stats

  • Tool: Heretic v1.2.0
  • Refusals: 13/100
  • KL Divergence: Low (model capabilities preserved)
  • Targets: attn.out_proj, mlp.down_proj

Architecture

This is not a standard transformer. Qwen3.5 uses a hybrid Gated DeltaNet + conventional attention architecture:

  • 64 layers in a 3:1 pattern (3 DeltaNet linear attention layers, then 1 full softmax attention layer)
  • DeltaNet layers use fixed-size recurrent state (O(1) memory per layer regardless of context)
  • Attention layers serve as precision checkpoints every 4th layer
  • 262K native context, extensible to 1M+
  • Native multimodal: vision built into the architecture, not bolted on

This means dramatically lower VRAM for KV cache compared to pure transformer models of the same size.
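The 3:1 interleaving above can be sketched as a simple layer schedule. This is an illustrative sketch, not the model's actual code; in particular, the assumption that the full-attention layer is the *last* in each group of four is hypothetical:

```python
def layer_kind(layer_idx: int) -> str:
    """Classify a layer in a hypothetical 3:1 DeltaNet/attention schedule.

    Assumes the full-attention layer closes each group of four
    (indices 3, 7, 11, ...); the real model may place it differently.
    """
    return "attention" if layer_idx % 4 == 3 else "deltanet"

# 64 layers total: 48 DeltaNet (fixed-size recurrent state) + 16 attention
schedule = [layer_kind(i) for i in range(64)]
print(schedule.count("deltanet"), schedule.count("attention"))  # 48 16
```

Only the attention layers contribute a KV cache that grows with context; the 48 DeltaNet layers keep O(1) state, which is where the VRAM savings below come from.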

VRAM Requirements

Quantization       VRAM (16K ctx)   VRAM (64K ctx)
BF16 (this repo)   ~54 GB           ~56 GB
Q8_0 GGUF          ~28 GB           ~30 GB
Q4_K_M GGUF        ~18 GB           ~20 GB

Q4_K_M fits on a single RTX 3090/3090 Ti with room for 16K+ context.
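The rule of thumb behind these numbers: weight memory is roughly parameter count times bits per weight, plus a few GB for KV cache and activations. A quick back-of-the-envelope sketch (the bits-per-weight values for the GGUF quants are approximations, not exact format figures):

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate VRAM for model weights alone, in GB (decimal)."""
    return params_billion * bits_per_weight / 8

print(weight_gb(27, 16.0))   # BF16   -> 54.0 GB
print(weight_gb(27, 8.5))    # Q8_0   -> ~28.7 GB (8 bits + per-block scales)
print(weight_gb(27, 4.85))   # Q4_K_M -> ~16.4 GB
```

The gap between these figures and the table is the KV cache and runtime overhead, which the hybrid architecture keeps small.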

Usage

With transformers

from transformers import AutoProcessor, Qwen3_5ForConditionalGeneration

model = Qwen3_5ForConditionalGeneration.from_pretrained(
    "ghost-actual/Qwen3.5-27B-Claude-Opus-4.6-Distilled-heretic",
    torch_dtype="bfloat16",
    device_map="auto",
    trust_remote_code=True
)
processor = AutoProcessor.from_pretrained(
    "ghost-actual/Qwen3.5-27B-Claude-Opus-4.6-Distilled-heretic",
    trust_remote_code=True
)
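A minimal text-only generation call against the loaded model could look like the following. This is a sketch using the standard transformers chat-template pattern; the prompt and `max_new_tokens` value are illustrative:

```python
messages = [{"role": "user", "content": "Explain the delta rule in one paragraph."}]

inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
    top_k=20,
)

# Decode only the newly generated tokens, skipping the prompt
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(processor.decode(new_tokens, skip_special_tokens=True))
```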

With llama.cpp (GGUF)

GGUF quants coming soon. To convert yourself, download the weights locally first (convert_hf_to_gguf.py expects a local model directory), then run llama.cpp's conversion and quantization tools:

huggingface-cli download ghost-actual/Qwen3.5-27B-Claude-Opus-4.6-Distilled-heretic \
    --local-dir ./heretic-27b

python convert_hf_to_gguf.py ./heretic-27b \
    --outfile heretic-27b-F16.gguf --outtype f16

llama-quantize heretic-27b-F16.gguf heretic-27b-Q4_K_M.gguf Q4_K_M

Recommended inference settings

temperature: 0.6
top_p: 0.95
top_k: 20
presence_penalty: 1.5
repetition_penalty: 1.05
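With llama.cpp, these settings map onto the standard llama-cli sampling flags (flag names as of recent llama.cpp builds; the model path assumes the Q4_K_M quant produced above):

```shell
llama-cli -m heretic-27b-Q4_K_M.gguf \
    --temp 0.6 --top-p 0.95 --top-k 20 \
    --presence-penalty 1.5 --repeat-penalty 1.05 \
    -p "Your prompt here"
```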

Base Model

ZonoDilu/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled: Claude Opus 4.6 reasoning distilled into Qwen3.5-27B using the nohurry/Opus-4.6-Reasoning-3000x-filtered dataset.

Why this exists

The base model had Claude-quality reasoning but still carried Qwen's default safety restrictions. Existing abliterated Qwen3.5 models (like DavidAU's) use Gemini reasoning distillation instead of Claude. If you want Claude-style chain-of-thought without the corporate leash, this is it.

Made by

Ghost (ghost-actual)

Built with Heretic by p-e-w.
