Qwen3.5-35B-A3B-GPTQ-Int4

GPTQ INT4 quantization of Qwen/Qwen3.5-35B-A3B.

Quantization Details

| Parameter   | Value      |
|-------------|------------|
| Method      | GPTQ       |
| Bits        | 4          |
| Group Size  | 128        |
| Desc Act    | True       |
| Symmetric   | False      |
| Calibration | WikiText-2 |
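The settings above describe group-wise, asymmetric 4-bit quantization: every group of 128 weights shares one scale and one zero point. A minimal pure-Python sketch of that storage scheme (illustrative only; GPTQ proper additionally applies Hessian-based error correction during rounding, which is not shown here):

```python
# Group-wise asymmetric INT4 quantization sketch:
# 4 bits -> 16 levels (0..15), one (scale, zero) pair per group.
def quantize_group(weights):
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 15 or 1.0          # step between adjacent int4 levels
    zero = round(-lo / scale)              # integer zero point (asymmetric)
    q = [max(0, min(15, round(w / scale) + zero)) for w in weights]
    return q, scale, zero

def dequantize_group(q, scale, zero):
    return [(v - zero) * scale for v in q]

# One group of 128 weights, as in the "Group Size 128" setting above.
w = [(-1) ** i * i / 64.0 for i in range(128)]
q, s, z = quantize_group(w)
w_hat = dequantize_group(q, s, z)          # reconstruction error <= scale / 2
```

Asymmetric mode (`Symmetric: False`) spends the full 16-level range on the observed `[min, max]` interval of each group instead of forcing a range symmetric around zero.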

Model Architecture

  • Type: Mixture-of-Experts (MoE) with linear + full attention
  • Experts: 256 per layer, top-8 routing
  • Layers: 40 (30 linear attention + 10 full attention)
  • Hidden Size: 2048
  • Parameters: ~35B total, ~3B active
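The ~3B active figure comes from the router selecting only 8 of the 256 experts per token. A hypothetical sketch of top-k routing (illustrative; not the model's actual router code):

```python
# Top-k expert routing sketch: softmax over router logits, keep the k
# largest, renormalize their weights to sum to 1 over the selected set.
import math

def route_top_k(logits, k=8):
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)                      # for numerical stability
    exps = [math.exp(logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]   # (expert_id, weight)

# 256 router logits per token; only 8 experts' FFNs run for this token.
logits = [0.01 * i for i in range(256)]
selected = route_top_k(logits, k=8)
```

Because only the selected experts' feed-forward weights are used per token, compute scales with the active parameter count (~3B) while memory scales with the total (~35B).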

Usage

```python
from gptqmodel import GPTQModel
from transformers import AutoTokenizer

model = GPTQModel.from_quantized("RESMP-DEV/Qwen3.5-35B-A3B-GPTQ-Int4")
tokenizer = AutoTokenizer.from_pretrained("RESMP-DEV/Qwen3.5-35B-A3B-GPTQ-Int4")

inputs = tokenizer("Hello, world!", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
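For sizing hardware, a rough back-of-envelope weight-memory estimate (assumptions: 2-byte FP16 scale and 4-bit zero point per 128-weight group; real checkpoints also contain embeddings and other non-quantized tensors, so actual file size will differ):

```python
# Approximate weight memory of group-wise INT4 vs FP16/BF16 for ~35B params.
def int4_weight_gib(n_params, group_size=128, scale_bytes=2, zero_bits=4):
    packed = n_params * 4 / 8                             # 4 bits per weight
    n_groups = n_params / group_size
    overhead = n_groups * (scale_bytes + zero_bits / 8)   # per-group scale + zero
    return (packed + overhead) / 2**30

fp16_gib = 35e9 * 2 / 2**30      # ~65 GiB at 2 bytes per weight
q4_gib = int4_weight_gib(35e9)   # ~17 GiB: roughly a quarter of FP16
```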

Acknowledgments

Quantized using GPTQModel. Base model by Qwen.
