# Qwen3.5-35B-A3B-GPTQ-Int4

GPTQ INT4 quantization of Qwen/Qwen3.5-35B-A3B.
## Quantization Details
| Parameter | Value |
|---|---|
| Method | GPTQ |
| Bits | 4 |
| Group Size | 128 |
| Desc Act | True |
| Symmetric | False |
| Calibration | WikiText-2 |
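As a rough back-of-the-envelope check on what these settings mean for weight storage, the sketch below estimates the packed size of 4-bit weights plus per-group scale and zero-point overhead (group size 128, asymmetric). The parameter count and overhead layout are simplifying assumptions, not exact checkpoint sizes:

```python
def gptq_weight_bytes(n_params: float, bits: int = 4, group_size: int = 128) -> float:
    """Approximate GPTQ weight storage: packed weights + per-group metadata."""
    packed = n_params * bits / 8        # weights packed at 4 bits each
    n_groups = n_params / group_size    # one quantization group per 128 weights
    scales = n_groups * 2               # FP16 scale per group
    zeros = n_groups * bits / 8         # packed zero-point per group (sym=False)
    return packed + scales + zeros

n = 35e9  # ~35B total parameters (approximate)
print(f"INT4 (GPTQ, g=128): ~{gptq_weight_bytes(n) / 1e9:.1f} GB")
print(f"FP16 baseline:      ~{n * 2 / 1e9:.1f} GB")
```

This ignores embeddings and activations, but it explains why the INT4 checkpoint fits in roughly a quarter of the FP16 footprint, with only a few percent of overhead from the group metadata.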
## Model Architecture
- Type: Mixture-of-Experts (MoE) with linear + full attention
- Experts: 256 per layer, top-8 routing
- Layers: 40 (30 linear attention + 10 full attention)
- Hidden Size: 2048
- Parameters: ~35B total, ~3B active
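The ~3B active figure follows from top-8 routing: each token activates only 8 of the 256 experts per MoE layer. A minimal sketch of that selection step, assuming raw router logits are given (the real router is a learned linear gate, which is omitted here):

```python
import heapq
import math

def route_top_k(logits: list[float], k: int = 8) -> list[tuple[int, float]]:
    """Select the top-k experts and softmax-normalize their gate weights."""
    topk = heapq.nlargest(k, enumerate(logits), key=lambda p: p[1])
    m = max(score for _, score in topk)                  # subtract max for stability
    exps = [(i, math.exp(score - m)) for i, score in topk]
    total = sum(e for _, e in exps)
    return [(i, e / total) for i, e in exps]             # (expert index, weight)

# 256 expert logits per layer; only 8 experts receive nonzero weight per token
logits = [math.sin(i * 0.37) for i in range(256)]
print(route_top_k(logits))
```

Because only the selected experts' weights participate in each forward pass, the per-token compute tracks the ~3B active parameters rather than the ~35B total.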
## Usage
```python
from gptqmodel import GPTQModel
from transformers import AutoTokenizer

# Load the quantized weights and the matching tokenizer
model = GPTQModel.from_quantized("RESMP-DEV/Qwen3.5-35B-A3B-GPTQ-Int4")
tokenizer = AutoTokenizer.from_pretrained("RESMP-DEV/Qwen3.5-35B-A3B-GPTQ-Int4")

inputs = tokenizer("Hello, world!", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```