AWQ 4-bit version of this Opus-Distilled-v2 model?
Hi,
Thank you for your excellent NVFP4 quantizations.
I'm using Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2 (the v2 version with 14k Opus samples). It's currently the best reasoning model I have for coding and agent tasks: shorter CoT and better efficiency than the base Qwen3.5-27B.
However, I'm on a single RTX 5090 and really want to run it with vLLM + FlashInfer to get MTP, continuous batching, and higher throughput.
Would you consider making an AWQ 4-bit version of this Opus-Distilled-v2 model?
The distillation dataset is public, so the data is already available. Many users with 40/50-series cards are waiting for a good AWQ quant of this specific model.
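In case it helps, here's roughly how I'd picture the quantization going with AutoAWQ (just a sketch: the quant_config values are the usual AutoAWQ defaults, and the calibration samples are a placeholder for text drawn from the public distillation set, not anything from your actual pipeline):

```python
# Hypothetical AutoAWQ sketch; paths and config values are assumptions.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2"
quant_path = "Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2-AWQ-4bit"

# Common AWQ defaults: 4-bit weights, group size 128, GEMM kernels.
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Calibration data: AutoAWQ accepts a list of strings (or a dataset name).
# Placeholder below; a few hundred samples from the public distillation
# set would presumably be the right calibration material here.
calib_samples = ["...representative reasoning traces from the public dataset..."]
model.quantize(tokenizer, quant_config=quant_config, calib_data=calib_samples)

model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```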
Thanks in advance!
Best regards
omw
https://huggingface.co/mconcat/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2-AWQ-4bit
Let me know if there are any errors or problems.
Huge thanks for the lightning-fast turnaround! Just one day after the request.
Now I can finally run this beast with vLLM + FlashInfer: MTP + continuous batching. Going from ~45 tok/s (GGUF) to potentially 150+ tok/s on a single 5090.
This is why open source is unbeatable.
I'll provide feedback right away if I run into any issues.
Do you have any suggestions or tips before I start testing it with vLLM + FlashInfer?
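For reference, this is roughly the setup I'm planning to test with (a sketch only: the FlashInfer env var and the memory/context numbers are my own assumptions for a 32 GB 5090, and I've left out MTP/speculative-decoding flags since those vary by vLLM version):

```python
# Sketch of my planned vLLM setup; the model repo is the AWQ quant above,
# other values are assumptions for a single RTX 5090 (32 GB).
import os
os.environ["VLLM_ATTENTION_BACKEND"] = "FLASHINFER"  # opt into FlashInfer

from vllm import LLM, SamplingParams

llm = LLM(
    model="mconcat/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2-AWQ-4bit",
    quantization="awq",           # load the 4-bit AWQ weights
    gpu_memory_utilization=0.90,  # leave headroom for the KV cache
    max_model_len=32768,          # assumed context budget for 32 GB
)

params = SamplingParams(temperature=0.6, max_tokens=2048)
outputs = llm.generate(["Explain continuous batching in one paragraph."], params)
print(outputs[0].outputs[0].text)
```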
@mconcat Thank you for sharing!!! Would you consider making a 4B/9B AWQ 4-bit model too? Many thanks!! I could run that on my laptop, lol