Interest in a CPU-optimized version

#20
by 19440harry - opened

I've been running this model on an RTX 4070 Mobile and the reasoning quality is genuinely impressive, especially for a local 27B. It got me thinking about accessibility for users who don't have a discrete GPU.

The Byteshape project recently demonstrated CPU-viable deployment of Qwen3-Coder-30B-A3B on a Raspberry Pi 5: https://huggingface.co/byteshape/Qwen3-Coder-30B-A3B-Instruct-GGUF

I understand that approach works partly because it's a MoE architecture with only 3B active parameters per forward pass, which is a different challenge from a dense 27B. But with aggressive quantization (Q2/Q3) and CPU-specific kernel optimization, there may be a viable path here too.
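For a rough sense of feasibility, here is a back-of-the-envelope sketch of the weight memory a dense 27B would need at different GGUF quantization levels. The bits-per-weight figures are approximate averages for llama.cpp k-quants, not exact numbers for this model:

```python
# Rough weight-memory estimate for a dense 27B model at various
# GGUF quantization levels. Bits-per-weight values are approximate
# averages for llama.cpp k-quants (assumption, not measured).
PARAMS = 27e9

BITS_PER_WEIGHT = {
    "F16":    16.0,
    "Q8_0":    8.5,
    "Q4_K_M":  4.8,
    "Q3_K_M":  3.9,
    "Q2_K":    2.6,
}

def gguf_size_gib(params: float, bpw: float) -> float:
    """Approximate on-disk / in-RAM weight size in GiB."""
    return params * bpw / 8 / 2**30

for name, bpw in BITS_PER_WEIGHT.items():
    print(f"{name:8s} ~{gguf_size_gib(PARAMS, bpw):5.1f} GiB")
```

At Q2/Q3 the weights land somewhere in the 8–13 GiB range, which is at least plausible for a 16 GB mini PC or home server, even if a Pi-class board remains a stretch for a dense model.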

A CPU-friendly build would open this model to a significantly wider audience: anyone running a mini PC, a home server, a Raspberry Pi cluster, or simply a machine without a GPU. Given the reasoning quality this model delivers, that's a meaningful expansion of who can actually use it.

Questions:

- Has CPU deployment been considered or tested?
- Are there community members with CPU-optimization experience who'd want to collaborate on this?

Would genuinely like to see this model reach more hardware.
