Qwen3.5 4B GGUF (Quantized)
This repository provides a GGUF quantized version of the original Qwen3.5 4B model, optimized for efficient local inference using tools like llama.cpp, LM Studio, and similar runtimes.
🔗 Base Model
This model is derived from the official Qwen3.5-4B release:
👉 https://huggingface.co/Qwen/Qwen3.5-4B (by Alibaba / Qwen Team)
Please refer to the original model for full details, training methodology, benchmarks, and licensing terms.
⚙️ Quantization Details
- Format: GGUF
- Quantization: UD-Q4_K_XL (Unsloth Dynamic)
- Size: ~2.9 GB
- Architecture: Qwen3.5
This version is designed to balance performance and memory efficiency, making it suitable for local deployments.
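As a rough sanity check, the effective bits per weight implied by the numbers above can be computed directly. This is a back-of-the-envelope sketch: the ~4B parameter count is approximate, and Unsloth's dynamic quants keep some tensors at higher precision, which is why the result comes out above 4 bits.

```python
# Back-of-the-envelope: effective bits per weight for this quant.
# Assumes ~4.0e9 parameters (approximate) and the ~2.9 GB file size above.
params = 4.0e9
file_bytes = 2.9e9
bits_per_weight = file_bytes * 8 / params
print(f"~{bits_per_weight:.1f} bits/weight")  # ≈ 5.8, vs 16 for FP16
```

Compared with ~8 GB for the FP16 weights, this is roughly a 2.7× reduction in memory footprint.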
📦 Quantization Source
This GGUF file is sourced from:
👉 https://huggingface.co/unsloth/Qwen3.5-4B-GGUF
Specifically:
- Qwen3.5-4B-UD-Q4_K_XL.gguf
All credit for quantization goes to the original uploader (Unsloth).
🚀 Usage
You can run this model locally using:
llama.cpp

```shell
./llama-cli -m qwen3.5-4b-q4_k_xl.gguf -p "Explain SQL injection"
```

(Recent llama.cpp builds ship the CLI as `llama-cli`; older releases used `./main` with the same flags.)
Other tools
- LM Studio
- KoboldCpp
- Ollama
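For Ollama, a local GGUF can be imported with a minimal Modelfile. This is a sketch under assumptions: the filename must match your download, and for best results you may also need a `TEMPLATE` entry matching the Qwen chat format (see Ollama's Modelfile documentation).

```
# Modelfile — hypothetical minimal import of this GGUF
FROM ./Qwen3.5-4B-UD-Q4_K_XL.gguf
PARAMETER temperature 0.7
```

Then build and run it with `ollama create qwen3.5-4b-local -f Modelfile` followed by `ollama run qwen3.5-4b-local`.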
💡 Example Use Cases
- General-purpose chat
- Coding assistance
- Technical explanations
- Integration into custom AI systems (e.g., agents, tools)
🧪 Tested With
- Local inference (CPU/GPU hybrid)
- Integration with external tools (web search, reasoning pipelines)
⚠️ Disclaimer
- This is not an original model.
- Behavior and capabilities are inherited from the base Qwen3.5 model.
📜 License
- This model inherits the license of the original Qwen3.5-4B model; see the base repository for the full terms.
🙌 Acknowledgements
- Qwen Team (Alibaba) — Base model
- Unsloth — GGUF quantization
- llama.cpp — GGUF runtime support
🌐 Related Project
This model is used in:
👉 CyberGuard AI (Cybersecurity assistant system)
- Hosted on Hugging Face Spaces