FastVLM Collection Efficient Vision Encoding for Vision Language Models • 8 items • Updated Mar 2 • 112
microsoft/Phi-4-multimodal-instruct Automatic Speech Recognition • 6B • Updated Dec 10, 2025 • 380k • 1.6k
Chat With Janus-Pro-7B 🌍 A unified multimodal understanding and generation model. • Featured • 2.02k
deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B Text Generation • 2B • Updated Feb 24, 2025 • 464k • 1.5k