zyf515730395's Collections
Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models
Paper
• 2506.05176
• Published • 81
Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning
Paper
• 2506.04207
• Published • 48
Paper
• 2506.03569
• Published • 80
UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation
Paper
• 2506.03147
• Published • 58
SRPO: Enhancing Multimodal LLM Reasoning via Reflection-Aware Reinforcement Learning
Paper
• 2506.01713
• Published • 48
DINO-R1: Incentivizing Reasoning Capability in Vision Foundation Models
Paper
• 2505.24025
• Published • 27
Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence
Paper
• 2505.23747
• Published • 69
Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models
Paper
• 2505.04921
• Published • 187
Seed1.5-VL Technical Report
Paper
• 2505.07062
• Published • 157
BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset
Paper
• 2505.09568
• Published • 99
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
Paper
• 2504.10479
• Published • 308
Paper
• 2504.07491
• Published • 139
Visual-RFT: Visual Reinforcement Fine-Tuning
Paper
• 2503.01785
• Published • 86
Ming-Omni: A Unified Multimodal Model for Perception and Generation
Paper
• 2506.09344
• Published • 31
GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning
Paper
• 2506.16141
• Published • 27
Paper
• 2508.10104
• Published • 305
Thyme: Think Beyond Images
Paper
• 2508.11630
• Published • 81
Qwen3-Omni Technical Report
Paper
• 2509.17765
• Published • 153
Kimi K2.5: Visual Agentic Intelligence
Paper
• 2602.02276
• Published • 264
NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation
Paper
• 2601.02204
• Published • 63