MLLM/LLM - a zyf515730395 Collection

MLLM/LLM

updated Feb 10

Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models
Paper • 2506.05176 • Published Jun 5, 2025 • 81

Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning
Paper • 2506.04207 • Published Jun 4, 2025 • 48

MiMo-VL Technical Report
Paper • 2506.03569 • Published Jun 4, 2025 • 80

UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation
Paper • 2506.03147 • Published Jun 3, 2025 • 58

SRPO: Enhancing Multimodal LLM Reasoning via Reflection-Aware Reinforcement Learning
Paper • 2506.01713 • Published Jun 2, 2025 • 48

DINO-R1: Incentivizing Reasoning Capability in Vision Foundation Models
Paper • 2505.24025 • Published May 29, 2025 • 27

Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence
Paper • 2505.23747 • Published May 29, 2025 • 69

Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models
Paper • 2505.04921 • Published May 8, 2025 • 187

Seed1.5-VL Technical Report
Paper • 2505.07062 • Published May 11, 2025 • 157

BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset
Paper • 2505.09568 • Published May 14, 2025 • 99

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
Paper • 2504.10479 • Published Apr 14, 2025 • 308

Kimi-VL Technical Report
Paper • 2504.07491 • Published Apr 10, 2025 • 139

Visual-RFT: Visual Reinforcement Fine-Tuning
Paper • 2503.01785 • Published Mar 3, 2025 • 86

Ming-Omni: A Unified Multimodal Model for Perception and Generation
Paper • 2506.09344 • Published Jun 11, 2025 • 31

GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning
Paper • 2506.16141 • Published Jun 19, 2025 • 27

DINOv3
Paper • 2508.10104 • Published Aug 13, 2025 • 305

Thyme: Think Beyond Images
Paper • 2508.11630 • Published Aug 15, 2025 • 81

Qwen3-Omni Technical Report
Paper • 2509.17765 • Published Sep 22, 2025 • 153

Kimi K2.5: Visual Agentic Intelligence
Paper • 2602.02276 • Published Feb 2 • 264

NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation
Paper • 2601.02204 • Published Jan 5 • 63