Collections
Discover the best community collections!
Collections including paper arxiv:2604.04707
-
Mobile-O: Unified Multimodal Understanding and Generation on Mobile Device
Paper • 2602.20161 • Published • 23 -
A Very Big Video Reasoning Suite
Paper • 2602.20159 • Published • 519 -
Speed by Simplicity: A Single-Stream Architecture for Fast Audio-Video Generative Foundation Model
Paper • 2603.21986 • Published • 123 -
AURA: Always-On Understanding and Real-Time Assistance via Video Streams
Paper • 2604.04184 • Published • 50
-
WMPO: World Model-based Policy Optimization for Vision-Language-Action Models
Paper • 2511.09515 • Published • 20 -
Robot Learning from a Physical World Model
Paper • 2511.07416 • Published • 32 -
WorldMM: Dynamic Multimodal Memory Agent for Long Video Reasoning
Paper • 2512.02425 • Published • 25 -
MobileWorldBench: Towards Semantic World Modeling For Mobile Agents
Paper • 2512.14014 • Published • 3
-
Tongyi DeepResearch Technical Report
Paper • 2510.24701 • Published • 103 -
Kimi Linear: An Expressive, Efficient Attention Architecture
Paper • 2510.26692 • Published • 132 -
Natural-Language Agent Harnesses
Paper • 2603.25723 • Published • 25 -
CORAL: Towards Autonomous Multi-Agent Evolution for Open-Ended Discovery
Paper • 2604.01658 • Published • 54
-
WorldVLA: Towards Autoregressive Action World Model
Paper • 2506.21539 • Published • 40 -
LatticeWorld: A Multimodal Large Language Model-Empowered Framework for Interactive Complex World Generation
Paper • 2509.05263 • Published • 11 -
VLA-RFT: Vision-Language-Action Reinforcement Fine-tuning with Verified Rewards in World Simulators
Paper • 2510.00406 • Published • 67 -
GigaBrain-0: A World Model-Powered Vision-Language-Action Model
Paper • 2510.19430 • Published • 53
-
HopChain: Multi-Hop Data Synthesis for Generalizable Vision-Language Reasoning
Paper • 2603.17024 • Published • 109 -
WorldAgents: Can Foundation Image Models be Agents for 3D World Models?
Paper • 2603.19708 • Published • 13 -
MACRO: Advancing Multi-Reference Image Generation with Structured Long-Context Data
Paper • 2603.25319 • Published • 32 -
ArtHOI: Taming Foundation Models for Monocular 4D Reconstruction of Hand-Articulated-Object Interactions
Paper • 2603.25791 • Published • 5
-
Endless Terminals: Scaling RL Environments for Terminal Agents
Paper • 2601.16443 • Published • 18 -
Linear representations in language models can change dramatically over a conversation
Paper • 2601.20834 • Published • 21 -
Scaling Embeddings Outperforms Scaling Experts in Language Models
Paper • 2601.21204 • Published • 102 -
Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability
Paper • 2601.18778 • Published • 42
-
Kimi Linear: An Expressive, Efficient Attention Architecture
Paper • 2510.26692 • Published • 132 -
GLM-5: from Vibe Coding to Agentic Engineering
Paper • 2602.15763 • Published • 144 -
Believe Your Model: Distribution-Guided Confidence Calibration
Paper • 2603.03872 • Published • 40 -
OpenWorldLib: A Unified Codebase and Definition of Advanced World Models
Paper • 2604.04707 • Published • 200
-
Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward
Paper • 2510.03222 • Published • 76 -
In-the-Flow Agentic System Optimization for Effective Planning and Tool Use
Paper • 2510.05592 • Published • 110 -
Less is More: Recursive Reasoning with Tiny Networks
Paper • 2510.04871 • Published • 513 -
Multi-Agent Tool-Integrated Policy Optimization
Paper • 2510.04678 • Published • 31
-
The Debugging Decay Index: Rethinking Debugging Strategies for Code LLMs
Paper • 2506.18403 • Published • 3 -
ReCode: Updating Code API Knowledge with Reinforcement Learning
Paper • 2506.20495 • Published • 10 -
SWE-Debate: Competitive Multi-Agent Debate for Software Issue Resolution
Paper • 2507.23348 • Published • 12 -
LoCoBench: A Benchmark for Long-Context Large Language Models in Complex Software Engineering
Paper • 2509.09614 • Published • 7
-
HopChain: Multi-Hop Data Synthesis for Generalizable Vision-Language Reasoning
Paper • 2603.17024 • Published • 109 -
WorldAgents: Can Foundation Image Models be Agents for 3D World Models?
Paper • 2603.19708 • Published • 13 -
MACRO: Advancing Multi-Reference Image Generation with Structured Long-Context Data
Paper • 2603.25319 • Published • 32 -
ArtHOI: Taming Foundation Models for Monocular 4D Reconstruction of Hand-Articulated-Object Interactions
Paper • 2603.25791 • Published • 5
-
Mobile-O: Unified Multimodal Understanding and Generation on Mobile Device
Paper • 2602.20161 • Published • 23 -
A Very Big Video Reasoning Suite
Paper • 2602.20159 • Published • 519 -
Speed by Simplicity: A Single-Stream Architecture for Fast Audio-Video Generative Foundation Model
Paper • 2603.21986 • Published • 123 -
AURA: Always-On Understanding and Real-Time Assistance via Video Streams
Paper • 2604.04184 • Published • 50
-
Endless Terminals: Scaling RL Environments for Terminal Agents
Paper • 2601.16443 • Published • 18 -
Linear representations in language models can change dramatically over a conversation
Paper • 2601.20834 • Published • 21 -
Scaling Embeddings Outperforms Scaling Experts in Language Models
Paper • 2601.21204 • Published • 102 -
Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability
Paper • 2601.18778 • Published • 42
-
WMPO: World Model-based Policy Optimization for Vision-Language-Action Models
Paper • 2511.09515 • Published • 20 -
Robot Learning from a Physical World Model
Paper • 2511.07416 • Published • 32 -
WorldMM: Dynamic Multimodal Memory Agent for Long Video Reasoning
Paper • 2512.02425 • Published • 25 -
MobileWorldBench: Towards Semantic World Modeling For Mobile Agents
Paper • 2512.14014 • Published • 3
-
Kimi Linear: An Expressive, Efficient Attention Architecture
Paper • 2510.26692 • Published • 132 -
GLM-5: from Vibe Coding to Agentic Engineering
Paper • 2602.15763 • Published • 144 -
Believe Your Model: Distribution-Guided Confidence Calibration
Paper • 2603.03872 • Published • 40 -
OpenWorldLib: A Unified Codebase and Definition of Advanced World Models
Paper • 2604.04707 • Published • 200
-
Tongyi DeepResearch Technical Report
Paper • 2510.24701 • Published • 103 -
Kimi Linear: An Expressive, Efficient Attention Architecture
Paper • 2510.26692 • Published • 132 -
Natural-Language Agent Harnesses
Paper • 2603.25723 • Published • 25 -
CORAL: Towards Autonomous Multi-Agent Evolution for Open-Ended Discovery
Paper • 2604.01658 • Published • 54
-
Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward
Paper • 2510.03222 • Published • 76 -
In-the-Flow Agentic System Optimization for Effective Planning and Tool Use
Paper • 2510.05592 • Published • 110 -
Less is More: Recursive Reasoning with Tiny Networks
Paper • 2510.04871 • Published • 513 -
Multi-Agent Tool-Integrated Policy Optimization
Paper • 2510.04678 • Published • 31
-
WorldVLA: Towards Autoregressive Action World Model
Paper • 2506.21539 • Published • 40 -
LatticeWorld: A Multimodal Large Language Model-Empowered Framework for Interactive Complex World Generation
Paper • 2509.05263 • Published • 11 -
VLA-RFT: Vision-Language-Action Reinforcement Fine-tuning with Verified Rewards in World Simulators
Paper • 2510.00406 • Published • 67 -
GigaBrain-0: A World Model-Powered Vision-Language-Action Model
Paper • 2510.19430 • Published • 53
-
The Debugging Decay Index: Rethinking Debugging Strategies for Code LLMs
Paper • 2506.18403 • Published • 3 -
ReCode: Updating Code API Knowledge with Reinforcement Learning
Paper • 2506.20495 • Published • 10 -
SWE-Debate: Competitive Multi-Agent Debate for Software Issue Resolution
Paper • 2507.23348 • Published • 12 -
LoCoBench: A Benchmark for Long-Context Large Language Models in Complex Software Engineering
Paper • 2509.09614 • Published • 7