Beyond Stochastic Exploration: What Makes Training Data Valuable for Agentic Search Paper • 2604.08124 • Published 2 days ago • 2
FAQ: Mitigating Quantization Error via Regenerating Calibration Data with Family-Aware Quantization Paper • 2601.11200 • Published Jan 16
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization Paper • 2601.05242 • Published Jan 8 • 230
Youtu-Agent: Scaling Agent Productivity with Automated Generation and Hybrid Policy Optimization Paper • 2512.24615 • Published Dec 31, 2025 • 119
The End of Manual Decoding: Towards Truly End-to-End Language Models Paper • 2510.26697 • Published Oct 30, 2025 • 119
VCRL: Variance-based Curriculum Reinforcement Learning for Large Language Models Paper • 2509.19803 • Published Sep 24, 2025 • 122
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey Paper • 2509.02547 • Published Sep 2, 2025 • 238
A Survey of Reinforcement Learning for Large Reasoning Models Paper • 2509.08827 • Published Sep 10, 2025 • 193
PVPO: Pre-Estimated Value-Based Policy Optimization for Agentic Reasoning Paper • 2508.21104 • Published Aug 28, 2025 • 37 • 2