ProRL: Effective Reinforcement Learning for Proactive Recommendation via Rectified Policy Gradient Estimation Paper • 2605.28293 • Published 2 days ago • 76
Efficient Agentic Reinforcement Learning with On-Policy Intrinsic Knowledge Boundary Enhancement Paper • 2605.26952 • Published 3 days ago • 12
DVAO: Dynamic Variance-adaptive Advantage Optimization for Multi-reward Reinforcement Learning Paper • 2605.25604 • Published 4 days ago • 129
Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability Paper • 2604.06628 • Published Apr 8 • 326
Beyond Stochastic Exploration: What Makes Training Data Valuable for Agentic Search Paper • 2604.08124 • Published Apr 9 • 5
FAQ: Mitigating Quantization Error via Regenerating Calibration Data with Family-Aware Quantization Paper • 2601.11200 • Published Jan 16
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization Paper • 2601.05242 • Published Jan 8 • 232
Youtu-Agent: Scaling Agent Productivity with Automated Generation and Hybrid Policy Optimization Paper • 2512.24615 • Published Dec 31, 2025 • 119
The End of Manual Decoding: Towards Truly End-to-End Language Models Paper • 2510.26697 • Published Oct 30, 2025 • 121