EnvFactory: Scaling Tool-Use Agents via Executable Environments Synthesis and Robust RL Paper • 2605.18703 • Published 4 days ago • 44
Depth-Breadth Synergy in RLVR: Unlocking LLM Reasoning Gains with Adaptive Exploration Paper • 2508.13755 • Published Aug 19, 2025 • 14
Instruct-SCTG: Guiding Sequential Controlled Text Generation through Instructions Paper • 2312.12299 • Published Dec 19, 2023
Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators Paper • 2403.16950 • Published Mar 25, 2024 • 4
MR-BEN: A Comprehensive Meta-Reasoning Benchmark for Large Language Models Paper • 2406.13975 • Published Jun 20, 2024