π-Bench: Evaluating Proactive Personal Assistant Agents in Long-Horizon Workflows Paper • 2605.14678 • Published 9 days ago • 99
Self-Improving CAD Generation Agents with Finite Element Analysis as Feedback Paper • 2605.17448 • Published 11 days ago • 17
SCOPE: Simulating Cross-game Operations in Playable Environments for FPS World Models Paper • 2605.23345 • Published 6 days ago • 12
AutoResearch AI: Towards AI-Powered Research Automation for Scientific Discovery Paper • 2605.23204 • Published 6 days ago • 25
Evolution Fine-Tuning Collection Internalizing Discovery Capability into LLM • 5 items • Updated 4 days ago
Forecasting Scientific Progress with Artificial Intelligence Paper • 2605.22681 • Published 7 days ago • 40
EnvFactory: Scaling Tool-Use Agents via Executable Environments Synthesis and Robust RL Paper • 2605.18703 • Published 10 days ago • 48
OpenComputer: Verifiable Software Worlds for Computer-Use Agents Paper • 2605.19769 • Published 9 days ago • 57
AutoResearchClaw: Self-Reinforcing Autonomous Research with Human-AI Collaboration Paper • 2605.20025 • Published 9 days ago • 182
On the limits and opportunities of AI reviewers: Reviewing the reviews of Nature-family papers with 45 expert scientists Paper • 2605.20668 • Published 8 days ago • 11
FrontierSmith: Synthesizing Open-Ended Coding Problems at Scale Paper • 2605.14445 • Published 14 days ago • 20
WildClawBench: A Benchmark for Real-World, Long-Horizon Agent Evaluation Paper • 2605.10912 • Published 17 days ago • 45
Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling Paper • 2605.13301 • Published 15 days ago • 157
CollabVR: Collaborative Video Reasoning with Vision-Language and Video Generation Models Paper • 2605.08735 • Published 19 days ago • 69