LiveMCP-101: Stress Testing and Diagnosing MCP-enabled Agents on Challenging Queries
Paper
• 2508.15760
• Published • 47
LiveMCPBench: Can Agents Navigate an Ocean of MCP Tools?
Paper
• 2508.01780
• Published • 21
API-Bank: A Comprehensive Benchmark for Tool-Augmented LLMs
Paper
• 2304.08244
• Published • 1
AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs
Paper
• 2508.16153
• Published • 162
Memp: Exploring Agent Procedural Memory
Paper
• 2508.06433
• Published • 36
MCPEval: Automatic MCP-based Deep Evaluation for AI Agent Models
Paper
• 2507.12806
• Published • 21
Survey on Evaluation of LLM-based Agents
Paper
• 2503.16416
• Published • 96
AgentBench: Evaluating LLMs as Agents
Paper
• 2308.03688
• Published • 26
PlanGenLLMs: A Modern Survey of LLM Planning Capabilities
Paper
• 2502.11221
• Published • 1
AgentDistill: Training-Free Agent Distillation with Generalizable MCP Boxes
Paper
• 2506.14728
• Published
Supporting Our AI Overlords: Redesigning Data Systems to be Agent-First
Paper
• 2509.00997
• Published • 2
Small Language Models are the Future of Agentic AI
Paper
• 2506.02153
• Published • 24
MCP-AgentBench: Evaluating Real-World Language Agent Performance with MCP-Mediated Tools
Paper
• 2509.09734
• Published • 16
Paper
• 2509.10147
• Published • 27
ReAct: Synergizing Reasoning and Acting in Language Models
Paper
• 2210.03629
• Published • 34
ARE: Scaling Up Agent Environments and Evaluations
Paper
• 2509.17158
• Published • 36
MCPMark: A Benchmark for Stress-Testing Realistic and Comprehensive MCP Use
Paper
• 2509.24002
• Published • 179