arxiv:2605.27881

Retrieval, Reward, and Training Protocols: What Matters in Training Search Agents?

Published on May 27

Abstract

AI-generated summary: A controlled study examines key dimensions affecting search agent performance, revealing that data coverage issues have greater impact than training algorithm differences and that simple outcome-based rewards often outperform complex process-based methods.

Search agents powered by large language models can autonomously decompose queries, retrieve information, and synthesize answers through multi-step reasoning. However, the rapid growth of training methods has outpaced controlled comparison: existing works differ in retrieval corpora, reward designs, and training protocols, making it unclear what actually drives improvements. We present a controlled empirical study that isolates three under-explored dimensions of search agent training. First, we identify a critical data-coverage issue in the widely used Wikipedia 2018 corpus and show that correcting it alone yields larger gains than the differences between training algorithms. Second, we systematically compare outcome-based and process-based reward methods across three base models, finding that the simplest outcome-based approach achieves competitive or superior performance in most settings, and that process-level credit assignment can over-correct agent behavior. Third, we analyze training data diversity, off-policy data utilization, and search budget scaling, distilling practical guidelines for training effective search agents. Our code is available at https://github.com/YiboZhao624/SearchAgentReview.
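
To make the term "outcome-based reward" concrete, here is a minimal sketch in Python. It is not taken from the paper or its repository: the function names `normalize_answer` and `outcome_reward`, the normalization rules, and the use of exact match as the scoring criterion are all assumptions chosen for illustration. The point it shows is that the whole trajectory receives one score based only on the final answer, with no credit assigned to individual search or reasoning steps.

```python
from __future__ import annotations

import re
import string


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    Hypothetical normalization; the paper's actual scoring may differ.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def outcome_reward(predicted: str, gold_answers: list[str]) -> float:
    """Return 1.0 if the final answer matches any gold answer, else 0.0.

    Only the final answer is scored; intermediate search steps are ignored
    (assumed exact-match criterion, for illustration only).
    """
    pred = normalize_answer(predicted)
    return 1.0 if any(pred == normalize_answer(g) for g in gold_answers) else 0.0


if __name__ == "__main__":
    print(outcome_reward("The Eiffel Tower.", ["Eiffel Tower"]))
    print(outcome_reward("Paris", ["Eiffel Tower"]))
```

A process-based method would instead attach scores to individual steps of the trajectory, which is where the abstract reports that credit assignment can over-correct agent behavior.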
