Researchers have developed AgentIR, a reasoning-aware retrieval system designed specifically for AI research agents that achieves 68% accuracy on the BrowseComp-Plus benchmark—an 18 percentage point improvement over conventional embedding models. Published March 4, 2026 on arXiv, the system jointly embeds an agent's reasoning trace alongside its query, exploiting the rich contextual information that agents naturally generate but existing retrievers ignore.
Paradigm Shift From Human-Centric to Agent-Centric Retrieval
Deep research agents differ fundamentally from human users in how they search. While humans issue and refine queries without documenting their thought processes, agents generate explicit natural language reasoning before each search call. This reasoning reveals search intent, context from previous searches, intermediate conclusions, and specific information gaps the agent is trying to fill—signals that conventional retrieval systems completely discard.
AgentIR introduces reasoning-aware retrieval as a new paradigm that treats this agent-generated reasoning as a first-class input. The system jointly embeds both the query and the full reasoning trace, allowing the retriever to understand not just what the agent is searching for, but why it needs that information and how it fits into the broader research task.
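The input construction can be pictured as a simple concatenation of trace and query before encoding. The separator markers and helper function below are illustrative assumptions, not the actual AgentIR implementation:

```python
# Hypothetical sketch of a reasoning-aware retrieval input: the agent's
# reasoning trace is prepended to the query so one encoder embeds both
# jointly. Marker tokens ([REASONING], [QUERY]) are assumptions for
# illustration only.

def build_retrieval_input(reasoning_trace: str, query: str) -> str:
    """Concatenate the reasoning trace with the query into a single
    string for joint embedding."""
    return f"[REASONING] {reasoning_trace.strip()} [QUERY] {query.strip()}"

# The trace carries intent and context the bare query lacks:
# prior findings and the specific information gap being filled.
trace = ("The 2019 filing names the subsidiary but not its founding year; "
         "I need the founding year to resolve the timeline.")
query = "founding year of the subsidiary named in the 2019 filing"

print(build_retrieval_input(trace, query))
```

A conventional retriever would embed only `query`; here the encoder also sees why the agent is asking, which is the signal the paper argues existing systems discard.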
Technical Architecture and Performance
The research team—Zijian Chen, Xueguang Ma, Shengyao Zhuang, Jimmy Lin, Akari Asai, and Victor Zhong—developed two core components:
- Reasoning-Aware Retrieval: A retrieval architecture that jointly processes agent reasoning traces and queries
- DR-Synth: A data synthesis method that generates training data for deep research retrievers from standard QA datasets
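The second component can be sketched as mapping a standard QA pair to a reasoning-conditioned training example. The field names and the template-based reasoning string below are illustrative assumptions; the paper's actual DR-Synth pipeline (which may generate richer traces) is not reproduced here:

```python
# Hypothetical sketch of DR-Synth-style synthesis: wrap a QA pair in an
# agent-style reasoning trace so a retriever can be trained on
# (reasoning + query, positive document) pairs. Field names and the
# reasoning template are assumptions for illustration.

def synthesize_example(question: str, answer: str, evidence_doc: str) -> dict:
    """Turn one QA-dataset item into a training example for a
    reasoning-aware retriever."""
    reasoning = (f"To answer the research question '{question}', "
                 f"I still need a source that states the answer directly.")
    return {
        "query": question,
        "reasoning_trace": reasoning,
        "positive_doc": evidence_doc,  # document containing the answer
        "answer": answer,
    }

example = synthesize_example(
    question="Which year was the transistor invented?",
    answer="1947",
    evidence_doc="The point-contact transistor was invented at Bell Labs in 1947.",
)
print(example["reasoning_trace"])
```

The point of the sketch is the shape of the training signal: the retriever learns to match reasoning-conditioned queries to evidence documents, rather than bare questions to documents.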
Benchmark results demonstrate substantial gains:
- AgentIR-4B with Tongyi-DeepResearch agent: 68% accuracy on BrowseComp-Plus
- Conventional embedding models (twice the size): 50% accuracy
- BM25 baseline: 37% accuracy
- Net improvement: 31 percentage points over keyword-based retrieval
Both components proved independently effective, with their combination yielding the trained AgentIR-4B embedding model. The research team has released code and data at https://texttron.github.io/AgentIR/.
Implications for Agent Infrastructure
As AI agents become primary consumers of retrieval systems rather than humans, this research signals a broader shift toward agent-native infrastructure. The substantial accuracy improvements demonstrate that systems purpose-built for agent workflows can dramatically outperform adapted human-centric tools. The approach generalizes beyond research agents to any AI system that generates reasoning traces before information-seeking actions, including customer service agents, coding assistants, and analytical systems.
Key Takeaways
- AgentIR achieves 68% accuracy on BrowseComp-Plus, outperforming conventional embeddings by 18 percentage points and BM25 by 31 points
- The system jointly embeds agent reasoning traces with queries, exploiting contextual signals that existing retrievers ignore
- DR-Synth synthesizes training data for deep research retrievers from standard QA datasets
- Research demonstrates a paradigm shift toward building retrieval systems specifically for AI agents rather than humans
- Code and data are publicly available at https://texttron.github.io/AgentIR/