Researchers Yoonsang Lee, Howard Yen, Xi Ye, and Danqi Chen have developed AggAgent, a novel approach to parallel test-time scaling that treats multiple agent trajectories as an explorable environment. Published on arXiv on April 13, 2026, the method achieves an average 5.3% absolute improvement across all tasks and 10.3% on deep research tasks, outperforming all existing aggregation methods.
AggAgent Addresses Critical Limitations in Parallel Test-Time Scaling
Parallel test-time scaling generates multiple agent rollouts simultaneously and aggregates them into a final response. While this approach works well for chain-of-thought reasoning, it faces unique challenges for agentic tasks where trajectories are long, multi-turn, and tool-augmented. Traditional aggregation methods either discard rich trajectory information by using only final answers or exceed context window limits when concatenating all trajectories. AggAgent solves this by using an aggregation agent equipped with lightweight tools to inspect candidate solutions and search across trajectories, enabling it to navigate and synthesize information on demand.
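To make the idea of "trajectories as an explorable environment" concrete, here is a minimal sketch of what lightweight inspection and search tools over a set of rollouts could look like. It is an illustration only, not AggAgent's actual tool set: the names `Trajectory`, `TrajectoryEnv`, `list_candidates`, `search`, and `inspect`, as well as the substring-matching search, are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Trajectory:
    """One agent rollout: its step texts plus its final answer (hypothetical layout)."""
    steps: list[str]
    final_answer: str


class TrajectoryEnv:
    """Hypothetical read-only environment over N parallel rollouts."""

    def __init__(self, trajectories: list[Trajectory]):
        self.trajectories = trajectories

    def list_candidates(self) -> list[tuple[int, str]]:
        """Return (trajectory id, final answer) pairs without any step text."""
        return [(i, t.final_answer) for i, t in enumerate(self.trajectories)]

    def search(self, query: str, max_hits: int = 5) -> list[tuple[int, int]]:
        """Return up to max_hits (trajectory id, step index) pairs whose
        step text contains the query, case-insensitively."""
        needle = query.lower()
        hits: list[tuple[int, int]] = []
        for i, traj in enumerate(self.trajectories):
            for j, step in enumerate(traj.steps):
                if needle in step.lower():
                    hits.append((i, j))
                    if len(hits) == max_hits:
                        return hits
        return hits

    def inspect(self, traj_id: int, step: int, max_chars: int = 500) -> str:
        """Return a single step of a single trajectory, truncated to max_chars."""
        return self.trajectories[traj_id].steps[step][:max_chars]
```

In this sketch every tool call returns a small, bounded amount of text, so an aggregation agent could read only the steps it asks for instead of loading every trajectory into its context at once.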
Framework Achieves 10.3% Improvement on Deep Research Tasks
The researchers tested AggAgent across six benchmarks and three model families: GLM-4.7, Qwen3.5, and MiniMax-M2.5. The method achieved an average absolute improvement of 5.3% across all tasks, with the largest gain, 10.3%, on deep research tasks. The framework outperformed all existing aggregation methods while adding minimal overhead, as the aggregation cost remains bounded by a single agentic rollout rather than scaling with the number of trajectories.
Selective Inspection Replaces Naive Concatenation
Rather than attempting to fit all trajectories into context, AggAgent uses tools to selectively inspect and navigate the trajectory space. This is a shift away from naive aggregation that concatenates everything and from simple voting methods that use only final answers. Tool-augmented exploration lets the aggregator synthesize information across multiple trajectories in long-horizon agentic tasks such as agentic search and deep research.
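The two baselines that selective inspection is contrasted with can be sketched in a few lines. This is a simplified illustration, not the paper's baseline implementations: the function names are hypothetical, and the token counts and context limit in the usage note are made-up numbers chosen only to show the failure mode.

```python
from collections import Counter


def majority_vote(final_answers: list[str]) -> str:
    """Voting baseline: pick the most common final answer.
    Every intermediate step of every trajectory is ignored."""
    return Counter(final_answers).most_common(1)[0][0]


def fits_in_context(trajectory_token_counts: list[int], context_limit: int) -> bool:
    """Concatenation baseline: all trajectories must fit in one prompt,
    so the required context grows with the number of trajectories."""
    return sum(trajectory_token_counts) <= context_limit
```

For example, `majority_vote(["A", "B", "A"])` returns `"A"` regardless of how well either answer was supported, and `fits_in_context([60_000] * 8, 128_000)` is `False` because eight hypothetical 60,000-token rollouts total 480,000 tokens.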
Method Provides Cost-Efficient Scaling for Long-Horizon Tasks
The researchers conclude that agentic aggregation establishes an effective and cost-efficient approach to parallel test-time scaling. Because the aggregation cost is bounded by that of one additional rollout, rather than growing with the number N of trajectories, the method scales efficiently even when combining many parallel rollouts. This efficiency is particularly valuable for long-horizon agentic tasks where individual trajectories can be extensive and expensive.
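The cost argument can be written out as simple arithmetic. The sketch below assumes, purely for illustration, that every rollout costs the same amount and that aggregation costs at most one rollout. Under those assumptions the total is at most N + 1 rollouts, so the relative overhead of aggregation is at most 1/N. The function names are hypothetical.

```python
def total_cost_bound(n_rollouts: int, rollout_cost: float) -> float:
    """Upper bound on total cost: N rollouts plus at most one
    rollout's worth of aggregation (uniform rollout cost assumed)."""
    if n_rollouts < 1:
        raise ValueError("n_rollouts must be at least 1")
    return (n_rollouts + 1) * rollout_cost


def relative_overhead_bound(n_rollouts: int) -> float:
    """Upper bound on aggregation cost as a fraction of the cost
    of the N rollouts themselves."""
    if n_rollouts < 1:
        raise ValueError("n_rollouts must be at least 1")
    return 1.0 / n_rollouts
```

With eight rollouts, for instance, the bound on relative overhead is 1/8 of the rollout cost, and it shrinks further as more rollouts are combined.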
Key Takeaways
- AggAgent achieves an average absolute improvement of 5.3% across all tasks and 10.3% on deep research tasks
- The method treats parallel agent trajectories as an explorable environment, using lightweight tools to selectively inspect and synthesize information
- Aggregation cost remains bounded by a single agentic rollout, making the approach cost-efficient regardless of the number of parallel trajectories
- AggAgent outperforms all existing aggregation methods across six benchmarks and three model families (GLM-4.7, Qwen3.5, MiniMax-M2.5)
- The framework addresses unique challenges in agentic tasks where trajectories are long, multi-turn, and tool-augmented, avoiding context window limitations