Researchers have introduced Recursive Agent Optimization (RAO), a reinforcement learning approach that trains AI agents to recursively spawn and delegate tasks to copies of themselves. Published on arXiv on May 7, 2026 by Apurva Gandhi, Satyaki Chakraborty, Xiangjun Wang, Aviral Kumar, and Graham Neubig, RAO enables agents to scale to problems beyond their context window and generalize to tasks harder than those encountered during training.
Recursive Agents Implement Inference-Time Divide-and-Conquer
The core concept behind RAO is training agents that can decompose problems and delegate subtasks to new instances of themselves recursively. This creates a natural inference-time scaling algorithm where complex problems are broken down into manageable pieces, similar to how divide-and-conquer algorithms work in traditional computer science—but learned through reinforcement learning rather than explicitly programmed.
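To make the control flow concrete, here is a minimal sketch of recursive decomposition, where an "agent" with a bounded context either solves a task directly or spawns copies of itself on subtasks and merges their results. In RAO the decompose/solve/merge decisions are learned through reinforcement learning; in this toy version they are hard-coded (the `MAX_CONTEXT` limit and the summation task are illustrative assumptions, not details from the paper).

```python
# Toy recursive agent: sums a list of numbers while only ever "seeing"
# MAX_CONTEXT items at a time, mimicking a bounded context window.
MAX_CONTEXT = 4  # assumed per-instance context limit (illustrative)

def agent(task: list[int]) -> int:
    if len(task) <= MAX_CONTEXT:      # base case: the task fits in context
        return sum(task)              # solve it directly
    mid = len(task) // 2              # otherwise, decompose into subtasks
    left = agent(task[:mid])          # delegate each half to a copy of itself
    right = agent(task[mid:])
    return left + right               # merge the children's results

print(agent(list(range(100))))        # → 4950, a task 25x the context limit
```

The recursion depth grows only logarithmically with task size, which is what lets a fixed-context agent scale to inputs far larger than any single instance can hold.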
RAO teaches agents when and how to delegate effectively, along with how to communicate between parent and child agent instances. This learned delegation capability enables agents to tackle tasks that exceed their individual context windows by distributing work across multiple recursive instances.
RAO Improves Training Efficiency and Generalization
Agents trained with RAO demonstrate better training efficiency compared to standard single-agent approaches. The framework enables scaling to tasks beyond the model's context window limitations and generalization to problems significantly more difficult than training examples. Additionally, recursive agents can achieve reduced wall-clock time compared to single-agent systems when subtasks are executed in parallel.
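The wall-clock benefit of parallel recursion can be sketched with independent subtasks fanned out to a worker pool; the per-subtask work is simulated with a sleep, so this illustrates only the scheduling effect, not the paper's actual rollout machinery.

```python
# With independent subtasks and enough workers, wall-clock time approaches
# the slowest subtask's time rather than the sum of all subtask times.
import time
from concurrent.futures import ThreadPoolExecutor

def run_subtask(i: int) -> int:
    time.sleep(0.2)   # stand-in for one child agent's rollout
    return i * i

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(run_subtask, range(4)))
elapsed = time.perf_counter() - start
print(results, f"{elapsed:.2f}s")  # finishes in ~0.2s, not ~0.8s sequentially
```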
RAO thus represents a training paradigm built around recursive problem decomposition: rather than requiring agents to solve increasingly complex problems monolithically, it lets them learn meta-strategies for breaking complexity into delegable pieces.
Implications for Scaling Agent Capabilities
RAO addresses a fundamental scaling challenge in agentic AI: as tasks grow in complexity and length, single-agent approaches face hard limits from context windows and computational constraints. By enabling learned recursive decomposition, RAO provides a path toward handling arbitrarily complex tasks through hierarchical delegation rather than ever-larger models.
The framework's ability to generalize beyond training difficulty suggests that agents learn genuine problem decomposition strategies rather than task-specific solutions. This has implications for developing more capable and flexible AI systems that can adapt to novel challenges through learned structural reasoning.
Key Takeaways
- RAO trains agents to recursively spawn sub-agents and delegate subtasks through reinforcement learning
- Recursive agents can scale to tasks beyond their context window by distributing work across instances
- Agents trained with RAO show better training efficiency and generalize to problems harder than those seen during training
- Parallel recursion can reduce wall-clock time compared to single-agent approaches
- The framework enables learned divide-and-conquer strategies rather than requiring explicit programming