Researchers have introduced Agentic Critical Training (ACT), a reinforcement learning approach that teaches language model agents to autonomously reason about action quality rather than simply imitating expert behavior. Published on arXiv on March 9, 2026, the method shows an average 5.07-point improvement over traditional imitation learning across three challenging agent benchmarks.
ACT Addresses Fundamental Limitations of Imitation Learning
Training large language models as autonomous agents typically begins with imitation learning, but this approach "only teaches agents what to do without understanding why," according to the researchers. Agents trained through imitation never contrast successful actions against suboptimal alternatives, and so never develop a sense of action quality. While recent approaches attempt to introduce self-reflection by comparing expert actions against alternatives, the training paradigm remains imitation-based: models imitate pre-constructed reflection text rather than learning to reason autonomously.
Reinforcement Learning Paradigm Rewards Critical Judgment
ACT fundamentally changes the training approach by treating action evaluation as a reinforcement learning problem. The system trains agents to identify the better action among alternatives, rewarding them when their judgment is correct. This drives models to autonomously develop reasoning about action quality, producing what researchers call "genuine self-reflection rather than imitating it." Unlike imitation-based approaches, ACT enables agents to learn the underlying principles that make certain actions superior.
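The article does not specify the paper's exact training recipe, but the core signal it describes (reward the agent when it correctly identifies the better of two candidate actions) can be sketched minimally. Everything below is illustrative: the prompt format, the helper names (`act_reward`, `build_judgment_prompt`), and the toy kitchen state are assumptions, not the paper's implementation.

```python
import random

def act_reward(chosen_index: int, expert_index: int) -> float:
    """Binary reward: 1.0 when the agent picks the expert action, else 0.0."""
    return 1.0 if chosen_index == expert_index else 0.0

def build_judgment_prompt(state: str, candidates: list[str]) -> str:
    """Present the state and shuffled candidate actions for the model to judge."""
    options = "\n".join(f"({i}) {a}" for i, a in enumerate(candidates))
    return f"State:\n{state}\n\nWhich action is best?\n{options}"

# Toy example: an expert action paired with one suboptimal alternative.
state = "You are in the kitchen. Goal: heat the egg."
expert, alternative = "open the microwave", "open the fridge"
candidates = [expert, alternative]
random.shuffle(candidates)  # shuffle so position does not leak the answer
expert_index = candidates.index(expert)
prompt = build_judgment_prompt(state, candidates)

# In training, the policy would generate reasoning and then an index;
# here we stand in for the model call with the correct choice.
choice = expert_index
print(act_reward(choice, expert_index))  # 1.0 when the judgment is correct
```

Because the reward depends only on whether the final judgment is correct, the intermediate reasoning text is free to vary, which is what lets the model develop its own rationale rather than copying pre-written reflections.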
Performance Gains Across Multiple Benchmarks
The researchers evaluated ACT on three challenging agent benchmarks: ALFWorld, WebShop, and ScienceWorld. Results show:
- 5.07-point average improvement over imitation learning
- 4.62-point improvement over standard reinforcement learning
- 2.42-point improvement compared to knowledge distillation approaches for reflection capability
- Strong out-of-distribution generalization on agentic tasks
- Improved performance on general reasoning benchmarks without reasoning-specific training data
Discrimination-Based Learning Creates More Capable Agents
The key insight behind ACT is that agents develop critical reasoning by learning to discriminate between expert and alternative actions through reinforcement signals. This discrimination-based approach enables agents to understand not just what actions to take, but why certain actions are superior. The method produces agents that can generalize their reasoning capabilities beyond the specific tasks they were trained on, suggesting a more fundamental understanding of action quality.
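One natural way to realize this discrimination setup is to turn each step of an expert trajectory into a judgment example by pairing the expert action with a sampled alternative. This is a sketch under that assumption; the `JudgmentExample` structure, `make_examples` helper, and the toy trajectory are illustrative, not taken from the paper.

```python
import random
from dataclasses import dataclass

@dataclass
class JudgmentExample:
    state: str
    candidates: list[str]  # expert action plus one alternative, shuffled
    expert_index: int      # position of the expert action after shuffling

def make_examples(trajectory, action_space, rng=random.Random(0)):
    """Turn each (state, expert_action) step into a discrimination example."""
    examples = []
    for state, expert_action in trajectory:
        alternatives = [a for a in action_space if a != expert_action]
        candidates = [expert_action, rng.choice(alternatives)]
        rng.shuffle(candidates)  # avoid a fixed "answer position"
        examples.append(JudgmentExample(state, candidates,
                                        candidates.index(expert_action)))
    return examples

# Toy household-task trajectory in the spirit of ALFWorld.
trajectory = [("kitchen, goal: heat egg", "open microwave"),
              ("microwave open", "put egg in microwave")]
action_space = ["open microwave", "open fridge",
                "put egg in microwave", "turn on sink"]
examples = make_examples(trajectory, action_space)
print(len(examples))  # one judgment example per expert step -> 2
```

Training on such contrastive examples, rather than on the expert action alone, is what forces the model to articulate why one action beats the other instead of merely reproducing it.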
Path Toward Reflective Autonomous Agents
The researchers position ACT as "a promising path toward developing more reflective and capable LLM agents." By shifting from imitation to discrimination-based reasoning development, the paradigm addresses a fundamental limitation in how autonomous agents are currently trained. The ability to improve general reasoning performance without task-specific data suggests that learning to critically evaluate actions creates transferable reasoning skills applicable across diverse domains.
Key Takeaways
- ACT uses reinforcement learning to teach agents to discriminate between good and bad actions rather than imitating expert behavior
- The method achieves a 5.07-point average improvement over imitation learning across ALFWorld, WebShop, and ScienceWorld benchmarks
- ACT outperforms standard reinforcement learning by 4.62 points and knowledge distillation approaches by 2.42 points
- Agents trained with ACT show strong generalization to out-of-distribution tasks and improved general reasoning without domain-specific training
- The paradigm enables genuine self-reflection about action quality rather than imitation of pre-written reflection text