Researchers have introduced Agentic Critical Training (ACT), a reinforcement learning approach that teaches language model agents to autonomously reason about action quality rather than simply imitating expert behavior. Published on arXiv on March 9, 2026, the method shows an average 5.07-point improvement over traditional imitation learning across three challenging agent benchmarks.
ACT Addresses Fundamental Limitations of Imitation Learning
Training large language models as autonomous agents typically begins with imitation learning, but this approach "only teaches agents what to do without understanding why," according to the researchers. Agents trained through imitation never contrast successful actions against suboptimal alternatives, and so never develop a sense of action quality. While recent approaches attempt to introduce self-reflection by comparing expert actions against alternatives, the training paradigm remains imitation-based: models imitate pre-constructed reflection text rather than learning to reason autonomously.
Reinforcement Learning Paradigm Rewards Critical Judgment
ACT fundamentally changes the training approach by treating action evaluation as a reinforcement learning problem. The system trains agents to identify the better action among alternatives, rewarding them when their judgment is correct. This drives models to autonomously develop reasoning about action quality, producing what researchers call "genuine self-reflection rather than imitating it." Unlike imitation-based approaches, ACT enables agents to learn the underlying principles that make certain actions superior.
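The article does not specify the paper's exact training recipe, but the core signal it describes (reward the agent when it correctly identifies the better of two candidate actions) can be sketched minimally. Everything below is illustrative: the prompt format, the helper names (`act_reward`, `build_judgment_prompt`), and the toy kitchen state are assumptions, not the paper's implementation.

```python
import random

def act_reward(chosen_index: int, expert_index: int) -> float:
    """Binary reward: 1.0 when the agent picks the expert action, else 0.0."""
    return 1.0 if chosen_index == expert_index else 0.0

def build_judgment_prompt(state: str, candidates: list[str]) -> str:
    """Present the state and shuffled candidate actions for the model to judge."""
    options = "\n".join(f"({i}) {a}" for i, a in enumerate(candidates))
    return f"State:\n{state}\n\nWhich action is best?\n{options}"

# Toy example: an expert action paired with one suboptimal alternative.
state = "You are in the kitchen. Goal: heat the egg."
expert, alternative = "open the microwave", "open the fridge"
candidates = [expert, alternative]
random.shuffle(candidates)  # shuffle so position does not leak the answer
expert_index = candidates.index(expert)
prompt = build_judgment_prompt(state, candidates)

# In training, the policy would generate reasoning and then an index;
# here we stand in for the model call with the correct choice.
choice = expert_index
print(act_reward(choice, expert_index))  # 1.0 when the judgment is correct
```

Because the reward depends only on whether the final judgment is correct, the intermediate reasoning text is free to vary, which is what lets the model develop its own rationale rather than copying pre-written reflections.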
Performance Gains Across Multiple Benchmarks
The researchers evaluated ACT on three challenging agent benchmarks: ALFWorld, WebShop, and ScienceWorld. Results show:
- 5.07-point average improvement over imitation learning
- 4.62-point improvement over standard reinforcement learning
- 2.42-point improvement compared to knowledge distillation approaches for reflection capability
- Strong out-of-distribution generalization on agentic tasks
- Improved performance on general reasoning benchmarks without reasoning-specific training data
Discrimination-Based Learning Creates More Capable Agents
The key insight behind ACT is that agents develop critical reasoning by learning to discriminate between expert and alternative actions through reinforcement signals. This discrimination-based approach enables agents to understand not just what actions to take, but why certain actions are superior. The method produces agents that can generalize their reasoning capabilities beyond the specific tasks they were trained on, suggesting a more fundamental understanding of action quality.
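One natural way to realize this discrimination setup is to turn each step of an expert trajectory into a judgment example by pairing the expert action with a sampled alternative. This is a sketch under that assumption; the `JudgmentExample` structure, `make_examples` helper, and the toy trajectory are illustrative, not taken from the paper.

```python
import random
from dataclasses import dataclass

@dataclass
class JudgmentExample:
    state: str
    candidates: list[str]  # expert action plus one alternative, shuffled
    expert_index: int      # position of the expert action after shuffling

def make_examples(trajectory, action_space, rng=random.Random(0)):
    """Turn each (state, expert_action) step into a discrimination example."""
    examples = []
    for state, expert_action in trajectory:
        alternatives = [a for a in action_space if a != expert_action]
        candidates = [expert_action, rng.choice(alternatives)]
        rng.shuffle(candidates)  # avoid a fixed "answer position"
        examples.append(JudgmentExample(state, candidates,
                                        candidates.index(expert_action)))
    return examples

# Toy household-task trajectory in the spirit of ALFWorld.
trajectory = [("kitchen, goal: heat egg", "open microwave"),
              ("microwave open", "put egg in microwave")]
action_space = ["open microwave", "open fridge",
                "put egg in microwave", "turn on sink"]
examples = make_examples(trajectory, action_space)
print(len(examples))  # one judgment example per expert step -> 2
```

Training on such contrastive examples, rather than on the expert action alone, is what forces the model to articulate why one action beats the other instead of merely reproducing it.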
Path Toward Reflective Autonomous Agents
The researchers position ACT as "a promising path toward developing more reflective and capable LLM agents." By shifting from imitation to discrimination-based reasoning development, the paradigm addresses a fundamental limitation in how autonomous agents are currently trained. The ability to improve general reasoning performance without task-specific data suggests that learning to critically evaluate actions creates transferable reasoning skills applicable across diverse domains.
Key Takeaways
- ACT uses reinforcement learning to teach agents to discriminate between good and bad actions rather than imitating expert behavior
- The method achieves a 5.07-point average improvement over imitation learning across ALFWorld, WebShop, and ScienceWorld benchmarks
- ACT outperforms standard reinforcement learning by 4.62 points and knowledge distillation approaches by 2.42 points
- Agents trained with ACT show strong generalization to out-of-distribution tasks and improved general reasoning without domain-specific training
- The paradigm enables genuine self-reflection about action quality rather than imitation of pre-written reflection text