Co-pi-tree Distills LLM Reasoning Into Interpretable Policy Trees for Human-AI Collaboration

Researchers have introduced Co-pi-tree, a method that distills large language model (LLM) reasoning into interpretable policy trees for human-AI collaboration. The work is described in an arXiv paper by Beiwen Zhang and colleagues. It targets two limitations of existing collaboration systems, opaque policies and costly per-step inference, by combining the reasoning quality of LLMs with efficient execution and full interpretability.

Traditional Approaches Fall Short on Interpretability and Efficiency

Current methods for human-AI collaboration face two major problems. Multi-agent reinforcement learning produces black-box policies that lack interpretability and raise safety concerns. Meanwhile, querying LLMs at every decision step causes slow responses and prohibitively high inference costs for real-time collaboration tasks.

Co-pi-tree Creates Self-Improving Policy Trees Through Closed-Loop Learning

The Co-pi-tree framework builds executable policy trees with two components: partner-behavior prediction and agent-action selection. The system distills LLM reasoning into policy tree code, evaluates that code by interacting with a partner, and uses natural-language feedback to revise the branches that perform poorly. This closed loop repeats until the policy meets its performance requirements. The system therefore improves itself, while the finished policy runs without LLM queries during execution.
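To make the two-part tree structure concrete, here is a minimal Python sketch of what a distilled policy tree might look like in an Overcooked-style kitchen. It is not code from the paper. The `State` fields, the subgoal names, and the functions `predict_partner`, `select_action`, and `policy_tree` are all hypothetical. They are chosen only to show a partner-prediction branch feeding an action-selection branch as plain, inspectable code.

```python
from dataclasses import dataclass


# Hypothetical, simplified Overcooked-style state; field names are illustrative only.
@dataclass
class State:
    agent_holding: str    # "nothing", "onion", "dish", or "soup"
    partner_holding: str  # same vocabulary as agent_holding
    pot_onions: int       # onions currently in the pot
    soup_ready: bool      # whether the pot holds finished soup


def predict_partner(s: State) -> str:
    """Partner-behavior prediction branch: guess the partner's next subgoal."""
    if s.partner_holding == "onion":
        return "deliver_onion_to_pot"
    if s.partner_holding == "dish" and s.soup_ready:
        return "plate_soup"
    if s.partner_holding == "soup":
        return "serve"
    return "fetch_onion"


def select_action(s: State, partner_intent: str) -> str:
    """Agent-action selection branch: choose a subgoal given the predicted partner intent."""
    if s.agent_holding == "soup":
        return "serve"
    if s.agent_holding == "onion":
        return "put_onion_in_pot"
    if s.agent_holding == "dish":
        return "plate_soup" if s.soup_ready else "wait_near_pot"
    # Empty-handed: defer if the partner is predicted to plate the soup or deliver an onion.
    if s.soup_ready and partner_intent != "plate_soup":
        return "fetch_dish"
    if s.pot_onions < 3 and partner_intent != "deliver_onion_to_pot":
        return "fetch_onion"
    return "fetch_dish"


def policy_tree(s: State) -> str:
    # No LLM call here: the tree is ordinary code evaluated at every step.
    return select_action(s, predict_partner(s))


# Partner carries an onion, so the agent fetches a dish instead of a second onion.
print(policy_tree(State("nothing", "onion", 2, False)))  # fetch_dish
```

Because every branch is ordinary code, a developer can read, unit-test, or edit it directly. Each decision costs only a few comparisons rather than a model call.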

Experimental Results Show 97% Latency Reduction With 35% Performance Gains

Testing in the Overcooked-AI environment demonstrated substantial improvements across multiple metrics:

Average reward increased by 35.4% over the baseline
LLM queries reduced by 77.7%
Test-time latency decreased by 97.1%
Policies remained fully interpretable as explicit code
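Percentage reductions are easier to compare when expressed as multiplicative factors. The short calculation below uses only the figures reported above. By this arithmetic, a 97.1% latency reduction corresponds to roughly a 34.5x speedup, and a 77.7% cut in LLM queries corresponds to roughly 4.5x fewer queries. The helper name `reduction_to_factor` is ours, for illustration only.

```python
def reduction_to_factor(pct_reduction: float) -> float:
    """Convert a percentage reduction into the equivalent 'N times smaller' factor."""
    return 1.0 / (1.0 - pct_reduction / 100.0)


latency_speedup = reduction_to_factor(97.1)  # test-time latency
query_factor = reduction_to_factor(77.7)     # number of LLM queries

print(round(latency_speedup, 1), round(query_factor, 1))  # 34.5 4.5
```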

Rather than letting an LLM control the agent directly, the approach uses LLMs to generate interpretable policies, combining neural reasoning with symbolic code. Human developers can inspect, debug, and modify the resulting policy trees while retaining the reasoning quality of the LLMs that produced them.
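The closed loop described earlier can be sketched as a generic refinement loop. In that loop, an LLM writes and revises the policy rather than acting at each step. This is a hedged illustration, not the authors' implementation. The functions `closed_loop`, `summarize`, `toy_evaluate`, and `toy_revise` are hypothetical. The toy stubs stand in for partner rollouts and for the LLM call that patches failing branches.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces (not from the paper):
#   evaluate(code) -> (average reward, failure notes gathered from partner rollouts)
#   revise(code, feedback) -> revised code; in practice this would be an LLM call
Evaluate = Callable[[str], Tuple[float, List[str]]]
Revise = Callable[[str, str], str]


def summarize(failures: List[str]) -> str:
    """Collapse failure notes into one natural-language feedback string."""
    return "; ".join(sorted(set(failures)))


def closed_loop(initial_code: str, evaluate: Evaluate, revise: Revise,
                target_reward: float, max_rounds: int = 10) -> Tuple[str, float, int]:
    """Evaluate, summarize failures, revise; repeat until the target reward is met."""
    code = initial_code
    llm_calls = 1  # assumption: the initial draft came from a single LLM call
    reward, failures = evaluate(code)
    rounds = 0
    while reward < target_reward and rounds < max_rounds:
        code = revise(code, summarize(failures))  # the only step that would query an LLM
        llm_calls += 1
        reward, failures = evaluate(code)  # rollouts execute the code directly, no LLM
        rounds += 1
    return code, reward, llm_calls


# Toy stand-ins so the sketch runs end to end.
def toy_evaluate(code: str) -> Tuple[float, List[str]]:
    fixes = code.count("# fixed")
    failures = [] if fixes >= 2 else ["agent idles while partner plates soup"]
    return 10.0 * fixes, failures


def toy_revise(code: str, feedback: str) -> str:
    return code + f"\n# fixed: {feedback}"


final_code, reward, calls = closed_loop("def policy(s): ...", toy_evaluate, toy_revise,
                                        target_reward=20.0)
print(reward, calls)  # 20.0 3
```

Bounding `max_rounds` keeps the loop from running indefinitely if the target is never reached. In this toy run, two revisions are enough, so the loop stops after three LLM calls in total, counting the assumed initial draft.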

Key Takeaways

Co-pi-tree distills LLM reasoning into interpretable policy tree code that executes efficiently without requiring queries at every decision step
The closed-loop method achieves 35.4% higher rewards while reducing LLM queries by 77.7% and test-time latency by 97.1% in the Overcooked-AI environment
Policy trees consist of partner-behavior prediction and agent-action selection components that can be inspected and modified as explicit code
The system self-improves by analyzing interaction feedback and updating problematic branches using natural language summaries
The approach points toward human-AI collaboration systems that combine LLM reasoning quality with execution efficiency and full interpretability