Researchers have introduced Socratic-SWE, a self-evolution framework that enables coding agents to improve by learning from their own problem-solving experiences. The system, developed by Chuan Xiao and colleagues, achieves 50.40% on SWE-bench Verified after three iterations by distilling solving traces into structured skills that target the agent's specific weaknesses. This approach marks a shift from traditional methods that generate training tasks through fixed mutation procedures independent of an agent's actual failure modes.
Closed-Loop Learning From Actual Problem-Solving
Traditional self-evolution methods generate training tasks through "fixed mutation or bug-injection procedures, making the resulting distributions largely independent of the agent's own weaknesses." Socratic-SWE instead closes the loop: the agent's observed failure patterns determine which tasks it trains on next, so it is not limited to generic synthetic bugs.
The framework operates through five key steps:
- Distilling solving traces into structured agent skills that summarize recurring failures and effective repair patterns
- Using these synthesized skills to guide generation of targeted repair tasks in real repositories
- Validating candidates through execution-based testing
- Scoring tasks with a solver-gradient alignment reward to ensure they are both verifiable and useful for improving the Solver
- Feeding updated Solver traces back into the system, enabling the curriculum to adapt across successive rounds
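The five steps above can be read as a single loop. The following is a minimal sketch of that control flow only, not the authors' implementation: every name here (`self_evolve`, `Trace`, the callables `solve`, `distill`, `generate`, `validate`, `score`, `train`, and the `min_score` threshold) is hypothetical, and the sketch assumes the same seed tasks are re-solved each round so that the updated Solver produces fresh traces.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Trace:
    """Hypothetical record of one solving attempt."""
    task: str
    solved: bool


def self_evolve(
    seed_tasks: Sequence[str],
    rounds: int,
    solve: Callable[[str], Trace],
    distill: Callable[[List[Trace]], List[str]],
    generate: Callable[[List[str]], List[str]],
    validate: Callable[[str], bool],
    score: Callable[[str], float],
    train: Callable[[List[str]], None],
    min_score: float = 0.5,  # assumed cutoff; the source gives no threshold
) -> List[List[str]]:
    """Run the closed loop and return the curriculum chosen in each round."""
    history: List[List[str]] = []
    for _ in range(rounds):
        # Step 5 (from the previous round): the updated Solver produces new traces.
        traces = [solve(task) for task in seed_tasks]
        # Step 1: distill traces into skills (failures and repair patterns).
        skills = distill(traces)
        # Step 2: generate candidate repair tasks guided by those skills.
        candidates = generate(skills)
        # Step 3: keep only candidates that pass execution-based validation.
        verified = [c for c in candidates if validate(c)]
        # Step 4: keep only tasks whose reward clears the assumed threshold.
        curriculum = [c for c in verified if score(c) >= min_score]
        # Update the Solver on the selected tasks.
        train(curriculum)
        history.append(curriculum)
    return history
```

Passing the components in as callables keeps the sketch independent of any particular model, repository, or test harness; how each component actually works is left to the paper.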
Strong Performance Across Multiple Benchmarks
On SWE-bench Verified, Socratic-SWE reaches 50.40% after three iterations and "consistently improves over self-evolving baselines under the same compute budget." The system was also evaluated on SWE-bench Lite, SWE-bench Pro, and Terminal-Bench 2.0, so the evaluation covers several kinds of coding task and not a single benchmark.
The key insight is that "solving traces can serve as a scalable substrate for self-evolving SWE agents." Rather than treating traces only as reward signals, the framework extracts structured knowledge about what the agent struggles with and what repair patterns prove effective.
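To make "structured knowledge" concrete, here is one way a skill could be represented and extracted from traces. This is an illustrative sketch under stated assumptions, not the paper's format: the `SolvingTrace` and `Skill` types, the `category` and `repair_pattern` fields, and the `min_failures` cutoff are all hypothetical, and the sketch assumes each trace has already been labeled with a problem category.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SolvingTrace:
    """Hypothetical trace record; real traces would be far richer."""
    task_id: str
    solved: bool
    category: str             # assumed label for the kind of problem
    repair_pattern: str = ""  # assumed summary of the fix, if solved


@dataclass
class Skill:
    """A recurring failure plus the repair patterns seen to work for it."""
    category: str
    failures: int
    repair_patterns: List[str]


def distill_skills(traces: List[SolvingTrace], min_failures: int = 2) -> List[Skill]:
    """Summarize categories that fail repeatedly, most frequent first."""
    # Count failed attempts per category.
    failures = Counter(t.category for t in traces if not t.solved)

    # Collect distinct repair patterns from solved attempts per category.
    repairs: Dict[str, List[str]] = defaultdict(list)
    for t in traces:
        if t.solved and t.repair_pattern and t.repair_pattern not in repairs[t.category]:
            repairs[t.category].append(t.repair_pattern)

    # Keep only categories that fail often enough to count as recurring.
    return [
        Skill(category, count, list(repairs.get(category, [])))
        for category, count in failures.most_common()
        if count >= min_failures
    ]
```

The point of the structure is that failed and successful traces both contribute: failures identify where the agent is weak, and successes in the same category supply the repair patterns that task generation can target.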
Implications for Continuous Agent Improvement
This approach could enable continuous improvement in which agents systematically address their own failure modes instead of training on tasks produced by fixed mutation procedures. Because tasks are generated to target observed weaknesses, the curriculum is specific to the agent and changes as its capabilities change. The researchers report that the curriculum adapts across successive training rounds, with each iteration drawing on traces from earlier problem-solving attempts.
Key Takeaways
- Socratic-SWE achieves 50.40% on SWE-bench Verified after three iterations by learning from its own solving traces
- The framework distills solving traces into structured skills that identify recurring failures and effective repair patterns
- Unlike traditional methods using fixed mutations, Socratic-SWE generates tasks targeting the agent's specific weaknesses
- The system consistently outperforms self-evolving baselines under the same compute budget
- The closed-loop approach enables continuous improvement through personalized, adaptive curricula