Socratic-SWE: Self-Evolving Coding Agents Learn From Their Own Solving Traces

Researchers have introduced Socratic-SWE, a self-evolution framework that lets coding agents improve by learning from their own problem-solving experience. The system, developed by Chuan Xiao and colleagues, achieves 50.40% on SWE-bench Verified after three iterations by distilling solving traces into structured skills that target the agent's specific weaknesses. The approach departs from earlier methods, which generate training tasks through fixed mutation procedures that are independent of the agent's actual failure modes.

Closed-Loop Learning From Actual Problem-Solving

Traditional self-evolution methods generate training tasks through "fixed mutation or bug-injection procedures, making the resulting distributions largely independent of the agent's own weaknesses." Socratic-SWE instead closes the loop: the agent trains on tasks built from its own observed failure patterns rather than on generic synthetic bugs.

The framework operates through five key steps:

Distilling solving traces into structured agent skills that summarize recurring failures and effective repair patterns
Using these synthesized skills to guide generation of targeted repair tasks in real repositories
Validating candidates through execution-based testing
Scoring tasks with a solver-gradient alignment reward, so that retained tasks are both verifiable and useful for improving the Solver
Feeding updated Solver traces back into the system, enabling the curriculum to adapt across successive rounds
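
The five steps above can be read as a simple control loop. The following Python sketch is only an illustration of that control flow, not the authors' implementation: the names `run_round`, `evolve`, `RoundResult`, and `min_score`, and the idea of passing each step in as a callable, are all assumptions made here. Solver training between rounds is omitted.

```python
from dataclasses import dataclass


@dataclass
class RoundResult:
    """Outcome of one round (hypothetical container, not from the paper)."""
    skills: list
    kept_tasks: list
    new_traces: list


def run_round(traces, distill, generate, validate, score, solve, min_score=0.5):
    """One pass through the five steps; every callable is a stand-in."""
    skills = distill(traces)                                     # step 1: traces -> skills
    candidates = [t for s in skills for t in generate(s)]        # step 2: skills -> candidate tasks
    verified = [t for t in candidates if validate(t)]            # step 3: execution-based check
    kept = [t for t in verified if score(t) >= min_score]        # step 4: reward filter (assumed threshold)
    new_traces = [solve(t) for t in kept]                        # step 5: fresh Solver traces
    return RoundResult(skills, kept, new_traces)


def evolve(initial_traces, rounds, **steps):
    """Repeat the round, feeding each round's traces into the next."""
    traces = list(initial_traces)
    history = []
    for _ in range(rounds):
        result = run_round(traces, **steps)
        history.append(result)
        # Keep the previous traces if a round produced none.
        traces = result.new_traces or traces
    return history
```

In this sketch the curriculum adapts only because `distill` sees different traces each round; how the real system updates the Solver and computes the reward is not specified here.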

Strong Performance Across Multiple Benchmarks

Socratic-SWE achieves 50.40% on SWE-bench Verified after three iterations and "consistently improves over self-evolving baselines under the same compute budget." The system was also evaluated on SWE-bench Lite, SWE-bench Pro, and Terminal-Bench 2.0, extending the evaluation beyond a single benchmark.

The key insight is that "solving traces can serve as a scalable substrate for self-evolving SWE agents." Rather than treating traces only as reward signals, the framework extracts structured knowledge about what the agent struggles with and what repair patterns prove effective.
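
To make "structured knowledge" concrete, here is a minimal Python sketch of what a distilled skill record might look like. It is an assumption for illustration only: the `Trace` and `Skill` fields, the `category` label, and the counting logic are hypothetical and are not taken from the paper, which may well use a model to write richer skill descriptions.

```python
from dataclasses import dataclass, field


@dataclass
class Trace:
    """One solving attempt (hypothetical schema)."""
    task_id: str
    category: str          # assumed label for the kind of problem, e.g. "import_error"
    solved: bool
    repair_note: str = ""  # assumed short description of the fix that worked


@dataclass
class Skill:
    """A recurring failure plus the repair patterns seen to work for it."""
    category: str
    failures: int = 0
    repair_patterns: list = field(default_factory=list)


def distill(traces):
    """Group traces by category; keep only categories the agent actually failed on."""
    skills = {}
    for trace in traces:
        skill = skills.setdefault(trace.category, Skill(trace.category))
        if not trace.solved:
            skill.failures += 1
        elif trace.repair_note and trace.repair_note not in skill.repair_patterns:
            skill.repair_patterns.append(trace.repair_note)
    weak_spots = [s for s in skills.values() if s.failures > 0]
    # Most frequent failures first, so task generation can prioritize them.
    return sorted(weak_spots, key=lambda s: -s.failures)
```

The point of the sketch is the shape of the output: each record pairs what went wrong with what has worked, which is more than a scalar reward can carry.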

Implications for Continuous Agent Improvement

This approach could enable continuous improvement in which agents systematically address their own failure modes rather than training on random mutations. By generating tasks that target observed weaknesses, Socratic-SWE builds a personalized curriculum that evolves with the agent's capabilities. The researchers show that the curriculum adapts across successive training rounds, with each iteration building on the traces from previous problem-solving attempts.

Key Takeaways

Socratic-SWE achieves 50.40% on SWE-bench Verified after three iterations by learning from its own solving traces
The framework distills solving traces into structured skills that identify recurring failures and effective repair patterns
Unlike traditional methods using fixed mutations, Socratic-SWE generates tasks targeting the agent's specific weaknesses
The system consistently outperforms self-evolving baselines under the same compute budget
The closed-loop approach enables continuous improvement through personalized, adaptive curricula
