Steve-Evolving: Self-Evolving Minecraft Agent Masters Long-Horizon Tasks Through Diagnostic Learning

A team of researchers including Zhengwei Xie, Zhisheng Chen, Ziyan Weng, Tingyu Wu, Chenglong Li, Vireo Zhang, and Kun Wang published Steve-Evolving on arXiv on March 13, 2026 (arXiv:2603.13131). The framework addresses a critical insight: the main bottleneck in long-horizon embodied AI tasks is not single-step planning quality but how interaction experience is organized and evolved over time.

Three-Phase Architecture Enables Continual Learning Without Parameter Updates

Steve-Evolving implements a three-phase framework that achieves self-evolution without updating model parameters:

Experience Anchoring solidifies each subgoal attempt into structured experience tuples with a fixed schema containing pre-state, action, diagnosis-result, and post-state. The system organizes experiences in a three-tier space with multi-dimensional indices including condition signatures, spatial hashing, and semantic tags, plus rolling summarization for efficient retrieval.

Experience Distillation generalizes successful trajectories into reusable skills with explicit preconditions and verification criteria. Failed attempts are distilled into executable guardrails that capture root causes and prevent risky operations at both subgoal and task levels.

Knowledge-Driven Closed-Loop Control injects retrieved skills and guardrails into the LLM planner. Diagnosis-triggered local replanning updates active constraints online, creating a continual evolution process.

Fine-Grained Diagnosis Signals Drive Effective Learning

The framework's key innovation is its fine-grained diagnosis system. Beyond binary success/failure signals, the execution layer provides compositional diagnosis including state-difference summaries, enumerated failure causes, continuous indicators, and stagnation/loop detection. This information density enables effective attribution and learning from failed attempts.

Experiments on the Minecraft MCU long-horizon suite demonstrate consistent improvements over static-retrieval baselines. The non-parametric approach allows the agent to continuously improve through experience without requiring model retraining, making it practical for open-world embodied AI applications.

Key Takeaways

Steve-Evolving published on arXiv March 13, 2026 as non-parametric self-evolution framework for Minecraft agents
Framework identifies experience organization rather than single-step planning as the key bottleneck in long-horizon tasks
Three-phase architecture includes experience anchoring, distillation, and knowledge-driven closed-loop control
Fine-grained diagnosis system provides compositional signals beyond binary success/failure outcomes
Experiments on Minecraft MCU show consistent improvements over static-retrieval baselines without parameter updates

Three-Phase Architecture Enables Continual Learning Without Parameter Updates

Steve-Evolving implements a three-phase framework that achieves self-evolution without updating model parameters:

Fine-Grained Diagnosis Signals Drive Effective Learning

Key Takeaways

Steve-Evolving published on arXiv March 13, 2026 as non-parametric self-evolution framework for Minecraft agents

Framework identifies experience organization rather than single-step planning as the key bottleneck in long-horizon tasks

Three-phase architecture includes experience anchoring, distillation, and knowledge-driven closed-loop control

Fine-grained diagnosis system provides compositional signals beyond binary success/failure outcomes

Experiments on Minecraft MCU show consistent improvements over static-retrieval baselines without parameter updates