MLEvolve Tops MLE-Bench: Self-Evolving Framework Achieves SOTA in ML Algorithm Discovery

On June 4, 2026, researchers from InternScience introduced MLEvolve, a self-evolving multi-agent framework for automated machine learning algorithm discovery. As of February 14, 2026, the system held the top ranking on MLE-Bench under a 12-hour budget, which is half the standard runtime, and it also outperformed specialized algorithm discovery methods across multiple domains.

Progressive MCGS Enables Cross-Branch Learning

MLEvolve extends traditional tree search to Progressive Monte Carlo Graph Search (MCGS), which adds graph-based reference edges so that information can flow between branches. An entropy-inspired progressive schedule gradually shifts the search from broad exploration to focused exploitation. Together, these address a key limitation of existing machine learning engineering agents: branches are isolated from one another, so the agent cannot learn from parallel exploration paths.
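The two ideas in this section can be illustrated with a minimal sketch. The article does not give MLEvolve's actual schedule or data structures, so everything here is an assumption made for illustration: the `Node` class, the `references` field standing in for reference edges, and a simple linear decay of a UCT-style exploration constant used in place of the entropy-inspired schedule.

```python
import math
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare nodes by identity so cyclic references are safe
class Node:
    name: str
    score: float = 0.0   # mean validation score of this candidate (higher is better)
    visits: int = 0
    children: list = field(default_factory=list)
    references: list = field(default_factory=list)  # hypothetical cross-branch edges


def exploration_weight(step, total_steps, c_start=1.4, c_end=0.1):
    """Linearly decay the exploration constant (an assumed stand-in schedule)."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return c_start + (c_end - c_start) * frac


def uct(node, parent_visits, c):
    """Standard UCT score: exploit the mean score, explore rarely visited nodes."""
    if node.visits == 0:
        return math.inf
    return node.score + c * math.sqrt(math.log(max(parent_visits, 1)) / node.visits)


def select_child(parent, step, total_steps):
    """Pick the child with the best UCT score under the current schedule."""
    c = exploration_weight(step, total_steps)
    return max(parent.children, key=lambda child: uct(child, parent.visits, c))


def add_reference(node, other):
    """Link `node` to a node in another branch so its ideas can be reused."""
    if other is not node and other not in node.references:
        node.references.append(other)


# Usage: the same tree yields different choices early and late in the search.
root = Node("root", visits=10)
strong = Node("strong", score=0.6, visits=8)
rare = Node("rare", score=0.5, visits=2)
root.children = [strong, rare]
early_choice = select_child(root, step=0, total_steps=100)
late_choice = select_child(root, step=100, total_steps=100)
add_reference(rare, strong)
```

With these numbers, the early step favors the rarely visited child and the final step favors the higher-scoring one, which is the exploration-to-exploitation shift the section describes.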

Retrospective Memory Accumulates Task-Specific Experience

The framework introduces Retrospective Memory, combining a cold-start domain knowledge base with dynamic global memory for task-specific experience retrieval and reuse. Unlike memoryless search approaches that start fresh with each task, Retrospective Memory allows MLEvolve to accumulate heuristics and domain-specific patterns across multiple algorithm discovery attempts. This accumulated experience enables the agent's performance to improve as it encounters more tasks.
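One way to picture the two memory tiers is the sketch below. It is not MLEvolve's implementation: the `MemoryEntry` and `RetrospectiveMemory` classes, the tag-overlap retrieval rule, and the example notes are all hypothetical, chosen only to show a fixed knowledge base and a growing experience store being queried together.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryEntry:
    tags: frozenset      # keywords describing when the note applies
    note: str            # the stored heuristic or observation
    score: float = 0.0   # how well the idea worked when it was recorded


@dataclass
class RetrospectiveMemory:
    cold_start: list                              # fixed domain knowledge, loaded up front
    dynamic: list = field(default_factory=list)   # experience recorded during search

    def record(self, tags, note, score):
        """Store a new piece of experience in the dynamic memory."""
        self.dynamic.append(MemoryEntry(frozenset(tags), note, score))

    def retrieve(self, query_tags, k=3):
        """Return up to k entries sharing tags with the query, best match first."""
        query = frozenset(query_tags)
        matches = [e for e in self.cold_start + self.dynamic if e.tags & query]
        matches.sort(key=lambda e: (len(e.tags & query), e.score), reverse=True)
        return matches[:k]


# Usage: a cold-start hint plus one lesson learned during search.
memory = RetrospectiveMemory(
    cold_start=[MemoryEntry(frozenset({"tabular"}), "try gradient boosting first")]
)
memory.record({"tabular", "imbalanced"}, "class weights improved recall", 0.8)
hits = memory.retrieve({"tabular", "imbalanced"})
```

Ranking by tag overlap is a deliberately simple choice; an embedding-based retriever would fit the same interface.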

Decoupled Planning and Coding Stabilizes Long-Horizon Iteration

For stable long-horizon iteration, MLEvolve decouples strategic planning from code generation with adaptive coding modes. This hierarchical control structure separates high-level algorithmic strategy from implementation details, reducing the brittleness that often affects end-to-end code generation in complex algorithm design tasks. The decoupling enables the system to maintain coherent long-term strategy while adapting implementation tactics based on intermediate results.
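The separation can be sketched as two stages joined by a small mode selector. The article does not list MLEvolve's coding modes or how they are chosen, so the mode names ("debug", "rewrite", "patch"), the `changed_fraction` field, and the threshold below are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    goal: str                 # high-level strategy chosen by the planner
    changed_fraction: float   # planner's estimate of how much code must change (0 to 1)


def choose_coding_mode(plan, last_run_failed, rewrite_threshold=0.5):
    """Pick a hypothetical coding mode from the plan and the last run's outcome."""
    if last_run_failed:
        return "debug"      # repair the existing code before changing strategy
    if plan.changed_fraction >= rewrite_threshold:
        return "rewrite"    # large change: regenerate the solution
    return "patch"          # small change: edit the existing solution in place


def iterate(planner, coder, state):
    """One iteration: plan first, then hand the plan and mode to the coder."""
    plan = planner(state)
    mode = choose_coding_mode(plan, state.get("last_run_failed", False))
    return coder(plan, mode)


# Usage with stub callables standing in for the planning and coding agents.
result = iterate(
    planner=lambda state: Plan("add target encoding", 0.2),
    coder=lambda plan, mode: (plan.goal, mode),
    state={"last_run_failed": False},
)
```

Because the coder only receives a plan and a mode, the planner can keep a stable strategy across iterations while the implementation tactic changes underneath it.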

Cross-Domain Generalization Beyond ML Benchmarks

MLEvolve demonstrated strong cross-domain generalization, outperforming AlphaEvolve on mathematical algorithm optimization tasks despite being designed primarily for machine learning engineering challenges. On MLE-Bench itself, the system achieved state-of-the-art results on multiple metrics, including average medal rate and valid submission rate. This suggests the approach could extend beyond machine learning to other algorithmic domains that require iterative refinement and discovery.

Key Takeaways

MLEvolve achieved the top ranking on MLE-Bench under a 12-hour budget as of February 14, 2026, using half the standard runtime
Progressive MCGS introduces graph-based reference edges for cross-branch information flow and entropy-inspired exploration-exploitation scheduling
Retrospective Memory combines cold-start domain knowledge with dynamic global memory, enabling experience accumulation across tasks
The framework decouples strategic planning from code generation with adaptive coding modes for stable long-horizon algorithm discovery
MLEvolve demonstrated cross-domain generalization by outperforming AlphaEvolve on mathematical algorithm optimization tasks
