Andrej Karpathy's Autoresearch Enables AI Agents to Improve Their Own Training Code

Andrej Karpathy released autoresearch on March 7, 2026, a framework that allows AI agents to autonomously improve machine learning training code. The system strips down LLM training to a single-file, 630-line Python codebase where humans write high-level strategy prompts while AI agents iteratively modify the training code based on performance metrics.

How Autoresearch Works

The framework operates on a simple division of labor: humans maintain a program.md file with research strategy, while an AI agent modifies train.py based on validation loss results. The system runs on a single GPU and can perform hundreds of experiments overnight, proposing changes and keeping improvements that reduce validation loss.

Karpathy's implementation includes:

Single-file architecture requiring only one GPU
Autonomous experiment execution with automatic metric tracking
Systematic testing of architectural changes, optimization tweaks, and hyperparameter adjustments
Ability to run continuously for days without human intervention
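
The propose, evaluate, keep cycle described above can be illustrated with a minimal sketch. This is not code from the autoresearch repository: the names run_search, propose, evaluate, toy_evaluate, and make_toy_proposer are hypothetical, and the toy loss function stands in for a real training run. In the actual system the agent edits train.py and a training run produces the validation loss; here a small config dictionary and a closed-form function play those roles so the loop can run anywhere.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ExperimentLog:
    """Bookkeeping for one search session (hypothetical structure)."""
    best_loss: float
    kept: list = field(default_factory=list)
    tried: int = 0


def run_search(propose, evaluate, baseline_config, num_experiments):
    """Greedy loop: try a change, keep it only if validation loss drops."""
    best_config = dict(baseline_config)  # copy so the caller's dict is untouched
    log = ExperimentLog(best_loss=evaluate(best_config))
    for _ in range(num_experiments):
        candidate = propose(best_config)  # stand-in for the agent's edit
        loss = evaluate(candidate)        # stand-in for a training run
        log.tried += 1
        if loss < log.best_loss:
            best_config = candidate
            log.best_loss = loss
            log.kept.append(candidate)
    return best_config, log


def toy_evaluate(config):
    """Assumed toy 'validation loss': a bowl with its minimum at lr=0.003, warmup=200."""
    return 1.0 + 1e4 * (config["lr"] - 0.003) ** 2 + 1e-5 * (config["warmup"] - 200) ** 2


def make_toy_proposer(seed):
    """Return a proposer that randomly rescales one setting of the current best config."""
    rng = random.Random(seed)

    def propose(config):
        new_config = dict(config)
        key = rng.choice(sorted(new_config))
        new_config[key] = new_config[key] * rng.uniform(0.5, 1.5)
        return new_config

    return propose


if __name__ == "__main__":
    baseline = {"lr": 0.01, "warmup": 50.0}
    best, log = run_search(make_toy_proposer(0), toy_evaluate, baseline, 200)
    print(f"tried={log.tried} kept={len(log.kept)} best_loss={log.best_loss:.4f}")
```

The design choice worth noting is that each proposal starts from the current best configuration, so accepted changes stack on top of one another. A random proposer is used here only to keep the sketch self-contained; the point of autoresearch is that an AI agent, guided by program.md, chooses the next change instead.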

Real-World Results Demonstrate Scalability

In his own testing, Karpathy left autoresearch running for approximately two days on a depth=12 model. The system discovered around 20 changes that improved validation loss. When tested on March 9, 2026, all 20 improvements proved additive and transferred successfully to larger depth=24 models, suggesting that discoveries made on smaller models can carry over to larger ones.

Shopify CEO Tobi Lütke confirmed running autoresearch internally with positive results, indicating early enterprise adoption.

Community Adoption and Extensions

The GitHub repository gained 26,237 stars within six days of its March 6, 2026 creation. The community quickly developed extensions, including autoresearch-mlx with 537 stars, which enables autonomous research on Apple Silicon Macs.

One user reported running autoresearch for over 11 hours with an AI agent positioned as "chief scientist of an AI lab with 8 GPUs," completing 568 parallel experiments with the agent autonomously deciding next steps.

Vision for Collaborative AI Research

Karpathy outlined his next-step vision on March 8, 2026: making autoresearch "asynchronously massively collaborative for agents" in a SETI@home-style distributed system. The goal shifts from emulating a single PhD student to emulating an entire research community, suggesting a future where AI agents collectively advance machine learning research.

Key Takeaways

Autoresearch is a 630-line Python framework enabling AI agents to autonomously improve ML training code on single GPUs
Karpathy's testing found around 20 improvements over two days, all of which transferred successfully to larger models
The GitHub repository gained 26,237 stars in six days, with community ports already enabling Mac-based research
Shopify CEO Tobi Lütke confirmed running autoresearch internally with positive results, an early sign of enterprise adoption
Karpathy envisions scaling to a distributed network of collaborative AI research agents
