Autoresearch Runs 100+ Experiments Per Night on Consumer Hardware
The 630-line codebase consists of three components: prepare.py (fixed utilities), train.py (a single editable file containing the model, optimizer, and training loop), and program.md (markdown instructions guiding the agent's research strategy). Each training run is capped at 5 minutes of wall-clock time regardless of hardware, enabling approximately 12 experiments per hour, or roughly 100 overnight, on a consumer GPU.
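The throughput figures can be checked directly. The 8-9 hour overnight window below is an assumption for illustration, not something the project specifies:

```python
MINUTES_PER_RUN = 5                        # fixed 5-minute run length
RUNS_PER_HOUR = 60 // MINUTES_PER_RUN      # 12 experiments per hour

for hours in (8, 9):                       # assumed overnight window
    print(hours, "hours ->", RUNS_PER_HOUR * hours, "experiments")
# 8 hours -> 96 experiments, 9 hours -> 108: hence "roughly 100 overnight"
```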
The system operates by having AI agents modify the training code, run a 5-minute experiment, evaluate the result using validation bits-per-byte (val_bpb), and decide whether to keep or discard the change before repeating the cycle. One community member reported running 83 experiments overnight and keeping 15 improvements, with validation loss falling with no human intervention.
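The keep/discard cycle amounts to a greedy hill climb on val_bpb. The sketch below illustrates the shape of that loop only; `run_training` is a made-up scoring function (the real system executes train.py for 5 minutes and reads val_bpb from its output), and the learning-rate mutation is a hypothetical example of an agent edit:

```python
import random

def run_training(config: dict) -> float:
    """Toy stand-in for one 5-minute training run.

    Returns a fake validation bits-per-byte (lower is better). Entirely
    invented for illustration -- the real loop runs train.py and parses
    its reported val_bpb.
    """
    random.seed(hash(frozenset(config.items())) & 0xFFFF)  # deterministic per config
    base = 1.20
    bonus = 0.05 if config["lr"] < 0.01 else 0.0  # pretend small LRs help a bit
    return base - bonus + random.uniform(-0.02, 0.02)

def research_loop(n_experiments: int) -> tuple[dict, float, int]:
    """Greedy keep/discard loop: mutate the best-known config, rerun,
    and keep the change only if val_bpb improves."""
    best = {"lr": 0.02}
    best_bpb = run_training(best)
    kept = 0
    for _ in range(n_experiments):
        candidate = dict(best)
        candidate["lr"] = best["lr"] * random.choice([0.5, 2.0])  # hypothetical agent edit
        bpb = run_training(candidate)
        if bpb < best_bpb:                # keep only changes that lower val_bpb
            best, best_bpb, kept = candidate, bpb, kept + 1
        # otherwise discard and mutate from the current best again
    return best, best_bpb, kept
```

With ~100 iterations overnight, only the mutations that measurably lower val_bpb survive, which matches the reported 15 kept out of 83 attempts.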
Built on Simplified Training Framework, Works on Any NVIDIA GPU
Autoresearch is based on nanochat, a simplified LLM training framework. Although it was developed and tested on H100 hardware, it runs on any NVIDIA GPU and requires only PyTorch and minimal dependencies, dramatically lowering the barrier to entry from multi-thousand-dollar compute clusters to consumer-grade hardware.
Karpathy, an OpenAI founding member and former Tesla AI head, designed the project to democratize AI research. The README opens with a fictional vision of "autonomous swarms of AI agents running across compute cluster megastructures in the skies"—prompting one user to note that "the fiction part: megastructures in the skies. the non-fiction part: everything else."
Community Creates Platform-Specific Forks for Broader Access
The community has already created five platform-specific forks, including autoresearch-mlx with 350 stars, which ports the system to Apple Silicon using MLX, enabling Mac users to run experiments without PyTorch. The project's MIT license encourages such adaptations.
A Hacker News discussion garnered 189 points and 51 comments, reflecting strong technical community engagement. Community members highlighted the shift from centralized, well-funded labs to decentralized research: "The best AI labs of 2027 might be just 2-3 people."
Key Takeaways
- Autoresearch enables AI agents to autonomously run ~100 ML experiments overnight on single consumer GPUs, with 5-minute training cycles allowing ~12 experiments per hour
- The 630-line open-source codebase gained 14,826 GitHub stars within 3 days of its March 6, 2026 release
- Created by Andrej Karpathy (OpenAI founding member, former Tesla AI head) to democratize AI research by lowering compute requirements from multi-thousand-dollar clusters to consumer hardware
- Community members reported successful autonomous research loops, with one user running 83 experiments overnight and keeping 15 improvements with zero human intervention
- Five platform-specific forks emerged including autoresearch-mlx (350 stars) enabling Apple Silicon users to run experiments without PyTorch