Salman Mohammadi released nanocode on GitHub, an open-source library showing the complete pipeline for training agentic coding models using Constitutional AI, the alignment technique Anthropic uses for Claude. The project demonstrates how to train a competitive coding model for $200 on Google TPUs, making sophisticated model training accessible to researchers and indie developers.
Complete Training Pipeline, Not Just a Pre-Trained Model
As the creator states, nanocode is "a library showing you how to train your own Claude Code end-to-end": not a pre-trained model, but a complete training framework covering the full pipeline from data preparation through evaluation. Built entirely in JAX and optimized for TPUs, the library provides transparency into training techniques that are typically proprietary.
Competitive Performance at $200 Training Cost
The nanocode-d24 model (1.3B parameters) costs $200 to train on a TPU v6e-8 over 9 hours, achieving a CORE score (general reasoning) of 0.227 and 50.9% better code tokenization efficiency than the comparable nanochat baseline. A smaller variant, nanocode-d20 (477M parameters), trains in 1.5 hours for just $34.
Constitutional AI Implementation in Three Stages
The project implements three training stages:
- Constitutional SFT: Synthetic data generation using generator-critique loops to align responses with a "SOUL" document defining desired behavior
- Agentic SFT: Training on approximately 120K examples from existing code instruction datasets, plus 2,000 complex multi-turn rollouts demonstrating tool use
- DPO (Direct Preference Optimization): Preference learning that improved accuracy from 45% to 88%
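The DPO stage optimizes a contrastive objective over preferred and rejected responses scored against a frozen reference model. As a minimal sketch of that objective (plain Python for illustration; nanocode's actual implementation is in JAX and the function below is hypothetical), the per-example loss looks like:

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Per-example DPO loss. Each argument is a summed token log-probability:
    the policy's and a frozen reference model's scores for the preferred
    ("chosen") and dispreferred ("rejected") responses."""
    # Implicit reward margin: how much more the policy prefers the chosen
    # response over the rejected one, relative to the reference model.
    margin = (logp_chosen - ref_chosen) - (logp_rejected - ref_rejected)
    # Logistic loss; minimizing it pushes the margin positive, i.e. the
    # policy learns to rank chosen responses above rejected ones.
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))
```

When the policy matches the reference model the margin is zero and the loss sits at log 2; widening the margin in favor of the chosen response drives the loss toward zero, which is the mechanism behind the reported accuracy gains.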
Custom Agentic Interface with Four Tools
The model has a 4096-token context length (double nanochat's 2048), with training data extended by code-specific examples from The Stack-V2. The agentic interface exposes four tools: Read files (with offset/limit parameters), Edit files (via old_string/new_string replacement), Grep for patterns, and Bash for Unix commands.
Free TPU Access Through Google TRC Program
Nanocode makes sophisticated AI model training accessible through free TPU credits via Google's TPU Research Cloud (TRC) program. The pure JAX implementation demonstrates practical advantages for production AI training, showing that competitive models can be trained outside of PyTorch ecosystems.
Community Reception and Educational Value
The project reached 154 points on Hacker News with 24 comments. By providing complete end-to-end code showing every step from data preparation through evaluation, nanocode enables researchers and indie developers to experiment with Constitutional AI methods that are typically proprietary, advancing understanding of alignment techniques.
Key Takeaways
- Nanocode demonstrates a complete Constitutional AI training pipeline for agentic coding models, not just a pre-trained model
- The 1.3B parameter model costs $200 to train on a TPU v6e-8, with a smaller 477M variant at $34
- Three-stage training (Constitutional SFT, Agentic SFT, DPO) improved accuracy from 45% to 88%
- Free TPU credits available through Google TRC program make advanced model training accessible
- Pure JAX implementation shows competitive models can be trained outside PyTorch ecosystems