Researchers from the LM-Provers team released QED-Nano, a 4B-parameter model designed for Olympiad-level mathematical proofs that surpasses much larger open models and approaches the performance of proprietary systems. The paper, published on arXiv on April 6, 2026, addresses a critical gap: while proprietary AI systems have demonstrated gold-level IMO 2025 performance, their training pipelines remain undisclosed and expensive to reproduce.
Three-Stage Training Recipe
QED-Nano employs a three-stage training process. First, supervised fine-tuning instills proof-writing style by distilling from DeepSeek-Math-V2. Second, reinforcement learning with rubric-based rewards refines proof generation. Third, an expanded RL stage with a reasoning cache decomposes long proofs into iterative summarize-and-refine cycles, enabling stronger test-time reasoning without proportional memory costs.
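The summarize-and-refine idea in the third stage can be illustrated with a minimal sketch. Everything below is an assumption for illustration: the function names (`generate_step`, `summarize`, `prove`), the string-based cache, and the truncation-based summarizer are hypothetical stand-ins, not the paper's actual implementation.

```python
# Hypothetical sketch of a summarize-and-refine loop with a bounded
# "reasoning cache". All names and structure here are illustrative
# assumptions, not the paper's real pipeline.

def generate_step(cache_summary: str, step: int) -> str:
    """Stand-in for the model producing the next proof fragment,
    conditioned only on the compact cache summary, not the full history."""
    return f"lemma_{step} derived from [{cache_summary}]"

def summarize(cache_summary: str, fragment: str, max_len: int = 120) -> str:
    """Stand-in for compressing old context plus the new fragment into a
    fixed-size summary, so memory does not grow with proof length."""
    merged = (cache_summary + " | " + fragment).strip(" |")
    return merged[-max_len:]  # keep only the most recent max_len characters

def prove(num_steps: int = 5) -> tuple[str, str]:
    cache = ""       # compact reasoning cache (bounded size)
    transcript = []  # full proof, streamed out rather than kept in context
    for step in range(num_steps):
        fragment = generate_step(cache, step)
        transcript.append(fragment)
        cache = summarize(cache, fragment)
    return "\n".join(transcript), cache

proof, final_cache = prove()
assert len(final_cache) <= 120  # cache stays bounded regardless of proof length
```

The key property the sketch captures is that per-step context is a fixed-size summary rather than the full proof so far, which is what decouples test-time reasoning depth from memory cost.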
Performance and Efficiency Gains
The model achieves 57% on IMO-ProofBench, surpassing the proof-generation performance of much larger open models, including Nomos-1 and GPT-OSS-120B. QED-Nano approaches the performance of proprietary models like Gemini 3 Pro while requiring a fraction of the inference cost. The reasoning cache enables test-time scaling without proportional memory increases, a key advantage for deployment.
Complete Open Release
The team open-sourced the full QED-Nano and QED-Nano-SFT models, along with FineProofs-SFT and FineProofs-RL datasets. Complete training and evaluation code is available to support research on open mathematical reasoning. Authors include Yuxiao Qu, Amrith Setlur, Jasper Dekoninck, Edward Beeching, Jia Li, Ian Wu, Lewis Tunstall, and Aviral Kumar.
The release provides the first fully open pipeline for training small models to competitive levels on proof-based problems, following recent demonstrations of proprietary systems achieving gold-level IMO performance. The work shows that efficient smaller models can match or exceed larger systems on specialized mathematical reasoning tasks.
Key Takeaways
- QED-Nano achieves 57% on IMO-ProofBench with only 4B parameters, surpassing larger open models
- The three-stage training process combines supervised fine-tuning, rubric-based RL, and reasoning cache expansion
- Performance approaches proprietary models like Gemini 3 Pro at a fraction of inference cost
- Full models, datasets, and training code are open-sourced to support mathematical reasoning research
- Reasoning cache innovation enables test-time scaling without proportional memory increases