Researchers from the LM-Provers team released QED-Nano, a 4B-parameter model designed for Olympiad-level mathematical proofs that surpasses much larger open models and approaches the performance of proprietary systems. The paper, published on arXiv on April 6, 2026, addresses a critical gap: while proprietary AI systems have demonstrated gold-level IMO 2025 performance, their training pipelines remain undisclosed and expensive to reproduce.
Three-Stage Training Recipe
QED-Nano employs a three-stage training process. First, supervised fine-tuning instills proof-writing style by distilling from DeepSeek-Math-V2. Second, reinforcement learning with rubric-based rewards refines proof generation. Third, an expanded RL stage with a reasoning cache decomposes long proofs into iterative summarize-and-refine cycles, enabling stronger test-time reasoning without proportional memory costs.
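The summarize-and-refine idea in the third stage can be illustrated with a minimal sketch. Everything below is an assumption for illustration: the function names (`generate_step`, `summarize`, `prove`), the string-based cache, and the truncation-based summarizer are hypothetical stand-ins, not the paper's actual implementation.

```python
# Hypothetical sketch of a summarize-and-refine loop with a bounded
# "reasoning cache". All names and structure here are illustrative
# assumptions, not the paper's real pipeline.

def generate_step(cache_summary: str, step: int) -> str:
    """Stand-in for the model producing the next proof fragment,
    conditioned only on the compact cache summary, not the full history."""
    return f"lemma_{step} derived from [{cache_summary}]"

def summarize(cache_summary: str, fragment: str, max_len: int = 120) -> str:
    """Stand-in for compressing old context plus the new fragment into a
    fixed-size summary, so memory does not grow with proof length."""
    merged = (cache_summary + " | " + fragment).strip(" |")
    return merged[-max_len:]  # keep only the most recent max_len characters

def prove(num_steps: int = 5) -> tuple[str, str]:
    cache = ""       # compact reasoning cache (bounded size)
    transcript = []  # full proof, streamed out rather than kept in context
    for step in range(num_steps):
        fragment = generate_step(cache, step)
        transcript.append(fragment)
        cache = summarize(cache, fragment)
    return "\n".join(transcript), cache

proof, final_cache = prove()
assert len(final_cache) <= 120  # cache stays bounded regardless of proof length
```

The key property the sketch captures is that per-step context is a fixed-size summary rather than the full proof so far, which is what decouples test-time reasoning depth from memory cost.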
Performance and Efficiency Gains
The model achieves 57% on IMO-ProofBench, surpassing the proof-generation performance of much larger open models, including Nomos-1 and GPT-OSS-120B. QED-Nano approaches the performance of proprietary models like Gemini 3 Pro while requiring a fraction of the inference cost. The reasoning cache enables test-time scaling without proportional memory increases, a key advantage for deployment.
Complete Open Release
The team open-sourced the full QED-Nano and QED-Nano-SFT models, along with FineProofs-SFT and FineProofs-RL datasets. Complete training and evaluation code is available to support research on open mathematical reasoning. Authors include Yuxiao Qu, Amrith Setlur, Jasper Dekoninck, Edward Beeching, Jia Li, Ian Wu, Lewis Tunstall, and Aviral Kumar.
The release provides the first fully open pipeline for training small models to competitive levels on proof-based problems, following recent demonstrations of proprietary systems achieving gold-level IMO performance. The work shows that efficient smaller models can match or exceed larger systems on specialized mathematical reasoning tasks.
Key Takeaways
- QED-Nano achieves 57% on IMO-ProofBench with only 4B parameters, surpassing larger open models
- The three-stage training process combines supervised fine-tuning, rubric-based RL, and reasoning cache expansion
- Performance approaches proprietary models like Gemini 3 Pro at a fraction of inference cost
- Full models, datasets, and training code are open-sourced to support mathematical reasoning research
- Reasoning cache innovation enables test-time scaling without proportional memory increases