Researchers from Princeton and CMU have developed Goedel-Architect, an agentic framework for formal theorem proving in Lean 4 that reaches 100% on the MiniF2F-test benchmark when seeded with natural language proofs. Instead of traditional recursive decomposition, the system generates a blueprint (a dependency graph of definitions and lemmas) and reports state-of-the-art performance at up to 500x lower cost than comparable open-source pipelines.
Blueprint-Based Architecture Outperforms Recursive Approaches
Goedel-Architect centers on generating and refining blueprints, which are dependency graphs that build up to the main theorem through formally stated definitions and lemmas. The system first creates a blueprint with declared dependencies, optionally guided by natural language proofs. A tool-equipped Lean prover component then closes each open lemma node in parallel using relevant dependencies, with failed lemmas driving refinement of the global blueprint.
This approach contrasts with mainstream methods using recursive lemma decomposition, which can inefficiently loop on dead-end strategies. By planning the proof structure upfront, Goedel-Architect avoids wasted computation on unproductive paths.
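The blueprint-and-refine loop described in this section can be pictured with a small Python sketch. It is an illustration under stated assumptions, not the authors' implementation: `Node`, `toy_prover`, `refine`, and `run` are hypothetical names, the prover is a stub that simply fails on statements containing the word "hard", and refinement just rewrites the failed statement. The real system calls a tool-equipped Lean prover and revises the whole graph.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class Node:
    """One blueprint entry: a formally stated definition or lemma."""
    name: str
    statement: str
    deps: list[str] = field(default_factory=list)  # names this node relies on
    proved: bool = False


def check_acyclic(blueprint: dict[str, Node]) -> list[str]:
    """Return a dependency-respecting order, or raise if the graph has a cycle."""
    order: list[str] = []
    state: dict[str, int] = {}  # 1 = currently visiting, 2 = finished

    def visit(name: str) -> None:
        if state.get(name) == 2:
            return
        if state.get(name) == 1:
            raise ValueError(f"cyclic dependency at {name}")
        state[name] = 1
        for dep in blueprint[name].deps:
            visit(dep)
        state[name] = 2
        order.append(name)

    for name in blueprint:
        visit(name)
    return order


def toy_prover(node: Node, dep_statements: list[str]) -> bool:
    """Hypothetical stand-in for the Lean prover: fails on 'hard' statements."""
    return "hard" not in node.statement


def prove_round(blueprint: dict[str, Node]) -> list[str]:
    """Attempt every open node in parallel; return the names that failed."""
    open_nodes = [n for n in blueprint.values() if not n.proved]

    def attempt(node: Node) -> tuple[str, bool]:
        # Each node is attempted with its dependencies' statements as given.
        deps = [blueprint[d].statement for d in node.deps]
        return node.name, toy_prover(node, deps)

    with ThreadPoolExecutor() as pool:
        results = list(pool.map(attempt, open_nodes))
    failed = []
    for name, ok in results:
        blueprint[name].proved = ok
        if not ok:
            failed.append(name)
    return failed


def refine(blueprint: dict[str, Node], failed: list[str]) -> None:
    """Hypothetical refinement: restate each failed node in an easier form."""
    for name in failed:
        node = blueprint[name]
        node.statement = node.statement.replace("hard", "split")


def run(blueprint: dict[str, Node], max_rounds: int = 3) -> bool:
    """Alternate parallel proving and refinement until every node is closed."""
    for _ in range(max_rounds):
        check_acyclic(blueprint)
        failed = prove_round(blueprint)
        if not failed:
            return True
        refine(blueprint, failed)
    return False


def example_blueprint() -> dict[str, Node]:
    return {
        "def_f": Node("def_f", "definition of f"),
        "lemma_a": Node("lemma_a", "f is monotone", deps=["def_f"]),
        "lemma_b": Node("lemma_b", "hard bound on f", deps=["def_f"]),
        "main": Node("main", "main theorem", deps=["lemma_a", "lemma_b"]),
    }


print(run(example_blueprint()))  # prints True
```

In this toy run, the first round closes every node except `lemma_b`; the failure triggers one refinement, and the second round only re-attempts the node that is still open. That is the contrast with recursive decomposition that the article draws: failures feed back into a single global plan rather than spawning deeper sub-calls.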
State-of-the-Art Results Across Multiple Benchmarks
Using the open-weight DeepSeek-V4-Flash (284B-A13B) as its backbone model, Goedel-Architect reports the following results:
- 99.2% pass@1 on MiniF2F-test without natural language guidance
- 100% on MiniF2F-test once the harder problems are seeded with natural language proofs, which closes the remaining two
- 75.6% pass@1 on PutnamBench (rising to 88.8%, or 597/672, with proof seeding)
- 4/6 problems solved on IMO 2025
- 11/12 problems solved on Putnam 2025
- 3/6 problems solved on USAMO 2026
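As a quick sanity check, the percentages above can be converted to problem counts. The sketch below assumes the MiniF2F test split contains 244 problems, a figure the article does not state; the PutnamBench size of 672 comes from the 597/672 figure. `solved_count` is a hypothetical helper, and the resulting counts are inferences rather than numbers reported by the authors.

```python
def solved_count(rate_percent: float, total: int) -> int:
    """Nearest whole number of problems matching a reported percentage."""
    return round(rate_percent / 100 * total)


MINIF2F_TEST_SIZE = 244  # assumption: not stated in the article
PUTNAMBENCH_SIZE = 672   # taken from the article's 597/672 figure

print(solved_count(99.2, MINIF2F_TEST_SIZE))  # 242, leaving two problems open
print(solved_count(75.6, PUTNAMBENCH_SIZE))   # 508
print(round(597 / 672 * 100, 1))              # 88.8
```

Under that assumption, 99.2% pass@1 corresponds to 242 of 244 problems, which agrees with the statement that seeding closed the remaining two.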
The research team includes Jui-Hui Chung, Ziyang Cai, and Sanjeev Arora among other authors. The arXiv paper (2606.06468) was published June 4, 2026.
Cost-Effective Open-Source Solution
Goedel-Architect achieves these results at up to 500x lower cost than comparable open-source pipelines, making advanced theorem proving more accessible to researchers and institutions. The article attributes this efficiency to parallel proof execution and blueprint refinement, both of which reduce redundant computation.
Key Takeaways
- With natural language proof seeding, Goedel-Architect reaches 100% on MiniF2F-test and 88.8% on PutnamBench using blueprint-based theorem proving in Lean 4
- The system uses dependency graphs rather than recursive decomposition, avoiding inefficient loops on dead-end strategies
- Built on DeepSeek-V4-Flash (284B-A13B), the framework operates at up to 500x lower cost than comparable open-source solutions
- The system solved 4/6 IMO 2025 problems, 11/12 Putnam 2025 problems, and 3/6 USAMO 2026 problems
- Researchers from Princeton and CMU published the work on arXiv (2606.06468) on June 4, 2026