Google DeepMind announced on February 11, 2026, that its Aletheia agent has advanced beyond solving competition-level problems to functioning as a collaborative research partner for scientists tackling open problems in mathematics, physics, and computer science. The system achieved 91.9% on IMO-ProofBench Advanced while using less compute than previous approaches, and autonomously solved 4 open Erdős conjectures, problems that mathematicians had not previously resolved.
Aletheia Achieves Breakthrough Performance on Mathematical Benchmarks
The system posted strong results across several evaluations. Aletheia achieved up to 90% accuracy on advanced formal proof benchmarks and raised performance on certain benchmarks from roughly 65% to 95.1%, significantly outperforming previous state-of-the-art systems. Most notably, the agent scored 91.9% on IMO-ProofBench Advanced, surpassing the January 2026 Deep Think score while requiring fewer computational resources.
System Autonomously Solves Previously Unsolved Mathematical Conjectures
When deployed on 700 open Erdős conjectures, Aletheia produced 212 candidates and resolved 13 problems, either by solving them autonomously or by identifying existing solutions in the literature. The system also generated, without human intervention, a complete research paper on eigenweights that researchers consider to be of publishable quality. This represents a fundamental shift from AI systems that solve known problems to systems capable of discovering new mathematics and producing original research contributions at a professional level.
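To put those counts in proportion, the rates can be worked out directly. The sketch below uses only the three figures quoted above; the function and variable names are illustrative, and the last ratio assumes that all 13 resolved problems came from the 212 candidates, which the announcement does not state explicitly.

```python
# Counts quoted in the article.
TOTAL_PROBLEMS = 700   # open Erdős conjectures attempted
CANDIDATES = 212       # candidates produced
RESOLVED = 13          # problems resolved (autonomously or via literature)


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Share of attempted problems for which a candidate was produced.
candidate_rate = percent(CANDIDATES, TOTAL_PROBLEMS)

# Share of attempted problems that ended up resolved.
resolution_rate = percent(RESOLVED, TOTAL_PROBLEMS)

# Share of candidates that led to a resolution (assumption: every
# resolved problem is counted among the 212 candidates).
candidate_yield = percent(RESOLVED, CANDIDATES)

print(f"Candidates per problem attempted: {candidate_rate}%")
print(f"Problems resolved:                {resolution_rate}%")
print(f"Resolutions per candidate:        {candidate_yield}%")
```

On these figures, roughly three in ten problems yielded a candidate, and just under 2% of the 700 were resolved.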
Aletheia Moves Beyond Competition Mathematics to Real Scientific Research
Unlike typical AI models designed to solve puzzles or competition problems, Aletheia functions as a genuine research tool. The system is built on an advanced reasoning version of Gemini Deep Think and represents Google's strategic push into autonomous scientific research capabilities. The agent tackles PhD-level problems with scalable approaches, demonstrating that AI can now contribute meaningfully to active areas of mathematical and scientific inquiry rather than merely reproducing existing knowledge.
Key Takeaways
- Google DeepMind's Aletheia achieved 91.9% on IMO-ProofBench Advanced and up to 90% accuracy on advanced formal proof benchmarks
- The system autonomously solved 4 open Erdős conjectures and, across 700 tested problems, resolved 13 through either autonomous solutions or literature identification
- Aletheia generated, without human intervention, a research paper on eigenweights that researchers consider to be of publishable quality
- Performance improved from roughly 65% to 95.1% on certain benchmarks, significantly outperforming previous state-of-the-art systems
- The system represents a shift from AI solving known problems to AI discovering new mathematics and contributing original research