Google DeepMind's Aletheia AI Agent Solves Open Mathematical Research Problems

Google DeepMind announced on February 11, 2026, that its Aletheia agent has advanced beyond solving competition-level problems to functioning as a collaborative research partner for scientists tackling open problems in mathematics, physics, and computer science. The system achieved 91.9% on IMO-ProofBench Advanced while using less compute than previous approaches, and it autonomously solved 4 open Erdős conjectures, problems that had previously stumped human mathematicians.

Aletheia Achieves Breakthrough Performance on Mathematical Benchmarks

The system posted strong results across multiple evaluation metrics. Aletheia achieved up to 90% accuracy on advanced formal proof benchmarks and raised performance from roughly 65% to 95.1% on certain benchmarks, significantly outperforming previous state-of-the-art systems. Most notably, the agent scored 91.9% on IMO-ProofBench Advanced, surpassing the January 2026 Deep Think score while requiring fewer computational resources.

System Autonomously Solves Previously Unsolved Mathematical Conjectures

When deployed on 700 open Erdős conjectures, Aletheia produced 212 candidates and resolved 13 problems, either by solving them autonomously or by identifying existing results in the literature. The system also generated a complete research paper on eigenweights without human intervention, which researchers consider to be of publishable quality. This represents a fundamental shift from AI systems that solve known problems to those capable of discovering new mathematics and creating original research contributions at a professional level.
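To put those counts in proportion, the short calculation below converts them into rates. It is only an arithmetic sketch: the variable names are illustrative, and it assumes the 13 resolved problems were drawn from the 212 candidates, which the announcement as summarized here does not state explicitly.

```python
# Counts as reported in the article; variable names are illustrative only.
problems_tested = 700   # open Erdős conjectures the agent was run on
candidates = 212        # candidates the agent produced
resolved = 13           # problems counted as resolved


def pct(part, whole):
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Share of tested problems for which a candidate was produced.
candidate_rate = pct(candidates, problems_tested)

# Share of candidates that ended up resolved.
# ASSUMPTION: the resolved problems are a subset of the candidates.
resolved_of_candidates = pct(resolved, candidates)

# Share of all tested problems that ended up resolved.
resolved_overall = pct(resolved, problems_tested)

print(f"candidates per problem tested: {candidate_rate}%")
print(f"resolved per candidate:        {resolved_of_candidates}%")
print(f"resolved per problem tested:   {resolved_overall}%")
```

Read this way, roughly three in ten problems yielded a candidate, and about one in sixteen candidates survived as a resolution, or just under 2% of the problems tested.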

Aletheia Moves Beyond Competition Mathematics to Real Scientific Research

Unlike typical AI models designed to solve puzzles or competition problems, Aletheia functions as a genuine research tool. The system is built on an advanced reasoning version of Gemini Deep Think and represents Google's strategic push into autonomous scientific research capabilities. The agent tackles PhD-level problems with scalable approaches, demonstrating that AI can now contribute meaningfully to active areas of mathematical and scientific inquiry rather than merely reproducing existing knowledge.

Key Takeaways

Google DeepMind's Aletheia achieved 91.9% on IMO-ProofBench Advanced and up to 90% accuracy on advanced formal proof benchmarks
The system autonomously solved 4 open Erdős conjectures and resolved 13 out of 700 tested problems
Aletheia generated a publishable research paper on eigenweights without human intervention
Performance improved from roughly 65% to 95.1% on certain benchmarks, outperforming previous state-of-the-art by significant margins
The system represents a shift from AI solving known problems to AI discovering new mathematics and contributing original research
