Mistral Releases Leanstral: First Open-Source Agent for Formal Proof Engineering

Mistral AI released Leanstral on March 16, 2026, billing it as the first open-source code agent designed specifically for Lean 4, a proof assistant capable of expressing complex mathematical objects and software specifications. The model uses a highly sparse architecture with only 6B active parameters, which lets it deliver competitive performance at a fraction of the cost of larger alternatives.

Leanstral Achieves Competitive Performance at 92x Lower Cost

On the FLTEval benchmark, which tests realistic proof engineering scenarios rather than isolated problems, Leanstral achieved a pass@2 score of 26.3 at a cost of $36. That beats GLM5-744B (16.6) and edges out both Qwen3.5-397B at pass@4 (25.4) and Claude Sonnet at pass@2 (23.7, $549). Claude Opus 4.6 leads with 39.6 pass@2 but costs $1,650, about 46 times Leanstral's $36 pass@2 run; the 92x figure in the headline works out only against an $18 baseline, half the pass@2 cost. At pass@16, Leanstral reaches 31.9, a gain of 5.6 points over its pass@2 score.

The FLTEval benchmark represents a significant shift in evaluation methodology. According to Mistral, "Leanstral is benchmarked for completing all formal proofs and correctly defining new mathematical concepts in each PR to the FLT project, instead of isolated mathematical problems, to reflect usefulness in realistic proof engineering scenarios."

Combining Code Generation With Formal Verification

Leanstral's core innovation lies in merging code generation with formal verification. Mistral envisions "a more helpful generation of coding agents to both carry out their tasks and formally prove their implementations against strict specifications." Rather than requiring developers to debug machine-generated code, users specify desired outcomes and Leanstral produces formally verified implementations.
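
To make "formally prove their implementations against strict specifications" concrete, here is a minimal Lean 4 sketch of an implementation paired with a machine-checked specification. The names double and double_spec are hypothetical and chosen only for illustration; this is not output from Leanstral or code from Mistral's announcement.

```lean
-- Implementation: a hypothetical function an agent might be asked to write.
def double (n : Nat) : Nat := n + n

-- Specification: for every input, the result equals twice the input.
-- Lean accepts the file only if this proof type-checks.
theorem double_spec (n : Nat) : double n = 2 * n := by
  unfold double   -- expose the body, leaving the goal `n + n = 2 * n`
  omega           -- discharge the linear arithmetic goal
```

If the implementation were changed in a way that broke the property, the theorem would stop compiling, which is the kind of guarantee that replaces manual debugging in this workflow.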

The model also adapted when Lean 4.29.0-rc6 introduced breaking changes that caused unexplained compilation failures. Leanstral diagnosed that a 'def' was blocking pattern matching and needed to be replaced with 'abbrev', which keeps the definition transparent for definitional equality, and it solved the issue without explicit training on the new version. In another demonstration, it translated Rocq proof code to Lean, including custom notation, and proved properties about imperative programs.
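
The difference between 'def' and 'abbrev' can be sketched in a few lines of Lean 4. This is not the code from Mistral's demonstration, which is not reproduced here; it shows the same distinction through typeclass resolution rather than pattern matching, and the names MetersDef and MetersAbbrev are hypothetical.

```lean
-- A plain `def` is not unfolded during instance search,
-- so `MetersDef` does not inherit the numeric instances of `Nat`.
def MetersDef : Type := Nat

-- An `abbrev` is a reducible definition, so elaboration sees through it.
abbrev MetersAbbrev : Type := Nat

-- Accepted: instance search unfolds `MetersAbbrev` to `Nat`
-- and finds the numeral and addition instances.
example : MetersAbbrev := 2 + 3

-- Rejected if uncommented: no numeral instance is found for `MetersDef`.
-- example : MetersDef := 2 + 3

-- Both names are still definitionally equal to `Nat`,
-- so `rfl` proves each equation.
example : MetersAbbrev = Nat := rfl
example : MetersDef = Nat := rfl
```

Swapping 'def' for 'abbrev' therefore leaves the meaning of the definition unchanged and only affects how readily Lean unfolds it, which is why it can be a small, local fix for this class of failure.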

Technical Integration and Availability

Leanstral supports arbitrary Model Context Protocol (MCP) servers through Mistral Vibe and is specifically trained for maximum performance with lean-lsp-mcp. The model is available through three channels: Mistral Vibe, for zero-setup integration using the /leanstral command; the Labs API, where the 'labs-leanstral-2603' endpoint is free or near-free for a limited period; and self-hosted deployment using Apache 2.0 licensed weights.

The announcement gained significant attention on Hacker News, accumulating 277 points with 49 comments.

Key Takeaways

Leanstral is the first open-source code agent designed specifically for Lean 4 formal proof engineering, using only 6B active parameters
The model achieves 26.3 pass@2 on FLTEval for $36, compared to Claude Opus 4.6's 39.6 pass@2 at $1,650, roughly 46 times the cost at pass@2 (the 92x headline figure implies an $18 baseline)
Leanstral successfully diagnosed and fixed breaking changes in Lean 4.29.0-rc6 without explicit training on the new version
Available through Mistral Vibe, the Labs API, and self-hosted deployment of Apache 2.0 licensed weights
FLTEval benchmark tests realistic proof engineering scenarios rather than isolated mathematical problems
