IBM released Granite 4.1 on April 30, 2026, introducing three model sizes (3B, 8B, and 30B parameters) that use a decoder-only dense transformer architecture without mixture-of-experts routing. The 8B model matches or exceeds the performance of its 32B MoE predecessor while offering predictable latency and cost.
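To ground the architectural distinction, the sketch below contrasts a dense feed-forward block with a top-k routed mixture-of-experts block. This is a generic PyTorch illustration, not IBM's implementation; the dimensions, expert count, and routing scheme are placeholder assumptions.

```python
# Illustrative sketch (not IBM's code): dense FFN vs. MoE-routed FFN.
# All sizes below are made-up placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseFFN(nn.Module):
    """Every token passes through the same full feed-forward block,
    so per-token compute is constant and latency is predictable."""
    def __init__(self, d_model=512, d_ff=2048):
        super().__init__()
        self.up = nn.Linear(d_model, d_ff)
        self.down = nn.Linear(d_ff, d_model)

    def forward(self, x):
        return self.down(F.gelu(self.up(x)))

class MoEFFN(nn.Module):
    """Each token is routed to a small subset of experts: total parameters
    are large, but per-token compute depends on routing decisions."""
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList([DenseFFN(d_model, d_ff) for _ in range(n_experts)])
        self.router = nn.Linear(d_model, n_experts)
        self.top_k = top_k

    def forward(self, x):
        # x: (tokens, d_model); pick the top-k experts per token
        scores = self.router(x).softmax(dim=-1)
        weights, idx = scores.topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out
```

Because the dense block applies the same weights to every token, per-token compute is fixed, which is the source of the predictable latency and cost that the dense Granite 4.1 design trades on.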
8B Model Matches or Exceeds 32B MoE Across Key Benchmarks
The Granite 4.1 8B model delivers results comparable to, and in several cases better than, the previous-generation 32B MoE model at a quarter of the total parameter count:
- ArenaHard: 8B scores 69.0, surpassing the previous generation
- BFCL V3 tool calling: 8B reaches 68.3 versus 64.7 for the 32B MoE
- GSM8K math: 8B achieves 92.5; 30B reaches 94.2
- DeepMind-Math: 8B scores 80.1; 30B achieves 81.9
- EvalPlus coding: 8B reaches 80.2; 30B scores 82.7
Data Quality Pipeline Used LLM-as-Judge Evaluation
IBM implemented a rigorous data curation process before fine-tuning. The team used LLM-as-Judge evaluation across six dimensions, filtering down to 4.1 million curated samples while automatically rejecting hallucinations and false premises.
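To make the filtering step concrete, here is a minimal sketch of an LLM-as-Judge filter. The dimension names, prompt wording, and score threshold are illustrative assumptions; IBM has not published the actual six dimensions or this code. Only the idea of scoring each sample on six dimensions and automatically rejecting hallucinations and false premises comes from the description above.

```python
# Minimal LLM-as-Judge filtering sketch, assuming a judge model that returns JSON.
import json
from typing import Callable

# Hypothetical placeholder dimensions -- not IBM's published criteria.
DIMENSIONS = ["correctness", "helpfulness", "completeness",
              "instruction_following", "groundedness", "clarity"]

JUDGE_PROMPT = """Rate the assistant response on each dimension from 1-5 and flag problems.
Return JSON like:
{{"scores": {{"correctness": 5, ...}}, "hallucination": false, "false_premise": false}}

Prompt: {prompt}
Response: {response}"""

def keep_sample(sample: dict, judge: Callable[[str], str], min_score: float = 4.0) -> bool:
    """Return True if a (prompt, response) sample survives LLM-as-Judge filtering."""
    verdict = json.loads(judge(JUDGE_PROMPT.format(**sample)))
    if verdict.get("hallucination") or verdict.get("false_premise"):
        return False  # automatic rejection, regardless of scores
    scores = verdict["scores"]
    return all(scores.get(d, 0) >= min_score for d in DIMENSIONS)

def curate(samples: list[dict], judge: Callable[[str], str]) -> list[dict]:
    """Filter a raw pool down to the curated subset."""
    return [s for s in samples if keep_sample(s, judge)]
```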
Five-Phase Training Strategy With Sequential RL Stages
The training methodology comprised five distinct phases with evolving data mixtures, progressing from broad content in Phase 1 to specialized instruction data in Phases 3-4 and culminating in long-context training up to 512K tokens in Phase 5. IBM then applied four sequential reinforcement learning stages, including a dedicated math-recovery stage added after initial RLHF caused a regression on math benchmarks.
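One rough way to picture the schedule is the sketch below. The phase boundaries, the 512K-token target, and the math-recovery stage mirror the description above; the mixture labels, earlier context lengths, and the remaining RL stage names are illustrative placeholders, not IBM's published recipe.

```python
# Hypothetical training-schedule sketch: five SFT phases followed by four
# sequential RL stages. Only the phase count, the 512K context target, and the
# math-recovery stage come from IBM's description; the rest is placeholder.
PHASES = [
    {"phase": 1, "data": "broad general content",         "max_context": 8_192},
    {"phase": 2, "data": "broad + early instruction mix", "max_context": 8_192},
    {"phase": 3, "data": "specialized instruction data",  "max_context": 32_768},
    {"phase": 4, "data": "specialized instruction data",  "max_context": 131_072},
    {"phase": 5, "data": "long-context training",         "max_context": 524_288},  # 512K
]

RL_STAGES = [
    "initial RLHF (caused a regression on math benchmarks)",
    "dedicated math recovery",
    "further preference optimization (placeholder)",
    "final alignment pass (placeholder)",
]

def run_training(phases, rl_stages):
    """Sketch of the sequential flow: SFT phases first, then RL stages in order."""
    for p in phases:
        print(f"SFT phase {p['phase']}: {p['data']} (context up to {p['max_context']} tokens)")
    for i, stage in enumerate(rl_stages, start=1):
        print(f"RL stage {i}: {stage}")

run_training(PHASES, RL_STAGES)
```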
Apache 2.0 License Enables Commercial Deployment
Granite 4.1 is available under the Apache 2.0 license, permitting commercial use. The models are accessible through Hugging Face (ibm-granite), Ollama, vLLM, and Transformers, with FP8 quantized variants offered for production deployment.
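For teams evaluating local deployment, a minimal Transformers sketch follows. The exact checkpoint name under the ibm-granite Hugging Face organization is an assumption and should be checked against the model card.

```python
# Minimal sketch of loading a Granite 4.1 model with Hugging Face Transformers.
# The repo id below is a hypothetical example -- verify the exact name under
# the ibm-granite organization before use.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm-granite/granite-4.1-8b-instruct"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # the FP8 variants would use a quantized checkpoint instead
    device_map="auto",
)

messages = [{"role": "user", "content": "Summarize the Apache 2.0 license in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Serving through vLLM or Ollama would point the respective runtime at the same repository instead; the FP8 variants typically trade a small amount of accuracy for lower memory use and higher throughput.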
Key Takeaways
- IBM's Granite 4.1 8B model matches or exceeds the previous 32B MoE architecture across most benchmarks while using a simpler dense transformer design
- The 8B model scores 68.3 on BFCL V3 tool calling versus 64.7 for the 32B MoE, and achieves 92.5 on GSM8K math tasks
- IBM curated 4.1 million samples using LLM-as-Judge evaluation across six dimensions, automatically filtering hallucinations and false premises
- Training used five distinct phases culminating in 512K token long-context capability, plus four sequential RL stages including math recovery
- Models are released under the Apache 2.0 license and available on Hugging Face, Ollama, vLLM, and Transformers, with FP8 quantized variants