IBM released Granite 4.1 on April 30, 2026, introducing three model sizes (3B, 8B, and 30B parameters) that use a decoder-only dense transformer architecture without mixture-of-experts routing. The 8B model matches or exceeds the performance of its 32B MoE predecessor while offering predictable latency and cost.
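To ground the architectural distinction, the sketch below contrasts a dense feed-forward block with a top-k routed mixture-of-experts block. This is a generic PyTorch illustration, not IBM's implementation; the dimensions, expert count, and routing scheme are placeholder assumptions.

```python
# Illustrative sketch (not IBM's code): dense FFN vs. MoE-routed FFN.
# All sizes below are made-up placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseFFN(nn.Module):
    """Every token passes through the same full feed-forward block,
    so per-token compute is constant and latency is predictable."""
    def __init__(self, d_model=512, d_ff=2048):
        super().__init__()
        self.up = nn.Linear(d_model, d_ff)
        self.down = nn.Linear(d_ff, d_model)

    def forward(self, x):
        return self.down(F.gelu(self.up(x)))

class MoEFFN(nn.Module):
    """Each token is routed to a small subset of experts: total parameters
    are large, but per-token compute depends on routing decisions."""
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList([DenseFFN(d_model, d_ff) for _ in range(n_experts)])
        self.router = nn.Linear(d_model, n_experts)
        self.top_k = top_k

    def forward(self, x):
        # x: (tokens, d_model); pick the top-k experts per token
        scores = self.router(x).softmax(dim=-1)
        weights, idx = scores.topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out
```

Because the dense block applies the same weights to every token, per-token compute is fixed, which is the source of the predictable latency and cost that the dense Granite 4.1 design trades on.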
8B Model Matches or Exceeds 32B MoE Across Key Benchmarks
The Granite 4.1 8B model delivers results comparable to, and in several cases better than, the previous-generation 32B MoE model at a quarter of the total parameter count:
- ArenaHard: 8B scores 69.0, surpassing the previous generation
- BFCL V3 tool calling: 8B reaches 68.3 versus 64.7 for the 32B MoE
- GSM8K math: 8B achieves 92.5; 30B reaches 94.2
- DeepMind-Math: 8B scores 80.1; 30B achieves 81.9
- EvalPlus coding: 8B reaches 80.2; 30B scores 82.7
Data Quality Pipeline Used LLM-as-Judge Evaluation
IBM implemented a rigorous data curation process before fine-tuning. The team used LLM-as-Judge evaluation across six dimensions, filtering down to 4.1 million curated samples while automatically rejecting hallucinations and false premises.
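To make the filtering step concrete, here is a minimal sketch of an LLM-as-Judge filter. The dimension names, prompt wording, and score threshold are illustrative assumptions; IBM has not published the actual six dimensions or this code. Only the idea of scoring each sample on six dimensions and automatically rejecting hallucinations and false premises comes from the description above.

```python
# Minimal LLM-as-Judge filtering sketch, assuming a judge model that returns JSON.
import json
from typing import Callable

# Hypothetical placeholder dimensions -- not IBM's published criteria.
DIMENSIONS = ["correctness", "helpfulness", "completeness",
              "instruction_following", "groundedness", "clarity"]

JUDGE_PROMPT = """Rate the assistant response on each dimension from 1-5 and flag problems.
Return JSON like:
{{"scores": {{"correctness": 5, ...}}, "hallucination": false, "false_premise": false}}

Prompt: {prompt}
Response: {response}"""

def keep_sample(sample: dict, judge: Callable[[str], str], min_score: float = 4.0) -> bool:
    """Return True if a (prompt, response) sample survives LLM-as-Judge filtering."""
    verdict = json.loads(judge(JUDGE_PROMPT.format(**sample)))
    if verdict.get("hallucination") or verdict.get("false_premise"):
        return False  # automatic rejection, regardless of scores
    scores = verdict["scores"]
    return all(scores.get(d, 0) >= min_score for d in DIMENSIONS)

def curate(samples: list[dict], judge: Callable[[str], str]) -> list[dict]:
    """Filter a raw pool down to the curated subset."""
    return [s for s in samples if keep_sample(s, judge)]
```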
Five-Phase Training Strategy With Sequential RL Stages
The training methodology comprised five distinct phases with evolving data mixtures, progressing from broad content in Phase 1 to specialized instruction data in Phases 3-4 and culminating in long-context training up to 512K tokens in Phase 5. IBM then applied four sequential reinforcement learning stages, including a dedicated math-recovery stage added after initial RLHF caused a regression on math benchmarks.
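One rough way to picture the schedule is the sketch below. The phase boundaries, the 512K-token target, and the math-recovery stage mirror the description above; the mixture labels, earlier context lengths, and the remaining RL stage names are illustrative placeholders, not IBM's published recipe.

```python
# Hypothetical training-schedule sketch: five SFT phases followed by four
# sequential RL stages. Only the phase count, the 512K context target, and the
# math-recovery stage come from IBM's description; the rest is placeholder.
PHASES = [
    {"phase": 1, "data": "broad general content",         "max_context": 8_192},
    {"phase": 2, "data": "broad + early instruction mix", "max_context": 8_192},
    {"phase": 3, "data": "specialized instruction data",  "max_context": 32_768},
    {"phase": 4, "data": "specialized instruction data",  "max_context": 131_072},
    {"phase": 5, "data": "long-context training",         "max_context": 524_288},  # 512K
]

RL_STAGES = [
    "initial RLHF (caused a regression on math benchmarks)",
    "dedicated math recovery",
    "further preference optimization (placeholder)",
    "final alignment pass (placeholder)",
]

def run_training(phases, rl_stages):
    """Sketch of the sequential flow: SFT phases first, then RL stages in order."""
    for p in phases:
        print(f"SFT phase {p['phase']}: {p['data']} (context up to {p['max_context']} tokens)")
    for i, stage in enumerate(rl_stages, start=1):
        print(f"RL stage {i}: {stage}")

run_training(PHASES, RL_STAGES)
```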
Apache 2.0 License Enables Commercial Deployment
Granite 4.1 is available under the Apache 2.0 license, permitting commercial use. The models are accessible through Hugging Face (ibm-granite), Ollama, vLLM, and Transformers, with FP8 quantized variants offered for production deployment.
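For teams evaluating local deployment, a minimal Transformers sketch follows. The exact checkpoint name under the ibm-granite Hugging Face organization is an assumption and should be checked against the model card.

```python
# Minimal sketch of loading a Granite 4.1 model with Hugging Face Transformers.
# The repo id below is a hypothetical example -- verify the exact name under
# the ibm-granite organization before use.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm-granite/granite-4.1-8b-instruct"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # the FP8 variants would use a quantized checkpoint instead
    device_map="auto",
)

messages = [{"role": "user", "content": "Summarize the Apache 2.0 license in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Serving through vLLM or Ollama would point the respective runtime at the same repository instead; the FP8 variants typically trade a small amount of accuracy for lower memory use and higher throughput.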
Key Takeaways
- IBM's Granite 4.1 8B model matches or exceeds the previous 32B MoE architecture across most benchmarks while using a simpler dense transformer design
- The 8B model scores 68.3 on BFCL V3 tool calling versus 64.7 for the 32B MoE, and achieves 92.5 on GSM8K math tasks
- IBM curated 4.1 million samples using LLM-as-Judge evaluation across six dimensions, automatically filtering hallucinations and false premises
- Training used five distinct phases culminating in 512K token long-context capability, plus four sequential RL stages including math recovery
- Models are released under the Apache 2.0 license and available on Hugging Face, Ollama, vLLM, and Transformers, with FP8 quantized variants