LG AI Research announced EXAONE 4.5 on April 9, 2026, as a multimodal AI model that outperformed several frontier models on STEM benchmarks while using just 33 billion parameters. The model scored 77.3 on average across five key STEM benchmarks, exceeding GPT-5 mini's 73.5 and Claude 4.5 Sonnet's 74.6.
Smaller Model Achieves Competitive Performance Against Larger Competitors
EXAONE 4.5's benchmark results demonstrate efficiency gains in model architecture and training:
- EXAONE 4.5 (33B parameters): 77.3 average STEM score
- GPT-5 mini: 73.5
- Claude 4.5 Sonnet: 74.6
- Alibaba Qwen-3 (235B parameters): 77.0
The model achieved comparable performance to Alibaba's Qwen-3 despite using approximately one-seventh the number of parameters, suggesting significant optimization in model design.
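The "one-seventh" claim follows directly from the reported figures. A quick back-of-the-envelope check in plain Python, using only the numbers cited above, makes the ratio and a rough points-per-parameter comparison explicit:

```python
# Figures as reported: EXAONE 4.5 (33B params, 77.3 avg STEM score)
# vs. Alibaba Qwen-3 (235B params, 77.0 avg STEM score).
exaone_params_b, qwen_params_b = 33, 235
exaone_score, qwen_score = 77.3, 77.0

ratio = exaone_params_b / qwen_params_b
print(f"Parameter ratio: {ratio:.2f} (about 1/{qwen_params_b / exaone_params_b:.1f})")
# → Parameter ratio: 0.14 (about 1/7.1)

# Benchmark points per billion parameters -- a crude efficiency proxy only,
# since benchmark scores do not scale linearly with parameter count.
print(f"EXAONE 4.5: {exaone_score / exaone_params_b:.2f} points per B params")
# → EXAONE 4.5: 2.34 points per B params
print(f"Qwen-3:     {qwen_score / qwen_params_b:.2f} points per B params")
# → Qwen-3:     0.33 points per B params
```

The points-per-parameter figure should be read as illustration, not measurement: it ignores training compute, data quality, and architecture, all of which the announcement credits for the efficiency gains.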
Strong Coding Performance Exceeds Google's Latest Model
EXAONE 4.5 scored 81.4 on LiveCodeBench v6, surpassing Google's Gemma 4 (80.0). This edge on coding tasks points to strong capabilities in practical software development applications.
Non-Commercial License Limits Real-World Adoption
Despite strong performance metrics, EXAONE 4.5 carries a non-commercial license that bars developers and companies from using it in commercial applications. This restriction caps the model's practical reach compared to commercially licensed alternatives, however impressive its technical results.
Model Supports Simultaneous Text and Image Understanding
EXAONE 4.5 processes both text and images simultaneously, enabling multimodal reasoning capabilities. This positions it as a vision-language model capable of tasks requiring combined understanding of visual and textual information.
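For readers unfamiliar with how vision-language models accept mixed inputs, the sketch below shows the interleaved image-plus-text chat-message format commonly used by Hugging Face vision-language models. The exact schema and processor API for EXAONE 4.5 are assumptions here, not confirmed details from the release:

```python
# Hedged sketch: the interleaved multimodal message format many Hugging Face
# VLMs accept. Whether EXAONE 4.5 uses this exact schema is an assumption.
message = {
    "role": "user",
    "content": [
        {"type": "image", "url": "photo.jpg"},  # hypothetical local image path
        {"type": "text", "text": "What is shown in this image?"},
    ],
}

# A model's processor (e.g. via apply_chat_template) would turn such messages
# into token and pixel inputs; here we just show both modalities in one turn.
content_types = [part["type"] for part in message["content"]]
print(content_types)  # → ['image', 'text']
```

The point of the format is that image and text parts sit in a single user turn, which is what lets the model reason over both jointly rather than handling them in separate passes.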
Release Highlights Tension Between Open Weights and Commercial Viability
The announcement demonstrates how smaller, well-optimized models can compete with larger frontier models from major AI labs on specific task categories. However, the non-commercial license prevents it from becoming a practical alternative for most developers, highlighting ongoing tensions in the AI community between open-weight releases and commercial accessibility.
Key Takeaways
- LG EXAONE 4.5 scored 77.3 on average across five STEM benchmarks, beating GPT-5 mini (73.5) and Claude 4.5 Sonnet (74.6)
- The model uses 33 billion parameters, approximately one-seventh the size of Alibaba's Qwen-3 (235B) while achieving comparable performance
- EXAONE 4.5 scored 81.4 on LiveCodeBench v6, exceeding Google's Gemma 4 (80.0) in coding tasks
- A non-commercial license prevents developers and companies from using the model for commercial applications
- The release available on Hugging Face demonstrates that smaller, well-optimized models can compete with larger frontier models on specific benchmarks