Baidu released ERNIE-Image on April 14, 2026, an open-source text-to-image generation model built to render text accurately within images. The 8B-parameter model runs on consumer GPUs with 24GB of VRAM and is available under the Apache 2.0 license. It targets a persistent weakness in generative AI: accurate text rendering in posters, infographics, and UI designs.
Single-Stream Diffusion Transformer Architecture With 8B Parameters
ERNIE-Image uses a single-stream Diffusion Transformer (DiT) architecture with only 8B DiT parameters, making it significantly smaller than competitors while maintaining competitive performance. The model includes a lightweight Prompt Enhancer that converts brief user inputs into detailed structured descriptions, enabling more accurate image generation from simple prompts.
Two variants are available. The base model requires 50 inference steps with a classifier-free guidance (CFG) scale of 4.0. ERNIE-Image-Turbo, which was optimized through distillation and reinforcement learning, needs just 8 steps with a CFG scale of 1.0. Both variants can be deployed through the Diffusers and SGLang frameworks.
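To make the difference between the two variants concrete, the sketch below records the reported sampling settings and estimates how many model forward passes each one implies. It is illustrative only. The class `SamplingPreset` and its methods are hypothetical names, the keyword names `num_inference_steps` and `guidance_scale` follow the usual Diffusers convention and are assumed rather than confirmed for ERNIE-Image, and the pass count assumes standard classifier-free guidance, which evaluates a conditional and an unconditional branch per step whenever the scale is above 1.0.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SamplingPreset:
    """Sampling settings reported for one ERNIE-Image variant (hypothetical helper)."""
    name: str
    steps: int
    cfg_scale: float

    def pipeline_kwargs(self) -> dict:
        # Keyword names follow the common Diffusers convention; whether the
        # ERNIE-Image pipeline accepts exactly these names is an assumption.
        return {
            "num_inference_steps": self.steps,
            "guidance_scale": self.cfg_scale,
        }

    def forward_passes(self) -> int:
        # Assumption: standard classifier-free guidance. A scale above 1.0
        # needs a conditional and an unconditional pass per step; at 1.0 the
        # guided prediction equals the conditional one, so one pass suffices.
        passes_per_step = 2 if self.cfg_scale > 1.0 else 1
        return self.steps * passes_per_step


# Values taken from the article.
BASE = SamplingPreset("ERNIE-Image", steps=50, cfg_scale=4.0)
TURBO = SamplingPreset("ERNIE-Image-Turbo", steps=8, cfg_scale=1.0)

if __name__ == "__main__":
    for preset in (BASE, TURBO):
        print(preset.name, preset.pipeline_kwargs(), preset.forward_passes())
```

Under these assumptions the base model costs roughly 100 forward passes per image and the Turbo variant 8, which is why the distilled model is the more natural choice for interactive use. Actual latency also depends on resolution, precision, and the serving framework.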
Superior Performance on Text-Heavy Tasks and Complex Layouts
The model demonstrates exceptional capability for dense, long-form, and layout-sensitive text rendering. ERNIE-Image particularly excels at generating posters, comics, storyboards, and multi-panel compositions where text must be legible and properly positioned. It handles complex prompts involving multiple objects and relationships while supporting both realistic photography and stylized aesthetics.
Baidu reports a GenEval score of 0.8856 and a LongTextBench score of 0.9733. According to the company, ERNIE-Image often outperforms larger competitors, including FLUX and Qwen-Image, on text rendering tasks despite using fewer parameters.
Bilingual Support and Consumer Hardware Accessibility
ERNIE-Image supports both English and Chinese prompts and shows competitive performance on benchmarks in both languages, including GenEval and OneIG. The 24GB VRAM requirement puts the model within reach of developers with consumer-grade GPUs, lowering the hardware barrier typically associated with state-of-the-art image generation models.
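A rough back-of-the-envelope calculation shows why an 8B-parameter model is plausible on a 24GB card. The sketch below estimates the memory needed for the DiT weights alone at two common precisions. The precision choices are assumptions, since the article does not say which one the released checkpoints use, and the estimate ignores the text encoder, VAE, Prompt Enhancer, and activations, all of which also need memory.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Memory needed to hold the weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30


DIT_PARAMS = 8e9  # 8B DiT parameters, from the article
VRAM_GIB = 24     # consumer GPU budget, treating "24GB" as 24 GiB

if __name__ == "__main__":
    # Assumed precisions: 4 bytes per parameter (fp32) and 2 bytes (bf16/fp16).
    for label, nbytes in (("fp32", 4), ("bf16/fp16", 2)):
        gib = weight_memory_gib(DIT_PARAMS, nbytes)
        print(f"{label}: {gib:.1f} GiB for weights, {VRAM_GIB - gib:.1f} GiB left over")
```

At 16-bit precision the DiT weights come to about 14.9 GiB, leaving several GiB for the remaining components, while full 32-bit weights alone would exceed the 24GB budget.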
The Apache 2.0 license allows commercial use and modification, enabling developers and researchers to build applications requiring reliable text rendering without licensing restrictions. The GitHub repository has gained 242 stars since its April 14 release.
Key Takeaways
- Baidu released ERNIE-Image on April 14, 2026, an 8B-parameter open-source text-to-image model under the Apache 2.0 license
- The model achieves a LongTextBench score of 0.9733, excelling at rendering dense, long-form text in posters, infographics, and UI designs
- ERNIE-Image runs on consumer GPUs with 24GB VRAM and offers a turbo variant requiring only 8 inference steps
- According to Baidu, the model often outperforms larger competitors like FLUX and Qwen-Image on text rendering tasks despite using fewer parameters
- Strong bilingual support enables competitive performance across English and Chinese benchmarks