PrismML emerged from stealth on March 31, 2026, introducing the world's first commercially viable 1-bit large language models, built on research developed at Caltech. The company released three models under the Apache 2.0 license that dramatically reduce memory and compute requirements while maintaining performance competitive with traditional architectures.
Revolutionary 1-Bit Architecture Reduces Model Size by 16x
PrismML fundamentally redesigned neural networks at the mathematical level, creating models with native 1-bit parameter precision instead of traditional 16- or 32-bit representations. The flagship 1-bit Bonsai 8B model requires only 1GB of memory, compared to 16GB for equivalent full-precision models: a 16x reduction in footprint. This breakthrough enables powerful language models to run on devices previously unable to support local AI inference.
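The release does not detail PrismML's exact quantization scheme, but a common approach in the 1-bit literature is to replace each full-precision weight matrix with a sign matrix in {-1, +1} plus a small per-row scaling factor. The sketch below is a minimal, hypothetical illustration of that idea, not PrismML's actual method; all function names are invented for this example.

```python
import numpy as np

def binarize_weights(w: np.ndarray):
    """Split a weight matrix into a per-row scale and a {-1, +1} sign matrix.

    alpha = mean absolute weight per row (the usual scaling choice in
    binary-network papers); b stores only signs, i.e. 1 bit per parameter.
    """
    alpha = np.abs(w).mean(axis=1, keepdims=True)  # shape (out, 1)
    b = np.where(w >= 0, 1.0, -1.0)                # shape (out, in)
    return alpha, b

def binary_linear(x: np.ndarray, alpha: np.ndarray, b: np.ndarray):
    # Approximate x @ w.T with the 1-bit weights: (x @ b.T) scaled per output row.
    return (x @ b.T) * alpha.T

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 8))   # full-precision layer weights
x = rng.normal(size=(2, 8))   # a batch of activations
alpha, b = binarize_weights(w)
approx = binary_linear(x, alpha, b)   # 1-bit approximation of x @ w.T
exact = x @ w.T
```

In practice, 1-bit models are trained with the quantization in the loop (so the network learns weights that survive binarization) rather than binarized after the fact, which is presumably how competitive accuracy is retained.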
The company released three models:
- 1-bit Bonsai 8B: 8 billion parameters with 1GB memory footprint
- 1-bit Bonsai 4B: 4 billion parameters with 0.5GB memory footprint
- 1-bit Bonsai 1.7B: 1.7 billion parameters with 0.24GB memory footprint
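The footprint figures follow directly from the bit width: at 1 bit per parameter, 8 billion parameters occupy 1GB, while the same model at 16-bit precision needs 16GB. A quick sanity check (treating 1GB as 10^9 bytes):

```python
def footprint_gb(n_params: float, bits_per_param: float) -> float:
    # bytes = parameters * bits / 8; GB taken as 10**9 bytes
    return n_params * bits_per_param / 8 / 1e9

print(footprint_gb(8e9, 1))    # 1.0  -> Bonsai 8B at 1 bit
print(footprint_gb(8e9, 16))   # 16.0 -> FP16 baseline for an 8B model
print(footprint_gb(4e9, 1))    # 0.5  -> Bonsai 4B
```

Note that the 1.7B model's reported 0.24GB is slightly above the raw 0.21GB this formula gives, which likely reflects components kept at higher precision (embeddings are commonly left unquantized in 1-bit models); that is an inference, not a detail from the release.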
Performance Benchmarks Show 8x Faster Inference
Despite the dramatic size reduction, PrismML's models deliver competitive performance:
- 8x faster inference compared to full-precision equivalents
- 5x more energy efficient
- Matches leading 8B models on standard benchmarks
- Intelligence density score of 1.06/GB versus 0.10/GB for Qwen3 8B
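The announcement doesn't describe PrismML's inference kernels, but a standard explanation for binary-weight speedups is that dot products over {-1, +1} vectors can be computed with XNOR and popcount instead of floating-point multiply-accumulate. The toy sketch below (invented for illustration, using numpy's bit-packing routines) shows the identity those kernels exploit:

```python
import numpy as np

def pack_signs(v: np.ndarray) -> np.ndarray:
    # Encode a {-1, +1} vector as a packed bit array: 1 bit per element.
    return np.packbits(v > 0)

def binary_dot(a_bits: np.ndarray, b_bits: np.ndarray, n: int) -> int:
    """dot(a, b) for sign vectors of length n via bitwise ops.

    Positions where the bits differ contribute -1, equal positions +1, so
    dot = (n - mismatches) - mismatches = n - 2 * popcount(a XOR b).
    """
    mismatches = int(np.unpackbits(np.bitwise_xor(a_bits, b_bits), count=n).sum())
    return n - 2 * mismatches

rng = np.random.default_rng(1)
a = rng.choice([-1, 1], size=64)
b = rng.choice([-1, 1], size=64)
assert binary_dot(pack_signs(a), pack_signs(b), 64) == int(a @ b)
```

Real kernels do this over 64-bit machine words with hardware popcount, which is where claims like 8x faster inference plausibly come from, alongside the much smaller memory traffic.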
The models are available on Hugging Face for free download starting March 31, 2026. The breakthrough enables entirely new deployment scenarios for mobile devices, IoT applications, and edge computing environments where memory and power constraints previously prevented LLM deployment.
Key Takeaways
- PrismML released the first commercially viable 1-bit LLMs with native 1-bit parameter precision, reducing an 8B model to just a 1GB memory footprint (16x smaller than full-precision)
- The 1-bit Bonsai models achieve 8x faster inference and 5x better energy efficiency while matching leading 8B models on benchmarks
- All three models (8B, 4B, and 1.7B parameters) are available under Apache 2.0 license on Hugging Face as of March 31, 2026
- The technology originated from Caltech research and enables powerful LLMs to run on mobile devices and edge computing environments
- Intelligence density reaches 1.06/GB, representing a 10x improvement over comparable parameter-count models