Samuel Faj released Distill, an open-source CLI tool that compresses verbose command outputs for AI coding workflows, achieving up to 98.7% token reduction. Released on March 6, 2026, the project gained 146 GitHub stars within 48 hours by addressing a critical pain point: the enormous token costs of sending raw CLI outputs to paid AI coding tools like Claude Code and GitHub Copilot.
The tool works by piping command outputs through a local LLM model that extracts only relevant information based on user queries, dramatically reducing API costs and improving response times for AI-assisted development.
How Distill Solves the Token Waste Problem
Command outputs like logs, test results, diffs, and stack traces routinely consume thousands of tokens when sent to LLMs. A single ripgrep search can return thousands of lines when the caller only needs a yes/no answer, driving up API costs and slowing responses. Distill addresses this by preprocessing outputs locally before sending them to expensive cloud LLMs.
The tool's usage is straightforward:
```shell
git diff | distill 'what changed?'
bun test | distill 'did tests pass?'
npm audit | distill 'extract vulnerabilities as JSON'
```
Distill uses a local LLM model (default: Qwen 3.5 2B via Ollama) to analyze input and provide focused responses. This approach avoids API costs entirely for the compression step while preserving actionable information.
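Distill's source isn't reproduced here, but the stdin-to-local-model loop it describes can be sketched in TypeScript. The endpoint and response shape below are Ollama's `/api/generate` API; the model tag, prompt wording, and function names are assumptions, not Distill's actual code.

```typescript
// Sketch of the stdin → local-LLM compression loop (model tag, prompt
// wording, and function names are assumptions; the Ollama endpoint is real).

// Build the instruction sent to the local model: answer the user's query
// using only the piped command output.
function buildPrompt(output: string, query: string): string {
  return (
    `Answer the question using only this command output.\n` +
    `Question: ${query}\n\nOutput:\n${output}`
  );
}

// Collect everything piped in on stdin.
async function readStdin(): Promise<string> {
  const chunks: Buffer[] = [];
  for await (const chunk of process.stdin) chunks.push(chunk as Buffer);
  return Buffer.concat(chunks).toString("utf8");
}

// Ask a local Ollama model; with stream: false the reply is a single
// JSON object whose `response` field holds the generated text.
async function compress(output: string, query: string): Promise<string> {
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "qwen3.5:2b", // illustrative tag for the default small Qwen model
      prompt: buildPrompt(output, query),
      stream: false,
    }),
  });
  const data = (await res.json()) as { response: string };
  return data.response.trim();
}

// A CLI entry point would wire these together, e.g.:
//   readStdin()
//     .then((out) => compress(out, process.argv[2] ?? "summarize"))
//     .then(console.log);
```

Because the model runs locally, nothing in this loop touches a metered API; only the short answer it prints would be forwarded to a paid assistant.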
Technical Implementation and Performance
Distill is built with TypeScript/Node.js and streams input to local LLMs for processing. Key features include:
- Persistent settings for model selection, timeouts, and reasoning preferences
- Integration with agent tools like Codex, Claude Code, and OpenCode through global instruction files
- Support for multiple local LLM backends via Ollama
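The persistent-settings feature can be pictured as a small JSON file merged over defaults. The file path, field names, and default values below are assumptions for illustration, not Distill's actual schema.

```typescript
// Hypothetical settings persistence (path, fields, and defaults are assumptions).
import { existsSync, readFileSync, writeFileSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

interface Settings {
  model: string;      // which local Ollama model to run
  timeoutMs: number;  // how long to wait for the local model
  reasoning: boolean; // whether to ask the model to show its reasoning
}

const DEFAULTS: Settings = {
  model: "qwen3.5:2b", // illustrative tag for the default model
  timeoutMs: 30_000,
  reasoning: false,
};

// Stored settings override defaults field by field.
function mergeSettings(stored: Partial<Settings>): Settings {
  return { ...DEFAULTS, ...stored };
}

function loadSettings(path = join(homedir(), ".distillrc.json")): Settings {
  if (!existsSync(path)) return DEFAULTS;
  return mergeSettings(JSON.parse(readFileSync(path, "utf8")));
}

function saveSettings(
  partial: Partial<Settings>,
  path = join(homedir(), ".distillrc.json"),
): void {
  writeFileSync(path, JSON.stringify(mergeSettings(partial), null, 2));
}
```

Merging over defaults means a settings file only needs to record the fields the user actually changed.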
The developer's demonstration showed remarkable efficiency: a 7,648-token ripgrep search output was condensed to just 99 tokens—a 98.7% reduction—while maintaining all essential information. This directly translates to lower LLM API costs and faster response times for agent-based development workflows.
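The quoted figure checks out arithmetically; a one-liner verifies it (the function name is ours, not Distill's):

```typescript
// Percentage reduction between a before and after token count,
// rounded to one decimal place.
function reductionPct(before: number, after: number): number {
  return Math.round(((before - after) / before) * 1000) / 10;
}

console.log(reductionPct(7648, 99)); // → 98.7
```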
Why Token Optimization Matters for AI Coding
As AI coding assistants become standard tools, token costs represent a major operational expense. Tools like Claude Code and GitHub Copilot charge by token usage, and verbose CLI outputs can quickly drain budgets. The timing is particularly relevant given recent developments:
- Claude Code's scheduled tasks feature launched March 6, 2026, increasing automated CLI usage
- The proliferation of agent frameworks has amplified the volume of command outputs processed by LLMs
- Rising API costs make efficient token management critical for sustainable AI-assisted development
Distill provides a practical solution by handling compression locally, making it especially valuable for developers running frequent commands through AI assistants. The tool's use of free local models for preprocessing means the only API costs are for the final, compressed outputs.
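To see why this matters at scale, a back-of-the-envelope cost model helps. The per-million-token price below is purely illustrative, not any vendor's actual rate.

```typescript
// Illustrative cost model: the price per million tokens is an assumption.
function costUSD(tokens: number, pricePerMTok: number): number {
  return (tokens / 1_000_000) * pricePerMTok;
}

// 1,000 agent commands at the demo's raw vs. compressed sizes,
// priced at a hypothetical $3 per million input tokens:
const raw = costUSD(1_000 * 7_648, 3);     // ≈ $22.94
const compressed = costUSD(1_000 * 99, 3); // ≈ $0.30
console.log(raw.toFixed(2), compressed.toFixed(2));
```

Whatever the real rate, the ratio is fixed by the compression itself: the compressed workflow costs roughly 1.3% of the raw one.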
Community Reception and Use Cases
Developers have identified multiple applications for Distill:
- Log analysis and error extraction from verbose application logs
- Test output parsing to identify specific failures
- Debugging workflows that generate large stack traces
- Security audit result summarization
- Code review diff analysis
The project's open-source license and minimal footprint (its only requirements are Node.js and Ollama) have contributed to positive community reception. The tool represents a practical approach to a growing problem as AI coding tools become more integrated into daily development workflows.
Key Takeaways
- Distill is an open-source CLI tool that compresses command outputs by up to 98.7% before sending to LLMs, reducing API costs and improving response times
- The tool gained 146 GitHub stars within 48 hours of its March 6, 2026 release by addressing token waste in AI coding workflows
- Distill uses local LLM models (default: Qwen 3.5 2B) to preprocess outputs, avoiding API costs for the compression step
- A demonstration showed compression from 7,648 tokens to 99 tokens while preserving actionable information
- The tool integrates with popular AI coding assistants like Claude Code, Codex, and OpenCode, making token optimization seamless for developers