Samuel Faj released Distill, an open-source CLI tool that compresses verbose command outputs for AI coding workflows, achieving up to 98.7% token reduction. Released on March 6, 2026, the project gained 146 GitHub stars within 48 hours by addressing a critical pain point: the enormous token costs of sending raw CLI outputs to paid AI coding tools like Claude Code and GitHub Copilot.
The tool works by piping command outputs through a local LLM model that extracts only relevant information based on user queries, dramatically reducing API costs and improving response times for AI-assisted development.
How Distill Solves the Token Waste Problem
Command outputs like logs, test results, diffs, and stack traces routinely consume thousands of tokens when sent to LLMs. A single ripgrep search can return thousands of lines when the caller only needs a yes/no answer, driving up API costs and slowing responses. Distill addresses this by preprocessing outputs locally before sending them to expensive cloud LLMs.
The tool's usage is straightforward:
```shell
git diff | distill 'what changed?'
bun test | distill 'did tests pass?'
npm audit | distill 'extract vulnerabilities as JSON'
```
Distill uses a local LLM model (default: Qwen 3.5 2B via Ollama) to analyze input and provide focused responses. This approach avoids API costs entirely for the compression step while preserving actionable information.
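Distill's source isn't reproduced here, but the stdin-to-local-model loop it describes can be sketched in TypeScript. The endpoint and response shape below are Ollama's `/api/generate` API; the model tag, prompt wording, and function names are assumptions, not Distill's actual code.

```typescript
// Sketch of the stdin → local-LLM compression loop (model tag, prompt
// wording, and function names are assumptions; the Ollama endpoint is real).

// Build the instruction sent to the local model: answer the user's query
// using only the piped command output.
function buildPrompt(output: string, query: string): string {
  return (
    `Answer the question using only this command output.\n` +
    `Question: ${query}\n\nOutput:\n${output}`
  );
}

// Collect everything piped in on stdin.
async function readStdin(): Promise<string> {
  const chunks: Buffer[] = [];
  for await (const chunk of process.stdin) chunks.push(chunk as Buffer);
  return Buffer.concat(chunks).toString("utf8");
}

// Ask a local Ollama model; with stream: false the reply is a single
// JSON object whose `response` field holds the generated text.
async function compress(output: string, query: string): Promise<string> {
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "qwen3.5:2b", // illustrative tag for the default small Qwen model
      prompt: buildPrompt(output, query),
      stream: false,
    }),
  });
  const data = (await res.json()) as { response: string };
  return data.response.trim();
}

// A CLI entry point would wire these together, e.g.:
//   readStdin()
//     .then((out) => compress(out, process.argv[2] ?? "summarize"))
//     .then(console.log);
```

Because the model runs locally, nothing in this loop touches a metered API; only the short answer it prints would be forwarded to a paid assistant.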
Technical Implementation and Performance
Distill is built with TypeScript/Node.js and streams input to local LLMs for processing. Key features include:
- Persistent settings for model selection, timeouts, and reasoning preferences
- Integration with agent tools like Codex, Claude Code, and OpenCode through global instruction files
- Support for multiple local LLM backends via Ollama
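The persistent-settings feature can be pictured as a small JSON file merged over defaults. The file path, field names, and default values below are assumptions for illustration, not Distill's actual schema.

```typescript
// Hypothetical settings persistence (path, fields, and defaults are assumptions).
import { existsSync, readFileSync, writeFileSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

interface Settings {
  model: string;      // which local Ollama model to run
  timeoutMs: number;  // how long to wait for the local model
  reasoning: boolean; // whether to ask the model to show its reasoning
}

const DEFAULTS: Settings = {
  model: "qwen3.5:2b", // illustrative tag for the default model
  timeoutMs: 30_000,
  reasoning: false,
};

// Stored settings override defaults field by field.
function mergeSettings(stored: Partial<Settings>): Settings {
  return { ...DEFAULTS, ...stored };
}

function loadSettings(path = join(homedir(), ".distillrc.json")): Settings {
  if (!existsSync(path)) return DEFAULTS;
  return mergeSettings(JSON.parse(readFileSync(path, "utf8")));
}

function saveSettings(
  partial: Partial<Settings>,
  path = join(homedir(), ".distillrc.json"),
): void {
  writeFileSync(path, JSON.stringify(mergeSettings(partial), null, 2));
}
```

Merging over defaults means a settings file only needs to record the fields the user actually changed.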
The developer's demonstration showed remarkable efficiency: a 7,648-token ripgrep search output was condensed to just 99 tokens—a 98.7% reduction—while maintaining all essential information. This directly translates to lower LLM API costs and faster response times for agent-based development workflows.
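The quoted figure checks out arithmetically; a one-liner verifies it (the function name is ours, not Distill's):

```typescript
// Percentage reduction between a before and after token count,
// rounded to one decimal place.
function reductionPct(before: number, after: number): number {
  return Math.round(((before - after) / before) * 1000) / 10;
}

console.log(reductionPct(7648, 99)); // → 98.7
```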
Why Token Optimization Matters for AI Coding
As AI coding assistants become standard tools, token costs represent a major operational expense. Tools like Claude Code and GitHub Copilot charge by token usage, and verbose CLI outputs can quickly drain budgets. The timing is particularly relevant given recent developments:
- Claude Code's scheduled tasks feature launched March 6, 2026, increasing automated CLI usage
- The proliferation of agent frameworks has amplified the volume of command outputs processed by LLMs
- Rising API costs make efficient token management critical for sustainable AI-assisted development
Distill provides a practical solution by handling compression locally, making it especially valuable for developers running frequent commands through AI assistants. The tool's use of free local models for preprocessing means the only API costs are for the final, compressed outputs.
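To see why this matters at scale, a back-of-the-envelope cost model helps. The per-million-token price below is purely illustrative, not any vendor's actual rate.

```typescript
// Illustrative cost model: the price per million tokens is an assumption.
function costUSD(tokens: number, pricePerMTok: number): number {
  return (tokens / 1_000_000) * pricePerMTok;
}

// 1,000 agent commands at the demo's raw vs. compressed sizes,
// priced at a hypothetical $3 per million input tokens:
const raw = costUSD(1_000 * 7_648, 3);     // ≈ $22.94
const compressed = costUSD(1_000 * 99, 3); // ≈ $0.30
console.log(raw.toFixed(2), compressed.toFixed(2));
```

Whatever the real rate, the ratio is fixed by the compression itself: the compressed workflow costs roughly 1.3% of the raw one.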
Community Reception and Use Cases
Developers have identified multiple applications for Distill:
- Log analysis and error extraction from verbose application logs
- Test output parsing to identify specific failures
- Debugging workflows that generate large stack traces
- Security audit result summarization
- Code review diff analysis
The project's open-source license and minimal footprint (its only requirements are Node.js and Ollama) have contributed to positive community reception. The tool represents a practical approach to a growing problem as AI coding tools become more integrated into daily development workflows.
Key Takeaways
- Distill is an open-source CLI tool that compresses command outputs by up to 98.7% before sending to LLMs, reducing API costs and improving response times
- The tool gained 146 GitHub stars within 48 hours of its March 6, 2026 release by addressing token waste in AI coding workflows
- Distill uses local LLM models (default: Qwen 3.5 2B) to preprocess outputs, avoiding API costs for the compression step
- A demonstration showed compression from 7,648 tokens to 99 tokens while preserving actionable information
- The tool integrates with popular AI coding assistants like Claude Code, Codex, and OpenCode, making token optimization seamless for developers