Y Combinator Winter 2026 startup Compresr has released Context Gateway, an open-source agentic proxy that compresses tool outputs before they enter an AI model's context window. The tool launched on Hacker News on March 13, 2026, and has since accumulated 307 GitHub stars. It addresses a critical problem in AI agent workflows: context bloat, which degrades model accuracy and increases costs.
The Context Problem Agents Face
AI agents struggle with context management: a single operation such as a file read or a grep command can inject thousands of tokens into the context window, most of them irrelevant noise. Long-context benchmarks reveal the problem's severity: OpenAI's GPT-5.4 evaluation shows accuracy dropping from 97.2% at 32,000 tokens to just 36.6% at 1 million tokens.
How Context Gateway Works
Context Gateway uses small language models (SLMs) to intelligently compress tool outputs based on the agent's intent. The system examines model internals and trains classifiers to identify which context portions carry the most signal. When an agent calls grep searching for error handling patterns, the SLM retains relevant matches while stripping unnecessary content.
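The idea can be sketched in a few lines. In this toy version, a keyword-overlap scorer stands in for the trained SLM classifier; `score_relevance` and `compress_tool_output` are illustrative names, not Context Gateway's actual API.

```python
# Hypothetical sketch of intent-conditioned compression: a scorer rates
# each line of a tool output against the agent's stated intent, and only
# high-signal lines survive. A real system would use a trained SLM here.

def score_relevance(line: str, intent: str) -> float:
    """Toy stand-in for the SLM: keyword overlap between line and intent."""
    intent_terms = set(intent.lower().split())
    line_terms = set(line.lower().split())
    return len(intent_terms & line_terms) / max(len(intent_terms), 1)

def compress_tool_output(output: str, intent: str, threshold: float = 0.3) -> str:
    """Keep only lines whose relevance to the intent clears the threshold."""
    kept = [ln for ln in output.splitlines()
            if score_relevance(ln, intent) >= threshold]
    return "\n".join(kept)

grep_output = """src/db.py:12: def connect():
src/db.py:45: except ConnectionError as e:  # error handling
src/ui.py:3: import tkinter
src/api.py:88: raise RuntimeError("error handling failed")"""

compressed = compress_tool_output(grep_output, "error handling patterns")
```

In this sketch the two lines mentioning error handling survive, while the unrelated import and function definition are stripped before reaching the main model.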
The proxy includes several key features:
- Intent-conditioned compression that preserves contextually relevant information
- An expand() function that retrieves original outputs if the model needs removed content
- Background compaction triggered at 85% window capacity
- Lazy-loading of tool descriptions, showing only relevant tools for current tasks
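Two of these features, the expand() escape hatch and the 85% compaction trigger, can be illustrated with a minimal sketch. All names here are assumptions for illustration, not the project's real internals.

```python
# Hypothetical sketch: the proxy stores each full tool output under a
# reference ID before compressing, so expand() can recover removed
# content later; compaction fires once window usage crosses 85%.

class OutputCache:
    def __init__(self):
        self._store: dict[str, str] = {}
        self._next_id = 0

    def put(self, full_output: str) -> str:
        """Store the uncompressed output and return a reference ID."""
        ref = f"ctx-{self._next_id}"
        self._next_id += 1
        self._store[ref] = full_output
        return ref

    def expand(self, ref: str) -> str:
        """Retrieve the original, uncompressed output by reference."""
        return self._store[ref]

def should_compact(used_tokens: int, window_tokens: int,
                   threshold: float = 0.85) -> bool:
    """Background compaction triggers at 85% of window capacity."""
    return used_tokens / window_tokens >= threshold

cache = OutputCache()
ref = cache.put("full grep output with thousands of tokens of matches...")
```

The compressed message handed to the model would carry the reference ID, so a follow-up expand() call can restore the stripped content on demand.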
Technical Approach and Performance
The system employs strided block-parallel sampling to generate multiple rollouts from nested prefixes concurrently. It batches feature extraction over these rollouts and uses the resulting embeddings for on-policy policy-gradient updates. The team connects the method theoretically to KL-regularized feature matching and energy-based modeling.
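The sampling scheme can be illustrated with a toy sketch: take prefixes of a sequence at a fixed stride and extend every prefix in one batch. The toy continue_fn stands in for language-model sampling, and all names are assumptions rather than Context Gateway internals.

```python
# Illustrative sketch of strided block-parallel sampling: rollouts are
# launched from nested prefixes (length stride, 2*stride, ...) and
# extended concurrently as one batch.

def nested_prefixes(tokens, stride):
    """Prefixes of length stride, 2*stride, ..., up to len(tokens)."""
    return [tokens[:k] for k in range(stride, len(tokens) + 1, stride)]

def rollout_batch(tokens, stride, continue_fn):
    """Extend every nested prefix in a single batched step."""
    return [p + continue_fn(p) for p in nested_prefixes(tokens, stride)]

def toy_continue(prefix):
    # Stand-in for LM sampling: append last token + 1.
    return [prefix[-1] + 1]

batch = rollout_batch([1, 2, 3, 4, 5, 6], 2, toy_continue)
```

A real implementation would run the extensions as one GPU batch and feed the rollouts' embeddings into the policy-gradient update; this sketch only shows the nested-prefix structure.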
Across question answering, structured and unstructured coding, and translation tasks, Context Gateway's energy-based fine-tuning approach matches RLVR performance and outperforms standard supervised fine-tuning on downstream accuracy, while also achieving lower validation cross-entropy.
Integration and Adoption
Context Gateway supports integration with Claude Code, Cursor IDE, OpenClaw, and custom configurations. Installation requires a single command: curl -fsSL https://compresr.ai/api/install | sh. The tool includes spending caps, a dashboard for tracking sessions, and Slack notifications when agents await user input.
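The install command from the article can be run as-is; how an agent is then pointed at the proxy is not documented in this article, so the environment variable and port below are assumptions for illustration only.

```shell
# Install Context Gateway (command from the project's docs):
curl -fsSL https://compresr.ai/api/install | sh

# Hypothetical: route an agent's API traffic through the local proxy by
# overriding its base URL. Variable name and port are assumed values.
export ANTHROPIC_BASE_URL="http://localhost:8080"
```

Routing through a local base URL is a common pattern for LLM proxies, since it requires no changes to the agent itself; consult the project's own docs for the supported configuration.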
The project has attracted an active Discord community and shows 52 commits across 12 releases from 7 contributors. Its 29 forks indicate developer interest in adapting the technology for specific use cases.
Key Takeaways
- Context Gateway uses small language models to compress agent tool outputs before they reach the main LLM, reducing context bloat by up to 90%
- OpenAI's GPT-5.4 accuracy drops from 97.2% to 36.6% as context grows from 32k to 1M tokens, demonstrating the severity of the context management problem
- The system performs intent-conditioned compression, preserving only context relevant to the agent's specific tool call purpose
- Context Gateway integrates with Claude Code, Cursor IDE, and OpenClaw, with 307 GitHub stars since its March 2026 launch
- The tool includes background compaction at 85% window capacity and an expand() function to retrieve original outputs when needed