Ben Cochran, a Distinguished Engineer with over 20 years at NVIDIA and AMD, has released Statewright, a state machine guardrails system that cut AI agent failure rates from 80% to 0% in tests with two models on a five-task SWE-bench subset. The tool enforces deterministic constraints through protocol-level controls rather than prompt engineering.
Protocol-Level Enforcement Outperforms Prompt-Based Constraints
Statewright uses a Rust engine that evaluates state machine definitions including states, transitions, guards, and tool restrictions. The orchestration layer enforces these constraints without using an LLM — it simply implements the state machine logic. A plugin layer integrates with Claude Code via MCP, with Codex and Cursor support planned.
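To make the idea concrete, here is a minimal Python sketch of a state machine with per-state tool restrictions and guarded transitions. It is an illustration only, not Statewright's Rust engine or its definition format; the class names, tool names, and the `plan_written` flag are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Context = Dict[str, object]


@dataclass
class Transition:
    target: str
    guard: Callable[[Context], bool]  # programmatic predicate, no LLM involved


@dataclass
class State:
    name: str
    tools: List[str]  # tools the agent may call while in this state
    transitions: List[Transition] = field(default_factory=list)


class Machine:
    def __init__(self, states: List[State], initial: str) -> None:
        self.states = {s.name: s for s in states}
        self.current = initial

    def tool_allowed(self, tool: str) -> bool:
        return tool in self.states[self.current].tools

    def advance(self, target: str, ctx: Context) -> bool:
        # Move only if a transition to `target` exists and its guard passes.
        for t in self.states[self.current].transitions:
            if t.target == target and t.guard(ctx):
                self.current = target
                return True
        return False


# Hypothetical two-state workflow: planning is read-only, implementation can edit.
machine = Machine(
    [
        State(
            "plan",
            ["read_file", "grep"],
            [Transition("implement", lambda c: bool(c.get("plan_written")))],
        ),
        State("implement", ["read_file", "edit_file"]),
    ],
    initial="plan",
)
```

In this sketch, `machine.tool_allowed("edit_file")` stays false until `machine.advance("implement", {"plan_written": True})` succeeds, and all of that logic is plain deterministic code.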
When users activate a workflow, hooks automatically enforce guardrails per state. Models see only 5 available tools instead of dozens, receive clear instructions for the current phase, and transition between states only when conditions are met. Guardrails include per-state tool visibility control, bash command restrictions blocking redirects and destructive operations, edit size limits, file-per-state caps, command allow-lists with prefix matching, conditional transitions using programmatic predicates, approval gates requiring human review, environment variable scoping, and session isolation.
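One of these guardrails, a command allow-list with prefix matching that also blocks redirects, can be sketched in a few lines of Python. The article does not document Statewright's actual matching rules, so the allow-list entries, the blocked characters, and the function name below are assumptions chosen for illustration.

```python
import shlex
from typing import List, Sequence

# Hypothetical allow-list for a testing state; each entry is a token prefix.
TEST_STATE_PREFIXES: List[List[str]] = [
    ["pytest"],
    ["python", "-m", "pytest"],
    ["git", "diff"],
]

# Substrings that indicate redirection, piping, or command chaining.
SHELL_METACHARS = (">", "<", "|", ";", "&", "`", "$(")


def is_command_allowed(
    command: str, prefixes: Sequence[Sequence[str]] = TEST_STATE_PREFIXES
) -> bool:
    # Conservative: reject any metacharacter, even inside quotes.
    if any(m in command for m in SHELL_METACHARS):
        return False
    try:
        tokens = shlex.split(command)
    except ValueError:  # e.g. an unterminated quote
        return False
    # Match on whole tokens so "pytestx" does not pass as "pytest".
    return any(tokens[: len(p)] == list(p) for p in prefixes)
```

Matching on tokens rather than raw string prefixes is a deliberate choice in this sketch: it keeps a command such as `pytestx` from slipping through an entry meant for `pytest`.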
The system offers two enforcement levels. "Hard" enforcement blocks tool calls at the protocol layer before models see them, working with Claude Code, Codex, opencode, and Pi. "Advisory" enforcement injects rules into context for tools like Cursor.
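The difference between the two levels can be sketched roughly as follows. This is one reading of the description above, not Statewright's implementation: the tool names, the `STATE_TOOLS` table, and all three functions are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical tool inventory and per-state policy.
ALL_TOOLS: List[str] = ["read_file", "grep", "edit_file", "bash", "web_fetch"]
STATE_TOOLS: Dict[str, List[str]] = {"plan": ["read_file", "grep"]}


def hard_view(state: str) -> List[str]:
    """Hard: only tools permitted in this state are ever listed to the model."""
    allowed = set(STATE_TOOLS[state])
    return [t for t in ALL_TOOLS if t in allowed]


def hard_call(state: str, tool: str) -> Tuple[bool, str]:
    """Hard: a disallowed call is rejected in code, whatever the model intended."""
    if tool not in STATE_TOOLS[state]:
        return False, f"tool '{tool}' is not available in state '{state}'"
    return True, "ok"


def advisory_prompt(state: str) -> str:
    """Advisory: the rules are only text in the context; compliance is up to the model."""
    allowed = ", ".join(STATE_TOOLS[state])
    return f"You are in the '{state}' state. Use only these tools: {allowed}."
```

Under hard enforcement the restriction holds even if the model ignores its instructions; under advisory enforcement the same policy is only as strong as the model's adherence to the injected text.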
Models From 13B to Frontier Class Show Consistent Improvements
On a 5-task SWE-bench subset, two models, gemma4:31b (19.9GB) and gpt-oss:20b (13.8GB), improved from a 2/10 to a 10/10 success rate with Statewright constraints. Results remained consistent across model families (qwen-coder, gpt-oss, gemma4) above the 13B parameter threshold. Below that size, models can navigate state machines but lack sufficient context retention for accurate edits.
Frontier models also benefited. Haiku and Sonnet performed above their typical capabilities, while Opus solved tasks more reliably with fewer tokens and reduced occurrence of failure loops. Fine-tuning did not produce comparable functional improvements.
As Cochran explained: "Agentic problem solving in its current state is very brittle. I fell in love with it, but it creates as many problems as it solves. Most people are brute forcing reliability with bigger models and longer prompts. What if I made the problem smaller instead of making the model bigger?"
The core insight: context window utilization matters more than raw context size. A model given a tightly scoped working context at each step outperforms one given unrestricted access to all tools. Constraining non-deterministic LLMs with deterministic code provides reliability improvements that prompt engineering cannot match.
Visual Editor and MCP Integration Available Now
A visual editor at statewright.ai enables workflow customization through a graph interface. Unlike DAGs, state machines support loops and retries — matching the actual requirements of agentic work. Example workflows include a planning state with read-only tools, an implementation state with scoped edit tools and write-friendly bash commands, and a testing state with bash restricted to testing commands only.
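The loop-and-retry point can be illustrated with a small Python sketch of a plan, implement, test workflow. The event names, the retry cap, and the `needs_review` state are hypothetical and are not taken from Statewright's workflow format.

```python
from typing import Dict, List, Tuple

# Hypothetical workflow: (state, event) -> next state.
# The test -> implement edge forms a cycle, which a DAG cannot express.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("plan", "plan_ready"): "implement",
    ("implement", "edits_done"): "test",
    ("test", "tests_failed"): "implement",  # retry loop
    ("test", "tests_passed"): "done",
}
MAX_RETRIES = 3  # assumed cap so the loop cannot run forever


def run(events: List[str]) -> List[str]:
    """Replay a sequence of events and return the states visited."""
    state, retries, path = "plan", 0, ["plan"]
    for event in events:
        nxt = TRANSITIONS.get((state, event))
        if nxt is None:
            continue  # event is not valid in this state; stay put
        if event == "tests_failed":
            retries += 1
            if retries > MAX_RETRIES:
                nxt = "needs_review"  # hand off to a human instead of looping
        state = nxt
        path.append(state)
        if state in ("done", "needs_review"):
            break
    return path
```

Replaying `["plan_ready", "edits_done", "tests_failed", "edits_done", "tests_passed"]` visits `implement` and `test` twice before reaching `done`, which is the kind of retry path a strictly acyclic graph would have to unroll by hand.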
Statewright is available now with a free tier. Installation: /plugin marketplace add statewright/statewright
Key Takeaways
- Statewright improved agent success rates from 2/10 to 10/10 on a five-task SWE-bench subset by enforcing state machine constraints at the protocol level
- Built by Ben Cochran, Distinguished Engineer with 20+ years at NVIDIA and AMD, using a Rust engine with MCP integration
- Provides per-state tool restrictions, bash command filtering, edit size limits, and approval gates enforced before models see available actions
- Models above 13B parameters show consistent improvements across families; frontier models also benefit, with Haiku and Sonnet performing above their typical capabilities and Opus solving tasks more reliably with fewer tokens
- A tightly scoped working context at each step outperforms giving models unrestricted access to all tools at once