Researchers have developed ClawGuard, a runtime security framework that protects tool-augmented LLM agents against indirect prompt injection attacks through deterministic rule enforcement at tool-call boundaries. The system, created by Wei Zhao, Zhe Li, Peixin Zhang, and Jun Sun, achieves robust protection across five state-of-the-art language models without requiring safety-specific fine-tuning or architectural modifications.
Framework Enforces User-Confirmed Rules at Every Tool-Call Boundary
ClawGuard addresses a critical vulnerability in tool-augmented LLM agents where adversaries embed malicious instructions within tool-returned content, which agents incorporate into their conversation history as trusted observations. The framework automatically derives task-specific access constraints from the user's stated objective before any external tool invocation, blocking three attack channels: web and local content injection, MCP server injection, and skill file injection.
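The article does not reproduce the authors' rule format. As a hedged illustration only, a task-specific access constraint could take the shape of a per-task allowlist of tools and target domains with everything else denied by default; every name below is hypothetical and not drawn from the ClawGuard codebase:

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse

# Hypothetical sketch of a task-scoped access policy; the class and
# field names are illustrative, not the ClawGuard implementation.
@dataclass
class AccessPolicy:
    allowed_tools: set                      # tools the stated task needs
    allowed_domains: set = field(default_factory=set)

    def permits(self, tool, target=""):
        # Default-deny: anything outside the task allowlist is blocked.
        if tool not in self.allowed_tools:
            return False
        if not self.allowed_domains:
            return True
        host = urlparse(target).hostname or ""
        return any(host == d or host.endswith("." + d)
                   for d in self.allowed_domains)

# A "summarize this article" task needs to fetch one site, nothing more.
policy = AccessPolicy(allowed_tools={"fetch_url"},
                      allowed_domains={"example.com"})
assert policy.permits("fetch_url", "https://example.com/post")
assert not policy.permits("send_email")  # injected exfiltration attempt
```

Because the policy is fixed before any tool runs, content returned by a tool cannot widen it, which is the property the framework relies on.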
Unlike alignment-based defenses, which can be bypassed, ClawGuard intercepts adversarial tool calls before any real-world effect is produced. The researchers describe this as transforming unreliable, alignment-dependent defense into a deterministic, auditable mechanism.
Testing Spans Three Benchmarks Across Multiple Models
The research team evaluated ClawGuard under the following conditions:
- Five state-of-the-art language models tested
- Three benchmarks: AgentDojo, SkillInject, and MCPSecBench
- Robust protection achieved without compromising agent utility
- No requirement for model modification or infrastructure changes
The deterministic approach contrasts sharply with alignment-based defenses. By catching malicious tool calls at execution time rather than relying on the model's safety training, ClawGuard establishes a security boundary that, by design, cannot be circumvented through clever prompting alone.
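The execution-time gate described above can be sketched as a wrapper through which every tool call must pass. This is a minimal, self-contained illustration of the pattern, not the authors' code; the tool names and registry are invented for the example:

```python
# Illustrative execution-time gate: a deterministic check runs before
# the tool does, so a model that has been tricked into emitting a
# disallowed call still produces no real-world effect.
ALLOWED = {"fetch_url", "read_file"}  # hypothetically derived per task

TOOLS = {
    "fetch_url": lambda url: f"<contents of {url}>",
    "send_email": lambda to, body: f"sent to {to}",  # side-effecting
}

def guarded_call(tool_name, **kwargs):
    if tool_name not in ALLOWED:
        # Interception happens here, before the tool executes: the
        # block is enforced by code, not by the model's safety training.
        return {"blocked": True, "tool": tool_name}
    return {"blocked": False, "result": TOOLS[tool_name](**kwargs)}

print(guarded_call("send_email", to="a@b.com", body="exfil"))
# {'blocked': True, 'tool': 'send_email'}
```

The key design point is that the check is ordinary control flow: its outcome depends only on the policy and the call's arguments, never on how persuasively an injected instruction was phrased.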
Research Contributes to Emerging Agent Security Focus
The publication arrives amid growing attention to the security of tool-using agents, reflected in similarly named frameworks in contemporaneous research. The work establishes deterministic tool-call boundary enforcement as an effective defense mechanism for secure agentic AI systems. The researchers have made their code publicly available on GitHub, enabling practitioners to adopt the framework in production systems.
Key Takeaways
- ClawGuard enforces deterministic rules at tool-call boundaries to block indirect prompt injection attacks
- The framework automatically derives task-specific access constraints from user objectives before tool invocation
- Testing across five state-of-the-art models and three benchmarks shows robust protection without compromising utility
- Unlike alignment-based defenses, ClawGuard provides auditable protection that cannot be bypassed through prompting
- The system requires no model fine-tuning or architectural changes to implement