Researchers have developed ClawGuard, a runtime security framework that protects tool-augmented LLM agents against indirect prompt injection attacks through deterministic rule enforcement at tool-call boundaries. The system, created by Wei Zhao, Zhe Li, Peixin Zhang, and Jun Sun, achieves robust protection across five state-of-the-art language models without requiring safety-specific fine-tuning or architectural modifications.
Framework Enforces User-Confirmed Rules at Every Tool-Call Boundary
ClawGuard addresses a critical vulnerability in tool-augmented LLM agents where adversaries embed malicious instructions within tool-returned content, which agents incorporate into their conversation history as trusted observations. The framework automatically derives task-specific access constraints from the user's stated objective before any external tool invocation, blocking three attack channels: web and local content injection, MCP server injection, and skill file injection.
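The article does not reproduce the authors' rule format. As a hedged illustration only, a task-specific access constraint could take the shape of a per-task allowlist of tools and target domains with everything else denied by default; every name below is hypothetical and not drawn from the ClawGuard codebase:

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse

# Hypothetical sketch of a task-scoped access policy; the class and
# field names are illustrative, not the ClawGuard implementation.
@dataclass
class AccessPolicy:
    allowed_tools: set                      # tools the stated task needs
    allowed_domains: set = field(default_factory=set)

    def permits(self, tool, target=""):
        # Default-deny: anything outside the task allowlist is blocked.
        if tool not in self.allowed_tools:
            return False
        if not self.allowed_domains:
            return True
        host = urlparse(target).hostname or ""
        return any(host == d or host.endswith("." + d)
                   for d in self.allowed_domains)

# A "summarize this article" task needs to fetch one site, nothing more.
policy = AccessPolicy(allowed_tools={"fetch_url"},
                      allowed_domains={"example.com"})
assert policy.permits("fetch_url", "https://example.com/post")
assert not policy.permits("send_email")  # injected exfiltration attempt
```

Because the policy is fixed before any tool runs, content returned by a tool cannot widen it, which is the property the framework relies on.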
Unlike alignment-based defenses, which can be bypassed, ClawGuard intercepts adversarial tool calls before any real-world effect is produced. The researchers describe this as transforming unreliable, alignment-dependent defense into a deterministic, auditable mechanism.
Testing Spans Three Benchmarks Across Multiple Models
The research team evaluated ClawGuard under the following conditions:
- Five state-of-the-art language models tested
- Three benchmarks: AgentDojo, SkillInject, and MCPSecBench
- Robust protection achieved without compromising agent utility
- No requirement for model modification or infrastructure changes
The deterministic approach contrasts sharply with alignment-based defenses. By catching malicious tool calls at execution time rather than relying on the model's safety training, ClawGuard establishes a security boundary that, by design, cannot be circumvented through clever prompting alone.
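The execution-time gate described above can be sketched as a wrapper through which every tool call must pass. This is a minimal, self-contained illustration of the pattern, not the authors' code; the tool names and registry are invented for the example:

```python
# Illustrative execution-time gate: a deterministic check runs before
# the tool does, so a model that has been tricked into emitting a
# disallowed call still produces no real-world effect.
ALLOWED = {"fetch_url", "read_file"}  # hypothetically derived per task

TOOLS = {
    "fetch_url": lambda url: f"<contents of {url}>",
    "send_email": lambda to, body: f"sent to {to}",  # side-effecting
}

def guarded_call(tool_name, **kwargs):
    if tool_name not in ALLOWED:
        # Interception happens here, before the tool executes: the
        # block is enforced by code, not by the model's safety training.
        return {"blocked": True, "tool": tool_name}
    return {"blocked": False, "result": TOOLS[tool_name](**kwargs)}

print(guarded_call("send_email", to="a@b.com", body="exfil"))
# {'blocked': True, 'tool': 'send_email'}
```

The key design point is that the check is ordinary control flow: its outcome depends only on the policy and the call's arguments, never on how persuasively an injected instruction was phrased.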
Research Contributes to Emerging Agent Security Focus
The publication arrives amid growing attention to the security of tool-using agents, reflected in similarly named frameworks in contemporaneous research. The work establishes deterministic tool-call boundary enforcement as an effective defense mechanism for secure agentic AI systems. The researchers have made their code publicly available on GitHub, enabling practitioners to adopt the framework in production systems.
Key Takeaways
- ClawGuard enforces deterministic rules at tool-call boundaries to block indirect prompt injection attacks
- The framework automatically derives task-specific access constraints from user objectives before tool invocation
- Testing across five state-of-the-art models and three benchmarks shows robust protection without compromising utility
- Unlike alignment-based defenses, ClawGuard provides auditable protection that cannot be bypassed through prompting
- The system requires no model fine-tuning or architectural changes to implement