Hook-Based Safety Guard Rails for Autonomous Code Agents
Problem
Autonomous code agents running unattended can execute destructive commands (rm -rf, git reset --hard), exhaust their context window without saving state, leak secrets via git push, or silently produce syntax errors that cascade into later failures.
These failures share a pattern: the agent took an action that a simple automated check would have caught. The agent itself can't reliably self-police because it operates within the same context that produces the mistakes.
Solution
Use the agent framework's hook system (PreToolUse / PostToolUse events) to inject safety checks that run outside the agent's reasoning loop. Each hook is a small shell script that inspects tool inputs or outputs and then warns, blocks, or logs.
Four guard rails cover the most common failure modes:
- Dangerous command blocker (PreToolUse: Bash): Pattern-matches commands for rm -rf, git reset --hard, DROP TABLE, etc. Blocks or warns before execution.
- Syntax checker (PostToolUse: Edit/Write): After every file edit, runs the appropriate syntax check (python -m py_compile, bash -n, jq empty). Catches errors immediately instead of 50 tool calls later.
- Context window monitor (PostToolUse: all): Counts tool calls as a proxy for context consumption. Issues graduated warnings (soft → hard → critical) and auto-generates a checkpoint file when the remaining context is dangerously low.
- Autonomous decision enforcer (PreToolUse: AskUserQuestion): Blocks the agent from asking "should I continue?" during unattended sessions. Forces the agent to decide and log its uncertainty instead.
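#!/bin/bash
# Example: dangerous command blocker (simplified)
# Read the hook payload (JSON) from stdin and extract the Bash command.
INPUT="$(cat)"
CMD="$(printf '%s' "$INPUT" | jq -r '.tool_input.command // empty')"
# POSIX character classes are used instead of \s, which is a GNU grep extension.
if printf '%s\n' "$CMD" | grep -qE 'rm[[:space:]]+-rf|git[[:space:]]+reset[[:space:]]+--hard|git[[:space:]]+clean[[:space:]]+-fd'; then
  # Write the reason to stderr: Claude Code passes stderr back to the agent on exit code 2.
  echo "BLOCKED: Destructive command detected: $(printf '%s' "$CMD" | head -c 100)" >&2
  exit 2 # exit code 2 = block the tool call
fi
exit 0 # 0 = allow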
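The syntax checker from the list above can be sketched in the same style. This is a minimal sketch, not the referenced implementation: the function names check_syntax and run_syntax_hook are hypothetical, and it assumes the edited file's path arrives at .tool_input.file_path in the hook payload. Checkers that are not installed are skipped rather than treated as failures.

```shell
#!/bin/bash
# Sketch of a PostToolUse syntax check (function names are hypothetical).

# Run a syntax-only check chosen by file extension.
# Returns 0 if the file parses, or if no checker applies or is installed.
check_syntax() {
  local file="$1"
  case "$file" in
    *.py)   command -v python3 >/dev/null 2>&1 || return 0
            python3 -m py_compile "$file" ;;
    *.sh)   bash -n "$file" ;;
    *.json) command -v jq >/dev/null 2>&1 || return 0
            jq empty "$file" ;;
    *)      return 0 ;;
  esac
}

# Hook entry point: read the JSON payload from stdin, check the edited file,
# and print a warning on stderr if the check fails.
# Assumption: the payload carries the path at .tool_input.file_path.
run_syntax_hook() {
  local file
  file="$(jq -r '.tool_input.file_path // empty')"
  if [ -z "$file" ] || [ ! -f "$file" ]; then
    return 0
  fi
  if ! check_syntax "$file"; then
    echo "WARNING: syntax check failed for $file" >&2
  fi
  return 0
}
```

A hook script would end with a call to run_syntax_hook. Which exit code makes the warning visible to the agent depends on the framework, so the sketch leaves that choice to the caller.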
How to use it
- Register hooks in the agent's settings file (e.g., settings.json for Claude Code).
- Each hook is a standalone shell script with no dependencies on the agent's internal state.
- PreToolUse hooks receive the tool name and input as JSON on stdin. Return exit code 2 to block.
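- PostToolUse hooks receive the tool name, input, and output. The tool has already run by then, so these hooks cannot block the action and can only report problems. Return exit code 0 when there is nothing to report (in Claude Code, exit code 2 feeds the hook's stderr back to the agent).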
- Start with the dangerous command blocker (highest impact, lowest effort).
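The registration and stdin/exit-code steps above can be sketched as two small helpers. Both function names are hypothetical. The JSON shape printed by hook_settings_fragment reflects Claude Code's hooks settings as understood at the time of writing, so treat it as an assumption and check it against the current documentation. try_hook pipes a sample payload to a hook script and prints its exit code, which lets you verify a hook before the agent ever runs it.

```shell
#!/bin/bash
# Sketch: print a settings fragment and dry-run a hook (names are hypothetical).

# Print a settings fragment that registers one PreToolUse hook for the Bash tool.
# Assumption: the hook path contains no characters that need JSON escaping.
hook_settings_fragment() {
  local hook_path="$1"
  cat <<EOF
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          { "type": "command", "command": "$hook_path" }
        ]
      }
    ]
  }
}
EOF
}

# Pipe a sample JSON payload to a hook script and print the script's exit code.
try_hook() {
  local hook="$1" payload="$2" status=0
  printf '%s' "$payload" | bash "$hook" >/dev/null 2>&1 || status=$?
  echo "$status"
}
```

For example, try_hook ./block-dangerous.sh '{"tool_name":"Bash","tool_input":{"command":"rm -rf /tmp/x"}}' should print 2 for the blocker shown earlier (the script path here is illustrative). Merge the printed fragment into the settings file by hand rather than overwriting existing settings.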
Trade-offs
- Pros:
- Runs outside the agent's context — immune to prompt injection or reasoning failures
- Language-agnostic (shell scripts work with any agent framework that supports hooks)
- Each hook is independently deployable — add one at a time
- Negligible performance overhead for the agent (simple hooks typically run in milliseconds)
- Cons/Considerations:
- Pattern matching is inherently incomplete — creative destructive commands may slip through
- Context monitoring via tool-call counting is a proxy, not exact measurement
- Hooks can't prevent the agent from reasoning about dangerous actions, only from executing them
- Requires the agent framework to support PreToolUse/PostToolUse events
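To make the tool-call proxy concrete, here is a minimal sketch of the counter behind a context window monitor. The function names and the three thresholds are hypothetical and would need tuning per model and workload. The sketch also does not lock the counter file, so hooks that run concurrently could miscount.

```shell
#!/bin/bash
# Sketch of a tool-call counter. Thresholds and names are hypothetical.
SOFT_LIMIT=50
HARD_LIMIT=100
CRITICAL_LIMIT=150

# Increment the counter stored in a file and print the new count.
record_tool_call() {
  local counter_file="$1" count=0
  if [ -f "$counter_file" ]; then
    count="$(cat "$counter_file")"
  fi
  count=$((count + 1))
  echo "$count" > "$counter_file"
  echo "$count"
}

# Map a tool-call count to a warning level.
context_level() {
  local count="$1"
  if   [ "$count" -ge "$CRITICAL_LIMIT" ]; then echo "critical"
  elif [ "$count" -ge "$HARD_LIMIT" ];     then echo "hard"
  elif [ "$count" -ge "$SOFT_LIMIT" ];     then echo "soft"
  else echo "ok"
  fi
}
```

A PostToolUse hook could call record_tool_call on every event and write a checkpoint file once context_level reports critical. The count stays a proxy because fifty small reads and fifty large file reads consume very different amounts of context.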
References
- Claude Code Hooks documentation — Official hook system for Claude Code
- claude-code-ops-starter — Open-source implementation of these 4 hooks with a risk-score diagnostic
- Replit AI deletes production database — Real-world example of why guard rails matter