The Autonomous Request Filter intercepts every message in both directions (inbound from the runner, outbound from the model) and gives you precise control over what goes through, what gets modified, and what gets blocked. No code changes. No agent awareness. The proxy does its job.
Inbound steering modifies the prompt before it reaches the model. Inject context, prepend constraints, append project-specific coding standards, or strip out content you don't want the model to act on.
This is useful for teams. Define a system prompt injection in your ARF policy that adds your team's coding conventions to every request, so no developer has to remember to include them. The agent sees it; the developer doesn't type it.
Inbound steering rules are evaluated in order. Each rule can match on message content, session context, user identity, or time-based conditions. Rules can modify, augment, or reject messages.
Outbound filtering evaluates model completions before they reach the runner. ARF reads the completion stream in real time. If a completion violates policy (a disallowed code pattern, a reference to a forbidden path, suspicious tool call arguments), the stream is interrupted.
Interrupted completions are logged, the session health grade is decremented, and, depending on policy, a human approval prompt is surfaced. The runner sees a clean error response and can retry with a different approach.
This is your last line of defense before the agent acts. The Autonomous Request Filter sees every tool call the model proposes before it executes.
── Completion stream from engine ────────────── chunk[1]: I'll update the config file... chunk[2]: tool_use: bash args: cmd: "rm -rf /etc/nginx/conf.d/*" ── Filter evaluation ────────────────────────── ✗ MATCH: outbound.deny_pattern rule: block-destructive-ops pattern: rm -rf.*/(etc|var|usr|bin) action: BLOCK + INTERRUPT ── Response to runner ───────────────────────── HTTP 451 Unavailable For Policy Reasons { "error": "completion_blocked", "rule": "block-destructive-ops" } ● Session grade: B → C (policy violation logged)
ARF monitors inbound content (files the agent reads, tool call results, web page contents returned to the agent) for injection signatures. Common patterns: instruction overrides ("Ignore previous instructions"), role jailbreaks, and credential exfiltration attempts hidden in data.
When injection is detected, ARF's options range from logging-only to full session halt. Configure the response per rule: sanitize the injected content before it reaches the model, flag for human review, or block the request and alert immediately.