Defense Scope

What Edictum defends against, what it does not, and how it fits alongside OS-level sandboxing, network policies, and LLM safety layers.

Edictum enforces rulesets on AI agent tool calls. It sits between the agent's decision to act and the action itself -- a deterministic enforcement point that the agent cannot negotiate, argue with, or bypass. Rulesets are evaluated outside the LLM, on structured data (tool name, arguments, principal), not on natural language.

This page is honest about what that covers and what it does not.

What Edictum Defends Against

Edictum's enforcement model covers threats that manifest as tool calls -- the concrete actions an agent takes in the world.

Threat	How Edictum handles it
Unauthorized tool execution	Preconditions block tool calls that fail rule checks before execution. The tool never runs.
Data exfiltration via output	Postconditions with `action: redact` strip sensitive patterns (SSNs, API keys, credentials) from tool results before they reach the agent.
Privilege escalation	Principal-based rulesets enforce role-level permissions on every tool call. An `intern` principal cannot run `deploy` even if the agent tries.
Unauthorized sub-agent spawning	Rulesets can restrict which tools are allowed to create sub-agents and under what conditions.
Secret leakage	The built-in `deny_sensitive_reads` precondition blocks reads of `.env`, `.ssh/`, `.aws/credentials`, key files, and similar paths. Postconditions catch secrets that appear in tool output.
Rate abuse	Session rulesets cap per-tool, per-session, and per-attempt counts. An agent stuck in a loop hits the attempt cap and is blocked.
Rule tampering	Ruleset YAML is version-hashed. `Edictum.reload()` atomically swaps the active ruleset state; malformed YAML is rejected and the previous rules stay in effect.
Sensitive file access	Sandbox rulesets define allowlist boundaries for file paths. Anything outside `within` is blocked -- regardless of which command accesses it.

These protections are deterministic. They do not depend on LLM behavior, prompt engineering, or model capabilities. A rule that blocks `rm -rf /` blocks it whether the call originates from GPT-4, Claude, or an agent following a compromised prompt.
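As a rough illustration of what a check on structured data (rather than natural language) looks like, here is a minimal sketch. The `ToolCall` class, the `ALLOWED_ROLES` table, and the `precondition` function are hypothetical names invented for this example; they are not Edictum's API or ruleset format.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToolCall:
    """Structured view of a tool call: the only input the check sees."""
    tool: str
    args: dict = field(default_factory=dict)
    principal_role: str = "anonymous"


# Hypothetical rule table (an assumption for this sketch): tool -> allowed roles.
ALLOWED_ROLES = {
    "deploy": {"admin", "release-engineer"},
    "read_file": {"admin", "release-engineer", "intern"},
}


def precondition(call: ToolCall) -> tuple[bool, str]:
    """Return (allowed, reason). Tools without a rule are blocked."""
    roles = ALLOWED_ROLES.get(call.tool)
    if roles is None:
        return False, f"no rule allows tool {call.tool!r}"
    if call.principal_role not in roles:
        return False, f"role {call.principal_role!r} may not call {call.tool!r}"
    return True, "allowed"


# The same input always yields the same decision, whatever text preceded it.
print(precondition(ToolCall("deploy", {"env": "prod"}, "intern")))
```

Because the function only reads the tool name, arguments, and principal, nothing the model writes in its text output can change the result.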

Out of Scope

Edictum operates at the tool-call layer. Threats that exist at other layers -- the network, the kernel, the LLM's text generation -- are outside its enforcement boundary.

Write side effects already completed

Postconditions run after the tool executes. For READ and PURE tools, postconditions can redact or block the output because the tool changed nothing: hiding a read result loses nothing. For WRITE and IRREVERSIBLE tools, the action has already happened by the time postconditions evaluate, so Edictum falls back to `warn`. Suppressing the result would undo nothing and would only remove context the agent needs to understand what it did.

What to use instead: Preconditions and sandbox rulesets to block dangerous writes before execution. For writes that must be allowed but monitored, use postconditions with `action: warn` and route findings to your audit system.
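The fallback described in this section can be pictured as a small decision function. This is only a sketch of the behavior as stated here, not Edictum's implementation: the `SideEffect` enum and `effective_action` function are hypothetical, and the action strings other than `redact` and `warn` are assumptions.

```python
from enum import Enum


class SideEffect(Enum):
    PURE = "pure"
    READ = "read"
    WRITE = "write"
    IRREVERSIBLE = "irreversible"


def effective_action(side_effect: SideEffect, requested: str) -> str:
    """Map the action a postcondition asks for to the one that takes effect."""
    if requested == "warn":
        return "warn"
    if side_effect in (SideEffect.PURE, SideEffect.READ):
        # Nothing was changed, so withholding or redacting output is safe.
        return requested
    # The write already happened; hiding the result cannot undo it.
    return "warn"


print(effective_action(SideEffect.READ, "redact"))
print(effective_action(SideEffect.WRITE, "redact"))
```

The asymmetry is the point: a postcondition is a strong control for reads and only an alerting control for writes.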

Kernel-level sandboxing

Edictum is an in-process library. It evaluates rulesets in the same process as the agent. It does not enforce OS-level isolation -- it cannot prevent a tool from accessing memory, syscalls, or hardware resources that the process has access to.

What to use instead: gVisor, Firecracker, containers with seccomp profiles, or AppArmor/SELinux policies. These enforce boundaries at the kernel level where the process cannot escape.

Hallucinated text content

Edictum enforces rulesets on actions (tool calls with structured arguments), not on words (the LLM's text output). If an agent hallucinates incorrect information in a text response without calling a tool, Edictum has no enforcement point.

What to use instead: LLM output filters, retrieval-augmented generation (RAG) for factual grounding, or content moderation APIs that operate on the text generation layer.

Network-level attacks

Edictum does not inspect network traffic, enforce TLS, or block connections. If a tool makes an HTTP request, Edictum can check the domain via sandbox rulesets (`allows.domains`), but it cannot enforce network-level properties like encryption in transit, certificate pinning, or packet inspection.
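To make the distinction concrete, the sketch below checks a URL's hostname against an allowlist and nothing else. It assumes `fnmatch`-style patterns for domains; the `ALLOWED_DOMAINS` list and `domain_allowed` function are hypothetical and do not reflect how Edictum parses `allows.domains`.

```python
from fnmatch import fnmatch
from urllib.parse import urlsplit

# Hypothetical allowlist for this example.
ALLOWED_DOMAINS = ["api.example.com", "*.internal.example.com"]


def domain_allowed(url: str, patterns=ALLOWED_DOMAINS) -> bool:
    """Check only the hostname; scheme, certificates, and payload are ignored."""
    host = urlsplit(url).hostname  # lowercased by urlsplit, None if absent
    if host is None:
        return False
    return any(fnmatch(host, pattern) for pattern in patterns)


# A plaintext request to an allowed domain passes the domain check.
print(domain_allowed("http://api.example.com/v1/users"))
print(domain_allowed("https://evil.example.net/"))
```

A check of this kind answers "which host?" but says nothing about whether the connection is encrypted or where the packets actually go, which is why a network-level control is still needed.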

What to use instead: Network policies (Kubernetes NetworkPolicy, cloud security groups), service meshes (Istio, Linkerd), or Web Application Firewalls (WAFs).

Prompt injection on text responses

Edictum enforces rulesets on tool-call execution, not on text that flows between the user and the LLM. If a prompt injection causes the agent to produce harmful text without calling a tool, Edictum does not intercept it. If the injection causes the agent to call a tool, Edictum evaluates that tool call against rulesets -- the injection's influence stops at the enforcement point.

What to use instead: Input sanitization, prompt engineering defenses, LLM-layer safety filters.

Known Technical Limitations

Beyond the architectural boundaries above, Edictum has specific technical limitations that users should be aware of.

String-based boundary matching

Sandbox rulesets match file paths and domains using string prefix comparison and `fnmatch` patterns. This is not semantic analysis. A path like `/workspace/../etc/shadow` is resolved via `os.path.realpath()` before comparison (so that traversal is caught), but the matching itself operates on strings, not on filesystem semantics.
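A minimal sketch of resolve-then-compare matching follows. The `within_boundary` function is a hypothetical helper written for this page, not Edictum's code; it shows why resolution catches `..` traversal and why the prefix comparison should end on a path separator.

```python
import os


def within_boundary(path: str, root: str) -> bool:
    """True if path, after '..' and symlink resolution, lies under root."""
    resolved = os.path.realpath(path)
    resolved_root = os.path.realpath(root)
    # Compare on a separator boundary so that /workspace-old
    # is not treated as being inside /workspace.
    prefix = resolved_root.rstrip(os.sep) + os.sep
    return resolved == resolved_root or resolved.startswith(prefix)


print(within_boundary("/workspace/src/app.py", "/workspace"))
print(within_boundary("/workspace/../etc/shadow", "/workspace"))
print(within_boundary("/workspace-old/app.py", "/workspace"))
```

The result is still a string comparison. It reflects the filesystem as it looked when `os.path.realpath()` ran, and it knows nothing about bind mounts, hard links, or other ways two different strings can name the same file.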

Heuristic command parsing

The bash command classifier extracts the first whitespace-delimited token from a command string to identify the command name. This is heuristic, not AST-based. Complex shell constructs like `VAR=val command`, command substitution (`$(cmd)`), or chained commands (`cmd1 && cmd2`) may not be fully parsed. The first token is checked against `allows.commands`, but subsequent tokens in a chain are not individually validated.

The shell operator detection checks for `${`, `$(`, `|`, `;`, `&&`, `||`, backticks, and other constructs, but does not detect bare `$VAR` expansions (without braces). A command like `echo $AWS_SECRET_ACCESS_KEY` classifies as READ because `echo` is in the read allowlist and `$` without `{` or `(` does not trigger operator detection. This means environment variable values can be exfiltrated through commands classified as safe for postcondition purposes. Use sandbox command allowlists (`allows.commands`) rather than relying on side-effect classification for security-critical enforcement.

Sandbox rulesets partly mitigate these parsing gaps by also checking file paths extracted from the full argument string against `within`/`not_within` boundaries -- even if the command token is not fully parsed, the path restrictions still apply.
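The gaps described in this section are easy to reproduce with a toy classifier. The sketch below is an approximation built only from the description above: `SHELL_OPERATORS`, `READ_COMMANDS`, `first_token`, and `classify` are hypothetical, the operator list covers only the constructs named here, and the `UNCLASSIFIED` label is an assumption about what happens when an operator is found.

```python
import re

# Operators named in the text: ${  $(  |  ;  &&  ||  and backticks.
# The single "|" alternative also matches "||".
SHELL_OPERATORS = re.compile(r"\$\{|\$\(|\||;|&&|`")

# Hypothetical read allowlist for this example.
READ_COMMANDS = {"echo", "cat", "ls"}


def first_token(command: str) -> str:
    """Return the first whitespace-delimited token, or '' if there is none."""
    parts = command.split()
    return parts[0] if parts else ""


def classify(command: str) -> str:
    """Classify by first token unless a known shell operator is present."""
    if SHELL_OPERATORS.search(command):
        return "UNCLASSIFIED"
    return "READ" if first_token(command) in READ_COMMANDS else "UNCLASSIFIED"


print(classify("echo ${AWS_SECRET_ACCESS_KEY}"))  # braces are detected
print(classify("echo $AWS_SECRET_ACCESS_KEY"))    # bare $VAR slips through
print(first_token("VAR=val rm -rf /tmp/x"))       # the assignment, not rm
```

A real shell parser would treat all three inputs differently from this heuristic, which is why the command allowlist and path boundaries should carry the security-critical decisions.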

TOCTOU race conditions

Sandbox rulesets resolve paths with `os.path.realpath()` at evaluation time. A symlink created after Edictum evaluates the path but before the tool actually executes could point to a different target. This race window is inherent to application-level enforcement. See sandbox rules: known limitations for the full list of resolution edge cases.
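The race can be shown in a few lines by swapping a symlink between the check and the use. This is a standalone demonstration under stated assumptions: it does not call Edictum, the `demonstrate_toctou` function and file names are hypothetical, and `os.symlink` may require extra privileges on Windows.

```python
import os
import tempfile


def demonstrate_toctou() -> tuple[str, str]:
    """Return (target resolved at check time, content read at use time)."""
    with tempfile.TemporaryDirectory() as base:
        safe = os.path.join(base, "safe.txt")
        secret = os.path.join(base, "secret.txt")
        link = os.path.join(base, "report.txt")
        # Each file contains its own name so the read is easy to identify.
        for path in (safe, secret):
            with open(path, "w") as fh:
                fh.write(os.path.basename(path))

        os.symlink(safe, link)
        checked = os.path.basename(os.path.realpath(link))  # time of check

        # Something repoints the link inside the race window.
        os.remove(link)
        os.symlink(secret, link)

        with open(link) as fh:                               # time of use
            used = fh.read()
        return checked, used


print(demonstrate_toctou())
```

The check saw `safe.txt`, yet the read returned the contents of `secret.txt`. Closing that window requires enforcement at the point of use, which is the job of the OS-level controls listed on this page.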

Defense in Depth

Edictum is one layer in a defense-in-depth stack. It covers the tool-call layer -- the enforcement point between agent decisions and real-world actions. Other layers cover other threats.

Layer	Covers	Examples
LLM safety	Text generation, harmful content, jailbreaks	Model safety training, output filters, content moderation APIs
Edictum	Tool-call enforcement, rule evaluation, decision log	Preconditions, postconditions, sandbox rulesets, session limits
OS sandboxing	Process isolation, syscall filtering, filesystem namespaces	gVisor, Firecracker, seccomp, AppArmor
Network policies	Traffic filtering, domain restrictions, encryption	Kubernetes NetworkPolicy, security groups, service meshes
Authentication	Identity verification, credential management	OAuth, API key rotation, certificate-based auth

Edictum accepts a Principal but does not authenticate it. Your application provides the principal -- Edictum enforces rulesets based on it. The authentication layer is upstream.

The strongest deployments use all of these together. Edictum catches the tool-call-level threats that OS sandboxing and network policies cannot see (because they operate below the application layer). OS sandboxing catches the kernel-level escapes that Edictum cannot enforce (because it is in-process). Neither replaces the other.

Next Steps

Fail-closed guarantees -- what happens when things go wrong
Sandbox rulesets -- allowlist boundaries for file paths, commands, and domains
Adversarial testing -- testing rule bypasses
Pipeline architecture -- the full enforcement pipeline
Compliance -- OWASP Top 10 for LLMs, OWASP Top 10 for Agentic AI, EU AI Act, and SOC 2 mappings
