Defense Scope
What Edictum defends against, what it does not, and how it fits alongside OS-level sandboxing, network policies, and LLM safety layers.
Right page if: you need to understand Edictum's threat model -- what it defends against, what is out of scope, and where it fits in a defense-in-depth stack.
Wrong page if: you need fail-closed behavior details (what happens when things break) -- see https://docs.edictum.ai/docs/security/fail-closed. For compliance framework mappings (OWASP Top 10 for LLMs, EU AI Act, SOC 2), see https://docs.edictum.ai/docs/security/compliance.
Gotcha: Edictum cannot undo WRITE/IRREVERSIBLE side effects after execution -- use preconditions to block dangerous writes BEFORE they run. Sandbox path matching uses os.path.realpath() but is subject to TOCTOU race conditions with symlinks created between evaluation and execution.
Edictum enforces rulesets on AI agent tool calls. It sits between the agent's decision to act and the action itself -- a deterministic enforcement point that the agent cannot negotiate, argue with, or bypass. Rulesets are evaluated outside the LLM, on structured data (tool name, arguments, principal), not on natural language.
This page states plainly what that covers and what it does not.
What Edictum Defends Against
Edictum's enforcement model covers threats that manifest as tool calls -- the concrete actions an agent takes in the world.
| Threat | How Edictum handles it |
|---|---|
| Unauthorized tool execution | Preconditions block tool calls that fail rule checks before execution. The tool never runs. |
| Data exfiltration via output | Postconditions configured with action: redact strip sensitive patterns (SSNs, API keys, credentials) from tool results before they reach the agent. |
| Privilege escalation | Principal-based rulesets enforce role-level permissions on every tool call. An intern principal cannot run deploy even if the agent tries. |
| Unauthorized sub-agent spawning | Rulesets can restrict which tools are allowed to create sub-agents and under what conditions. |
| Secret leakage | The built-in deny_sensitive_reads precondition blocks reads of .env, .ssh/, .aws/credentials, key files, and similar paths. Postconditions catch secrets that appear in tool output. |
| Rate abuse | Session rulesets cap per-tool, per-session, and per-attempt counts. An agent stuck in a loop hits the attempt cap and is blocked. |
| Rule tampering | Ruleset YAML is version-hashed. Edictum.reload() atomically swaps the active ruleset state; malformed YAML is rejected and the previous rules stay in effect. |
| Sensitive file access | Sandbox rulesets define allowlist boundaries for file paths. Anything outside the within boundaries is blocked -- regardless of which command accesses it. |
These protections are deterministic. They do not depend on LLM behavior, prompt engineering, or model capabilities. A rule that blocks rm -rf / will block it whether the request comes from GPT-4, Claude, or a compromised prompt.
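To make "deterministic, on structured data" concrete, here is a minimal sketch of a precondition-style check. It is not Edictum's API: the ToolCall dataclass, the check_precondition function, and the two hard-coded rules are hypothetical names chosen for illustration. The point is only that the decision depends on the tool name, arguments, and principal, never on how the request was phrased.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToolCall:
    """Hypothetical structured view of a tool call (not an Edictum type)."""
    tool: str
    principal_role: str
    args: dict = field(default_factory=dict)


def check_precondition(call: ToolCall) -> tuple[bool, str]:
    """Return (allowed, reason). Pure function of the structured call."""
    # Role-based rule: an intern principal may not run deploy.
    if call.tool == "deploy" and call.principal_role == "intern":
        return False, "role 'intern' may not call 'deploy'"
    # Argument-based rule: block one specific destructive command.
    if call.tool == "bash" and call.args.get("command", "").strip() == "rm -rf /":
        return False, "destructive command blocked"
    return True, "allowed"
```

The same input always yields the same decision, regardless of which model or prompt produced the call.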
Out of Scope
Edictum operates at the tool-call layer. Threats that exist at other layers -- the network, the kernel, the LLM's text generation -- are outside its enforcement boundary.
Write side effects already completed
Postconditions run after the tool executes. For READ and PURE tools, postconditions can redact or block the output because the action is reversible (hiding a read result loses nothing). For WRITE and IRREVERSIBLE tools, the action has already happened by the time postconditions evaluate. Edictum falls back to warn because suppressing the result would only remove context the agent needs to understand what it did.
What to use instead: Preconditions and sandbox rulesets to block dangerous writes before execution. For writes that must be allowed but monitored, use postconditions with action: warn and route findings to your audit system.
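The fallback described in this section can be sketched as a small decision function. This is an illustration under stated assumptions, not Edictum's implementation: the SideEffect enum and effective_postcondition_action are hypothetical names, and the action strings simply mirror the redact and warn actions mentioned on this page.

```python
from enum import Enum


class SideEffect(Enum):
    """Hypothetical side-effect classes, named after the ones on this page."""
    PURE = "pure"
    READ = "read"
    WRITE = "write"
    IRREVERSIBLE = "irreversible"


def effective_postcondition_action(side_effect: SideEffect, configured: str) -> str:
    """Return the action actually applied to a tool result."""
    if side_effect in (SideEffect.PURE, SideEffect.READ):
        # Nothing changed in the world, so hiding or redacting output is safe.
        return configured
    # WRITE / IRREVERSIBLE: the effect already happened. Suppressing the
    # result would only hide it from the agent, so fall back to warn.
    return "warn"
```

For example, a redact postcondition on a READ tool stays redact, while the same postcondition on a WRITE tool degrades to warn.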
Kernel-level sandboxing
Edictum is an in-process library. It evaluates rulesets in the same process as the agent. It does not enforce OS-level isolation -- it cannot prevent a tool from accessing memory, syscalls, or hardware resources that the process has access to.
What to use instead: gVisor, Firecracker, containers with seccomp profiles, or AppArmor/SELinux policies. These enforce boundaries at the kernel level where the process cannot escape.
Hallucinated text content
Edictum enforces rulesets on actions (tool calls with structured arguments), not on words (the LLM's text output). If an agent hallucinates incorrect information in a text response without calling a tool, Edictum has no enforcement point.
What to use instead: LLM output filters, retrieval-augmented generation (RAG) for factual grounding, or content moderation APIs that operate on the text generation layer.
Network-level attacks
Edictum does not inspect network traffic, enforce TLS, or block connections. If a tool makes an HTTP request, Edictum can check the domain via sandbox rulesets (allows.domains), but it cannot enforce network-level properties like encryption in transit, certificate pinning, or packet inspection.
What to use instead: Network policies (Kubernetes NetworkPolicy, cloud security groups), service meshes (Istio, Linkerd), or Web Application Firewalls (WAFs).
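A rough sketch of what a domain allowlist check can and cannot see. The domain_allowed function and the example patterns are hypothetical and only approximate the fnmatch-style matching described under Known Technical Limitations; they are not Edictum's code. Note that the check looks at the hostname only, so a plain-HTTP URL to an allowed domain passes.

```python
from fnmatch import fnmatch
from urllib.parse import urlsplit


def domain_allowed(url: str, allowed_patterns: list[str]) -> bool:
    """Return True if the URL's hostname matches any allowlist pattern."""
    host = urlsplit(url).hostname  # lowercased by urlsplit; None if absent
    if host is None:
        return False
    return any(fnmatch(host, pattern) for pattern in allowed_patterns)


# Hypothetical allowlist for illustration.
ALLOWED = ["api.example.com", "*.internal.example.com"]

https_ok = domain_allowed("https://api.example.com/v1", ALLOWED)
http_ok = domain_allowed("http://api.example.com/v1", ALLOWED)  # no TLS, still allowed
other = domain_allowed("https://evil.example.net/x", ALLOWED)
```

Enforcing the scheme, certificates, or anything below the URL string is the job of the network layer.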
Prompt injection on text responses
Edictum enforces rulesets on tool-call execution, not on text that flows between the user and the LLM. If a prompt injection causes the agent to produce harmful text without calling a tool, Edictum does not intercept it. If the injection causes the agent to call a tool, Edictum evaluates that tool call against rulesets -- the injection's influence stops at the enforcement point.
What to use instead: Input sanitization, prompt engineering defenses, LLM-layer safety filters.
Known Technical Limitations
Beyond the architectural boundaries above, Edictum has specific technical limitations that users should be aware of.
String-based boundary matching
Sandbox rulesets match file paths and domains using string prefix comparison and fnmatch patterns. This is not semantic analysis. A path like /workspace/../etc/shadow is resolved via os.path.realpath() before comparison (so that traversal is caught), but the matching itself operates on strings, not on filesystem semantics.
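As an illustration of resolve-then-compare matching, the sketch below resolves a path with os.path.realpath() and then does a string prefix comparison against each boundary. The is_within function is a hypothetical helper, not Edictum's code; in particular, appending a path separator before comparing (so /workspace-evil does not match /workspace) is an assumption about how a careful prefix check would be written.

```python
import os


def is_within(path: str, boundaries: list[str]) -> bool:
    """Return True if the resolved path lies inside any boundary directory."""
    resolved = os.path.realpath(path)  # collapses ".." and follows symlinks
    for boundary in boundaries:
        root = os.path.realpath(boundary)
        # String comparison only: equal to the root, or prefixed by root + separator.
        if resolved == root or resolved.startswith(root.rstrip(os.sep) + os.sep):
            return True
    return False
```

With a /workspace boundary, /workspace/../etc/shadow resolves outside the boundary before the comparison and is rejected, while /workspace/src/main.py is accepted.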
Heuristic command parsing
The bash command classifier extracts the first whitespace-delimited token from a command string to identify the command name. This is heuristic, not AST-based. Complex shell constructs like VAR=val command, command substitution ($(cmd)), or chained commands (cmd1 && cmd2) may not be fully parsed. The first token is checked against allows.commands, but subsequent tokens in a chain are not individually validated.
The shell operator detection checks for ${, $(, |, ;, &&, ||, backticks, and other constructs, but does not detect bare $VAR expansions (without braces). A command like echo $AWS_SECRET_ACCESS_KEY classifies as READ because echo is in the read allowlist and $ without { or ( does not trigger operator detection. This means environment variable values can be exfiltrated through commands classified as safe for postcondition purposes. Use sandbox command allowlists (allows.commands) rather than relying on side-effect classification for security-critical enforcement.
Sandbox rulesets mitigate this by also checking file paths extracted from the full argument string against within/not_within boundaries -- even if the command token is not fully parsed, the path restrictions still apply.
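The gaps described above are easy to reproduce with a toy classifier. This sketch is an approximation for illustration only: first_token, has_shell_operator, and the SHELL_OPERATORS tuple are hypothetical, and the tuple lists only the operators named on this page, not the full set Edictum checks.

```python
# Operators named on this page; the real detector checks other constructs too.
SHELL_OPERATORS = ("${", "$(", "|", ";", "&&", "||", "`")


def first_token(command: str) -> str:
    """Return the first whitespace-delimited token, or '' for an empty command."""
    parts = command.split()
    return parts[0] if parts else ""


def has_shell_operator(command: str) -> bool:
    """Substring check for shell operators; bare $VAR is not in the list."""
    return any(op in command for op in SHELL_OPERATORS)


bare = "echo $AWS_SECRET_ACCESS_KEY"
braced = "echo ${AWS_SECRET_ACCESS_KEY}"
chained = "ls && rm -rf /"

# Bare expansion: the command name is echo and no operator is detected.
bare_result = (first_token(bare), has_shell_operator(bare))
# Braced expansion: same command name, but "${" triggers detection.
braced_result = (first_token(braced), has_shell_operator(braced))
# Chain: only "ls" is seen as the command name, though "&&" is detected.
chained_result = (first_token(chained), has_shell_operator(chained))
```

An environment-variable prefix behaves similarly: first_token("VAR=val command") returns the assignment, not the command.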
TOCTOU race conditions
Sandbox rulesets resolve paths with os.path.realpath() at evaluation time. A symlink created after Edictum evaluates the path but before the tool actually executes could point to a different target. This race window is inherent to application-level enforcement. See sandbox rules: known limitations for the full list of resolution edge cases.
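The race window can be demonstrated in a few lines. This is a self-contained sketch that runs entirely inside a temporary directory; demo_toctou and the file names are hypothetical, and the swap is performed sequentially here, whereas a real attacker would race a concurrent process. It assumes a platform where os.symlink is permitted.

```python
import os
import tempfile


def demo_toctou() -> tuple[bool, bool]:
    """Return (inside_at_check, inside_at_use) for a path swapped to a symlink."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp = os.path.realpath(tmp)
        workspace = os.path.join(tmp, "workspace")
        outside = os.path.join(tmp, "outside")
        os.mkdir(workspace)
        os.mkdir(outside)

        secret = os.path.join(outside, "secret.txt")
        with open(secret, "w") as f:
            f.write("secret")
        target = os.path.join(workspace, "notes.txt")
        with open(target, "w") as f:
            f.write("notes")

        def inside(path: str) -> bool:
            return os.path.realpath(path).startswith(workspace + os.sep)

        # Time of check: the path resolves inside the workspace.
        at_check = inside(target)

        # Between check and use, the file is replaced by a symlink.
        os.remove(target)
        os.symlink(secret, target)

        # Time of use: the same path string now resolves outside.
        at_use = inside(target)
        return at_check, at_use
```

The path string never changes between the two calls; only what it resolves to does, which is why the evaluation-time result cannot be trusted at execution time.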
Defense in Depth
Edictum is one layer in a defense-in-depth stack. It covers the tool-call layer -- the enforcement point between agent decisions and real-world actions. Other layers cover other threats.
| Layer | Covers | Examples |
|---|---|---|
| LLM safety | Text generation, harmful content, jailbreaks | Model safety training, output filters, content moderation APIs |
| Edictum | Tool-call enforcement, rule evaluation, decision log | Preconditions, postconditions, sandbox rulesets, session limits |
| OS sandboxing | Process isolation, syscall filtering, filesystem namespaces | gVisor, Firecracker, seccomp, AppArmor |
| Network policies | Traffic filtering, domain restrictions, encryption | Kubernetes NetworkPolicy, security groups, service meshes |
| Authentication | Identity verification, credential management | OAuth, API key rotation, certificate-based auth |
Edictum accepts a Principal but does not authenticate it. Your application provides the principal -- Edictum enforces rulesets based on it. The authentication layer is upstream.
The strongest deployments use all of these together. Edictum catches the tool-call-level threats that OS sandboxing and network policies cannot see (because they operate below the application layer). OS sandboxing catches the kernel-level escapes that Edictum cannot enforce (because it is in-process). Neither replaces the other.
Next Steps
- Fail-closed guarantees -- what happens when things go wrong
- Sandbox rulesets -- allowlist boundaries for file paths, commands, and domains
- Adversarial testing -- testing rule bypasses
- Pipeline architecture -- the full enforcement pipeline
- Compliance -- OWASP Top 10 for LLMs, OWASP Top 10 for Agentic AI, EU AI Act, and SOC 2 mappings