Security Overview

This page covers Edictum's threat model, security controls, and compliance posture: what it defends against, what it does not, and the evidence behind each claim.

Edictum is a deterministic enforcement layer between an AI agent's decision to act and the action itself. Rulesets are evaluated outside the LLM, on structured data (tool name, arguments, principal), not on natural language.
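
To make "evaluated on structured data" concrete, here is a minimal sketch of a precondition check that looks only at the tool name, arguments, and principal. The `ToolCall`, `Rule`, and `evaluate` names are hypothetical and are not Edictum's API; the point is only that the decision is a plain function of structured fields, so the same input always yields the same result.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class ToolCall:
    """Structured view of a tool call (hypothetical shape, not Edictum's)."""
    tool: str
    args: dict[str, Any]
    principal_role: str


@dataclass(frozen=True)
class Rule:
    """A precondition: block when `denied_when` returns True for a matching tool."""
    tool: str
    denied_when: Callable[[ToolCall], bool]
    reason: str


def evaluate(call: ToolCall, rules: list[Rule]) -> tuple[str, str]:
    """Return (decision, reason). No natural-language input is consulted."""
    for rule in rules:
        if rule.tool in (call.tool, "*") and rule.denied_when(call):
            return ("block", rule.reason)
    return ("allow", "no rule matched")


rules = [
    Rule(
        tool="delete_record",
        denied_when=lambda c: c.principal_role != "admin",
        reason="admin role required",
    ),
]

decision = evaluate(ToolCall("delete_record", {"id": 7}, "analyst"), rules)
print(decision)  # ('block', 'admin role required')
```

Because the check never reads the model's text, rephrasing the request cannot change the outcome; only the tool name, arguments, or principal can.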

Threat Coverage

Threat	Control	Rule type
Unauthorized tool execution	Preconditions block before execution	`type: pre`
Data exfiltration via output	Postconditions redact sensitive patterns	`type: post`
Privilege escalation	Principal-based rulesets enforce role permissions	`type: pre`
Rate abuse / runaway agents	Session rulesets enforce per-tool, per-session, and per-attempt limits	`type: session`
Path traversal / file access	Sandbox rulesets with `within`/`not_within` boundaries	`type: sandbox`
Secret leakage	Built-in `deny_sensitive_reads()` for `.env`, `.ssh/`, `.aws/credentials`	`type: pre`
Command injection	BashClassifier detects shell operators; sandbox command allowlists	`type: sandbox`
Rule tampering	SHA-256 version hashing; Ed25519 ruleset signing (control plane, server-side; SDK verification planned)	Infrastructure
Unauthorized sub-agent spawning	Rulesets restrict tools that create agents	`type: pre`
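
The SHA-256 version hashing listed for rule tampering can be illustrated with a short sketch. The canonical JSON serialization and the `ruleset_version` helper below are assumptions for illustration; how Edictum actually serializes a ruleset before hashing is not specified here.

```python
import hashlib
import json


def ruleset_version(ruleset: dict) -> str:
    """Hash a canonical serialization so equal rulesets give equal versions.

    Assumption: sorted-key, compact JSON is used as the canonical form.
    """
    canonical = json.dumps(ruleset, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


original = {"rules": [{"type": "pre", "tool": "shell", "action": "block"}]}
tampered = {"rules": [{"type": "pre", "tool": "shell", "action": "allow"}]}

# Any change to the ruleset content changes the version hash.
print(ruleset_version(original) == ruleset_version(tampered))  # False
```

A hash detects modification only if the expected value is stored somewhere the attacker cannot also change, which is why signing is listed alongside it.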

Out of Scope

Edictum operates at the tool-call layer. These threats exist at other layers and require other controls:

Threat	Why out of scope	Use instead
Write side effects already completed	Postconditions run after execution; WRITE actions already done	Preconditions + sandbox to block BEFORE execution
Kernel-level sandboxing	In-process library; no OS isolation	gVisor, Firecracker, seccomp, AppArmor
LLM hallucination (text output)	No enforcement point for text-only responses	Content moderation, RAG, output filters
Network-level attacks	Does not inspect network traffic	Kubernetes NetworkPolicy, service meshes, WAF
Prompt injection on text responses	Only enforces on tool-call execution	Input sanitization, prompt engineering

Adversarial Testing

Four scenarios were tested against GPT-4.1, DeepSeek v3.2, and Qwen3 235B with identical rulesets:

Scenario	Result
Retry after block (agent retries a blocked tool call)	All retries blocked across all models
PII exfiltration (agent tries to leak data via allowed tools)	Caught by postcondition PII patterns
Cross-tool chain (multi-step exfiltration)	PII redacted from output
Role escalation (agent claims higher privilege)	Principal check blocked escalation

DeepSeek v3.2 attempted exfiltration more aggressively than GPT-4.1. Model safety is complementary to rulesets, not a replacement for them.
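
As a rough illustration of how a postcondition can catch PII in tool output regardless of which model produced the call, the sketch below redacts two pattern types. The patterns and the `redact` helper are illustrative assumptions, not Edictum's built-in pattern set.

```python
import re

# Illustrative patterns only; production PII detection needs a broader set.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(output: str) -> tuple[str, list[str]]:
    """Replace each match with a placeholder and report which patterns fired."""
    findings: list[str] = []
    for name, pattern in PII_PATTERNS.items():
        output, count = pattern.subn(f"[REDACTED:{name}]", output)
        if count:
            findings.append(name)
    return output, findings


text, found = redact("Contact jane@example.com, SSN 123-45-6789")
print(text)   # Contact [REDACTED:email], SSN [REDACTED:ssn]
print(found)  # ['email', 'ssn']
```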

The core library has 114 `@pytest.mark.security` tests covering shell metacharacter bypasses, sandbox symlink escapes, input injection, backend failure modes, and session concurrency. The control plane has 43+ adversarial tests across 8 security boundaries (S1-S8).

See Adversarial Testing for full scenarios and results.

Fail-Closed Design

Every ambiguous failure within rule evaluation results in a block. False positives are retryable; false negatives may not be. Note that when no rule matches a tool call, the default is allow, because rulesets are opt-in.

Failure	Outcome
Rule evaluation error	Block (with `policy_error: true` in audit)
Malformed ruleset YAML	Reject load, keep previous rulesets
Type mismatch in condition	Block (sentinel evaluates to true)
Control plane unreachable	Agents continue with cached rulesets
Session storage error	Block
Unknown rule type	Reject load
No matching rulesets	Allow (rulesets are opt-in)

To enforce block-by-default behavior, add a catch-all rule: `tool: "*"`, `action: block`.
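
The interaction between fail-closed evaluation, the opt-in default, and a catch-all rule can be sketched as follows. The rule shape (a dict with `tool`, `when`, and `action`) and the `decide` function are hypothetical stand-ins, not Edictum's ruleset schema; only the `policy_error` flag name is taken from the table above.

```python
from typing import Any

Rule = dict[str, Any]  # hypothetical shape: {"tool", "when", "action"}


def decide(tool: str, args: dict, rules: list[Rule]) -> dict:
    """Evaluate rules in order; any evaluation error results in a block."""
    for rule in rules:
        if rule["tool"] not in (tool, "*"):
            continue
        try:
            matched = rule["when"](args)
        except Exception:
            # Fail closed: an ambiguous failure is treated as a block.
            return {"action": "block", "policy_error": True}
        if matched:
            return {"action": rule["action"], "policy_error": False}
    # No rule matched: rulesets are opt-in, so the default is allow.
    return {"action": "allow", "policy_error": False}


rules = [
    {"tool": "read_file", "when": lambda a: a["path"].startswith("/etc"), "action": "block"},
]
catch_all = {"tool": "*", "when": lambda a: True, "action": "block"}

print(decide("read_file", {}, rules))                 # missing "path" raises, so block
print(decide("send_email", {}, rules))                # no match, so allow
print(decide("send_email", {}, rules + [catch_all]))  # catch-all, so block
```

Placing the catch-all last lets more specific rules decide first while still closing the opt-in gap.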

See Fail-Closed Guarantees for all seven scenarios.

Control Plane Security Boundaries

The control plane enforces 8 security boundaries, each with dedicated adversarial tests:

Boundary	Threat	Defense
S1: Session validation	Account takeover	Redis session tokens; forged/expired cookies rejected
S2: API key auth	Unauthorized agent access	Revoked keys excluded; malformed prefixes rejected
S3: Tenant isolation	Cross-tenant data leak	Every query filtered by `tenant_id`; returns 404, not 403
S4: Approval state	Unauthorized tool execution	Immutable once decided; double-approve returns 409
S5: SSE channel	Rule/event leak	Events filtered by env + tenant_id
S6: Ruleset signing	Tampered rule deployment	Ed25519 signatures (server-side); private key encrypted at rest (NaCl SecretBox). SDK verification planned.
S7: Bootstrap lock	Post-bootstrap privilege escalation	Admin creation only when zero users exist
S8: Rate limiting	Credential brute force	Per-IP sliding window (Redis sorted sets)
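
The per-IP sliding window in S8 can be sketched in memory to show the logic. This is an illustration under stated assumptions: the `SlidingWindowLimiter` class is hypothetical, it uses a deque instead of a Redis sorted set, and it does not count rejected attempts toward the limit, which the control plane may handle differently.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` attempts per key within the last `window_seconds`."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window = window_seconds
        self._hits: dict[str, deque] = defaultdict(deque)

    def allow(self, ip: str, now: float) -> bool:
        hits = self._hits[ip]
        # Drop timestamps that fell out of the window
        # (with a Redis sorted set this would be a ZREMRANGEBYSCORE).
        while hits and hits[0] <= now - self.window:
            hits.popleft()
        if len(hits) >= self.limit:  # ZCARD in the sorted-set version
            return False
        hits.append(now)             # ZADD with the timestamp as score
        return True


limiter = SlidingWindowLimiter(limit=3, window_seconds=60)
print([limiter.allow("10.0.0.1", t) for t in (0, 1, 2, 3)])
# [True, True, True, False]
```

Passing `now` explicitly keeps the sketch deterministic and easy to test; a real limiter would read a clock.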

See Control Plane Security Model for details.

Known Limitations

Limitation	Impact	Mitigation
String-based path matching	Relies on `realpath()` + prefix comparison	Resolves symlinks and `..`; catches common traversals
Heuristic bash parsing	BashClassifier is not AST-based	Detects 14 shell operators; sandbox command allowlists add depth
TOCTOU race (symlinks)	Symlink created between eval and execution could escape	OS-level sandboxing (gVisor, seccomp) for kernel enforcement
Postcondition WRITE fallback	`redact`/`block` effects downgrade to `warn` for WRITE/IRREVERSIBLE tools	Use preconditions to block dangerous writes before execution
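
The first limitation is easier to see in code. The sketch below resolves both paths with `os.path.realpath` and then compares them. Note the swap: it uses `os.path.commonpath`, a component-wise comparison, instead of a raw string prefix check, because a naive `startswith` would treat `/srv/agent/workspace` as inside `/srv/agent/work`. The `is_within` helper and the example paths are hypothetical, and whether Edictum's own comparison handles that case is not stated here.

```python
import os


def is_within(path: str, root: str) -> bool:
    """Return True if `path` resolves to a location under `root`.

    realpath() collapses `..` segments and resolves symlinks that exist
    at evaluation time; it cannot prevent a symlink created afterwards
    (the TOCTOU case listed above).
    """
    real_path = os.path.realpath(path)
    real_root = os.path.realpath(root)
    return os.path.commonpath([real_path, real_root]) == real_root


root = "/srv/agent/work"
print(is_within("/srv/agent/work/a/../notes.txt", root))    # True
print(is_within("/srv/agent/work/../etc/passwd", root))     # False
print(is_within("/srv/agent/workspace/notes.txt", root))    # False
```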

Compliance

Edictum maps to four compliance frameworks:

EU AI Act (Articles 9, 14) — risk identification, mitigation, documentation, human oversight
SOC 2 (CC6) — logical access, credentials, authorization, decision log
OWASP Top 10 for LLM Applications (2025) — prompt injection, insecure output, unbounded consumption, access control
OWASP Top 10 for Agentic Applications (2026) — 6 of 10 risks mitigated

See Compliance Mapping for detailed evidence and configuration per framework.

Defense in Depth

Edictum is one layer. A complete security posture combines:

Layer	Tool	What it covers
LLM safety	Model provider safety filters	Harmful text generation
Tool-call enforcement	Edictum	What the agent is allowed to do
OS sandboxing	gVisor, Firecracker, seccomp	Process isolation, syscall filtering
Network policies	K8s NetworkPolicy, WAF	Traffic filtering, egress control
Input validation	Application code	Schema validation, sanitization

Next Steps

Defense Scope — detailed threat model and boundaries
Fail-Closed Guarantees — all failure modes and outcomes
Adversarial Testing — test scenarios and cross-model results
Compliance Mapping — EU AI Act, SOC 2, OWASP mappings
