
Testing Your Rulesets

This guide covers how to validate, dry-run, unit test, and regression test your Edictum rulesets.

To write a ruleset from scratch, see https://docs.edictum.ai/docs/guides/writing-rules. For adversarial red-team testing, see https://docs.edictum.ai/docs/security/adversarial.


CLI Validation

Run edictum validate to catch schema, syntax, and semantic errors before deployment:

$ edictum validate rules.yaml

  rules.yaml — 5 rules (2 post, 2 pre, 1 session)

Validation checks include:

  • YAML parse errors
  • Missing required fields (apiVersion, kind, metadata.name, defaults.mode)
  • Invalid regex patterns in matches / matches_any
  • Duplicate rule IDs within a ruleset
  • Invalid action for rule type (preconditions allow block or ask; postconditions allow warn, redact, or block)
  • Use of output.text in a precondition
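As an illustration, the following hypothetical ruleset trips two of these checks: a duplicate rule ID and a reference to output.text from a precondition. The rule-body syntax here is a sketch assembled from fields mentioned in this guide, not the exact Edictum schema -- see the writing-rules guide for the real format:

```yaml
apiVersion: edictum/v1        # required field (value here is illustrative)
kind: Ruleset                 # required field (value here is illustrative)
metadata:
  name: example-invalid
defaults:
  mode: enforce
rules:
  - id: block-env             # first use of this ID
    type: precondition
    matches: '\.env$'
    action: block
  - id: block-env             # duplicate rule ID -> validation error
    type: precondition
    when: output.text         # output.text is postcondition-only -> validation error
    action: block
```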

CLI Rule Check

Use edictum check to simulate a tool call against your rulesets without executing anything:

$ edictum check rules.yaml \
    --tool read_file \
    --args '{"path": ".env"}' \
    --principal-role analyst

✗ BLOCKED by block-sensitive-reads — Analysts cannot read '.env'. Ask an admin for help.

Verify allowed calls:

$ edictum check rules.yaml \
    --tool read_file \
    --args '{"path": "readme.txt"}' \
    --principal-role analyst

✓ ALLOWED (1 rule evaluated)

This is useful for quick spot-checks during development. For batch testing, use edictum test.


Batch Testing With YAML Test Cases

Use edictum test to run a suite of test cases against your rulesets. Define expected outcomes in a YAML file and let the CLI verify them all at once:

# tests/ruleset-cases.yaml
cases:
  - id: block-env-file
    tool: read_file
    args:
      path: "/app/.env"
    principal:
      role: analyst
    expect: block
    match_contract: block-sensitive-reads

  - id: allow-readme
    tool: read_file
    args:
      path: "README.md"
    principal:
      role: analyst
    expect: allow

  - id: deny-deploy-without-ticket
    tool: deploy_service
    args:
      service: api
      env: production
    principal:
      role: sre
    expect: block
    match_contract: require-ticket

  - id: allow-deploy-with-ticket
    tool: deploy_service
    args:
      service: api
      env: production
    principal:
      role: sre
      ticket_ref: JIRA-456
    expect: allow

  - id: platform-team-access
    tool: deploy_service
    args:
      env: production
    principal:
      role: developer
      claims:
        department: platform
        clearance: high
    expect: allow

Run it:

$ edictum test rules.yaml --cases tests/ruleset-cases.yaml

  block-env-file: read_file -> BLOCK (block-sensitive-reads) ✓
  allow-readme: read_file -> ALLOW ✓
  deny-deploy-without-ticket: deploy_service -> BLOCK (require-ticket) ✓
  allow-deploy-with-ticket: deploy_service -> ALLOW ✓
  platform-team-access: deploy_service -> ALLOW ✓

5/5 passed, 0 failed

Key features:

  • expect -- allow or block. The test passes if the precondition decision matches.
  • match_contract -- optional. When set, verifies that the specific rule ID triggered the block. Catches cases where the right decision happens for the wrong reason.
  • principal -- supports role, user_id, ticket_ref, and claims (arbitrary key-value pairs). Omit to test without principal context.

Preconditions only

--cases evaluates preconditions only. For postcondition testing, use --calls (see below) or pytest with guard.evaluate().

This is the recommended approach for ruleset regression testing in CI. Keep your test cases file alongside your rulesets and run edictum test on every PR.


Evaluating Tool Calls With --calls

When you need to test postconditions or want a quick evaluation without defining expected decisions, use --calls with a JSON file:

[
  {"tool": "read_file", "args": {"path": "README.md"}},
  {"tool": "read_file", "args": {"path": "/app/.env"}},
  {"tool": "read_file", "args": {"path": "data.txt"}, "output": "SSN: 123-45-6789"}
]

Run it:

$ edictum test rules.yaml --calls tests/calls.json

  #   Tool         Decision Contracts  Details
  1   read_file    ALLOW    1
  2   read_file    BLOCK    1          Sensitive file '/app/.env' denied.
  3   read_file    WARN     1          PII detected.

Key differences from --cases:

  • Postconditions supported -- include an output field to trigger postcondition evaluation.
  • Exhaustive evaluation -- all matching rules run, no short-circuit on first block.
  • No expected decisions -- results report what happened, not pass/fail against expectations.
  • JSON output -- add --json for machine-readable output in CI pipelines.

See the CLI reference for the full format.


Unit Testing With pytest

For programmatic testing, use guard.evaluate() for dry-run checks or guard.run() to test with actual tool execution.

Dry-run with evaluate()

evaluate() checks a tool call against all matching rulesets without executing the tool. It evaluates exhaustively (all matching rulesets, no short-circuit) and returns an EvaluationResult:

from edictum import Edictum, Principal

guard = Edictum.from_yaml("rules.yaml")

# Test a precondition denial
result = guard.evaluate("read_file", {"path": ".env"})
assert result.decision == "block"
assert result.rules[0].rule_id == "block-sensitive-reads"

# Test an allowed call
result = guard.evaluate("read_file", {"path": "readme.txt"})
assert result.decision == "allow"

# Test a postcondition warning (pass output to trigger postconditions)
result = guard.evaluate("read_file", {"path": "data.txt"}, output="SSN: 123-45-6789")
assert result.decision == "warn"
assert len(result.warn_reasons) > 0

# Test with principal context
result = guard.evaluate(
    "deploy_service",
    {"service": "api"},
    principal=Principal(role="sre", ticket_ref="JIRA-123"),
)
assert result.decision == "allow"

evaluate() is sync and does not require asyncio. The EvaluationResult contains:

  • decision (str) -- "allow", "block", or "warn"
  • tool_name (str) -- the tool name evaluated
  • rules (list[RuleResult]) -- per-rule results with rule_id, passed, message, tags, observed, policy_error
  • block_reasons (list[str]) -- messages from failed preconditions
  • warn_reasons (list[str]) -- messages from failed postconditions
  • rules_evaluated (int) -- total number of rules checked
  • policy_error (bool) -- True if any rule had an evaluation error
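When an evaluation doesn't match expectations, these fields compose into a readable one-line failure message. The sketch below uses minimal stand-in dataclasses mirroring the documented fields so it runs standalone; with the real library you would pass the objects returned by guard.evaluate() straight to summarize():

```python
from dataclasses import dataclass, field

# Stand-in types mirroring the EvaluationResult fields documented above;
# in real tests you would use the objects returned by guard.evaluate().
@dataclass
class RuleResult:
    rule_id: str
    passed: bool
    message: str = ""

@dataclass
class EvaluationResult:
    decision: str
    tool_name: str
    rules: list
    block_reasons: list = field(default_factory=list)
    warn_reasons: list = field(default_factory=list)

def summarize(result) -> str:
    """One-line summary suitable for an assert message."""
    failed = [r.rule_id for r in result.rules if not r.passed]
    reasons = result.block_reasons + result.warn_reasons
    return f"{result.tool_name}: {result.decision}" + (
        f" ({', '.join(failed)}: {'; '.join(reasons)})" if failed else ""
    )

result = EvaluationResult(
    decision="block",
    tool_name="read_file",
    rules=[RuleResult("block-sensitive-reads", passed=False)],
    block_reasons=["Sensitive file '.env' denied."],
)
print(summarize(result))
# read_file: block (block-sensitive-reads: Sensitive file '.env' denied.)
```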

For batch evaluation, use evaluate_batch():

results = guard.evaluate_batch([
    {"tool": "read_file", "args": {"path": ".env"}},
    {"tool": "read_file", "args": {"path": "readme.txt"}},
])
assert results[0].decision == "block"
assert results[1].decision == "allow"

Full execution with run()

Use guard.run() when you need to test the complete pipeline including tool execution, session tracking, and audit:

import asyncio
import pytest
from edictum import Edictum, EdictumDenied

@pytest.fixture
def guard():
    return Edictum.from_yaml("rules.yaml")

def test_sensitive_read_denied(guard):
    async def read_file(path):
        return f"contents of {path}"

    with pytest.raises(EdictumDenied):
        asyncio.run(guard.run("read_file", {"path": ".env"}, read_file))

def test_normal_read_allowed(guard):
    async def read_file(path):
        return f"contents of {path}"

    result = asyncio.run(guard.run("read_file", {"path": "readme.txt"}, read_file))
    assert "contents" in result

Test patterns to cover:

  • Denied calls -- assert that EdictumDenied is raised for calls that should be denied.
  • Allowed calls -- assert that the tool result is returned for calls that should pass.
  • Edge cases -- test boundary values, missing principal fields, wildcard tool targets.
  • Session limits -- call guard.run() in a loop to verify session-level limits fire at the correct count.

When to use `evaluate()` vs `run()`

Use evaluate() for rule logic testing -- it's sync, fast, and doesn't need mock tool functions. Use run() when you need to test the full pipeline including session state, hooks, and audit.


Integration Testing With Observe Mode

Test rulesets in a running system without denying real tool calls. Deploy with mode: observe and collect audit events:

from edictum import Edictum, Principal
from edictum.audit import FileAuditSink, RedactionPolicy

redaction = RedactionPolicy()
sink = FileAuditSink("test-audit.jsonl", redaction=redaction)

guard = Edictum.from_yaml("rules.yaml", audit_sink=sink, redaction=redaction)
# defaults.mode should be "observe" in the YAML

After running your agent through a test scenario, inspect test-audit.jsonl for:

  • CALL_WOULD_DENY events -- these are calls that would be denied in enforce mode.
  • Absence of false positives -- legitimate calls should not produce would-deny events.
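After a test scenario, the audit log can be scanned with a few lines of standard-library Python. The "event" and "rule" field names below are assumptions for illustration -- check the JSONL schema your audit sink actually emits and adjust the keys accordingly:

```python
import json
import os
import tempfile

def would_deny_events(path):
    """Yield would-deny events from a JSONL audit log.

    Assumes each line is a JSON object with an "event" key; adjust the
    key names to match your audit sink's actual schema.
    """
    with open(path) as f:
        for line in f:
            if not line.strip():
                continue
            record = json.loads(line)
            if record.get("event") == "CALL_WOULD_DENY":
                yield record

# Synthetic two-event log for illustration:
sample = [
    {"event": "CALL_ALLOWED", "tool": "read_file"},
    {"event": "CALL_WOULD_DENY", "tool": "read_file", "rule": "block-sensitive-reads"},
]
with tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False) as f:
    f.write("\n".join(json.dumps(e) for e in sample))
    log_path = f.name

hits = list(would_deny_events(log_path))
os.unlink(log_path)
print(f"{len(hits)} would-deny event(s)")  # 1 would-deny event(s)
```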

Regression Testing

Save audit logs from a known-good run and compare against updated rulesets using edictum replay:

$ edictum replay rulesets/v2.yaml --audit-log audit/baseline.jsonl

Replayed 340 events, 0 would change

If the replay shows changes, investigate before deploying:

$ edictum replay rulesets/v2.yaml --audit-log audit/baseline.jsonl

Replayed 340 events, 2 would change

Changed verdicts:
  read_file: call_allowed -> denied
    Rule: block-config-reads
  bash: call_allowed -> denied
    Rule: block-destructive-commands

Incorporate replay into your CI pipeline to catch unintended rule regressions:

# GitHub Actions example
- name: Validate rulesets
  run: edictum validate rulesets/production.yaml

- name: Replay baseline audit log
  run: |
    edictum replay rulesets/production.yaml \
      --audit-log tests/audit-baseline.jsonl
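If you keep a YAML case file, the same workflow can run it on every PR. The file paths here are assumptions -- point them at your own ruleset and case files:

```yaml
- name: Run ruleset test cases
  run: |
    edictum test rulesets/production.yaml \
      --cases tests/ruleset-cases.yaml
```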

Testing Checklist

  1. Validate -- edictum validate passes with zero errors.
  2. Dry-run -- edictum check produces expected deny/allow for key scenarios.
  3. Batch test (cases) -- edictum test --cases passes all YAML test cases with correct verdicts and rule matches.
  4. Batch test (calls) -- edictum test --calls evaluates representative tool calls including postconditions.
  5. Unit tests -- pytest tests with guard.evaluate() cover preconditions, postconditions, and edge cases. Use guard.run() for session limit tests.
  6. Observe mode -- deploy in observe mode and review CALL_WOULD_DENY events.
  7. Replay -- edictum replay against a baseline audit log shows no regressions.
  8. Enforce -- flip to mode: enforce after all checks pass.
