Dry-Run Evaluation

AI Assistance

Right page if: you need to test whether a tool call would be allowed or denied without executing it -- `evaluate()` is synchronous, produces no audit events, and checks all contracts exhaustively.
Wrong page if: you need the full runtime pipeline with session state and audit -- use `run()`. For command-line testing, see https://docs.edictum.ai/docs/reference/cli.
Gotcha: `evaluate()` skips session contracts (no session context in dry-run) but does evaluate sandbox contracts. Postconditions are only checked when you pass the `output` parameter.

You need to test whether a tool call would be allowed or denied without actually executing it. The evaluate() and evaluate_batch() methods on the Edictum class check a tool call against all matching contracts and return a detailed result -- no tool execution, no session state changes, no audit events.


Quick Example

from edictum import Edictum

guard = Edictum.from_yaml("contracts.yaml")

result = guard.evaluate("read_file", {"path": ".env"})
print(result.verdict)        # "deny"
print(result.deny_reasons)   # ["Sensitive file '.env' denied."]

evaluate()

def evaluate(
    self,
    tool_name: str,
    args: dict[str, Any],
    *,
    principal: Principal | None = None,
    output: str | None = None,
    environment: str | None = None,
) -> EvaluationResult

Evaluates a single tool call against all matching contracts. This method is synchronous -- no await required.

Parameters

Parameter    Type              Default   Description
tool_name    str               required  The tool being called
args         dict[str, Any]    required  Tool call arguments
principal    Principal | None  None      Identity context for the call
output       str | None        None      Simulated tool output. When provided, postconditions are evaluated against this value
environment  str | None        None      Override the guard's default environment

Behavior

  • Exhaustive evaluation. All matching contracts are evaluated. The pipeline does not short-circuit on the first denial -- you see every contract that would fire.
  • No tool execution. The tool function is never called.
  • No session state. Session contracts are skipped because there is no session context in a dry-run.
  • Sandbox contracts are evaluated. Unlike session contracts, sandbox contracts are stateless and are always included in dry-run evaluation.
  • Postconditions require output. Postconditions are only evaluated when output is provided. Without it, only preconditions and sandbox contracts are checked.
  • Synchronous. Unlike guard.run(), this method does not require asyncio.
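
The contrast between exhaustive evaluation and first-deny short-circuiting can be illustrated with a toy model. This is only a sketch and does not use Edictum's internals: `Check`, `evaluate_exhaustive`, `evaluate_short_circuit`, and the `no-absolute-paths` ID are hypothetical names invented for the illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Check:
    """Hypothetical stand-in for a contract: an ID plus a predicate over args."""
    check_id: str
    passes: Callable[[dict[str, Any]], bool]


def evaluate_exhaustive(checks: list[Check], args: dict[str, Any]) -> list[str]:
    """Run every check and return the IDs of all that fail (dry-run style)."""
    return [c.check_id for c in checks if not c.passes(args)]


def evaluate_short_circuit(checks: list[Check], args: dict[str, Any]) -> list[str]:
    """Stop at the first failing check (runtime-pipeline style)."""
    for c in checks:
        if not c.passes(args):
            return [c.check_id]
    return []


checks = [
    Check("block-dotenv", lambda a: not a["path"].endswith(".env")),
    Check("no-absolute-paths", lambda a: not a["path"].startswith("/")),
]
args = {"path": "/srv/app/.env"}

print(evaluate_exhaustive(checks, args))     # ['block-dotenv', 'no-absolute-paths']
print(evaluate_short_circuit(checks, args))  # ['block-dotenv']
```

The exhaustive variant is what makes a dry-run useful for debugging: a single call reports every rule that would fire, not just the first one.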

Examples

Test a precondition:

result = guard.evaluate("read_file", {"path": ".env"})
assert result.verdict == "deny"
assert result.contracts[0].contract_id == "block-dotenv"

Test with principal context:

from edictum import Principal

result = guard.evaluate(
    "deploy_service",
    {"service": "api", "env": "production"},
    principal=Principal(role="sre", ticket_ref="JIRA-456"),
)
assert result.verdict == "allow"

Test postconditions by providing output:

result = guard.evaluate(
    "read_file",
    {"path": "data.txt"},
    output="SSN: 123-45-6789",
)
assert result.verdict == "warn"
assert len(result.warn_reasons) > 0

Test with environment override:

result = guard.evaluate(
    "deploy_service",
    {"service": "api"},
    environment="staging",
)

Test sandbox path allowlists:

# Sandbox contracts are evaluated during dry-run
result = guard.evaluate("read_file", {"path": "/etc/shadow"})
assert result.verdict == "deny"

# Sandbox contracts appear in results
sandbox_results = [c for c in result.contracts if c.contract_type == "sandbox"]
assert len(sandbox_results) == 1
assert sandbox_results[0].passed is False

evaluate_batch()

def evaluate_batch(
    self,
    calls: list[dict[str, Any]],
) -> list[EvaluationResult]

Evaluates multiple tool calls. Each call is evaluated independently via evaluate(). This method is synchronous.

Call Format

Each dict in the calls list accepts these keys:

Key          Type        Required  Description
tool         str         yes       Tool name
args         dict        no        Tool arguments (defaults to {})
principal    dict        no        Principal as a dict with keys: role, user_id, ticket_ref, claims
output       str | dict  no        Simulated output. Dicts are JSON-serialized automatically
environment  str         no        Environment override
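
The defaults in the call format can be sketched as a small normalization step. This is an assumption about how one batch entry might map onto `evaluate()` arguments, not the library's actual code: `normalize_call` is a hypothetical helper, and the principal is left as a plain dict instead of being converted to a `Principal`.

```python
import json
from typing import Any


def normalize_call(call: dict[str, Any]) -> tuple[str, dict[str, Any], dict[str, Any]]:
    """Split one batch entry into (tool_name, args, keyword options)."""
    tool_name = call["tool"]         # required key; raises KeyError if missing
    args = call.get("args", {})      # documented default: {}

    output = call.get("output")
    if isinstance(output, dict):
        output = json.dumps(output)  # dict outputs become JSON strings

    options = {
        "principal": call.get("principal"),      # kept as a dict in this sketch
        "output": output,
        "environment": call.get("environment"),
    }
    return tool_name, args, options


name, args, options = normalize_call(
    {"tool": "read_file", "output": {"ssn": "123-45-6789"}}
)
print(name, args, options["output"])
# read_file {} {"ssn": "123-45-6789"}
```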

Example

results = guard.evaluate_batch([
    {"tool": "read_file", "args": {"path": ".env"}},
    {"tool": "read_file", "args": {"path": "readme.txt"}},
    {"tool": "read_file", "args": {"path": "data.txt"}, "output": "SSN: 123-45-6789"},
    {
        "tool": "deploy_service",
        "args": {"service": "api"},
        "principal": {"role": "sre", "ticket_ref": "JIRA-123"},
    },
])

assert results[0].verdict == "deny"
assert results[1].verdict == "allow"
assert results[2].verdict == "warn"
assert results[3].verdict == "allow"
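
For larger batches, a table-driven comparison can give more readable failures than one assert per index. The sketch below relies only on the `tool_name` and `verdict` fields documented on this page. `verdict_mismatches` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins for the results that `guard.evaluate_batch()` would return.

```python
from types import SimpleNamespace
from typing import Any


def verdict_mismatches(results: list[Any], expected: list[str]) -> list[str]:
    """Return one readable line per result whose verdict differs from the expectation."""
    if len(results) != len(expected):
        raise ValueError("results and expected must have the same length")
    return [
        f"call {i} ({r.tool_name}): expected {want}, got {r.verdict}"
        for i, (r, want) in enumerate(zip(results, expected))
        if r.verdict != want
    ]


# Stand-ins for EvaluationResult objects (assumption: only these two fields are read).
fake_results = [
    SimpleNamespace(tool_name="read_file", verdict="deny"),
    SimpleNamespace(tool_name="read_file", verdict="allow"),
    SimpleNamespace(tool_name="deploy_service", verdict="deny"),
]

print(verdict_mismatches(fake_results, ["deny", "allow", "allow"]))
# ['call 2 (deploy_service): expected allow, got deny']
```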

EvaluationResult

Returned by evaluate(), and as the list elements returned by evaluate_batch(). Contains the overall verdict and per-contract details.

Field                Type                  Description
verdict              str                   "allow", "deny", or "warn"
tool_name            str                   The tool name that was evaluated
contracts            list[ContractResult]  Per-contract results
deny_reasons         list[str]             Messages from failed preconditions
warn_reasons         list[str]             Messages from failed postconditions
contracts_evaluated  int                   Total number of contracts checked
policy_error         bool                  True if any contract raised an exception during evaluation

The verdict is determined by:

  • "deny" -- at least one precondition or sandbox contract failed (and was not in observe mode)
  • "warn" -- no precondition or sandbox failures, but at least one postcondition failed
  • "allow" -- all contracts passed
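
The three verdict rules can be written as a short function. This is a sketch of the rules as stated above, not Edictum's implementation: `ContractOutcome` and `derive_verdict` are hypothetical names, and the model assumes that an observe-mode contract that would have fired is recorded with `passed=False` and `observed=True`.

```python
from dataclasses import dataclass


@dataclass
class ContractOutcome:
    """Hypothetical stand-in with the fields the verdict rules depend on."""
    contract_type: str      # "precondition", "postcondition", or "sandbox"
    passed: bool
    observed: bool = False  # assumption: observe-mode failures never deny


def derive_verdict(outcomes: list[ContractOutcome]) -> str:
    """Apply the rules in priority order: deny, then warn, then allow."""
    blocking = {"precondition", "sandbox"}
    if any(
        o.contract_type in blocking and not o.passed and not o.observed
        for o in outcomes
    ):
        return "deny"
    if any(o.contract_type == "postcondition" and not o.passed for o in outcomes):
        return "warn"
    return "allow"


print(derive_verdict([ContractOutcome("postcondition", passed=False)]))          # warn
print(derive_verdict([ContractOutcome("sandbox", passed=False, observed=True)])) # allow
```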

ContractResult

One entry per evaluated contract. Found in EvaluationResult.contracts.

Field          Type        Description
contract_id    str         The contract's ID (from YAML id: or function __name__)
contract_type  str         "precondition", "postcondition", or "sandbox"
passed         bool        Whether the contract passed
message        str | None  The contract's message (from then.message in YAML)
tags           list[str]   Tags attached to the contract
observed       bool        True if the contract is in observe mode and would have fired
effect         str         Postcondition effect: "warn", "redact", or "deny"
policy_error   bool        True if the contract raised an exception
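
When a result contains many contracts, grouping the failures by type makes the outcome easier to scan. The sketch below reads only the `contract_id`, `contract_type`, and `passed` fields documented on this page. `failures_by_type` is a hypothetical helper, the `SimpleNamespace` objects stand in for real ContractResult entries, and the `workspace-only` and `pii-scan` IDs are invented for the example.

```python
from collections import defaultdict
from types import SimpleNamespace
from typing import Any


def failures_by_type(contracts: list[Any]) -> dict[str, list[str]]:
    """Map each contract_type to the IDs of contracts that did not pass."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for c in contracts:
        if not c.passed:
            grouped[c.contract_type].append(c.contract_id)
    return dict(grouped)


# Stand-ins for the entries of EvaluationResult.contracts.
contracts = [
    SimpleNamespace(contract_id="block-dotenv", contract_type="precondition", passed=False),
    SimpleNamespace(contract_id="workspace-only", contract_type="sandbox", passed=False),
    SimpleNamespace(contract_id="pii-scan", contract_type="postcondition", passed=True),
]

print(failures_by_type(contracts))
# {'precondition': ['block-dotenv'], 'sandbox': ['workspace-only']}
```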

evaluate() vs run() vs CLI

Capability         evaluate()        run()             edictum check / edictum test
Executes the tool  No                Yes               No
Session tracking   No                Yes               No
Audit events       No                Yes               No
Async required     No                Yes               N/A
Preconditions      Yes               Yes               Yes
Sandbox contracts  Yes               Yes               Yes
Postconditions     Only with output  Always            --calls only
Short-circuits     No (exhaustive)   Yes (first deny)  No

Use evaluate() for fast, synchronous contract logic testing. Use run() when you need the full pipeline including session state, hooks, and audit. Use the CLI for quick spot-checks and CI pipelines.

