Dry-Run Evaluation

Test whether a tool call would be allowed, blocked, or warned without executing it.

evaluate() answers one question: what would Edictum do if this tool call happened right now?

It does not execute the tool, write audit events, advance workflow stages, or touch session counters.

Quick Example

from edictum import Edictum

guard = Edictum.from_yaml("rules.yaml")

result = guard.evaluate("read_file", {"path": ".env"})
print(result.decision)       # "block"
print(result.block_reasons)  # ["Sensitive file '.env' blocked."]

`evaluate()`

def evaluate(
    self,
    tool_name: str,
    args: dict[str, Any],
    *,
    principal: Principal | None = None,
    output: str | None = None,
    environment: str | None = None,
) -> EvaluationResult

Python evaluate() is synchronous. TypeScript evaluate() is async. Go Evaluate() is synchronous.

Parameters

Parameter	Type	Default	Description
`tool_name`	`str`	required	The tool being called
`args`	`dict[str, Any]`	required	Tool call arguments
`principal`	`Principal \| None`	`None`	Identity context for the call
`output`	`str \| None`	`None`	Simulated tool output. When provided, check_output rules are evaluated
`environment`	`str \| None`	`None`	Override the guard's default environment

Behavior

Exhaustive evaluation. All matching rules are evaluated. There is no short-circuit on the first block.
No tool execution. The tool function is never called.
No session state. Session rules are skipped because dry-run evaluation has no runtime session.
No workflow gates. Workflow runtime enforcement is skipped for the same reason.
Sandbox rules are evaluated. They are stateless, so dry-run includes them.
Check output rules require output. Without output, only check rules and sandbox rules run.
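
The difference between exhaustive evaluation and short-circuiting can be shown with a toy model. The sketch below is not Edictum's implementation. `Rule`, `evaluate_exhaustive`, and `evaluate_short_circuit` are hypothetical stand-ins that only show why a dry run can report every failing rule while an enforcing path might stop at the first block.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Rule:
    # Hypothetical stand-in for a rule; not an Edictum class.
    rule_id: str
    fails: Callable[[str, dict[str, Any]], bool]  # True when the rule blocks the call


def evaluate_exhaustive(rules: list[Rule], tool: str, args: dict[str, Any]) -> list[str]:
    """Check every rule and collect the IDs of all that fail."""
    return [rule.rule_id for rule in rules if rule.fails(tool, args)]


def evaluate_short_circuit(rules: list[Rule], tool: str, args: dict[str, Any]) -> list[str]:
    """Stop at the first failing rule, as an enforcing path might."""
    for rule in rules:
        if rule.fails(tool, args):
            return [rule.rule_id]
    return []


# Two illustrative rules that both match the path ".env".
rules = [
    Rule("block-dotenv", lambda tool, args: args.get("path", "").endswith(".env")),
    Rule("block-hidden", lambda tool, args: args.get("path", "").startswith(".")),
]

all_failures = evaluate_exhaustive(rules, "read_file", {"path": ".env"})
first_failure = evaluate_short_circuit(rules, "read_file", {"path": ".env"})
print(all_failures)   # both rule IDs
print(first_failure)  # only the first rule ID
```

In this model the exhaustive path is what makes a dry run useful for debugging: one call shows every rule that would object, not just the first.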

Examples

Test a check rule:

result = guard.evaluate("read_file", {"path": ".env"})
assert result.decision == "block"
assert result.rules[0].rule_id == "block-dotenv"

Test with principal context:

from edictum import Principal

result = guard.evaluate(
    "deploy_service",
    {"service": "api", "env": "production"},
    principal=Principal(role="sre", ticket_ref="JIRA-456"),
)
assert result.decision == "allow"

Test a check_output rule by providing output:

result = guard.evaluate(
    "read_file",
    {"path": "data.txt"},
    output="SSN: 123-45-6789",
)
assert result.decision == "warn"
assert len(result.warn_reasons) > 0

Test a sandbox boundary:

result = guard.evaluate("read_file", {"path": "/etc/shadow"})
assert result.decision == "block"

sandbox_results = [rule for rule in result.rules if rule.rule_type == "sandbox"]
assert len(sandbox_results) == 1
assert sandbox_results[0].passed is False

`evaluate_batch()`

def evaluate_batch(
    self,
    calls: list[dict[str, Any]],
) -> list[EvaluationResult]

Evaluates multiple tool calls. Each call is evaluated independently via evaluate().

Call Format

Each dict in the calls list accepts these keys:

Key	Type	Required	Description
`tool`	`str`	yes	Tool name
`args`	`dict`	no	Tool arguments (defaults to `{}`)
`principal`	`dict`	no	Principal as a dict with keys: `role`, `user_id`, `ticket_ref`, `claims`
`output`	`str \| dict`	no	Simulated output. Dicts are JSON-serialized automatically
`environment`	`str`	no	Environment override
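
The table can be read as a normalization step applied to each entry before it reaches evaluate(). The helper below is a hypothetical sketch of that step, not Edictum's code. `normalize_call` is an invented name, and using `json.dumps` for dict outputs is an assumption based on the phrase "JSON-serialized automatically".

```python
import json
from typing import Any


def normalize_call(call: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical: turn one batch entry into evaluate()-style keyword arguments."""
    if "tool" not in call:
        raise ValueError("each call needs a 'tool' key")

    output = call.get("output")
    if isinstance(output, dict):
        # Assumption: dict outputs are serialized with json.dumps.
        output = json.dumps(output)

    return {
        "tool_name": call["tool"],
        "args": call.get("args", {}),        # optional, defaults to {}
        "principal": call.get("principal"),  # optional dict: role, user_id, ticket_ref, claims
        "output": output,
        "environment": call.get("environment"),
    }


normalized = normalize_call({"tool": "read_file", "output": {"ssn": "123-45-6789"}})
print(normalized["args"])    # {}
print(normalized["output"])  # {"ssn": "123-45-6789"}
```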

Example

results = guard.evaluate_batch([
    {"tool": "read_file", "args": {"path": ".env"}},
    {"tool": "read_file", "args": {"path": "readme.txt"}},
    {"tool": "read_file", "args": {"path": "data.txt"}, "output": "SSN: 123-45-6789"},
    {
        "tool": "deploy_service",
        "args": {"service": "api"},
        "principal": {"role": "sre", "ticket_ref": "JIRA-123"},
    },
])

assert results[0].decision == "block"
assert results[1].decision == "allow"
assert results[2].decision == "warn"
assert results[3].decision == "allow"
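
Batch results are easy to tally when a rules file is checked against many calls at once. The sketch below uses `SimpleNamespace` objects as stand-ins for `EvaluationResult` so that it runs without a rules file. With a real guard, the list returned by evaluate_batch() would be passed instead. `tally_decisions` is a hypothetical helper, not part of Edictum.

```python
from collections import Counter
from types import SimpleNamespace


def tally_decisions(results) -> Counter:
    """Count how many results fall under each decision value."""
    return Counter(result.decision for result in results)


# Stand-ins mirroring the decisions asserted in the example above.
fake_results = [
    SimpleNamespace(decision="block"),
    SimpleNamespace(decision="allow"),
    SimpleNamespace(decision="warn"),
    SimpleNamespace(decision="allow"),
]

counts = tally_decisions(fake_results)
print(counts["allow"], counts["block"], counts["warn"])  # 2 1 1
```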

`EvaluationResult`

Returned by evaluate(), and as a list by evaluate_batch(). It contains the overall decision and per-rule details.

Python fields

Field	Description
`decision`	`"allow"`, `"block"`, or `"warn"`
`tool_name`	Tool name that was evaluated
`rules`	Per-rule results
`block_reasons`	Messages from failed check or sandbox rules
`warn_reasons`	Messages from failed check_output rules
`rules_evaluated`	Total number of evaluated rules
`policy_error`	`True` if any rule raised a policy error

TypeScript fields

Field	Description
`decision`	`"allow"`, `"block"`, or `"warn"`
`toolName`	Tool name that was evaluated
`rules`	Per-rule results
`denyReasons`	Messages from failed check or sandbox rules
`warnReasons`	Messages from failed check_output rules
`contractsEvaluated`	Total number of evaluated rules. This is a legacy TypeScript field name.
`workflowSkipped`	`true` when a workflow runtime is attached
`workflowReason`	Why workflow enforcement was skipped

Go fields

Field	Description
`Decision`	`"allow"`, `"block"`, or `"warn"`
`ToolName`	Tool name that was evaluated
`Rules`	Per-rule results
`BlockReasons`	Messages from failed check or sandbox rules
`WarnReasons`	Messages from failed check_output rules
`RulesEvaluated`	Total number of evaluated rules
`WorkflowSkipped`	`true` when a workflow runtime is attached
`WorkflowReason`	Why workflow enforcement was skipped

Each RuleResult contains the rule ID, rule type, whether it passed, the message, whether it was only observed, and whether a policy error occurred.
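
Per-rule results are most useful for finding out which rules failed and why. The sketch below uses a stand-in dataclass so that it runs on its own. The field names `rule_id`, `rule_type`, and `passed` appear in the examples on this page. The names `message`, `observed`, and `policy_error`, and the `"check"` value for `rule_type`, are assumptions. `failed_rules` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass
class FakeRuleResult:
    # Stand-in for RuleResult. rule_id, rule_type, and passed appear in the
    # examples on this page; message, observed, and policy_error are assumed names.
    rule_id: str
    rule_type: str
    passed: bool
    message: str = ""
    observed: bool = False
    policy_error: bool = False


def failed_rules(rules: list[FakeRuleResult]) -> list[str]:
    """Return 'rule_id: message' for each rule that did not pass."""
    return [f"{rule.rule_id}: {rule.message}" for rule in rules if not rule.passed]


# Illustrative data only; "workspace-only" is an invented rule ID.
rules = [
    FakeRuleResult("block-dotenv", "check", passed=False, message="Sensitive file '.env' blocked."),
    FakeRuleResult("workspace-only", "sandbox", passed=True),
]

failures = failed_rules(rules)
print(failures)
```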

`evaluate()` vs `run()` vs CLI

	`evaluate()`	`run()`	`edictum check` / `edictum test`
Executes the tool	No	Yes	No
Session tracking	No	Yes	No
Workflow gates	No	Yes	No
Audit events	No	Yes	No
Async required	No in Python and Go; yes in TypeScript	Yes	N/A
Check rules	Yes	Yes	Yes
Sandbox rules	Yes	Yes	Yes
Check output rules	Only with `output`	Always	`--calls` only
Short-circuits	No	Yes	No

Use evaluate() for fast rule debugging. Use run() when you need real runtime behavior. Use the CLI for ad hoc checks and CI.
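
For rule debugging, a table of calls and expected decisions keeps regressions visible. The sketch below is one possible shape for such a check, not an Edictum feature. `check_expectations` is a hypothetical helper that accepts any callable taking a tool name and arguments, and `stub_evaluate` stands in for `guard.evaluate` so that the example runs without a rules file.

```python
from types import SimpleNamespace
from typing import Any, Callable


def check_expectations(
    evaluate: Callable[[str, dict[str, Any]], Any],
    cases: list[tuple[str, dict[str, Any], str]],
) -> list[str]:
    """Return a description of every case whose decision differs from the expectation."""
    mismatches = []
    for tool, args, expected in cases:
        actual = evaluate(tool, args).decision
        if actual != expected:
            mismatches.append(f"{tool} {args}: expected {expected}, got {actual}")
    return mismatches


def stub_evaluate(tool: str, args: dict[str, Any]) -> SimpleNamespace:
    # Stub standing in for guard.evaluate; it blocks only the path ".env".
    blocked = args.get("path") == ".env"
    return SimpleNamespace(decision="block" if blocked else "allow")


cases = [
    ("read_file", {"path": ".env"}, "block"),
    ("read_file", {"path": "readme.txt"}, "allow"),
]

mismatches = check_expectations(stub_evaluate, cases)
print(mismatches)  # []
```

With a real guard, `guard.evaluate` could be passed in place of the stub, since the examples on this page call it with a tool name and an arguments dict.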
