Dry-Run Evaluation
Test whether a tool call would be allowed, blocked, or warned without executing it.
Right page if: you need to test a tool call without executing it -- `evaluate()` returns a decision and matching rules, but does not mutate runtime state. Wrong page if: you need the full runtime pipeline with workflow gates, session state, or audit -- use `run()`. Gotcha: dry-run evaluation skips session rules and workflow gates because they need runtime session state. Check output rules only run when you provide `output`.
evaluate() answers one question: what would Edictum do if this tool call happened right now?
It does not execute the tool, write audit events, advance workflow stages, or touch session counters.
Quick Example
```python
from edictum import Edictum

guard = Edictum.from_yaml("rules.yaml")
result = guard.evaluate("read_file", {"path": ".env"})
print(result.decision)       # "block"
print(result.block_reasons)  # ["Sensitive file '.env' blocked."]
```
evaluate()
```python
def evaluate(
    self,
    tool_name: str,
    args: dict[str, Any],
    *,
    principal: Principal | None = None,
    output: str | None = None,
    environment: str | None = None,
) -> EvaluationResult
```
Python `evaluate()` is synchronous. TypeScript `evaluate()` is async. Go `Evaluate()` is synchronous.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `tool_name` | `str` | required | The tool being called |
| `args` | `dict[str, Any]` | required | Tool call arguments |
| `principal` | `Principal \| None` | `None` | Identity context for the call |
| `output` | `str \| None` | `None` | Simulated tool output. When provided, check_output rules are evaluated |
| `environment` | `str \| None` | `None` | Override the guard's default environment |
Behavior
- Exhaustive evaluation. All matching rules are evaluated. There is no short-circuit on the first block.
- No tool execution. The tool function is never called.
- No session state. Session rules are skipped because dry-run evaluation has no runtime session.
- No workflow gates. Workflow runtime enforcement is skipped for the same reason.
- Sandbox rules are evaluated. They are stateless, so dry-run includes them.
- Check output rules require output. Without `output`, only check rules and sandbox rules run.
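As a rough mental model of the exhaustive-evaluation behavior above (a pure-Python sketch, not Edictum internals), every matching rule runs and every failure message is collected, with no short-circuit on the first block:

```python
# Toy sketch of exhaustive dry-run evaluation (illustration only, not
# Edictum's implementation): all matching rules run, and every failure
# message is collected instead of stopping at the first block.

def evaluate_all(rules, tool_name, args):
    block_reasons = []
    for rule in rules:
        if rule["tool"] == tool_name and not rule["check"](args):
            block_reasons.append(rule["message"])  # keep going; no short-circuit
    decision = "block" if block_reasons else "allow"
    return decision, block_reasons

rules = [
    {"tool": "read_file", "check": lambda a: ".env" not in a["path"],
     "message": "Sensitive file '.env' blocked."},
    {"tool": "read_file", "check": lambda a: not a["path"].startswith("/etc"),
     "message": "Path outside sandbox."},
]

decision, reasons = evaluate_all(rules, "read_file", {"path": "/etc/.env"})
print(decision)      # "block"
print(len(reasons))  # 2 -- both failing rules are reported
```

Because nothing short-circuits, `block_reasons` reflects every failed rule, which is what makes dry-run output useful for debugging a whole ruleset at once.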
Examples
Test a check rule:
```python
result = guard.evaluate("read_file", {"path": ".env"})
assert result.decision == "block"
assert result.rules[0].rule_id == "block-dotenv"
```
Test with principal context:
```python
from edictum import Principal

result = guard.evaluate(
    "deploy_service",
    {"service": "api", "env": "production"},
    principal=Principal(role="sre", ticket_ref="JIRA-456"),
)
assert result.decision == "allow"
```
Test a check_output rule by providing output:
```python
result = guard.evaluate(
    "read_file",
    {"path": "data.txt"},
    output="SSN: 123-45-6789",
)
assert result.decision == "warn"
assert len(result.warn_reasons) > 0
```
Test a sandbox boundary:
```python
result = guard.evaluate("read_file", {"path": "/etc/shadow"})
assert result.decision == "block"
sandbox_results = [rule for rule in result.rules if rule.rule_type == "sandbox"]
assert len(sandbox_results) == 1
assert sandbox_results[0].passed is False
```
evaluate_batch()
```python
def evaluate_batch(
    self,
    calls: list[dict[str, Any]],
) -> list[EvaluationResult]
```
Evaluates multiple tool calls. Each call is evaluated independently via `evaluate()`.
Call Format
Each dict in the calls list accepts these keys:
| Key | Type | Required | Description |
|---|---|---|---|
| `tool` | `str` | yes | Tool name |
| `args` | `dict` | no | Tool arguments (defaults to `{}`) |
| `principal` | `dict` | no | Principal as a dict with keys: `role`, `user_id`, `ticket_ref`, `claims` |
| `output` | `str \| dict` | no | Simulated output. Dicts are JSON-serialized automatically |
| `environment` | `str` | no | Environment override |
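The normalization described in the table can be pictured with a small sketch (a hypothetical helper, not part of Edictum's API): `args` falls back to an empty dict, and a dict `output` is JSON-serialized before check_output rules see it.

```python
import json

def normalize_call(call: dict) -> dict:
    """Toy normalizer for one evaluate_batch() call dict (illustration only)."""
    output = call.get("output")
    if isinstance(output, dict):
        output = json.dumps(output)  # dicts are JSON-serialized automatically
    return {
        "tool": call["tool"],                # required
        "args": call.get("args", {}),        # defaults to {}
        "principal": call.get("principal"),  # optional principal dict
        "output": output,                    # str or None after normalization
        "environment": call.get("environment"),
    }

call = normalize_call({"tool": "query_db", "output": {"rows": 3}})
print(call["args"])    # {}
print(call["output"])  # '{"rows": 3}'
```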
Example
```python
results = guard.evaluate_batch([
    {"tool": "read_file", "args": {"path": ".env"}},
    {"tool": "read_file", "args": {"path": "readme.txt"}},
    {"tool": "read_file", "args": {"path": "data.txt"}, "output": "SSN: 123-45-6789"},
    {
        "tool": "deploy_service",
        "args": {"service": "api"},
        "principal": {"role": "sre", "ticket_ref": "JIRA-123"},
    },
])
assert results[0].decision == "block"
assert results[1].decision == "allow"
assert results[2].decision == "warn"
assert results[3].decision == "allow"
```
EvaluationResult
Returned by evaluate(). It contains the overall decision and per-rule details.
Python fields
| Field | Description |
|---|---|
| `decision` | `"allow"`, `"block"`, or `"warn"` |
| `tool_name` | Tool name that was evaluated |
| `rules` | Per-rule results |
| `block_reasons` | Messages from failed check or sandbox rules |
| `warn_reasons` | Messages from failed check_output rules |
| `rules_evaluated` | Total number of evaluated rules |
| `policy_error` | `True` if any rule raised a policy error |
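One way to picture how the decision relates to the reason lists (an illustrative sketch, not Edictum's actual dataclass, and assuming block takes precedence over warn, as the examples on this page suggest):

```python
from dataclasses import dataclass, field

@dataclass
class EvaluationResultSketch:  # illustrative shape only, not Edictum's class
    tool_name: str
    block_reasons: list = field(default_factory=list)
    warn_reasons: list = field(default_factory=list)
    rules_evaluated: int = 0
    policy_error: bool = False

    @property
    def decision(self) -> str:
        # Assumed precedence: block beats warn, warn beats allow.
        if self.block_reasons:
            return "block"
        if self.warn_reasons:
            return "warn"
        return "allow"

result = EvaluationResultSketch("read_file", warn_reasons=["PII detected in output"])
print(result.decision)  # "warn"
```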
TypeScript fields
| Field | Description |
|---|---|
| `decision` | `"allow"`, `"block"`, or `"warn"` |
| `toolName` | Tool name that was evaluated |
| `rules` | Per-rule results |
| `denyReasons` | Messages from failed check or sandbox rules |
| `warnReasons` | Messages from failed check_output rules |
| `contractsEvaluated` | Total number of evaluated rules. This is a legacy TypeScript field name. |
| `workflowSkipped` | `true` when a workflow runtime is attached |
| `workflowReason` | Why workflow enforcement was skipped |
Go fields
| Field | Description |
|---|---|
| `Decision` | `"allow"`, `"block"`, or `"warn"` |
| `ToolName` | Tool name that was evaluated |
| `Rules` | Per-rule results |
| `BlockReasons` | Messages from failed check or sandbox rules |
| `WarnReasons` | Messages from failed check_output rules |
| `RulesEvaluated` | Total number of evaluated rules |
| `WorkflowSkipped` | `true` when a workflow runtime is attached |
| `WorkflowReason` | Why workflow enforcement was skipped |
Each RuleResult contains the rule ID, rule type, whether it passed, the message, whether it was only observed, and whether a policy error occurred.
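The per-rule shape can be sketched as follows (field names mirror the description above; the dataclass itself is illustrative, not Edictum's). This is what makes filters like the sandbox example earlier straightforward:

```python
from dataclasses import dataclass

@dataclass
class RuleResultSketch:  # illustrative shape only, not Edictum's class
    rule_id: str
    rule_type: str   # e.g. "check", "check_output", "sandbox"
    passed: bool
    message: str = ""
    observed_only: bool = False
    policy_error: bool = False

rules = [
    RuleResultSketch("block-dotenv", "check", passed=True),
    RuleResultSketch("no-etc", "sandbox", passed=False, message="Path outside sandbox."),
]

# Filter to failed sandbox rules, as in the sandbox-boundary example.
failed_sandbox = [r for r in rules if r.rule_type == "sandbox" and not r.passed]
print(len(failed_sandbox))        # 1
print(failed_sandbox[0].rule_id)  # "no-etc"
```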
evaluate() vs run() vs CLI
| | evaluate() | run() | edictum check / edictum test |
|---|---|---|---|
| Executes the tool | No | Yes | No |
| Session tracking | No | Yes | No |
| Workflow gates | No | Yes | No |
| Audit events | No | Yes | No |
| Async required | No | Yes | N/A |
| Check rules | Yes | Yes | Yes |
| Sandbox rules | Yes | Yes | Yes |
| Check output rules | Only with `output` | Always | `--calls` only |
| Short-circuits | No | Yes | No |
Use evaluate() for fast rule debugging. Use run() when you need real runtime behavior. Use the CLI for ad hoc checks and CI.