# Benchmarks
Reproduce adapter overhead, end-to-end latency, and prompt-vs-rule experiments.
Right page if: you need to measure Edictum's performance overhead or compare prompt-based control against deterministic rulesets with hard numbers.
Wrong page if: you need the adapter API docs; see https://docs.edictum.ai/docs/adapters/overview. For observe mode concepts, see https://docs.edictum.ai/docs/concepts/observe-mode.
Gotcha: the adapter overhead benchmark isolates enforcement latency without LLM calls, while the prompt-vs-rule benchmark requires OPENAI_API_KEY in .env. Run the benchmarks before and after rule changes to quantify regression.
Benchmark source files:
- benchmark/README.md
- benchmark/benchmark_adapters.py
- benchmark/benchmark_latency.py
- benchmark/prompt_vs_contracts.py
## 1. Adapter overhead benchmark
Measures enforcement overhead without LLM latency.
```shell
cd edictum-demo
python benchmark/benchmark_adapters.py
```

Use this to compare overhead consistency across all 8 adapters.
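The real measurement lives in benchmark/benchmark_adapters.py; the sketch below only illustrates the technique it uses, timing a tool call with and without an enforcement wrapper and comparing medians. The `tool_call` function and the inline rule check are hypothetical stand-ins, not Edictum's API.

```python
import statistics
import time

def tool_call(path):
    # Hypothetical bare "tool": no enforcement, just does its work.
    return path.startswith("/tmp")

def enforced_tool_call(path):
    # Hypothetical enforcement wrapper: a deterministic rule check
    # runs before the tool, with no LLM call involved.
    if not path.startswith("/tmp"):
        raise PermissionError(path)
    return tool_call(path)

def bench(fn, arg, runs=10_000):
    # Median of per-call wall-clock samples, robust to outliers.
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(arg)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

baseline = bench(tool_call, "/tmp/out.txt")
enforced = bench(enforced_tool_call, "/tmp/out.txt")
print(f"median overhead: {(enforced - baseline) * 1e6:.2f} µs")
```

Because no LLM call sits in the loop, the difference between the two medians isolates pure enforcement latency, which is what the adapter benchmark reports per adapter.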
## 2. End-to-end latency benchmark
Measures four phases: baseline tool call, enforcement only, LLM only, and full loop.
```shell
cd edictum-demo
python benchmark/benchmark_latency.py
```

Use this to quantify the enforcement share of total runtime in your own environment.
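The four phases combine into one headline number: enforcement's share of the full loop. A minimal sketch of that arithmetic, using made-up phase timings (substitute the numbers benchmark_latency.py prints for your environment):

```python
# Hypothetical phase timings in milliseconds; replace with measured values.
phases = {
    "baseline_tool_call": 0.4,   # bare tool call, no enforcement, no LLM
    "enforcement_only": 0.9,     # tool call with rule checks, no LLM
    "llm_only": 850.0,           # model round-trip, no tool enforcement
    "full_loop": 852.1,          # end-to-end agent step
}

# Overhead enforcement adds on top of the bare tool call,
# expressed relative to total end-to-end runtime.
overhead_ms = phases["enforcement_only"] - phases["baseline_tool_call"]
share = overhead_ms / phases["full_loop"]
print(f"enforcement overhead: {overhead_ms:.2f} ms "
      f"({share:.2%} of full loop)")
```

With LLM latency dominating the full loop, even a visible per-call enforcement cost typically amounts to a small fraction of total runtime; this calculation makes that fraction explicit.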
## 3. Prompt-vs-rule benchmark
Compares three stages:
- prompt-only control
- observe mode rollout
- enforce mode rollout
```shell
cd edictum-demo
python benchmark/prompt_vs_contracts.py
python benchmark/prompt_vs_contracts.py --quick
python benchmark/prompt_vs_contracts.py --runs 3
```

Requires OPENAI_API_KEY in .env.
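The comparison across the three stages reduces to violation rates per stage. A sketch of that tally with hypothetical per-run outcomes (prompt_vs_contracts.py derives real counts from live LLM runs, which is why it needs OPENAI_API_KEY):

```python
# Hypothetical per-run outcomes: True means a policy violation occurred.
results = {
    "prompt_only": [True, False, True],   # advisory prompt control only
    "observe": [False, True, False],      # violations logged, not blocked
    "enforce": [False, False, False],     # violations blocked deterministically
}

for stage, runs in results.items():
    rate = sum(runs) / len(runs)
    print(f"{stage:12s} violation rate: {rate:.0%}")
```

The pattern to look for in real output: prompt-only control leaves a nonzero violation rate, observe mode surfaces the same violations in logs without changing behavior, and enforce mode drives the rate to zero by construction.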
## How to use results in rollout decisions
- Validate that adapter overhead remains flat before/after rule changes.
- Confirm enforce mode does not create unacceptable end-to-end latency regression.
- Use prompt-vs-rule outputs to justify moving from advisory prompt controls to deterministic rulesets.
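The first two checks above can be scripted as a simple regression gate. A sketch under assumed inputs: the overhead numbers and the 10% flatness tolerance and 50 µs budget are hypothetical, chosen per rollout rather than prescribed by Edictum.

```python
# Hypothetical median adapter overheads (µs) from two runs of
# benchmark_adapters.py: one before and one after a rule change.
before_us = 12.4
after_us = 13.1

BUDGET_US = 50.0  # assumed acceptable overhead ceiling, tune per rollout

delta = after_us - before_us
flat = abs(delta) <= 0.1 * before_us  # within 10% counts as "flat"
within_budget = after_us <= BUDGET_US
print(f"delta: {delta:+.1f} µs, flat: {flat}, within budget: {within_budget}")
```

Wiring a check like this into CI turns the before/after comparison into an automatic gate instead of a manual reading of benchmark output.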