# Benchmarks
Reproduce adapter overhead, end-to-end latency, and prompt-vs-rule experiments.
Right page if: you need to measure Edictum's performance overhead or compare prompt-based control against deterministic rulesets with hard numbers.
Wrong page if: you need the adapter API docs; see https://docs.edictum.ai/docs/adapters/overview. For observe mode concepts, see https://docs.edictum.ai/docs/concepts/observe-mode.
Gotcha: the adapter overhead benchmark isolates enforcement latency without LLM calls, while the prompt-vs-rule benchmark requires OPENAI_API_KEY in .env. Run the benchmarks before and after rule changes to quantify regression.
Benchmark source files:
- benchmark/README.md
- benchmark/benchmark_adapters.py
- benchmark/benchmark_latency.py
- benchmark/prompt_vs_contracts.py
## 1. Adapter overhead benchmark
Measures enforcement overhead without LLM latency.
```shell
cd edictum-demo
python benchmark/benchmark_adapters.py
```

Use this to compare overhead consistency across all 8 adapters.
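The real measurement lives in benchmark/benchmark_adapters.py; the sketch below only illustrates the technique it uses, timing a tool call with and without an enforcement wrapper and comparing medians. The `tool_call` function and the inline rule check are hypothetical stand-ins, not Edictum's API.

```python
import statistics
import time

def tool_call(path):
    # Hypothetical bare "tool": no enforcement, just does its work.
    return path.startswith("/tmp")

def enforced_tool_call(path):
    # Hypothetical enforcement wrapper: a deterministic rule check
    # runs before the tool, with no LLM call involved.
    if not path.startswith("/tmp"):
        raise PermissionError(path)
    return tool_call(path)

def bench(fn, arg, runs=10_000):
    # Median of per-call wall-clock samples, robust to outliers.
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(arg)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

baseline = bench(tool_call, "/tmp/out.txt")
enforced = bench(enforced_tool_call, "/tmp/out.txt")
print(f"median overhead: {(enforced - baseline) * 1e6:.2f} µs")
```

Because no LLM call sits in the loop, the difference between the two medians isolates pure enforcement latency, which is what the adapter benchmark reports per adapter.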
## 2. End-to-end latency benchmark
Measures four phases: baseline tool call, enforcement only, LLM only, and full loop.
```shell
cd edictum-demo
python benchmark/benchmark_latency.py
```

Use this to quantify the enforcement share of total runtime in your own environment.
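The four phases combine into one headline number: enforcement's share of the full loop. A minimal sketch of that arithmetic, using made-up phase timings (substitute the numbers benchmark_latency.py prints for your environment):

```python
# Hypothetical phase timings in milliseconds; replace with measured values.
phases = {
    "baseline_tool_call": 0.4,   # bare tool call, no enforcement, no LLM
    "enforcement_only": 0.9,     # tool call with rule checks, no LLM
    "llm_only": 850.0,           # model round-trip, no tool enforcement
    "full_loop": 852.1,          # end-to-end agent step
}

# Overhead enforcement adds on top of the bare tool call,
# expressed relative to total end-to-end runtime.
overhead_ms = phases["enforcement_only"] - phases["baseline_tool_call"]
share = overhead_ms / phases["full_loop"]
print(f"enforcement overhead: {overhead_ms:.2f} ms "
      f"({share:.2%} of full loop)")
```

With LLM latency dominating the full loop, even a visible per-call enforcement cost typically amounts to a small fraction of total runtime; this calculation makes that fraction explicit.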
## 3. Prompt-vs-rule benchmark
Compares three stages:
- prompt-only control
- observe mode rollout
- enforce mode rollout
```shell
cd edictum-demo
python benchmark/prompt_vs_contracts.py
python benchmark/prompt_vs_contracts.py --quick
python benchmark/prompt_vs_contracts.py --runs 3
```

Requires OPENAI_API_KEY in .env.
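The comparison across the three stages reduces to violation rates per stage. A sketch of that tally with hypothetical per-run outcomes (prompt_vs_contracts.py derives real counts from live LLM runs, which is why it needs OPENAI_API_KEY):

```python
# Hypothetical per-run outcomes: True means a policy violation occurred.
results = {
    "prompt_only": [True, False, True],   # advisory prompt control only
    "observe": [False, True, False],      # violations logged, not blocked
    "enforce": [False, False, False],     # violations blocked deterministically
}

for stage, runs in results.items():
    rate = sum(runs) / len(runs)
    print(f"{stage:12s} violation rate: {rate:.0%}")
```

The pattern to look for in real output: prompt-only control leaves a nonzero violation rate, observe mode surfaces the same violations in logs without changing behavior, and enforce mode drives the rate to zero by construction.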
## How to use results in rollout decisions
- Validate that adapter overhead remains flat before/after rule changes.
- Confirm enforce mode does not create unacceptable end-to-end latency regression.
- Use prompt-vs-rule outputs to justify moving from advisory prompt controls to deterministic rulesets.
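The first two checks above can be scripted as a simple regression gate. A sketch under assumed inputs: the overhead numbers and the 10% flatness tolerance and 50 µs budget are hypothetical, chosen per rollout rather than prescribed by Edictum.

```python
# Hypothetical median adapter overheads (µs) from two runs of
# benchmark_adapters.py: one before and one after a rule change.
before_us = 12.4
after_us = 13.1

BUDGET_US = 50.0  # assumed acceptable overhead ceiling, tune per rollout

delta = after_us - before_us
flat = abs(delta) <= 0.1 * before_us  # within 10% counts as "flat"
within_budget = after_us <= BUDGET_US
print(f"delta: {delta:+.1f} µs, flat: {flat}, within budget: {within_budget}")
```

Wiring a check like this into CI turns the before/after comparison into an automatic gate instead of a manual reading of benchmark output.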