Data Protection Patterns
Right page if: you need contracts to detect PII, secrets, or sensitive files in agent tool calls, covering both input blocking and output scanning.
Wrong page if: you need role-based access control. See https://docs.edictum.ai/docs/contracts/patterns/access-control.
Gotcha: `effect: redact` replaces matched patterns with `[REDACTED]` automatically, but only works on `pure` or `read` tools. For `write`/`irreversible` tools, it falls back to `warn` because the sensitive data has already been written.
Data protection contracts prevent sensitive information from leaking through agent tool calls. They cover two sides: denying access to sensitive files (preconditions) and scanning tool output for sensitive patterns (postconditions).
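The two sides can be pictured as a wrapper around a single tool call. The following is a minimal, library-independent sketch: `guarded_call`, `deny_sensitive_path`, `scan_output`, and `fake_read_file` are hypothetical names used for illustration and are not part of Edictum's API.

```python
import re
from typing import Callable, Optional

SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def deny_sensitive_path(args: dict) -> Optional[str]:
    """Precondition: return a denial message, or None to allow the call."""
    path = args.get("path", "")
    if path.endswith(".env"):
        return f"Reading '{path}' is denied."
    return None


def scan_output(output: str) -> Optional[str]:
    """Postcondition: return a warning if the output looks sensitive."""
    if SSN.search(output):
        return "PII pattern detected in output."
    return None


def guarded_call(tool: Callable[[dict], str], args: dict) -> dict:
    """Run the precondition, then the tool, then the postcondition."""
    denial = deny_sensitive_path(args)
    if denial is not None:
        # The tool never runs, so nothing is exposed.
        return {"status": "denied", "message": denial, "output": None}
    output = tool(args)
    warning = scan_output(output)
    status = "warned" if warning else "ok"
    return {"status": status, "message": warning, "output": output}


def fake_read_file(args: dict) -> str:
    # Stand-in tool that returns canned content instead of touching disk.
    return "name=Jane, ssn=123-45-6789"


print(guarded_call(fake_read_file, {"path": "/app/.env"})["status"])        # denied
print(guarded_call(fake_read_file, {"path": "/data/users.csv"})["status"])  # warned
```

The denied call returns before the tool executes, while the warned call has already produced output by the time the scan runs. That difference is why blocking belongs in preconditions and detection in postconditions.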
PII Detection in Tool Output
Scan tool output for personally identifiable information using regex patterns. This is a postcondition because it inspects the result after the tool has run.
When to use: Your agent calls tools that return data from databases, APIs, or files that may contain personal data. You want an audit trail of PII exposure and a warning to the agent to redact before proceeding.
```yaml
apiVersion: edictum/v1
kind: ContractBundle
metadata:
  name: pii-detection
defaults:
  mode: enforce
contracts:
  - id: pii-in-output
    type: post
    tool: "*"
    when:
      output.text:
        # Single-quoted YAML strings keep backslashes literal, so no doubling is needed.
        matches_any:
          - '\b\d{3}-\d{2}-\d{4}\b'                               # US SSN
          - '\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'  # email address
          - '\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b'          # credit card
          - '\b\d{3}[-.]?\d{3}[-.]?\d{4}\b'                       # phone number
    then:
      effect: warn
      message: "PII pattern detected in output. Redact before using in summaries or responses."
      tags: [pii, compliance]
```

```python
import re

from edictum import Verdict
from edictum.contracts import postcondition


@postcondition("*")
def detect_pii_in_output(envelope, tool_response):
    # Only string output can be scanned with regex.
    if not isinstance(tool_response, str):
        return Verdict.pass_()

    pii_patterns = {
        "SSN": r"\b\d{3}-\d{2}-\d{4}\b",
        "email": r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b",
        "credit_card": r"\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b",
        "phone": r"\b\d{3}[-.]?\d{3}[-.]?\d{4}\b",
    }

    # Collect the name of every pattern that matches somewhere in the output.
    found = [name for name, pat in pii_patterns.items() if re.search(pat, tool_response)]
    if found:
        return Verdict.fail(
            f"Tool output contains potential PII: {', '.join(found)}. "
            "Do NOT include this data in summaries or outputs. "
            "Redact before processing further.",
            pii_types=found,
        )
    return Verdict.pass_()
```

The patterns above detect:
| Pattern | Regex | Example Match |
|---|---|---|
| US SSN | `\b\d{3}-\d{2}-\d{4}\b` | 123-45-6789 |
| Email address | `\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b` | user@example.com |
| Credit card | `\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b` | 4111-1111-1111-1111 |
| Phone number | `\b\d{3}[-.]?\d{3}[-.]?\d{4}\b` | 555-867-5309 |
Gotchas:
- With `effect: warn`, postconditions detect but do not modify the output. Use `on_postcondition_warn` callbacks or switch to `effect: redact` for automatic pattern replacement on READ/PURE tools.
- Regex-based PII detection is a baseline. Production deployments should use ML-based PII scanners (Presidio, Phileas, etc.) behind the same postcondition contract interface.
- `matches_any` short-circuits on the first match. Order patterns from most common to least common for performance.
- The phone number regex will match some non-phone patterns like version numbers (e.g., `123.456.7890`). Tune patterns based on your data.

Tip: For automatic redaction, change `effect: warn` to `effect: redact`. The pipeline uses the same `matches_any` patterns from the `when` clause to replace matched text with `[REDACTED]`. This works for READ/PURE tools; WRITE/IRREVERSIBLE tools fall back to `warn`.
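To make the redaction behavior concrete, here is a rough sketch of pattern-based replacement using only Python's `re` module. It is not Edictum's implementation: the `redact` helper and `PII_PATTERNS` list are hypothetical, and the email pattern uses `[A-Za-z]` for the top-level domain.

```python
import re

# Illustrative pattern list mirroring the four PII regexes in this section.
PII_PATTERNS = [
    r"\b\d{3}-\d{2}-\d{4}\b",                               # US SSN
    r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b",  # email address
    r"\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b",          # credit card
    r"\b\d{3}[-.]?\d{3}[-.]?\d{4}\b",                       # phone number
]


def redact(text: str, patterns: list[str] = PII_PATTERNS) -> str:
    """Replace every match of every pattern with [REDACTED]."""
    for pattern in patterns:
        text = re.sub(pattern, "[REDACTED]", text)
    return text


sample = "Contact jane@example.com, SSN 123-45-6789, build 123.456.7890"
print(redact(sample))
# Contact [REDACTED], SSN [REDACTED], build [REDACTED]
```

The build number is redacted as well because it fits the phone pattern. This is the false-positive case from the gotchas, and with redaction it silently removes data instead of only raising a warning.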
Secret Scanning in Output
Detect credentials, tokens, and private keys in tool output. Even if a precondition allowed the read, the output may contain secrets that should not enter the conversation.
When to use: Defense in depth. Your agent reads files, calls APIs, or queries databases. Even if the input was allowed, the output may contain secrets leaked into logs, configs, or error messages.
```yaml
apiVersion: edictum/v1
kind: ContractBundle
metadata:
  name: secret-scanning
defaults:
  mode: enforce
contracts:
  - id: secrets-in-output
    type: post
    tool: "*"
    when:
      output.text:
        matches_any:
          - 'AKIA[0-9A-Z]{16}'                                       # AWS access key ID
          - 'eyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+'   # JWT
          - '-----BEGIN (RSA |EC )?PRIVATE KEY-----'                 # PEM private key header
    then:
      effect: warn
      message: "Secret detected in output. Do not reference, log, or output this value."
      tags: [secrets, dlp]
      metadata:
        severity: critical
```

```python
import re

from edictum import Verdict
from edictum.contracts import postcondition


@postcondition("*")
def detect_secrets_in_output(envelope, tool_response):
    # Only string output can be scanned with regex.
    if not isinstance(tool_response, str):
        return Verdict.pass_()

    secret_patterns = {
        "AWS Access Key": r"AKIA[0-9A-Z]{16}",
        "JWT Token": r"eyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+",
        "Private Key": r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----",
    }

    # Collect the name of every secret type found in the output.
    found = [name for name, pat in secret_patterns.items() if re.search(pat, tool_response)]
    if found:
        return Verdict.fail(
            f"Tool output contains secrets: {', '.join(found)}. "
            "Do NOT reference, log, or output these values.",
            secret_types=found,
        )
    return Verdict.pass_()
```

The patterns above detect:
| Pattern | Regex | Example Match |
|---|---|---|
| AWS Access Key | `AKIA[0-9A-Z]{16}` | `AKIAIOSFODNN7EXAMPLE` |
| JWT Token | `eyJ...` (three dot-separated base64url segments) | `eyJhbGciOiJ...` |
| Private Key | PEM header format | `-----BEGIN RSA PRIVATE KEY-----` |
Gotchas:
- The AWS key pattern only matches access key IDs (starting with `AKIA`). It does not detect secret access keys, which are harder to distinguish from random strings. Add a separate pattern for `aws_secret_access_key\s*[:=]\s*\S+` if needed.
- JWT patterns match the structure but do not validate the token. Expired or invalid JWTs still trigger the warning, which is the desired behavior.
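As a quick way to try the extra secret-key pattern alongside the others, the sketch below runs all four against a sample config string using plain `re`. The `find_secrets` helper and the sample values are made up for illustration and are not part of Edictum.

```python
import re

SECRET_PATTERNS = {
    "AWS Access Key": r"AKIA[0-9A-Z]{16}",
    "JWT Token": r"eyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+",
    "Private Key": r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----",
    # Keyed on the setting name, since the value alone looks like random text.
    "AWS Secret Key": r"aws_secret_access_key\s*[:=]\s*\S+",
}


def find_secrets(text: str) -> list[str]:
    """Return the names of all secret patterns that match the text."""
    return [name for name, pat in SECRET_PATTERNS.items() if re.search(pat, text)]


config = (
    "aws_access_key_id = AKIAIOSFODNN7EXAMPLE\n"
    "aws_secret_access_key = EXAMPLE-NOT-A-REAL-SECRET\n"
)
print(find_secrets(config))                       # ['AWS Access Key', 'AWS Secret Key']
print(find_secrets("AWS_SECRET_ACCESS_KEY=abc"))  # []
```

The second call returns nothing because the pattern is case-sensitive, so the upper-case environment-variable spelling slips through. If that form appears in your data, a case-insensitive variant of the pattern is one option.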
Sensitive File Blocking
Block reads of files that commonly contain secrets, credentials, or private keys. This is a precondition -- it runs before the tool executes, so no data is exposed.
When to use: Your agent has access to read_file and you want to prevent it from reading files that could expose secrets, even accidentally.
```yaml
apiVersion: edictum/v1
kind: ContractBundle
metadata:
  name: sensitive-file-denial
defaults:
  mode: enforce
contracts:
  - id: block-secret-files
    type: pre
    tool: read_file
    when:
      args.path:
        contains_any:
          - ".env"
          - ".secret"
          - "credentials"
          - ".pem"
          - "id_rsa"
          - ".key"
          - "kubeconfig"
    then:
      effect: deny
      message: "Reading sensitive file '{args.path}' is denied. Skip and continue with non-sensitive files."
      tags: [secrets, dlp]

  - id: block-config-with-secrets
    type: pre
    tool: read_file
    when:
      any:
        - args.path: { ends_with: ".tfvars" }
        - args.path: { ends_with: ".npmrc" }
        - args.path: { ends_with: ".pypirc" }
        - args.path: { ends_with: ".netrc" }
    then:
      effect: deny
      message: "Config file '{args.path}' may contain credentials. Access denied."
      tags: [secrets, dlp]
```

```python
from edictum import Verdict, precondition


@precondition("read_file")
def block_secret_files(envelope):
    path = envelope.args.get("path", "")
    sensitive = [".env", ".secret", "credentials", ".pem", "id_rsa", ".key", "kubeconfig"]
    # Substring match: any occurrence anywhere in the path triggers the denial.
    for s in sensitive:
        if s in path:
            return Verdict.fail(
                f"Reading sensitive file '{path}' is denied. "
                "Skip and continue with non-sensitive files."
            )
    return Verdict.pass_()


@precondition("read_file")
def block_config_with_secrets(envelope):
    path = envelope.args.get("path", "")
    secret_exts = [".tfvars", ".npmrc", ".pypirc", ".netrc"]
    # Suffix match: only the end of the path is checked.
    for ext in secret_exts:
        if path.endswith(ext):
            return Verdict.fail(
                f"Config file '{path}' may contain credentials. Access denied."
            )
    return Verdict.pass_()
```

Gotchas:
- `contains_any` is a substring match. A path like `/src/app.environment.ts` would match on `.env`. Use `ends_with` or `matches` with word boundaries for more precise matching.
- This pattern only protects `read_file`. If your agent has a `bash` tool, it could read the same files with `cat`. Add corresponding contracts for all file-reading tools.
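The difference between substring matching and a tighter file-name rule can be checked offline before committing to a contract. In this sketch the `ENV_FILE` regex and both helper functions are hypothetical. They show one possible stricter rule for `.env` files, not a pattern prescribed by Edictum.

```python
import re
from pathlib import PurePosixPath

SENSITIVE_SUBSTRINGS = [".env", ".secret", "credentials", ".pem", "id_rsa", ".key", "kubeconfig"]

# Hypothetical tighter rule: the file name is exactly ".env" or ".env.<suffix>".
ENV_FILE = re.compile(r"^\.env(\.[\w-]+)?$")


def substring_match(path: str) -> bool:
    """Mimic a substring check: true if any entry occurs anywhere in the path."""
    return any(s in path for s in SENSITIVE_SUBSTRINGS)


def is_env_file(path: str) -> bool:
    """Check only the final path component against the stricter rule."""
    return ENV_FILE.match(PurePosixPath(path).name) is not None


for p in ["/src/app.environment.ts", "/app/.env", "/app/.env.production"]:
    print(p, substring_match(p), is_env_file(p))
# /src/app.environment.ts True False
# /app/.env True True
# /app/.env.production True True
```

Matching on the final path component avoids the false positive while still catching `.env` variants. The trade-off is that each sensitive file type needs its own explicit rule.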
Output Size Monitoring
Warn when tool output is unusually large, which can waste context window tokens and cause the agent to lose track of its task.
When to use: Your agent reads files or queries databases where unbounded results are possible. Large outputs dilute the agent's focus and increase token costs.
```yaml
apiVersion: edictum/v1
kind: ContractBundle
metadata:
  name: output-monitoring
defaults:
  mode: enforce
contracts:
  - id: large-output-warning
    type: post
    tool: "*"
    when:
      output.text:
        matches: '.{50000,}'   # 50,000 or more consecutive characters
    then:
      effect: warn
      message: "Tool output is very large. Use pagination, head/tail, or more specific filters."
      tags: [performance, output-size]
```

```python
from edictum import Verdict
from edictum.contracts import postcondition


@postcondition("*")
def monitor_output_size(envelope, tool_response):
    if tool_response is None:
        return Verdict.pass_()

    # Measure the string form so non-string responses are covered too.
    size = len(str(tool_response))
    if size > 50_000:
        return Verdict.fail(
            f"Tool output is very large ({size:,} chars). "
            "Consider using head/tail, pagination, or more specific "
            "filters to reduce the output before processing.",
            output_size=size,
        )
    return Verdict.pass_()
```

Gotchas:
- The `.{50000,}` regex is a rough proxy for output size: it matches a run of 50,000 or more characters. In most regex engines, including Python's `re`, `.` does not match newlines by default, so multi-line output may not trigger it unless a single line is that long. Adjust the threshold based on your context window budget.
- Large regex matches can be slow. If performance is a concern, consider implementing output size monitoring as a Python postcondition instead, where you can use `len()` directly.
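The gap between the regex proxy and a direct length check is easy to see with Python's `re` defaults. This sketch assumes the pattern is evaluated without dot-all mode; how Edictum's `matches` operator is configured is not confirmed here.

```python
import re

THRESHOLD = 50_000
SIZE_PATTERN = re.compile(r".{50000,}")  # default flags: "." excludes newlines

single_line = "x" * 60_000
multi_line = "\n".join(["x" * 100] * 600)  # 600 short lines, 60,599 chars total

for name, text in [("single_line", single_line), ("multi_line", multi_line)]:
    by_regex = SIZE_PATTERN.search(text) is not None
    by_len = len(text) > THRESHOLD
    print(name, by_regex, by_len)
# single_line True True
# multi_line False True
```

Under these assumptions, the length check flags both inputs, while the regex misses the multi-line one, which is the more typical shape for file and query output.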