
Data Protection Patterns

Data protection rulesets prevent sensitive information from leaking through agent tool calls.

AI Assistance

Right page if: you need rulesets to detect PII, secrets, or sensitive files in agent tool calls -- both input blocking and output scanning. Wrong page if: you need role-based access control -- see https://docs.edictum.ai/docs/rulesets/patterns/access-control. Gotcha: `action: redact` replaces matched patterns with [REDACTED] automatically, but only works on `pure` or `read` tools. For `write`/`irreversible` tools, it falls back to `warn` -- the sensitive data has already been written.

These rulesets cover two sides: denying access to sensitive files (preconditions) and scanning tool output for sensitive patterns (postconditions).


PII Detection in Tool Output

Scan tool output for personally identifiable information using regex patterns. This is a postcondition because it inspects the result after the tool has run.

When to use: Your agent calls tools that return data from databases, APIs, or files that may contain personal data. You want a decision log of PII exposure and a warning that tells the agent to redact before proceeding.

apiVersion: edictum/v1
kind: Ruleset

metadata:
  name: pii-detection

defaults:
  mode: enforce

rules:
  - id: pii-in-output
    type: post
    tool: "*"
    when:
      output.text:
        matches_any:
          - '\b\d{3}-\d{2}-\d{4}\b'
          - '\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'
          - '\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b'
          - '\b\d{3}[-.]?\d{3}[-.]?\d{4}\b'
    then:
      action: warn
      message: "PII pattern detected in output. Redact before using in summaries or responses."
      tags: [pii, compliance]
import re
from edictum import Decision
from edictum.rulesets import postcondition

@postcondition("*")
def detect_pii_in_output(envelope, tool_response):
    if not isinstance(tool_response, str):
        return Decision.pass_()
    pii_patterns = {
        "SSN": r"\b\d{3}-\d{2}-\d{4}\b",
        "email": r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b",
        "credit_card": r"\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b",
        "phone": r"\b\d{3}[-.]?\d{3}[-.]?\d{4}\b",
    }
    found = [name for name, pat in pii_patterns.items() if re.search(pat, tool_response)]
    if found:
        return Decision.fail(
            f"Tool output contains potential PII: {', '.join(found)}. "
            "Do NOT include this data in summaries or outputs. "
            "Redact before processing further.",
            pii_types=found,
        )
    return Decision.pass_()

The patterns above detect:

| Pattern | Regex | Example Match |
| --- | --- | --- |
| US SSN | `\b\d{3}-\d{2}-\d{4}\b` | 123-45-6789 |
| Email address | `\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b` | user@example.com |
| Credit card | `\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b` | 4111-1111-1111-1111 |
| Phone number | `\b\d{3}[-.]?\d{3}[-.]?\d{4}\b` | 555-867-5309 |
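
The example matches in the table can be sanity-checked directly with Python's `re` module. This is a quick verification snippet, not part of any ruleset:

```python
import re

# (pattern, example that should match) pairs from the table above
patterns = {
    "US SSN": (r"\b\d{3}-\d{2}-\d{4}\b", "123-45-6789"),
    "Email address": (r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b", "user@example.com"),
    "Credit card": (r"\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b", "4111-1111-1111-1111"),
    "Phone number": (r"\b\d{3}[-.]?\d{3}[-.]?\d{4}\b", "555-867-5309"),
}

for name, (pat, example) in patterns.items():
    assert re.search(pat, example), f"{name} pattern failed on its example"
```

Running a check like this before deploying a ruleset catches escaping mistakes (for example, doubled backslashes surviving YAML quoting) early.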

Gotchas:

  • With action: warn, postconditions detect but do not modify the output. Use on_postcondition_warn callbacks or switch to action: redact for automatic pattern replacement on READ/PURE tools.
  • Regex-based PII detection is a baseline. Production deployments should use ML-based PII scanners (Presidio, Phileas, etc.) behind the same postcondition rule interface.
  • matches_any short-circuits on the first match. Order patterns from most common to least common for performance.
  • The phone number regex will match some non-phone patterns like version numbers (e.g., 123.456.7890). Tune patterns based on your data.
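
As one example of such tuning (an illustrative variant, not a built-in pattern), requiring dash or whitespace separators keeps dotted version-like strings from matching:

```python
import re

# Stricter phone pattern: separators must be '-' or whitespace, never '.'.
PHONE = r"\b\d{3}[-\s]\d{3}[-\s]\d{4}\b"

assert re.search(PHONE, "555-867-5309")        # phone number: matches
assert not re.search(PHONE, "123.456.7890")    # dotted version string: no match
```

The trade-off: dotted phone formats such as 555.867.5309 are also missed, so pick the variant that fits the data your tools actually return.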

Tip: For automatic redaction, change action: warn to action: redact. The pipeline uses the same matches_any patterns from the when clause to replace matched text with [REDACTED]. This works for READ/PURE tools; WRITE/IRREVERSIBLE tools fall back to warn.
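
The replacement behavior can be approximated in plain Python. The `redact` helper below is hypothetical, sketched here only to illustrate the substitution logic; it is not an Edictum API:

```python
import re

# Patterns taken from the ruleset above (SSN and email shown for brevity).
PII_PATTERNS = [
    r"\b\d{3}-\d{2}-\d{4}\b",                                # US SSN
    r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b",   # email
]

def redact(text: str, patterns=PII_PATTERNS) -> str:
    """Replace every match of every pattern with [REDACTED]."""
    for pat in patterns:
        text = re.sub(pat, "[REDACTED]", text)
    return text

print(redact("Contact jane@example.com, SSN 123-45-6789."))
```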


Secret Scanning in Output

Detect credentials, tokens, and private keys in tool output. Even if a precondition allowed the read, the output may contain secrets that should not enter the conversation.

When to use: Defense in depth. Your agent reads files, calls APIs, or queries databases. Even if the input was allowed, the output may contain secrets leaked into logs, configs, or error messages.

apiVersion: edictum/v1
kind: Ruleset

metadata:
  name: secret-scanning

defaults:
  mode: enforce

rules:
  - id: secrets-in-output
    type: post
    tool: "*"
    when:
      output.text:
        matches_any:
          - 'AKIA[0-9A-Z]{16}'
          - 'eyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+'
          - '-----BEGIN (RSA |EC )?PRIVATE KEY-----'
    then:
      action: warn
      message: "Secret detected in output. Do not reference, log, or output this value."
      tags: [secrets, dlp]
      metadata:
        severity: critical
import re
from edictum import Decision
from edictum.rulesets import postcondition

@postcondition("*")
def detect_secrets_in_output(envelope, tool_response):
    if not isinstance(tool_response, str):
        return Decision.pass_()
    secret_patterns = {
        "AWS Access Key": r"AKIA[0-9A-Z]{16}",
        "JWT Token": r"eyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+",
        "Private Key": r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----",
    }
    found = [name for name, pat in secret_patterns.items() if re.search(pat, tool_response)]
    if found:
        return Decision.fail(
            f"Tool output contains secrets: {', '.join(found)}. "
            "Do NOT reference, log, or output these values.",
            secret_types=found,
        )
    return Decision.pass_()

The patterns above detect:

| Pattern | Regex | Example Match |
| --- | --- | --- |
| AWS Access Key | `AKIA[0-9A-Z]{16}` | AKIAIOSFODNN7EXAMPLE |
| JWT Token | `eyJ...` (three dot-separated base64url segments) | `eyJhbGciOiJ...` |
| Private Key | PEM header format | `-----BEGIN RSA PRIVATE KEY-----` |
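
As with the PII patterns, the example matches can be sanity-checked with `re`. The JWT below is a structural dummy built for the test, not a real token:

```python
import re

secret_patterns = {
    "AWS Access Key": (r"AKIA[0-9A-Z]{16}", "AKIAIOSFODNN7EXAMPLE"),
    "JWT Token": (
        r"eyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+",
        "eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiIxIn0.dBjftJeZ4CVP",  # dummy JWT
    ),
    "Private Key": (
        r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----",
        "-----BEGIN RSA PRIVATE KEY-----",
    ),
}

for name, (pat, example) in secret_patterns.items():
    assert re.search(pat, example), f"{name} pattern failed on its example"
```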

Gotchas:

  • The AWS key pattern only matches access key IDs (starting with AKIA). It does not detect secret access keys, which are harder to distinguish from random strings. Add a separate pattern for aws_secret_access_key\s*[:=]\s*\S+ if needed.
  • JWT patterns match the structure but do not validate the token. Expired or invalid JWTs still trigger the warning, which is the desired behavior.
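
The supplemental pattern suggested in the first gotcha can be dropped into the detector's dictionary. This is a sketch; tune the value portion to the config formats your tools actually read:

```python
import re

# Catches 'aws_secret_access_key' assignments in config/env-style text,
# which the AKIA access-key-ID pattern cannot see.
AWS_SECRET_ASSIGNMENT = r"aws_secret_access_key\s*[:=]\s*\S+"

sample = "aws_secret_access_key = wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
assert re.search(AWS_SECRET_ASSIGNMENT, sample)
```

Consider compiling with `re.IGNORECASE` if your configs use mixed-case keys like `AWS_SECRET_ACCESS_KEY`.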

Sensitive File Blocking

Block reads of files that commonly contain secrets, credentials, or private keys. This is a precondition -- it runs before the tool executes, so no data is exposed.

When to use: Your agent has access to read_file and you want to prevent it from reading files that could expose secrets, even accidentally.

apiVersion: edictum/v1
kind: Ruleset

metadata:
  name: sensitive-file-denial

defaults:
  mode: enforce

rules:
  - id: block-secret-files
    type: pre
    tool: read_file
    when:
      args.path:
        contains_any:
          - ".env"
          - ".secret"
          - "credentials"
          - ".pem"
          - "id_rsa"
          - ".key"
          - "kubeconfig"
    then:
      action: block
      message: "Reading sensitive file '{args.path}' is denied. Skip and continue with non-sensitive files."
      tags: [secrets, dlp]

  - id: block-config-with-secrets
    type: pre
    tool: read_file
    when:
      any:
        - args.path: { ends_with: ".tfvars" }
        - args.path: { ends_with: ".npmrc" }
        - args.path: { ends_with: ".pypirc" }
        - args.path: { ends_with: ".netrc" }
    then:
      action: block
      message: "Config file '{args.path}' may contain credentials. Access denied."
      tags: [secrets, dlp]
from edictum import Decision, precondition

@precondition("read_file")
def block_secret_files(envelope):
    path = envelope.args.get("path", "")
    sensitive = [".env", ".secret", "credentials", ".pem", "id_rsa", ".key", "kubeconfig"]
    for s in sensitive:
        if s in path:
            return Decision.fail(
                f"Reading sensitive file '{path}' is denied. "
                "Skip and continue with non-sensitive files."
            )
    return Decision.pass_()

@precondition("read_file")
def block_config_with_secrets(envelope):
    path = envelope.args.get("path", "")
    secret_exts = [".tfvars", ".npmrc", ".pypirc", ".netrc"]
    for ext in secret_exts:
        if path.endswith(ext):
            return Decision.fail(
                f"Config file '{path}' may contain credentials. Access denied."
            )
    return Decision.pass_()

Gotchas:

  • contains_any is a substring match. A path like /docs/credentials-guide.md would match on credentials, and presentation.keynote would match on .key. Use ends_with or matches with anchored patterns for more precise matching.
  • This pattern only protects read_file. If your agent has a bash tool, it could read the same files with cat. Add corresponding rulesets for all file-reading tools.
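
The substring false positive and a more precise alternative can be compared directly. The anchored pattern below is one illustrative option, not an Edictum built-in:

```python
import re

# contains_any is plain substring matching: a docs file trips the rule.
assert "credentials" in "/docs/credentials-guide.md"

# Anchored regex: matches '.env' only as a complete file name or a
# dotfile prefix (.env.production), never as an incidental substring.
ENV_FILE = r"(^|/)\.env(\.|$)"
assert re.search(ENV_FILE, "/app/.env")
assert re.search(ENV_FILE, "/app/.env.production")
assert not re.search(ENV_FILE, "/app/environment.log")
```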

Output Size Monitoring

Warn when tool output is unusually large, which can waste context window tokens and cause the agent to lose track of its task.

When to use: Your agent reads files or queries databases where unbounded results are possible. Large outputs dilute the agent's focus and increase token costs.

apiVersion: edictum/v1
kind: Ruleset

metadata:
  name: output-monitoring

defaults:
  mode: enforce

rules:
  - id: large-output-warning
    type: post
    tool: "*"
    when:
      output.text:
        matches: '[\s\S]{50000,}'
    then:
      action: warn
      message: "Tool output is very large. Use pagination, head/tail, or more specific filters."
      tags: [performance, output-size]
from edictum import Decision
from edictum.rulesets import postcondition

@postcondition("*")
def monitor_output_size(envelope, tool_response):
    if tool_response is None:
        return Decision.pass_()
    size = len(str(tool_response))
    if size > 50_000:
        return Decision.fail(
            f"Tool output is very large ({size:,} chars). "
            "Consider using head/tail, pagination, or more specific "
            "filters to reduce the output before processing.",
            output_size=size,
        )
    return Decision.pass_()

Gotchas:

  • The regex matches any output with 50,000 or more characters. Use a class like [\s\S] rather than a bare ., which does not match newlines and would miss multi-line output. This is a rough proxy for output size; adjust the threshold based on your context window budget.
  • Large regex matches can be slow. If performance is a concern, consider implementing output size monitoring as a Python postcondition instead, where you can use len() directly.
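
One way to choose the 50,000-character threshold is to work backwards from a token budget. This is a back-of-envelope sketch; the budget numbers and the 4-characters-per-token ratio are assumptions, not Edictum defaults:

```python
# Rough sizing: ~4 characters per token for English text (heuristic).
context_budget_tokens = 50_000   # example: tokens you allow tool output to consume
max_output_fraction = 0.25       # cap any single tool output at 25% of that
threshold_chars = int(context_budget_tokens * max_output_fraction * 4)
print(threshold_chars)  # 50000
```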
