Policy Studio — Simulate
Test policy rules against real historical decisions before promoting them to production. Zero risk, instant feedback.
Why this exists
Writing governance policies without a way to test them is dangerous. A rule that's too broad blocks legitimate agent actions. A rule that's too narrow misses the threats it was written to catch.
Policy Studio Simulate lets you run a proposed rule against your actual decision history — before it touches any live traffic.
How it works
The simulation endpoint fetches your last N real authorization decisions from decision_logs and evaluates your proposed rule against each one's recorded inputs:
Your proposed rule (name + pattern + action)
↓
Fetched: last 50 historical decisions (real inputs)
↓
Rule evaluated against each decision in isolation
↓
Result: how many decisions would change verdict

No production data is modified. No live agents are affected.
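The evaluation loop above can be sketched locally. This is an illustrative approximation, not the service's actual implementation: it assumes the rule's `content` is a regex matched against each decision's recorded `tool_name` and `input_data` (as the request fields below describe), and that a match flips the verdict to the rule's action.

```python
import re

def simulate_rule(pattern, action, decisions):
    """Evaluate a proposed rule against recorded decisions, in isolation.

    `decisions` is a list of dicts carrying the recorded inputs:
    decision_id, tool_name, input_data, and original_verdict.
    """
    rx = re.compile(pattern)
    results = []
    for d in decisions:
        # A rule matches if its regex hits the tool name or the input payload.
        matched = bool(rx.search(d["tool_name"]) or rx.search(d["input_data"]))
        # The simulated verdict only changes when the rule matches.
        if matched and action == "DENY":
            simulated = "DENIED"
        elif matched and action == "ALLOW":
            simulated = "ALLOWED"
        else:
            simulated = d["original_verdict"]
        results.append({
            "decision_id": d["decision_id"],
            "matched_pattern": matched,
            "original_verdict": d["original_verdict"],
            "simulated_verdict": simulated,
            "changed": simulated != d["original_verdict"],
        })
    return results
```

Each decision is evaluated independently, so one rule's simulated outcome never cascades into the next decision's inputs.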
Run a simulation
POST /v1/policies/simulate
Authorization: Bearer {your-api-key}
{
"name": "Block exec_cmd in production",
"content": "exec_cmd|shell_exec|run_script",
"action": "DENY",
"sample_size": 100
}| Field | Required | Description |
|---|---|---|
name | Human-readable name for this rule | |
content | Regex pattern matched against tool_name and input_data | |
action | DENY, ALLOW, or FLAG | |
sample_size | — | Decisions to test against (default: 50, max: 200) |
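A small client-side helper can assemble the request body and catch the documented constraints before the round trip. This is a hedged sketch: the function name and the validation ranges mirror the table above, but the server remains the source of truth.

```python
def build_simulation_request(name, content, action, sample_size=50):
    """Assemble and sanity-check a /v1/policies/simulate payload."""
    # Constraints from the field table: action enum, sample_size max of 200.
    if action not in {"DENY", "ALLOW", "FLAG"}:
        raise ValueError(f"unsupported action: {action}")
    if not 1 <= sample_size <= 200:
        raise ValueError("sample_size must be between 1 and 200")
    return {
        "name": name,
        "content": content,
        "action": action,
        "sample_size": sample_size,
    }
```

POST the returned dict as JSON with your HTTP client of choice, passing your API key in the Authorization header as shown above.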
Reading the results
{
"rule": {
"name": "Block exec_cmd in production",
"content": "exec_cmd|shell_exec|run_script",
"action": "DENY"
},
"simulation": {
"sample_size": 100,
"decisions_tested": 100,
"matched": 3,
"changed_to_deny": 3,
"changed_to_allow": 0,
"unchanged": 97
},
"impact": "LOW",
"decisions": [
{
"decision_id": "dec_01JN8K...",
"tool_name": "exec_cmd",
"original_verdict": "ALLOWED",
"simulated_verdict": "DENIED",
"changed": true,
"matched_pattern": true
}
]
}

impact values: NONE (0 changes) | LOW (<5%) | MEDIUM (5-20%) | HIGH (>20%)
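The bucketing above is easy to reproduce locally. One assumption in this sketch: the doc gives MEDIUM as 5-20% and HIGH as >20%, so the 20% boundary is treated as MEDIUM here; the exact edge behavior of the real service is not specified.

```python
def classify_impact(changed, tested):
    """Map a change ratio to the documented impact buckets."""
    if changed == 0:
        return "NONE"
    pct = 100 * changed / tested
    if pct < 5:
        return "LOW"
    if pct <= 20:  # assumption: 20% exactly counts as MEDIUM
        return "MEDIUM"
    return "HIGH"
```

The example response above (3 of 100 decisions changed) lands in LOW.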
Interpreting impact
A HIGH impact result means your rule would change the verdict on a significant share of recent decisions. That isn't necessarily bad (it may be exactly what you intended), but it's a signal to review the matched decisions carefully before promoting.
Look for:
- changed_to_deny on decisions that were legitimately allowed → rule is too broad
- changed_to_allow on decisions that were correctly denied → rule conflicts with existing policy
- Expected changes only → rule is ready for production
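The review checklist above amounts to partitioning the decisions array by how the verdict moved. A minimal sketch (the helper name is illustrative, not part of the API):

```python
def triage(decisions):
    """Split a simulation's decisions list into review buckets.

    Returns (newly_denied, newly_allowed): decisions the rule would
    flip to DENIED or to ALLOWED, respectively.
    """
    newly_denied = [d for d in decisions
                    if d["changed"] and d["simulated_verdict"] == "DENIED"]
    newly_allowed = [d for d in decisions
                     if d["changed"] and d["simulated_verdict"] == "ALLOWED"]
    return newly_denied, newly_allowed
```

Inspect newly_denied for decisions that look legitimate (rule too broad) and newly_allowed for decisions that were correctly denied (rule conflicts with existing policy).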
Promoting to production
Once satisfied with the simulation, create the policy via the standard policy API:
POST /v1/policies
{
"name": "Block exec_cmd in production",
"content": "exec_cmd|shell_exec|run_script",
"action": "DENY",
"active": true,
"change_reason": "Simulation passed: 3/100 decisions affected, all expected"
}

The change_reason field is recorded in the policy accountability audit trail.
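Since the simulation response echoes back the rule, you can build the promotion body directly from it, including a change_reason that summarizes the simulation for the audit trail. A hedged sketch; the helper and the reason wording are illustrative:

```python
def promotion_payload(rule, simulation):
    """Build a /v1/policies body from a passing simulation response.

    `rule` and `simulation` are the corresponding objects from the
    simulate response shown above.
    """
    reason = (f"Simulation passed: {simulation['matched']}/"
              f"{simulation['decisions_tested']} decisions affected")
    return {
        "name": rule["name"],
        "content": rule["content"],
        "action": rule["action"],
        "active": True,
        "change_reason": reason,
    }
```

Deriving the payload from the simulated rule guarantees that what you promote is byte-for-byte what you tested.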