Policy Studio — Simulate
Test policy rules against real historical decisions before promoting them to production. Zero risk, instant feedback.
Why this exists
Writing governance policies without a way to test them is dangerous. A rule that's too broad blocks legitimate agent actions. A rule that's too narrow misses the threats it was written to catch.
Policy Studio Simulate lets you run a proposed rule against your actual decision history — before it touches any live traffic.
How it works
The simulation endpoint fetches your last N real authorization decisions from decision_logs and evaluates your proposed rule against each one's recorded inputs:
Your proposed rule (name + pattern + action)
↓
Fetched: last 50 historical decisions (real inputs)
↓
Rule evaluated against each decision in isolation
↓
Result: how many decisions would change verdict

No production data is modified. No live agents are affected.
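The evaluation loop above can be sketched locally. This is an illustrative approximation, not the service's actual implementation: it assumes the rule's `content` is a regex matched against each decision's recorded `tool_name` and `input_data` (as the request fields below describe), and that a match flips the verdict to the rule's action.

```python
import re

def simulate_rule(pattern, action, decisions):
    """Evaluate a proposed rule against recorded decisions, in isolation.

    `decisions` is a list of dicts carrying the recorded inputs:
    decision_id, tool_name, input_data, and original_verdict.
    """
    rx = re.compile(pattern)
    results = []
    for d in decisions:
        # A rule matches if its regex hits the tool name or the input payload.
        matched = bool(rx.search(d["tool_name"]) or rx.search(d["input_data"]))
        # The simulated verdict only changes when the rule matches.
        if matched and action == "DENY":
            simulated = "DENIED"
        elif matched and action == "ALLOW":
            simulated = "ALLOWED"
        else:
            simulated = d["original_verdict"]
        results.append({
            "decision_id": d["decision_id"],
            "matched_pattern": matched,
            "original_verdict": d["original_verdict"],
            "simulated_verdict": simulated,
            "changed": simulated != d["original_verdict"],
        })
    return results
```

Each decision is evaluated independently, so one rule's simulated outcome never cascades into the next decision's inputs.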
Run a simulation
POST /v1/policies/simulate
Authorization: Bearer {your-api-key}
{
"name": "Block exec_cmd in production",
"content": "exec_cmd|shell_exec|run_script",
"action": "DENY",
"sample_size": 100
}| Field | Required | Description |
|---|---|---|
name | Human-readable name for this rule | |
content | Regex pattern matched against tool_name and input_data | |
action | DENY, ALLOW, or FLAG | |
sample_size | — | Decisions to test against (default: 50, max: 200) |
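A small client-side helper can assemble the request body and catch the documented constraints before the round trip. This is a hedged sketch: the function name and the validation ranges mirror the table above, but the server remains the source of truth.

```python
def build_simulation_request(name, content, action, sample_size=50):
    """Assemble and sanity-check a /v1/policies/simulate payload."""
    # Constraints from the field table: action enum, sample_size max of 200.
    if action not in {"DENY", "ALLOW", "FLAG"}:
        raise ValueError(f"unsupported action: {action}")
    if not 1 <= sample_size <= 200:
        raise ValueError("sample_size must be between 1 and 200")
    return {
        "name": name,
        "content": content,
        "action": action,
        "sample_size": sample_size,
    }
```

POST the returned dict as JSON with your HTTP client of choice, passing your API key in the Authorization header as shown above.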
Reading the results
{
"rule": {
"name": "Block exec_cmd in production",
"content": "exec_cmd|shell_exec|run_script",
"action": "DENY"
},
"simulation": {
"sample_size": 100,
"decisions_tested": 100,
"matched": 3,
"changed_to_deny": 3,
"changed_to_allow": 0,
"unchanged": 97
},
"impact": "LOW",
"decisions": [
{
"decision_id": "dec_01JN8K...",
"tool_name": "exec_cmd",
"original_verdict": "ALLOWED",
"simulated_verdict": "DENIED",
"changed": true,
"matched_pattern": true
}
]
}

impact values: NONE (0 changes) | LOW (<5%) | MEDIUM (5-20%) | HIGH (>20%)
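The bucketing above is easy to reproduce locally. One assumption in this sketch: the doc gives MEDIUM as 5-20% and HIGH as >20%, so the 20% boundary is treated as MEDIUM here; the exact edge behavior of the real service is not specified.

```python
def classify_impact(changed, tested):
    """Map a change ratio to the documented impact buckets."""
    if changed == 0:
        return "NONE"
    pct = 100 * changed / tested
    if pct < 5:
        return "LOW"
    if pct <= 20:  # assumption: 20% exactly counts as MEDIUM
        return "MEDIUM"
    return "HIGH"
```

The example response above (3 of 100 decisions changed) lands in LOW.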
Interpreting impact
A HIGH impact result means your rule would change the verdict on a significant share of recent decisions. That isn't necessarily bad (it may be exactly what you intended), but it's a signal to review the matched decisions carefully before promoting.
Look for:
- changed_to_deny on decisions that were legitimately allowed → rule is too broad
- changed_to_allow on decisions that were correctly denied → rule conflicts with existing policy
- Expected changes only → rule is ready for production
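The review checklist above amounts to partitioning the decisions array by how the verdict moved. A minimal sketch (the helper name is illustrative, not part of the API):

```python
def triage(decisions):
    """Split a simulation's decisions list into review buckets.

    Returns (newly_denied, newly_allowed): decisions the rule would
    flip to DENIED or to ALLOWED, respectively.
    """
    newly_denied = [d for d in decisions
                    if d["changed"] and d["simulated_verdict"] == "DENIED"]
    newly_allowed = [d for d in decisions
                     if d["changed"] and d["simulated_verdict"] == "ALLOWED"]
    return newly_denied, newly_allowed
```

Inspect newly_denied for decisions that look legitimate (rule too broad) and newly_allowed for decisions that were correctly denied (rule conflicts with existing policy).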
Promoting to production
Once satisfied with the simulation, create the policy via the standard policy API:
POST /v1/policies
{
"name": "Block exec_cmd in production",
"content": "exec_cmd|shell_exec|run_script",
"action": "DENY",
"active": true,
"change_reason": "Simulation passed: 3/100 decisions affected, all expected"
}

The change_reason field is recorded in the policy accountability audit trail.
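Since the simulation response echoes back the rule, you can build the promotion body directly from it, including a change_reason that summarizes the simulation for the audit trail. A hedged sketch; the helper and the reason wording are illustrative:

```python
def promotion_payload(rule, simulation):
    """Build a /v1/policies body from a passing simulation response.

    `rule` and `simulation` are the corresponding objects from the
    simulate response shown above.
    """
    reason = (f"Simulation passed: {simulation['matched']}/"
              f"{simulation['decisions_tested']} decisions affected")
    return {
        "name": rule["name"],
        "content": rule["content"],
        "action": rule["action"],
        "active": True,
        "change_reason": reason,
    }
```

Deriving the payload from the simulated rule guarantees that what you promote is byte-for-byte what you tested.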