Token Budget Guardian
Prevent runaway AI agent spending. Set cost limits in BRL, detect subagent loops, and receive alerts before budgets are exhausted.
Why this exists
A single autonomous agent using subagents can generate R$1,500+ in LLM costs in a single day without any control layer. This happened to a real customer running ClaudeBot with subagents: the session ran unchecked until the provider's invoice arrived.
The Token Budget Guardian intercepts every tool call before execution and checks it against the configured limits. If a limit is exceeded, the call is blocked with reason code BUDGET.EXCEEDED before any tokens are spent.
How it works
Every POST /v1/authorize request now passes through the Budget Guardian before the policy engine:
```
Request → [Budget Check] → [Hallucination Shield] → [WASM Policy Engine] → Decision
```

If any budget limit is exceeded:
- Response: status: "DENIED", policy_matched: "budget_guardian:DAILY_COST_BRL"
- No tokens are spent on the blocked tool call
- The violation is logged to the audit trail
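The ordering above can be sketched as a short-circuiting chain of checks. This is a minimal illustration, not Shield's implementation: the stage functions, the request dictionary and its fields, and the "ALLOWED" status string are hypothetical; only "DENIED" and the policy_matched value come from this page.

```python
from typing import Callable, Optional

# Hypothetical stage signature: inspect a request and return the policy it
# matched (meaning: deny) or None (meaning: pass to the next stage).
Stage = Callable[[dict], Optional[str]]


def authorize(request: dict, stages: list[Stage]) -> dict:
    """Run stages in order; the first denial wins and later stages are skipped."""
    for stage in stages:
        matched = stage(request)
        if matched is not None:
            return {"status": "DENIED", "policy_matched": matched}
    # "ALLOWED" is an assumed name for the non-denied outcome.
    return {"status": "ALLOWED", "policy_matched": None}


def budget_check(request: dict) -> Optional[str]:
    """Hypothetical budget stage: compares today's spend with the daily limit."""
    if request["cost_brl_today"] >= request["max_cost_brl_per_day"]:
        return "budget_guardian:DAILY_COST_BRL"
    return None
```

Passing budget_check as the first element of stages mirrors the diagram: a call denied on budget never reaches the Hallucination Shield or the WASM policy stages.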
Configure a budget
```http
POST /v1/budget/{agent_id}
Authorization: Bearer {your-api-key}

{
  "max_cost_brl_per_day": 50.00,
  "max_cost_brl_per_hour": 10.00,
  "max_tool_calls_per_minute": 30,
  "max_sequential_same_tool": 5,
  "alert_webhook_url": "https://hooks.slack.com/...",
  "alert_threshold_pct": 0.8,
  "action_on_exceed": "BLOCK"
}
```

| Field | Default | Description |
|---|---|---|
| max_cost_brl_per_day | none | Primary protection: block the agent if daily BRL spend exceeds this |
| max_cost_brl_per_hour | none | Hourly rolling cost cap (BRL) |
| max_tokens_per_day | none | Raw daily token cap, if you prefer token-based limits |
| max_tool_calls_per_minute | 60 | Tool-calls-per-minute cap; detects rapid subagent loops |
| max_sequential_same_tool | 10 | Block when the same tool is called more than N consecutive times |
| alert_webhook_url | none | POST an alert to this URL when the alert threshold is reached |
| alert_threshold_pct | 0.8 | Fraction of the budget at which to alert (0.8 = 80%) |
| action_on_exceed | BLOCK | BLOCK, ALERT_ONLY, or THROTTLE |
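A client can validate a configuration against this table before sending it. The sketch below only builds the request with the standard library's urllib; the base URL, the Content-Type header, and the rule that alert_threshold_pct must lie in (0, 1] are assumptions, not documented server behavior.

```python
import json
import urllib.parse
import urllib.request

# The three actions listed in the field table above.
VALID_ACTIONS = {"BLOCK", "ALERT_ONLY", "THROTTLE"}


def build_budget_request(base_url: str, agent_id: str, api_key: str,
                         config: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/budget/{agent_id} request."""
    action = config.get("action_on_exceed", "BLOCK")
    if action not in VALID_ACTIONS:
        raise ValueError(f"action_on_exceed must be one of {sorted(VALID_ACTIONS)}, got {action!r}")
    # Assumed client-side check: the threshold is a fraction, so 80% is 0.8.
    pct = config.get("alert_threshold_pct", 0.8)
    if not 0 < pct <= 1:
        raise ValueError(f"alert_threshold_pct must be in (0, 1], got {pct}")
    url = f"{base_url.rstrip('/')}/v1/budget/{urllib.parse.quote(agent_id, safe='')}"
    return urllib.request.Request(
        url,
        data=json.dumps(config).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )
```

To send it, pass the result to urllib.request.urlopen. Building and sending are kept separate so the payload can be inspected or logged first.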
Check current usage
```http
GET /v1/budget/{agent_id}
```

Response:

```json
{
  "agent_id": "my-agent",
  "today": {
    "cost_brl": 12.40,
    "tokens": 145000,
    "remaining_budget_pct": 0.752
  },
  "current_hour": {
    "cost_brl": 2.10,
    "tokens": 24500
  },
  "status": "OK"
}
```

Status values: OK | WARNING (< 50% remaining) | CRITICAL (< 20% remaining) | EXHAUSTED | NO_LIMIT
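Plugging the sample response into these thresholds: with the R$50.00 daily limit from the configuration example, R$12.40 spent leaves 1 - 12.40 / 50.00 = 0.752 of the budget, so the status is OK. The sketch below encodes that mapping. It assumes the status is derived from the daily BRL limit and that EXHAUSTED means nothing remains; this page states neither.

```python
from typing import Optional


def remaining_budget_pct(limit_brl: float, cost_brl: float) -> float:
    """Fraction of the budget still unspent, floored at 0."""
    return max(0.0, 1 - cost_brl / limit_brl)


def budget_status(limit_brl: Optional[float], cost_brl: float) -> str:
    """Map spend against a limit to the documented status values."""
    if limit_brl is None:
        return "NO_LIMIT"
    remaining = remaining_budget_pct(limit_brl, cost_brl)
    if remaining == 0:  # assumed: EXHAUSTED means no budget left
        return "EXHAUSTED"
    if remaining < 0.2:
        return "CRITICAL"
    if remaining < 0.5:
        return "WARNING"
    return "OK"
```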
Subagent loop detection
The Guardian detects two loop patterns automatically:
RPM loop: the agent spawns subagents that each call tools rapidly. If the agent exceeds max_tool_calls_per_minute, all further calls are blocked.
Consecutive same-tool loop: if the same tool is called more than N consecutive times (N = max_sequential_same_tool, default 10), the Guardian assumes a loop and blocks the call. This catches the classic pattern of a broken agent calling search_web in an infinite retry loop.
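Both patterns can be tracked with a sliding one-minute window and a same-tool streak counter. This is an illustration under assumptions: the class and method names are hypothetical, timestamps are passed in explicitly to keep it testable, the rate-limit reason text is invented, and calls resume here once the window drains (this page does not say when the Guardian lifts an RPM block). The loop reason string follows the LOOP_DETECTED response documented in this section.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LoopDetector:
    max_tool_calls_per_minute: int = 60
    max_sequential_same_tool: int = 10
    _window: deque = field(default_factory=deque, init=False)  # timestamps of allowed calls
    _last_tool: Optional[str] = field(default=None, init=False)
    _streak: int = field(default=0, init=False)

    def check(self, tool: str, now: float) -> Optional[str]:
        """Return a denial reason, or None if the call may proceed."""
        # Same-tool streak, counting this attempted call.
        self._streak = self._streak + 1 if tool == self._last_tool else 1
        self._last_tool = tool
        if self._streak > self.max_sequential_same_tool:
            return (f"Subagent loop detected: '{tool}' called {self._streak} "
                    f"consecutive times (max: {self.max_sequential_same_tool}).")
        # Drop calls older than 60 seconds, then apply the per-minute cap.
        while self._window and now - self._window[0] >= 60.0:
            self._window.popleft()
        if len(self._window) >= self.max_tool_calls_per_minute:
            return (f"Tool call rate exceeded: {len(self._window)} calls in the "
                    f"last minute (max: {self.max_tool_calls_per_minute}).")
        self._window.append(now)
        return None
```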
When a loop is detected:
```json
{
  "status": "DENIED",
  "reason": "Subagent loop detected: 'search_web' called 11 consecutive times (max: 10).",
  "policy_matched": "budget_guardian:LOOP_DETECTED"
}
```

Supported LLM models (BRL pricing)
The Guardian estimates costs automatically based on the model field in the request:
| Model | Input (per 1M tokens, BRL) | Output (per 1M tokens, BRL) |
|---|---|---|
| Claude Opus 4 | R$87.00 | R$261.00 |
| Claude Sonnet 4 | R$17.40 | R$87.00 |
| Claude Haiku 4 | R$1.16 | R$5.80 |
| GPT-4o | R$14.50 | R$43.50 |
| Gemini 1.5 Pro | R$8.70 | R$26.10 |
| Other/Default | R$10.00 | R$30.00 |
Prices use approximate BRL conversion and are updated quarterly.
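The table implies a per-call estimate of input_tokens / 1,000,000 × input price + output_tokens / 1,000,000 × output price. For example, a Claude Sonnet 4 call with 100,000 input and 20,000 output tokens costs 0.1 × R$17.40 + 0.02 × R$87.00 = R$3.48. A minimal sketch using Decimal to avoid float drift in currency; it assumes the display names double as model field values (the accepted identifiers are not listed here) and that estimates are rounded to centavos.

```python
from decimal import Decimal

# BRL per 1M tokens as (input, output), copied from the table above.
# Using the display names as keys is an assumption.
PRICES_BRL_PER_MTOK = {
    "Claude Opus 4": (Decimal("87.00"), Decimal("261.00")),
    "Claude Sonnet 4": (Decimal("17.40"), Decimal("87.00")),
    "Claude Haiku 4": (Decimal("1.16"), Decimal("5.80")),
    "GPT-4o": (Decimal("14.50"), Decimal("43.50")),
    "Gemini 1.5 Pro": (Decimal("8.70"), Decimal("26.10")),
}
DEFAULT_PRICE = (Decimal("10.00"), Decimal("30.00"))  # "Other/Default" row
ONE_MILLION = Decimal(1_000_000)


def estimate_cost_brl(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Estimated BRL cost of one call, rounded to centavos (rounding is assumed)."""
    in_price, out_price = PRICES_BRL_PER_MTOK.get(model, DEFAULT_PRICE)
    cost = (input_tokens * in_price + output_tokens * out_price) / ONE_MILLION
    return cost.quantize(Decimal("0.01"))
```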
View violations
```http
GET /v1/budget/{agent_id}/violations?limit=50
```

Returns a full audit trail of all budget violations, including what was blocked, when, and at what value.
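A minimal client sketch for this endpoint, using only the standard library. Sending the same bearer key as the POST endpoint is an assumption, the base URL is a placeholder, and because the response schema is not documented here, the decoded JSON is returned without interpreting any fields.

```python
import json
import urllib.parse
import urllib.request


def violations_request(base_url: str, agent_id: str, api_key: str,
                       limit: int = 50) -> urllib.request.Request:
    """Build a GET /v1/budget/{agent_id}/violations request."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    path = f"/v1/budget/{urllib.parse.quote(agent_id, safe='')}/violations"
    query = urllib.parse.urlencode({"limit": limit})
    return urllib.request.Request(
        f"{base_url.rstrip('/')}{path}?{query}",
        headers={"Authorization": f"Bearer {api_key}"},  # assumed auth scheme
    )


def fetch_violations(req: urllib.request.Request):
    """Send the request and return the decoded JSON body as-is."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```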
Persistence & schema
As of v10.1.5, the Token Budget Guardian uses three dedicated tables in the Shield database, initialized automatically on startup:
| Table | Purpose |
|---|---|
| token_budgets | Stores budget configuration per agent_id |
| token_usage | Records cumulative usage (tokens, cost, call counts) per time window |
| budget_violations | Immutable log of all enforcement events with reason codes |
These tables are created via initSchema() and indexed for low-latency budget lookups on every /v1/authorize call. No manual migration is required — the schema is managed automatically by the Shield runtime.
Prior to v10.1.5, these tables were missing from initSchema() and all TokenBudgetGuardian queries silently failed. If you were running an earlier version, upgrade to v10.1.5 to activate budget enforcement.
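To make the three tables concrete, here is a hypothetical SQLite rendering. The database engine, column names, types, and triggers are all assumptions for illustration, since this page does not publish the Shield schema. The triggers show one way to make budget_violations append-only, and the composite key on token_usage gives an index for the per-window lookup each /v1/authorize call needs.

```python
import sqlite3

# Hypothetical columns; the real Shield schema is not documented here.
SCHEMA = """
CREATE TABLE IF NOT EXISTS token_budgets (
    agent_id    TEXT PRIMARY KEY,
    config_json TEXT NOT NULL            -- the POSTed budget configuration
);
CREATE TABLE IF NOT EXISTS token_usage (
    agent_id     TEXT NOT NULL,
    window_kind  TEXT NOT NULL,          -- e.g. 'day' or 'hour'
    window_start TEXT NOT NULL,
    tokens       INTEGER NOT NULL DEFAULT 0,
    cost_brl     REAL NOT NULL DEFAULT 0,
    tool_calls   INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (agent_id, window_kind, window_start)  -- doubles as the lookup index
);
CREATE TABLE IF NOT EXISTS budget_violations (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    agent_id    TEXT NOT NULL,
    reason_code TEXT NOT NULL,
    detail      TEXT,
    created_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX IF NOT EXISTS idx_violations_agent_time
    ON budget_violations (agent_id, created_at);
-- Append-only: reject any UPDATE or DELETE on the violation log.
CREATE TRIGGER IF NOT EXISTS violations_no_update
    BEFORE UPDATE ON budget_violations
    BEGIN SELECT RAISE(ABORT, 'budget_violations is append-only'); END;
CREATE TRIGGER IF NOT EXISTS violations_no_delete
    BEFORE DELETE ON budget_violations
    BEGIN SELECT RAISE(ABORT, 'budget_violations is append-only'); END;
"""


def init_schema(conn: sqlite3.Connection) -> None:
    """Idempotent: every statement uses IF NOT EXISTS, so it is safe on each startup."""
    conn.executescript(SCHEMA)
```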
Example: The R$1,500 scenario, prevented
Configure the agent before first run:
```http
POST /v1/budget/claudebot-production

{
  "max_cost_brl_per_day": 100.00,
  "max_tool_calls_per_minute": 20,
  "max_sequential_same_tool": 5,
  "alert_webhook_url": "https://hooks.slack.com/services/...",
  "action_on_exceed": "BLOCK"
}
```

At 80% of the daily limit (R$80), your Slack receives:
Agent 'claudebot-production' used 80% of DAILY_COST budget — R$80.00 / R$100.00
If the agent continues, at R$100 it is fully blocked:
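```json
{
  "status": "DENIED",
  "reason": "Daily cost budget exceeded: R$100.42 / R$100.00",
  "policy_matched": "budget_guardian:DAILY_COST_BRL"
}
```

The R$1,500 bill is capped at about R$100 (the call that crosses the limit can push spend slightly past it, as the R$100.42 above shows).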
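The scenario's numbers follow from two comparisons. The alert fires once spend reaches alert_threshold_pct × limit (0.8 × R$100.00 = R$80.00, since the scenario leaves the threshold at its default). The block applies once recorded spend reaches the limit. Because the check runs before each call, the call that crosses the limit presumably still completes, which would explain a final figure like R$100.42. A minimal sketch of that logic with hypothetical class and method names:

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass
class DailyCostGuard:
    limit_brl: Decimal
    alert_threshold_pct: Decimal = Decimal("0.8")
    spent_brl: Decimal = Decimal("0")
    alert_sent: bool = False

    def authorize(self) -> Optional[str]:
        """Pre-call check: deny once recorded spend has reached the limit."""
        if self.spent_brl >= self.limit_brl:
            return (f"Daily cost budget exceeded: "
                    f"R${self.spent_brl:.2f} / R${self.limit_brl:.2f}")
        return None

    def record(self, cost_brl: Decimal) -> bool:
        """Add a completed call's cost; return True the first time the alert threshold is reached."""
        self.spent_brl += cost_brl
        if not self.alert_sent and self.spent_brl >= self.limit_brl * self.alert_threshold_pct:
            self.alert_sent = True
            return True
        return False
```

With calls costing R$7.30 each, the alert fires after the 11th call (R$80.30), the 14th call lifts spend to R$102.20, and the 15th is denied. The overshoot is bounded by the cost of one call.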