Prevent runaway AI agent spending. Set cost limits in BRL, detect subagent loops, and receive alerts before budgets are exhausted.

Token Budget Guardian

Why this exists

A single autonomous agent that spawns subagents can run up more than R$1,500 in LLM costs in one day when no control layer is in place. This happened to a real customer running ClaudeBot with subagents: the session ran unchecked until the provider invoice arrived.

The Token Budget Guardian intercepts every tool call before execution and checks it against the configured limits. If a limit is exceeded, the call is blocked with reason code BUDGET.EXCEEDED before it spends any tokens.

How it works

Every POST /v1/authorize request now passes through the Budget Guardian before the policy engine:

Request → [Budget Check] → [Hallucination Shield] → [WASM Policy Engine] → Decision

If any budget limit is exceeded:

The response returns status: "DENIED" and a policy_matched value naming the limit, e.g. "budget_guardian:DAILY_COST_BRL"
No tokens are spent on the blocked tool call
Violation is logged to the audit trail
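
The ordering above can be sketched as a short-circuiting chain of checks. This is an illustrative sketch, not the Shield implementation: the stage functions, the `Decision` type, the `"ALLOWED"` status name, and the request fields `spent_brl_today` and `max_cost_brl_per_day` are hypothetical. Only the stage order and the `status` / `policy_matched` values come from the description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Decision:
    status: str                       # "DENIED" is from the docs; "ALLOWED" is an assumed name
    policy_matched: Optional[str] = None


def budget_check(request: dict) -> Optional[Decision]:
    # Hypothetical check: deny once today's spend has reached the daily cap.
    if request["spent_brl_today"] >= request["max_cost_brl_per_day"]:
        return Decision("DENIED", "budget_guardian:DAILY_COST_BRL")
    return None


def hallucination_shield(request: dict) -> Optional[Decision]:
    return None  # placeholder: always passes in this sketch


def policy_engine(request: dict) -> Optional[Decision]:
    return None  # placeholder: always passes in this sketch


# Stages run in the documented order; each returns a Decision to deny or None to continue.
PIPELINE = (budget_check, hallucination_shield, policy_engine)


def authorize(request: dict) -> Decision:
    for stage in PIPELINE:
        decision = stage(request)
        if decision is not None:
            return decision           # first denial wins; later stages never run
    return Decision("ALLOWED")
```

Putting the budget check first means an over-budget agent is rejected before the later, presumably more expensive, stages do any work.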

Configure a budget

POST /v1/budget/{agent_id}
Authorization: Bearer {your-api-key}

{
  "max_cost_brl_per_day": 50.00,
  "max_cost_brl_per_hour": 10.00,
  "max_tool_calls_per_minute": 30,
  "max_sequential_same_tool": 5,
  "alert_webhook_url": "https://hooks.slack.com/...",
  "alert_threshold_pct": 0.8,
  "action_on_exceed": "BLOCK"
}

Field	Default	Description
`max_cost_brl_per_day`	none	Primary protection: block agent if daily BRL spend exceeds this
`max_cost_brl_per_hour`	none	Hourly rolling cost cap
`max_tokens_per_day`	none	Raw token cap (if you prefer token-based limits)
`max_tool_calls_per_minute`	60	Cap on tool calls per minute; detects rapid subagent loops
`max_sequential_same_tool`	10	Block if same tool called N consecutive times
`alert_webhook_url`	none	POST alert to this URL when threshold is reached
`alert_threshold_pct`	0.8	Fraction of the budget at which the alert fires (0.8 = 80%)
`action_on_exceed`	`BLOCK`	`BLOCK`, `ALERT_ONLY`, or `THROTTLE`
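
As a minimal sketch, the configuration call could be built from Python's standard library as shown here. The base URL `https://shield.example.com`, the `Content-Type` header, and the helper name `build_budget_request` are assumptions for illustration; the path, the bearer token header, and the field names are taken from the request above. The request is only constructed, not sent.

```python
import json
import urllib.request

BASE_URL = "https://shield.example.com"  # placeholder; substitute your deployment's URL


def build_budget_request(agent_id: str, api_key: str, budget: dict) -> urllib.request.Request:
    """Build (but do not send) the POST /v1/budget/{agent_id} request."""
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/budget/{agent_id}",
        data=json.dumps(budget).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",  # assumed, since the body is JSON
        },
        method="POST",
    )


request = build_budget_request(
    "my-agent",
    "YOUR_API_KEY",
    {
        "max_cost_brl_per_day": 50.00,
        "max_cost_brl_per_hour": 10.00,
        "max_tool_calls_per_minute": 30,
        "max_sequential_same_tool": 5,
        "alert_threshold_pct": 0.8,
        "action_on_exceed": "BLOCK",
    },
)
# To actually send it: urllib.request.urlopen(request)
```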

Check current usage

GET /v1/budget/{agent_id}

{
  "agent_id": "my-agent",
  "today": {
    "cost_brl": 12.40,
    "tokens": 145000,
    "remaining_budget_pct": 0.752
  },
  "current_hour": {
    "cost_brl": 2.10,
    "tokens": 24500
  },
  "status": "OK"
}

Status values: OK | WARNING (less than 50% of the budget remaining) | CRITICAL (less than 20% remaining) | EXHAUSTED | NO_LIMIT
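
The status thresholds can be sketched as a small mapping from `remaining_budget_pct`. The 50% and 20% cut-offs come from the list above; treating `EXHAUSTED` as "nothing remaining" and `NO_LIMIT` as "no budget configured" is an assumption, as are both function names.

```python
from typing import Optional


def remaining_fraction(cost_brl: float, max_cost_brl_per_day: float) -> float:
    """Fraction of the daily budget still unspent, floored at 0."""
    return max(0.0, 1.0 - cost_brl / max_cost_brl_per_day)


def budget_status(remaining_pct: Optional[float]) -> str:
    """Map the remaining budget fraction (0.0 to 1.0) to a status value."""
    if remaining_pct is None:
        return "NO_LIMIT"      # assumed: no budget configured for this agent
    if remaining_pct <= 0.0:
        return "EXHAUSTED"     # assumed: nothing left to spend
    if remaining_pct < 0.20:
        return "CRITICAL"
    if remaining_pct < 0.50:
        return "WARNING"
    return "OK"
```

With the sample response above, R$12.40 spent against a R$50.00 daily cap leaves a fraction of about 0.752, which maps to `OK`.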

Subagent loop detection

The Guardian detects two loop patterns automatically:

RPM loop: Agent spawns subagents that each call tools rapidly. If the agent exceeds max_tool_calls_per_minute, all further calls are blocked.

Consecutive same-tool loop: If the same tool is called N consecutive times (default: 10), the Guardian assumes a loop and blocks. This catches the classic pattern of a broken agent calling search_web in an infinite retry loop.

When a loop is detected:

{
  "status": "DENIED",
  "reason": "Subagent loop detected: 'search_web' called 11 consecutive times (max: 10).",
  "policy_matched": "budget_guardian:LOOP_DETECTED"
}
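
Both patterns can be approximated with a sliding 60-second window and a streak counter. The sketch below is illustrative only: the `LoopDetector` class, its in-memory state, the RPM message text, and the choice to count blocked calls toward the window are assumptions, not the Guardian's actual implementation. The defaults (60 and 10) and the loop message format follow the text above.

```python
from collections import deque
from typing import Deque, Optional


class LoopDetector:
    def __init__(self, max_calls_per_minute: int = 60, max_sequential_same_tool: int = 10):
        self.max_calls_per_minute = max_calls_per_minute
        self.max_sequential_same_tool = max_sequential_same_tool
        self._timestamps: Deque[float] = deque()
        self._last_tool: Optional[str] = None
        self._streak = 0

    def check(self, tool: str, now: float) -> Optional[str]:
        """Return a denial reason, or None if the call may proceed."""
        # RPM check: drop calls older than 60 seconds, then count this one.
        while self._timestamps and now - self._timestamps[0] >= 60.0:
            self._timestamps.popleft()
        self._timestamps.append(now)
        if len(self._timestamps) > self.max_calls_per_minute:
            return (f"RPM limit exceeded: {len(self._timestamps)} calls in 60s "
                    f"(max: {self.max_calls_per_minute}).")

        # Consecutive same-tool check: the streak resets when the tool changes.
        self._streak = self._streak + 1 if tool == self._last_tool else 1
        self._last_tool = tool
        if self._streak > self.max_sequential_same_tool:
            return (f"Subagent loop detected: '{tool}' called {self._streak} "
                    f"consecutive times (max: {self.max_sequential_same_tool}).")
        return None
```

Calling `check("search_web", t)` eleven times in a row with the default settings returns `None` for the first ten calls and the loop message on the eleventh.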

Supported LLM models (BRL pricing)

The Guardian estimates costs automatically based on the model field in the request:

Model	Input (BRL per 1M tokens)	Output (BRL per 1M tokens)
Claude Opus 4	R$87.00	R$261.00
Claude Sonnet 4	R$17.40	R$87.00
Claude Haiku 4	R$1.16	R$5.80
GPT-4o	R$14.50	R$43.50
Gemini 1.5 Pro	R$8.70	R$26.10
Other/Default	R$10.00	R$30.00

Prices are approximate BRL conversions and are updated quarterly.
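
The estimate is plain per-token arithmetic over the table above. In this sketch the dictionary keys are the table's display names; the exact strings the Guardian matches in the request's model field are not documented here, so treat them, and the function name `estimate_cost_brl`, as assumptions.

```python
# (input, output) prices in BRL per 1M tokens, copied from the table above.
PRICES_BRL_PER_1M = {
    "Claude Opus 4": (87.00, 261.00),
    "Claude Sonnet 4": (17.40, 87.00),
    "Claude Haiku 4": (1.16, 5.80),
    "GPT-4o": (14.50, 43.50),
    "Gemini 1.5 Pro": (8.70, 26.10),
}
DEFAULT_PRICE = (10.00, 30.00)  # the "Other/Default" row


def estimate_cost_brl(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the BRL cost of one call; unknown models use the default row."""
    input_price, output_price = PRICES_BRL_PER_1M.get(model, DEFAULT_PRICE)
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000
```

For example, 100,000 input tokens and 20,000 output tokens on Claude Sonnet 4 come to about R$1.74 + R$1.74 = R$3.48.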

View violations

GET /v1/budget/{agent_id}/violations?limit=50

Returns a full audit trail of all budget violations, including what was blocked, when, and at what value.

Persistence & schema

As of v10.1.5, the Token Budget Guardian uses three dedicated tables in the Shield database, initialized automatically on startup:

Table	Purpose
`token_budgets`	Stores budget configuration per `agent_id`
`token_usage`	Records cumulative usage (tokens, cost, call counts) per time window
`budget_violations`	Immutable log of all enforcement events with reason codes

These tables are created via initSchema() and indexed for low-latency budget lookups on every /v1/authorize call. No manual migration is required — the schema is managed automatically by the Shield runtime.

Prior to v10.1.5, these tables were missing from initSchema() and all TokenBudgetGuardian queries silently failed. If you were running an earlier version, upgrade to v10.1.5 to activate budget enforcement.

Example: The R$1,500 scenario, prevented

Configure the agent before first run:

POST /v1/budget/claudebot-production
{
  "max_cost_brl_per_day": 100.00,
  "max_tool_calls_per_minute": 20,
  "max_sequential_same_tool": 5,
  "alert_webhook_url": "https://hooks.slack.com/services/...",
  "action_on_exceed": "BLOCK"
}

At 80% of the daily limit (R$80, from the default alert_threshold_pct of 0.8), your Slack channel receives:

Agent 'claudebot-production' used 80% of DAILY_COST budget — R$80.00 / R$100.00

If the agent keeps going, it is blocked once daily spend passes R$100:

{
  "status": "DENIED",
  "reason": "Daily cost budget exceeded: R$100.42 / R$100.00",
  "policy_matched": "budget_guardian:DAILY_COST_BRL"
}

The R$1,500 bill is capped at roughly R$100.
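
The alert-then-block sequence can be replayed with a small simulation. This sketch assumes usage is recorded after each allowed call, so the last allowed call can push spend slightly past the cap (the denial message above reports R$100.42 against R$100.00). The function name, the event labels, and the per-call costs are hypothetical.

```python
from typing import List, Tuple


def simulate_day(
    call_costs_brl: List[float],
    max_cost_brl_per_day: float,
    alert_threshold_pct: float = 0.8,
) -> Tuple[List[str], float]:
    """Replay per-call costs; return the event log and the total spend recorded."""
    spent = 0.0
    alerted = False
    events: List[str] = []
    for cost in call_costs_brl:
        if spent >= max_cost_brl_per_day:
            events.append("DENIED")       # budget used up: call is blocked, nothing added
            continue
        spent += cost
        events.append("ALLOWED")
        if not alerted and spent >= alert_threshold_pct * max_cost_brl_per_day:
            events.append("ALERT")        # fires once, when the threshold is first crossed
            alerted = True
    return events, spent


events, spent = simulate_day([10.5] * 12, max_cost_brl_per_day=100.0)
```

With twelve calls of R$10.50 each, the alert fires after the eighth call (R$84.00), the tenth call brings spend to R$105.00, and the last two calls are denied.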
