Token Budget Guardian

Prevent runaway AI agent spending. Set cost limits in BRL, detect subagent loops, and receive alerts before budgets are exhausted.

Why this exists

A single autonomous agent using subagents can run up more than R$1,500 in LLM costs in a single day when there is no control layer. This happened to a real customer running ClaudeBot with subagents: the session ran unchecked until the provider invoice arrived.

The Token Budget Guardian intercepts every tool call before execution, checking against configured limits. If a limit is exceeded, the call is blocked with reason code BUDGET.EXCEEDED before any tokens are spent.

How it works

Every POST /v1/authorize request now passes through the Budget Guardian before the policy engine:

Request → [Budget Check] → [Hallucination Shield] → [WASM Policy Engine] → Decision

If any budget limit is exceeded:

  • The response carries status: "DENIED" and a policy_matched value naming the exceeded limit, e.g. "budget_guardian:DAILY_COST_BRL"
  • No tokens are spent on the blocked tool call
  • Violation is logged to the audit trail
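The ordering above can be sketched as a chain of checks that stops at the first denial. This is only an illustration of the short-circuit behaviour, not the Guardian's actual code: the stage functions, the request fields spent_brl_today and daily_limit_brl, and the "ALLOWED" status are all hypothetical names chosen for the example.

```python
from typing import Callable, List, Optional

# A stage returns None to pass the request on, or a policy_matched string to deny it.
Stage = Callable[[dict], Optional[str]]

calls: List[str] = []  # records which later stages actually ran


def budget_check(request: dict) -> Optional[str]:
    # Hypothetical fields; the real Guardian reads usage from its own records.
    if request["spent_brl_today"] >= request["daily_limit_brl"]:
        return "budget_guardian:DAILY_COST_BRL"
    return None


def hallucination_shield(request: dict) -> Optional[str]:
    calls.append("hallucination_shield")
    return None


def policy_engine(request: dict) -> Optional[str]:
    calls.append("policy_engine")
    return None


def authorize(request: dict, stages: List[Stage]) -> dict:
    """Run the stages in order and stop at the first denial."""
    for stage in stages:
        matched = stage(request)
        if matched is not None:
            return {"status": "DENIED", "policy_matched": matched}
    # "ALLOWED" is a placeholder; the source only documents the DENIED shape.
    return {"status": "ALLOWED"}
```

Because the budget check runs first, an over-budget request never reaches the later stages in this sketch.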

Configure a budget

POST /v1/budget/{agent_id}
Authorization: Bearer {your-api-key}

{
  "max_cost_brl_per_day": 50.00,
  "max_cost_brl_per_hour": 10.00,
  "max_tool_calls_per_minute": 30,
  "max_sequential_same_tool": 5,
  "alert_webhook_url": "https://hooks.slack.com/...",
  "alert_threshold_pct": 0.8,
  "action_on_exceed": "BLOCK"
}

| Field | Default | Description |
| --- | --- | --- |
| max_cost_brl_per_day | none | Primary protection: block agent if daily BRL spend exceeds this |
| max_cost_brl_per_hour | none | Hourly rolling cost cap |
| max_tokens_per_day | none | Raw token cap (if you prefer token-based limits) |
| max_tool_calls_per_minute | 60 | RPM cap, detects rapid subagent loops |
| max_sequential_same_tool | 10 | Block if same tool called N consecutive times |
| alert_webhook_url | none | POST alert to this URL when threshold is reached |
| alert_threshold_pct | 0.8 | Alert at this % of budget (default: 80%) |
| action_on_exceed | BLOCK | BLOCK, ALERT_ONLY, or THROTTLE |
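A client might assemble the configuration request along the following lines. This is a minimal sketch using only the Python standard library; BASE_URL is a placeholder, the Content-Type header is an assumption (the source shows only the Authorization header), and the client-side validation is a convenience added here, not documented server behaviour. The request is built but not sent.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://shield.example.com"  # assumption: replace with your deployment's URL

# Field names and action values taken from the table above.
KNOWN_FIELDS = {
    "max_cost_brl_per_day", "max_cost_brl_per_hour", "max_tokens_per_day",
    "max_tool_calls_per_minute", "max_sequential_same_tool",
    "alert_webhook_url", "alert_threshold_pct", "action_on_exceed",
}
ALLOWED_ACTIONS = {"BLOCK", "ALERT_ONLY", "THROTTLE"}


def build_budget_request(agent_id: str, api_key: str, **limits) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/budget/{agent_id} request."""
    unknown = set(limits) - KNOWN_FIELDS
    if unknown:
        raise ValueError(f"unknown budget fields: {sorted(unknown)}")
    if limits.get("action_on_exceed", "BLOCK") not in ALLOWED_ACTIONS:
        raise ValueError("action_on_exceed must be BLOCK, ALERT_ONLY, or THROTTLE")
    if not 0 < limits.get("alert_threshold_pct", 0.8) <= 1:
        raise ValueError("alert_threshold_pct is a fraction, e.g. 0.8 for 80%")
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/budget/{urllib.parse.quote(agent_id, safe='')}",
        data=json.dumps(limits).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",  # assumed, not stated in the source
        },
    )
```

Passing the returned object to urllib.request.urlopen would send it. Rejecting unknown field names locally catches typos such as max_cost_per_day before they reach the API.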

Check current usage

GET /v1/budget/{agent_id}

Example response:

{
  "agent_id": "my-agent",
  "today": {
    "cost_brl": 12.40,
    "tokens": 145000,
    "remaining_budget_pct": 0.752
  },
  "current_hour": {
    "cost_brl": 2.10,
    "tokens": 24500
  },
  "status": "OK"
}

Status values: OK | WARNING (< 50% of budget remaining) | CRITICAL (< 20% remaining) | EXHAUSTED | NO_LIMIT
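The status thresholds can be read as a simple mapping from the remaining budget fraction. The sketch below is an interpretation rather than the service's implementation: it assumes EXHAUSTED means nothing is left, that NO_LIMIT applies when no daily limit is configured, and that remaining_budget_pct is computed as 1 - cost / limit. The function names are hypothetical.

```python
from typing import Optional


def remaining_fraction(cost_brl: float, limit_brl: Optional[float]) -> Optional[float]:
    """Fraction of the budget still unspent, or None when no limit is set."""
    if limit_brl is None:
        return None
    return round(1 - cost_brl / limit_brl, 3)


def classify_status(remaining_pct: Optional[float]) -> str:
    """Map a remaining-budget fraction to one of the documented status values."""
    if remaining_pct is None:
        return "NO_LIMIT"   # assumed: no budget configured for this agent
    if remaining_pct <= 0:
        return "EXHAUSTED"  # assumed: budget fully spent
    if remaining_pct < 0.20:
        return "CRITICAL"
    if remaining_pct < 0.50:
        return "WARNING"
    return "OK"
```

With the sample response (R$12.40 spent) and a R$50.00 daily limit, the remaining fraction works out to 0.752, which maps to OK.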

Subagent loop detection

The Guardian detects two loop patterns automatically:

RPM loop: Agent spawns subagents that each call tools rapidly. If the agent exceeds max_tool_calls_per_minute, all further calls are blocked.

Consecutive same-tool loop: If the same tool is called N consecutive times (default: 10), the Guardian assumes a loop and blocks. This catches the classic pattern of a broken agent calling search_web in an infinite retry loop.

When a loop is detected:

{
  "status": "DENIED",
  "reason": "Subagent loop detected: 'search_web' called 11 consecutive times (max: 10).",
  "policy_matched": "budget_guardian:LOOP_DETECTED"
}
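Both loop patterns can be approximated with a streak counter and a 60-second sliding window. The class below is a rough sketch under stated assumptions, not the Guardian's internals: LoopDetector and its method names are hypothetical, the rate-limit reason text is invented for the example, and whether blocked calls count toward the window is a guess.

```python
from collections import deque
from typing import Deque, Optional


class LoopDetector:
    def __init__(self, max_calls_per_minute: int = 60, max_sequential_same_tool: int = 10):
        self.max_calls_per_minute = max_calls_per_minute
        self.max_sequential_same_tool = max_sequential_same_tool
        self._timestamps: Deque[float] = deque()  # times of allowed calls
        self._last_tool: Optional[str] = None
        self._streak = 0

    def check(self, tool: str, now: float) -> Optional[str]:
        """Return a denial reason, or None if the call may proceed (now is in seconds)."""
        # Consecutive same-tool loop: count the current streak.
        if tool == self._last_tool:
            self._streak += 1
        else:
            self._last_tool, self._streak = tool, 1
        if self._streak > self.max_sequential_same_tool:
            return (f"Subagent loop detected: '{tool}' called {self._streak} "
                    f"consecutive times (max: {self.max_sequential_same_tool}).")
        # RPM loop: drop timestamps older than 60 seconds, then compare.
        while self._timestamps and now - self._timestamps[0] >= 60:
            self._timestamps.popleft()
        if len(self._timestamps) >= self.max_calls_per_minute:
            # Hypothetical wording; the source does not show this message.
            return (f"Tool call rate exceeded: {len(self._timestamps)} calls in the "
                    f"last 60 seconds (max: {self.max_calls_per_minute}).")
        self._timestamps.append(now)
        return None
```

In this sketch the streak resets whenever a different tool is called, so alternating between two tools is caught only by the per-minute cap.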

Supported LLM models (BRL pricing)

The Guardian estimates costs automatically based on the model field in the request:

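| Model | Input (BRL per 1M tokens) | Output (BRL per 1M tokens) |
| --- | --- | --- |
| Claude Opus 4 | R$87.00 | R$261.00 |
| Claude Sonnet 4 | R$17.40 | R$87.00 |
| Claude Haiku 4 | R$1.16 | R$5.80 |
| GPT-4o | R$14.50 | R$43.50 |
| Gemini 1.5 Pro | R$8.70 | R$26.10 |
| Other/Default | R$10.00 | R$30.00 |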

Prices use approximate BRL conversion and are updated quarterly.
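To make the arithmetic concrete, the estimate for one call is (input tokens × input price + output tokens × output price) / 1,000,000. The sketch below uses the prices from the table; the dictionary keys are the display names shown there, which is an assumption, since the exact strings expected in the request's model field are not listed. estimate_cost_brl is a hypothetical helper name.

```python
from decimal import Decimal

# (input, output) price in BRL per 1M tokens, copied from the table above.
PRICES_BRL_PER_1M = {
    "Claude Opus 4": (Decimal("87.00"), Decimal("261.00")),
    "Claude Sonnet 4": (Decimal("17.40"), Decimal("87.00")),
    "Claude Haiku 4": (Decimal("1.16"), Decimal("5.80")),
    "GPT-4o": (Decimal("14.50"), Decimal("43.50")),
    "Gemini 1.5 Pro": (Decimal("8.70"), Decimal("26.10")),
}
DEFAULT_PRICE = (Decimal("10.00"), Decimal("30.00"))  # the Other/Default row


def estimate_cost_brl(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the BRL cost of one call; unknown models use the default row."""
    input_price, output_price = PRICES_BRL_PER_1M.get(model, DEFAULT_PRICE)
    return (input_price * input_tokens + output_price * output_tokens) / Decimal(1_000_000)
```

For example, 100,000 input tokens and 20,000 output tokens on Claude Sonnet 4 come to R$1.74 + R$1.74 = R$3.48. Decimal is used instead of float so that currency sums do not accumulate rounding error.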

View violations

GET /v1/budget/{agent_id}/violations?limit=50

Returns a full audit trail of all budget violations, including what was blocked, when, and at what value.

Persistence & schema

As of v10.1.5, the Token Budget Guardian uses three dedicated tables in the Shield database, initialized automatically on startup:

| Table | Purpose |
| --- | --- |
| token_budgets | Stores budget configuration per agent_id |
| token_usage | Records cumulative usage (tokens, cost, call counts) per time window |
| budget_violations | Immutable log of all enforcement events with reason codes |

These tables are created via initSchema() and indexed for low-latency budget lookups on every /v1/authorize call. No manual migration is required — the schema is managed automatically by the Shield runtime.

Prior to v10.1.5, these tables were missing from initSchema() and all TokenBudgetGuardian queries silently failed. If you were running an earlier version, upgrade to v10.1.5 to activate budget enforcement.

Example: The R$1,500 scenario, prevented

Configure the agent before first run:

POST /v1/budget/claudebot-production
{
  "max_cost_brl_per_day": 100.00,
  "max_tool_calls_per_minute": 20,
  "max_sequential_same_tool": 5,
  "alert_webhook_url": "https://hooks.slack.com/services/...",
  "action_on_exceed": "BLOCK"
}

At 80% of the daily limit (R$80), your Slack receives:

Agent 'claudebot-production' used 80% of DAILY_COST budget — R$80.00 / R$100.00

If the agent continues, at R$100 it is fully blocked:

{
  "status": "DENIED",
  "reason": "Daily cost budget exceeded: R$100.42 / R$100.00",
  "policy_matched": "budget_guardian:DAILY_COST_BRL"
}

The R$1,500 bill is capped at roughly R$100.
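The alert and block points in this scenario follow from two comparisons against the daily limit. The function below is a small sketch of that decision, assuming the block triggers once spend reaches the limit and the alert once spend reaches limit × alert_threshold_pct; the exact comparison operators used by the Guardian are not stated in the source, and evaluate_daily_budget is a hypothetical name.

```python
from decimal import Decimal


def evaluate_daily_budget(
    spent_brl: Decimal,
    limit_brl: Decimal,
    alert_threshold_pct: Decimal = Decimal("0.8"),
) -> str:
    """Return "BLOCK", "ALERT", or "OK" for the current daily spend."""
    if spent_brl >= limit_brl:
        return "BLOCK"   # assumed: reaching the limit is enough to block
    if spent_brl >= limit_brl * alert_threshold_pct:
        return "ALERT"   # 0.8 x R$100.00 = R$80.00 in this scenario
    return "OK"
```

With a R$100.00 limit, spend of R$79.99 is still OK, R$80.00 triggers the alert, and R$100.42 is blocked.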
