Latency SLA & Performance Specifications
Formal SLO commitments, consistency model, idempotency guarantees, and circuit breaker behavior for ABS Core v10.1.5+.
Effective: v10.1.5+ · Last updated: 2026-02-26
This document defines the formal service-level objectives (SLOs), consistency model, and failure behavior of the ABS Core governance engine. It is intended for risk committees, enterprise architects, and compliance officers evaluating ABS Core for regulated-environment deployment.
Formal SLO Commitments
Operational Load (up to 1,000 req/s)
| Metric | Commitment | Basis |
|---|---|---|
| P50 latency | < 200ms | Validated: 161ms at 1,000 req/s (60 min test) |
| P99 latency | < 300ms | Validated: 289ms at 1,000 req/s (60 min test) |
| Error rate (5xx) | 0% | Validated: 0 errors across 3.6M requests |
| Monthly availability | 99.95% | ~22 min allowable downtime/month (~4.4h/year) |
| Audit log completeness | 100% | Every request — passed or rejected — is recorded |
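The downtime budget follows directly from the availability target: downtime = (1 − availability) × window length. The sketch below assumes a 30-day month for illustration. Calendar months of 28 to 31 days shift the monthly figure slightly. The function name is hypothetical.

```typescript
// Allowed downtime implied by an availability target over a measurement window.
// Assumption: a 30-day month (720h); real calendar months vary from 672h to 744h.
const HOURS_PER_30_DAY_MONTH = 30 * 24; // 720h
const HOURS_PER_YEAR = 365 * 24; // 8,760h

function allowedDowntimeMinutes(availability: number, windowHours: number): number {
  if (availability < 0 || availability > 1) {
    throw new RangeError("availability must be a fraction between 0 and 1");
  }
  return (1 - availability) * windowHours * 60;
}

const target = 0.9995; // 99.95%
console.log(allowedDowntimeMinutes(target, HOURS_PER_30_DAY_MONTH).toFixed(1)); // 21.6
console.log((allowedDowntimeMinutes(target, HOURS_PER_YEAR) / 60).toFixed(2)); // 4.38
```

Because the SLO is measured monthly, the ~22-minute monthly budget is the binding constraint. The yearly figure is only a convenient summary.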
Overload Protection (above 1,000 req/s)
| Behavior | Commitment |
|---|---|
| Rate limit response | HTTP 429 with Retry-After header — never silent drop |
| 5xx under overload | 0 — validated at 5,000 req/s (15 min stress test) |
| Audit integrity under overload | 100% — all requests recorded including rejected ones |
| Degradation mode | Graceful shed via rate limiter, not crash |
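Because overload is signalled with HTTP 429 plus Retry-After rather than a dropped connection, clients can back off and retry deterministically. Below is a minimal client-side sketch. `send` and `MinimalResponse` are hypothetical stand-ins for whatever performs the HTTP call (for example `fetch`). Retry-After may be either delay-seconds or an HTTP-date, per RFC 9110.

```typescript
// Hypothetical minimal response shape: only the status and Retry-After header matter here.
interface MinimalResponse {
  status: number;
  retryAfter: string | null; // raw Retry-After header value, if present
}

// Retry-After is either delay-seconds ("5") or an HTTP-date; fall back if absent or unparseable.
function retryDelayMs(header: string | null, nowMs: number, fallbackMs: number): number {
  if (header === null) return fallbackMs;
  const trimmed = header.trim();
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000;
  const dateMs = Date.parse(trimmed);
  return Number.isNaN(dateMs) ? fallbackMs : Math.max(0, dateMs - nowMs);
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retries only on 429, waiting as instructed by Retry-After; other statuses are returned as-is.
async function sendWithRetry(
  send: () => Promise<MinimalResponse>,
  maxAttempts = 3,
  fallbackMs = 1000,
): Promise<MinimalResponse> {
  for (let attempt = 1; ; attempt++) {
    const res = await send();
    if (res.status !== 429 || attempt >= maxAttempts) return res;
    await sleep(retryDelayMs(res.retryAfter, Date.now(), fallbackMs));
  }
}
```

For write operations, every retry should carry the same event_id so the retried request stays idempotent.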
Latency Budget by Component
| Component | Mode | Added Latency | Notes |
|---|---|---|---|
| WASM Policy Engine | Inline (blocking) | < 2ms | Pure in-memory evaluation — no I/O |
| CHI Semantic Analysis | Async (parallel) | < 150ms | Gemini 2.0 Flash — runs in parallel with LLM call |
| PII Redaction | Inline | < 1ms | Regex + entropy scan — deterministic |
| Audit Hash + L2 Queue | Async (non-blocking) | 0ms added to P50 | Hash computed inline; L2 anchor async |
| Full governance pipeline | Combined | ~153ms P50 | Validated across 9M+ requests |
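LLM context: A standard LLM response (OpenAI GPT-4o, Anthropic Claude) takes 500ms to 2,000ms. If the full 153ms P50 governance overhead is added on top of the LLM call, it accounts for roughly 7% of total agent response time with a 2,000ms LLM response and roughly 23% with a 500ms response. It drops below 10% once the LLM call exceeds about 1.4s. The overhead itself is below the perceptual threshold for end users.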
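The overhead share depends on how long the LLM call takes. The quick check below assumes the governance overhead is added serially to the LLM call, so share = g / (g + x). The function names are illustrative.

```typescript
// Share of end-to-end response time spent in governance, assuming serial composition.
function overheadShare(governanceMs: number, llmMs: number): number {
  return governanceMs / (governanceMs + llmMs);
}

// LLM latency above which the share drops below `target`:
// g / (g + x) < t  <=>  x > g * (1 - t) / t
function llmLatencyForShareBelow(governanceMs: number, target: number): number {
  return (governanceMs * (1 - target)) / target;
}

const P50_OVERHEAD_MS = 153;
for (const llmMs of [500, 1000, 2000]) {
  const pct = (overheadShare(P50_OVERHEAD_MS, llmMs) * 100).toFixed(1);
  console.log(`${llmMs}ms LLM call -> ${pct}%`);
}
// 500ms LLM call -> 23.4%
// 1000ms LLM call -> 13.3%
// 2000ms LLM call -> 7.1%
console.log(llmLatencyForShareBelow(P50_OVERHEAD_MS, 0.1).toFixed(0)); // 1377
```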
Consistency Model
This section addresses the consistency and transactional guarantees required for regulated financial environments.
Decision Consistency: Strong
Every POST /v1/decide or POST /v1/authorize call is:
- Synchronous — the caller receives a verdict before any downstream action executes
- Deterministic — same input + same policy version → same verdict, always
- Versioned — the active policy version is recorded in every decision envelope
There is no eventual consistency in the decision path. A decision is either ALLOW or DENY — never "pending" or "probably allow."
Audit Log Consistency: Write-once, append-only
| Property | Behavior |
|---|---|
| Write model | Append-only — no update or delete operations on audit records |
| Durability | Written to D1 (Cloudflare) before response is returned to caller |
| Hash chaining | Each record includes SHA-256 of the previous record — tamper-evident |
| L2 anchoring | Batches anchored to Polygon L2 asynchronously — does not block response (Enterprise tier) |
| Consistency on read | Strong consistency within a single region; eventual across regions (<500ms) |
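A minimal sketch of the tamper-evident hash chaining described in the table follows. Each record stores the SHA-256 of its predecessor, so altering any earlier record breaks every later link. The record shape, the field separator, and the all-zero genesis hash are illustrative assumptions, not ABS Core's actual schema.

```typescript
import { createHash } from "node:crypto";

type Verdict = "ALLOW" | "DENY";

// Hypothetical record shape for illustration only.
interface AuditRecord {
  event_id: string;
  verdict: Verdict;
  prev_hash: string; // hex SHA-256 of the previous record
  hash: string; // hex SHA-256 of this record's fields plus prev_hash
}

const GENESIS_HASH = "0".repeat(64); // assumed anchor for the first record

function recordHash(eventId: string, verdict: Verdict, prevHash: string): string {
  return createHash("sha256").update(`${eventId}|${verdict}|${prevHash}`).digest("hex");
}

// Append-only: returns a new log; existing records are never modified.
function append(log: readonly AuditRecord[], eventId: string, verdict: Verdict): AuditRecord[] {
  const prev_hash = log.length > 0 ? log[log.length - 1].hash : GENESIS_HASH;
  const record: AuditRecord = { event_id: eventId, verdict, prev_hash, hash: recordHash(eventId, verdict, prev_hash) };
  return [...log, record];
}

// Returns the index of the first record that fails verification, or -1 if the chain is intact.
function verifyChain(log: readonly AuditRecord[]): number {
  let expectedPrev = GENESIS_HASH;
  for (let i = 0; i < log.length; i++) {
    const r = log[i];
    if (r.prev_hash !== expectedPrev || r.hash !== recordHash(r.event_id, r.verdict, r.prev_hash)) {
      return i;
    }
    expectedPrev = r.hash;
  }
  return -1;
}
```

If a record is edited without recomputing its hash, verification fails at that record. If the hash is recomputed, the break moves to the next record, whose prev_hash no longer matches. The only way to evade the check is to rewrite every later record, and external anchoring such as the L2 batches is designed to make that detectable.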
Idempotency
All write operations in ABS Core are idempotent by event_id:
```typescript
// Submitting the same event_id twice returns the same result and creates no duplicate record.
// `abs` is assumed to be an already-initialized ABS Core SDK client (setup not shown).
const result = await abs.process({
  event_id: "evt_TXN-4421-refund", // client-generated ID, stable across retries
  tenant_id: "my-tenant",
  event_type: "agent.action",
  payload: { action: "WRITE", target: "accounts/acc_123/refund", amount: 250.00 },
}, { sync: true });
const decision = result.envelope;
```

If the same event_id is submitted twice (e.g., due to a network retry), the second call returns the original decision without creating a duplicate audit record. This guarantee is critical for payment systems, where retry-on-failure is standard practice.
v10.1.5 fix: Prior to v10.1.5, the trace_id field was used as the idempotency key. This was incorrect: trace_id is assigned by the server per request and is not stable across retries. v10.1.5 introduced the client-controlled event_id as the correct idempotency key. Callers on versions earlier than v10.1.5 must upgrade before relying on idempotency guarantees.
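The idempotency contract can be sketched server-side as a lookup keyed by the client's event_id. In this sketch, an in-memory `Map` stands in for durable storage. `IdempotentDecider`, `evaluate`, and the `policy_version` field are hypothetical names for illustration, not ABS Core internals.

```typescript
type Verdict = "ALLOW" | "DENY";

interface Decision {
  event_id: string;
  verdict: Verdict;
  policy_version: string;
}

// Hypothetical sketch: first call per event_id evaluates and records; retries replay the stored decision.
class IdempotentDecider {
  private readonly decisions = new Map<string, Decision>();
  private readonly evaluate: (payload: unknown) => Verdict;
  private readonly policyVersion: string;
  auditRecordsWritten = 0;

  constructor(evaluate: (payload: unknown) => Verdict, policyVersion: string) {
    this.evaluate = evaluate;
    this.policyVersion = policyVersion;
  }

  decide(eventId: string, payload: unknown): Decision {
    const existing = this.decisions.get(eventId);
    if (existing !== undefined) {
      return existing; // retry: original decision, no new audit record
    }
    const decision: Decision = {
      event_id: eventId,
      verdict: this.evaluate(payload),
      policy_version: this.policyVersion,
    };
    this.decisions.set(eventId, decision);
    this.auditRecordsWritten++; // exactly one audit record per distinct event_id
    return decision;
  }
}
```

This also shows why a server-assigned trace_id cannot work as the key: each retry would arrive with a fresh ID, so the lookup would never hit. Against real storage, the check-and-insert must be atomic (for example, a unique constraint on event_id) so that concurrent retries cannot both create a record.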
MTTR and Recovery Commitments
| Scenario | Target | Behavior |
|---|---|---|
| Cloudflare edge node failure | < 30s | Traffic automatically rerouted to adjacent PoP |
| CHI semantic engine timeout | < 200ms | Circuit breaker → Fail-Safe ALLOW + audit flag |
| D1 write latency spike | Transparent | Response returned first; D1 write retried async |
| Polygon L2 congestion | Transparent | L2 anchor queued; local hash written immediately (Enterprise) |
| Full region outage (rare) | < 15 min MTTR | Cloudflare multi-region failover |
RTO / RPO
| Parameter | Value | Notes |
|---|---|---|
| RTO (Recovery Time Objective) | < 15 minutes | Time to restore service after full outage |
| RPO (Recovery Point Objective) | 0 for decisions | No decision data is losable — written before response |
| RPO for L2 anchoring | < 1 block cycle (~2s) | L2 anchor may be delayed; local hash is never lost |
Circuit Breaker Behavior
The CHI semantic analysis engine has a 200ms hard timeout. If exceeded:
- The request proceeds with Fail-Safe ALLOW — the agent's action is not blocked
- The event is flagged as `chi_timeout: true` in the audit record
- A Sentry alert is fired for the workspace
- The CHI engine is bypassed for subsequent requests until it recovers (exponential backoff, max 30s)
This design ensures the governance layer never becomes a single point of failure that halts production systems — a hard requirement for financial infrastructure.
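The breaker behavior described above can be sketched as a 200ms timeout race that falls back to Fail-Safe ALLOW. On timeout, the breaker opens and CHI is skipped until a backoff window expires. Several details here are assumptions:

- `ChiBreaker`, the `analyze` callback, and the `chi_skipped` flag are hypothetical.
- The 250ms base backoff is assumed.
- "max 30s" is read as a cap on the backoff interval.
- The Sentry alert is omitted.

```typescript
type Verdict = "ALLOW" | "DENY";

interface ChiResult {
  verdict: Verdict;
  chi_timeout: boolean; // mirrors the audit flag described above
  chi_skipped: boolean; // hypothetical: CHI bypassed because the breaker is open
}

const CHI_TIMEOUT_MS = 200; // hard timeout from the text
const MAX_BACKOFF_MS = 30_000; // "max 30s", read as a cap on the backoff interval
const BASE_BACKOFF_MS = 250; // assumption: first backoff interval

class ChiBreaker {
  private failures = 0;
  private openUntil = 0;
  private readonly analyze: (input: string) => Promise<Verdict>;
  private readonly now: () => number;

  constructor(analyze: (input: string) => Promise<Verdict>, now: () => number = Date.now) {
    this.analyze = analyze;
    this.now = now;
  }

  async check(input: string): Promise<ChiResult> {
    // Breaker open: skip CHI entirely and fail safe.
    if (this.now() < this.openUntil) {
      return { verdict: "ALLOW", chi_timeout: false, chi_skipped: true };
    }
    let timer: ReturnType<typeof setTimeout> | undefined;
    const timeout = new Promise<"TIMEOUT">((resolve) => {
      timer = setTimeout(() => resolve("TIMEOUT"), CHI_TIMEOUT_MS);
    });
    try {
      const outcome = await Promise.race([this.analyze(input), timeout]);
      if (outcome === "TIMEOUT") {
        // Timed out: open the breaker with exponential backoff, then Fail-Safe ALLOW.
        this.failures++;
        const backoff = Math.min(MAX_BACKOFF_MS, BASE_BACKOFF_MS * 2 ** (this.failures - 1));
        this.openUntil = this.now() + backoff;
        return { verdict: "ALLOW", chi_timeout: true, chi_skipped: false };
      }
      this.failures = 0; // a response within the deadline resets the backoff
      return { verdict: outcome, chi_timeout: false, chi_skipped: false };
    } finally {
      clearTimeout(timer);
    }
  }
}
```

Once the backoff window expires, the next request acts as a probe. A timely CHI response closes the breaker, while another timeout doubles the window up to the 30s cap. Errors thrown by `analyze` propagate unchanged in this sketch, since the text specifies only timeout handling.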
Shadow Mode (Non-Blocking Governance)
For high-frequency, low-stakes operations where even 153ms P50 overhead is unacceptable:
```yaml
# policy.yaml
mode: shadow              # analyze but do not block
enforcement: strict       # when promoted, full blocking applies
alert_on_violation: true
```

In shadow mode:
- All requests pass through regardless of verdict
- Violations are recorded in the audit log with a `shadow: true` flag
- Dashboards show the violation rate for that operation class
- Teams can promote to `enforcement` mode when confident in policy correctness
Shadow mode is the recommended entry point for new agent integrations and high-frequency read operations.
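The difference between the two modes comes down to whether a DENY verdict blocks the action. The sketch below is illustrative. Its fields mirror the `mode` and `alert_on_violation` keys in policy.yaml. The `"enforce"` mode value, `applyVerdict`, and the `Outcome` shape are hypothetical.

```typescript
type Mode = "shadow" | "enforce"; // "enforce" is an assumed name for the promoted mode
type Verdict = "ALLOW" | "DENY";

interface Outcome {
  proceed: boolean; // may the agent's action execute?
  shadow: boolean; // mirrors the `shadow: true` audit flag on violations
  alert: boolean; // should a violation alert fire?
}

function applyVerdict(verdict: Verdict, mode: Mode, alertOnViolation: boolean): Outcome {
  const violation = verdict === "DENY";
  if (mode === "shadow") {
    // Shadow: always proceed; the violation is only recorded (and optionally alerted).
    return { proceed: true, shadow: violation, alert: violation && alertOnViolation };
  }
  // Enforcement: a DENY blocks the action.
  return { proceed: !violation, shadow: false, alert: violation && alertOnViolation };
}
```

Because the same verdict is computed in both modes, the violation rate observed in shadow mode is a direct preview of what enforcement would block.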
Benchmark Reference
All numbers in this document are derived from the Benchmark Report, which documents three test classes (endurance at 200 req/s for 2h, load at 1,000 req/s for 60 min, stress at 5,000 req/s for 15 min) with results anchored on the Bitcoin blockchain.
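The request counts quoted earlier (3.6M for the load test, 9M+ overall) follow from rate × duration for each test class. The check below assumes each stated rate was held for the full run with no ramp-up.

```typescript
// Request volume per benchmark class, assuming the stated rate is sustained for the whole run.
interface TestClass {
  name: string;
  reqPerSec: number;
  durationMin: number;
}

const testClasses: TestClass[] = [
  { name: "endurance", reqPerSec: 200, durationMin: 120 },
  { name: "load", reqPerSec: 1_000, durationMin: 60 },
  { name: "stress", reqPerSec: 5_000, durationMin: 15 },
];

const volumes = testClasses.map((c) => ({ name: c.name, requests: c.reqPerSec * c.durationMin * 60 }));
const totalRequests = volumes.reduce((sum, v) => sum + v.requests, 0);

for (const v of volumes) console.log(`${v.name}: ${v.requests}`);
console.log(`total: ${totalRequests}`);
// endurance: 1440000, load: 3600000, stress: 4500000, total: 9540000
```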