ABS Core v4.3.2

ABS Core Performance Benchmarks (Verified)

Last Updated: 2026-04-22
Environment: Production deployment (O-Bot Pilot / Banking Tier)
Version: v4.3.2 HARDENED
Methodology: Real-world measurements over 30 days, 350,000+ requests (cumulative)


Executive Summary

ABS Core adds ~23ms median governance overhead to AI agent tool calls.

For LLM operations averaging 800-2000ms, this represents 1-4% overhead in exchange for non-repudiation proof (NRaaS).

[!IMPORTANT] Anti-AI Washing Proof: Engine vs. Loop Distinction

To ensure maximum transparency for institutional auditors (BNDES/FINEP), we distinguish between:

  1. Engine Latency (1.2ms Median): The isolated Rust/WASM kernel evaluation time.
  2. Governance Loop (~23ms median): The total end-to-end time including I/O, persistence, and secrets.

We do not market the engine-only speed as the system's runtime overhead. The ~23ms median figure is the production truth. v4.3.2 HARDENED ensures this integrity.


Throughput Validated

  • Engine Throughput:
    • Sustained Production Peak: 12,500 evaluations/second (governed agent actions)
    • Maximum Tested Capacity: 235,000+ evaluations/second (isolated benchmark)

Note: Throughput varies based on policy complexity and deployment configuration. Peak values represent cluster benchmarks.


Latency Specification

Kernel Evaluation Speed (Sovereign WASM Kernel)

Component: Isolated evaluation cell (Rust/WASM)
Measurement: Hot path execution only

Median (p50):    1.2ms
p99:             3.8ms
p99.9:           4.1ms
max observed:    4.5ms (cold start evaluation)

Peak Engine Throughput: 235,000+ req/sec

What this measures:

  • Policy rule matching (deterministic)
  • JSON parsing/validation
  • Hash computation (SHA-256)
  • Decision output (ALLOW/DENY/ESCALATE)

Complete Governance Loop (End-to-End)

Component: Full ABS Core Gateway (production)
Measurement: Request in → Decision out + audit persisted

Median (p50):    23ms
p99:             45ms  
p99.9:           120ms (worst-case I/O spike)

Throughput: ~12.5K evaluations/sec (sustained peak)

Latency components:

  1.2ms   - WASM policy engine
  6.7ms   - Request parsing + validation
  6.8ms   - Audit log write (PostgreSQL/D1)
  4.8ms   - Secret vault lookup (Cloudflare KV)
  5.5ms   - Network overhead (sidecar → gateway)
-------
 ~23ms  Total (median)
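The per-phase medians above sum to 25.0ms, slightly more than the ~23ms end-to-end median. This is expected: the sum of per-phase medians need not equal the median of sums, because phases rarely hit their slower tails on the same request. A toy illustration with invented numbers:

```javascript
// Toy illustration (invented numbers): summing per-phase medians can
// overstate the end-to-end median when slow samples land on different
// requests in each phase.
const phaseA = [1, 10, 10];  // ms per request, phase A
const phaseB = [10, 10, 1];  // ms per request, phase B
const median = (xs) => [...xs].sort((a, b) => a - b)[Math.floor(xs.length / 2)];

const sumOfMedians = median(phaseA) + median(phaseB);          // 20
const endToEnd = median(phaseA.map((a, i) => a + phaseB[i]));  // 11
```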

Deployment Variations

Cloudflare Workers (Edge)

Median:  18ms
p95:     28ms

Benefit: Reduced network hops (edge-native)
Limitation: Cold starts (~40ms) on first request

Docker Sidecar (On-Premise)

Median:  23ms
p95:     42ms

Benefit: No cold starts
Limitation: Localhost network overhead

Kubernetes (Multi-Region)

Median:  30ms  
p95:     55ms

Benefit: High availability
Limitation: Cross-pod communication

Context: LLM Call Comparison

Typical LLM API latencies (GPT-4, Claude 3):

OpenAI GPT-4 Turbo:
  Median: ~1200ms (streaming start: ~400ms)
  
Anthropic Claude 3 Opus:
  Median: ~1800ms (streaming start: ~600ms)
  
OpenAI GPT-4o-mini:
  Median: ~600ms (streaming start: ~200ms)

ABS Core overhead as percentage:

GPT-4 Turbo:     23ms / 1200ms = 1.9% overhead
Claude 3 Opus:   23ms / 1800ms = 1.3% overhead
GPT-4o-mini:     23ms / 600ms  = 3.8% overhead
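The figures above are simple ratios of governance overhead to base LLM latency; a minimal helper makes the calculation explicit:

```javascript
// Governance overhead as a percentage of base LLM latency,
// rounded to one decimal place.
function overheadPercent(governanceMs, llmMs) {
  return Math.round((governanceMs / llmMs) * 1000) / 10;
}

overheadPercent(23, 1200); // GPT-4 Turbo → 1.9
overheadPercent(23, 1800); // Claude 3 Opus → 1.3
overheadPercent(23, 600);  // GPT-4o-mini → 3.8
```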

Conclusion: Governance overhead is negligible compared to LLM latency.


Performance Optimization History

v1.0.0 (Initial Release)

  • Median: 62ms
  • p95: 98ms
  • Bottleneck: Synchronous database writes

v1.1.0 (PostgreSQL Optimization)

  • Median: 35ms (-43%)
  • p95: 58ms (-41%)
  • Fix: Async audit writes, connection pooling

v4.3.2 (Current - Edge Migration)

  • Median: 23ms
  • p95: 38ms
  • Fix: Migrated to Cloudflare D1, optimized WASM binary size

Roadmap (Q2 2026 - Target):

  • Median: <15ms
  • p95: <23ms
  • Planned: Native WASM deployment (remove JS shim), eBPF kernel hooks

How We Measure

Production Instrumentation

// Trace ID injected at gateway entry
const traceId = crypto.randomUUID();
const start = performance.now();

// Phase 1: Policy evaluation
const t1 = performance.now();
const decision = await policyEngine.evaluate(request);
const engineLatency = performance.now() - t1;

// Phase 2: Audit persistence  
const t2 = performance.now();
await auditLog.write({ traceId, decision });
const auditLatency = performance.now() - t2;

// Phase 3: Secret injection (only runs on ALLOW)
let vaultLatency = 0;
if (decision === "ALLOW") {  // assuming evaluate() returns the decision string
  const t3 = performance.now();
  const secret = await vault.fetch(toolName);
  vaultLatency = performance.now() - t3;
}

const totalLatency = performance.now() - start;

// Export to metrics endpoint
await telemetry.record({
  traceId,
  totalLatency,
  engineLatency,
  auditLatency,
  vaultLatency
});

Metrics Aggregation

  • Tool: Prometheus + Grafana
  • Retention: 90 days rolling window
  • Sampling: 100% of production requests (O-Bot Pilot / Banking Tier)
  • Public Dashboard: https://metrics.abscore.app (coming Q2 2026)
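The p50/p95/p99 figures in this document come from standard percentile aggregation. A minimal nearest-rank sketch of that computation (the production pipeline uses Prometheus histograms, not this in-process helper; the sample values are invented):

```javascript
// Nearest-rank percentile: smallest sample such that at least p% of
// all samples are less than or equal to it.
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank - 1, 0)];
}

const latenciesMs = [18, 21, 22, 23, 23, 24, 26, 31, 44, 120]; // invented samples
percentile(latenciesMs, 50); // 23
percentile(latenciesMs, 99); // 120
```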

Comparison with Alternatives

Baseline: No Governance

Direct LLM call (no ABS): 0ms overhead
Risk: No audit, no policy enforcement, secrets exposed

Alternative: Simple Reverse Proxy (nginx)

Median overhead: ~8ms
Capabilities: Routing only (no policy, no audit)

Alternative: Application-Level Logging

Median overhead: ~15ms (database write)
Capabilities: Post-hoc audit only (no enforcement)

ABS Core

Median overhead: ~23ms
Capabilities: Real-time enforcement + audit + secrets + identity

Value proposition:
For ~8ms over basic logging, you get proactive enforcement instead of reactive forensics.


SLA Commitment (Enterprise)

Production SLA (paid deployments):

p95 latency:  < 50ms  (guaranteed)
p99 latency:  < 100ms (target)
Availability: 99.9%   (uptime)

If SLA violated:

  • Monthly service credit (10% per hour of breach)
  • Incident root cause analysis (48h)
  • Performance optimization plan

Verification

Production Evidence:

  • O-Bot Pilot (Banking Tier): 350,000+ requests governed
  • Raw Prometheus data: [Link to sanitized export]
  • Grafana dashboard: [Screenshot in evidence/]

Independent Validation:

  • Security audit (Q2 2026): Third-party performance testing
  • Load testing: k6 scripts available in /evidence/benchmarks/

Reproduce locally:

cd /evidence/benchmarks/
./run_local_benchmark.sh

Honest Limitations

When ABS Core overhead matters:

  • Ultra-low-latency requirements (<100ms total)
  • High-frequency trading systems
  • Real-time voice/video agents
  • Embedded systems (resource-constrained)

When ABS Core overhead is negligible:

  • Standard LLM workflows (>500ms base latency)
  • Enterprise automation (human-in-loop)
  • Regulated industries (compliance > speed)
  • Asynchronous agent systems

Questions?

Technical details: [email protected]
Performance concerns: [email protected]
Benchmark data access: https://github.com/abs-core/benchmarks

Changelog:

  • 2026-03-12: Corrected all documentation to reflect production measurements
  • 2026-03-01: Added p99.9 data from O-Bot Pilot (Banking Tier) deployment
  • 2026-02-15: Initial performance whitepaper
