ABS Core Performance Benchmarks (Verified)
Last Updated: March 12, 2026
Environment: Production deployment (O-Bot platform)
Methodology: Real-world measurements over 30 days, 350k+ requests
Executive Summary
ABS Core adds ~23ms of governance overhead (median) to AI agent tool calls.
For LLM operations averaging 600-2000ms, this represents roughly 1-4% overhead in exchange for:
- [OK] Cryptographic audit trail
- [OK] Policy-based enforcement
- [OK] Secret injection (JIT)
- [OK] Identity verification
Latency Breakdown (Production)
Engine Core (WASM)
Component: Policy Evaluation Engine (Rust/WASM)
Measurement: Hot path execution only
Median (p50): 1.2ms
p95: 1.8ms
p99: 2.4ms
Max observed: 4.1ms (cold start)
Throughput: ~12,500 evaluations/second (single core)

What this measures:
- Policy rule matching (deterministic)
- JSON parsing/validation
- Hash computation (SHA-256)
- Decision output (ALLOW/DENY/ESCALATE)
What this excludes:
- Network I/O
- Database writes (audit log)
- Secret vault lookup
- TLS handshake
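The hot path above can be sketched as a deterministic rule match plus a SHA-256 digest of the decision for the audit trail. This is a minimal illustration only, not the actual engine: the `PolicyRule` shape, the `evaluate` signature, and the deny-by-default behavior are assumptions.

```typescript
import { createHash } from "node:crypto";

type Verdict = "ALLOW" | "DENY" | "ESCALATE";

// Hypothetical rule shape: match on tool name, return a fixed verdict.
interface PolicyRule {
  tool: string;
  verdict: Verdict;
}

interface Decision {
  verdict: Verdict;
  hash: string; // SHA-256 over the canonical decision, for the audit trail
}

// Deterministic hot path: first matching rule wins; unknown tools are denied.
function evaluate(rules: PolicyRule[], toolName: string): Decision {
  const rule = rules.find((r) => r.tool === toolName);
  const verdict: Verdict = rule ? rule.verdict : "DENY";
  const hash = createHash("sha256")
    .update(JSON.stringify({ toolName, verdict }))
    .digest("hex");
  return { verdict, hash };
}
```

Because the match is pure string comparison plus one hash, latency in the low single-digit milliseconds per evaluation is plausible for a path like this.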
Complete Governance Loop (End-to-End)
Component: Full ABS Core Gateway (production)
Measurement: Request in → Decision out + audit persisted
Median (p50): 23ms
p95: 38ms
p99: 52ms
p99.9: 87ms (max observed, during a database latency spike)
Throughput: ~850 governed requests/second (production load)

Latency components:
1.2ms - WASM policy engine
3.5ms - Request parsing + validation
8.2ms - Audit log write (PostgreSQL)
4.8ms - Secret vault lookup (Cloudflare KV)
5.3ms - Network overhead (sidecar → gateway)
-------
~23ms Total (median)

Deployment Variations
Cloudflare Workers (Edge)
Median: 18ms
p95: 28ms
Benefit: Reduced network hops (edge-native)
Limitation: Cold starts (~40ms) on first request

Docker Sidecar (On-Premise)
Median: 25ms
p95: 42ms
Benefit: No cold starts
Limitation: Localhost network overhead

Kubernetes (Multi-Region)
Median: 30ms
p95: 55ms
Benefit: High availability
Limitation: Cross-pod communication

Context: LLM Call Comparison
Typical LLM API latencies (GPT-4, Claude 3):
OpenAI GPT-4 Turbo:
Median: ~1200ms (streaming start: ~400ms)
Anthropic Claude 3 Opus:
Median: ~1800ms (streaming start: ~600ms)
OpenAI GPT-4o-mini:
Median: ~600ms (streaming start: ~200ms)

ABS Core overhead as percentage:
GPT-4 Turbo: 23ms / 1200ms = 1.9% overhead
Claude 3 Opus: 23ms / 1800ms = 1.3% overhead
GPT-4o-mini: 23ms / 600ms = 3.8% overhead

Conclusion: Governance overhead is negligible compared to LLM latency.
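The percentages above follow directly from the median figures; as a quick sanity check:

```typescript
// Governance overhead as a fraction of total LLM call latency.
function overheadPct(governanceMs: number, llmMs: number): string {
  return ((governanceMs / llmMs) * 100).toFixed(1) + "%";
}

console.log(overheadPct(23, 1200)); // GPT-4 Turbo
console.log(overheadPct(23, 1800)); // Claude 3 Opus
console.log(overheadPct(23, 600));  // GPT-4o-mini
```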
Performance Optimization History
v1.0.0 (Initial Release)
- Median: 62ms
- p95: 98ms
- Bottleneck: Synchronous database writes
v1.1.0 (PostgreSQL Optimization)
- Median: 35ms (-43%)
- p95: 58ms (-41%)
- Fix: Async audit writes, connection pooling
v4.0.0 (Current - Edge Migration)
- Median: 23ms (-34%)
- p95: 38ms (-34%)
- Fix: Migrated to Cloudflare D1, optimized WASM binary size
Roadmap (Q2 2026 - Target):
- Median: <15ms
- p95: <25ms
- Planned: Native WASM deployment (remove JS shim), eBPF kernel hooks
How We Measure
Production Instrumentation
// Trace ID injected at gateway entry
const traceId = crypto.randomUUID();
const start = performance.now();
// Phase 1: Policy evaluation
const t1 = performance.now();
const decision = await policyEngine.evaluate(request);
const engineLatency = performance.now() - t1;
// Phase 2: Audit persistence
const t2 = performance.now();
await auditLog.write({ traceId, decision });
const auditLatency = performance.now() - t2;
// Phase 3: Secret injection (only when the decision is ALLOW)
const t3 = performance.now();
const secret = decision.allowed ? await vault.fetch(toolName) : null;
const vaultLatency = performance.now() - t3;
const totalLatency = performance.now() - start;
// Export to metrics endpoint
await telemetry.record({
traceId,
totalLatency,
engineLatency,
auditLatency,
vaultLatency
});

Metrics Aggregation
- Tool: Prometheus + Grafana
- Retention: 90 days rolling window
- Sampling: 100% of production requests (O-Bot deployment)
- Public Dashboard: https://metrics.abscore.app (coming Q2 2026)
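Offline, the same percentiles can be recomputed from raw latency samples. A minimal sketch using the nearest-rank method (which may differ slightly from Prometheus's histogram interpolation):

```typescript
// Nearest-rank percentile over raw latency samples (milliseconds).
function percentile(samples: number[], p: number): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// Illustrative samples, roughly matching the production distribution above.
const latencies = [18, 21, 22, 23, 24, 25, 31, 38, 45, 52];
console.log(percentile(latencies, 50));
console.log(percentile(latencies, 95));
```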
Comparison with Alternatives
Baseline: No Governance
Direct LLM call (no ABS): 0ms overhead
Risk: No audit, no policy enforcement, secrets exposed

Alternative: Simple Reverse Proxy (nginx)
Median overhead: ~8ms
Capabilities: Routing only (no policy, no audit)

Alternative: Application-Level Logging
Median overhead: ~15ms (database write)
Capabilities: Post-hoc audit only (no enforcement)

ABS Core
Median overhead: 23ms
Capabilities: Real-time enforcement + audit + secrets + identity

Value proposition:
For +8ms over basic application-level logging (23ms vs ~15ms), you get proactive enforcement instead of reactive forensics.
SLA Commitment (Enterprise)
Production SLA (paid deployments):
p95 latency: < 50ms (guaranteed)
p99 latency: < 100ms (target)
Availability: 99.9% (uptime)

If SLA violated:
- Monthly service credit (10% per hour of breach)
- Incident root cause analysis (48h)
- Performance optimization plan
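The credit terms above imply a simple accrual; as an illustration (the 100% cap and rounding down to whole hours are assumptions about how the clause is applied, not stated SLA terms):

```typescript
// Monthly service credit: 10% per full hour of SLA breach.
// Cap at 100% and flooring to whole hours are assumptions for illustration.
function serviceCreditPct(breachHours: number): number {
  return Math.min(100, Math.floor(breachHours) * 10);
}

console.log(serviceCreditPct(3));  // 3 hours of breach
console.log(serviceCreditPct(15)); // capped
```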
Verification
Production Evidence:
- O-Bot deployment: 350k+ requests over 30 days
- Raw Prometheus export: [Link to sanitized export]
- Grafana dashboard: [Screenshot in evidence/]
Independent Validation:
- Security audit (Q2 2026): Third-party performance testing
- Load testing: k6 scripts available in /evidence/benchmarks/
Reproduce locally:
cd /evidence/benchmarks/
./run_local_benchmark.sh

Honest Limitations
When ABS Core overhead matters:
- Ultra-low-latency requirements (<100ms total)
- High-frequency trading systems
- Real-time voice/video agents
- Embedded systems (resource-constrained)
When ABS Core overhead is negligible:
- Standard LLM workflows (>500ms base latency)
- Enterprise automation (human-in-loop)
- Regulated industries (compliance > speed)
- Asynchronous agent systems
Questions?
Technical details: [email protected]
Performance concerns: [email protected]
Benchmark data access: https://github.com/abs-core/benchmarks
Changelog:
- 2026-03-12: Corrected all documentation to reflect production measurements
- 2026-03-01: Added p99.9 data from O-Bot deployment
- 2026-02-15: Initial performance whitepaper