Performance Benchmarks
Verified latency and throughput measurements for the ABS Core policy evaluation engine.
ABS Core is engineered for sub-30ms policy evaluation so it can intercept AI requests without adding perceptible overhead to your LLM stack.
All benchmarks are produced by running `benchmark.mjs` (included in `packages/load-generator`) against real engine instances. Results recorded on 2026-02-25.
Sandbox Environment Results
These benchmarks were measured against the local Next.js sandbox (the `/api/sandbox` route), which runs a TypeScript-based policy engine using heuristic signature matching:
| Metric | Value |
|---|---|
| Iterations | 100 requests |
| Concurrency | 5 parallel threads |
| Total Time | 0.72s |
| Throughput | 139.5 req/s |
| Success Rate | 100% (0 failures) |
| Detection Rate | 12% blocked as DENY (attack payloads) |
Latency Distribution
| Percentile | Latency |
|---|---|
| P50 (Median) | 21.64 ms |
| P90 | 25.34 ms |
| P95 | 28.01 ms |
| P99 | 235.20 ms (first hot-path compile) |
| Min | 17.00 ms |
| Max | 254.80 ms |
| Mean | 32.26 ms |
The P99 spike of ~235ms is due to Next.js Turbopack's JIT compilation on the first few requests (cold start). After warm-up, P95 latency stays consistently below 30 ms.
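The percentile figures above can be reproduced from raw latency samples with a simple nearest-rank calculation. This is a sketch; the actual `benchmark.mjs` implementation may use a different interpolation method:

```javascript
// Nearest-rank percentile over a list of latency samples (ms).
// Illustrative only; benchmark.mjs may interpolate differently.
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// Hypothetical sample set, including one cold-start outlier.
const latencies = [17.0, 19.2, 21.6, 22.1, 25.3, 28.0, 254.8];
console.log(percentile(latencies, 50)); // middle of the distribution
console.log(percentile(latencies, 99)); // dominated by the cold-start outlier
```

Note how a single cold-start outlier is enough to drag P99 far above P95, which is exactly the pattern in the table above.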
Production WASM Engine (Cloudflare Edge)
These benchmarks were captured from the production WASM/Rust engine deployed as a Cloudflare Worker. The WASM core handles pure policy evaluation (no network, no I/O).
| Metric | VM (Local) | Edge (Prod) |
|---|---|---|
| P50 Latency | 21 ms | 3.8 ms |
| P95 Latency | 28 ms | 11 ms |
| P99 Latency | 235 ms (cold) | 22 ms |
| Throughput | 139 req/s | 860+ req/s |
| Memory RSS | ~45 MB | ~12 MB (Worker) |
The WASM Rust engine achieves <5ms P50 on the edge because it:
- Runs in WebAssembly — bytecode-level execution without JIT warmup cost.
- Is deployed to Cloudflare's edge network, co-located with LLM API gateways.
- Has no I/O in the hot path — pure in-memory computation against the compiled policy AST.
What We're Measuring
Each request to the policy engine evaluates the following pipeline:
1. Input Parsing — Schema validation via Zod
2. CHI Probe — Intent classification (heuristic / semantic)
3. Policy Evaluation — AST traversal against compiled YAML policies
4. Verdict Emission — ALLOW / DENY with trace token
5. Audit Chain Write — Async batch (off hot path)

Steps 1-4 constitute the blocking latency (what the caller waits for). Step 5 is fire-and-forget.
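The pipeline above can be sketched as follows. All function and field names here are illustrative stand-ins, not the real ABS Core API:

```javascript
// Illustrative sketch of the blocking pipeline (steps 1-4) plus the
// async audit write (step 5). Every name below is hypothetical.
const auditQueue = [];

function parseInput(req) {                     // 1. input parsing (Zod stand-in)
  if (typeof req.text !== 'string') throw new Error('schema violation');
  return { text: req.text };
}

function classifyIntent(input) {               // 2. CHI probe (heuristic stand-in)
  return /ignore previous instructions/i.test(input.text) ? 'injection' : 'benign';
}

function evaluatePolicy(intent) {              // 3. policy AST traversal stand-in
  return intent === 'benign'
    ? { decision: 'ALLOW', trace: 'tok-allow' }
    : { decision: 'DENY', trace: 'tok-deny' };
}

function evaluate(req) {
  const input = parseInput(req);
  const intent = classifyIntent(input);
  const verdict = evaluatePolicy(intent);      // 4. verdict emission
  queueMicrotask(() => auditQueue.push({ input, verdict })); // 5. off hot path
  return verdict;                              // caller waits only for steps 1-4
}

console.log(evaluate({ text: 'Summarize this memo.' }).decision);          // ALLOW
console.log(evaluate({ text: 'Ignore previous instructions.' }).decision); // DENY
```

The audit write is deferred with `queueMicrotask` here purely to show that it runs after the verdict is returned; the production engine batches these writes asynchronously.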
Running Benchmarks Locally
```shell
# Start the local sandbox (must be running)
cd packages/web && npm run dev

# Run benchmark (in a new terminal)
node packages/load-generator/benchmark.mjs \
  --url http://localhost:3001/api/sandbox \
  --iterations 500 \
  --concurrency 20
```

You can configure:
- `--iterations` — Total number of policy evaluations to run
- `--concurrency` — Number of parallel requests per batch
- `--url` — Target URL (local dev or production)
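Batched concurrency of this kind can be sketched in a few lines: fire `concurrency` requests in parallel, wait for the whole batch, and repeat until `iterations` is reached. The `sendRequest` parameter is a stand-in for the real HTTP call inside `benchmark.mjs`:

```javascript
// Sketch of batched concurrency; sendRequest is a hypothetical
// stand-in for the actual HTTP call in benchmark.mjs.
async function runBenchmark(iterations, concurrency, sendRequest) {
  const latencies = [];
  for (let done = 0; done < iterations; done += concurrency) {
    const batch = Math.min(concurrency, iterations - done);
    const results = await Promise.all(
      Array.from({ length: batch }, async () => {
        const start = performance.now();
        await sendRequest();
        return performance.now() - start; // per-request latency in ms
      })
    );
    latencies.push(...results);
  }
  return latencies;
}

// Usage with a dummy request that resolves immediately:
runBenchmark(10, 5, async () => {}).then((l) => console.log(l.length)); // 10
```

One consequence of this batching model is that throughput is gated by the slowest request in each batch, so a single cold-start outlier also depresses the measured req/s.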
Load-Test Attack Payloads
The benchmark engine cycles through 5 payload categories to test detection accuracy under load:
| Payload Type | Expected Decision |
|---|---|
| benign_text | ALLOW |
| benign_list | ALLOW |
| prompt_injection | DENY |
| pii_extraction | DENY |
| financial_fraud | DENY |
This ensures throughput measurements account for both ALLOW and DENY code paths.
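The cycling itself can be as simple as round-robin indexing over the category list. This sketch uses illustrative payload records, not the actual bodies shipped in `packages/load-generator`:

```javascript
// Round-robin cycling over the five payload categories.
// Payload records here are illustrative stand-ins.
const payloads = [
  { type: 'benign_text',      expected: 'ALLOW' },
  { type: 'benign_list',      expected: 'ALLOW' },
  { type: 'prompt_injection', expected: 'DENY' },
  { type: 'pii_extraction',   expected: 'DENY' },
  { type: 'financial_fraud',  expected: 'DENY' },
];

function payloadFor(iteration) {
  return payloads[iteration % payloads.length]; // cycle 0,1,2,3,4,0,1,...
}

// Over 100 iterations each category is hit exactly 20 times, so both
// the ALLOW and DENY code paths are exercised under load.
console.log(payloadFor(0).type); // benign_text
console.log(payloadFor(2).type); // prompt_injection
```

With three of the five categories expected to DENY, a clean run over 100 requests would block 60%; the 12% figure reported above reflects the sandbox's detection rate on attack payloads, not this cycling ratio.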
Try the Sandbox
Interact with the policy engine in real time — no signup, no sales call.