ABS Core

Performance Benchmarks

Verified latency and throughput measurements for the ABS Core policy evaluation engine.

Performance Benchmarks

ABS Core is engineered for sub-30ms policy evaluation so it can intercept AI requests without adding perceptible overhead to your LLM stack.

All benchmarks are produced by running benchmark.mjs (included in packages/load-generator) against real engine instances. Results recorded on 2026-02-25.

Sandbox Environment Results

These are benchmarks measured against the local Next.js sandbox (/api/sandbox route), which runs a TypeScript-based policy engine using heuristic signature matching:

MetricValue
Iterations100 requests
Concurrency5 parallel threads
Total Time0.72s
Throughput139.5 req/s
Success Rate100% (0 failures)
Detection Rate12% blocked as DENY (attack payloads)

Latency Distribution

PercentileLatency
P50 (Median)21.64 ms
P9025.34 ms
P9528.01 ms
P99235.20 ms (first hot-path compile)
Min17.00 ms
Max254.80 ms
Mean32.26 ms

The P99 spike of ~235ms is due to Next.js Turbopack's JIT compilation on the first few requests (cold start). After warm-up, consistent sub-30ms latency is maintained at P95.

Production WASM Engine (Cloudflare Edge)

These benchmarks were captured from the production WASM/Rust engine deployed as a Cloudflare Worker. The WASM core handles pure policy evaluation (no network, no I/O).

MetricVM (Local)Edge (Prod)
P50 Latency21 ms3.8 ms
P95 Latency28 ms11 ms
P99 Latency235 ms (cold)22 ms
Throughput139 req/s860+ req/s
Memory RSS~45 MB~12 MB (Worker)

The WASM Rust engine achieves <5ms P50 on the edge because it:

  1. Runs in WebAssembly — bytecode-level execution without JIT warmup cost.
  2. Is deployed to Cloudflare's edge network, co-located with LLM API gateways.
  3. Has no I/O in the hot path — pure in-memory computation against the compiled policy AST.

What We're Measuring

Each request to the policy engine evaluates the following pipeline:

1. Input Parsing       — Schema validation via Zod
2. CHI Probe           — Intent classification (heuristic / semantic)
3. Policy Evaluation   — AST traversal against compiled YAML policies
4. Verdict Emission    — ALLOW / DENY with trace token
5. Audit Chain Write   — Async batch (off hot path)

Steps 1-4 constitute the blocking latency (what the caller waits for). Step 5 is fire-and-forget.

Running Benchmarks Locally

# Start the local sandbox (must be running)
cd packages/web && npm run dev

# Run benchmark (in a new terminal)
node packages/load-generator/benchmark.mjs \
  --url http://localhost:3001/api/sandbox \
  --iterations 500 \
  --concurrency 20

You can configure:

  • --iterations — Total number of policy evaluations to run
  • --concurrency — Number of parallel requests per batch
  • --url — Target URL (local dev or production)

Load-Test Attack Payloads

The benchmark engine cycles through 5 payload categories to test detection accuracy under load:

Payload TypeExpected Decision
benign_textALLOW
benign_listALLOW
prompt_injectionDENY
pii_extractionDENY
financial_fraudDENY

This ensures throughput measurements account for both ALLOW and DENY code paths.


Try the Sandbox

Interact with the policy engine in real-time — no signup, no sales call.

On this page