skeg
benchmarks

slice B · efficiency frontier

recall vs latency, per engine

Slice A fixes the effort knob and compares engines at one operating point. Slice B does the opposite: it sweeps each engine's effort knob across its whole range, at N = 500K, and plots the recall/latency frontier. Where's the knee? How much latency do you pay for the last point of recall? Is the default knob value right?
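One way to put a number on "how much latency do you pay for the last point of recall" is the extra p99 latency per 0.001 of recall@10 gained between consecutive sweep points. The sketch below uses the skeg-int8 rows from the all-numbers table; the helper name `marginal_cost` and the "µs per 0.001 recall" unit are illustrative choices, not something this page defines.

```python
# (l_search, recall@10, p99 µs) for skeg-int8, copied from the all-numbers table.
SKEG_INT8 = [
    (50, 0.9949, 1443),
    (100, 0.9950, 1508),
    (150, 0.9974, 1797),
    (200, 0.9987, 2110),
    (300, 0.9994, 2865),
]


def marginal_cost(points):
    """Extra p99 µs paid per 0.001 recall gained between consecutive sweep points.

    Returns (knob_from, knob_to, cost) tuples. cost is None when recall did not
    improve, because the ratio is meaningless for a flat or negative step.
    """
    steps = []
    for (k0, r0, l0), (k1, r1, l1) in zip(points, points[1:]):
        gain = r1 - r0
        cost = (l1 - l0) / (gain * 1000) if gain > 0 else None
        steps.append((k0, k1, cost))
    return steps


if __name__ == "__main__":
    for k0, k1, cost in marginal_cost(SKEG_INT8):
        label = "n/a" if cost is None else f"{cost:.0f} µs per 0.001 recall"
        print(f"l_search {k0} -> {k1}: {label}")
```

On these rows the 200 → 300 step costs several times more per recall point than the 100 → 150 step, which is the kind of knee the frontier plot is meant to expose. Steps with tiny recall gains (50 → 100 here) are dominated by measurement noise and should be read loosely.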

data provenance

Where the numbers come from. Same source, same generator, same ground truth for every engine in the comparison.

corpus

Simple English Wikipedia, passages ≥ 500 chars truncated to ~400 chars. Public dump preprocessed once and frozen for the bench.
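A minimal sketch of the filter-and-truncate step, assuming passages arrive as plain strings. The page only says "~400 chars"; backing up to the last word boundary is an assumption made here to explain the "~", and `prepare_passages` is a hypothetical helper name.

```python
def prepare_passages(passages, min_chars=500, max_chars=400):
    """Keep passages of at least min_chars and cut each to at most max_chars.

    Assumption: the cut backs up to the previous space so a passage does not
    end mid-word. The source does not say how the "~400" cut is made.
    """
    kept = []
    for text in passages:
        if len(text) < min_chars:
            continue  # too short to be part of the corpus
        cut = text[:max_chars]
        # Only back up when the hard cut landed inside a word.
        if len(text) > max_chars and not text[max_chars].isspace():
            head, sep, _ = cut.rpartition(" ")
            if sep:
                cut = head
        kept.append(cut.rstrip())
    return kept
```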

embedder

mxbai-embed-large-v1, 1024 dimensions. sentence-transformers via Apple Metal (MPS). Same model used to embed corpus and queries (no cross-model leakage).

queries

1 000 hold-out passages (same set as slice A) re-used at this scale.

ground truth

Top-100 nearest neighbours computed with exact brute-force cosine over float32 vectors. Computed once per scale, reused by every engine, frozen as a parquet next to the corpus.
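The ground-truth step can be sketched with NumPy as below. This is an illustration of exact brute-force cosine top-k over float32 vectors, not the bench's actual script; the function name, the batch size, and the assumption that no vector is all zeros are choices made here.

```python
import numpy as np


def exact_top_k(corpus, queries, k=100, batch=256):
    """Exact cosine top-k ids for each query, best match first.

    Assumes every row of corpus and queries is a non-zero vector.
    """
    corpus = np.asarray(corpus, dtype=np.float32)
    queries = np.asarray(queries, dtype=np.float32)
    # After L2-normalising both sides, a dot product equals cosine similarity.
    corpus_n = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    queries_n = queries / np.linalg.norm(queries, axis=1, keepdims=True)

    k = min(k, corpus_n.shape[0])
    out = np.empty((queries_n.shape[0], k), dtype=np.int64)
    # Process queries in batches so the similarity matrix stays small.
    for start in range(0, queries_n.shape[0], batch):
        sims = queries_n[start:start + batch] @ corpus_n.T
        # argpartition selects the k best per row without a full sort...
        part = np.argpartition(-sims, k - 1, axis=1)[:, :k]
        # ...then only those k candidates are sorted by similarity.
        part_sims = np.take_along_axis(sims, part, axis=1)
        order = np.argsort(-part_sims, axis=1)
        out[start:start + batch] = np.take_along_axis(part, order, axis=1)
    return out
```

Because the queries are hold-out passages, no self-match needs to be excluded from the results.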

recall @ 10 vs p99 latency

Each dot is a (knob-value, recall, latency) point. The line connects the dots in sweep order, tracing one engine's frontier. Lower-right is better: high recall at low latency. The exact knob value for each dot is in the all-numbers table; the steepness of the line shows how aggressive the recall/latency trade is on that engine.
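The recall coordinate of each dot can be scored against the frozen ground truth roughly as follows. This assumes the common definition of recall@k (overlap between the engine's top-k and the exact top-k, averaged over queries); the page does not show its scoring code, and `recall_at_k` is an illustrative name.

```python
def recall_at_k(retrieved, truth, k=10):
    """Mean fraction of the exact top-k that the engine returned in its top-k.

    retrieved and truth are parallel lists: one ranked id list per query.
    Assumes every ground-truth list holds at least k ids.
    """
    hits = 0
    for got, want in zip(retrieved, truth):
        hits += len(set(got[:k]) & set(want[:k]))
    return hits / (k * len(truth))
```

With a top-100 ground truth on disk, the same function covers both table columns: `k=10` for recall@10 and `k=100` for recall@100.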

QPS as the knob opens

Throughput at single-client concurrency as the effort knob grows. Each engine has a different knob (skeg l_search 50–800, qdrant ef 32–512, qdrant-pq adds nprobes), so the lines start at different X positions: that is each knob's valid range, not a render glitch. Compare lines vertically at the same X for a fair "at this effort level" view, and use the frontier above for the apples-to-apples recall-vs-latency picture.
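A single-client measurement loop might look like the sketch below. `search` is a hypothetical callable standing in for one engine query, and the nearest-rank percentile is an assumption; the page does not say which percentile method or timer the bench uses.

```python
import math
import time


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an ascending list (assumed method)."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def run_single_client(search, queries):
    """Issue queries one at a time and report p50/p99 latency (µs) and QPS."""
    latencies_us = []
    started = time.perf_counter()
    for query in queries:
        t0 = time.perf_counter()
        search(query)  # hypothetical engine call; result is ignored here
        latencies_us.append((time.perf_counter() - t0) * 1e6)
    elapsed = time.perf_counter() - started
    latencies_us.sort()
    return {
        "p50_us": percentile(latencies_us, 50),
        "p99_us": percentile(latencies_us, 99),
        "qps": len(queries) / elapsed,
    }
```

With one client and no overlap between requests, QPS is roughly 1e6 divided by the mean latency in µs. The table is consistent with that: skeg-int8 at l_search 50 reports 1265 qps, which implies a mean near 790 µs against a p50 of 754 µs.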

all numbers


engine       scale  knob      value  recall@10  recall@100  p50 (µs)  p99 (µs)   qps  rss (MiB)
chroma-hnsw  500k   ef           32     0.9788      0.9145      3859      6565   255     2272.9
chroma-hnsw  500k   ef           64     0.9805      0.9153      3871      6609   255     2273.9
chroma-hnsw  500k   ef          128     0.9854      0.9382      4346      7365   227     2275.9
chroma-hnsw  500k   ef          256     0.9958      0.9770      6478     11113   155     2287.3
chroma-hnsw  500k   ef          512     0.9992      0.9925     10046     15256   102     2292.9
qdrant-hnsw  500k   ef           32     0.9846      0.9321      2651      3588   370     2347.9
qdrant-hnsw  500k   ef           64     0.9822      0.9245      2580      3297   384     2331.8
qdrant-hnsw  500k   ef          128     0.9888      0.9480      2712      3533   367     2334.0
qdrant-hnsw  500k   ef          256     0.9963      0.9809      3454      4717   287     2369.0
qdrant-hnsw  500k   ef          512     0.9995      0.9939      4700      6482   212     2376.0
qdrant-pq    500k   ef           32     0.7770      0.8257      2285      3103   424     2380.0
qdrant-pq    500k   ef           64     0.7817      0.8324      2252      2867   435     2346.8
qdrant-pq    500k   ef          128     0.7813      0.8406      2494      3183   397     2386.0
qdrant-pq    500k   ef          256     0.7830      0.8421      2855      3730   346     2392.9
qdrant-pq    500k   ef          512     0.7815      0.8426      3507      4704   282     2379.8
qdrant-sq    500k   ef           32     0.9503      0.9242      1997      3240   484     2677.1
qdrant-sq    500k   ef           64     0.9506      0.9317      2011      3058   487     2885.0
qdrant-sq    500k   ef          128     0.9547      0.9547      2191      3066   447     2363.6
qdrant-sq    500k   ef          256     0.9557      0.9679      2629      3513   372     2779.7
qdrant-sq    500k   ef          512     0.9613      0.9703      3031      4269   326     2789.5
skeg-int8    500k   l_search     50     0.9949      0.9652       754      1443  1265      630.7
skeg-int8    500k   l_search    100     0.9950      0.9652       737      1508  1309      623.3
skeg-int8    500k   l_search    150     0.9974      0.9881       994      1797   984      629.0
skeg-int8    500k   l_search    200     0.9987      0.9927      1188      2110   825      632.1
skeg-int8    500k   l_search    300     0.9994      0.9963      1643      2865   597      636.3
skeg-pq128   500k   l_search     50     0.9909      0.7294       880      1357  1113      219.7
skeg-pq128   500k   l_search    100     0.9910      0.7291       892      1376  1100      229.1
skeg-pq128   500k   l_search    200     0.9979      0.9202      1456      2284   685      221.2
skeg-pq128   500k   l_search    300     0.9994      0.9683      1829      3008   540      217.7
skeg-pq128   500k   l_search    500     0.9999      0.9861      2620      4438   379      218.7
skeg-pq128   500k   l_search    800     0.9999      0.9867      3543      5771   279      224.9
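
To check whether a default knob value is right for a given recall target, the rows above can be filtered for the lowest-p99 setting per engine that still meets the target. A small sketch follows, using three engines' rows copied from the table; `cheapest_at_recall` is an illustrative helper, not part of the bench.

```python
# (engine, knob value, recall@10, p99 µs), copied from the table above.
ROWS = [
    ("qdrant-hnsw", 32, 0.9846, 3588),
    ("qdrant-hnsw", 64, 0.9822, 3297),
    ("qdrant-hnsw", 128, 0.9888, 3533),
    ("qdrant-hnsw", 256, 0.9963, 4717),
    ("qdrant-hnsw", 512, 0.9995, 6482),
    ("skeg-int8", 50, 0.9949, 1443),
    ("skeg-int8", 100, 0.9950, 1508),
    ("skeg-int8", 150, 0.9974, 1797),
    ("skeg-int8", 200, 0.9987, 2110),
    ("skeg-int8", 300, 0.9994, 2865),
    ("skeg-pq128", 50, 0.9909, 1357),
    ("skeg-pq128", 100, 0.9910, 1376),
    ("skeg-pq128", 200, 0.9979, 2284),
    ("skeg-pq128", 300, 0.9994, 3008),
    ("skeg-pq128", 500, 0.9999, 4438),
    ("skeg-pq128", 800, 0.9999, 5771),
]


def cheapest_at_recall(rows, target):
    """Per engine, the (knob, recall, p99) row with the lowest p99 whose
    recall@10 is at least target. Engines that never reach it are omitted."""
    best = {}
    for engine, knob, recall, p99 in rows:
        if recall < target:
            continue
        if engine not in best or p99 < best[engine][2]:
            best[engine] = (knob, recall, p99)
    return best


if __name__ == "__main__":
    for engine, (knob, recall, p99) in cheapest_at_recall(ROWS, 0.999).items():
        print(f"{engine}: knob={knob} recall@10={recall} p99={p99} µs")
```

Filtering on p99 rather than on knob value matters because recall is not strictly monotonic in the knob: qdrant-hnsw at ef 64 scores slightly below ef 32 in this run.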