skeg
benchmarks

slice C · concurrency under load

does the engine scale with concurrent clients?

Same scale as slice A's smallest point (N = 100K), same effort knob, now with a sweep of 1 / 4 / 16 / 64 concurrent clients. Where does throughput saturate? How does p99 latency degrade as the queue depth grows? Each engine is single-process here; this is the single-machine ceiling, not a cluster benchmark.

data provenance

Where the numbers come from. Same source, same generator, same ground truth for every engine in the comparison.

corpus

Simple English Wikipedia, passages ≥ 500 chars truncated to ~400 chars. Public dump preprocessed once and frozen for the bench.

embedder

mxbai-embed-large-v1, 1024 dimensions. sentence-transformers via Apple Metal (MPS). Same model used to embed corpus and queries (no cross-model leakage).

queries

1 000 hold-out passages (same set as slice A), each issued concurrently by 1, 4, 16, or 64 simulated clients.
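A concurrency sweep of this kind can be sketched with a thread pool. This is a minimal sketch, not the harness used for these numbers: `run_level`, `percentile`, and `fake_search` are hypothetical names, and it assumes the query set is shared across a pool of `clients` workers, each keeping one request in flight at a time.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor

def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

def run_level(search, queries, clients):
    """Push every query through `clients` workers; return (latencies in µs, QPS)."""
    def timed(query):
        t0 = time.perf_counter()
        search(query)  # hypothetical client call into the engine under test
        return (time.perf_counter() - t0) * 1e6

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=clients) as pool:
        latencies = list(pool.map(timed, queries))
    elapsed = time.perf_counter() - start
    return latencies, len(queries) / elapsed

def fake_search(query):
    """Stand-in for a real engine call (assumption: the real call is network I/O)."""
    return sum(query)

if __name__ == "__main__":
    queries = [[0.1, 0.2, 0.3]] * 1000
    for clients in (1, 4, 16, 64):
        lat, qps = run_level(fake_search, queries, clients)
        print(clients, round(qps), percentile(lat, 50), percentile(lat, 99))
```

Timing starts when a worker picks the query up, so the latency covers the engine's own queueing but not time spent waiting in the client-side pool.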

ground truth

Top-100 nearest neighbours computed with exact brute-force cosine over float32 vectors. Computed once per scale, reused by every engine, frozen as a parquet next to the corpus.
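The ground-truth step amounts to scoring every corpus vector against the query and keeping the best k. The sketch below shows the idea in pure Python on toy vectors; `normalize` and `exact_top_k` are illustrative names, and at N = 100K with 1024 dimensions a vectorized library would be the practical choice.

```python
import heapq
import math

def normalize(vec):
    """Scale a vector to unit length so that a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]

def exact_top_k(corpus, query, k=100):
    """Return indices of the k corpus vectors most cosine-similar to query, best first."""
    q = normalize(query)
    scored = (
        (sum(a * b for a, b in zip(normalize(vec), q)), idx)
        for idx, vec in enumerate(corpus)
    )
    # Highest similarity first; ties go to the lower index so the result is reproducible.
    best = heapq.nsmallest(k, scored, key=lambda s: (-s[0], s[1]))
    return [idx for _, idx in best]

if __name__ == "__main__":
    corpus = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
    print(exact_top_k(corpus, [1.0, 0.1], k=2))
```

Because nothing is approximated, the result depends only on the vectors, which is what lets one frozen file serve every engine.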

throughput as clients pile on

QPS as concurrency grows from 1 to 64. The plateau is the single-machine ceiling for that engine · adding more clients past that point only deepens the queue. Where each engine's line flattens tells you how much concurrency the single shard can usefully absorb.
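One way to read the saturation point off the measurements is to find where each engine's QPS peaks and how much of that peak survives at 64 clients. The sketch below uses the QPS figures from the "all numbers" table; the helper names `peak` and `retained_at_max` are illustrative.

```python
# QPS by concurrency level, copied from the "all numbers" table.
QPS = {
    "chroma-hnsw": {1: 245, 4: 319, 16: 301, 64: 296},
    "qdrant-hnsw": {1: 358, 4: 623, 16: 536, 64: 261},
    "qdrant-pq": {1: 370, 4: 656, 16: 511, 64: 268},
    "qdrant-sq": {1: 448, 4: 696, 16: 528, 64: 277},
    "skeg-int8": {1: 601, 4: 644, 16: 658, 64: 642},
    "skeg-pq128": {1: 584, 4: 624, 16: 588, 64: 640},
}

def peak(levels):
    """Return (concurrency, qps) at the highest measured throughput."""
    return max(levels.items(), key=lambda kv: kv[1])

def retained_at_max(levels):
    """Fraction of peak QPS still delivered at the highest concurrency level."""
    top = max(levels)
    return levels[top] / peak(levels)[1]

if __name__ == "__main__":
    for engine, levels in QPS.items():
        clients, qps = peak(levels)
        share = retained_at_max(levels)
        print(f"{engine:12s} peak {qps} qps at {clients} clients, {share:.0%} of peak at {max(levels)}")
```

On the table's figures, all three qdrant configurations peak at 4 clients and retain roughly 40% of that peak at 64, while both skeg configurations stay within a few percent of their peak.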

p99 latency under contention

Log scale. Past the saturation point above, p99 grows roughly linearly with concurrency (more clients = longer queue). The absolute number depends on protocol overhead; what matters is the shape · a sharp inflection would mean contention on a shared resource, not orderly queueing.
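The "more clients = longer queue" relation can be checked with Little's law: with a fixed number of requests in flight, mean latency is roughly that number divided by throughput. The sketch below assumes a closed loop where each client keeps exactly one request outstanding (the page does not state this) and compares the implied mean against the measured p50 for skeg-int8 from the table. A mean and a median are not the same statistic, so only rough agreement should be expected.

```python
def implied_latency_us(clients, qps):
    """Little's law for a closed loop: mean time in system = requests in flight / throughput."""
    return clients / qps * 1_000_000

# (clients, qps, measured p50 in µs) for skeg-int8, copied from the "all numbers" table.
SKEG_INT8 = [(1, 601, 1651), (4, 644, 6169), (16, 658, 23885), (64, 642, 93924)]

if __name__ == "__main__":
    for clients, qps, p50 in SKEG_INT8:
        implied = implied_latency_us(clients, qps)
        print(f"{clients:3d} clients: implied mean {implied:8.0f} µs, measured p50 {p50} µs")
```

Once QPS is flat, the numerator is the only term that moves, which is why latency past saturation scales with the client count.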

recall stays put

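Recall@10 across concurrency levels. A flat line is the desired outcome: throughput pressure should not bend the recall curve. All engines deliver on this · in the table below the widest spread for any engine is about 0.013 (qdrant-pq, 0.8273 to 0.8405), and the skeg configurations move by 0.0005 or less. The search settings do not change between levels; the only thing changing is how fast results come out.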
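For reference, recall@k for one query can be computed as the share of the true top-k that the engine also returned in its top-k. This is a sketch under that common definition, which the page does not spell out; `recall_at_k` and `mean_recall` are illustrative names.

```python
def recall_at_k(retrieved, truth, k=10):
    """Fraction of the true top-k ids that appear in the engine's top-k ids."""
    expected = set(truth[:k])
    if not expected:
        raise ValueError("ground truth is empty")
    return len(expected.intersection(retrieved[:k])) / len(expected)

def mean_recall(all_retrieved, all_truth, k=10):
    """Average recall@k over a batch of queries (lists aligned by query)."""
    if len(all_retrieved) != len(all_truth):
        raise ValueError("retrieved and truth must cover the same queries")
    scores = [recall_at_k(r, t, k) for r, t in zip(all_retrieved, all_truth)]
    return sum(scores) / len(scores)

if __name__ == "__main__":
    truth = [[1, 2, 3, 4], [5, 6, 7, 8]]
    retrieved = [[1, 2, 3, 9], [8, 7, 6, 5]]
    print(mean_recall(retrieved, truth, k=4))
```

Order within the top-k does not affect the score, so an engine that returns the right ten ids in a different order still scores 1.0.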

all numbers


engine       scale  knob      value  concurrency  recall@10  p50 (µs)  p99 (µs)  qps  rss (MiB)
chroma-hnsw  100k   ef        128    1            0.9883     3989      5227      245  848.0
chroma-hnsw  100k   ef        128    4            0.9875     12161     14185     319  856.8
chroma-hnsw  100k   ef        128    16           0.9888     48206     54322     301  763.3
chroma-hnsw  100k   ef        128    64           0.9880     103159    195039    296  706.3
qdrant-hnsw  100k   ef        128    1            0.9963     2740      3186      358  908.5
qdrant-hnsw  100k   ef        128    4            0.9960     5595      9653      623  897.9
qdrant-hnsw  100k   ef        128    16           0.9912     18910     41414     536  776.9
qdrant-hnsw  100k   ef        128    64           0.9940     25075     143002    261  906.8
qdrant-pq    100k   ef        128    1            0.8380     2521      3340      370  646.2
qdrant-pq    100k   ef        128    4            0.8405     5215      10574     656  651.0
qdrant-pq    100k   ef        128    16           0.8273     19220     40293     511  676.4
qdrant-pq    100k   ef        128    64           0.8402     20882     68766     268  662.8
qdrant-sq    100k   ef        128    1            0.9657     2178      2727      448  1020.9
qdrant-sq    100k   ef        128    4            0.9640     4994      10014     696  1014.0
qdrant-sq    100k   ef        128    16           0.9660     17284     42184     528  1006.6
qdrant-sq    100k   ef        128    64           0.9647     15742     56861     277  1029.3
skeg-int8    100k   l_search  300    1            0.9998     1651      2307      601  130.8
skeg-int8    100k   l_search  300    4            0.9998     6169      7497      644  131.1
skeg-int8    100k   l_search  300    16           0.9995     23885     26145     658  132.0
skeg-int8    100k   l_search  300    64           1.0000     93924     103949    642  133.8
skeg-pq128   100k   l_search  300    1            0.9995     1681      2378      584  59.0
skeg-pq128   100k   l_search  300    4            0.9995     6309      7542      624  55.2
skeg-pq128   100k   l_search  300    16           0.9998     26462     33311     588  57.7
skeg-pq128   100k   l_search  300    64           0.9998     96594     100199    640  58.5

methodology in one minute