skeg
benchmarks

slice A · competitive matrix

six engines, three scales, four metrics

How does each engine scale on the same corpus as the dataset grows? Slice A fixes the effort knob (skeg l_search=300, qdrant ef=128, chroma ef=128), runs each engine three times at 100K, 500K, and 1M vectors, and reports the median of the three runs. Every engine uses the same corpus: Simple English Wikipedia passages, embedded with mxbai-embed-large-v1 (1024 dimensions).
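The "three runs, report the median" step could be sketched as follows. The record layout (`engine`, `scale`, and a metric key) and the function name `median_by_config` are assumptions made for illustration, not the bench harness's actual format.

```python
from collections import defaultdict
from statistics import median

def median_by_config(runs, metric):
    """Group repeated runs by (engine, scale) and keep the median of one metric."""
    grouped = defaultdict(list)
    for run in runs:
        grouped[(run["engine"], run["scale"])].append(run[metric])
    return {key: median(values) for key, values in grouped.items()}
```

With three runs per configuration the median is simply the middle value, so a single slow or fast outlier run does not move the reported number.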

data provenance

Where the numbers come from. Same source, same generator, same ground truth for every engine in the comparison.

corpus

Simple English Wikipedia, passages ≥ 500 chars truncated to ~400 chars. Public dump preprocessed once and frozen for the bench.

embedder

mxbai-embed-large-v1, 1024 dimensions. sentence-transformers via Apple Metal (MPS). Same model used to embed corpus and queries (no cross-model leakage).

queries

1 000 hold-out passages, disjoint from corpus

ground truth

Top-100 nearest neighbours computed with exact brute-force cosine over float32 vectors. Computed once per scale, reused by every engine, frozen as a parquet next to the corpus.
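As an illustration of what "exact brute-force cosine" means, here is a dependency-free sketch at toy scale. It is not the bench's own code: a real run over 1M vectors would use a vectorised library, and the names `normalize` and `exact_top_k` are hypothetical. It assumes no vector is all zeros.

```python
import heapq
import math

def normalize(v):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def exact_top_k(corpus, query, k=100):
    """Return ids of the k corpus vectors most cosine-similar to query, best first."""
    q = normalize(query)
    scored = (
        (sum(a * b for a, b in zip(normalize(v), q)), i)
        for i, v in enumerate(corpus)
    )
    # nlargest scores every vector exactly once: exhaustive, so the result is exact.
    return [i for _, i in heapq.nlargest(k, scored)]
```

Because every corpus vector is scored, there is no approximation to tune, which is what makes the result usable as ground truth for all engines.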

recall @ 10

Of the engine's top-10 nearest neighbours, how many appear in the brute-force exact top-10? Higher is better. Above 0.99 the engines are functionally identical; below 0.95 you're starting to miss material.

recall @ 100

A tougher recall test: of the engine's top-100, how many are in the exact top-100? This punishes engines whose quantization is too aggressive (qdrant-pq in particular bleeds quality here).
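Both recall metrics above reduce to a set overlap between the engine's result list and the brute-force list. A minimal sketch, assuming each list holds vector ids ordered best first (the function names are illustrative):

```python
def recall_at_k(retrieved, truth, k):
    """Fraction of the exact top-k that also appears in the engine's top-k."""
    hits = len(set(retrieved[:k]) & set(truth[:k]))
    return hits / k

def mean_recall(all_retrieved, all_truth, k):
    """Average recall@k over a query set (one result list per query)."""
    scores = [recall_at_k(r, t, k) for r, t in zip(all_retrieved, all_truth)]
    return sum(scores) / len(scores)
```

Order inside the top-k does not matter for this metric: an engine that returns the right ten neighbours in the wrong order still scores 1.0 at k=10.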

disk vs RAM · how much of the index is actually loaded

Every engine stores the same dataset on disk (mxbai 1024d, ~4 GB at 1M). The question is how much of that ends up resident in RAM during queries. HNSW backends (qdrant, chroma) keep the full index in RAM by design, so their RSS tracks disk size closely. skeg is SSD-primary: PQ-128 codes are 32× smaller than f32 vectors, only hot pages of the Vamana graph are kept warm, and the OS evicts the rest. At 1M, skeg-pq128 holds about 10% of the index in RAM (419 MiB of 4,162 MiB) while qdrant-hnsw holds 96% (4,147 MiB of 4,340 MiB). Same dataset, comparable recall@10 (0.9990 vs 0.9922), and roughly a tenth of the memory.
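The ratios in this paragraph can be checked by hand. The sketch below assumes that "PQ-128" means 128 one-byte codes per vector, which is the reading that yields the stated 32×; the RSS and disk figures are the 1M rows of the results table.

```python
DIM = 1024            # embedding dimensions
F32_BYTES = 4         # bytes per float32 component
PQ_CODE_BYTES = 128   # assumption: PQ-128 stores 128 one-byte codes per vector

raw_bytes = DIM * F32_BYTES               # 4096 bytes per uncompressed vector
compression = raw_bytes / PQ_CODE_BYTES   # 32.0

def resident_fraction(rss_mib, disk_mib):
    """Share of the on-disk index that is resident in RAM."""
    return rss_mib / disk_mib

skeg_pq128 = resident_fraction(419.0, 4161.8)     # about 0.10
qdrant_hnsw = resident_fraction(4146.8, 4340.0)   # about 0.96
```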

RSS at steady state

Resident memory of the engine process during the query loop. On a personal-AI machine the LLM owns most of the RAM · the vector store should not contend for it. Lower is better, and the gap matters because each MiB the store doesn't use is one the LLM can keep warm.

query latency p99

99th-percentile single-query latency at concurrency=1. Includes protocol overhead, not just search; see slice B for the recall/latency frontier per engine.
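One way to obtain such a figure is to time each query end to end from the client and take a percentile of the samples. This is a sketch under assumptions: `search` stands for any client call that sends one query and waits for the reply, and the nearest-rank percentile is one common definition, not necessarily the one the bench uses.

```python
import math
import time

def measure_latencies_us(search, queries):
    """Time each call from the client side, so protocol overhead is included."""
    samples = []
    for q in queries:
        start = time.perf_counter()
        search(q)
        samples.append((time.perf_counter() - start) * 1e6)
    return samples

def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]
```

With 1 000 queries, p99 under this definition is the 990th smallest sample, so it reflects the ten slowest queries rather than a single outlier.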

throughput at concurrency = 1

Sustained QPS in a tight single-client loop. Higher is better. skeg's binary protocol gives it a transport edge here; for fairness, treat these numbers as the single-machine ceiling · see also slice C for how each engine scales under concurrent load.
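A tight single-client loop can be sketched as below. `search` is again a placeholder for one blocking client call, and both function names are illustrative rather than part of any engine's API.

```python
import time

def sustained_qps(search, queries):
    """Issue queries back to back from one client and return queries per second."""
    start = time.perf_counter()
    for q in queries:
        search(q)
    elapsed = time.perf_counter() - start
    return len(queries) / elapsed

def qps_from_mean_latency(mean_latency_us):
    """With one client and no pipelining, throughput is bounded by 1 / mean latency."""
    return 1_000_000 / mean_latency_us
```

This is why QPS and latency move together in the table at concurrency 1: for example, skeg-int8 at 100k-minilm has a p50 of 1236 µs, and 1 000 000 / 1236 is about 809, close to the reported 806 QPS.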

build time

Time to ingest the corpus and build the index. One-shot cost, paid once per dataset. Less interesting in production (you do it at prep-time) but useful for reproducibility.

all numbers


| engine | scale | n vectors | recall@10 | recall@100 | p50 (µs) | p99 (µs) | QPS | RSS (MiB) | build (s) | disk (MiB) |
|---|---|---|---|---|---|---|---|---|---|---|
| chroma-hnsw | 100k-minilm | 100,000 | 0.9765 | 0.9247 | 2736 | 3622 | 354 | 501.8 | 25.8 | 178.5 |
| qdrant-hnsw | 100k-minilm | 100,000 | 0.9855 | 0.9666 | 2006 | 2627 | 483 | 536.9 | 26.6 | 625.0 |
| qdrant-pq | 100k-minilm | 100,000 | 0.7945 | 0.8279 | 2155 | 3218 | 449 | 389.2 | 37.3 | 511.0 |
| qdrant-sq | 100k-minilm | 100,000 | 0.9580 | 0.9481 | 1421 | 1937 | 677 | 561.2 | 25.7 | 596.2 |
| skeg-int8 | 100k-minilm | 100,000 | 0.9975 | 0.9950 | 1236 | 1600 | 806 | 73.4 | 32.1 | 172.0 |
| skeg-pq128 | 100k-minilm | 100,000 | 0.9970 | 0.9960 | 1842 | 2418 | 542 | 50.4 | 31.6 | 172.0 |
| chroma-hnsw | 100k | 100,000 | 0.9900 | 0.9594 | 3897 | 4928 | 253 | 734.6 | 41.8 | 425.1 |
| qdrant-hnsw | 100k | 100,000 | 0.9950 | 0.9820 | 2631 | 3275 | 370 | 877.8 | 63.4 | 942.5 |
| qdrant-pq | 100k | 100,000 | 0.8395 | 0.8722 | 2523 | 3103 | 385 | 647.9 | 95.0 | 757.6 |
| qdrant-sq | 100k | 100,000 | 0.9630 | 0.9677 | 2147 | 2684 | 455 | 1004.4 | 61.8 | 1074.6 |
| skeg-int8 | 100k | 100,000 | 1.0000 | 0.9973 | 1688 | 2344 | 590 | 130.8 | 51.9 | 416.2 |
| skeg-pq128 | 100k | 100,000 | 0.9995 | 0.9833 | 1681 | 2362 | 586 | 62.9 | 52.2 | 416.2 |
| chroma-hnsw | 500k | 500,000 | 0.9857 | 0.9379 | 4408 | 7438 | 226 | 2388.4 | 353.5 | 2109.4 |
| qdrant-hnsw | 500k | 500,000 | 0.9880 | 0.9475 | 2890 | 3686 | 343 | 2337.1 | 324.8 | 2351.8 |
| qdrant-pq | 500k | 500,000 | 0.7812 | 0.8413 | 2501 | 3172 | 395 | 2405.1 | 347.8 | 2453.8 |
| qdrant-sq | 500k | 500,000 | 0.9547 | 0.9497 | 2221 | 3175 | 432 | 3013.7 | 309.6 | 3064.8 |
| skeg-int8 | 500k | 500,000 | 0.9994 | 0.9963 | 1724 | 2963 | 572 | 635.2 | 364.1 | 2080.9 |
| skeg-pq128 | 500k | 500,000 | 0.9994 | 0.9683 | 1906 | 3060 | 519 | 227.6 | 365.0 | 2080.9 |
| chroma-hnsw | 1m | 1,000,000 | 0.9831 | 0.9278 | 5576 | 19581 | 160 | 3345.4 | 1976.8 | 4214.8 |
| qdrant-hnsw | 1m | 1,000,000 | 0.9922 | 0.9697 | 3737 | 43817 | 159 | 4146.8 | 649.2 | 4340.0 |
| qdrant-pq | 1m | 1,000,000 | 0.7755 | 0.8272 | 2648 | 4267 | 367 | 2920.9 | 700.6 | 4569.7 |
| qdrant-sq | 1m | 1,000,000 | 0.9521 | 0.9516 | 2411 | 5276 | 390 | 3125.4 | 627.6 | 5290.5 |
| skeg-int8 | 1m | 1,000,000 | 0.9989 | 0.9944 | 4535 | 15612 | 193 | 1254.6 | 891.7 | 4161.8 |
| skeg-pq128 | 1m | 1,000,000 | 0.9990 | 0.9631 | 2001 | 3624 | 471 | 419.0 | 903.8 | 4161.8 |
