slice A · competitive matrix
six engines, three scales, four metrics
How does each engine scale on the same corpus as the dataset grows? Slice
A fixes the effort knob (skeg l_search=300, qdrant
ef=128, chroma ef=128), runs each engine three
times at 100K, 500K, and 1M vectors, and reports the median. Every engine
uses the same corpus: Simple English Wikipedia passages,
embedded with mxbai-embed-large-v1 (1024 dimensions).
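The "three runs, report the median" step can be sketched as follows. This is a minimal illustration, not the bench harness: the function name `median_of_runs`, the metric keys, and the sample values are all hypothetical, and it assumes the median is taken independently per metric, which the text does not specify.

```python
from statistics import median

def median_of_runs(runs, metrics):
    """Collapse repeated runs of one (engine, scale) cell into per-metric medians."""
    return {m: median(run[m] for run in runs) for m in metrics}

# Hypothetical measurements for one engine at one scale (not taken from the bench).
runs = [
    {"p99_us": 2400, "qps": 540, "rss_mib": 50.1},
    {"p99_us": 2418, "qps": 542, "rss_mib": 50.4},
    {"p99_us": 2990, "qps": 519, "rss_mib": 50.9},
]

summary = median_of_runs(runs, ["p99_us", "qps", "rss_mib"])
print(summary)
```

With per-metric medians, the reported values need not come from the same run: in this toy data the median p99 is from the second run and the median QPS from the first.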
data provenance
Where the numbers come from. Same source, same generator, same ground truth for every engine in the comparison.
Simple English Wikipedia, passages ≥ 500 chars truncated to ~400 chars. Public dump preprocessed once and frozen for the bench.
mxbai-embed-large-v1, 1024 dimensions. sentence-transformers via Apple Metal (MPS). Same model used to embed corpus and queries (no cross-model leakage).
1 000 hold-out passages used as queries, disjoint from the corpus.
Top-100 nearest neighbours computed with exact brute-force cosine over float32 vectors. Computed once per scale, reused by every engine, frozen as a parquet next to the corpus.
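For readers who want to see what "exact brute-force cosine" means in practice, here is a small sketch. It is pure Python for readability and only suitable for toy inputs; a run at 1M × 1024d would normally be vectorised. The helper names `normalize` and `exact_top_k` are illustrative assumptions, not part of the bench code.

```python
import heapq
import math

def normalize(v):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def exact_top_k(query, unit_corpus, k):
    """Indices of the k corpus vectors most cosine-similar to query.

    unit_corpus must already be normalised; every vector is scored, so the
    result is exact rather than approximate.
    """
    q = normalize(query)
    scored = (
        (sum(a * b for a, b in zip(q, vec)), idx)
        for idx, vec in enumerate(unit_corpus)
    )
    return [idx for _, idx in heapq.nlargest(k, scored)]

# Toy 2-d corpus (hypothetical data).
corpus = [normalize(v) for v in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0])]
top2 = exact_top_k([1.0, 0.1], corpus, k=2)
```

Because the ground truth is computed once per scale and frozen, its cost (linear in corpus size per query) is paid once and not charged to any engine.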
recall @ 10
Of the engine's top-10 nearest neighbours, how many appear in the brute-force exact top-10? Higher is better. Above 0.99 the engines are functionally identical; below 0.95 you're starting to miss material.
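A minimal sketch of the recall@k definition above, assuming the reported figure is the mean over all queries (the text does not say how per-query values are aggregated). `recall_at_k` and `mean_recall` are illustrative names.

```python
def recall_at_k(retrieved, exact, k):
    """Fraction of the exact top-k ids that also appear in the engine's top-k."""
    return len(set(retrieved[:k]) & set(exact[:k])) / k

def mean_recall(all_retrieved, all_exact, k):
    """Average recall@k over a batch of queries (assumed aggregation)."""
    pairs = list(zip(all_retrieved, all_exact))
    return sum(recall_at_k(r, e, k) for r, e in pairs) / len(pairs)

# Toy example: the engine found 2 of the 4 true neighbours for one query
# and all 4 for another.
engine_results = [[1, 2, 3, 4], [9, 8, 7, 6]]
ground_truth = [[1, 2, 5, 6], [6, 7, 8, 9]]
score = mean_recall(engine_results, ground_truth, k=4)
```

The same function covers recall@100 by passing `k=100`; order within the top-k does not affect the score.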
recall @ 100
A tougher recall test: of the engine's top-100, how many appear in the exact brute-force top-100? This punishes engines whose quantization is too aggressive (qdrant-pq in particular bleeds quality here).
disk vs RAM · how much of the index is actually loaded
Same dataset on disk for everyone (mxbai 1024d, ~4 GB at 1M). The question is: how much of that ends up resident in RAM during queries? HNSW backends (qdrant, chroma) keep the full index in RAM by design: their RSS tracks disk size almost 1:1. skeg is SSD-primary: PQ-128 codes are 32× smaller than f32 vectors, only hot pages of the Vamana graph are kept warm, and the OS evicts the rest. At 1M, skeg-pq128 holds about 10% of the index in RAM while qdrant-hnsw holds 96%. Same dataset, same recall, ten times less memory budget consumed.
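The figures in this paragraph can be checked with a few lines of arithmetic. The RSS and disk values are the 1M rows from the results table; the one assumption is that "PQ-128" means 128 sub-quantizer codes of one byte each per vector.

```python
DIM = 1024
F32_BYTES = 4          # bytes per float32 component
PQ_CODE_BYTES = 128    # assumption: 128 sub-quantizers x 1 byte each

# Size ratio between a raw f32 vector and its PQ code.
compression = (DIM * F32_BYTES) / PQ_CODE_BYTES

def resident_fraction(rss_mib, disk_mib):
    """Share of the on-disk index that is resident in RAM."""
    return rss_mib / disk_mib

# 1M-vector rows from the results table: (rss MiB, disk MiB).
skeg_pq128 = resident_fraction(419.0, 4161.8)
qdrant_hnsw = resident_fraction(4146.8, 4340.0)
rss_ratio = 4146.8 / 419.0

print(f"compression {compression:.0f}x")
print(f"skeg-pq128 resident {skeg_pq128:.0%}, qdrant-hnsw resident {qdrant_hnsw:.0%}")
print(f"qdrant-hnsw uses {rss_ratio:.1f}x the RSS of skeg-pq128")
```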
RSS at steady state
Resident memory of the engine process during the query loop. On a personal-AI machine the LLM owns most of the RAM · the vector store should not contend for it. Lower is better, and the gap matters because each MiB the store doesn't use is one the LLM can keep warm.
query latency p99
99th-percentile single-query latency at concurrency=1. Includes protocol overhead, not just search; see slice B for the recall/latency frontier per engine.
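One way to collect such a number is sketched below. The timing wraps the whole client call, so protocol overhead is included, as described above. The percentile uses the nearest-rank method, which is an assumption: the text does not state which estimator the bench uses. `search` stands in for any engine client call and is hypothetical.

```python
import math
import time

def measure_latencies_us(search, queries):
    """Time each query sequentially (concurrency = 1); returns microseconds."""
    latencies = []
    for q in queries:
        start = time.perf_counter()
        search(q)  # full round trip: serialisation, transport, search
        latencies.append((time.perf_counter() - start) * 1e6)
    return latencies

def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

# Stand-in search function so the sketch runs without any engine installed.
latencies = measure_latencies_us(lambda q: sum(range(1000)), range(200))
p50, p99 = percentile(latencies, 50), percentile(latencies, 99)
```

With 1 000 queries per run, the p99 is determined by roughly the ten slowest queries, so it is far more sensitive to page faults and GC pauses than the p50.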
throughput at concurrency = 1
Sustained QPS in a tight single-client loop. Higher is better. skeg's binary protocol gives it a transport edge here; for fairness, treat these numbers as the single-machine ceiling · see also slice C for how each engine scales under concurrent load.
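At concurrency 1 a tight loop issues one query at a time, so QPS is approximately the reciprocal of the mean per-query latency (ignoring client-side time between requests). The sketch below uses that relation to compare the implied mean with the p50 from the 1M rows of the results table; `implied_mean_latency_us` is an illustrative name.

```python
def implied_mean_latency_us(qps):
    """Mean per-query latency implied by single-client throughput."""
    return 1_000_000 / qps

# (p50 µs, qps) at 1M vectors, copied from the results table.
rows = {
    "skeg-pq128": (2001, 471),
    "qdrant-hnsw": (3737, 159),
}

for engine, (p50_us, qps) in rows.items():
    mean_us = implied_mean_latency_us(qps)
    print(f"{engine}: p50 {p50_us} µs, implied mean {mean_us:.0f} µs, "
          f"mean/p50 {mean_us / p50_us:.2f}")
```

A mean well above the median points to a heavy latency tail, which matches the large p99 that qdrant-hnsw shows at 1M.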
build time
Time to ingest the corpus and build the index. One-shot cost, paid once per dataset. Less interesting in production (you do it at prep-time) but useful for reproducibility.
all numbers
| engine | scale | n vectors | recall@10 | recall@100 | p50 µs | p99 µs | qps | rss MiB | build s | disk MiB |
|---|---|---|---|---|---|---|---|---|---|---|
| chroma-hnsw | 100k-minilm | 100,000 | 0.9765 | 0.9247 | 2736 | 3622 | 354 | 501.8 | 25.8 | 178.5 |
| qdrant-hnsw | 100k-minilm | 100,000 | 0.9855 | 0.9666 | 2006 | 2627 | 483 | 536.9 | 26.6 | 625.0 |
| qdrant-pq | 100k-minilm | 100,000 | 0.7945 | 0.8279 | 2155 | 3218 | 449 | 389.2 | 37.3 | 511.0 |
| qdrant-sq | 100k-minilm | 100,000 | 0.9580 | 0.9481 | 1421 | 1937 | 677 | 561.2 | 25.7 | 596.2 |
| skeg-int8 | 100k-minilm | 100,000 | 0.9975 | 0.9950 | 1236 | 1600 | 806 | 73.4 | 32.1 | 172.0 |
| skeg-pq128 | 100k-minilm | 100,000 | 0.9970 | 0.9960 | 1842 | 2418 | 542 | 50.4 | 31.6 | 172.0 |
| chroma-hnsw | 100k | 100,000 | 0.9900 | 0.9594 | 3897 | 4928 | 253 | 734.6 | 41.8 | 425.1 |
| qdrant-hnsw | 100k | 100,000 | 0.9950 | 0.9820 | 2631 | 3275 | 370 | 877.8 | 63.4 | 942.5 |
| qdrant-pq | 100k | 100,000 | 0.8395 | 0.8722 | 2523 | 3103 | 385 | 647.9 | 95.0 | 757.6 |
| qdrant-sq | 100k | 100,000 | 0.9630 | 0.9677 | 2147 | 2684 | 455 | 1004.4 | 61.8 | 1074.6 |
| skeg-int8 | 100k | 100,000 | 1.0000 | 0.9973 | 1688 | 2344 | 590 | 130.8 | 51.9 | 416.2 |
| skeg-pq128 | 100k | 100,000 | 0.9995 | 0.9833 | 1681 | 2362 | 586 | 62.9 | 52.2 | 416.2 |
| chroma-hnsw | 500k | 500,000 | 0.9857 | 0.9379 | 4408 | 7438 | 226 | 2388.4 | 353.5 | 2109.4 |
| qdrant-hnsw | 500k | 500,000 | 0.9880 | 0.9475 | 2890 | 3686 | 343 | 2337.1 | 324.8 | 2351.8 |
| qdrant-pq | 500k | 500,000 | 0.7812 | 0.8413 | 2501 | 3172 | 395 | 2405.1 | 347.8 | 2453.8 |
| qdrant-sq | 500k | 500,000 | 0.9547 | 0.9497 | 2221 | 3175 | 432 | 3013.7 | 309.6 | 3064.8 |
| skeg-int8 | 500k | 500,000 | 0.9994 | 0.9963 | 1724 | 2963 | 572 | 635.2 | 364.1 | 2080.9 |
| skeg-pq128 | 500k | 500,000 | 0.9994 | 0.9683 | 1906 | 3060 | 519 | 227.6 | 365.0 | 2080.9 |
| chroma-hnsw | 1m | 1,000,000 | 0.9831 | 0.9278 | 5576 | 19581 | 160 | 3345.4 | 1976.8 | 4214.8 |
| qdrant-hnsw | 1m | 1,000,000 | 0.9922 | 0.9697 | 3737 | 43817 | 159 | 4146.8 | 649.2 | 4340.0 |
| qdrant-pq | 1m | 1,000,000 | 0.7755 | 0.8272 | 2648 | 4267 | 367 | 2920.9 | 700.6 | 4569.7 |
| qdrant-sq | 1m | 1,000,000 | 0.9521 | 0.9516 | 2411 | 5276 | 390 | 3125.4 | 627.6 | 5290.5 |
| skeg-int8 | 1m | 1,000,000 | 0.9989 | 0.9944 | 4535 | 15612 | 193 | 1254.6 | 891.7 | 4161.8 |
| skeg-pq128 | 1m | 1,000,000 | 0.9990 | 0.9631 | 2001 | 3624 | 471 | 419.0 | 903.8 | 4161.8 |
more
- Dataset: Wikipedia Simple English, passages ≥500 chars, first ~400 chars per chunk. mxbai-embed-large-v1 via sentence-transformers MPS.
- Engines & knobs: skeg-pq128 / skeg-int8 at l_search=300; qdrant-hnsw / qdrant-sq at ef=128; qdrant-pq at ef=128 nprobes=64; chroma-hnsw at default ef=128.
- LanceDB: excluded from the public charts due to a known config issue that produced badly degraded recall in this run (~0.68). The tier-mode and disk-layout knobs need a careful rerun before LanceDB numbers can be compared fairly.
- Sweep: 3 runs × (engine × scale), median reported. Concurrency 1.
- Replication: the same matrix was also run with the smaller minilm-l6-v2 embedder (384d); results are consistent · skeg holds the RAM advantage at lower dimensionality too.