benchmarks

frozen methodology · open data

Evaluations of skeg against established vector stores and inside itself. Single-machine personal-AI hardware (Apple M1 Pro · 16 GiB). Each slice has a pre-registered methodology that was decided before the run; configurations are not re-tuned after results come in. Every chart below is rendered from the same parquet/CSV that drives the harness.

where skeg lands

skeg is the memory-frugal vector store for personal-AI machines. At 500K vectors with the same corpus, skeg-pq128 uses ~228 MiB with recall@10 = 0.9994. The HNSW competitors (Qdrant, Chroma) sit around 2 300–3 000 MiB for the same recall: a 10× memory gap.

The gap comes from architecture, not tuning: skeg's working set lives on SSD (PQ-128 codes + Vamana graph), only hot pages are kept in RAM, and the rest can be evicted by the OS as needed. On a 16 GiB laptop where a 3B LLM owns 2.2 GiB and macOS itself takes several more, this is the difference between "the laptop is breathing" and "everything starts to swap".

What skeg is not: not the lowest-latency single-query engine (Qdrant is comparable), not horizontally scalable out of the box (single-shard ceiling ~640 QPS), not as battle-tested in production as Qdrant. Where it wins is the regime it was built for: laptops, DGX Spark, edge boxes · anywhere RAM is contested.

slice A data live

competitive

Recall + RSS comparison at scale

Six engines, one corpus, three scales (100K → 1M). Each engine in its default production-like configuration. One row per (engine, scale) on the same metrics: recall@10, build time, steady RSS, query p99.

slice B data live

efficiency

Recall/latency frontier

Per-engine sweep of the query-time effort knob (l_search for skeg, ef for HNSW backends, nprobes for PQ variants). Traces the Pareto frontier at 500K and exposes each engine's "knee" · where buying more recall starts costing too much latency.

slice C data live

concurrency

How each engine scales with concurrent clients

Same scale (100K), concurrency from 1 to 64. All single-machine engines saturate around the same throughput (~640 QPS); past that point you need multi-process, not more cores. Recall stays flat under load · the search is deterministic.

slice D data live

co-residence

Vector store + LLM on the same machine

The realistic case: a 3B local LLM (Llama 3.2 Q4_K_M) is answering RAG questions while a vector store serves retrievals. Sweep 10K → 1M corpus. The killer metric is backend RSS · skeg stays <80 MiB at 1M, qdrant peaks above 3 GiB.

slice E data live

internals

skeg tier menu, low-ram mode, cold-start

skeg-internal sweep: int8 vs pq:128 vs turboquant-{1,2,4}. Same engine, different compression tiers · the recall/RAM/disk tradeoff inside the system. Plus low-ram mode (M0 prototype) and cold-start RSS (first-query memory cost).

0.5 release data live

0.5 release

The 0.5 story: tq2 default, multi-tenant wedge, cost

The full 0.5 release narrative. tq2 becomes the default tier (recall-neutral, leaner than int8 across 100d-784d), the high-density multi-tenant wedge (per-tenant isolation, 0 leaks, ~200 MB for 5 tenants), survival under a 256 MB container cap where qdrant OOMs, the 10x cost gap at 50M vectors, single-tenant scaling tiers, and the new tiering cold-start bulkhead.

about these benchmarks

Frozen methodology: configurations, metrics, and pass criteria are decided before the run. No re-tuning after results.
Real datasets: Simple English Wikipedia for slices A/B/C, enwiki chunked for slice D. Embedded with mxbai-embed-large-v1 (1024d) or all-minilm-l6-v2 (384d) where noted.
Native, not Docker: every engine runs as a native macOS process so RSS comparisons are apples-to-apples (Docker on macOS hides the engine inside a Linux VM and gives misleading RSS).
Median over runs: each (engine, configuration) cell is 2–3 runs minimum, median tabulated. Outliers visible in the per-turn parquet.
Excluded: LanceDB was in the original matrix but its run hit a known config issue producing degraded recall (~0.68). The numbers exist in the raw CSV but are not shown here until a clean re-run lands.