skeg
benchmarks

frozen methodology · open data

Evaluations of skeg against established vector stores and across its own internal configurations. All runs use single-machine personal-AI hardware (Apple M1 Pro · 16 GiB). Each slice follows a pre-registered methodology, fixed before the run; configurations are not re-tuned after results come in. Every chart below is rendered from the same parquet/CSV that drives the harness.

slice A data live

competitive

Recall + RSS comparison at scale

Six engines, one corpus, three scales (100K → 1M). Each engine runs in its default production-like configuration. One row per (engine, scale), reporting the same metrics: recall@10, build time, steady RSS, query p99.
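For readers unfamiliar with the headline metric, here is a minimal sketch of how recall@10 is commonly computed. It assumes exact (brute-force) nearest neighbours are available as ground truth; the function name and input layout are illustrative and not taken from the skeg harness.

```python
def recall_at_k(retrieved, ground_truth, k=10):
    """Mean fraction of the true top-k neighbours present in the returned top-k.

    retrieved / ground_truth: one list of ids per query (hypothetical layout).
    """
    if not retrieved or len(retrieved) != len(ground_truth):
        raise ValueError("need one non-empty result list per query")
    total = 0.0
    for got, truth in zip(retrieved, ground_truth):
        true_ids = set(truth[:k])
        if not true_ids:
            raise ValueError("ground truth must not be empty")
        # Order inside the top-k does not matter, only membership.
        total += len(true_ids & set(got[:k])) / len(true_ids)
    return total / len(retrieved)


# Toy example with k=2: first query is perfect, second finds 1 of 2.
retrieved = [[1, 2], [5, 9]]
ground_truth = [[1, 2], [5, 6]]
score = recall_at_k(retrieved, ground_truth, k=2)
print(score)
```

Averaging per query, rather than pooling all hits, keeps every query equally weighted regardless of how many neighbours it has.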

slice B data live

efficiency

Recall/latency frontier

Per-engine sweep of the query-time effort knob (l_search for skeg, ef for HNSW backends, nprobes for PQ variants). Traces the Pareto frontier at 500K and exposes each engine's "knee": the point where buying more recall starts costing too much latency.
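As a rough illustration of what tracing the frontier means, the sketch below filters a sweep down to its non-dominated points: a setting is dropped if some other setting is at least as fast with strictly higher recall. The tuple layout and the sample numbers are made up for the example and are not benchmark results.

```python
def pareto_frontier(points):
    """Keep the non-dominated (effort, latency_ms, recall) points.

    A point survives only if no point with lower-or-equal latency
    has recall at least as high.
    """
    # Fastest first; among equal latencies, the highest recall first.
    ordered = sorted(points, key=lambda p: (p[1], -p[2]))
    frontier = []
    best_recall = float("-inf")
    for point in ordered:
        if point[2] > best_recall:
            frontier.append(point)
            best_recall = point[2]
    return frontier


# Hypothetical sweep: (effort knob value, latency in ms, recall@10).
sweep = [
    (16, 1.0, 0.80),
    (32, 1.5, 0.90),
    (64, 2.5, 0.89),   # slower than effort=32 and lower recall: dominated
    (128, 4.0, 0.97),
]
frontier = pareto_frontier(sweep)
print([effort for effort, _, _ in frontier])
```

The "knee" can then be read off the frontier as the point after which each extra unit of recall costs markedly more latency; the exact threshold is a judgment call and is not defined here.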

slice C data live

concurrency

How each engine scales with concurrent clients

One scale (100K), concurrency from 1 to 64 clients. All single-machine engines saturate around the same throughput (~640 QPS); past that point you need multi-process, not more cores. Recall stays flat under load because the search is deterministic.
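A concurrency sweep of this kind can be sketched with a closed-loop load generator, where each client issues its next query only after the previous one returns. This is an assumption about the client model, which the page does not describe; `run_closed_loop` and `fake_query` are hypothetical names, and the stub stands in for a real engine call.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_closed_loop(query_fn, clients, queries_per_client):
    """Run `clients` threads, each issuing queries back to back.

    Returns (throughput in queries per second, list of per-query latencies in seconds).
    """
    def worker(_client_id):
        latencies = []
        for _ in range(queries_per_client):
            t0 = time.perf_counter()
            query_fn()
            latencies.append(time.perf_counter() - t0)
        return latencies

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=clients) as pool:
        per_client = list(pool.map(worker, range(clients)))
    elapsed = time.perf_counter() - start

    latencies = [x for client in per_client for x in client]
    return len(latencies) / elapsed, latencies


def fake_query():
    # Stand-in for a real search request; sleeps about 1 ms.
    time.sleep(0.001)


# Sweep a couple of concurrency levels with the stub backend.
results = {c: run_closed_loop(fake_query, c, 20) for c in (1, 4)}
for clients, (qps, lat) in results.items():
    print(clients, round(qps), len(lat))
```

With a real backend, throughput flattening as the client count rises is the saturation point the slice reports.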

slice D data live

co-residence

Vector store + LLM on the same machine

The realistic case: a 3B local LLM (Llama 3.2 Q4_K_M) answers RAG questions while a vector store serves retrievals on the same machine. Corpus sweep from 10K to 1M. The key metric is backend RSS: skeg stays under 80 MiB at 1M, while qdrant peaks above 3 GiB.

slice E data live

internals

skeg tier menu, low-ram mode, cold-start

skeg-internal sweep: int8 vs pq:128 vs turboquant-{1,2,4}. Same engine, different compression tiers: the recall/RAM/disk tradeoff inside the system. Also covers low-ram mode (M0 prototype) and cold-start RSS (first-query memory cost).
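To make the compression tradeoff concrete, here is a generic sketch of symmetric int8 scalar quantization: each float32 component (4 bytes) is stored as a single signed byte plus one shared scale per vector, at the cost of a bounded rounding error. This is a textbook scheme used for illustration only; how skeg's int8 tier is actually implemented is not described on this page.

```python
def quantize_int8(vector):
    """Map floats to integer codes in [-127, 127] with one scale per vector."""
    # Fall back to a scale of 1.0 for an all-zero vector to avoid dividing by zero.
    scale = max(abs(x) for x in vector) / 127.0 or 1.0
    codes = [max(-127, min(127, round(x / scale))) for x in vector]
    return codes, scale


def dequantize_int8(codes, scale):
    """Approximate reconstruction of the original floats."""
    return [c * scale for c in codes]


vector = [0.5, -1.0, 0.25, 0.0]
codes, scale = quantize_int8(vector)
restored = dequantize_int8(codes, scale)

# Each component is off by at most half a quantization step.
max_error = max(abs(a - b) for a, b in zip(vector, restored))
print(codes, max_error <= scale / 2 + 1e-12)
```

The rounding error is what shows up as lost recall in a sweep like this one, while the 4x smaller footprint shows up as lower RAM and disk use.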

about these benchmarks