The technology, in depth
CRBRL is a compression-native, disk-persistent vector database. The compression is not a flag you switch on after the fact: the index is built compressed, and search runs directly on it. This page walks through how that works — the quantization, the codec layer, the tiering, and the deployment surface — at the level of detail an engineer would want before adopting it.
8× fewer bytes per vector · ≈0.98 cosine fidelity · zero calibration.
When the first vector databases appeared, embeddings were small and datasets were tiny, so the obvious representation was the one the model already emitted: float32 — 4 bytes per dimension. A 1,536-dimension vector is 6,144 bytes. That choice was reasonable then. It was also never revisited.
At production scale, that representation dominates the bill. Storage and the I/O around it account for 55–80% of a typical vector-database cost. Every record stored, every replica kept for availability, every snapshot held for recovery — all of it carries the full float32 footprint. The line that grows fastest is the line nobody chose deliberately.
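Each live copy and each snapshot carries the same per-vector footprint, so the compression ratio applies to every one of them. Below is a minimal Python sketch of that arithmetic. `fleet_bytes`, the copy counts, and the treatment of snapshots as full copies are assumptions for illustration. The compressed case uses 0.5 bytes per dimension, which is what "8× fewer bytes" than float32 works out to.

```python
# Hypothetical helper: total stored bytes for a vector fleet when every live
# copy (primary plus replicas) and every snapshot carries the full footprint.

def fleet_bytes(num_vectors: int, dims: int, bytes_per_dim: float,
                live_copies: int = 1, snapshots: int = 0) -> float:
    """Bytes across all live copies and snapshots (snapshots assumed full copies)."""
    per_vector = dims * bytes_per_dim
    return num_vectors * per_vector * (live_copies + snapshots)

FLOAT32_BYTES_PER_DIM = 4.0
COMPRESSED_BYTES_PER_DIM = 0.5  # assumed: 8x fewer bytes, i.e. 4 bits per dimension

if __name__ == "__main__":
    n, d = 100_000_000, 1536
    for label, bpd in (("float32", FLOAT32_BYTES_PER_DIM),
                       ("compressed", COMPRESSED_BYTES_PER_DIM)):
        total = fleet_bytes(n, d, bpd, live_copies=3, snapshots=2)
        print(f"{label:>10}: {total / 1e9:,.1f} GB")
```

Assume 100M vectors at 1,536 dimensions, three live copies, and two snapshots. The sketch then gives 3,072 GB at float32 against 384 GB compressed.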
The questions that follow are simple, and they are the ones CRBRL is built to answer: how few bytes can a vector occupy without changing the answers a search returns, and can the search run on those bytes directly — without ever expanding them back to float32 in the query path?
Most engines that offer compression treat it as a storage layer bolted under an architecture designed around full-precision vectors. The data is shrunk at rest, but every query has to expand it back to float32 before distances can be computed. That decompress step sits on the hot path of every search, and it is why bolt-on compression was rarely adopted where retrieval latency matters.
CRBRL is built the other way around. The index is compressed from the first vector loaded, and the distance computations operate in the compressed domain. Nothing is expanded back to float32 to answer a query. Because compression is the native representation rather than an add-on, the footprint reduction and the search both hold at the same time — which is the combination bolt-on designs never reached.
// 1,536 dims × 4 bytes = 6,144 bytes at float32 → 768 bytes with CRBRL (4 bits per dimension, 8× smaller). The same ratio carries through to a fleet: 100M vectors at 1,536 dims is 614 GB at float32, 77 GB with CRBRL.
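To make "operate in the compressed domain" concrete, here is a minimal Python sketch with the simplest possible codec: one sign bit per dimension. It is an illustration only, not necessarily a codec CRBRL ships. The stored code is a packed integer, and the distance is an XOR plus a popcount on those bits, so no float vector is ever rebuilt. `encode_signs`, `hamming`, and `approx_cosine` are hypothetical names. The angle estimate assumes the vectors were randomly rotated first; in that case each sign bit differs with probability θ/π, where θ is the angle between the originals.

```python
import math

def encode_signs(vec: list[float]) -> int:
    """Pack one bit per dimension: bit i is set when vec[i] >= 0."""
    code = 0
    for i, x in enumerate(vec):
        if x >= 0:
            code |= 1 << i
    return code

def hamming(a: int, b: int) -> int:
    """Count dimensions whose sign bits differ, directly on the packed codes."""
    return bin(a ^ b).count("1")

def approx_cosine(a: int, b: int, dims: int) -> float:
    """For randomly rotated vectors the expected Hamming distance is
    dims * angle / pi, so cos(pi * hamming / dims) estimates the cosine."""
    return math.cos(math.pi * hamming(a, b) / dims)

if __name__ == "__main__":
    q = encode_signs([0.5, -1.0, 2.0, -0.1])
    doc = encode_signs([0.4, -0.9, 1.5, -0.3])
    print(hamming(q, doc), approx_cosine(q, doc, 4))
```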
The compression rests on TurboQuant, a published, peer-reviewed quantization method (arXiv:2504.19874). Quantization replaces high-precision numbers with a small set of discrete levels, so each dimension needs fewer bits. The risk is always the same: naive quantization concentrates error wherever a few dimensions carry most of the magnitude, and that error distorts the distances a search depends on.
TurboQuant's intuition is to remove that concentration before it can do damage. A randomized rotation is applied to each vector, which redistributes its energy roughly evenly across all dimensions; quantization then acts on a signal where no single dimension dominates. Spreading the information first means the quantization error spreads too: instead of piling up in a few coordinates, it averages out. The result is quantization that operates near the information-theoretic limit for how few bits a vector can carry while preserving its geometry. The paper proves its distortion is within a small constant factor of that lower bound.
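The energy-spreading step is easiest to see on a worst-case input: a vector whose entire norm sits in one coordinate. This Python sketch uses a randomized Hadamard transform (random sign flips followed by an orthonormal Walsh-Hadamard transform) as a fast stand-in for the random rotation. It is a common substitute, not necessarily the rotation TurboQuant or CRBRL uses. `random_rotation` and `peak_energy_share` are hypothetical names, and the dimension must be a power of two.

```python
import math
import random

def fwht(v: list[float]) -> list[float]:
    """Orthonormal fast Walsh-Hadamard transform; len(v) must be a power of two."""
    n = len(v)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(v)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for j in range(start, start + h):
                x, y = out[j], out[j + h]
                out[j], out[j + h] = x + y, x - y
        h *= 2
    scale = 1.0 / math.sqrt(n)  # makes the transform orthonormal (norm-preserving)
    return [x * scale for x in out]

def random_rotation(dims: int, seed: int):
    """Randomized Hadamard rotation: fixed random sign flips, then FWHT.
    The signs come from the seed, not from any sample of the data."""
    rng = random.Random(seed)
    signs = [rng.choice((-1.0, 1.0)) for _ in range(dims)]

    def rotate(v: list[float]) -> list[float]:
        return fwht([s * x for s, x in zip(signs, v)])

    return rotate

def peak_energy_share(v: list[float]) -> float:
    """Fraction of the squared norm carried by the largest coordinate."""
    sq = [x * x for x in v]
    return max(sq) / sum(sq)

if __name__ == "__main__":
    rotate = random_rotation(8, seed=7)
    spiky = [0.0] * 8
    spiky[3] = 1.0
    print(peak_energy_share(spiky), peak_energy_share(rotate(spiky)))
```

After rotation, the one-hot vector's largest coordinate holds 1/8 of the energy across 8 dimensions rather than all of it. The norm is unchanged because the transform is orthonormal.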
Two properties matter in practice. First, the fidelity: retrieval under TurboQuant holds at ≈0.98 cosine against full precision — close enough that which neighbours come back, and in what order, is effectively unchanged. Second, there is no training and no calibration: the rotation and the quantization are defined by the method, not learned from your distribution, so CRBRL works correctly on the first vector loaded, with no warm-up pass over a sample set.
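Here is what "no calibration" can look like in practice. The quantizer's range and step below are computed from the dimension alone, so the very first vector can be encoded without a fitting pass. The sketch also packs the 4-bit codes two per byte, which is where 768 bytes for a 1,536-dimension vector comes from. A few things are assumptions for illustration: the uniform grid clipped at ±4 standard deviations, and the names `grid`, `encode`, and `decode`. TurboQuant's own per-coordinate quantizer is designed more carefully than a plain uniform grid. `decode` exists only to measure fidelity; it is not part of a query path. The test vector is sampled isotropically, standing in for a vector after random rotation.

```python
import math
import random

BITS = 4
LEVELS = 1 << BITS  # 16 levels per dimension

def grid(dims: int, clip_sigmas: float = 4.0):
    """Range and step derived from the dimension alone: after a random
    rotation, each coordinate of a unit vector has std close to 1/sqrt(dims).
    No sample of the data is needed to set these values."""
    lo = -clip_sigmas / math.sqrt(dims)
    step = -2.0 * lo / LEVELS
    return lo, step

def encode(v: list[float]) -> bytes:
    """Quantize each coordinate to a 4-bit level and pack two levels per byte."""
    lo, step = grid(len(v))
    codes = [min(LEVELS - 1, max(0, int((x - lo) // step))) for x in v]
    if len(codes) % 2:
        codes.append(0)  # pad odd dimensions to a whole byte
    return bytes((codes[i] << 4) | codes[i + 1] for i in range(0, len(codes), 2))

def decode(packed: bytes, dims: int) -> list[float]:
    """Reconstruct each coordinate at its cell centre (fidelity checks only)."""
    lo, step = grid(dims)
    codes = []
    for byte in packed:
        codes.extend((byte >> 4, byte & 0x0F))
    return [lo + (c + 0.5) * step for c in codes[:dims]]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))

if __name__ == "__main__":
    rng = random.Random(0)
    raw = [rng.gauss(0.0, 1.0) for _ in range(1536)]
    norm = math.sqrt(sum(x * x for x in raw))
    v = [x / norm for x in raw]  # isotropic, as after a random rotation
    packed = encode(v)
    print(len(packed), "bytes; cosine", round(cosine(v, decode(packed, 1536)), 4))
```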
We state the result and cite the paper rather than reproduce its proofs or benchmark tables here. The claim is checkable at the source: arXiv:2504.19874.
Quantization is not one-size-fits-all. CRBRL exposes the codec as a selectable layer so an operator can pick the representation that suits the data and the latency budget, rather than inheriting whatever a single hard-coded scheme imposes.
TurboQuant. The randomized-rotation method above (arXiv:2504.19874): near information-theoretically optimal, zero-calibration, with fidelity that holds at ≈0.98 cosine. It is the default starting point for general embedding workloads where retrieval quality is the first concern.
An established binary / scalar quantization approach with a strong published track record. It offers a different point in the trade-off space — useful where a workload's distribution or latency profile favours its characteristics. Having it alongside TurboQuant gives robustness across the range of data CRBRL meets in practice.
The reason to ship both is honest engineering, not abundance for its own sake. Every codec sits somewhere in a three-way trade-off between fidelity (how faithfully distances survive), footprint (how few bytes per vector), and speed (how fast the compressed-domain distance computes). No single point on that surface is best for every dataset. Offering a selectable codec lets the operator move along the surface to match the workload — and gives CRBRL a fallback when one scheme suits a distribution better than the other. We describe the trade-off qualitatively here because the right balance is workload-dependent; the docs cover how to evaluate it on your own data.
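The docs cover evaluation on real data. As a rough illustration of the idea, this Python sketch measures two of the three axes for a few simple codecs on a sample: footprint in bytes per vector, and fidelity as mean cosine to the original. The `Codec` record, the sign and uniform-grid codecs, and `unit_sample` are hypothetical stand-ins, not CRBRL's codec API. Replace `unit_sample` with a sample of your own embeddings. Speed, the third axis, is not measured here.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple

Vector = List[float]

@dataclass
class Codec:
    """Hypothetical codec record: a name, a bit width, and an
    encode-then-reconstruct round trip used only to measure fidelity."""
    name: str
    bits_per_dim: int
    round_trip: Callable[[Vector], Vector]

    def bytes_per_vector(self, dims: int) -> int:
        return math.ceil(dims * self.bits_per_dim / 8)

def sign_round_trip(v: Vector) -> Vector:
    """1 bit per dimension: keep only each coordinate's sign."""
    return [1.0 if x >= 0 else -1.0 for x in v]

def uniform_round_trip(bits: int) -> Callable[[Vector], Vector]:
    """Uniform grid of 2**bits levels over +/-4 std of a unit vector's coordinates."""
    levels = 2 ** bits

    def round_trip(v: Vector) -> Vector:
        lo = -4.0 / math.sqrt(len(v))
        step = -2.0 * lo / levels
        codes = [min(levels - 1, max(0, int((x - lo) // step))) for x in v]
        return [lo + (c + 0.5) * step for c in codes]

    return round_trip

def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))

def unit_sample(n: int, dims: int, seed: int) -> List[Vector]:
    """Random isotropic unit vectors; a placeholder for real embeddings."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(dims)]
        norm = math.sqrt(sum(x * x for x in v))
        out.append([x / norm for x in v])
    return out

def evaluate(codec: Codec, sample: List[Vector]) -> Tuple[int, float]:
    """Return (bytes per vector, mean cosine between original and round trip)."""
    fidelity = sum(cosine(v, codec.round_trip(v)) for v in sample) / len(sample)
    return codec.bytes_per_vector(len(sample[0])), fidelity

CODECS = [
    Codec("sign-1bit", 1, sign_round_trip),
    Codec("uniform-2bit", 2, uniform_round_trip(2)),
    Codec("uniform-4bit", 4, uniform_round_trip(4)),
]

if __name__ == "__main__":
    sample = unit_sample(50, 256, seed=1)
    for codec in CODECS:
        size, fid = evaluate(codec, sample)
        print(f"{codec.name:>13}: {size:4d} B/vector, mean cosine {fid:.3f}")
```

Each step down in bits halves the footprint and gives up some fidelity. How much it gives up depends on the data's distribution, which is why the comparison belongs on a real sample.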
CRBRL is disk-first, not a memory store that happens to spill to disk. Because the native representation is already 8× smaller, far more of a working set fits in a given amount of disk and cache, and the data persists durably without holding the whole index in RAM. That is what makes large archives searchable on hardware that would never hold them at full precision.
On top of that, tiering is automatic. Recent data is held at higher fidelity where it is queried most; as data ages and is touched less often, it compresses further and moves down through warm and cold tiers. The movement is managed by the engine — there is no manual lifecycle job to write — so the footprint of old data keeps shrinking while recent data stays sharp.
Hot tier. The data queried most, held at the highest fidelity and closest to the query path for fast retrieval.
Warm tier. Touched less often, and compressed further as access falls, trading a little precision for a smaller footprint, automatically.
Cold tier. Rarely queried but still online and searchable. The smallest footprint, kept durable on disk rather than parked offline.
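The tier decision itself is simple to express. Below is a minimal Python sketch, assuming a policy keyed on recency and access rate. The thresholds, the per-tier bit widths, and the names `Tier` and `choose_tier` are hypothetical placeholders, not CRBRL's configuration. The sketch also leaves out the re-encoding and data movement the engine performs.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tier:
    name: str
    bits_per_dim: int  # hypothetical: lower tiers compress further

HOT = Tier("hot", 4)
WARM = Tier("warm", 2)
COLD = Tier("cold", 1)

def choose_tier(days_since_access: float, queries_per_day: float) -> Tier:
    """Pick a tier from recency and access rate (placeholder thresholds)."""
    if days_since_access <= 7 or queries_per_day >= 10:
        return HOT
    if days_since_access <= 90:
        return WARM
    return COLD

def bytes_per_vector(tier: Tier, dims: int) -> int:
    return dims * tier.bits_per_dim // 8

if __name__ == "__main__":
    for age, rate in ((1, 0), (30, 0.5), (365, 0.01)):
        tier = choose_tier(age, rate)
        print(age, tier.name, bytes_per_vector(tier, 1536))
```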
Retrieval in production is rarely pure vector search. CRBRL supports semantic similarity, classic full-text, and hybrid retrieval that fuses the two, all through one interface — and all of it runs in the compressed domain, with no decompression on the query path.
Nearest-neighbour search over the compressed embeddings. Distances are computed directly on the codec's representation, preserving the geometry TurboQuant maintains at ≈0.98 cosine fidelity.
Keyword and full-text retrieval for the cases where exact terms, identifiers, and rare tokens matter more than semantic proximity.
Semantic and lexical signals combined in one query, so recall benefits from meaning and precision benefits from exact terms — without standing up two systems.
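Fusing the two signals needs a rule for combining rankings that were scored on different scales. Reciprocal rank fusion is a common choice because it uses only rank positions. The page does not say which fusion rule CRBRL uses, so treat this Python sketch as one illustrative option. The constant `k = 60` is the value conventionally used with RRF, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse ranked lists of document ids: each list adds 1 / (k + rank)
    to an id's score, with rank starting at 1."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))

if __name__ == "__main__":
    vector_hits = ["a", "b", "c"]   # from nearest-neighbour search
    keyword_hits = ["c", "a", "d"]  # from full-text search
    print(reciprocal_rank_fusion([vector_hits, keyword_hits]))
```

A document that ranks well in both lists, such as "a" here, rises above documents that appear in only one. No score normalization between the two systems is needed.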
CRBRL is provider-neutral on embeddings — it works with 9 embedding models across providers, so the choice of model is yours, not the database's. It ships in two deployment shapes, and the embeddings are stored compressed in both.
Run CRBRL as its own service behind a Chroma-compatible API. Teams already building against that interface point at CRBRL and keep their client code, gaining the compressed-native footprint and compressed-domain search underneath without an application rewrite.
Install CRBRL as an extension inside Postgres (crbrl-pg). The vectors live next to your relational data, under the same backups, the same operational tooling, and the same disaster-recovery procedures your team already runs, with no separate system to back up, monitor, or fail over.
Compression and search are the substance; governance is what makes them safe to run with real data and real teams. CRBRL carries the operational and access controls expected of a production database.
Authentication on the database surface and role-based access control, so what each identity can read and write is scoped explicitly.
Tenant isolation for serving many customers or teams from one deployment, keeping each tenant's vectors and queries separated.
Audit logging of access and changes, so activity against the store is recorded and reviewable.
Point-in-time snapshots for backup and restore — and, under crbrl-pg, folded into the database's own backup and DR flow.
Visibility into what the engine is doing, so operators can monitor behaviour and capacity in production rather than guess at it.
Inside Postgres, governance inherits the controls already approved for your relational data — one operational model, not two.
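Here is a minimal Python sketch of how role scoping, tenant isolation, and audit logging compose on a single request. The roles, the permission names, and the `authorize` and `guarded` helpers are hypothetical and do not describe CRBRL's actual access-control schema. What the sketch shows is the order of checks: tenant first, then role, with every decision recorded.

```python
from dataclasses import dataclass, field

# Hypothetical role model, for illustration only.
ROLE_PERMISSIONS = {
    "reader": {"query"},
    "writer": {"query", "upsert"},
    "admin": {"query", "upsert", "delete", "snapshot"},
}

@dataclass(frozen=True)
class Identity:
    user: str
    tenant: str
    role: str

class AccessDenied(Exception):
    pass

def authorize(identity: Identity, action: str, tenant: str) -> None:
    """Allow an action only if the tenant matches and the role grants it."""
    if identity.tenant != tenant:
        raise AccessDenied(f"{identity.user} cannot access tenant {tenant!r}")
    if action not in ROLE_PERMISSIONS.get(identity.role, set()):
        raise AccessDenied(f"role {identity.role!r} cannot {action}")

@dataclass
class AuditLog:
    entries: list = field(default_factory=list)

    def record(self, identity: Identity, action: str, tenant: str, allowed: bool) -> None:
        self.entries.append((identity.user, tenant, action, allowed))

def guarded(identity: Identity, action: str, tenant: str, log: AuditLog) -> bool:
    """Check access and record the decision whether or not it was allowed."""
    try:
        authorize(identity, action, tenant)
    except AccessDenied:
        log.record(identity, action, tenant, False)
        return False
    log.record(identity, action, tenant, True)
    return True
```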
// built on peer-reviewed mathematics — TurboQuant · arXiv:2504.19874 · a Precog Labs product