CRBRL


The technology, in depth

Compression as the architecture — not a feature.

CRBRL is a compression-native, disk-persistent vector database. The compression is not a flag you switch on after the fact: the index is built compressed, and search runs directly on it. This page walks through how that works — the quantization, the codec layer, the tiering, and the deployment surface — at the level of detail an engineer would want before adopting it.

8× fewer bytes per vector · ≈0.98 cosine fidelity · zero calibration.


▲ 01 / The data-layer problem · a default left unexamined

Embeddings have been stored as 32-bit floats since the category began.

When the first vector databases appeared, embeddings were small and datasets were tiny, so the obvious representation was the one the model already emitted: float32 — 4 bytes per dimension. A 1,536-dimension vector is 6,144 bytes. That choice was reasonable then. It was also never revisited.

At production scale, that representation dominates the bill. Storage and the I/O around it account for 55–80% of a typical vector-database cost. Every record stored, every replica kept for availability, every snapshot held for recovery — all of it carries the full float32 footprint. The line that grows fastest is the line nobody chose deliberately.

The questions that follow are simple, and they are the ones CRBRL is built to answer: how few bytes can a vector occupy without changing the answers a search returns, and can the search run on those bytes directly — without ever expanding them back to float32 in the query path?

▲ 02 / Compression-native vs bolt-on · compressed-domain search

Search runs on the compressed index — there is no decompression in the query path.

Most engines that offer compression treat it as a storage layer bolted under an architecture designed around full-precision vectors. The data is shrunk at rest, but every query has to expand it back to float32 before distances can be computed. That decompress step sits on the hot path of every search, and it is why bolt-on compression was rarely adopted where retrieval latency matters.

CRBRL is built the other way around. The index is compressed from the first vector loaded, and the distance computations operate in the compressed domain. Nothing is expanded back to float32 to answer a query. Because compression is the native representation rather than an add-on, the footprint reduction and the search both hold at the same time — which is the combination bolt-on designs never reached.
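To make "distance computations operate in the compressed domain" concrete, here is a minimal sketch of one way a dot product can be evaluated on integer codes without rebuilding a float vector. It assumes plain uniform scalar quantization, which is an illustrative stand-in and not CRBRL's codec; the names `quantize`, `dot_compressed`, and `decompress` are hypothetical.

```python
def quantize(vec, bits=4):
    """Uniform scalar quantization: map each float to an integer code in [0, 2**bits - 1]."""
    lo, hi = min(vec), max(vec)
    levels = (1 << bits) - 1
    step = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / step) for v in vec]
    return codes, step, lo

def decompress(codes, step, lo):
    """Expand codes back to floats. Used here only to check the result."""
    return [c * step + lo for c in codes]

def dot_compressed(query, codes, step, lo):
    """dot(query, x_hat) computed on the integer codes directly.

    Since x_hat[i] = codes[i] * step + lo, the dot product factors into
    step * sum(q[i] * codes[i]) + lo * sum(q[i]); no float vector is rebuilt.
    """
    weighted = sum(q * c for q, c in zip(query, codes))
    return step * weighted + lo * sum(query)

# Toy data, chosen only for illustration.
stored = [0.12, -0.40, 0.33, 0.05, -0.27, 0.48, -0.09, 0.21]
query = [0.10, -0.35, 0.30, 0.00, -0.20, 0.50, -0.10, 0.25]

codes, step, lo = quantize(stored)
direct = dot_compressed(query, codes, step, lo)
expanded = sum(q * x for q, x in zip(query, decompress(codes, step, lo)))
print(direct, expanded)  # the two values agree up to floating-point rounding
```

The point of the sketch is the algebra: the per-vector scale and offset factor out of the sum, so the inner loop touches only the stored codes.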

One 1,536-dim vector — float32 vs CRBRL

// 1,536 dims × 4 bytes = 6,144 bytes → 768 bytes. The same ratio carries through to a fleet: 100M vectors at 1,536 dims is 614 GB at float32, 77 GB with CRBRL.
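The arithmetic in that caption can be reproduced in a few lines. This is a back-of-envelope sketch that counts raw vector payload only; the helper name `footprint_bytes` is ours, and index structures, metadata, and replicas are deliberately ignored.

```python
FLOAT32_BYTES = 4
COMPRESSION_RATIO = 8  # the 8x figure quoted on this page

def footprint_bytes(dims, n_vectors=1, ratio=1):
    """Raw vector payload in bytes, ignoring index and metadata overhead."""
    return dims * FLOAT32_BYTES * n_vectors // ratio

per_vector_f32 = footprint_bytes(1536)                                 # 6144 bytes
per_vector_crbrl = footprint_bytes(1536, ratio=COMPRESSION_RATIO)      # 768 bytes
fleet_f32_gb = footprint_bytes(1536, 100_000_000) / 1e9                # 614.4 GB
fleet_crbrl_gb = footprint_bytes(1536, 100_000_000, COMPRESSION_RATIO) / 1e9  # 76.8 GB
bits_per_dim = per_vector_crbrl * 8 / 1536                             # 4.0

print(per_vector_f32, per_vector_crbrl, fleet_f32_gb, fleet_crbrl_gb, bits_per_dim)
```

Read the other way, 768 bytes for 1,536 dimensions works out to an average of 4 bits per dimension.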

▲ 03 / TurboQuant · arXiv:2504.19874 · near information-theoretic optimal

A quantization method that spreads information evenly before it rounds.

The compression rests on TurboQuant, a quantization method published and peer-reviewed (arXiv:2504.19874). Quantization is the act of replacing high-precision numbers with a small set of discrete levels — fewer bits per dimension. The risk is always the same: naive quantization concentrates error wherever a few dimensions carry most of the magnitude, and that error distorts the distances a search depends on.

TurboQuant's intuition is to remove that concentration before it can do damage. A randomized rotation is applied to each vector, which redistributes its energy roughly evenly across all dimensions; quantization then acts on a signal where no single dimension dominates. Spreading the information first means the quantization error spreads too — instead of piling up in a few coordinates, it averages out. The result is quantization that operates near the information-theoretic limit for how few bits a vector can carry while preserving its geometry.
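The effect of rotating before rounding can be seen on a toy vector. The sketch below is not the TurboQuant algorithm: it swaps in a randomized Hadamard transform (random sign flips followed by a fast Walsh-Hadamard transform) as a cheap, well-known orthogonal stand-in for a random rotation. The function names and the 64-dimension example are assumptions made for illustration.

```python
import math
import random

def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform; len(x) must be a power of two."""
    x = list(x)
    n = len(x)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in x]

def random_signs(n, seed=0):
    """A fixed random +1/-1 pattern; drawn once and reused for every vector."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]

def rotate(x, signs):
    """Sign flip, then Hadamard: an orthogonal map, so lengths and angles are preserved."""
    return fwht([s * v for s, v in zip(signs, x)])

def norm(x):
    return math.sqrt(sum(v * v for v in x))

# A "spiky" vector: one coordinate carries almost all of the magnitude.
n = 64
spiky = [0.01] * n
spiky[3] = 10.0

rotated = rotate(spiky, random_signs(n))

print("norm before/after:", norm(spiky), norm(rotated))
print("largest |coordinate| before/after:",
      max(abs(v) for v in spiky), max(abs(v) for v in rotated))
```

The norm is unchanged while the largest coordinate shrinks sharply, which is the property that lets a fixed set of quantization levels serve every dimension about equally well.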

Two properties matter in practice. First, the fidelity: retrieval under TurboQuant holds at ≈0.98 cosine against full precision — close enough that which neighbours come back, and in what order, is effectively unchanged. Second, there is no training and no calibration: the rotation and the quantization are defined by the method, not learned from your distribution, so CRBRL works correctly on the first vector loaded, with no warm-up pass over a sample set.
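For readers who want to see how a cosine-fidelity figure of this kind can be measured, here is a minimal sketch. It uses a plain 4-bit uniform scalar quantizer on a synthetic Gaussian vector, both of which are assumptions chosen for illustration; it shows the measurement, not the codec CRBRL ships or the way the ≈0.98 figure was obtained. Like the method described above, the toy quantizer needs no training pass: its only parameter is derived from the vector itself.

```python
import math
import random

def quantize_4bit(vec):
    """Uniform 4-bit scalar quantization over [-m, m], where m = max |value|."""
    m = max(abs(v) for v in vec) or 1.0
    step = 2 * m / 15
    codes = [min(15, max(0, round((v + m) / step))) for v in vec]
    return codes, step, m

def pack_nibbles(codes):
    """Pack two 4-bit codes into each byte (expects an even number of codes)."""
    return bytes((codes[i] << 4) | codes[i + 1] for i in range(0, len(codes), 2))

def dequantize(codes, step, m):
    """Map codes back to floats; needed here only to score the reconstruction."""
    return [c * step - m for c in codes]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

rng = random.Random(0)
original = [rng.gauss(0.0, 1.0) for _ in range(1536)]  # synthetic stand-in for an embedding

codes, step, m = quantize_4bit(original)
packed = pack_nibbles(codes)
fidelity = cosine(original, dequantize(codes, step, m))

print(len(packed), "bytes; cosine vs original:", round(fidelity, 4))
```

On real embeddings the number depends on the data and the codec, so the useful habit is to run the same measurement over a sample of your own vectors.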

We state the result and cite the paper rather than reproduce its proofs or benchmark tables here. The claim is checkable at the source: arXiv:2504.19874.

8×

Fewer bytes per vector versus the float32 default — the native footprint, not a lossy afterthought.

≈0.98

Cosine fidelity against full precision — retrieval is effectively unchanged at the level of which results return.

Zero

Training and calibration. The method is closed-form; it is correct on the first vector you load.

▲ 04 / The codec layer · two codecs · selectable

Two codecs in one layer — TurboQuant and RaBitQ, chosen per workload.

Quantization is not one-size-fits-all. CRBRL exposes the codec as a selectable layer so an operator can pick the representation that suits the data and the latency budget, rather than inheriting whatever a single hard-coded scheme imposes.

Codec A

TurboQuant

The randomized-rotation method above (arXiv:2504.19874). Near information-theoretic optimal, zero-calibration, with fidelity that holds at ≈0.98 cosine. The default starting point for general embedding workloads where retrieval quality is the first concern.

Codec B

RaBitQ

An established binary / scalar quantization approach with a strong published track record. It offers a different point in the trade-off space — useful where a workload's distribution or latency profile favours its characteristics. Having it alongside TurboQuant gives robustness across the range of data CRBRL meets in practice.

The reason to ship both is honest engineering, not abundance for its own sake. Every codec sits somewhere in a three-way trade-off between fidelity (how faithfully distances survive), footprint (how few bytes per vector), and speed (how fast the compressed-domain distance computes). No single point on that surface is best for every dataset. Offering a selectable codec lets the operator move along the surface to match the workload — and gives CRBRL a fallback when one scheme suits a distribution better than the other. We describe the trade-off qualitatively here because the right balance is workload-dependent; the docs cover how to evaluate it on your own data.
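One common way to evaluate that trade-off on your own data is recall@k: run the same queries against full-precision and compressed vectors and count how many of the true top-k survive. The sketch below is a hedged illustration of that procedure only. `coarse_codec` is a hypothetical stand-in quantizer, not TurboQuant or RaBitQ, and the random data stands in for real embeddings.

```python
import random

def top_k(query, vectors, k):
    """Ids of the k vectors with the largest dot product against the query."""
    scored = sorted(range(len(vectors)),
                    key=lambda i: -sum(q * v for q, v in zip(query, vectors[i])))
    return scored[:k]

def recall_at_k(exact_ids, approx_ids):
    """Fraction of the exact top-k that the approximate search also returned."""
    return len(set(exact_ids) & set(approx_ids)) / len(exact_ids)

def coarse_codec(vec, bits):
    """Hypothetical stand-in codec: snap each value to 2**bits uniform levels in [-1, 1]."""
    levels = (1 << bits) - 1
    return [round((min(1.0, max(-1.0, v)) + 1) / 2 * levels) / levels * 2 - 1 for v in vec]

rng = random.Random(0)
data = [[rng.uniform(-1, 1) for _ in range(32)] for _ in range(200)]
query = [rng.uniform(-1, 1) for _ in range(32)]

exact = top_k(query, data, k=10)
for bits in (2, 4, 8):
    approx = top_k(query, [coarse_codec(v, bits) for v in data], k=10)
    print(bits, "bits per dim -> recall@10 =", recall_at_k(exact, approx))
```

This covers the fidelity and footprint axes; the speed axis needs timing on the target hardware, which a toy loop like this cannot represent.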

▲ 05 / Disk-first architecture · hot / warm / cold · automatic

Disk-persistent by design, with fidelity that ages gracefully.

CRBRL is disk-first, not a memory store that happens to spill to disk. Because the native representation is already 8× smaller, far more of a working set fits in a given amount of disk and cache, and the data persists durably without holding the whole index in RAM. That is what makes large archives searchable on hardware that would never hold them at full precision.

On top of that, tiering is automatic. Recent data is held at higher fidelity where it is queried most; as data ages and is touched less often, it compresses further and moves down through warm and cold tiers. The movement is managed by the engine — there is no manual lifecycle job to write — so the footprint of old data keeps shrinking while recent data stays sharp.
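As a mental model of age-based tiering, consider the sketch below. Everything specific in it is an assumption: the page does not publish idle-time thresholds or per-tier bit widths, so the numbers, the `Tier` type, and the `tier_for` function are hypothetical and only show the shape of such a policy.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tier:
    name: str
    max_idle_days: float  # hypothetical threshold, not a CRBRL setting
    bits_per_dim: int     # hypothetical precision for the tier

# Illustrative values only; the real engine manages this automatically.
TIERS = (
    Tier("hot", 7, 4),
    Tier("warm", 90, 3),
    Tier("cold", float("inf"), 2),
)

def tier_for(idle_days):
    """Pick the first tier whose idle-time limit covers this record."""
    for tier in TIERS:
        if idle_days <= tier.max_idle_days:
            return tier
    return TIERS[-1]

def bytes_per_vector(dims, tier):
    """Payload size for one vector at the tier's precision."""
    return dims * tier.bits_per_dim // 8

for idle in (1, 30, 400):
    tier = tier_for(idle)
    print(idle, "days idle ->", tier.name, bytes_per_vector(1536, tier), "bytes")
```

In CRBRL the movement is handled by the engine, so nothing like this has to be written by the operator; the sketch is only a way to reason about how footprint falls as data ages.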

Hot

Recent

The data queried most, held at the highest fidelity and closest to the query path for fast retrieval.

Warm

Aging

Touched less often. Compressed further as access falls, trading a little headroom for footprint — automatically.

Cold

Semantic, full-text, and hybrid — through a single API.

Retrieval in production is rarely pure vector search. CRBRL supports semantic similarity, classic full-text, and hybrid retrieval that fuses the two, all through one interface — and all of it runs in the compressed domain, with no decompression on the query path.

Semantic

Vector similarity

Nearest-neighbour search over the compressed embeddings. Distances are computed directly on the codec's representation, preserving the geometry TurboQuant maintains at ≈0.98 cosine fidelity.

Full-text

Lexical match

Keyword and full-text retrieval for the cases where exact terms, identifiers, and rare tokens matter more than semantic proximity.

Hybrid

Fused ranking

Semantic and lexical signals combined in one query, so recall benefits from meaning and precision benefits from exact terms — without standing up two systems.
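To illustrate what "fused ranking" can mean, here is a sketch of reciprocal rank fusion (RRF), a widely used way to merge a semantic result list with a lexical one. The page does not say which fusion method CRBRL applies, so treat RRF as an assumed example; the document ids and the constant `k=60` (a conventional default) are illustrative.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked id lists; each appearance adds 1 / (k + rank) to an id's score."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))

semantic = ["d3", "d1", "d7", "d2"]  # hypothetical vector-similarity results
lexical = ["d1", "d9", "d3", "d4"]   # hypothetical full-text results

fused = reciprocal_rank_fusion([semantic, lexical])
print(fused)  # ['d1', 'd3', 'd9', 'd7', 'd2', 'd4']
```

Documents that rank well in both lists (`d1`, `d3`) rise to the top, while documents found by only one signal are kept rather than dropped.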

▲ 07 / Deployment · standalone or inside Postgres

Drop it in standalone, or run it inside the database you already operate.

CRBRL is provider-neutral on embeddings — it works with 9 embedding models across providers, so the choice of model is yours, not the database's. It ships in two deployment shapes, and the embeddings are stored compressed in both.

Standalone

Chroma-compatible API

Run CRBRL as its own service behind a Chroma-compatible API. Teams already building against that interface point at CRBRL and keep their client code, gaining the compressed-native footprint and compressed-domain search underneath without an application rewrite.

In-database

crbrl-pg · a Postgres extension

Install CRBRL as an extension inside Postgres. The vectors live next to your relational data, under the same backups, the same operational tooling, and the same disaster-recovery procedures your team already runs — no separate system to back up, monitor, or fail over.
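For the standalone shape, "keep their client code" would look roughly like the sketch below, which uses the public `chromadb` Python client. The host, port, collection name, and toy embeddings are assumptions; how closely a given CRBRL build matches each Chroma endpoint is something to confirm in the docs.

```python
def connect(host="localhost", port=8000):
    """Return a Chroma HTTP client pointed at a CRBRL endpoint (hypothetical address)."""
    import chromadb  # third-party: pip install chromadb
    return chromadb.HttpClient(host=host, port=port)

def index_and_query(client, query_embedding, n_results=2):
    """Add two toy records, then run a nearest-neighbour query against them."""
    collection = client.get_or_create_collection(name="docs")
    collection.add(
        ids=["a", "b"],
        embeddings=[[0.1, 0.2, 0.3], [0.9, 0.8, 0.7]],
        documents=["first document", "second document"],
    )
    return collection.query(query_embeddings=[query_embedding], n_results=n_results)

# Usage, assuming a server is listening at the address above:
#   results = index_and_query(connect(), [0.1, 0.2, 0.3])
#   print(results["ids"])
```

Nothing in the calling code refers to compression: under this deployment shape the footprint change happens behind the same client interface.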

▲ 08 / Governance · controls for production

The controls a production data store is expected to carry.

Compression and search are the substance; governance is what makes them safe to run with real data and real teams. CRBRL carries the operational and access controls expected of a production database.

Access

Auth & RBAC

Authentication on the database surface and role-based access control, so what each identity can read and write is scoped explicitly.

Isolation

Multi-tenancy

Tenant isolation for serving many customers or teams from one deployment, keeping each tenant's vectors and queries separated.

Accountability

Audit logs

Audit logging of access and changes, so activity against the store is recorded and reviewable.

Recovery

Snapshots

Point-in-time snapshots for backup and restore — and, under crbrl-pg, folded into the database's own backup and DR flow.

Operations

Observability

Visibility into what the engine is doing, so operators can monitor behaviour and capacity in production rather than guess at it.

Continuity

Same ops surface

Inside Postgres, governance inherits the controls already approved for your relational data — one operational model, not two.

▲ Where to next

Compression-native, end to end — now see it in the field.

The same method, the same codec layer, the same deployment surface — applied to the workloads teams actually run. Or go straight to the reference.

See where it's used → Read the docs →

// built on peer-reviewed mathematics — TurboQuant · arXiv:2504.19874 · a Precog Labs product