Why compression belongs in the architecture, not bolted on

Most teams meet vector compression the same way: their database fills up, the storage line climbs, and someone reaches for a setting that shrinks the index after the fact. It helps the bill. It also quietly changes the shape of every query. We think that order is backwards — compression should be a property of the architecture, not a patch applied to it.

The distinction is not academic. Where you put the compression decides whether search stays fast, whether retrieval quality holds, and whether you ever have to retrain anything. Here is the case for building it in from the start.

What bolt-on compression actually does

A vector database answers a query by comparing a query vector against the vectors in the index and returning the closest matches. When the stored vectors are compressed by a layer bolted on top, the comparison cannot happen on the compressed form. The system has to decompress first — reconstruct the full-precision vectors, or a working approximation of them, and only then run the distance maths.
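
The read path described here can be sketched in a few lines. This is a minimal illustration, assuming a simple 8-bit scalar quantiser as the bolted-on layer. The names `compress`, `decompress` and `search_bolt_on`, and the `SCALE` constant, are hypothetical and do not describe any particular product.

```python
import math

SCALE = 127  # assumption: components lie in [-1, 1] and map onto the int8 range


def compress(vec):
    """Store each component as a small integer code."""
    return [round(max(-1.0, min(1.0, x)) * SCALE) for x in vec]


def decompress(codes):
    """Rebuild an approximate float vector from the stored codes."""
    return [c / SCALE for c in codes]


def cosine(a, b):
    """Cosine similarity between two equal-length float vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search_bolt_on(query, stored_codes, k=1):
    """Decompress every candidate, then score it against the query."""
    scored = []
    for idx, codes in enumerate(stored_codes):
        vec = decompress(codes)  # the extra step that sits on the query path
        scored.append((cosine(query, vec), idx))
    scored.sort(reverse=True)
    return [idx for _, idx in scored[:k]]


corpus = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.6, 0.8, 0.0]]
index = [compress(v) for v in corpus]
print(search_bolt_on([0.5, 0.9, 0.0], index, k=2))
```

The call to `decompress` inside the loop is the cost in question: it runs once per candidate, on every query.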

That decompress step sits squarely on the query path. Every search pays for it. At small scale it hides inside the noise. At production scale, where a single request may touch a large slice of the index, it becomes a tax on latency that grows with the corpus — the exact moment you most wanted the savings.

So bolt-on compression trades one cost for another: it lowers the storage line but raises the cost of every read. That trade is why compression has stayed an afterthought in most vector stacks. Teams that tried it where it counts saw their queries slow down and turned it back off.

Compressed-domain search removes the step

CRBRL is built the other way around. The index is stored compressed from the first record, and search runs directly on the compressed representation. There is no reconstruction stage in the query path because the distance computation is defined on the compressed form itself. The decompress step is not optimised — it is absent.
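
To make "the distance computation is defined on the compressed form itself" concrete, here is a minimal sketch using one-bit sign codes and Hamming distance. It is an illustration of the general idea only: the encoding and the function names are assumptions chosen for brevity, not CRBRL's kernel and not the TurboQuant or RaBitQ estimators.

```python
def encode_sign_bits(vec):
    """Pack one bit per dimension (1 if the component is positive) into an int."""
    code = 0
    for i, x in enumerate(vec):
        if x > 0:
            code |= 1 << i
    return code


def hamming(a, b):
    """Count the bits that differ between two codes."""
    return bin(a ^ b).count("1")


def search_compressed(query_vec, stored_codes, k=1):
    """Rank candidates by Hamming distance computed directly on the codes."""
    q = encode_sign_bits(query_vec)
    ranked = sorted(range(len(stored_codes)), key=lambda i: hamming(q, stored_codes[i]))
    return ranked[:k]


corpus = [
    [0.9, 0.1, -0.3, 0.2],
    [-0.8, -0.2, 0.4, -0.1],
    [0.7, -0.2, -0.5, 0.3],
]
index = [encode_sign_bits(v) for v in corpus]
print(search_compressed([0.8, 0.2, -0.4, 0.1], index, k=2))
```

The point of the sketch is structural: `search_compressed` never calls a decode function. The query is encoded once, and every comparison is an XOR and a bit count on the stored codes.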

This is the part that older approaches never solved, and it is the whole reason compression-native is a different category from compression-as-a-feature. When the search operates in the compressed domain, density stops being a trade against speed. You keep the smaller footprint and the read stays cheap.

The fidelity question, answered with a number

The obvious worry is quality. If you are comparing compressed vectors, are the answers still right? This is where the work has to be honest, and where a single number does the arguing: CRBRL holds ≈0.98 cosine fidelity against the full-precision baseline. In plain terms, the ranked results you get from the compressed index match what you would get from the uncompressed one closely enough that, for retrieval, the difference does not show up in practice.
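
A figure like this can be checked on your own data. The article does not spell out how cosine fidelity is computed, so the sketch below assumes it means the mean cosine similarity between each original vector and its reconstruction from the compressed codes. The uniform scalar quantiser is a hypothetical stand-in, not TurboQuant or RaBitQ, and the vectors are random, so the numbers it prints say nothing about CRBRL.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length float vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def quantise(vec, levels):
    """Stand-in codec: snap each component in [-1, 1] to a uniform grid."""
    step = 2.0 / (levels - 1)
    return [round((max(-1.0, min(1.0, x)) + 1.0) / step) for x in vec]


def dequantise(codes, levels):
    """Map grid indices back to floats in [-1, 1]."""
    step = 2.0 / (levels - 1)
    return [c * step - 1.0 for c in codes]


def mean_cosine_fidelity(vectors, levels):
    """Average cosine similarity between originals and their reconstructions."""
    sims = [cosine(v, dequantise(quantise(v, levels), levels)) for v in vectors]
    return sum(sims) / len(sims)


def random_unit_vector(rng, dim):
    """Draw a Gaussian vector and normalise it to unit length."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


rng = random.Random(0)  # fixed seed so the run is repeatable
vectors = [random_unit_vector(rng, 32) for _ in range(200)]
fid16 = mean_cosine_fidelity(vectors, levels=16)    # 4 bits per component
fid256 = mean_cosine_fidelity(vectors, levels=256)  # 8 bits per component
print(round(fid16, 4), round(fid256, 4))
```

Swapping in the real codec's encode and decode, and real embeddings in place of the random vectors, is what would make the measurement meaningful.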

That number is not a marketing figure pulled from a good run. The compression method underneath — TurboQuant — is peer-reviewed and published (arXiv:2504.19874). The fidelity is a property you can check against the paper, not a claim you have to take on faith. The codec layer is selectable — TurboQuant or RaBitQ — so the same compressed-domain design holds across more than one method.

Zero retraining, and nothing to migrate

There is a second cost that bolt-on approaches and learned-compression schemes often hide: the model. Some methods need you to train a codec on your data, or retrain an embedding model to play nicely with the compression. That turns a storage decision into a machine-learning project.

Compression-native does not ask for that. CRBRL works on the first record you load: no codec training run, no embedding retrain, no model to keep in sync with your data as it drifts. Its API is Chroma-compatible and it ships a Postgres extension (crbrl-pg), so the change is a swap at the storage layer rather than a rebuild of everything above it.
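
What "works on the first record" implies for a codec can be sketched as follows: every parameter is fixed from the dimension, the bit width and a seed, so there is nothing to fit before encoding. `ObliviousCodec` is a hypothetical class written for illustration, with a seeded random sign flip standing in for whatever fixed transform a real codec applies. It is not CRBRL's or TurboQuant's implementation.

```python
import random


class ObliviousCodec:
    """Hypothetical codec whose parameters never depend on the data."""

    def __init__(self, dim, bits=4, seed=0):
        rng = random.Random(seed)
        # Fixed before any data arrives; reproducible from the seed alone.
        self.signs = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
        self.levels = 2 ** bits
        self.step = 2.0 / (self.levels - 1)

    def encode(self, vec):
        """Flip signs, clamp to [-1, 1], and snap to the fixed grid."""
        codes = []
        for s, x in zip(self.signs, vec):
            y = max(-1.0, min(1.0, s * x))
            codes.append(round((y + 1.0) / self.step))
        return codes

    def decode(self, codes):
        """Inverse mapping, used here only to check the round trip."""
        return [s * (c * self.step - 1.0) for s, c in zip(self.signs, codes)]


codec = ObliviousCodec(dim=4, bits=4, seed=42)
first_record = [0.12, -0.50, 0.33, 0.90]
codes = codec.encode(first_record)  # no fit() or train() call came first
approx = codec.decode(codes)
print(codes, [round(x, 3) for x in approx])
```

A learned codec would need a training pass over representative data before its first `encode` call, and another whenever the data drifts far enough from what it was trained on. Here, a second instance built with the same seed produces identical codes without ever seeing the corpus.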

Why this is the right default

Put the three pieces together and the case is straightforward. Compressed-domain search removes the decompress step that bolt-on compression forces onto every query. The fidelity holds at ≈0.98 cosine, on peer-reviewed mathematics. And there is no retraining and no migration to absorb the change.

Compression was treated as an optional optimisation because, done the old way, it always cost you something on read. Build it into the architecture and that cost goes away, which is exactly why it should not be bolted on.