When teams budget for an AI feature, they tend to think about model calls and compute. Those are the lines that feel like AI. But the line that actually decides whether a feature survives contact with production is quieter and easier to miss: the storage underneath the vectors. It is the line the whole system runs on, and it behaves differently from the rest of the bill.
The default nobody revisits
Embeddings are stored as float32 by default — four bytes per dimension, because that is what the tools emit and nobody changes it. A single 1,536-dimension vector at full precision is 1,536 × 4 = 6,144 bytes. That sounds trivial. It is trivial, until you have a lot of them, and production means having a lot of them.
The trouble with float32 as a default is that it was set when datasets were small and storage was a rounding error. The default never got a second look, so it carries straight into production at full size — multiplied by every record, every replica, every backup.
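To make the multiplication concrete, here is a minimal sketch of the per-vector arithmetic. The function names and the replica and backup counts are illustrative assumptions, not figures from any particular deployment, and the sketch ignores index and metadata overhead.

```python
# Sketch: how one vector's byte count multiplies across copies.
FLOAT32_BYTES = 4  # bytes per dimension at full precision


def bytes_per_vector(dims: int, bytes_per_dim: int = FLOAT32_BYTES) -> int:
    """Raw size of one embedding, ignoring index and metadata overhead."""
    return dims * bytes_per_dim


def bytes_across_copies(dims: int, replicas: int, backups: int) -> int:
    """Bytes one record occupies once every replica and backup holds it.

    Assumes each replica and each backup stores a full, uncompressed copy.
    """
    return bytes_per_vector(dims) * (replicas + backups)


print(bytes_per_vector(1536))  # 1,536 dims x 4 bytes = 6144
# Hypothetical deployment: 3 replicas and 2 retained backups.
print(bytes_across_copies(1536, replicas=3, backups=2))
```

With those assumed counts, a single 6,144-byte vector occupies five times that once replicas and backups are included.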
Why storage dominates the bill
On a typical vector-database bill, storage and the I/O around it account for 55–80% of the total. Not a footnote: the majority. And unlike query compute, which you can cache and amortise, storage scales linearly with the corpus. Double the documents, double the bytes. There is no economy of scale waiting to rescue you; the line just keeps climbing in step with the data.
That linearity is what turns storage into the planning constraint. In a sandbox, with a few hundred thousand vectors, none of this shows up — the footprint is small and the bill is noise. The moment a team moves from sandbox to production, the corpus grows by orders of magnitude, and the storage line stops being noise and starts being the number that gates the roadmap.
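The jump from sandbox to production can be sketched in a few lines. The corpus sizes below are hypothetical, chosen only to span "a few hundred thousand" up to a production-scale corpus, and the figures cover raw float32 vectors only.

```python
# Sketch: raw float32 storage grows in direct proportion to corpus size.
BYTES_PER_VECTOR = 1536 * 4  # float32 at 1,536 dimensions


def storage_gb(num_vectors: int) -> float:
    """Raw vector storage in decimal gigabytes (1 GB = 10**9 bytes)."""
    return num_vectors * BYTES_PER_VECTOR / 1e9


# Hypothetical corpus sizes, from sandbox to production.
for n in (300_000, 3_000_000, 30_000_000, 100_000_000):
    print(f"{n:>11,} vectors -> {storage_gb(n):8.2f} GB")
```

There is no term in the function that flattens out as the corpus grows, which is the point: every tenfold increase in vectors is a tenfold increase in bytes.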
A concrete example
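Take a real production shape: 100 million vectors at 1,536 dimensions. At float32, that is 6,144 bytes each, or roughly 614 GB for the whole corpus (100,000,000 × 6,144 bytes). Compressed 8×, each vector takes 768 bytes, or roughly 77 GB in total.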
614 GB drops to 77 GB. Same vectors, same number of records — eight times less to store, on disk, in every replica, in every backup. The reduction is exact because the byte count is exact: 6,144 → 768 bytes per vector. And because storage is most of the bill, taking that line down by 8× moves the total far more than trimming any other line could.
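The effect on the total bill can be estimated with a rough sketch. It assumes storage cost scales in proportion to bytes stored and that every other line stays the same; the function name is illustrative, and the 55% and 80% shares are the two ends of the range quoted earlier.

```python
# Sketch: corpus footprint before and after, and the effect on the bill.
def total_after_compression(storage_share: float, density: float) -> float:
    """Fraction of the original bill that remains.

    Assumes storage cost is proportional to bytes stored and that
    every non-storage line is unchanged.
    """
    return (1.0 - storage_share) + storage_share / density


num_vectors = 100_000_000
print(num_vectors * 6144 / 1e9)  # corpus at float32, decimal GB
print(num_vectors * 768 / 1e9)   # corpus at 8x density, decimal GB

for share in (0.55, 0.80):
    remaining = total_after_compression(share, density=8)
    print(f"storage at {share:.0%} of bill -> total falls by {1 - remaining:.0%}")
```

Under those assumptions, an 8× reduction in the storage line lowers the total bill by roughly 48% when storage is 55% of it, and by roughly 70% when storage is 80% of it.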
Price the line like cold storage
The usual way to make storage cheaper is to move it somewhere slower — push older vectors to a cold tier and accept that querying them gets painful. That lowers the price by lowering the service. CRBRL takes a different route: it reduces the data itself, then keeps it fast. You get cold-storage economics on the footprint without the cold-storage penalty on retrieval, because the index is searched in its compressed form.
That is the whole pitch in one sentence: CRBRL prices the storage line like cold storage, at 8× density, with retrieval quality close to full precision. Retrieval stays at ≈0.98 cosine fidelity against full precision, which is effectively unchanged, and the compression rests on peer-reviewed mathematics (TurboQuant, arXiv:2504.19874), so the density does not come at the expense of quality. CRBRL is disk-persistent by design, with hot, warm, and cold tiering layered on top of a footprint that is already small.
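A figure like cosine fidelity can be checked on your own data. The sketch below shows one way to measure it, assuming it means the average cosine similarity between each original vector and its reconstruction after compression. The compressor here is a plain uniform 4-bit scalar quantizer used purely as a stand-in: it is not TurboQuant and not CRBRL's method, and the synthetic Gaussian vectors are placeholders for real embeddings.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def quantize_roundtrip(vec, bits=4):
    """Stand-in compressor: uniform scalar quantization to 2**bits levels.

    This is NOT TurboQuant or CRBRL's method; it only gives the
    fidelity measurement something to compare against.
    """
    lo, hi = min(vec), max(vec)
    levels = (1 << bits) - 1
    step = (hi - lo) / levels
    if step == 0:
        return list(vec)  # constant vector: nothing to quantize
    return [lo + round((x - lo) / step) * step for x in vec]


def mean_cosine_fidelity(vectors, bits=4):
    """Average cosine similarity between originals and reconstructions."""
    sims = [cosine(v, quantize_roundtrip(v, bits)) for v in vectors]
    return sum(sims) / len(sims)


random.seed(0)
# Synthetic Gaussian vectors as placeholder data, not real embeddings.
sample = [[random.gauss(0.0, 1.0) for _ in range(1536)] for _ in range(20)]
print(f"mean cosine fidelity: {mean_cosine_fidelity(sample):.4f}")
```

To evaluate a real compressor, swap `quantize_roundtrip` for its encode-then-decode path and replace `sample` with a held-out slice of production embeddings.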
Plan for the line that grows
Most AI cost conversations focus on the lines that look like AI. The one that quietly governs what you can build is storage, because it is the majority of the bill, it grows linearly with success, and float32 leaves it eight times larger than it needs to be. Get that line under control early and the question stops being "which features can we afford to keep?" and goes back to "which features are worth building?"