BRIN (Block Range INdex) has been in Postgres since 9.5. Most teams we audit have never used it · they stamp B-trees on everything and then complain about WAL size and bloat. Other teams tried BRIN on the wrong column once, decided it is slow, and never went back. Both camps are wrong, often about the same table.

The rule we apply, with numbers below: BRIN for append-mostly, time-ordered or otherwise physically clustered columns (events, telemetry, audit logs, IoT readings). B-tree for everything else. Anything not in either bucket gets an index only after we measure.

What BRIN actually is

A B-tree stores one entry per row. A BRIN stores a summary (min, max) per range of pages · 128 pages by default. So instead of 200M index entries it stores roughly 200M / (128 * rows-per-page) summaries. The lookup is: 'which page ranges might contain values in [a,b]?' and then sequential-scan those pages. It only works well if values in a range actually cluster · in practice that means rows are inserted in order and rarely updated.
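The arithmetic above, as a query you can adapt · the 100 rows-per-page figure is an assumption for a narrow telemetry row on 8 KB pages, not a measured value:

```sql
-- Hypothetical back-of-envelope, assuming ~100 rows per 8 KB heap page.
SELECT 200000000               AS btree_entries,   -- one per row
       200000000 / (128 * 100) AS brin_summaries;  -- one per 128-page range
-- 200M B-tree entries vs ~15,625 BRIN summaries
```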

The benchmark · 200M rows of telemetry

Real table from a customer engagement (numbers anonymised, shape preserved). A `metrics` table, 200M rows, partitioned daily · for the index test we focused on a single 30-day partition with about 60M rows. Columns of interest: `created_at` (insert time, monotonic), `user_id` (random UUID), `metric` (low-cardinality string), `value` (double).

-- B-tree on created_at
CREATE INDEX btree_created ON metrics_p USING btree (created_at);

-- BRIN on created_at, default pages_per_range = 128
CREATE INDEX brin_created ON metrics_p USING brin (created_at)
  WITH (pages_per_range = 128);

-- BRIN on user_id (will lose, included for honesty)
CREATE INDEX brin_user ON metrics_p USING brin (user_id)
  WITH (pages_per_range = 128);

Sizes · the obvious win for BRIN

btree_created      ~1.7 GB
brin_created       ~560 KB    (3000x smaller)
brin_user          ~560 KB    (same shape, different story below)
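To reproduce this measurement on your own table, the catalog already has everything you need · `pg_relation_size` is exact and `pg_size_pretty` just formats it:

```sql
-- List every index on the partition with its on-disk size.
SELECT indexrelname,
       pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
WHERE relname = 'metrics_p';
```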

The size delta alone is the reason BRIN exists. On a hot append-only partition, the B-tree on `created_at` was about 3% the size of the table. The BRIN on the same column was 0.001%. WAL volume on inserts dropped roughly proportionally · which on a busy ingest pipeline is the actual production benefit, not query speed.

Range queries · the case BRIN was built for

A typical query · 'give me the last 6 hours of metrics for a dashboard'.

EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*), avg(value) FROM metrics_p
WHERE created_at >= now() - interval '6 hours';
B-tree:  Bitmap heap scan, 28 ms, ~2.1M rows scanned
BRIN:    Bitmap heap scan, 31 ms, ~2.3M rows scanned (slightly wider page range)

Within ~10% on wall time. Within negligible margin on rows touched. For an index that is 3000x smaller and barely costs anything to maintain, that is the trade we want.

Wider range queries · BRIN catches up further

When the range gets wider (say a 7-day window, returning ~14M rows), the cost is dominated by the heap scan, not the index lookup. BRIN and B-tree converge to within a couple of percent. The B-tree's 'precise' index entries are wasted work · we are going to read most of those pages anyway.

Where BRIN loses · low correlation columns

The same query, but filtering by `user_id` (random UUIDs).

EXPLAIN ANALYZE
SELECT * FROM metrics_p WHERE user_id = '...';
B-tree on user_id:  ~5 ms, 60 rows
BRIN on user_id:    ~3.4 s, 60 rows  (had to scan most of the table)

BRIN does not help when the index column is uncorrelated with physical row order. The summary ranges all overlap the predicate, so the index recommends scanning everything. This is the case people hit, write off BRIN, and never come back. Do not put BRIN on UUIDs, hashes, or anything where the values are scattered across the heap.
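You can check the correlation before building anything. `pg_stats.correlation` measures how well a column's value order matches physical row order · values near 1 or -1 cluster well, values near 0 (like random UUIDs) do not:

```sql
-- Physical correlation per column; run ANALYZE first if stats are stale.
SELECT attname, correlation
FROM pg_stats
WHERE tablename = 'metrics_p'
  AND attname IN ('created_at', 'user_id');
```

On a table shaped like this one, expect `created_at` near 1.0 and `user_id` near 0 · the whole BRIN decision in one query.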

Ingest cost · the silent BRIN win

On a 60M-row insert pass into a daily partition, the load took roughly 11 minutes with the B-tree in place; with BRIN instead, the same load took under 8 minutes, mostly because BRIN writes no per-row entries and never splits pages. WAL volume dropped 40%. For an ingest pipeline that runs every minute, that is a noticeable reduction in replication lag and backup size, not just a microbenchmark trinket.
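One way to measure the WAL cost of a load yourself is to diff WAL positions around it · the LSN literal below is a placeholder for whatever the first query returns:

```sql
-- Record the WAL position before the load.
SELECT pg_current_wal_lsn();  -- note the result, e.g. '0/1A2B3C4'

-- ... run the bulk insert ...

-- WAL bytes generated since the saved position.
SELECT pg_size_pretty(
         pg_wal_lsn_diff(pg_current_wal_lsn(), '0/1A2B3C4'::pg_lsn));
```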

When to use which

  • BRIN: append-mostly, time-ordered or otherwise physically clustered columns. Audit logs, IoT readings, event tables, partition-time columns.
  • BRIN: any column where you mostly run range queries, not point lookups.
  • B-tree: anything you point-look up · primary keys, foreign keys, user_id, email.
  • B-tree: anything that backs an `ORDER BY` you actually use (BRIN cannot serve sort).
  • B-tree: anything with high update churn that breaks physical order over time.

Operational notes

  1. Set `pages_per_range` thoughtfully. Default 128 is fine for OLTP-mixed; on a pure ingest table with very narrow ranges, drop to 32.
  2. Run `VACUUM` regularly. BRIN summaries for new page ranges are built by autovacuum; if autovacuum lags or is disabled, fresh ranges stay unsummarized and every query has to scan them.
  3. Pair BRIN with declarative partitioning. Constraint exclusion plus BRIN inside the partition is the cheapest combo for time-series.
  4. Re-summarise after a bulk reload: `SELECT brin_summarize_new_values('brin_created');`
  5. Do not plan around `INCLUDE` columns or index-only scans · BRIN supports neither the way B-tree does.
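Note 3 above, as a sketch · table and partition names are illustrative, and the `pages_per_range = 32` follows note 1's advice for a pure ingest table:

```sql
-- Daily range partitioning with a BRIN on the time column inside
-- each partition.
CREATE TABLE events (
    created_at timestamptz NOT NULL,
    payload    jsonb
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2024_06_01 PARTITION OF events
    FOR VALUES FROM ('2024-06-01') TO ('2024-06-02');

CREATE INDEX ON events_2024_06_01 USING brin (created_at)
    WITH (pages_per_range = 32);
```

Partition pruning trims the query to the relevant days; BRIN then trims each day to the relevant page ranges.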

If your largest table is event-shaped and you have never tried BRIN on its time column, that is an afternoon experiment with a real chance of cutting WAL volume by 30-50%. Run it on a partition copy first, then deploy if the numbers hold.

BRIN is not a B-tree replacement. It is a different tool for a different shape of data. On the right column it is unbeatable on disk, WAL, and ingest cost while staying within a few percent on read latency. On the wrong column it is an embarrassment. The decision is mechanical · ask 'are values in this column physically ordered?', and if yes, BRIN, otherwise B-tree.

By Dezso Mezo, Founder, DField Solutions

I've shipped production products from fintech to creator-tooling · for startups and enterprises, from Budapest to San Francisco.
