PostgreSQL at Scale: Saving Space (Basically) for Free
Braintree Payments operates dozens of PostgreSQL clusters with over 100 terabytes of data. At this scale, even a few percentage points change in disk space growth rate can meaningfully impact the writable lifespan of a database cluster. Unfortunately, many ideas to save disk space require application changes and therefore need to be slotted into product timelines.
But today I want to focus on a technique that saved us approximately 10% of our disk space with very little effort beyond existing processes. In short, carefully choosing column order when creating a table can eliminate padding that would otherwise be needed.
This technique isn’t revolutionary: it’s been well-documented by 2ndQuadrant in On Rocks and Sand, EDB in Data Alignment in PostgreSQL, GitLab in Ordering Table Columns in PostgreSQL, the classic “Column Tetris” answer on a StackOverflow question, and I’m sure many more. What I hope we’re bringing to the table is tooling encoding these ideas so that you don’t have to re-invent the wheel (or apply the technique manually).
Below I’ll describe the rules and heuristics we apply to determine an ideal column ordering. But a list of rules sounds a lot like the definition for an algorithm. And that implies a problem space we can tackle at the systems, not people, level. Instead of sending a mass email to every engineer writing database DDL changes and expecting them to remember these rules, we authored a Ruby gem called [pg_column_byte_packer](https://github.com/braintree/pg_column_byte_packer)
to automate the solution in our development cycle. We'll talk more about that soon, but first let's take a more in-depth look a the problem space.
Photo by Pickawood on Unsplash
Data Alignment, Padding, and Waste
PostgreSQL’s heap storage, much like fields in C-language structs, writes columns guaranteeing alignment boundaries. For example, a column having 8-byte alignment is guaranteed to start at a byte index evenly divisible by 8 (zero-indexed). The heap storage engine automatically introduces any padding necessary to maintain this alignment.
We can introspect all kinds of system behavior and objects using PostgreSQL’s catalog tables, and alignment is no exception. Each datatype is listed in pg_catalog.pg_type
, and you can determine the alignment required for any data type in the typalign
column of that catalog table. PostgreSQL's documentation provides an excellent summary of how to interpret this column.
What Can We Do About It?
From a high-level perspective we can minimize the amount of space that will be lost to alignment padding by ordering each table’s columns in descending order of their data type’s alignment.
For example, suppose on our 64-bit system we have a table with two columns: a bigint
column (which requires 8-byte alignment) and an integer
column (which requires 4-byte alignment). If we put the integer
column first we'll have the following data layout that takes up 16 bytes:
However if we put the bigint
column first our data layout will only take up 12 bytes:
But there are a few other cases we want to handle at the same time:
- Variable length data types like
TEXT
can have variable alignment requirements depending on their size. While we obviously can't look at the data when creating the table, we do want to take hints based on length constraints on the column, if any. - Binary (
BYTEA
data type) columns are similarly variable in length (and therefore variable alignment), we assume as a heuristic that binary data is usually "long" length. -
NOT NULL
columns are definitionally more likely to contain data than a random nullable column, so it makes sense to order them earlier (for faster unpacking during reads). - Columns with a
DEFAULT
are more likely to contain data (though slightly less so thanNOT NULL
), so it makes sense to order them earlier (also for faster unpacking during reads). -
PRIMARY KEY
columns not only always have data, but are also often the most frequently accessed columns (because they tend to beJOIN
conditions), so we order then at the beginning of their alignment group.
Bundled Up in a Ruby Gem
I mentioned earlier that we’ve incorporated this set of column ordering rules into our recently open-sourced Ruby Gem [pg_column_byte_packer](https://github.com/braintree/pg_column_byte_packer)
. We implemented two complementary approaches to try to solve the problem holistically.
First, we automatically patch ActiveRecord’s migration code to re-order columns on-the-fly when all of those columns are included in a single create_table
(or, safe_create_table
if you're using our [pg_ha_migrations](https://github.com/braintree/pg_ha_migrations)
gem to maintain uptime guarantees when running migrations!) call.
Second, we provide an API to re-order the columns found in CREATE TABLE
statements in a SQL file generated by PostgreSQL's pg_dump
utility.
Beginning to use the tool in your applications now will immediately benefit new tables. But we didn’t want to stop there, because we have many existing large tables! Of course re-ordering existing columns meant we needed to re-write tables. So we created entirely new databases and applied schemas files updated using the pg_dump
SQL file modification feature described above. Finally we logically replicated all data to these new databases and transparently cutover from the old databases. This is also the foundation of how we achieve zero-downtime major version upgrades in PostgreSQL, but that's a topic for a future post!
We hope many of you will benefit from our work here. And we’d also love to see any ideas you might have for improving it.