Safely Adding NOT NULL Columns to Your Database Tables

The Database Migrations team is in charge of creating and maintaining all the internal tools for Shopify teams to carry out schema changes safely and with minimal downtime. One of our latest investigations involved using the popular Large Hadron Migrator (LHM) gem to perform schema changes in MySQL databases, after safety-checking that the migrations can be performed without losing any data. In particular, when adding a NOT NULL columns to an existing table.

In this post, I'll share what we learned and our recommendations for doing so to your own database tables.

Defining Schema Change Safety

First, it’s important to understand how schema changes safety is defined.

For this purpose, we should be aware of the procedure LHM uses to execute migrations, as it uses the shadow-table mechanism to ensure there is minimal downtime while the migrations are being carried out. In a nutshell, LHM creates a new table (known as shadow table) with the schema change applied, and sets up triggers on the original table to populate any data related operations (INSERT, UPDATE and DELETE) to the shadow table. Then, it starts copying records, in batches, from the original table to the shadow one. When all records have been copied to the shadow table, LHM automatically renames the two tables and then drop the triggers from the original one.

This procedure ensures there’s minimal downtime (see MySQL rename table limitations), while migrations are taking place, but it introduces a new set of potential problems, as we need to make sure that the batched insertions don’t drop records in the process, which may happen depending on the schema change applied.

Therefore, schema changes are identified as safe if:

After the migration has started: INSERT, UPDATE and DELETE operations targeting the original table can populate data to the shadow one (via MySQL triggers), without crashing. This is known as backward compatibility.
Once the migration has finished: the number of records in the shadow table must be equal to the number of records in the original table (via MySQL triggers and LHM batched insertions).

Research Case: Adding a NOT NULL Column

As stated in the title, this investigation focuses on one specific set of schema changes, those adding NOT NULL constrained columns to a table. Considering how LHM works internally, one can only guess what values are going to be populated to the newly added column, for all the original table records, which lack a value for that column in the first place.

Therefore, the investigation of these schema changes safety will consider the following factors:

The inclusion of a DEFAULT value for the new column definition, in the same migration.
The inclusion of a UNIQUE INDEX for the new column, in the same migration.
The mode the MySQL instance is configured to (strict or non-strict).

Setting Up the Experiment

In order to carry out the investigation, multiple steps are defined to simulate how LHM will carry out the migrations while iterating on the considered factors. For demonstration purposes, let’s assume the migration adds a NOT NULL column called “last_name” to a table named “users” .

1. Initialization: The MySQL mode is set and the original table created. 2. Table creation: Simulate how LHM would apply the migration. 3. Triggers definition: Simulate how LHM would set up MySQL triggers.

Running the Experiment

Once the experiment setup is defined, we can execute data-related SQL operations to check how MySQL triggers will populate them from the original to the shadow table. These statements are dependent on the SQL operation we are testing (INSERT, UPDATE and DELETE).

1. Populate initial data: Create initial records to execute the investigation queries on.

2. Activate the SQL triggers: Run specific queries to activate the LHM simulated triggers.

3. Compare the results: Execute a SELECT query both in the original and in the shadow table, in order to compare their records (both in length and content).

Concatenating all the Experiment Setup section steps alongside these ones, we can determine the safety of adding NOT NULL column schema changes, depending on the factors stated in section 2 ( DEFAULT clause, UNIQUE INDEX existence, and MySQL mode). A complete back-to-back experiment, for INSERT operations, would look like this:

Screenshot showing data loss in a shadow table.

Data loss is shown by missing records in the shadow table on the right.

As the number of records between the original and shadow tables is different, we conclude that performing INSERT operations, when there is a NOT NULL, DEFAULT defined schema-change, that also introduces a UNIQUE INDEX on that column, can produce data loss when the MySQL instance is configured with an strict mode.

Results

Similar to how we built an experiment case in the previous section, we can iterate on the experiment factors (operation type, DEFAULT value inclusion, UNIQUE INDEX presence, and MySQL mode) to build a matrix of schema change safety for all the resulting combinations.

As a reminder, schema change safety is determined by answering two questions:

Will the migration be backwards compatible? In other words, whether INSERT, UPDATE and DELETE operations targeting the original table can populate data to the shadow one (via MySQL triggers), without crashing.
Will the migration introduce data loss? In other words, whether the number of records in the shadow table equals the one in the original table, once the migration has finished.

Column spec	Schema change	MySQL mode	Operation	Backward compatible?	Data loss?
NOT NULL, with DEFAULT value	Not includes UNIQUE INDEX	STRICT_ALL_TABLES	INSERT	Yes	No
UPDATE	Yes	No
DELETE	Yes	No
NO_ENGINE_SUBSTITUTION	INSERT	Yes	No
UPDATE	Yes	No
DELETE	Yes	No
Includes UNIQUE INDEX	STRICT_ALL_TABLES	INSERT	Yes	Yes
UPDATE	Yes	Yes
DELETE	Yes	No
NO_ENGINE_SUBSTITUTION	INSERT	Yes	Yes
UPDATE	Yes	Yes
DELETE	Yes	No
NOT NULL without DEFAULT value	Not includes UNIQUE INDEX	STRICT_ALL_TABLES	INSERT	No	-
UPDATE	No	-
DELETE	No	-
NO_ENGINE_SUBSTITUTION	INSERT	Yes	No*
UPDATE	Yes	No*
DELETE	Yes	No*
Includes UNIQUE INDEX	STRICT_ALL_TABLES	INSERT	No	-
UPDATE	No	-
DELETE	No	-
NO_ENGINE_SUBSTITUTION	INSERT	Yes	Yes
UPDATE	Yes	Yes
DELETE	Yes	No*

* The number of records in the shadow table matches the one in the original table, but an implicit DEFAULT value is chosen for the new column. For this experiment, the value was the empty string (“”), but it will vary depending on the data type (check MySQL implicit defaults).

Conclusions

Considering the matrix of cases from previous section:

**Avoid adding a NOT NULL column without a DEFAULT value.**Schema changes introducing NOT NULL columns must define a DEFAULT value to avoid unexpected results when the migrations are taking place.

In the worst case (when the MySQL instance is configured with a strict mode), the table affected by the migration will break compatibility for existing applications, as previously used data-related operations (INSERT / UPDATE / DELETE) could not be populated to the shadow table, after the migrations starts.

In the best case (when the MySQL instance is configured with a non-strict mode), the records populated from the original to the shadow table, either by the MySQL triggers or LHM batched insertions, will receive an implicit DEFAULT value for the new column, which is probably undesirable.
**Be extremely cautious when adding a UNIQUE INDEX.**The introduction of UNIQUE indexes in schema-changes engines that use the shadow table mechanism to carry out their migrations proved to be dangerous as it can lead to data loss when there are duplicate values, for the index covered columns, prior to the migrations.

It’s recommended that developers check for the existence of duplicates before they add a UNIQUE index on a set of columns.

Acknowledgements

This investigation was completed thanks to the feedback provided by all the DB Migrations team members: Bastian Bartmann, Sergey Fedorov, Anya Zenkina, Xiaoli Liang; and the dedicated guidance of Shuhao Wu.

Sinclert Pérez is a Production Engineer at Shopify. You can connect with him on Twitter and Github.

If building systems from the ground up to solve real-world problems interests you, our Engineering blog has stories about other challenges we have encountered. Visit our Engineering career page to find out about our open positions. Join our remote team and work (almost) anywhere. Learn about how we’re hiring to design the future together—a future that is digital by design.