InnoDB bugs found during research on InnoDB data storage
During the process of researching InnoDB’s storage formats and building the innodb_ruby and innodb_diagrams projects discussed in my series of InnoDB blog posts, Davi Arnaut and I found a number of InnoDB bugs. I thought I’d bring up a few of them, as they are fairly interesting.
These bugs were largely discoverable due to the innodb_space utility making important internal information visible in a way that it had never been visible in the past. Using it to examine production tables provided many leads to go on to find the bugs responsible. When we initially looked at a graphical plot of free space by page produced from innodb_space data, we were quite surprised to see so many pages less than half filled (including many nearly empty). After much research we were able to track down all of the causes for the anomalies we discovered.
Bug #67718: InnoDB drastically under-fills pages in certain conditions
Due to overly-aggressive attempts to optimize page split based on insertion order during insertion, InnoDB could leave pages under-filled with as few as one record in each page. This was observed in several production systems in two cases which I believe could be quite common for others:
- Mostly-increasing keys — Twitter uses Snowflake for ID generation in a distributed way. Overall it’s quite nice. Snowflake generates 64-bit mostly-incrementing IDs that contain a timestamp component. Insertion is typically happening via queues and other non-immediate mechanisms, so IDs will find their way to the database slightly out of order.
- Nearly-ordered keys — Another schema has a Primary Key and Secondary Key which are similarly—but not exactly— ordered. Insertion into a table to copy data in either order ends up nearly ordered by the other key.
Both of these circumstances ended up tripping over this bug and causing drastically under-filled pages to appear in production databases, consuming large amounts of disk space.
Bug #67963: InnoDB wastes 62 out of every 16384 pages
InnoDB needs to occasionally allocate some internal bookkeeping pages; two for every 256 MiB of data. In order to do so, it allocates an extent (64 pages), allocates the two pages it needed, and then adds the remainder of the extent (62 free pages) to a list of extents to be used for single page allocations called FREE_FRAG. Almost nothing allocates pages from that list, so these pages go to waste.
This is fairly subtle, wasting only 0.37% of disk space in any large InnoDB table, but nonetheless interesting and quite fixable.
Bug #68023: InnoDB reserves an excessive amount of disk space for write operations
InnoDB attempts to ensure write operations will always succeed after they’ve reached a certain point by pre-reserving 1% of the tablespace size for the write operation. This is an excessive amount; 1% of every large table in a production system really adds up. This should be capped at some reasonable amount.
Bug #68501: InnoDB fails to merge under-filled pages depending on deletion order
Depending on the order that records are deleted from pages, InnoDB may not merge multiple adjacent under-filled pages together, wasting disk space.
Bug #68545: InnoDB should check left/right pages when target page is full to avoid splitting
During an insertion operation, only one of two outcomes is currently possible:
- The record fits in the target page and is inserted without splitting the page.
- The record does not fit in the target page and the page is then split into two pages, each with half of the records on the original page. After the page is split, the insertion will happen into one of the two resulting pages two pages.
This misses a very common case in practice, when the target page is full but one or more of its adjacent pages have free space or may even be nearly empty. A more intelligent alternative would be to consider merging the adjacent pages in order to make free space on the target page, rather than split the target page, creating a completely new half-full page.
Bug #68546: InnoDB stores unnecessary PKV fields in unique SK non-leaf pages
Non-leaf pages in Secondary Keys need a key that is guaranteed to be unique even though there may be many child pages with the same minimum key value. InnoDB adds all Primary Key fields to the key, but when the Secondary Key is already unique this is unnecessary. For systems with unique Secondary Keys and a large Primary Key, this can add up to a lot of disk space to store the unnecessary fields. Fixing this in a compatible way would be complex, and most users are unaffected, so I’d say it’s unlikely to be fixed.
Bug #68868: Documentation for InnoDB tablespace flags for file format incorrect
As I wrote in How InnoDB accidentally reserved only 1 bit for table format, InnoDB purportedly reserved 6 bits of a field for storing the table format (Antelope, Barracuda, etc.), but due to a bug in the C #defines only reserved 1 bit.