By: Clark Wright
These days, as the volume of data collected by companies grows exponentially, we’re all realizing that more data is not always better. In fact, more data, especially if you can’t rely on its quality, can hinder a company by slowing down decision-making or causing poor decisions.
With 1.4 billion cumulative guest arrivals as of year-end 2022, Airbnb’s growth pushed us to an inflection point where diminishing data quality began to impede our data practitioners. Weekly metric reports were difficult to land on time. Seemingly basic metrics like “Active Listings” relied on a web of upstream dependencies. Conducting meaningful data work required significant institutional knowledge to overcome hidden caveats in our data.
To meet this challenge, we introduced the “Midas” process to certify our data. Starting in 2020, the Midas process, along with the work to re-architect our most critical data models, has brought a dramatic increase in data quality and timeliness to Airbnb’s most critical data. However, achieving the full data quality criteria required by Midas demands significant cross-functional investment to design, develop, validate, and maintain the necessary data assets and documentation.
While this made sense for our most critical data, pursuing such rigorous standards at scale presented challenges. We were approaching a point of diminishing returns on our data quality investments. We had certified our most critical assets, restoring their trustworthiness. Yet for all of our uncertified data, which remained the majority of our offline data, we lacked visibility into its quality and didn’t have c...