How Airbnb built a "wall" to prevent data bugs

Gaining trust in data with extensive data quality, accuracy and anomaly checks

As shared in our Data Quality Initiative post, Airbnb has embarked on a project of massive scale to ensure trustworthy data across the company. To enable employees to make faster decisions with data and provide better support for business metric monitoring, we introduced Midas, an analytical data certification process that certifies all important metrics and data sets. As part of that process, we made robust data quality checks and anomaly detection mandatory requirements to prevent data bugs propagating through the data warehouse. We also created guidelines on which specific data quality checks need to be implemented as part of the data model certification process. Adding data quality checks in the pipeline has become a standard practice in our data engineering workflow, and has helped us detect many critical data quality issues earlier in the pipelines.

In this blog post we will outline the challenges we faced while adding a massive number of data checks (i.e. data quality, accuracy, completeness and anomaly checks) to prevent data bugs company-wide, and how that motivated us to build a new framework to easily add data checks at scale.
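To make the kinds of checks named above concrete, here is a minimal sketch of a completeness check and an anomaly check in plain Python. It is not Airbnb's framework or API: the function names, the thresholds, and the row format (a list of dicts) are all assumptions chosen for illustration, and the anomaly test is a simple z-score rule swapped in for whatever detection method a real pipeline would use.

```python
from statistics import mean, stdev

# Hypothetical helpers for illustration only; names and thresholds are assumptions.

def completeness_check(rows, column, max_null_rate=0.01):
    """Pass if the share of missing values in `column` is at most max_null_rate."""
    if not rows:
        return False  # an empty partition is treated as a failure
    nulls = sum(1 for row in rows if row.get(column) is None)
    return nulls / len(rows) <= max_null_rate


def anomaly_check(history, today, max_z=3.0):
    """Pass if today's value lies within max_z standard deviations of history."""
    if len(history) < 2:
        return True  # not enough history to judge
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return today == mu
    return abs(today - mu) / sigma <= max_z


rows = [{"id": 1, "price": 10.0}, {"id": 2, "price": None}]
daily_row_counts = [100, 102, 98, 101, 99]

id_complete = completeness_check(rows, "id")
price_complete = completeness_check(rows, "price")
normal_day = anomaly_check(daily_row_counts, 100)
spike_day = anomaly_check(daily_row_counts, 500)
```

In a pipeline, a failing check of this kind would typically block the downstream step so that the bad partition never reaches consumers.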

Challenges

When we first introduced the Midas analytical data certification process, we created recommendations on what kind of data quality checks need to be added, but we did not enforce how they were to be implemented. As a result, each data engineering team adopted their own approach, which presented the following challenges:
