Building Data Quality in the Enterprise Data Lake

Photo by metamorworks on Shutterstock

This article describes how the Enterprise Data Lake (EDL) team at PayPal built the Rule Execution Framework (REF) to address an enterprise-level opportunity: creating a centralized, generic rule configuration system for defining, managing, controlling, and deploying the data quality framework’s rules and rulesets.

Why We Needed a Rule Execution Framework Team

Teams that are engaged in data transformation at PayPal must meet specific criteria regarding data quality:

  • Guarantee that 100% of the data passed to dependent downstream systems is certified, while providing complete transparency into any exceptions
  • Ensure zero data loss
  • Proactively identify quality issues and remediate them quickly to reduce risk and costs, including reducing the number of critical alerts
  • Improve the quality and accuracy of data
  • Keep data secure

A key element of meeting these requirements is the ability to define and execute data quality rules. A rule contains a formula or expression to perform an operation, such as a validation or a transformation. For data quality rules, data is validated against the rule. If it fails, the failure details are persisted in exception tables.
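As a rough illustration of the idea above (not REF's actual implementation), a validation rule can be modeled as a named expression applied to each record, with failures collected into an exception store. In this minimal sketch, the names `Rule`, `ExceptionRecord`, and `run_rules`, the sample records, and the in-memory list standing in for an exception table are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


@dataclass
class Rule:
    """Hypothetical rule: a name plus an expression that returns True when a record passes."""
    name: str
    check: Callable[[Record], bool]


@dataclass
class ExceptionRecord:
    """Hypothetical row of an exception table: which rule failed, and for which record."""
    rule_name: str
    record: Record


def run_rules(records: List[Record], rules: List[Rule]) -> List[ExceptionRecord]:
    """Validate every record against every rule and collect the failures."""
    exceptions: List[ExceptionRecord] = []
    for record in records:
        for rule in rules:
            if not rule.check(record):
                # A real system would persist this to an exception table;
                # here an in-memory list stands in for it.
                exceptions.append(ExceptionRecord(rule.name, record))
    return exceptions


# Illustrative rules and data (assumptions, not taken from the article).
rules = [
    Rule("amount_non_negative", lambda r: r["amount"] >= 0),
    Rule("currency_present", lambda r: bool(r.get("currency"))),
]
records = [
    {"id": 1, "amount": 10.0, "currency": "USD"},
    {"id": 2, "amount": -5.0, "currency": ""},
]

exceptions = run_rules(records, rules)
for exc in exceptions:
    print(f"rule={exc.rule_name} failed for record id={exc.record['id']}")
```

Keeping the rule definition separate from the execution loop is what would let rules be configured centrally and reused across datasets, which is the kind of capability the article attributes to REF.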

The REF team was formed to build this rule execution framework.

The original requirement for REF was to provide capabilities for managing and executing different types of rules for teams working with data entering the Enterprise Data Lake (EDL). We have since expanded these capabilities to support any enterprise team that needs to manage, control, and deploy the data quality framework’s rules.
