From Big Data to Better Data: Ensuring Data Quality with Verity

Summary

High-quality data is essential to the success of every data-driven company. It underpins everything from reliable business logic to insightful decision-making and robust machine learning models. Well-developed data platforms are now the norm at tech companies, and they make it easy for engineers to generate, transform, store, and analyze data at petabyte scale. As a result, the quantity of data is no longer the limiting factor. Yet this abundance has come at the cost of quality.

In this post, we will define data quality at a high level and explain our motivation for improving it. We will then introduce Verity, our in-house product, and show how it serves as a central platform for ensuring data quality in our Hive Data Warehouse. In future posts, we will discuss how Verity addresses data quality elsewhere in our data platform.
