Arjun: Myntra's Data Veracity Framework
Big data is generally characterized by three Vs: Volume, Velocity, and Variety. But one more V is getting increasing attention in the big data world: Veracity [1].
What is Veracity?
Veracity is a characteristic of big data that concerns the consistency, accuracy, quality, and trustworthiness of data. Data veracity refers to bias, noise, and abnormality in data, as well as incomplete data and the presence of errors, outliers, and missing values. [2]
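To make the definition concrete, here is a minimal sketch of how missing values and outliers might be surfaced in a single numeric column. It is not part of Arjun: the function name `profile_column`, the sample `order_amounts` data, and the choice of the common 1.5 × IQR rule for outliers are all assumptions made for illustration.

```python
from statistics import quantiles


def profile_column(values):
    """Count missing entries and flag outliers in a numeric column.

    Hypothetical helper for illustration; outliers are flagged with
    the 1.5 * IQR rule, one of several possible conventions.
    """
    present = [v for v in values if v is not None]
    missing = len(values) - len(present)

    # Quartiles of the non-missing values (needs at least two values).
    q1, _, q3 = quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr

    outliers = [v for v in present if v < low or v > high]
    return {"missing": missing, "outliers": outliers}


# Made-up sample data: two missing entries and one suspicious amount.
order_amounts = [499, 520, 510, None, 495, 505, 50000, None, 515]
report = profile_column(order_amounts)
print(report)
```

A check like this only flags candidates; whether a flagged value is a genuine error or a legitimate extreme still needs a domain decision.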
What are the sources of error in data?
- Software or application bugs that incorrectly transform or miscalculate data
- Anomalies and ambiguity in the data
- Human error
Why is it important?
Big data is extremely complex, and how to unlock its full potential is still being worked out. Using big data without validating and explaining it is fruitless; only trustworthy data adds value to an analysis.
Origin of Arjun in Myntra
The Data Platform at Myntra can capture data from different sources, relational as well as non-relational, and store it in a data lake or data warehouse for further analysis. The minimum requirement for any pipeline that bridges source and target is consistency at both the data level and the schema level. Every component in the pipeline should capture data correctly. A data platform should also have its own rigorous checks and balances to become a source of truth for analytical and transactional data. This addresses data disputes and mismatches that surface only after a long time and whose fixes require a lot of manual intervention and developer effort. Here is the tale of creating a framework that can increase trust in the consistency, accuracy, and qual...