Sourcerer: MySQL Ingestion at Scale

In the first blog of this series, we discussed Sourcerer, a data ingestion framework that is part of the Myntra Data Platform (MDP), from a 10,000-foot view and described how it evolved to support large-scale ingestion. In this second blog on Sourcerer, we dive deeper into transactional data ingestion from MySQL databases into MDP.

At Myntra, data from Orders, Logistics, Warehouse Management, Coupons, Returns, Catalog and many other systems is stored in MySQL databases. Analytics on these datasets is critical: it is used to generate several monthly financial reports, optimise delivery (using ML), enhance catalog and inventory management, and more. The first step towards enabling analytics for these functional areas is to make the transactional data available as-is in our data lake, which enables further transactional analytics as well as historical analysis through snapshots and aggregates. This is where Sourcerer comes in.

MySQL ingestion is one of the most prominent Sourcerer flows, ingesting hundreds of tables that amount to terabytes of data per day. We will first walk through the architecture of MySQL ingestion and then go through some of the challenges we faced at Myntra's scale.

Architecture

Sourcerer is responsible for keeping an exact snapshot of the source in the Data Platform. This ingested raw dataset is then cleansed and filtered by different processing jobs and made available to business users in the Data Warehouse for near-real-time or operational/transactional analytics. The architectural diagram is as follows:
