Ensuring Consistency of Continuously Modified Data (ft. Topological Sorting)

At Mixpanel, data trust and reliability are of utmost importance. To roll out code and infrastructure changes safely, we maintain two independent zones of ingestion infrastructure per region. However, two zones alone are not sufficient to ensure safety; we also need a way to detect changes that cause data corruption. We have previously written about ensuring data consistency across replicas for real-time data. This post is a follow-up covering the work the storage infrastructure team at Mixpanel has done to guarantee that all write and modification operations on the data we store are checked for cross-replica consistency.
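A cross-replica consistency check can be sketched as follows. This is a minimal illustration under our own assumptions, not Mixpanel's actual mechanism. Each zone computes an order-independent fingerprint of its stored rows, and the two fingerprints are then compared. The function names (`row_digest`, `replica_fingerprint`, `replicas_consistent`) and the rows-as-dicts representation are hypothetical.

```python
import hashlib
from typing import Iterable, Mapping


def row_digest(row: Mapping[str, object]) -> int:
    """Hash one row (hypothetical helper); fields are sorted so their order does not matter."""
    canonical = repr(sorted(row.items())).encode("utf-8")
    # Keep the first 8 bytes of SHA-256 as a 64-bit integer.
    return int.from_bytes(hashlib.sha256(canonical).digest()[:8], "big")


def replica_fingerprint(rows: Iterable[Mapping[str, object]]) -> tuple[int, int]:
    """Order-independent fingerprint of one replica: (row count, sum of row hashes mod 2**64)."""
    count = 0
    total = 0
    for row in rows:
        count += 1
        # Addition mod 2**64 is commutative, so the order rows are stored in does not matter.
        total = (total + row_digest(row)) % (1 << 64)
    return count, total


def replicas_consistent(zone_a: Iterable[Mapping[str, object]],
                        zone_b: Iterable[Mapping[str, object]]) -> bool:
    """Compare the fingerprints computed independently in each zone."""
    return replica_fingerprint(zone_a) == replica_fingerprint(zone_b)


# Usage: the same rows in a different order match; a missing row does not.
a = [{"event": "signup", "distinct_id": "u1"}, {"event": "login", "distinct_id": "u2"}]
print(replicas_consistent(a, list(reversed(a))))  # True
print(replicas_consistent(a, a[:1]))              # False
```

Summing per-row hashes lets each zone compute its fingerprint in a single streaming pass and compare only two small numbers. The trade-off is that a match is strong evidence of agreement, not a proof: a hash collision could in principle mask a difference.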

The high-level architecture of our data system

Blue boxes represent services, and pink boxes represent databases.

Above is a simplified diagram illustrating the different write (insert or update) operations for Mixpanel’s distributed, multi-tenant database, known as ‘ARB’. In each region (US and EU), we replicate this setup in two clusters in different zones. Each zone is designed to ingest, transform, and store customer data independently of the other zone.

Actual data sent in by the customer and used for analysis within Mixpanel is transformed by our ingestion pipelines, pushed to Kafka, then pulled from Kafka by our Tailers services and stored long-term in GCS. Manifester keeps an index of all files belonging to each customer project and updates metadata such as the start and end Kafka offsets associated with each file. Files created by these tailing operations are periodically closed and compacted, switching them from an append-only format optimized for writes (for real-time data availability) to a columnar format, which is a more...
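A per-project index that records start and end Kafka offsets for each file makes some invariants cheap to check. The sketch below is hypothetical: `FileEntry`, `Manifest`, and `offset_gaps` are illustrative names, not Manifester's real schema. It assumes that all of a project's files come from a single Kafka partition and that end offsets are exclusive. It then verifies that consecutive files cover a contiguous offset range.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FileEntry:
    """One file in a project's manifest (hypothetical schema)."""
    path: str
    start_offset: int  # first Kafka offset in the file (inclusive)
    end_offset: int    # offset just past the last record in the file (exclusive)


@dataclass
class Manifest:
    """Hypothetical index of files per project, assuming one Kafka partition per project."""
    files: dict[int, list[FileEntry]] = field(default_factory=dict)

    def add_file(self, project_id: int, entry: FileEntry) -> None:
        self.files.setdefault(project_id, []).append(entry)

    def offset_gaps(self, project_id: int) -> list[tuple[int, int]]:
        """Return (expected_start, actual_start) wherever a file does not begin
        exactly where the previous file ended (a gap or an overlap)."""
        entries = sorted(self.files.get(project_id, []), key=lambda e: e.start_offset)
        gaps = []
        for prev, cur in zip(entries, entries[1:]):
            if cur.start_offset != prev.end_offset:
                gaps.append((prev.end_offset, cur.start_offset))
        return gaps


# Usage: offsets 250..259 are not covered by any file.
m = Manifest()
m.add_file(42, FileEntry("f1", 0, 100))
m.add_file(42, FileEntry("f2", 100, 250))
m.add_file(42, FileEntry("f3", 260, 300))
print(m.offset_gaps(42))  # [(250, 260)]
```

Sorting by start offset before comparing neighbors keeps the check independent of the order in which files were registered. Reporting the pair of offsets flags gaps (actual start greater than expected) and overlaps (actual start less than expected) alike.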
