Resharding petabytes of data to improve performance for our largest customers

As our customers grow, so does their data volume. A customer that sends us 10 million events per day today might send us 100 million or 1 billion events a year from now. Mixpanel allocates a default amount of resources per customer to ingest, transform, and query their data. Growing data volumes put pressure on these resources, which increases latencies and unit costs and can ultimately degrade the customer experience. In this post, we describe the work we did recently to seamlessly increase resource allocation for our largest customer, which reduced query latencies by 65% and data transformation costs by 30%.

Background

Mixpanel’s database, Arb, splits up customer data into smaller buckets called shards. Data in each shard is stored separately, and processes like ingestion (one-time processing and row-based storage), compaction (conversion to a columnar format), and queries are performed independently on each shard. Arb is a multi-tenant database, and sharding allows us to provide a consistent experience to all customers regardless of their data volume. Customers are given 200 shards by default, but larger customers can be given 800 or even 3200 shards. Each customer has a sharding spec, a piece of metadata describing how many shards they are assigned and where the shards live in Arb.
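
To make the idea of a sharding spec concrete, here is a minimal Python sketch. It is not Arb's actual metadata format, which the post does not show: the `ShardingSpec` class, its field names, the `make_spec` helper, and the "host-N" location labels are all hypothetical. Only the default of 200 shards and the larger sizes of 800 and 3200 come from the text above.

```python
from dataclasses import dataclass, field
from typing import Dict

# Default shard count stated in the post; larger customers may get 800 or 3200.
DEFAULT_NUM_SHARDS = 200


@dataclass
class ShardingSpec:
    """Hypothetical stand-in for a customer's sharding spec."""
    project_id: int
    num_shards: int = DEFAULT_NUM_SHARDS
    # Hypothetical: shard index -> label of the place where that shard lives.
    shard_locations: Dict[int, str] = field(default_factory=dict)

    def location_of(self, shard: int) -> str:
        """Return the location label recorded for one shard."""
        if not 0 <= shard < self.num_shards:
            raise IndexError(f"shard {shard} is outside 0..{self.num_shards - 1}")
        return self.shard_locations[shard]


def make_spec(project_id: int,
              num_shards: int = DEFAULT_NUM_SHARDS,
              num_hosts: int = 4) -> ShardingSpec:
    """Build a spec that spreads shards round-robin over made-up hosts."""
    locations = {s: f"host-{s % num_hosts}" for s in range(num_shards)}
    return ShardingSpec(project_id, num_shards, locations)


default_spec = make_spec(project_id=1)
large_spec = make_spec(project_id=2, num_shards=3200)
print(default_spec.num_shards, large_spec.num_shards)
```

Keeping the shard count and the shard locations in one record means a reader or writer of customer data only needs this single piece of metadata to find every shard, which matches the role the post gives the sharding spec.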

Within a customer’s instance (called a project), data is sharded by user id, meaning that all the data for each customer’s end user will live in the same place, allowing for quick user-based analysis. For example, you can use the following formula to determine which shard each datum shoul...
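
The formula mentioned above is cut off in this copy of the post, so the sketch below does not reproduce it. It shows one common, generic way to shard by user id: hash the id and take the result modulo the shard count. The function name `shard_for_user`, the choice of SHA-256, and the sample events are all assumptions made for illustration, not Mixpanel's implementation.

```python
import hashlib
from collections import defaultdict


def shard_for_user(user_id: str, num_shards: int) -> int:
    """Map a user id to a shard index in [0, num_shards).

    Generic hash-then-modulo placement; an assumed scheme, not Arb's.
    """
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_shards


# Made-up events: (user_id, event_name).
events = [
    ("user-a", "signup"),
    ("user-b", "signup"),
    ("user-a", "purchase"),
]

# Group events by shard; every event of one user lands in the same shard.
by_shard = defaultdict(list)
for user_id, name in events:
    by_shard[shard_for_user(user_id, 200)].append((user_id, name))

for shard, shard_events in sorted(by_shard.items()):
    print(shard, shard_events)
```

Because the shard depends only on the user id and the shard count, all events for a given end user end up together, which is the property the paragraph above relies on for quick user-based analysis.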
