Pinot for Low-Latency Offline Table Analytics
Apache Pinot™ is a real-time OLAP database capable of ingesting data from streams like Apache Kafka® and offline data sources like Apache Hive™. At Uber, Pinot has proven versatile across a wide spectrum of use cases: from real-time use cases with over one million writes per second, 100+ QPS, and <500 ms latency, to use cases that require low-latency analytics on offline data.
Pinot tables fall into three broad categories: real-time, offline, and hybrid. Real-time tables ingest data from streams like Kafka, offline tables accept pre-built “segments” uploaded via the Pinot Controller’s HTTP APIs, and hybrid tables have both real-time and offline parts. A hybrid table lets a single logical table (same name and schema) ingest data from real-time streams as well as batch sources.
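To make the three table categories concrete, here is a minimal sketch in Python of how a hybrid table can be thought of as two table configs that share one logical name. The table name `trips_example` and the helper functions are hypothetical, and the config carries only the `tableName` and `tableType` fields plus empty placeholder sections; a real Pinot table config needs considerably more, and nothing here is taken from Uber's actual setup.

```python
import json

VALID_TYPES = ("OFFLINE", "REALTIME")


def table_config(name: str, table_type: str) -> dict:
    """Build a skeletal, illustrative table config for one table type."""
    if table_type not in VALID_TYPES:
        raise ValueError(f"table_type must be one of {VALID_TYPES}")
    return {
        "tableName": name,          # logical name, shared across both parts
        "tableType": table_type,    # OFFLINE or REALTIME
        # Placeholder sections: left empty in this sketch.
        "segmentsConfig": {},
        "tableIndexConfig": {},
        "tenants": {},
        "metadata": {},
    }


def hybrid_table(name: str) -> list:
    """A hybrid table is an offline part and a real-time part with the same name."""
    return [table_config(name, t) for t in VALID_TYPES]


def typed_name(config: dict) -> str:
    """Physical name of one part: the logical name plus a type suffix."""
    return f"{config['tableName']}_{config['tableType']}"


if __name__ == "__main__":
    for part in hybrid_table("trips_example"):  # hypothetical table name
        print(typed_name(part))
        print(json.dumps(part, indent=2))
```

An offline-only table would simply be the first of the two configs; its data arrives as pre-built segments rather than from a stream.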
This article shares how Uber uses Pinot’s offline tables to serve 100+ low-latency analytics use cases spanning all lines of business.
Uber has a huge data lake with more than 100 PB of data, and we have been using Presto®, Apache Spark™, and Apache Hive™ for almost a decade to serve many of our internal analytics use cases. Presto® in particular handles use cases with low QPS (in the low tens) and latencies on the order of a few seconds well. However, there are many use cases where our users need sub-second p99 latency at a higher QPS. Most users also want dedicated resources for their use cases to avoid noisy neighbors.
Apache Pinot’s ability to run low-...