Presto® Express: Speeding up Query Processing with Minimal Resources

Presto® is an open-source, distributed SQL query engine designed for running interactive analytic queries on data sources of any size, from gigabytes to petabytes.

At Uber, Presto is a critical engine for data analytics across various departments. The Operations team relies on it for dashboarding, while Uber Eats and marketing teams use its query results for pricing decisions. Presto is also essential to Uber’s compliance, growth marketing, and ad-hoc data analytics, making it a cornerstone of the company’s data-driven operations.

Figure 1: Uber Presto operational overview.

Uber operates around 20 Presto clusters across over 10,000 nodes in 2 regions, supporting approximately 12,000 weekly active users. These users run about 500,000 queries daily, reading around 100 PB from HDFS. Presto is used to query multiple data sources, including Apache Hive™, Apache Pinot™, MySQL®, and Apache Kafka®, through its extensible data source connectors.

This blog post describes how Uber designed Presto Express to reduce the end-to-end SLA for fast-running Presto queries.

Earlier last year, we observed that Presto queries were slow for several months. As a workaround, we had to add more capacity. The slowness was caused by throttling: to keep Presto clusters from being overloaded, we set concurrency limits that cap the number of queries that can run at the same time on a cluster. This creates a fixed pipe, and all queries have to contend for a spot in that pipe.
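
To make the "fixed pipe" effect concrete, here is a minimal sketch of first-come, first-served admission under a concurrency limit. It is a toy simulation, not Presto's actual admission logic; the `Query` class, the `simulate` function, and the example numbers are all hypothetical and chosen only for illustration.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Query:
    name: str
    arrival: float  # seconds since the start of the simulation
    runtime: float  # seconds of execution once the query is admitted


def simulate(queries, max_concurrent):
    """Return {query name: seconds spent queued} under FIFO admission.

    Hypothetical model: at most `max_concurrent` queries run at once,
    and waiting queries are admitted strictly in arrival order.
    """
    running = []  # min-heap of finish times for admitted queries
    waits = {}
    for q in sorted(queries, key=lambda q: q.arrival):
        # Release slots held by queries that finished before this arrival.
        while running and running[0] <= q.arrival:
            heapq.heappop(running)
        if len(running) < max_concurrent:
            start = q.arrival  # a slot is free, so no queueing
        else:
            start = heapq.heappop(running)  # wait for the next free slot
        waits[q.name] = start - q.arrival
        heapq.heappush(running, start + q.runtime)
    return waits


# Two long queries fill a two-slot pipe; a 5-second query arrives just after.
workload = [
    Query("long_1", arrival=0.0, runtime=600.0),
    Query("long_2", arrival=0.0, runtime=600.0),
    Query("short", arrival=1.0, runtime=5.0),
]
print(simulate(workload, max_concurrent=2))
```

In this toy workload the short query spends far longer queued than executing, because it competes for the same slots as the long-running queries. That is the kind of contention the paragraph above describes.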
