How Uber Migrated ETL Workloads from Hive to Spark SQL

Apache Hive™ on Apache Spark™ has been the preferred engine for ETL workloads at Uber. Hive on Spark supports a wide range of use cases across verticals such as compliance, financial reporting, planning, forecasting, fraud, and risk analysis. Before the migration, about 18,000 Hive ETL workflows generated around 5 million queries per month, accounting for a significant percentage of Uber’s total YARN usage. Hive was also used for interactive use cases, handling around 150,000 interactive queries per month.

This blog post describes our migration from Hive to Apache Spark SQL™ and the challenges we faced along the way.

We decided to move from Hive to Spark SQL for two reasons: compute efficiency and modernization. On Spark 3, Spark SQL runs the same query faster than Hive thanks to features like adaptive query execution, dynamic partition pruning, and more. When we compared Spark SQL with Hive, initial workload results showed up to 4x performance benefits from Spark SQL.
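
As a rough illustration of the two features named above, the sketch below lists the Spark SQL configuration keys that control them and renders them as `--conf` arguments for a command line. The helper `to_conf_args` and the idea of passing the settings on the command line are assumptions made for this example, not a description of Uber's setup. In recent Spark 3 releases both settings are already enabled by default, so setting them explicitly here is only for visibility.

```python
# Illustrative only: the two Spark 3 optimizer features mentioned in the text,
# expressed as Spark SQL configuration keys. The helper below is hypothetical.
SPARK3_OPTIMIZER_CONF = {
    "spark.sql.adaptive.enabled": "true",
    "spark.sql.optimizer.dynamicPartitionPruning.enabled": "true",
}


def to_conf_args(conf):
    """Render a config mapping as a flat list of `--conf key=value` arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args.extend(["--conf", f"{key}={value}"])
    return args


if __name__ == "__main__":
    # Print a spark-sql command line carrying the settings.
    print(" ".join(["spark-sql"] + to_conf_args(SPARK3_OPTIMIZER_CONF)))
```

The same keys could equally be set in `spark-defaults.conf` or with a `SET` statement inside a Spark SQL session.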

Spark SQL is also backed by a robust and active OSS community. In contrast, Hive is becoming obsolete in the OSS community. At Uber, we use Hive on Spark, which has been discontinued in OSS since Hive 3.

This change also helps us simplify batch analytics at Uber. Many Uber teams have already shifted Hive workloads to Spark for better efficiency.

To understand where we started, Figure 1 shows the components of and around the Hive ecosystem at Uber...
