Apache Hudi™ at Uber: Engineering for Trillion-Record-Scale Data Lake Operations

Uber operates one of the most diverse and demanding data ecosystems in the world. Every trip taken, order delivered, ad served, or real-time arrival time recalculated generates an unending stream of data. These data points come from hundreds of microservices, thousands of cities, and millions of riders, each with its own velocity, shape, and business-criticality. At the heart of this ecosystem lies Uber’s data lake: a multi-hundred-petabyte repository that fuels operational decisions, machine learning models, experimentation platforms, and real-time business intelligence.

Powering this data lake requires far more than storing large volumes of data. Uber’s workloads are characterized by constant mutation, high cardinality, fast-changing schemas, and a relentless requirement for data freshness. Many of Uber’s teams need data that is not only large in scale (trillions of rows in a single dataset) but also accurate to within minutes, not hours.

This combination of unprecedented scale, relentless freshness, and high operational rigor created a unique challenge that existing data lake technologies couldn’t meet at the time. That challenge led to the birth of Apache Hudi™, a data lake storage engine designed and built at Uber. Hudi introduced the industry to a new paradigm: bringing database-like primitives (ACID transactions, indexing, incremental processing) directly to the data lake while retaining its scalability and flexibility. Since then, Hudi has grown into a critical pillar of Uber’s data platform, powering tens of thousands of datasets, handling massive daily ingestion volumes, and providing the consistency and freshness guarantees required by virtually every business line at Uber, from mobility and del...
