I/O Observability for Uber's Massive Petabyte-Scale Data Lake

As Uber’s data infrastructure evolves toward a hybrid cloud architecture, understanding data access patterns across our platform is more critical than ever. This data I/O (input/output) observability plays a crucial role in the journey to CloudLake, Uber’s hybrid cloud architecture. As part of the CloudLake migration, Uber is expanding its compute and storage capacity in the cloud while gradually decommissioning on-prem capacity. This raises a new set of problems. First, the network link across service providers is a bottleneck. Second, we want to colocate workloads with the datasets they use for efficient execution, but many experimental workloads have no fixed read pattern, which makes placement hard. Third, efficiently tiering our very large data footprint is crucial for cost reasons, and that requires a heat-map view of I/O patterns. We process millions of Apache Spark™, Presto™, and Anyscale Ray® workloads daily across Apache HDFS™ and cloud object storage, but until recently we had no visibility into how much data each job read or wrote, and from where.

This blog shares how we filled this critical observability gap across storage systems like HDFS, GCS (Google Cloud Storage™), Amazon S3®, and Oracle Cloud Infrastructure®, without requiring any application code changes. The system we built now powers Uber-wide insights into:

  • Cloud provider network egress attribution
  • Cross-zone traffic monitoring
  • Dataset placement for CloudLake