DataCentral: Uber’s Big Data Observability and Chargeback Platform

In this blog post, we walk through DataCentral, Uber’s homegrown big data observability, attribution, and governance platform, and give a high-level overview of its key features. Before getting into the what and why of DataCentral, here is a quick primer on Uber’s data ecosystem and its challenges.

Figure 1: Uber’s Big Data Landscape.

Uber’s data infrastructure is composed of a wide variety of compute engines, scheduling and execution solutions, and storage solutions. Compute engines such as Apache Spark™, Presto®, Apache Hive™, Neutrino, and Apache Flink® allow Uber to run petabyte-scale operations on a daily basis. Scheduling and execution engines such as Piper (Uber’s fork of Apache Airflow™), Query Builder (a user platform for executing compute SQL), Query Runner (a proxy layer for executing workloads), and Cadence (a workflow orchestration engine open-sourced by Uber) handle the scheduling and execution of compute workloads. Finally, a significant portion of storage is provided by HDFS, Google Cloud Storage (GCS), AWS S3, Apache Pinot™, Elasticsearch®, and others. Each engine supports thousands of executions, which are owned by multiple owners (uOwn) and sub-teams.

With such a complex and diverse big data landscape, operating at petabyte scale with around a million applications and queries running each day, it is imperative to give stakeholders a holistic view of performance and resource consumption.
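To make the idea of a per-owner view concrete, here is a minimal sketch of rolling up resource consumption by owner and engine. It is not DataCentral’s implementation: the `Execution` record, its fields (`engine`, `owner`, `cpu_hours`), the team names, and the `consumption_by_owner` function are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Execution:
    """One workload run. All fields are hypothetical, not DataCentral's schema."""
    engine: str       # e.g. "spark" or "presto"
    owner: str        # hypothetical owner/team identifier
    cpu_hours: float  # hypothetical resource metric


def consumption_by_owner(executions):
    """Sum cpu_hours per owner, broken down by engine."""
    totals = defaultdict(lambda: defaultdict(float))
    for e in executions:
        totals[e.owner][e.engine] += e.cpu_hours
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {owner: dict(per_engine) for owner, per_engine in totals.items()}


# Made-up sample data, for illustration only.
sample = [
    Execution("spark", "team-a", 12.5),
    Execution("presto", "team-a", 0.5),
    Execution("spark", "team-b", 3.0),
    Execution("spark", "team-a", 7.5),
]
report = consumption_by_owner(sample)
```

A real platform would draw these records from each engine’s own metadata and would likely track more than one resource dimension; the sketch only shows the shape of the aggregation.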
