Enabling Security for Hadoop Data Lake on Google Cloud Storage

As part of Uber’s cloud journey, we are migrating our on-prem Apache Hadoop®-based data lake, along with analytical and machine learning workloads, to the GCP™ infrastructure platform. The strategy involves replacing the storage layer, HDFS, with GCS Object Storage (PaaS) and running the rest of the tech stack (YARN, Apache Hive™, Apache Spark™, Presto®, etc.) on GCP Compute Engine (IaaS).

A typical cloud adoption strategy involves using cloud-native components and integrating existing IAM with cloud IAM (e.g., federation, identity sync, etc.) (Figure 1.ii). Our strategy is somewhat unique: we continue to use the existing stack as is, except for HDFS, and integrate it with GCS (Figure 1.iii). From a security perspective, this introduces technical challenges in the following two areas:

  1. Moving to the public cloud requires a different approach to security than on-premises deployments. Hence, we have to develop new IAM controls around the storage of data in GCS during the integration.
  2. The tech stack to be migrated onto GCP IaaS continues to use the Hadoop security model (Kerberos-based auth, Delegation Tokens, and ACLs). We need to make this work with GCS Object Storage by bridging the differences between the HDFS and GCS (GCP IAM) security models.

Figure 1: Cloud Data Lake IAM Systems compared to Existing and GCP Native Stack.

We have built several systems and integrations to support Uber’s data lake migration to GCP. Fast forward to today, we run over 19% of analytical workloads on GCP. In this article, we will explore deta...
