Enabling Security for a Hadoop Data Lake on Google Cloud Storage
As part of Uber’s cloud journey, we are migrating our on-prem Apache Hadoop®-based data lake, along with its analytical and machine learning workloads, to the GCP™ infrastructure platform. The strategy involves replacing the storage layer, HDFS, with GCS Object Storage (PaaS) and running the rest of the tech stack (YARN, Apache Hive™, Apache Spark™, Presto®, etc.) on GCP Compute Engine (IaaS).
A typical cloud adoption strategy involves using cloud-native components and integrating the existing IAM system with cloud IAM (e.g., federation, identity sync) (Figure 1.ii). Our strategy is somewhat different: we continue to use the existing stack as is, with the exception of HDFS, and integrate it with GCS (Figure 1.iii). From a security perspective, this introduces technical challenges in two areas:
- Moving to the public cloud requires a different approach to security than on-premises deployments. Hence, we have to develop new IAM controls around the storage of data in GCS during the integration.
- The tech stack being migrated onto GCP IaaS continues to use the Hadoop security model (Kerberos-based authentication, Delegation Tokens, and ACLs). We need to make this work with GCS Object Storage by bridging the differences between the HDFS and GCS (GCP IAM) security models.
Figure 1: Cloud Data Lake IAM Systems compared to Existing and GCP Native Stack.
We have built several systems and integrations to support Uber’s data lake migration to GCP. Fast forward to today, we run over 19% of analytical workloads on GCP. In this article, we will explore deta...