Modernizing Uber’s Batch Data Infrastructure with Google Cloud Platform

Source: www.uber.com

Summary

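Uber plans to migrate its batch data analytics and machine learning training stack to Google Cloud Platform (GCP). It will use HiveSync and the Hudi library to keep the data lake in sync across two regions, and to replicate data from the on-premises data lake to the cloud data lake and its corresponding Hive Metastore. After the migration, Uber will provision new IaaS on GCP for its YARN and Presto clusters and route traffic to the cloud stack through its existing data access proxies. The migration may run into challenges around performance, cost management, non-analytics/ML applications that use HDFS, and problems not yet known. Uber plans to address these by improving the open-source connectors, taking advantage of cloud elasticity, migrating the other file-storage use cases, and resolving issues proactively as they arise.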

Shared by xiaozi on 2024-05-31

Related topics: #Google #Uber #Apache Hadoop
