Scaling AI/ML Infrastructure at Uber

Machine learning (ML) is now in its eighth year at Uber: we first started using complex rule-based machine learning models for the driver-rider matching and pricing teams in 2016. Since then, we have progressed significantly. Deep learning models now sit at the core of most business-critical applications, and we are actively exploring the possibilities offered by Generative AI models. As the complexity and scale of AI/ML models continue to surge, there’s a growing demand for highly efficient infrastructure to support them. Over the past few years, we’ve strategically implemented a range of infrastructure solutions, both CPU- and GPU-centric, to scale our systems dynamically and cater to the evolving landscape of ML use cases. This evolution has involved tailored hardware SKUs, software library enhancements, integration of diverse distributed training frameworks, and continual refinements to our end-to-end Michelangelo platform. These iterative improvements have been driven by what we learned along the way and by continuous realignment with industry trends and Uber’s trajectory, all aimed at meeting the evolving requirements of our partners and customers.

As we embarked on the transition from on-premise to cloud infrastructure that we announced in February 2023, our HW/SW co-design and cross-team collaboration were driven by two specific objectives:

  1. Maximizing the utilization of current infrastructure
  2. Establishing new systems for emerging workloads, such as Generative AI

In pursuit of these goals, we ou...
