LyftLearn Evolution: Rethinking ML Platform Architecture

Written by Yaroslav Yatsiuk

At Lyft, machine learning (ML) is the engine behind our most critical business functions — from dispatch and pricing optimization to fraud detection and support automation. Our ML infrastructure serves thousands of production models making hundreds of millions of real-time predictions per day, supported by thousands of daily training jobs that keep ML models fresh and accurate.

As our scale grew, we faced a classic engineering challenge: the very complexity that powered our platform was becoming a bottleneck to its future growth. We needed to answer a fundamental question: How could we evolve our platform to accelerate innovation for our users while simplifying its underlying architecture?

This post explores how we rethought LyftLearn’s architecture to solve this problem. We’ll walk through our transition from a fully Kubernetes-based system to a hybrid platform, combining the simplicity of managed compute on AWS SageMaker for offline workloads with the flexibility of Kubernetes for online model serving. Afterwards, we’ll share the key technical decisions and trade-offs that made this evolution possible.
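
The core idea of the hybrid design is a split by workload type: offline work (such as the daily training jobs) runs on managed SageMaker compute, while online model serving stays on Kubernetes. The sketch below only illustrates that routing rule. The names `WorkloadKind`, `Backend`, `Workload`, and `route` are hypothetical and do not come from LyftLearn's actual code or from any AWS or Kubernetes API.

```python
from dataclasses import dataclass
from enum import Enum


class WorkloadKind(Enum):
    # Hypothetical categories mirroring the offline/online split described above.
    OFFLINE = "offline"                # e.g. training jobs
    ONLINE_SERVING = "online_serving"  # real-time model predictions


class Backend(Enum):
    SAGEMAKER = "aws_sagemaker"  # managed compute for offline workloads
    KUBERNETES = "kubernetes"    # flexible infrastructure for online serving


# Static routing table: each workload kind maps to exactly one backend.
ROUTES = {
    WorkloadKind.OFFLINE: Backend.SAGEMAKER,
    WorkloadKind.ONLINE_SERVING: Backend.KUBERNETES,
}


@dataclass(frozen=True)
class Workload:
    name: str
    kind: WorkloadKind


def route(workload: Workload) -> Backend:
    """Return the backend a workload would run on under the hybrid split."""
    return ROUTES[workload.kind]


if __name__ == "__main__":
    for w in (Workload("daily-training", WorkloadKind.OFFLINE),
              Workload("realtime-inference", WorkloadKind.ONLINE_SERVING)):
        print(w.name, "->", route(w).value)
```

Keeping the mapping in one table makes the architectural boundary explicit: moving a workload class between backends is a one-line change rather than logic scattered across callers.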

LyftLearn Overview

LyftLearn is Lyft’s end-to-end machine learning platform, managing the complete ML lifecycle from model development to production serving. Built to support hundreds of data scientists and ML engineers, it handles the full spectrum of ML workloads at scale. The platform is composed of three integrated products:

Figure 1: LyftLearn Components

LyftLearn Compute (Offline S...