Slack AI: The Road to Multi-Cloud

Shaurya Kethireddy, Lead Member of Technical Staff

In early 2023, Slack faced a foundational challenge: serving Large Language Models (LLMs) at enterprise scale with the security, reliability, and performance our customers expect. Over three years, we evolved from running basic infrastructure to orchestrating a sophisticated multi-cloud architecture. We didn't just want shiny new models; we needed a system resilient to regional outages and GPU scarcity. Our journey moved through four distinct phases, shifting from reactive infrastructure management to proactive, multi-vendor orchestration.

Phase 1: The SageMaker Era

When we built the initial stages of Slack AI, AWS SageMaker was the natural starting point. It was a managed ML serving platform that offered what we were looking for: security, FedRAMP compliance, model availability, and control. Using a sophisticated escrow virtual private cloud (VPC) strategy, we established a strict zero-knowledge environment: our data remained private to Slack, and the provider's proprietary model weights remained inaccessible to us.

To maximize uptime for a global user base, we deployed the model-serving containers across multiple AWS regions. This required our teams to manage the full operational lifecycle, including cross-region IAM roles, balanced routing across model endpoints, proactive capacity planning, and auto-scaling logic.
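Two of the duties listed above, balanced routing across regional endpoints and auto-scaling, can be sketched in a few lines of Python. This is a minimal, hypothetical sketch, not Slack's implementation: the endpoint names, traffic weights, and the `pick_endpoint` and `desired_instance_count` helpers are illustrative assumptions. It shows weighted routing that skips unhealthy regions, plus a target-tracking style calculation of how many instances an endpoint needs.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class RegionalEndpoint:
    """One model endpoint in a single region (all names here are hypothetical)."""
    region: str
    endpoint_name: str
    weight: float          # share of traffic this region receives when healthy
    healthy: bool = True   # assumed to be set by an external health check


def pick_endpoint(endpoints, rng=random):
    """Weighted random choice among healthy endpoints.

    Unhealthy regions are skipped, so their share of traffic is
    redistributed proportionally across the remaining regions.
    """
    candidates = [e for e in endpoints if e.healthy and e.weight > 0]
    if not candidates:
        raise RuntimeError("no healthy endpoint available in any region")
    total = sum(e.weight for e in candidates)
    r = rng.uniform(0, total)
    upto = 0.0
    for e in candidates:
        upto += e.weight
        if r <= upto:
            return e
    return candidates[-1]  # guard against floating-point rounding


def desired_instance_count(requests_per_min, target_per_instance, min_count, max_count):
    """Target-tracking style scaling: enough instances to keep each near its target load,
    clamped to the configured minimum and maximum fleet size."""
    needed = math.ceil(requests_per_min / target_per_instance)
    return max(min_count, min(max_count, needed))


if __name__ == "__main__":
    # Hypothetical three-region fleet.
    fleet = [
        RegionalEndpoint("us-east-1", "llm-endpoint-use1", weight=0.5),
        RegionalEndpoint("us-west-2", "llm-endpoint-usw2", weight=0.3),
        RegionalEndpoint("eu-central-1", "llm-endpoint-euc1", weight=0.2),
    ]
    fleet[0].healthy = False  # simulate a regional outage
    chosen = pick_endpoint(fleet, random.Random(0))
    print(chosen.region)  # never "us-east-1" while it is marked unhealthy
    print(desired_instance_count(1250, target_per_instance=100, min_count=2, max_count=20))  # 13
```

Because an unhealthy region is simply dropped from the candidate list, its traffic share flows to the surviving regions in proportion to their weights. In a real deployment the health flags and request rates would come from monitoring rather than being set by hand as they are here.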

The Operational Reality

While SageMaker provided the necessary security, the overhead was immense. We faced three primary taxes: