Seamless Istio Upgrades at Scale
Airbnb has been running Istio® at scale since 2019. We support workloads running on both Kubernetes and virtual machines (using Istio’s mesh expansion). Across these two environments, we run tens of thousands of pods, dozens of Kubernetes clusters, and thousands of VMs. These workloads send tens of millions of QPS at peak through Istio. Our IstioCon 2021 talk describes our journey onto Istio and our KubeCon 2021 talk goes into further detail on our architecture.
Istio is a foundational piece of our architecture, which makes ongoing maintenance and upgrades a challenge. Despite that, we have upgraded Istio a total of 14 times. This blog post will explore how the Service Mesh team at Airbnb safely upgrades Istio while maintaining high availability.
Challenges
Airbnb engineers collectively run thousands of different workloads. We cannot reasonably coordinate with every team that owns these workloads, so our upgrades must proceed independently of individual teams. We also cannot monitor every workload at once, so we must minimize risk through gradual rollouts.
With that in mind, we designed our upgrade process with the following goals:
- Zero downtime for workloads and users. This is the seamless part of the upgrade — a workload owner doesn’t need to be in the loop for Istio upgrades.