Improving Istio Propagation Delay

by: Ying Zhu

Introduction

In this article, we’ll showcase how we identified and addressed a service mesh performance problem at Airbnb, providing insights into the process of troubleshooting service mesh issues.

Background

At Airbnb, we use a microservices architecture, which requires efficient communication between services. Initially, we developed a homegrown service discovery system called Smartstack exactly for this purpose. As the company grew, however, we encountered scalability issues¹. To address this, in 2019, we invested in a modern service mesh solution called AirMesh, built on the open-source Istio software. Currently, over 90% of our production traffic has been migrated to AirMesh, with plans to complete the migration by 2023.

The Symptom: Increased Propagation Delay

After we upgraded Istio from 1.11 to 1.12, we noticed a puzzling increase in the propagation delay — the time between when the Istio control plane gets notified of a change event and when the change is processed and pushed to a workload. This delay is important for our service owners because they depend on it to make critical routing decisions. For example, servers need to have a graceful shutdown period longer than the propagation delay, otherwise clients can send requests to already-shut-down server workloads and get 503 errors.
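
To make the shutdown constraint concrete, here is a minimal sketch of how a service owner might check a graceful shutdown period against observed propagation delays. The function names, the choice of p90, and the one-second safety margin are illustrative assumptions, not part of AirMesh or Istio.

```python
import math


def percentile(samples, pct):
    """Return the nearest-rank percentile of a non-empty list of numbers."""
    if not samples:
        raise ValueError("samples must be non-empty")
    ordered = sorted(samples)
    # Nearest-rank method: the smallest value with at least pct% of samples at or below it.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def shutdown_period_is_safe(shutdown_seconds, delay_samples, pct=90, margin_seconds=1.0):
    """Check that the shutdown period exceeds the chosen delay percentile plus a margin.

    pct and margin_seconds are hypothetical defaults chosen for illustration.
    """
    return shutdown_seconds > percentile(delay_samples, pct) + margin_seconds


# Hypothetical propagation delay samples, in seconds.
delays = [1.0, 1.2, 1.5, 4.5, 0.8, 1.1, 1.3, 1.4, 0.9, 1.0]
print(shutdown_period_is_safe(5.0, delays))
print(shutdown_period_is_safe(2.0, delays))
```

Note that a p90 threshold still leaves the slowest pushes uncovered, so a stricter percentile or a larger margin may be appropriate when 503 errors during shutdown are costly.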

Data Gathering: Propagation Delay Metrics

Here’s how we discovered the condition: we had been monitoring the Istio metric pilot_proxy_convergence_time for propagation delay when we noticed an increase from 1.5 seconds (p90 in Istio 1.11) to 4.5 s...
