How We Saved 70K Cores Across 30 Mission-Critical Services (Large-Scale, Semi-Automated Go GC Tuning @Uber)

As part of Uber Engineering’s broad efforts to reach profitability, our team recently focused on reducing the cost of compute capacity by improving efficiency. Some of the most impactful work was around GOGC optimization. In this blog post we want to share our experience with a highly effective, low-risk, large-scale, semi-automated Go GC tuning mechanism.
Uber’s tech stack is composed of thousands of microservices, backed by a cloud-native, scheduler-based infrastructure. Most of these services are written in Go. Our team, Maps Production Engineering, previously played an instrumental role in significantly improving the efficiency of multiple Java services by tuning GC. At the beginning of 2021, we explored whether we could have a similar impact on Go-based services. We collected several CPU profiles to assess the current state of affairs and found that GC was the top CPU consumer for the vast majority of mission-critical services. Below are some representative CPU profiles in which GC (identified by the runtime.scanobject function) consumes a significant portion of the allocated compute resources.
Service #1

Figure 1: GC CPU cost of Example Service #1
Service #2

Figure 2: GC CPU cost of Example Service #2
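For readers who want a rough first look at GC cost in their own Go service before collecting full CPU profiles, the following is a minimal sketch. It is not the profiling workflow described above: it reads the runtime's coarse GCCPUFraction counter instead of a CPU profile. The names gcCPUPercent, churn, and sink are hypothetical helpers introduced only for illustration.

```go
package main

import (
	"fmt"
	"runtime"
)

// sink keeps allocations reachable from a global so they land on the heap.
// (Hypothetical helper variable, for illustration only.)
var sink []byte

// gcCPUPercent returns the percentage of the program's available CPU time
// that the runtime attributes to GC since the program started, as reported
// by runtime.MemStats.GCCPUFraction. (Hypothetical helper name.)
func gcCPUPercent() float64 {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.GCCPUFraction * 100
}

// churn allocates short-lived 1 MiB buffers so the collector has work to do
// and returns the total number of bytes requested. (Hypothetical helper name.)
func churn(rounds int) int {
	total := 0
	for i := 0; i < rounds; i++ {
		sink = make([]byte, 1<<20)
		sink[0] = byte(i)
		total += len(sink)
	}
	return total
}

func main() {
	churn(2000)

	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Printf("GC cycles: %d, GC CPU: %.2f%%\n", m.NumGC, gcCPUPercent())
}
```

A counter like this only gives an aggregate figure since process start. A CPU profile, as used for the figures above, is still needed to see where the GC time goes (for example, time under runtime.scanobject).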
Emboldened by this finding, we set out to tune GC for the relevant services. To our delight, Go’s GC implementation and the simplicity of tuning it allowed us to automate the bulk of the detection and tuning mechanism. We detail our approach and its impact in the following sections.
GOGC Tuner
Go runtime inv...