Finding a memory leak in a Go application that uses cgo bindings

In this post, I’d like to share the story of how my team found and fixed a memory leak in a Go app that had been using a leaking C extension through cgo.

Usually, finding a leak in a Go app is fairly straightforward thanks to the built-in profiling tool that ships with Go. With minimal setup, go tool pprof will show you all recent allocations and an overview of the memory heap. Our case turned out to be a lot more interesting.

At work, we have an internal service discovery application written in Go and backed by ZooKeeper. ZooKeeper is great as a distributed configuration store, but its protocol is quite complex, so we put a REST API wrapper written in Go on top of it to make it easy to consume from other apps. This service discovery tool answers questions like “in which datacenter and region does a shop live?” or “to which region should we send a new shop?” or “what’s the list of IPs that are bad citizens?”.

The problem we had been seeing was that the memory consumed by the app grew unexpectedly fast. It would start at ~50 MB of RSS and then grow to >500 MB in a matter of hours, until it was OOM-killed at the container’s memory limit.

This was a typical sawtooth memory leak pattern: memory climbs steadily, drops when the process is killed and restarted, then climbs again.

We could have allowed the container to take more memory and bought more time before it got killed, but that would only treat the symptom. We really wanted to figure out what was wrong with it.

The first thing to do was to add import _ "net/http/pprof" and attach to the profiling port. You could even do that to a production container in ...
