From Python 3.8 to Python 3.10: Our Journey Through a Memory Leak
Image generated with ChatGPT (OpenAI), 2025.
When working with Python, memory management often feels like a solved problem. The garbage collector quietly does its job, and unlike in C or C++, we rarely think about malloc or free. That doesn’t mean Python is free of memory leaks. Reference cycles, unreleased resources such as connection pools, global caches, and similar long-lived state can slowly inflate your process’s memory footprint. You might not notice at first, until your workers start OOM-ing, latency creeps up, or container restarts become mysteriously frequent.
In this post, we’ll share the story of a real-world memory leak we encountered during a Python upgrade: how we discovered it, the tools and techniques we used to investigate, and the lessons we learned.
Back in the summer of 2024, we had an initiative at Lyft to upgrade all of our Python services from 3.8 to 3.10, as Python 3.8 was scheduled to reach end of life (EoL) by the end of 2024. You can find more details on how our awesome Backend Foundations team at Lyft handles Python upgrades at scale across hundreds of repos here. The upgrade had two phases: first, upgrade every dependency to be compatible with Python 3.10; second, upgrade the services themselves to Python 3.10. The dependency upgrades went smoothly for all services, and then the second phase rolled out. Every service ran smoothly on Python 3.10 except one: in the test environment, the upgrade caused a flurry of latency spikes for that service, resulting in timeouts for downstream services.