Debugging the PininfoService Deadlock in the Ubuntu18 Upgrade: Part 2 of 2

Solving Engineering Problems as Doing Research

Kangnan Li | Software Engineer, Key Value Systems

unlock deadlock for PininfoService

This is part 2 of a two-part blog series on deep systems debugging techniques, applied to a real-world scenario: upgrading our stateful systems to Ubuntu 18 (U18).

In part 1, we traced the two observed issues, a QPS drop and inconsistent memory usage, to the PininfoService leaf layer. In this article, we narrow the issue down further to GlobalCPUExecutor (GCPU) and eventually to the root cause: a deadlock.

To better understand how requests flow in and out of PininfoService, here is a brief summary of the threads (or pools) in PininfoService, in the order they are used (also refer to Thrift internals to learn how the fbthrift server works):

  • Thrift Acceptor Thread: accepts connections from clients
  • ThriftIOPool: processes data in and out over the established connections between PininfoService and the clients that send it requests
  • ThriftWorkerPool: the thread manager provided in the PininfoService logic to process async_tm_ function calls
  • GlobalCPUExecutor: a global CPU pool to which the heavy lifting is delegated, such as processing the responses from upstream data stores
  • ThriftClientPool: a pool of clients that talk to upstream data stores
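
The hand-offs between these pools can be sketched roughly as follows. This is a minimal Python illustration, not the service's C++/fbthrift code: the pool sizes, the helper names (`fetch_from_upstream`, `process_response`, `handle_request`, `serve`, `run`) and the payload format are all hypothetical, and each stage blocks on the next only to keep the sketch short, whereas a real fbthrift server chains these steps asynchronously.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical pool sizes; the article does not state the real ones.
io_pool = ThreadPoolExecutor(max_workers=2, thread_name_prefix="ThriftIOPool")
worker_pool = ThreadPoolExecutor(max_workers=4, thread_name_prefix="ThriftWorkerPool")
client_pool = ThreadPoolExecutor(max_workers=4, thread_name_prefix="ThriftClientPool")
gcpu = ThreadPoolExecutor(max_workers=2, thread_name_prefix="GlobalCPUExecutor")


def fetch_from_upstream(key):
    """Stand-in for a client call to an upstream data store (client pool)."""
    return {"key": key, "raw": f"payload-for-{key}"}


def process_response(response):
    """Stand-in for the heavy response processing delegated to the CPU pool."""
    return response["raw"].upper()


def handle_request(key):
    """Stand-in for an async_tm_ handler (worker pool)."""
    response = client_pool.submit(fetch_from_upstream, key).result()
    return gcpu.submit(process_response, response).result()


def serve(key):
    """Stand-in for the IO pool reading a request and writing the reply."""
    return worker_pool.submit(handle_request, key).result()


def run(keys):
    """Push a batch of requests through all stages, preserving input order."""
    futures = [io_pool.submit(serve, key) for key in keys]
    return [future.result() for future in futures]


if __name__ == "__main__":
    print(run(["pin1", "pin2"]))
```

Because every request passes through the GlobalCPUExecutor stage, a stall in that one pool would hold up every request behind it, which is why the pool is worth examining on its own.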

We will now dive deeper into how we utilize tools to debug the two issues observed (QPS drop and inconsistent memory usage), with particular focus on the memory issue.

Digging int...