CPU Utilization Is a Mystery
I deal with a lot of servers at work, and one thing everyone wants to know about their servers is how close they are to being at max utilization. It should be easy, right? Just pull up top or another system monitor tool, look at network, memory and CPU utilization, and whichever one is the highest tells you how close you are to the limits.
For example, this machine is at 50% CPU utilization, so it can probably do twice as much of whatever it's doing.
And yet, whenever people actually try to project these numbers, they find that CPU utilization doesn't quite increase linearly. But how bad could it possibly be?
To answer this question, I ran a bunch of stress tests and monitored both how much work they did and what the system-reported CPU utilization was, then graphed the results.
Setup
For my test machine, I used a desktop computer running Ubuntu with a Ryzen 9 5900X (12 core / 24 thread) processor. I also enabled Precision Boost Overdrive (i.e. Turbo).
I vibe-coded a script that runs stress-ng in a loop, first using 24 workers and attempting to run each of them at different utilizations from 1% to 100%, then using 1 to 24 workers all at 100% utilization. It used several different stress testing methods and measured the number of operations that could be completed ("Bogo ops").
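To make the two sweeps concrete, here is a rough sketch of how such a loop could build its stress-ng command lines. This is not the actual script: the function names, the matrixprod method, the 1% step size, and the 10-second timeout are all assumptions picked for illustration. It only builds and prints the commands, so it is safe to run on a machine without stress-ng installed.

```python
TOTAL_THREADS = 24  # hardware threads on the test machine (12 cores / 24 threads)


def stress_ng_cmd(workers, load_pct, method="matrixprod", seconds=10):
    """Build one stress-ng invocation as an argv list.

    The method and duration defaults are assumptions, not the values
    used in the original tests.
    """
    return [
        "stress-ng",
        "--cpu", str(workers),        # number of CPU stressor workers
        "--cpu-load", str(load_pct),  # target load per worker, in percent
        "--cpu-method", method,       # which CPU stress method to run
        "--timeout", f"{seconds}s",   # stop after this long
        "--metrics-brief",            # print bogo ops totals at the end
    ]


def sweep_by_load(method="matrixprod"):
    """First sweep: 24 workers, each targeting 1% through 100% load."""
    return [stress_ng_cmd(TOTAL_THREADS, pct, method) for pct in range(1, 101)]


def sweep_by_workers(method="matrixprod"):
    """Second sweep: 1 through 24 workers, each at 100% load."""
    return [stress_ng_cmd(n, 100, method) for n in range(1, TOTAL_THREADS + 1)]


if __name__ == "__main__":
    # Print a few commands from each sweep instead of executing them.
    for cmd in sweep_by_load()[:2] + sweep_by_workers()[:2]:
        print(" ".join(cmd))
```

To actually run a sweep, each argv list could be passed to subprocess.run and the bogo ops figures parsed from the metrics output, alongside whatever CPU utilization the system reports over the same interval.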
The reason I did two different methods was that operating systems are smart about how they schedule work, and scheduling a small number of workers at 100% utilization can be done optimally (spoilers) but with 24 workers all at 50% util...