Efficient and Reliable Compute Cluster Management at Scale

Source: www.uber.com


Abstract

Uber relies on a containerized microservice architecture. Our need for computational resources has grown significantly over the years as the business has grown, so increasing the efficiency of our computing resources is now an important goal. Broadly speaking, efficiency efforts in compute cluster management involve scheduling more workloads on the same number of machines. This approach is based on the observation that the average CPU utilization of a typical cluster is far lower than the CPU resources allocated to it. The approach we have adopted is to overcommit CPU resources without compromising the reliability of the platform, which we achieve by maintaining a safe headroom at all times. A complementary approach, which we also use, is to reduce the allocations of overprovisioned services. The benefit of overcommitment is that it frees up machines that can then run non-critical, preemptible workloads, without purchasing extra machines.
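The relationship between utilization, headroom, and overcommitment can be illustrated with a small calculation. The sketch below is not Uber's scheduler logic. It is a minimal model under stated assumptions: workloads use at most a fixed fraction of their allocated CPU at peak, and a fixed fraction of physical CPU must always stay free. The function names, parameters, and numbers are all hypothetical.

```python
import math


def max_overcommit_ratio(peak_usage_fraction: float, headroom_fraction: float) -> float:
    """Largest allocated-to-physical CPU ratio that still keeps the headroom free.

    peak_usage_fraction: share of allocated CPU actually used at peak (hypothetical input).
    headroom_fraction: share of physical CPU that must stay unused (hypothetical input).
    A result below 1.0 means there is no room to overcommit at all.
    """
    if not 0 < peak_usage_fraction <= 1:
        raise ValueError("peak_usage_fraction must be in (0, 1]")
    if not 0 <= headroom_fraction < 1:
        raise ValueError("headroom_fraction must be in [0, 1)")
    # Peak usage (u * allocated) must fit inside (1 - h) * physical capacity,
    # so allocated / physical can be at most (1 - h) / u.
    return (1 - headroom_fraction) / peak_usage_fraction


def machines_freed(machine_count: int, overcommit_ratio: float) -> int:
    """Machines no longer needed for the same allocations once overcommitted."""
    if overcommit_ratio <= 1:
        return 0
    still_needed = math.ceil(machine_count / overcommit_ratio)
    return machine_count - still_needed


if __name__ == "__main__":
    # Assumed example: services peak at 50% of their allocation, 25% headroom is kept.
    ratio = max_overcommit_ratio(peak_usage_fraction=0.5, headroom_fraction=0.25)
    print(ratio)                       # 1.5
    print(machines_freed(120, ratio))  # 40
```

In this assumed example, a 120-machine cluster could carry the same allocations on 80 machines, leaving 40 for preemptible work. A real system would need measured utilization and a headroom chosen from reliability requirements rather than fixed constants.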


Shared by xiaozi on 2021-06-23


Related topic: #Uber
