Capacity Recommendation Engine: Throughput and Utilization Based Predictive Scaling

Capacity is a key component of reliability. Uber’s services need enough resources to handle daily peak traffic and to support our different business units. These services are deployed across different cloud platforms and data centers (“zones”). Manual capacity management often results in over-provisioned capacity, which is an inefficient use of resources. Uber built an auto-scaling service that manages and adjusts resources for thousands of microservices. Currently, our auto-scaling service is based purely on a utilization metric. We recently built a new system, the Capacity Recommendation Engine (CRE), with a new algorithm that scales on both throughput and utilization using machine learning modeling. The model gives us the relationship between the golden signal metrics and service capacity. With reactive prediction, CRE helps us estimate zonal service capacity based on linear regression modeling and peak traffic estimation. Beyond capacity, the analysis report can also tell us about zonal service characteristics and performance regressions. In this article, we will dive deep into CRE’s modeling and system architecture, and present some analysis of its results.
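
To make the idea of pairing linear regression with a peak traffic estimate concrete, here is a minimal sketch. It is not CRE's actual implementation. It assumes that per-instance utilization grows roughly linearly with per-instance RPS, that a peak RPS figure has already been estimated elsewhere, and that a target utilization is chosen by the service owner. The function names (fit_line, recommend_instances) and all sample numbers are hypothetical.

```python
import math
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def recommend_instances(rps_per_instance, utilization, peak_rps, target_utilization):
    """Hypothetical helper: instance count that keeps the predicted
    utilization at or below target_utilization when traffic hits peak_rps.

    Assumption: utilization = slope * rps_per_instance + intercept.
    """
    slope, intercept = fit_line(rps_per_instance, utilization)
    if slope <= 0 or target_utilization <= intercept:
        raise ValueError("fitted model cannot reach the target utilization")
    # Highest per-instance RPS that stays within the utilization target.
    max_rps_per_instance = (target_utilization - intercept) / slope
    return math.ceil(peak_rps / max_rps_per_instance)


# Made-up observations for one service in one zone.
observed_rps = [10, 20, 30, 40]              # RPS per instance
observed_util = [0.15, 0.25, 0.35, 0.45]     # CPU utilization (fraction)
instances = recommend_instances(observed_rps, observed_util,
                                peak_rps=2900, target_utilization=0.65)
print(instances)
```

With these made-up samples the fitted line is roughly utilization = 0.01 * RPS + 0.05, so each instance can take about 60 RPS at a 65% target, and a 2,900 RPS peak calls for 49 instances. Fitting one model per zone would be one way to compare zonal characteristics, though that is also an assumption here.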

Utilized Metrics

In capacity management, utilization is one of the most widely used metrics for auto-scaling. In CRE, besides utilization, we also consider throughput as another important metric for capacity estimation. Throughput represents the business product requirement. At the service level, it translates to requests per second (RPS) for each instance. Whenever there are new products launching and dependenc...