Crane: Uber's Next-Generation Infrastructure Stack

Uber has been on a multi-year journey to reimagine our infrastructure stack for a hybrid, multi-cloud world. The internal code name for this project is Crane. In this post we’ll examine the original motivation behind Crane, requirements we needed to satisfy, and some key features of our implementation. Finally, we’ll wrap up with some forward-looking views for Uber’s infrastructure.

In the Beginning…

In 2018, Uber was facing three major challenges with respect to our infrastructure:

The size of our server fleet was growing rapidly, and our tooling and teams weren’t able to keep up. Many operations for managing servers were still manual. The automated tooling we did have was constantly breaking down. Both the manual operations and automated tooling were frequent outage culprits. In addition, operational load was taking a severe toll on teams, which meant less time for them to work on fundamental software fixes, leading to a vicious cycle.
Fleet size growth came with the need to expand into more data centers/availability zones. What little tooling existed for turning up new zones was ad hoc, with the vast majority of the work being manual and diffused across many different infrastructure teams. Turning up a new zone took multiple months across dozens of teams and hundreds of engineers. In addition, circular dependencies between infrastructure components often led to awkward bootstrapping problems that were difficult to solve.
Our server fleet consisted mostly of on-prem machines, with limited ability to take advantage of additional capacity that was available in the cloud. W...