What's new with Robinhood, our in-house load balancing service
Robinhood is the internal Dropbox load balancing service we deployed in 2020. It routes all of our internal traffic between servers to balance service load. Before we built Robinhood, most Dropbox services suffered from uneven load distribution among backends. Hardware differences throughout our fleet and the limitations of our prior load balancing algorithms led to reliability issues caused by overloaded instances. To work around this problem, we often had to over-provision a service’s fleet, which inevitably increased our hardware spend: a pricey and avoidable headache.
Robinhood solves these long-standing load balancing problems at Dropbox scale, across our entire global data center footprint. Last year, we introduced the latest iteration of Robinhood: by leveraging proportional–integral–derivative (PID) controllers, Robinhood can now manage load imbalances more quickly and effectively. This has not only improved the reliability of our infrastructure, but also yielded significant hardware cost savings. And with an increase in the AI workloads that power our latest intelligent features, effectively managing demands on our GPU resources is more critical to the business than ever.
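For readers unfamiliar with PID control, here is a minimal textbook sketch in Python. It is not Robinhood's implementation: the class name `PIDController`, the helper `adjust_weight`, the gains, and the choice of utilization as the measured signal are all assumptions made for illustration. The idea it shows is that the controller output combines the current error (proportional), the accumulated error (integral), and the rate of change of the error (derivative).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PIDController:
    """Generic textbook PID controller (hypothetical, for illustration only)."""
    kp: float        # proportional gain
    ki: float        # integral gain
    kd: float        # derivative gain
    setpoint: float  # target value, e.g. the desired utilization
    _integral: float = field(default=0.0, init=False, repr=False)
    _prev_error: Optional[float] = field(default=None, init=False, repr=False)

    def update(self, measurement: float, dt: float) -> float:
        """Return the control output for one time step of length dt."""
        if dt <= 0:
            raise ValueError("dt must be positive")
        error = self.setpoint - measurement
        self._integral += error * dt
        # No derivative term on the first sample: there is no previous error yet.
        if self._prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self._prev_error) / dt
        self._prev_error = error
        return self.kp * error + self.ki * self._integral + self.kd * derivative


def adjust_weight(weight: float, utilization: float,
                  controller: PIDController, dt: float = 1.0) -> float:
    """Hypothetical helper: nudge a backend's routing weight toward the setpoint.

    A backend running hotter than the setpoint yields a negative output,
    so its weight shrinks; weights are clamped at zero.
    """
    return max(0.0, weight + controller.update(utilization, dt))
```

In this sketch, a backend whose utilization sits above the setpoint sees its weight reduced on each step, and one below the setpoint sees it increased. In practice the gains would need tuning for the system being controlled.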
The challenge of load balancing at Dropbox
Our in-house service discovery system can scale to hundreds of thousands of hosts across multiple data centers around the globe. Some Dropbox services have millions of clients; however, we cannot allow each client to create connections to every server instance. This approach puts too much memory pressure on servers, and TLS handshakes during server restarts can...