Uber’s Journey to Ray on Kubernetes: Resource Management

This is the second blog in a two-part series describing Uber’s journey to Ray® on Kubernetes®. In the first part, we introduced the motivation for this work and the approach we took to set up a Ray-based job management system. In this blog, we zoom in on how we run this job management platform on top of Kubernetes. In particular, we discuss the enhancements we made to Kubernetes so that it can run these Ray-based jobs.

In the world of containerized applications, Kubernetes has emerged as the de facto standard for orchestration. However, as we pushed the boundaries of large-scale, multi-tenant environments, we discovered that Kubernetes’ native resource management capabilities, while robust, leave room for optimization.

In addition, the upstream components described in the first blog post make use of some of the custom abstractions that we built on top of Peloton. We adapted them to work on Kubernetes.

A resource pool is a logical abstraction for a subset of resources in a cluster. All resources in a cluster can be divided into hierarchical resource pools based on organizations and teams. A resource pool can contain child resource pools to further divide the resources within an organization. Resource sharing among pools is elastic: a resource pool with high demand can borrow resources from other pools that are not currently using them.

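As a rough illustration of the hierarchy and elastic sharing described above, here is a minimal Python sketch. It is not Uber's or Peloton's implementation. The `ResourcePool` class, its `reservation` and `demand` fields, the dimension names, the pool names, and the rule of lending unused capacity in proportion to unmet demand are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical dimension names, chosen for this sketch only.
DIMENSIONS = ("cpu", "memory_gb", "disk_gb", "gpu")


@dataclass
class ResourcePool:
    name: str
    reservation: Dict[str, float]  # assumed guaranteed share per dimension
    demand: Dict[str, float] = field(default_factory=dict)  # used by leaf pools only
    children: List["ResourcePool"] = field(default_factory=list)


def total_demand(pool: ResourcePool) -> Dict[str, float]:
    """Demand of a leaf pool, or the summed demand of all its descendants."""
    if not pool.children:
        return {d: pool.demand.get(d, 0.0) for d in DIMENSIONS}
    totals = {d: 0.0 for d in DIMENSIONS}
    for child in pool.children:
        for d, value in total_demand(child).items():
            totals[d] += value
    return totals


def split(siblings: List[ResourcePool], capacity: Dict[str, float]) -> Dict[str, Dict[str, float]]:
    """Divide `capacity` among sibling pools, one dimension at a time.

    Each pool first receives min(demand, reservation). Capacity that is left
    over is then lent to pools with unmet demand, in proportion to that
    unmet demand (an assumed policy, not one taken from the article).
    """
    shares: Dict[str, Dict[str, float]] = {p.name: {} for p in siblings}
    for d in DIMENSIONS:
        demands = {p.name: total_demand(p)[d] for p in siblings}
        granted = {p.name: min(demands[p.name], p.reservation.get(d, 0.0)) for p in siblings}
        leftover = max(capacity.get(d, 0.0) - sum(granted.values()), 0.0)
        unmet = {name: demands[name] - granted[name] for name in granted}
        total_unmet = sum(unmet.values())
        ratio = min(1.0, leftover / total_unmet) if total_unmet > 0 else 0.0
        for name in granted:
            shares[name][d] = granted[name] + unmet[name] * ratio
    return shares


def allocate(pool: ResourcePool, capacity: Dict[str, float],
             out: Optional[Dict[str, Dict[str, float]]] = None) -> Dict[str, Dict[str, float]]:
    """Walk the hierarchy top-down, splitting each pool's share among its children."""
    out = {} if out is None else out
    out[pool.name] = capacity
    if pool.children:
        shares = split(pool.children, capacity)
        for child in pool.children:
            allocate(child, shares[child.name], out)
    return out


# Hypothetical cluster: "maps" uses less than its reservation, "ml" wants more.
cluster = {"cpu": 100.0}
training = ResourcePool("training", {"cpu": 30.0}, demand={"cpu": 70.0})
serving = ResourcePool("serving", {"cpu": 10.0}, demand={"cpu": 20.0})
ml = ResourcePool("ml", {"cpu": 40.0}, children=[training, serving])
maps = ResourcePool("maps", {"cpu": 60.0}, demand={"cpu": 35.0})
root = ResourcePool("root", cluster, children=[maps, ml])

plan = allocate(root, cluster)
print(plan["ml"]["cpu"])        # 65.0: 40 reserved plus 25 borrowed from "maps"
print(plan["training"]["cpu"])  # 50.0
```

In this toy run, `maps` needs only 35 of its 60 reserved CPUs, so the 25 idle CPUs flow to `ml`, which then divides its 65 CPUs between its two child pools by the same rule. A real scheduler would also need to reclaim borrowed resources when the lending pool's demand rises, which this sketch does not model.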

Every resource pool has different resource dimensions, such as those for CPUs, memory, disk size, and GPUs. We expect the number of resource dimensions to increase in the future as cluster management systems beg...
