Ray Infrastructure at Pinterest

Chia-Wei Chen; Sr. Software Engineer | Raymond Lee; Sr. Software Engineer | Alex Wang; Software Engineer I | Saurabh Vishwas Joshi; Sr. Staff Software Engineer | Karthik Anantha Padmanabhan; Sr. Manager, Engineering | Se Won Jang; Sr. Manager, Engineering

The Journey of our Ray Infrastructure

In Part 1 of our blog series, we discussed why we were motivated to invest in Ray to solve critical business problems. In this post, we go one step further and describe what it takes to integrate Ray into a web-scale company like Pinterest, where various unique constraints and challenges come with embracing new technologies. This is a more comprehensive version of the Ray Infrastructure part of our talk Last Mile Data Processing for ML Training using Ray at Ray Summit 2023.

In our use case, being able to provision a Ray cluster the way KubeRay does is only part of having a mature Ray infrastructure. Companies also need to follow the other best practices suggested by Ray and meet their own specific requirements, including log and metrics persistence, network isolation, identifying optimal hardware instances, security, traffic settings, and miscellaneous internal service integrations.

The journey began in 2023 when one full-time engineer dedicated 50% of their time to this project:

  • 2023 Q1: Prototyping stage was initiated with assistance from our partners at Anyscale
  • 2023 Q2: Ray Infra MVP was completed, including essential tools such as logging, m...