Efficient Resource Management at Pinterest's Batch Processing Platform
Yongjun Zhang | Software Engineer; Ang Zhang | Engineering Manager; Shaowen Wang | Software Engineer, Batch Processing Platform Team
Pinterest’s Batch Processing Platform, Monarch, runs most of the company's batch processing workflows. At the scale shown in Table 1, it is important to manage the platform's resources so that it provides quality of service (QoS) while remaining cost efficient. This article describes how we do that and outlines future work.

Table 1: Scale of Monarch Batch Processing Platform
Introduction of Monarch
Figure 1 shows Pinterest’s data system at a high level. When users use Pinterest applications on their mobile or desktop devices, they generate various logs that are ingested into our system via Singer + Kafka (see Scalable and reliable data ingestion at Pinterest), and the resulting data is stored in S3. The data is then processed and analyzed by various workflows, such as sanitization, analytics, and machine learning data preparation. The results of these workflows are typically stored back in S3. There are essentially two types of processing platforms: batch and streaming. This post is about the batch processing platform, named Monarch. See this blog for more information about the streaming platform.
As an in-house big data platform, Monarch provides the infrastructure, services, and tools to help users develop, build, deploy, and troubleshoot their batch processing applications (mostly in the form of workflows) at scale. Monarch consists of more than 20 Hadoop YARN clusters built entirely in the Cloud utilizing AWS EC2, and we use many different instance types offered by EC2. The actu...