Spinner: Pinterest's Workflow Platform

Ace Haidrey | Software Engineer, Workflow; Ashim Shrestha | Site Reliability Engineer, Workflow; Dinghang Yu | Software Engineer, Workflow; Euccas Chen | Software Engineer, Workflow; Evan Li | Engineering Manager, Workflow; Hannah Chen | Product Manager, Workflow; Yulei Li | Software Engineer, Workflow

This article is a repost from the author’s original account here.

Figure: four large circles in a horizontal line, labeled 10+ clusters, 4K+ total flows, 10K+ daily flow execution, and 38K+ Daily Job Execution.

Workflow Scale at Pinterest Before Migration to Airflow

Since its inception, Pinterest's philosophy has always been centered around data. As a data-driven company, we store all ingested data for further use. That amounts to 600 terabytes of new data every day and over 500 petabytes of data in total. At this scale, big data tooling plays a critical role in enabling the company to gather meaningful insights. This is where the workflow team comes in. We support over 4,000 workflows, which on average produce 10,000 daily flow executions and 38,000 daily job executions.

Background

Back in 2013, Pinterest built an in-house scheduler framework named Pinball. It suited the company's needs at the time, but it could not scale with the growing requirements of serving other products and services, both internal and external. The following limitations became increasingly apparent:

  • Performance:
    – the schedule/job start delay time (the time between when a job is scheduled to begin and when it actually begins) was higher than desired.

  • Scalability:
    – the components of the system are...
