Airflow Smart Sensor Service
Consolidating long-running, lightweight tasks for improved resource utilization
By: Yingbo Wang, Kevin Yang
Introduction
Airflow is a platform to programmatically author, schedule, and monitor data pipelines. A typical Airflow cluster supports thousands of workflows, called DAGs (directed acyclic graphs), and can have tens of thousands of tasks running concurrently at peak hours. Back in 2018, Airbnb’s Airflow cluster had several thousand DAGs and more than 30 thousand tasks running at the same time. This workload often overloaded Airflow’s database. It also made the cluster expensive, since supporting that many concurrent tasks required a lot of resources.
To make the system more stable and reduce the cost of the cluster, we set out to optimize the Airflow system. We soon found that long-running lightweight (LRLW) tasks wasted a lot of resources, so we proposed the Smart Sensor to consolidate them and address the waste.
Long-Running Lightweight Tasks
When we investigated the Airflow performance issues, we found that a few kinds of tasks shared the same LRLW pattern: sensor tasks, subDAGs, and SparkSubmitOperator tasks.
Sensors, or sensor tasks, are a special kind of operator that keeps running until a certain criterion is met. The criterion can be a file landing in HDFS or S3, a partition appearing in Hive, some other external task succeeding, or even a specific time of day being reached.
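To make the "keeps running until a criterion is met" behavior concrete, here is a minimal standalone sketch of a sensor loop in plain Python. It is not Airflow's implementation. The names run_sensor, SensorTimeout, and file_landed are hypothetical, and a local file check stands in for an HDFS or S3 lookup. The parameter names poke_interval and timeout are borrowed from Airflow's sensor vocabulary for readability only.

```python
import os
import time
from typing import Callable


class SensorTimeout(Exception):
    """Raised when the criterion is not met before the timeout (hypothetical name)."""


def run_sensor(
    poke: Callable[[], bool],
    poke_interval: float = 60.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call poke() repeatedly until it returns True; return the number of pokes.

    Each poke is cheap, but the loop occupies its process for the whole wait,
    which is the long-running lightweight pattern described above.
    """
    deadline = time.monotonic() + timeout
    pokes = 0
    while True:
        pokes += 1
        if poke():
            return pokes
        if time.monotonic() >= deadline:
            raise SensorTimeout(f"criterion not met after {pokes} pokes")
        # Nearly all of the sensor's lifetime is spent idle here.
        sleep(poke_interval)


def file_landed(path: str) -> bool:
    """Example criterion: a local file exists (stand-in for an HDFS/S3 check)."""
    return os.path.exists(path)
```

A call such as run_sensor(lambda: file_landed("/tmp/ready.flag"), poke_interval=5, timeout=60) would block until the file appears or a minute passes. The path is only an illustration.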
Figure 1. The lifespan of a sensor task