Lessons Learned from Running Apache Airflow at Scale

By Megan Parker and Sam Wheating

Apache Airflow is an orchestration platform that enables development, scheduling and monitoring of workflows. At Shopify, we’ve been running Airflow in production for over two years for a variety of workflows, including data extractions, machine learning model training, Apache Iceberg table maintenance, and DBT-powered data modeling. At the time of writing, we are running Airflow 2.2 on Kubernetes, using the Celery executor and MySQL 8.

Figure: System diagram showing Shopify’s Airflow architecture

Shopify’s usage of Airflow has scaled dramatically over the past two years. In our largest environment, we run over 10,000 DAGs representing a large variety of workloads. This environment averages over 400 tasks running at any given moment and over 150,000 runs executed per day. As adoption increases within Shopify, the load incurred on our Airflow deployments will only increase. As a result of this rapid growth, we have encountered several challenges, including slow file access, insufficient control over DAG (directed acyclic graph) capabilities, irregular levels of traffic, and resource contention between workloads.
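
To put those figures in perspective, here is a rough back-of-the-envelope calculation. It is only a sketch: it assumes that each of the 150,000 daily "runs" is a single task run and that load is spread evenly over the day, neither of which is stated above. Under those assumptions, Little's law (average concurrency = arrival rate × average duration) gives an implied average task duration.

```python
# Back-of-the-envelope load estimate from the figures quoted above.
SECONDS_PER_DAY = 24 * 60 * 60

runs_per_day = 150_000     # "over 150,000 runs executed per day"
concurrent_tasks = 400     # "over 400 tasks running at a given moment"

# Average arrival rate, assuming load is spread evenly over the day.
runs_per_second = runs_per_day / SECONDS_PER_DAY

# Little's law: concurrency = rate * duration  =>  duration = concurrency / rate.
# ASSUMPTION: every "run" is one task run; the article does not say this.
implied_avg_duration_s = concurrent_tasks / runs_per_second

print(f"{runs_per_second:.2f} runs per second on average")
print(f"about {implied_avg_duration_s:.0f} s implied average task duration")
```

Because real traffic is irregular, peak rates will be well above this average, so the numbers are best read as an order-of-magnitude guide rather than a capacity plan.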

Below we’ll share some of the lessons we learned and solutions we built in order to run Airflow at scale.

1. File Access Can Be Slow When Using Cloud Storage

Fast file access is critical to the performance and integrity of an Airflow environment. A well-defined strategy for file access ensures that the scheduler can process DAG files quickly and keep your jobs up-to-date.
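
One simple way to get a feel for file access speed is to time how long it takes to read every Python file in a DAG directory. The sketch below is illustrative only: the function name `time_dag_file_reads` is hypothetical, it measures raw read latency rather than Airflow's own DAG parsing, and it is not a description of the tooling we use.

```python
import tempfile
import time
from pathlib import Path


def time_dag_file_reads(dag_dir):
    """Read every .py file under dag_dir.

    Returns (file_count, total_bytes, elapsed_seconds).
    """
    start = time.perf_counter()
    file_count = 0
    total_bytes = 0
    for path in sorted(Path(dag_dir).rglob("*.py")):
        total_bytes += len(path.read_bytes())
        file_count += 1
    elapsed = time.perf_counter() - start
    return file_count, total_bytes, elapsed


if __name__ == "__main__":
    # Demo against a throwaway directory; point dag_dir at a real DAG
    # folder (for example a mounted cloud storage bucket) to compare.
    with tempfile.TemporaryDirectory() as dag_dir:
        for i in range(3):
            Path(dag_dir, f"example_dag_{i}.py").write_text("# placeholder DAG file\n")
        count, size, seconds = time_dag_file_reads(dag_dir)
        print(f"read {count} files ({size} bytes) in {seconds:.4f} s")
```

Running the same measurement against local disk and against a network-backed mount gives a quick, if crude, comparison of how much slower the scheduler's file reads might be on the latter.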
