Incremental Processing using Netflix Maestro and Apache Iceberg

by Jun He, Yingyi Zhang, and Pawan Dixit

Incremental processing is an approach to processing new or changed data in workflows. Its key advantage is that it processes only the data that has been newly added to or updated in a dataset, instead of re-processing the complete dataset. This reduces the cost of compute resources and also significantly reduces execution time. When a workflow execution is shorter, the chances of failure and manual intervention go down. It also improves engineering productivity by simplifying existing pipelines and unlocking new patterns.
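To make the contrast with full re-processing concrete, here is a minimal sketch in plain Python. It is not the Maestro or Iceberg API; the `Row`, `IncrementalSum`, and `full_sum` names and the `seq` change id are hypothetical, and the sketch assumes an append-only dataset in which every row carries a monotonically increasing change id.

```python
from dataclasses import dataclass, field


@dataclass
class Row:
    seq: int    # hypothetical, monotonically increasing change id
    key: str
    value: int


def full_sum(rows):
    """Full re-processing: rebuild every per-key total from scratch."""
    totals = {}
    for row in rows:
        totals[row.key] = totals.get(row.key, 0) + row.value
    return totals


@dataclass
class IncrementalSum:
    """Keeps per-key totals and remembers the last change id it consumed."""
    last_seq: int = 0
    totals: dict = field(default_factory=dict)

    def process(self, rows):
        """Fold only rows newer than last_seq into the totals.

        Returns the number of rows that were actually processed.
        """
        new_rows = [row for row in rows if row.seq > self.last_seq]
        for row in new_rows:
            self.totals[row.key] = self.totals.get(row.key, 0) + row.value
        if new_rows:
            self.last_seq = max(row.seq for row in new_rows)
        return len(new_rows)


dataset = [Row(1, "a", 10), Row(2, "b", 5)]
job = IncrementalSum()
first_run = job.process(dataset)    # consumes both existing rows

dataset.append(Row(3, "a", 7))      # one new row arrives
second_run = job.process(dataset)   # consumes only the new row
```

The second run touches a single row yet ends with the same totals that `full_sum(dataset)` would compute. The sketch still scans the input to find new rows and does not handle updates to existing rows; a real system would rely on the storage layer to surface only the changed data.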

In this blog post, we talk about the landscape and the challenges of workflows at Netflix. We show how we are building a clean and efficient incremental processing solution (IPS) using Netflix Maestro and Apache Iceberg. IPS provides incremental processing support with data accuracy, data freshness, and backfill for users, and it addresses many of the challenges in workflows. IPS lets users keep using their data processing patterns with minimal changes.

Introduction

Netflix relies on data to power its business in all phases. Whether in analyzing A/B tests, optimizing studio production, training algorithms, investing in content acquisition, detecting security breaches, or optimizing payments, well-structured and accurate data is foundational. As our business scales globally, the demand for data is growing and the need for scalable, low-latency incremental processing is beginning to emerge. Dataset owners usually face three common issues.
