Why We Ditched Flink Table API Joins: Cutting State by 75% with DataStream Unions

The beauty of a high-level abstraction is that it lets you focus on the "what" rather than the "how." In Apache Flink, the Table API and SQL embody this convenience: you write a simple join statement, and the query optimizer handles the heavy lifting. It feels like magic, until that magic starts costing you thousands of dollars in AWS bills and crashing your clusters every time a snapshot is triggered.
This is exactly what we faced with our Product Offer Enrichment applications at Zalando. What began as an elegant, declarative solution eventually started crumbling under the weight of its own state. By moving from the "magic" of SQL to the manual control of the DataStream API and a custom MultiStreamJoinProcessor, we cut our state size from 240GB to 56GB, a reduction of roughly 75%.
What follows is a deep dive into why Flink SQL state accumulates, how joins actually work under the hood, and how we rewrote our job to save the applications.
Disclaimer: this article covers Flink 1.20, which is the only Flink version currently (Feb 2026) available on AWS Managed Flink.
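To make the union idea concrete, here is a minimal sketch in plain Java with no Flink dependency. It is not the MultiStreamJoinProcessor from our job: the class name, the Source tags, and the Event type are hypothetical, and a HashMap stands in for what would be keyed state in a real Flink operator. It only illustrates the general shape, assuming inner-join semantics: all inputs are mapped to one tagged event type, merged into a single stream, and a single processor keeps the latest value from each input per key.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class UnionJoinSketch {
    // Hypothetical tags, one per input stream.
    public enum Source { PRODUCT, OFFER, BOOST, SPONSORED }

    // One tagged event type shared by all inputs, so they can be merged into one stream.
    public static final class Event {
        final Source source;
        final String productId;
        final String payload;

        public Event(Source source, String productId, String payload) {
            this.source = source;
            this.productId = productId;
            this.payload = payload;
        }
    }

    // Assumption: this map stands in for keyed state (one slot per source, per product id).
    private final Map<String, EnumMap<Source, String>> latestByKey = new HashMap<>();

    // Stores the latest payload for the event's source and emits an enriched
    // record only once every source has contributed a value for that key.
    public List<String> process(Event event) {
        EnumMap<Source, String> slots = latestByKey.computeIfAbsent(
                event.productId, k -> new EnumMap<Source, String>(Source.class));
        slots.put(event.source, event.payload);

        List<String> out = new ArrayList<>();
        if (slots.size() == Source.values().length) {
            // EnumMap iterates in enum declaration order, so the output layout is stable.
            out.add(event.productId + ":" + String.join("|", slots.values()));
        }
        return out;
    }

    public static void main(String[] args) {
        UnionJoinSketch join = new UnionJoinSketch();
        // The "unioned" stream: events from all four inputs, interleaved.
        List<Event> unioned = List.of(
                new Event(Source.PRODUCT, "p1", "shoe"),
                new Event(Source.OFFER, "p1", "49.99"),
                new Event(Source.BOOST, "p1", "0.8"),
                new Event(Source.SPONSORED, "p1", "no"),
                new Event(Source.OFFER, "p1", "39.99"));
        for (Event e : unioned) {
            for (String enriched : join.process(e)) {
                System.out.println(enriched);
            }
        }
    }
}
```

The point of the sketch is the state layout: each key holds exactly one slot per input, and an update overwrites its slot rather than adding a row. A chain of binary joins, by contrast, generally has to retain both inputs of every join in the chain, intermediate results included.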
The Initial Architecture: The Attraction of SQL
Our Product Offer Enrichment pipeline is a critical piece of the Zalando Search and Browse ecosystem. It is responsible for joining multiple streams of differing speed and "weight" (pricing and stock offers from partners, sorting metadata we call Boost, sponsored-products metadata, and product data) to create a unified, enriched view of what a customer sees on the site when ...