Automated Schema Evolution in Pinterest's Next-Generation DB Ingestion Framework
[Pinterest Engineering Blog](https://medium.com/pinterest-engineering?source=post_page---publication_nav-4c5a5f6279b6-36c5c07070de---------------------------------------)
Yisheng Zhou | Software Engineer II
Liang Mou | Sr Staff Software Engineer
Gabriel Raphael Garcia Montoya | Staff Software Engineer
Istvan Podor | Staff Software Engineer

Introduction
In the first post of this series, we introduced Pinterest’s next-generation CDC-based ingestion platform built on Kafka, Flink, Spark, and Iceberg. In production, upstream schemas are constantly evolving, and in a distributed CDC pipeline, schema is not just metadata — it is a cross-system contract spanning ingestion, transformation, storage, and historical backfill. A schema change that is not handled carefully can break Flink jobs, block Spark upserts, or create inconsistencies between online and offline representations.
This post walks through how we make schema evolution safe in practice: the onboarding model it builds on, the changes we support and the tradeoffs we accept, how updates propagate across the stack,...