Pinterest Tiered Storage for Apache Kafka®️: A Broker-Decoupled Approach

Jeff Xiang | Senior Software Engineer, Logging Platform; Vahid Hashemian | Staff Software Engineer, Logging Platform

When it comes to PubSub solutions, few have achieved higher degrees of ubiquity, community support, and adoption than Apache Kafka®️, which has become the industry standard for data transportation at large scale. At Pinterest, petabytes of data are transported through PubSub pipelines every day, powering foundational systems such as AI training, content safety and relevance, and real-time ad bidding, bringing inspiration to hundreds of millions of Pinners worldwide. Given the continuous growth in PubSub-dependent use cases and organic data volume, it became paramount to scale PubSub storage to meet growing storage demands while lowering the per-unit cost of storage.

Tiered Storage is a design pattern that addresses this problem by offloading data typically stored on broker disk to cheaper remote storage, such as Amazon S3®️. This allows the brokers themselves to keep less data on expensive local disks, reducing the overall storage footprint and cost of PubSub clusters. MemQ is a PubSub solution that takes this pattern to its extreme by keeping all data in object storage, eliminating local disk storage entirely and thereby decoupling storage from compute.
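
The pattern can be illustrated with a minimal in-memory sketch. Everything in it is hypothetical: `Segment`, `TieredLog`, and the purely time-based `local_retention_s` policy are illustrative names, and a plain Python dict stands in for remote object storage such as S3. It shows the two halves of the pattern: segments older than the local retention window are offloaded and removed from local disk, and reads fall back to remote storage for offsets that are no longer held locally.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    base_offset: int    # first offset stored in this segment
    records: list       # one payload per offset
    created_at: float   # time (seconds) at which the segment was rolled

    @property
    def end_offset(self) -> int:
        # Exclusive upper bound of the offsets this segment covers.
        return self.base_offset + len(self.records)

    def get(self, offset: int):
        return self.records[offset - self.base_offset]


@dataclass
class TieredLog:
    """Hypothetical partition log: recent segments on local disk, older ones in remote storage."""
    local_retention_s: float
    local: list = field(default_factory=list)    # stand-in for broker disk
    remote: dict = field(default_factory=dict)   # stand-in for an object store such as S3

    def tier(self, now: float) -> None:
        # Offload segments older than the local retention window, then drop the local copy.
        kept = []
        for seg in self.local:
            if now - seg.created_at > self.local_retention_s:
                self.remote[seg.base_offset] = seg  # "upload" to remote storage
            else:
                kept.append(seg)
        self.local = kept

    def read(self, offset: int):
        # Serve hot data from local disk; fall back to remote storage for older offsets.
        for seg in self.local:
            if seg.base_offset <= offset < seg.end_offset:
                return seg.get(offset)
        for seg in self.remote.values():
            if seg.base_offset <= offset < seg.end_offset:
                return seg.get(offset)
        raise KeyError(f"offset {offset} not found")


log = TieredLog(local_retention_s=3600)
log.local = [Segment(0, ["a", "b"], created_at=0), Segment(2, ["c"], created_at=7000)]
log.tier(now=7200)  # segment 0 is 7200 s old and is offloaded; segment 2 is 200 s old and stays
print(sorted(log.remote), [s.base_offset for s in log.local], log.read(1), log.read(2))
# prints: [0] [2] b c
```

A production system would also need to make the upload, the local deletion, and the metadata recording where each segment lives durable and crash-safe; the sketch deliberately ignores those concerns.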

KIP-405 adopts the Tiered Storage design pattern for open-source Kafka (available in Kafka 3.6.0+). It details a broker-coupled implementation, which natively integrates Tiered Storage functionality into the broker process itself.
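
Because this implementation is broker-coupled, enabling it is largely a matter of broker and topic configuration, with a remote storage plugin loaded into the broker process itself. The Python sketch below assembles settings named in KIP-405 (`remote.log.storage.system.enable`, `remote.log.storage.manager.class.name`, `remote.storage.enable`, `local.retention.ms`). The helper functions, the plugin class name, the retention values, and the local-versus-total retention check are illustrative assumptions; the exact keys and defaults should be verified against the Kafka documentation for the version in use.

```python
def tiered_storage_configs(rsm_class: str, local_retention_ms: int, retention_ms: int):
    """Build broker- and topic-level settings for KIP-405 tiered storage (sketch)."""
    # Assumed sanity check: data should not stay local longer than it is retained overall.
    if local_retention_ms > retention_ms:
        raise ValueError("local.retention.ms must not exceed retention.ms")
    broker = {
        # Turns on the tiered storage subsystem inside the broker process.
        "remote.log.storage.system.enable": "true",
        # Plugin class the broker loads to talk to remote storage (hypothetical name below).
        "remote.log.storage.manager.class.name": rsm_class,
    }
    topic = {
        # Per-topic opt-in: only topics with this flag are tiered.
        "remote.storage.enable": "true",
        # How long segments are kept on broker disk; older data is read from remote storage.
        "local.retention.ms": str(local_retention_ms),
        # Total retention across the local and remote tiers.
        "retention.ms": str(retention_ms),
    }
    return broker, topic


def to_properties(cfg: dict) -> str:
    # Render a config dict in Java .properties style, one key=value per line.
    return "\n".join(f"{key}={value}" for key, value in sorted(cfg.items()))


broker, topic = tiered_storage_configs(
    rsm_class="com.example.S3RemoteStorageManager",  # hypothetical plugin class
    local_retention_ms=6 * 60 * 60 * 1000,            # 6 hours on local disk
    retention_ms=7 * 24 * 60 * 60 * 1000,             # 7 days in total
)
print(to_properties(broker))
print(to_properties(topic))
```

The split between a broker-wide switch and a per-topic opt-in lets operators tier only the topics whose volume justifies it, while the plugin running inside the broker is what makes the design broker-coupled.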
