Introducing Kafka Tiered Storage at Uber
Apache Kafka® is the cornerstone of Uber’s tech stack. It plays an important role in powering several critical use cases and is the foundation for batch and real-time systems at Uber.
Figure 1: Uber’s Data Pipeline.
Kafka stores messages in append-only log segments on the broker’s local storage. Each topic can be configured with a target retention based on size or time. This guarantees that users can consume the data within the retention period or size limit, even when the consuming applications fail or fall behind for any reason. The total storage on a cluster depends on factors such as the total number of topic partitions, produce throughput, and retention configuration. A Kafka broker therefore typically needs large local storage to support the topic partitions it hosts.
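To make the size- and time-based retention described above concrete, here is a minimal sketch that creates a topic with both limits using the Kafka AdminClient. The topic name, partition count, replication factor, bootstrap address, and concrete values are illustrative assumptions, not values from the original post.

```java
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

import java.util.Collections;
import java.util.Map;
import java.util.Properties;

public class CreateTopicWithRetention {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // hypothetical bootstrap address

        try (Admin admin = Admin.create(props)) {
            NewTopic topic = new NewTopic("rider-trip-events", 32, (short) 3) // hypothetical topic
                    .configs(Map.of(
                            // Time-based retention: keep data for 3 days.
                            TopicConfig.RETENTION_MS_CONFIG,
                            String.valueOf(3L * 24 * 60 * 60 * 1000),
                            // Size-based retention: cap each partition at ~100 GiB.
                            TopicConfig.RETENTION_BYTES_CONFIG,
                            String.valueOf(100L * 1024 * 1024 * 1024),
                            // Roll a new append-only log segment every 1 GiB.
                            TopicConfig.SEGMENT_BYTES_CONFIG,
                            String.valueOf(1024L * 1024 * 1024)));

            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}
```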
Kafka cluster storage is typically scaled by adding more broker nodes to the cluster. But this also adds unneeded memory and CPUs, making overall storage less cost-efficient than keeping older data in external storage. A larger cluster with more nodes also increases deployment complexity and operational costs because of the tight coupling of storage and processing. In short, it raises several issues around scalability, efficiency, and operations.
We proposed Kafka Tiered Storage (KIP-405) to avoid the tight coupling of storage and processing in a broker. It provides two tiers of storage, called local and remote, and each tier can have its own retention policy based on the use case.
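The per-tier retention policies can be expressed as topic-level configurations. The sketch below uses the configuration names that eventually shipped with KIP-405 in Apache Kafka 3.6+ (remote.storage.enable, local.retention.ms); the topic name and concrete values are assumptions for illustration only, and the broker must also have remote log storage enabled for these settings to take effect.

```java
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Properties;

public class EnableTieredStorage {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // hypothetical bootstrap address

        try (Admin admin = Admin.create(props)) {
            ConfigResource topic =
                    new ConfigResource(ConfigResource.Type.TOPIC, "rider-trip-events"); // hypothetical topic

            Collection<AlterConfigOp> ops = List.of(
                    // Opt this topic into the remote tier (topic config from KIP-405, Kafka 3.6+).
                    new AlterConfigOp(new ConfigEntry("remote.storage.enable", "true"),
                            AlterConfigOp.OpType.SET),
                    // Local tier: keep only ~6 hours of segments on the broker's disks.
                    new AlterConfigOp(new ConfigEntry("local.retention.ms",
                            String.valueOf(6L * 60 * 60 * 1000)), AlterConfigOp.OpType.SET),
                    // Overall retention across the local and remote tiers: 7 days.
                    new AlterConfigOp(new ConfigEntry("retention.ms",
                            String.valueOf(7L * 24 * 60 * 60 * 1000)), AlterConfigOp.OpType.SET));

            admin.incrementalAlterConfigs(Map.of(topic, ops)).all().get();
        }
    }
}
```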
Figure 2: End t...