Middleware and Databases: Kafka

Setting Up Kafka Multi-Tenancy

Discover how DoorDash implemented a multi-tenancy awareness system for both Kafka producers and consumers.

Introduction to Kafka Tiered Storage at Uber

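Uber proposed a tiered storage design for Kafka to address the scalability, efficiency, and operational cost of Kafka cluster storage. By introducing two storage tiers, local and remote, the design makes storage scalable and long-term retention feasible. The remote tier can use different scalable storage systems and has a longer data retention period. Tiered storage reduces the local storage burden on Kafka brokers and lowers operational cost. The design also introduces components such as RemoteLogManager and RemoteStorageManager, which implement copy, fetch, and delete operations on remote logs and manage the lifecycle of their metadata. The architecture diagrams show how local and remote logs are copied and how remote logs are cleaned up. A follower replica needs to replicate the segments available in the leader's local storage and build auxiliary data before it starts fetching any messages from the leader.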
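The two-tier idea in the summary above can be illustrated with a toy model. This is a minimal sketch, not Uber's or Apache Kafka's implementation: `Segment`, `TieredLog`, `local_max_segments`, and the method names are hypothetical, and retention is counted in segments rather than time or bytes to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    base_offset: int
    records: list  # payloads for offsets base_offset, base_offset + 1, ...


@dataclass
class TieredLog:
    """Toy two-tier log: a short-retention local tier plus a remote tier."""
    local_max_segments: int  # hypothetical local retention, counted in segments
    local: list = field(default_factory=list)
    remote: dict = field(default_factory=dict)  # base_offset -> Segment

    def roll_segment(self, segment: Segment) -> None:
        # A rolled (closed) segment is copied to the remote tier first ...
        self.remote[segment.base_offset] = segment
        self.local.append(segment)
        # ... and only then may local retention drop the oldest local copies.
        while len(self.local) > self.local_max_segments:
            self.local.pop(0)

    def read(self, offset: int):
        # Serve from local storage when possible, otherwise fall back to remote.
        for tier_name, segments in (("local", self.local),
                                    ("remote", list(self.remote.values()))):
            for seg in segments:
                if seg.base_offset <= offset < seg.base_offset + len(seg.records):
                    return tier_name, seg.records[offset - seg.base_offset]
        raise KeyError(f"offset {offset} is not retained in either tier")

    def expire_remote(self, min_offset: int) -> None:
        # Remote cleanup: delete segments that end before min_offset.
        for base in [b for b, s in self.remote.items()
                     if b + len(s.records) <= min_offset]:
            del self.remote[base]
```

In this sketch a read for an old offset succeeds from the remote tier after the local copy has been dropped, which is the property that lets the local disks stay small while retention grows.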

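Apache Kafka in Practice at Zhihu

Zhihu uses Kafka for message notification, log transport, and offline data processing. The challenges were improving stability and maintainability and raising business development efficiency. The article introduces the key techniques for deploying Kafka on k8s, including fixed brok, etc.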

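Xiaohongshu's Cloud-Native Kafka in Depth: Tiered Storage and Elastic Scaling

Storage costs cut by 60% and operations efficiency increased tenfold, building a productized "elastic scaling, pay-as-you-go" model.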

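Qunar Kafka Performance Optimization: Saving 2,000 CPU Cores

Qunar's Kafka log cluster ran into performance problems during Spring Festival load testing, causing backlog on some clients and abnormal data production. The cluster's network idle ratio fell below 0.4, some machines were described as close to idle, and adding machines could not solve the performance problem. Investigation showed that growing data volume and pod scale-out at peak times increased the number of network connections, which hurt performance. Changing the num.io.threads parameter from 32 to 128 tuned Kafka itself, resolved the problem, and saved 2,000 CPU cores. In addition, switching from a single disk to dual disks did not improve the idle ratio.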
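The parameter change described above amounts to a one-line edit in the broker configuration. The helper below is a minimal sketch of applying it to the text of a properties file: `set_property` is a hypothetical name, the `before` excerpt (including `broker.id=1` and `num.network.threads=8`) is invented for illustration, and only the `key=value` form is handled.

```python
def set_property(properties_text: str, key: str, value: str) -> str:
    """Return properties_text with key set to value, appending it if absent."""
    lines, replaced = [], False
    for line in properties_text.splitlines():
        stripped = line.strip()
        is_comment = stripped.startswith(("#", "!"))
        if not is_comment and stripped.split("=", 1)[0].strip() == key:
            lines.append(f"{key}={value}")
            replaced = True
        else:
            lines.append(line)  # keep unrelated lines and comments untouched
    if not replaced:
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


# Hypothetical excerpt of a broker's server.properties before the change.
before = "broker.id=1\nnum.network.threads=8\nnum.io.threads=32\n"
after = set_property(before, "num.io.threads", "128")
```

Whether 128 is a suitable value depends on the workload and hardware; the figure here is simply the one reported in the summary.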

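kafka-go Consumer Code Analysis (Part 1)

This article walks through creating a consumer and consuming data with the kafka-go library in Go, focusing on how rebalancing works and how it is implemented. By analyzing the kafka-go source code, it shows how the heartbeat mechanism is implemented and explains how a consumer pauses consumption and rejoins the consumer group when the coordinator notifies it to rebalance. During a rebalance, consumers stop consuming and partitions are reassigned. Finally, the consumer recreates its goroutines to fetch data. The article is a useful reference for developers who want to consume data with kafka-go in Go.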
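The heartbeat, pause, rejoin, and refetch cycle described above can be sketched as a toy simulation. This is not kafka-go code and it simplifies the real group protocol: `FakeCoordinator` and `ToyConsumer` are hypothetical classes, assignment is a plain round-robin, and a list of partition numbers stands in for the fetch goroutines.

```python
class FakeCoordinator:
    """Hypothetical stand-in for the group coordinator."""

    def __init__(self, partitions):
        self.partitions = list(partitions)
        self.members = []
        self.generation = 0

    def join(self, member_id):
        if member_id not in self.members:
            self.members.append(member_id)
            self.generation += 1  # a membership change starts a new generation
        return self.generation, self.assignment(member_id)

    def assignment(self, member_id):
        # Round-robin the partitions over the current members.
        idx = self.members.index(member_id)
        return self.partitions[idx::len(self.members)]

    def heartbeat(self, member_id, generation):
        # A stale generation tells the member that it has to rebalance.
        if generation == self.generation:
            return "OK"
        return "REBALANCE_IN_PROGRESS"


class ToyConsumer:
    def __init__(self, member_id, coordinator):
        self.member_id = member_id
        self.coordinator = coordinator
        self.fetchers = []  # stands in for the per-partition fetch goroutines
        self.rejoin()

    def rejoin(self):
        self.fetchers = []  # stop consuming before rejoining the group
        self.generation, assigned = self.coordinator.join(self.member_id)
        self.fetchers = list(assigned)  # one new fetcher per assigned partition

    def heartbeat_once(self):
        status = self.coordinator.heartbeat(self.member_id, self.generation)
        if status != "OK":
            self.rejoin()
```

When a second consumer joins, the first one only learns about it on its next heartbeat, at which point it drops its fetchers, rejoins, and resumes with the smaller assignment.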

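Interview Questions Memorized Over the Years: Kafka Edition

This is the Kafka installment of an interview series for engineers. Which Kafka fundamentals do you need to know for an interview? This article covers them in detail.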

Kafka on Kubernetes: Reloaded for fault tolerance

Coban - Grab’s real-time data streaming platform - has been operating Kafka on Kubernetes with Strimzi in production for about two years. In a previous article (Zero trust with Kafka), we explained how we leveraged Strimzi to enhance the security of our data streaming offering.

In this article, we are going to describe how we improved the fault tolerance of our initial design, to the point where we no longer need to intervene if a Kafka broker is unexpectedly terminated.

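Practice and Evolution of Kafka Tiered Storage at Tencent Cloud

This article introduces a series of technical articles on microservices and message queues, including cloud-native API gateway support for WAF object integration, Apache RocketMQ in practice at Tencent Cloud, and a source-code walkthrough of RocketMQ 5.X PopAck. The series covers multiple technical areas and provides related details and practical cases.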

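Kafka Exploration and Practice at Bilibili

Kafka is an important piece of data middleware for every department in our company, used mainly to report, temporarily store, and distribute all kinds of data.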

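Troubleshooting a Data Synchronization Problem with Flink Consuming Kafka

We have a Flink job that consumes data from Kafka and writes it to ES. The logic is very simple, yet data was being lost.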

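Built for Beginners: Getting Started with Kafka in One Article

Kafka is a message queue (MQ) and one of the most commonly used middleware components. Its main characteristics are decoupling, asynchronous processing, and rate limiting/peak shaving.

Like traditional messaging systems (also called message middleware), Kafka provides system decoupling, redundant storage, traffic peak shaving, buffering, asynchronous communication, scalability, and recoverability. In addition, Kafka offers message ordering guarantees and replayable consumption, which most messaging systems find hard to provide.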

Scaling Kafka to Support PayPal’s Data Growth

Apache Kafka is an open-source distributed event streaming platform that is used for data streaming pipelines, integration, and ingestion at PayPal. It supports our most mission-critical applications and ingests trillions of messages per day into the platform, making it one of the most reliable platforms for handling the enormous volumes of data we process every day.

To handle the tremendous growth of PayPal’s streaming data since its introduction, Kafka needed to scale seamlessly while ensuring high availability, fault tolerance, and optimal performance. In this blog post, we will provide a high-level overview of Kafka and discuss the steps taken to achieve high performance at scale while managing operational overhead, and our key learnings and takeaways.

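Inside eBay's Cross-Data-Center High-Availability Solution for Kafka

This article discusses how to design a cross-data-center high-availability solution for Kafka based on a local-aggregation cluster topology, one that also supports the high availability and continuity of upstream and downstream data and services.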

Monitoring Apache Kafka with JMX Exporter and Kafka Exporter

At Mixpanel, we use Apache Kafka to ingest trillions of data points per month. Continuous and reliable monitoring of our Apache Kafka brokers is crucial to avoid any unexpected service degradation or loss of data.

Zero traffic cost for Kafka consumers

Coban, Grab’s real-time data streaming platform team, has been building an ecosystem around Kafka, serving all Grab verticals. Along with stability and performance, one of our priorities is also cost efficiency.

In this article, we explain how the Coban team has substantially reduced Grab’s annual cost for data streaming by enabling Kafka consumers to fetch from the closest replica.
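The selection rule behind fetching from the closest replica can be sketched in a few lines. This is an illustrative model rather than Grab's or Apache Kafka's code: `Replica`, `select_read_replica`, and the `az-*` rack names are hypothetical, and preferring the leader whenever it shares the client's rack is a choice made for this sketch. In Apache Kafka itself, the capability is configured through `broker.rack` and `replica.selector.class` on the brokers and `client.rack` on the consumer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    broker_id: int
    rack: str  # e.g. an availability zone name (hypothetical values below)
    is_leader: bool = False


def select_read_replica(replicas, client_rack):
    """Pick a replica in the client's rack, else fall back to the leader."""
    leader = next(r for r in replicas if r.is_leader)
    if not client_rack:
        return leader  # a client without rack information reads from the leader
    same_rack = [r for r in replicas if r.rack == client_rack]
    if not same_rack or leader in same_rack:
        return leader
    return same_rack[0]


replicas = [
    Replica(broker_id=1, rack="az-a", is_leader=True),
    Replica(broker_id=2, rack="az-b"),
    Replica(broker_id=3, rack="az-c"),
]
```

A consumer in `az-b` would read from broker 2 and avoid cross-zone traffic, while a consumer in a zone with no replica falls back to the leader.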

Copyright © 2011-2024 iteam. Current version is 2.137.1. UTC+08:00, 2024-11-05 12:19
浙ICP备14020137号-1