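Middleware and Databases: Kafka
Kafka at Trillion-Message Scale: Troubleshooting a Resource Group Traffic Drop-to-Zero Failure
This article analyzes in detail a typical failure encountered while running Kafka at trillion-message scale. It goes down to the level of Kafka's architecture and internals to identify the root cause of the failure and the corresponding solution.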
Presto® on Apache Kafka® At Uber Scale
Uber’s goal is to ignite opportunity by setting the world in motion, and big data is a very important part of that. Presto® and Apache Kafka® play critical roles in Uber’s big data stack. Presto is the de facto standard for query federation that has been used for interactive queries, near-real-time data analysis, and large-scale data analysis. Kafka is the backbone for data streaming that supports many use cases such as pub/sub, streaming processing, etc. In the following article we will discuss how we have connected these two important services together to enable a lightweight, interactive SQL query directly over Kafka via Presto at Uber scale.
Securing Kafka® Infrastructure at Uber
Uber has one of the largest deployments of Apache Kafka® in the world. It empowers a large number of real-time workflows at Uber, including pub-sub message buses for passing event data from the rider and driver apps, as well as financial transaction events between the backend services. As Kafka forms a critical component of Uber’s core workflows, it is important to secure the data being published and subscribed from the topics to maintain the integrity of the data and to provide an access control mechanism for who can publish/subscribe to a given topic.
How Kafka Connect helps move data seamlessly
Grab’s real-time data platform team (Coban) covers the importance of moving data in and out of Kafka easily and how Kafka Connect helps with that.
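Applying a Kafka-Based Real-Time Data Warehouse to Search
Apache Kafka is a popular message queue middleware with efficient, reliable message processing and a very wide range of applications. This article introduces the practical application of a Kafka-based real-time data warehouse in search.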
Exposing a Kafka Cluster via a VPC Endpoint Service
In large organisations, it is a common practice to isolate the cloud resources of different verticals. Amazon Web Services (AWS) Virtual Private Cloud (VPC) is a convenient way of doing so. At Grab, while our core AWS services reside in a main VPC, a number of Grab Tech Families (TFs) have their own dedicated VPC. One such example is GrabKios. Previously known as “Kudo”, GrabKios was acquired by Grab in 2017 and has always been residing in its own AWS account and dedicated VPC.
In this article, we explore how we exposed an Apache Kafka cluster across multiple Availability Zones (AZs) in Grab’s main VPC, to producers and consumers residing in the GrabKios VPC, via a VPC Endpoint Service. This design is part of the Coban unified stream processing platform at Grab.
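Kafka Message (Storage) Format and Index Organization
“To learn Kafka in depth, it is very important to understand its storage mechanism. This article introduces the format in which Kafka stores messages and how its data files and index files are organized, to help readers better understand how Kafka works.”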
Migrating Kafka transparently between Zookeeper clusters
Learn more about how to migrate your Kafka cluster from one Zookeeper cluster to another without any user impact.
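Master Kafka's Controller-Based Leader Election in 10 Minutes!
The Controller is a core and very important component of Apache Kafka. Its main role is to manage and coordinate the entire Kafka cluster with the help of Apache Zookeeper.
Across the Kafka cluster, a failed or misbehaving Controller may affect message production and consumption, so its state, elections, and logs need comprehensive monitoring.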
Real-Time Exactly-Once Ad Event Processing with Apache Flink and Kafka
Uber recently launched a new capability: Ads on UberEats. With this new ability came new challenges that needed to be solved at Uber, such as systems for ad auctions, bidding, attribution, reporting, and more. This article focuses on how we leveraged open source technology to build Uber’s first “near real-time” exactly-once events processing system. We’ll dive into the details of how we achieved exactly-once processing as well as the inner workings of our event processing jobs.
Enabling Seamless Kafka Async Queuing with Consumer Proxy
Uber has one of the largest deployments of Apache Kafka in the world, processing trillions of messages and multiple petabytes of data per day. As Figure 1 shows, today we position Apache Kafka as a cornerstone of our technology stack. It empowers a large number of different workflows, including pub-sub message buses for passing event data from the rider and driver apps, streaming analytics (e.g., Apache Flink, Apache Samza), streaming database changelogs to the downstream subscribers, and ingesting all sorts of data into Uber’s Apache Hadoop data lake.
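Pitfall Avoidance Guide: A Summary of Approaches for Rapidly Scaling Out a Kafka Cluster
Anyone familiar with Apache Kafka knows that when a Kafka cluster hits a load bottleneck, or a traffic burst calls for emergency scale-out, nodes newly added to the cluster can share the cluster's load only after data migration. Depending on factors such as the volume of accumulated data and node load, that migration can take a long time or even stall entirely. Data migration also adds pressure on the existing nodes and may push the cluster closer to collapse. This article explores technical approaches for cases that need emergency scale-out.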
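A Brief Architecture Analysis of nsq (Youzan fork), kafka, and rocketMq
Message queues are important middleware in distributed systems. Common products today include ActiveMQ, RabbitMQ, ZeroMQ, Kafka, RocketMQ, and NSQ. This article briefly introduces the implementation architecture of three of these message middleware products (nsq, kafka, rocketMq).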
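Kafka at Trillion-Message Scale in Practice
This article summarizes the capabilities needed to keep a kafka cluster running with high availability, high reliability, high performance, high throughput, and security once its traffic reaches trillions of records per day, tens of trillions of records per day, or more.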
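Some Unusual Pitfalls I Hit in Two Years of Using kafka
My previous company built restaurant systems. During the lunch and dinner peaks every day, the system's concurrency was considerable. To be safe, the company required every department to take turns being on duty during mealtimes, so that any production issue could be handled promptly.
At the time I was on the kitchen display system team. That system is downstream of the order system. After a customer picks dishes and places an order, the order system sends a kafka message to our system. Our system reads the message, runs its business logic, persists the order and dish data, and then displays them on the dish check-off client. That way the cooks know which dishes each order needs, and once dishes are ready they can be sent out through the system. The system automatically notifies a waiter to serve them. Once the waiter has served a dish and updated its serving status, the customer knows which dishes have been served and which have not. This system can greatly improve efficiency between the kitchen and the customer.
Next, let me walk through the pitfalls I hit in two years of using kafka.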
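Didi Open-Sources Logi-KafkaManager: A One-Stop Kafka Monitoring and Management Platform
LogI-KafkaManager grew out of years of Kafka operations experience inside Didi. It is a shared multi-tenant Kafka cloud platform built for Kafka users and Kafka operators. It focuses on core scenarios such as Kafka operations and control, monitoring and alerting, and resource governance, and has been tested against large-scale clusters and massive data volumes. With internal satisfaction as high as 90%, it has also reached commercial partnerships with several well-known companies.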