Middleware and Databases: Kafka

How Can You Use Kafka Better?

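This article covers Kafka best practices from several angles, including consumption, message backlog, stability, contingency planning, and cost control.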

Sina Weibo's Evolution from Kafka to Pulsar

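Sina's existing Kafka clusters mainly handle data from Sina News, Weibo, and other products. The data types include feature logs, order data, ad impressions, and event-tracking, monitoring, and service logs. After passing through the online Kafka cluster, the dedicated advertising cluster, the log cluster, the offline cluster, and the machine learning training cluster, the data is used for production purposes such as recommendation training, landing in HDFS, the offline data warehouse, real-time monitoring, data reporting, and real-time analytics.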

Kafka Load Balancing in Practice at vivo

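Cruise Control is an operations tool for Kafka. Its features include bringing Kafka services online and offline, intra-cluster load balancing, scaling replicas up and down, repairing missing replicas, and demoting nodes.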

Kafka at Trillion-Message Scale: Troubleshooting a Resource Group Traffic Drop-to-Zero Failure

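This article gives a detailed analysis of a typical failure encountered while running Kafka at trillion-message scale. It goes down to the level of Kafka's architecture to analyze the root cause of the failure and the corresponding solution.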

Presto® on Apache Kafka® At Uber Scale

Uber’s goal is to ignite opportunity by setting the world in motion, and big data is a very important part of that. Presto® and Apache Kafka® play critical roles in Uber’s big data stack. Presto is the de facto standard for query federation that has been used for interactive queries, near-real-time data analysis, and large-scale data analysis. Kafka is the backbone for data streaming that supports many use cases such as pub/sub, streaming processing, etc. In the following article we will discuss how we have connected these two important services together to enable a lightweight, interactive SQL query directly over Kafka via Presto at Uber scale.

Securing Kafka® Infrastructure at Uber

Uber has one of the largest deployments of Apache Kafka® in the world. It empowers a large number of real-time workflows at Uber, including pub-sub message buses for passing event data from the rider and driver apps, as well as financial transaction events between the backend services. As Kafka forms a critical component of Uber’s core workflows, it is important to secure the data being published and subscribed from the topics to maintain the integrity of the data and to provide an access control mechanism for who can publish/subscribe to a given topic.

How Kafka Connect helps move data seamlessly

Grab’s real-time data platform team (Coban) covers the importance of moving data in and out of Kafka easily and how Kafka Connect helps with that.

Applying a Kafka-Based Real-Time Data Warehouse in Search

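Apache Kafka is a popular message queue middleware with efficient, reliable message processing and a very wide range of applications. This article describes how a Kafka-based real-time data warehouse is applied in search.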

Exposing a Kafka Cluster via a VPC Endpoint Service

In large organisations, it is a common practice to isolate the cloud resources of different verticals. Amazon Web Services (AWS) Virtual Private Cloud (VPC) is a convenient way of doing so. At Grab, while our core AWS services reside in a main VPC, a number of Grab Tech Families (TFs) have their own dedicated VPC. One such example is GrabKios. Previously known as “Kudo”, GrabKios was acquired by Grab in 2017 and has always been residing in its own AWS account and dedicated VPC.

In this article, we explore how we exposed an Apache Kafka cluster across multiple Availability Zones (AZs) in Grab’s main VPC, to producers and consumers residing in the GrabKios VPC, via a VPC Endpoint Service. This design is part of the Coban unified stream processing platform at Grab.

Kafka Message (Storage) Format and Index Organization

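“To learn Kafka in depth, it is essential to understand its storage mechanism. This article introduces the format in which Kafka stores messages and how its data files and index files are organized, to help you better understand how Kafka works.”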

Migrating Kafka transparently between Zookeeper clusters

Learn more about how to migrate your Kafka cluster from one Zookeeper cluster to another without any user impact.

Master Kafka's Controller-Based Leader Election in 10 Minutes!

The Controller is a core and very important component of Apache Kafka. Its main role is to manage and coordinate the entire Kafka cluster with the help of Apache Zookeeper.

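If the Controller fails or behaves abnormally, production and consumption across the Kafka cluster may be affected. We therefore need comprehensive monitoring of its state, elections, logs, and so on.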

Real-Time Exactly-Once Ad Event Processing with Apache Flink and Kafka

Uber recently launched a new capability: Ads on UberEats. With this new ability came new challenges that needed to be solved at Uber, such as systems for ad auctions, bidding, attribution, reporting, and more. This article focuses on how we leveraged open source technology to build Uber’s first “near real-time” exactly-once events processing system. We’ll dive into the details of how we achieved exactly-once processing as well as the inner workings of our event processing jobs.

Enabling Seamless Kafka Async Queuing with Consumer Proxy

Uber has one of the largest deployments of Apache Kafka in the world, processing trillions of messages and multiple petabytes of data per day. As Figure 1 shows, today we position Apache Kafka as a cornerstone of our technology stack. It empowers a large number of different workflows, including pub-sub message buses for passing event data from the rider and driver apps, streaming analytics (e.g., Apache Flink, Apache Samza), streaming database changelogs to the downstream subscribers, and ingesting all sorts of data into Uber’s Apache Hadoop data lake.

Pitfall Guide: A Summary of Approaches for Rapidly Scaling Out a Kafka Cluster

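Anyone familiar with Apache Kafka knows that when a cluster's load hits a bottleneck, or a traffic spike calls for emergency scale-out, newly added nodes can share the cluster's load only after data has been migrated to them. Migration can take a long time, or even stall entirely, depending on factors such as the size of the data backlog and node load. It also adds pressure to the existing nodes, which may push the cluster closer to collapse. This article explores technical approaches for emergency scale-out.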

nsq (Youzan Fork), kafka, and rocketMq: A Brief Architecture Analysis

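Message queues are an important piece of middleware in distributed systems. Common products today include ActiveMQ, RabbitMQ, ZeroMQ, Kafka, RocketMQ, and NSQ. This article briefly introduces the implementation architecture of three excellent message middleware systems among them (nsq, kafka, rocketMq).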
