Kafka on Kubernetes: Reloaded for fault tolerance
Coban, Grab's real-time data streaming platform, has been operating Kafka on Kubernetes with Strimzi in production for about two years. In a previous article (Zero trust with Kafka), we explained how we leveraged Strimzi to enhance the security of our data streaming offering.
In this article, we are going to describe how we improved the fault tolerance of our initial design, to the point where we no longer need to intervene if a Kafka broker is unexpectedly terminated.
Problem statement
We operate Kafka in the AWS Cloud. For the Kafka on Kubernetes design described in this article, we rely on Amazon Elastic Kubernetes Service (EKS), the managed Kubernetes offering by AWS, with the worker nodes deployed as self-managed nodes on Amazon Elastic Compute Cloud (EC2).
To make our operations easier and limit the blast radius of any incidents, we deploy exactly one Kafka cluster for each EKS cluster. We also give a full worker node to each Kafka broker. In terms of storage, we initially relied on EC2 instances with non-volatile memory express (NVMe) instance store volumes for maximal I/O performance. Also, each Kafka cluster is accessible beyond its own Virtual Private Cloud (VPC) via a VPC Endpoint Service.
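To make the layout above more concrete, here is a minimal Python sketch that builds a Strimzi `Kafka` custom resource for a 3-broker cluster. It is an illustration under stated assumptions, not our actual configuration: the cluster name, the storage class `local-nvme`, the volume size, and the use of pod anti-affinity to keep one broker per worker node are all hypothetical choices, and the manifest is deliberately incomplete.

```python
import json


def kafka_manifest(cluster_name: str, brokers: int = 3) -> dict:
    """Build an illustrative Strimzi Kafka custom resource as a plain dict."""
    # Assumption: one way to give each broker its own worker node is a
    # required pod anti-affinity rule keyed on the node hostname.
    anti_affinity = {
        "podAntiAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": [
                {
                    "labelSelector": {
                        "matchExpressions": [
                            {
                                "key": "strimzi.io/name",
                                "operator": "In",
                                "values": [f"{cluster_name}-kafka"],
                            }
                        ]
                    },
                    "topologyKey": "kubernetes.io/hostname",
                }
            ]
        }
    }
    # Assumption: "local-nvme" is a hypothetical storage class that exposes
    # the NVMe instance store volumes; the size is a placeholder.
    storage = {"type": "persistent-claim", "size": "1000Gi", "class": "local-nvme"}
    return {
        "apiVersion": "kafka.strimzi.io/v1beta2",
        "kind": "Kafka",
        "metadata": {"name": cluster_name},
        "spec": {
            "kafka": {
                "replicas": brokers,
                "listeners": [
                    {"name": "tls", "port": 9093, "type": "internal", "tls": True}
                ],
                "storage": storage,
                "template": {"pod": {"affinity": anti_affinity}},
            },
            "zookeeper": {"replicas": 3, "storage": storage},
            "cruiseControl": {},
        },
    }


manifest = kafka_manifest("example-cluster")
print(json.dumps(manifest, indent=2))
```

The printed JSON can be converted to YAML and compared against a real manifest; anything beyond the broker count, dedicated nodes, and instance-store-backed storage described above is a placeholder.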
Fig. 1 Initial design of a 3-node Kafka cluster running on Kubernetes.
Fig. 1 shows a logical view of our initial design of a 3-node Kafka on Kubernetes cluster, as typically run by Coban. The Zookeeper and Cruise-Control compon...