How Kafka Connect helps move data seamlessly

Grab’s real-time data platform team, a.k.a. Coban, has previously written about Plumbing at scale, Optimally scaling Kafka consumer applications, and Exposing Kafka via VPCE. In this article, we cover why it matters to be able to move data in and out of Kafka easily and in a low-code way, and how we achieved this with Kafka Connect.

To build a NoOps managed streaming platform at Grab, the Coban team has:

  • Engineered an ecosystem on top of Apache Kafka.
  • Successfully adopted it in production for both transactional and analytical use cases.
  • Made it a battle-tested, industry-standard platform.

In 2021, the Coban team embarked on a new journey (Kafka Connect) that enables and empowers Grabbers to move data in and out of Apache Kafka seamlessly and conveniently.

Kafka Connect stack in Grab

This is what Coban’s Kafka Connect stack looks like today. Multiple data sources and data sinks, such as MySQL, S3 and Azure Data Explorer, are already supported and running in production.

Kafka Connect stack in Grab

The Coban team has been using Protobuf as the serialisation-deserialisation (SerDes) format in Kafka. Therefore, the role of Confluent schema registry (shown at the top of the figure) is crucial to the Kafka Connect ecosystem, as it serves as the building block for conversions such as Protobuf-to-Avro, Protobuf-to-JSON and Protobuf-to-Parquet.
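
To make the role of the schema registry more concrete, here is a minimal sketch of how a sink connector could be configured to read Protobuf records and write Parquet files to S3. It is an illustration under assumptions, not Coban's actual setup: it assumes the Confluent S3 sink connector and Confluent Protobuf converter are installed on the Connect workers, and the connector name, topic, bucket, region and registry URL are all hypothetical placeholders. The sketch only builds the JSON payload that the Kafka Connect REST API accepts on `POST /connectors`; it does not contact any cluster.

```python
import json


def build_s3_sink_payload(name, topic, bucket, region, schema_registry_url):
    """Build a Kafka Connect REST payload for an S3 sink (illustrative only).

    All argument values are supplied by the caller; nothing here reflects
    Grab's real topics, buckets, or endpoints.
    """
    return {
        "name": name,
        "config": {
            # Assumes the Confluent S3 sink connector plugin is installed.
            "connector.class": "io.confluent.connect.s3.S3SinkConnector",
            "topics": topic,
            "s3.bucket.name": bucket,
            "s3.region": region,
            "storage.class": "io.confluent.connect.s3.storage.S3Storage",
            # Write Parquet files; the schema comes from the converter below.
            "format.class": "io.confluent.connect.s3.format.parquet.ParquetFormat",
            "flush.size": "1000",
            # Deserialise Protobuf values, resolving schemas via the registry.
            "value.converter": "io.confluent.connect.protobuf.ProtobufConverter",
            "value.converter.schema.registry.url": schema_registry_url,
        },
    }


# Hypothetical values, for illustration only.
payload = build_s3_sink_payload(
    name="example-s3-sink",
    topic="example.orders",
    bucket="example-bucket",
    region="ap-southeast-1",
    schema_registry_url="http://schema-registry.example:8081",
)
print(json.dumps(payload, indent=2))
```

The point of the sketch is that the converter, not the connector, is what talks to the schema registry: the connector receives records that already carry a schema, which is what makes a Protobuf-to-Parquet conversion possible without custom code.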

What problems are we trying to solve?

Problem 1: Change Data Capture (CDC)

In a bi...
