Where's My Data: A Unique Encounter with Flink Streaming's Kinesis Connector

For years now, Lyft has not only been a proponent of but also a contributor to Apache Flink. Lyft’s pipelines have evolved drastically over the years, yet, time and time again, we run into unique cases that stretch Flink to its breaking points — this is one of those times.

Context

While Lyft runs many streaming applications, the one specifically in question is a persistence job. Simply put, it streams data from Kinesis, performs some level of serializations and transformations, and writes to S3 every few minutes.

Flink pipeline for persisting data from Kinesis to S3.

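The shape of such a job can be sketched in a few lines of plain Python. This is only an illustrative stand-in, not Flink code and not Lyft's implementation: the `PersistenceJob` class, the `run` helper, the 180-second flush interval, the JSON-lines output format, and the `events/window=...` key layout are all assumptions made for the example. It also assumes records arrive in timestamp order.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical flush interval; the article only says "every few minutes".
FLUSH_INTERVAL_SECONDS = 180


@dataclass
class PersistenceJob:
    """Toy stand-in for the persistence job: transform, buffer, flush in batches."""

    sink: Callable[[str, bytes], None]  # stand-in for an object-store put(key, body)
    flush_interval: int = FLUSH_INTERVAL_SECONDS
    _buffer: List[bytes] = field(default_factory=list, init=False)
    _window_start: int = field(default=-1, init=False)

    def process(self, timestamp: int, event: Dict) -> None:
        # Align the record's timestamp to the start of its flush window.
        window = timestamp - (timestamp % self.flush_interval)
        if self._window_start == -1:
            self._window_start = window
        elif window != self._window_start:
            # A new window has begun: write out the previous one first.
            self.flush()
            self._window_start = window
        # "Serialization and transformation" step: here, just JSON encoding.
        self._buffer.append(json.dumps(event, sort_keys=True).encode("utf-8"))

    def flush(self) -> None:
        # Write the buffered records as one object and reset the buffer.
        if not self._buffer:
            return
        key = f"events/window={self._window_start}.jsonl"
        self.sink(key, b"\n".join(self._buffer))
        self._buffer = []


def run(records: Iterable[Tuple[int, Dict]], sink: Callable[[str, bytes], None]) -> None:
    """Feed (timestamp_seconds, event) pairs through the job, then flush the tail."""
    job = PersistenceJob(sink=sink)
    for timestamp, event in records:
        job.process(timestamp, event)
    job.flush()
```

Passing a dictionary-backed sink, for example `run(records, lambda key, body: store.__setitem__(key, body))`, makes it easy to inspect which objects would have been written for each window.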

In this case, the job persists the large majority of events generated at Lyft. It ingests about 80 gigabytes per minute on average and runs at a parallelism of 1800, which makes it one of Lyft’s largest streaming jobs.

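To put those figures in perspective, here is a quick back-of-the-envelope calculation. It assumes decimal gigabytes and a perfectly even spread of load across all 1800 parallel subtasks, neither of which the article states. Under those assumptions the job moves roughly 1.33 GB/s in aggregate, or about 0.74 MB/s per subtask.

```python
TOTAL_GB_PER_MINUTE = 80   # average rate quoted in the article
PARALLELISM = 1800         # parallelism quoted in the article
BYTES_PER_GB = 10**9       # assumption: decimal gigabytes


def per_subtask_bytes_per_second(total_gb_per_minute: float, parallelism: int) -> float:
    """Average bytes per second each subtask handles, assuming an even split."""
    total_bytes_per_second = total_gb_per_minute * BYTES_PER_GB / 60
    return total_bytes_per_second / parallelism


rate = per_subtask_bytes_per_second(TOTAL_GB_PER_MINUTE, PARALLELISM)
print(f"aggregate:   {TOTAL_GB_PER_MINUTE / 60:.2f} GB/s")
print(f"per subtask: {rate / 1e6:.2f} MB/s")
```

In practice the split is rarely perfectly even, so individual subtasks may see noticeably more or less than this average.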

Chapter 1: The Outage

Let’s start at the end, shall we?

Data Engineer: “Alert! My reports aren’t being generated! The upstream data is not available to generate them on!”

Platform Engineer: “I’m on it! Looks like our streaming application to persist data is up and running, but I hardly see any data being written either!”

Like any good engineer would, we pulled out our runbooks and carefully performed the well-detailed steps:

Platform Engineer: “Let me roll back our seemingly innocuous change we just deployed.”

Platform Engineer: “No luck.”

Platform Engineer: “Ok, let me try turning it off and on again.”

Platform Engineer: “No luck.”

Platform Engineer: “Ok, let me try performing a hard reset and we’ll backfill later.”
