Psyberg: Automated End-to-End Catch-Up
By Abhinaya Shetty, Bharath Mummadisetty
This blog post will cover how Psyberg helps automate the end-to-end catchup of different pipelines, including dimension tables.
In the previous installments of this series, we introduced Psyberg and delved into its core operational modes: Stateless and Stateful Data Processing. Now, let’s explore the state of our pipelines after incorporating Psyberg.
Pipelines After Psyberg
Let’s explore how different modes of Psyberg could help with a multistep data pipeline. We’ll return to the sample customer lifecycle:
Processing Requirement:
Keep track of the end-of-hour state of accounts, e.g., Active/Upgraded/Downgraded/Canceled.
Solution:
One potential approach here would be as follows:
- Create two stateless fact tables:
  a. Signups
  b. Account Plans
- Create one stateful fact table:
  a. Cancels
- Create a stateful dimension that reads the above fact tables every hour and derives the latest account state.
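To make the last step concrete, here is a minimal sketch of how an hourly dimension update could derive the latest account state from events in the three fact tables. It is an illustration only, not Psyberg's implementation: the `Event` class, the event kinds, the `STATE_BY_KIND` mapping, and the `end_of_hour_state` function are all hypothetical names chosen for this example, and the real tables would live in a warehouse rather than in Python lists.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    """Hypothetical row shape shared by the three fact tables."""
    account_id: str
    event_time: datetime
    kind: str  # "signup", "upgrade", "downgrade", or "cancel"


# Assumed mapping from event kind to the account state it produces.
STATE_BY_KIND = {
    "signup": "Active",
    "upgrade": "Upgraded",
    "downgrade": "Downgraded",
    "cancel": "Canceled",
}


def end_of_hour_state(previous_state, events, hour_end):
    """Return account_id -> state as of hour_end.

    previous_state is the prior hour's dimension snapshot; it is what
    makes this step stateful. Events at or after hour_end are ignored.
    """
    state = dict(previous_state)  # copy so the prior snapshot is untouched
    in_window = (e for e in events if e.event_time < hour_end)
    # Apply events in time order so the latest event per account wins.
    for event in sorted(in_window, key=lambda e: e.event_time):
        state[event.account_id] = STATE_BY_KIND[event.kind]
    return state
```

Because the function starts from the previous snapshot, an event that arrives late for an earlier hour would change that hour's result and every later hour built on it. That dependency is why a stateful dimension needs an end-to-end catch-up when late data shows up.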
Let’s look at how this can be integrated with Psyberg to auto-handle late-arriving data and corresponding end-to-end data catchup.
Navigating the Workflow: How Psyberg Handles Late-Arriving Data
We follow a generic workflow structure for both stateful and stateless processing with Psyberg; this helps maintain consistency and makes debugging and understanding these pipelines easier. The following is a concise overview of the various stages involved; for a more detailed exploration of the workflow specifics, please turn to the second installment of this series.