Psyberg: Automated End-to-End Catchup

By Abhinaya Shetty, Bharath Mummadisetty

This blog post will cover how Psyberg helps automate the end-to-end catchup of different pipelines, including dimension tables.

In the previous installments of this series, we introduced Psyberg and delved into its core operational modes: Stateless and Stateful Data Processing. Now, let’s explore the state of our pipelines after incorporating Psyberg.

Pipelines After Psyberg

Let’s explore how different modes of Psyberg could help with a multistep data pipeline. We’ll return to the sample customer lifecycle:

Processing Requirement:
Keep track of the end-of-hour state of accounts, e.g., Active/Upgraded/Downgraded/Canceled.

Solution:
One potential approach is as follows:

  1. Create two stateless fact tables:
     a. Signups
     b. Account Plans

  2. Create one stateful fact table:
     a. Cancels

  3. Create a stateful dimension that reads the above fact tables every hour and derives the latest account state.
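The solution above can be illustrated with a small Python sketch. This is a hypothetical model, not Psyberg's or the authors' code. The `Event` schema, the plan names and their ordering in `PLAN_RANK`, and the rule that an account keeps its status until its next event are all assumptions made for illustration. The sketch shows why the dimension is stateful: each hour's output is computed from the previous hour's output plus that hour's rows from the Signups, Account Plans, and Cancels fact tables.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Event:
    """One fact-table row (hypothetical schema)."""
    account_id: str
    ts: datetime
    kind: str       # "signup" (Signups), "plan_change" (Account Plans), "cancel" (Cancels)
    plan: str = ""  # left empty for cancels


# Assumption: illustrative plan names, ordered to tell upgrades from downgrades.
PLAN_RANK = {"basic": 0, "standard": 1, "premium": 2}
# Break timestamp ties: signups, then plan changes, then cancels.
KIND_ORDER = {"signup": 0, "plan_change": 1, "cancel": 2}

State = Dict[str, Tuple[str, str]]  # account_id -> (status, plan)


def end_of_hour_state(prev: State, events: List[Event], hour_start: datetime) -> State:
    """Account state at the end of the hour starting at `hour_start`,
    derived from the previous hour's state plus this hour's events."""
    hour_end = hour_start + timedelta(hours=1)
    state = dict(prev)  # stateful: start from last hour's output (copy, prev untouched)
    in_hour = [e for e in events if hour_start <= e.ts < hour_end]
    for e in sorted(in_hour, key=lambda ev: (ev.ts, KIND_ORDER[ev.kind])):
        if e.kind == "signup":
            state[e.account_id] = ("Active", e.plan)
        elif e.kind == "plan_change":
            # Assumption: an unseen account is treated as Active on the new plan.
            status, old_plan = state.get(e.account_id, ("Active", e.plan))
            new_rank = PLAN_RANK[e.plan]
            old_rank = PLAN_RANK.get(old_plan, new_rank)  # unknown old plan: no change
            if new_rank > old_rank:
                status = "Upgraded"
            elif new_rank < old_rank:
                status = "Downgraded"
            state[e.account_id] = (status, e.plan)
        elif e.kind == "cancel":
            _, plan = state.get(e.account_id, ("Canceled", ""))
            state[e.account_id] = ("Canceled", plan)
    # Accounts with no events this hour keep last hour's status (assumption).
    return state


# Example: two hourly runs, the second built on the first.
h0 = datetime(2024, 1, 1, 0)
events = [
    Event("a1", datetime(2024, 1, 1, 0, 5), "signup", "basic"),
    Event("a2", datetime(2024, 1, 1, 0, 10), "signup", "premium"),
    Event("a1", datetime(2024, 1, 1, 1, 15), "plan_change", "premium"),
    Event("a2", datetime(2024, 1, 1, 1, 20), "cancel"),
]
s0 = end_of_hour_state({}, events, h0)                       # hour 00:00-01:00
s1 = end_of_hour_state(s0, events, h0 + timedelta(hours=1))  # builds on s0
print(s0)  # {'a1': ('Active', 'basic'), 'a2': ('Active', 'premium')}
print(s1)  # {'a1': ('Upgraded', 'premium'), 'a2': ('Canceled', 'premium')}
```

Because `s1` is derived from `s0`, a late-arriving row for the first hour would change both that hour's result and every later hour built on it.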

Let’s look at how this can be integrated with Psyberg to auto-handle late-arriving data and corresponding end-to-end data catchup.
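The sketch below is a simplified model of how the end-to-end catchup scope differs between the two modes. It assumes hourly partitions and that the set of hours with new or late data for each table is already known. How Psyberg discovers those hours is not modeled here, and all function names are hypothetical, not part of Psyberg. A stateless table reprocesses only the impacted hours. A stateful table, including the account-state dimension, has to recompute every hour from the earliest impacted one onward, because each hour builds on the one before it.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Set


def stateless_hours(impacted: Set[datetime]) -> List[datetime]:
    """Stateless table: every hour is independent, so only the hours that
    received new or late data are reprocessed."""
    return sorted(impacted)


def stateful_hours(impacted: Set[datetime], latest: datetime) -> List[datetime]:
    """Stateful table: hour N is built on hour N-1, so every hour from the
    earliest impacted one through `latest` is recomputed, in order."""
    if not impacted:
        return []
    hours: List[datetime] = []
    h = min(impacted)
    while h <= latest:
        hours.append(h)
        h += timedelta(hours=1)
    return hours


def dimension_hours(upstream: Dict[str, Set[datetime]], latest: datetime) -> List[datetime]:
    """Stateful dimension reading several fact tables: a late hour in ANY
    upstream table forces a recompute from that hour forward."""
    all_impacted: Set[datetime] = set().union(*upstream.values())
    return stateful_hours(all_impacted, latest)


def hour(h: int) -> datetime:
    """Helper for the example: hour `h` of an arbitrary day."""
    return datetime(2024, 1, 1, h)


# Hypothetical late-arriving hours per upstream fact table.
late = {
    "signups": {hour(2)},
    "account_plans": {hour(5)},
    "cancels": {hour(3)},
}
latest = hour(6)

print([h.hour for h in stateless_hours(late["signups"])])         # [2]
print([h.hour for h in stateful_hours(late["cancels"], latest)])  # [3, 4, 5, 6]
print([h.hour for h in dimension_hours(late, latest)])            # [2, 3, 4, 5, 6]
```

In this model, a single late Signups row at hour 2 causes the Signups table to rewrite only hour 2. The dimension, however, must recompute hours 2 through 6.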

Navigating the Workflow: How Psyberg Handles Late-Arriving Data

We follow a generic workflow structure for both stateful and stateless processing with Psyberg; this helps maintain consistency and makes debugging and understanding these pipelines easier. The following is a concise overview of the various stages involved; for a more detailed exploration of the workflow specifics, please turn to the second installment of this series.

1. Psyberg Initializa...