Data pipeline asset management with Dataflow

by Sam Setegne, Jai Balani, Olek Gorajek

  • asset — any business logic code, in raw (e.g. SQL) or compiled (e.g. JAR) form, that is executed as part of a user-defined data pipeline.
  • data pipeline — a set of tasks (or jobs) executed in a predefined order, expressed as a directed acyclic graph (DAG), to transform data using business logic.
  • Dataflow — Netflix's homegrown CLI tool for data pipeline management.
  • job — a.k.a. task; an atomic unit of data transformation logic, i.e. a non-separable execution block in the workflow chain.
  • namespace — a unique label, usually representing a business subject area (e.g. security), assigned to a workflow asset to identify it among all other assets managed by Dataflow.
  • workflow — see “data pipeline”
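
The following is a minimal Python sketch that shows how the glossary terms could fit together. A workflow belongs to a namespace and holds jobs. Each job points at the asset it executes, and the execution order comes from the job dependencies treated as a DAG.

The class and field names (`Job`, `Workflow`, `asset`, `depends_on`) and the sample file paths are hypothetical. They do not describe Dataflow's actual data model. Ordering uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+).

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Job:
    """Atomic, non-separable unit of transformation logic (hypothetical model)."""
    name: str
    asset: str                        # business logic it runs, e.g. a SQL file or a JAR
    depends_on: Tuple[str, ...] = ()  # names of jobs that must finish first


@dataclass
class Workflow:
    """A data pipeline: a set of jobs whose execution order forms a DAG."""
    namespace: str                    # business subject area label, e.g. "security"
    jobs: Dict[str, Job] = field(default_factory=dict)

    def add(self, job: Job) -> None:
        self.jobs[job.name] = job

    def execution_order(self) -> List[str]:
        """Return job names in an order that respects every dependency.

        Raises ValueError for an undefined dependency, and graphlib.CycleError
        (a ValueError subclass) if the dependencies contain a cycle.
        """
        for job in self.jobs.values():
            missing = set(job.depends_on).difference(self.jobs)
            if missing:
                raise ValueError(f"{job.name} depends on undefined jobs: {sorted(missing)}")
        graph = {name: set(job.depends_on) for name, job in self.jobs.items()}
        return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    wf = Workflow(namespace="security")
    wf.add(Job("extract", "sql/extract.sql"))
    wf.add(Job("transform", "jars/transform.jar", depends_on=("extract",)))
    wf.add(Job("load", "sql/load.sql", depends_on=("transform",)))
    print(wf.execution_order())  # ['extract', 'transform', 'load']
```

A cycle in `depends_on` makes `static_order()` raise `graphlib.CycleError`. This is why a pipeline's predefined order has to be a DAG rather than an arbitrary graph.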

The problem of managing scheduled workflows and their assets is as old as the use of the cron daemon in early Unix operating systems. The design of a cron job is simple: you take a system command, pick a schedule to run it on, and you are done. Example:

In the above example, the system would wake up every Monday morning and execute the backup.sh script. Simple, right? But what if the script does not exist at the given path? Or what if it existed initially, but then Alice let Bob access her home directory and he accidentally deleted it? Or what if Alice wanted to add new backup functionality and accidentally broke the existing code while updating it?
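
The following minimal Python sketch illustrates the schedule half of a cron job by checking whether a given minute matches a five-field cron expression. It supports only `*` and single values per field, and it follows cron's rule that when both day-of-month and day-of-week are restricted, either may match.

The expression `0 0 * * MON` is an assumed rendering of "every Monday morning" (midnight). The function names are hypothetical.

```python
from datetime import datetime
from typing import Dict, Optional

# Cron numbers the days of the week from Sunday = 0 (7 is also accepted as Sunday).
DOW_NAMES = {"SUN": 0, "MON": 1, "TUE": 2, "WED": 3, "THU": 4, "FRI": 5, "SAT": 6}


def _field_matches(field: str, value: int, names: Optional[Dict[str, int]] = None) -> bool:
    """Match one cron field that is '*' or a single value (a name only if `names` is given)."""
    if field == "*":
        return True
    if names is not None and field.upper() in names:
        return names[field.upper()] == value
    return int(field) == value


def cron_matches(expr: str, when: datetime) -> bool:
    """Return True if the minute `when` falls on the 5-field cron schedule `expr`.

    Ranges, lists and steps ('1-5', '1,15', '*/10') are deliberately
    not supported in this sketch.
    """
    minute, hour, dom, month, dow = expr.split()
    if dow == "7":  # normalize the Sunday alias
        dow = "0"
    cron_dow = (when.weekday() + 1) % 7  # Python: Monday = 0; cron: Sunday = 0
    if not (_field_matches(minute, when.minute)
            and _field_matches(hour, when.hour)
            and _field_matches(month, when.month)):
        return False
    dom_ok = _field_matches(dom, when.day)
    dow_ok = _field_matches(dow, cron_dow, DOW_NAMES)
    # Classic cron rule: when both day fields are restricted, either one may match.
    if dom != "*" and dow != "*":
        return dom_ok or dow_ok
    return dom_ok and dow_ok


if __name__ == "__main__":
    schedule = "0 0 * * MON"  # assumed: midnight every Monday
    print(cron_matches(schedule, datetime(2024, 1, 1, 0, 0)))  # Monday 00:00 -> True
    print(cron_matches(schedule, datetime(2024, 1, 2, 0, 0)))  # Tuesday 00:00 -> False
```

The schedule carries no information about the command it fires. Whether `backup.sh` still exists, or still works, is outside cron's concern, and that is exactly the gap the questions above point at.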

These questions are what we would like to address in this article, and we propose a clean solution to this prob...

