dbt development at Vimeo

Do you work with data? Then you’re likely familiar with dbt, a tool that aims to “let data teams work like software engineers” by enabling engineers and analysts to define data transformations with simple SQL.

This blog post is for anyone who is interested in or sold on dbt Cloud, the SaaS platform offered by dbt Labs that wraps features of dbt with a web interface and provides additional tools such as job scheduling, Slack notifications, and GitHub integration. You may be in the midst of evaluating the product or just researching better ways to work with data at your company. Whichever bucket you fall into, you’ll want to get a sense of how much engineering effort is required to set up dbt with distinct environments and Continuous Integration testing.

In this post, I share how we on the Data Engineering team at Vimeo develop with dbt and how it compares to our previous workflow.

Understanding data at Vimeo

At Vimeo, we ingest data from dozens of data sources into our Snowflake data warehouse. With an abundance of raw data available, data modeling is critical in enabling us to understand the business, track key performance metrics, and make projections. It’s therefore important to understand the lineage, or relationships, of our data models and to version them via source control.
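
dbt derives this kind of lineage from the `ref()` calls inside each model's SQL and uses them to build a dependency graph (a DAG). The Python sketch below imitates the idea in simplified form. It assumes single-argument `{{ ref('model_name') }}` calls and uses a regular expression instead of dbt's real Jinja-based parsing. The model names and SQL are hypothetical.

```python
import re
from graphlib import TopologicalSorter

# Matches {{ ref('name') }} or {{ ref("name") }}, the Jinja call a dbt model uses
# to depend on another model. Only the single-argument form is handled here.
REF_PATTERN = re.compile(r"""\{\{\s*ref\(\s*['"]([^'"]+)['"]\s*\)\s*\}\}""")


def build_lineage(models: dict[str, str]) -> dict[str, set[str]]:
    """Map each model name to the set of models it selects from via ref()."""
    return {name: set(REF_PATTERN.findall(sql)) for name, sql in models.items()}


def build_order(lineage: dict[str, set[str]]) -> list[str]:
    """Return an order in which every model comes after all of its parents."""
    # TopologicalSorter expects {node: predecessors}, which is exactly our lineage map.
    return list(TopologicalSorter(lineage).static_order())


# Hypothetical models, for illustration only.
MODELS = {
    "stg_users": "select * from raw.users",
    "stg_videos": "select * from raw.videos",
    "fct_uploads": """
        select u.user_id, count(*) as uploads
        from {{ ref('stg_videos') }} v
        join {{ ref('stg_users') }} u on v.user_id = u.user_id
        group by 1
    """,
}

if __name__ == "__main__":
    lineage = build_lineage(MODELS)
    print(lineage["fct_uploads"])  # the two staging models, in arbitrary set order
    print(build_order(lineage))    # both staging models appear before fct_uploads
```

Because lineage is inferred from the SQL itself, it stays accurate as models change in source control, which is what makes it useful for ordering runs and for tracing where a metric comes from.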

Most of our ETL and ELT pipelines are orchestrated by Apache Airflow, while business logic is generally stored as SQL files in source control and executed with SnowSQL. Incremental loading, slowly changing dimensions, and backfilling are all implemented manually — we have even created Jinja templates for incre...
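
As a rough illustration of what hand-rolled incremental loading involves, here is a hypothetical Python sketch of the common high-water-mark pattern: copy only the source rows whose cursor column is newer than the largest value already in the target. It is not Vimeo's template. It uses plain Python string formatting in place of Jinja so that it runs with the standard library, and the table and column names are made up.

```python
import re

# Unquoted identifiers, optionally qualified as database.schema.table.
# Interpolating names into SQL is only safe if they are validated first.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*(\.[A-Za-z_][A-Za-z0-9_$]*){0,2}")


def _check(name: str) -> str:
    """Reject anything that is not a plain (optionally dotted) identifier."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def incremental_insert_sql(target: str, source: str, columns: list[str], cursor_column: str) -> str:
    """Build an INSERT that copies only source rows newer than the target's high-water mark.

    The high-water mark is the largest cursor_column value already in the target;
    an empty target falls back to the Unix epoch, so the first run loads everything.
    """
    target, source, cursor = _check(target), _check(source), _check(cursor_column)
    cols = ", ".join(_check(c) for c in columns)
    return (
        f"insert into {target} ({cols})\n"
        f"select {cols}\n"
        f"from {source}\n"
        f"where {cursor} > (\n"
        f"    select coalesce(max({cursor}), '1970-01-01'::timestamp)\n"
        f"    from {target}\n"
        f")"
    )


if __name__ == "__main__":
    # Hypothetical table and column names, for illustration only.
    print(incremental_insert_sql(
        target="analytics.video_events",
        source="raw.video_events",
        columns=["event_id", "user_id", "loaded_at"],
        cursor_column="loaded_at",
    ))
```

A strict `>` comparison skips rows that arrive late with a timestamp equal to or older than the current maximum, which is one reason hand-maintained incremental logic needs care. dbt's incremental materialization, with its `is_incremental()` macro, lets a model express this filter without a separate template.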