How we built a robust ecosystem for dataset development

At Duolingo, we are big believers in making data-driven decisions. It’s not just people with statistics backgrounds who work with data here. On any given day, a Data Scientist, Product Manager, Learning Scientist, Marketing Analyst, or Software Engineer might be performing complex analyses to understand how to improve our learners’ app experience.


We collect various pieces of data about how learners interact with the app (e.g., when a learner completes an exercise, or when a learner accepts a Friend Streak invite), and with tens of millions of learners using the app every day, we have large amounts of data on our hands! At Duolingo, the Data Refinery team builds tooling and infrastructure for data modeling: taking raw data and shaping it into a cleaner structure that makes it easier to gather meaningful metrics and insights. In addition to Data Scientists, we staffed this team with Software Engineers who were brand new to the data analytics space. This naturally led us to ask, “What lessons from developing backend services can we apply to developing datasets?” It turns out, quite a lot!


Modeling data isn’t so different from designing an API – you set up code to take in some input data, perform some validation and computation, and produce output data in a form that’s easy to work with.


A diagram representing the data modeling process, with an example. The diagram shows that two tables of "Raw Data" can be combined through the "Data Modeling Layer" to form a single, cleaned table that is "Modeled Data."
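To make the API analogy concrete, here is a minimal sketch of a modeling step in plain Python: it takes two raw tables as input, validates the rows, and produces one cleaned table. The table contents, the column names, and the `model_exercise_activity` function are hypothetical and chosen only for illustration; they are not Duolingo's actual schema or tooling.

```python
from dataclasses import dataclass

# Hypothetical raw tables. Column names are illustrative assumptions only.
RAW_EXERCISE_EVENTS = [
    {"user_id": 1, "exercise_id": "e1", "completed": True},
    {"user_id": 1, "exercise_id": "e2", "completed": True},
    {"user_id": 2, "exercise_id": "e1", "completed": False},
    {"user_id": None, "exercise_id": "e3", "completed": True},  # malformed row
]
RAW_USERS = [
    {"user_id": 1, "course": "es"},
    {"user_id": 2, "course": "fr"},
]


@dataclass(frozen=True)
class ModeledRow:
    """One row of the cleaned, modeled output table."""
    user_id: int
    course: str
    exercises_completed: int


def model_exercise_activity(events, users):
    """Combine two raw tables into a single modeled table."""
    # Validation: drop event rows that lack an integer user_id.
    valid_events = [e for e in events if isinstance(e.get("user_id"), int)]

    # Computation: count completed exercises per user.
    completed_counts = {}
    for event in valid_events:
        if event["completed"]:
            uid = event["user_id"]
            completed_counts[uid] = completed_counts.get(uid, 0) + 1

    # Output: one easy-to-query row per known user.
    return [
        ModeledRow(u["user_id"], u["course"], completed_counts.get(u["user_id"], 0))
        for u in users
    ]


modeled = model_exercise_activity(RAW_EXERCISE_EVENTS, RAW_USERS)
```

As with an API handler, the function has a clear input contract, rejects malformed input early, and returns a typed result that downstream consumers can rely on.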

This understanding led us to build out a data modeling developer experience that mirrors the process of building an engineering system.


Conventions should be automatically enforceable


Code linting comes in handy for setting and enforcing standards, especially when multiple people contribute to a codeb...

