Evolution of ML Fact Store
At Netflix, we aim to provide recommendations that match our members’ interests. To achieve this, we rely on Machine Learning (ML) algorithms. ML algorithms can only be as good as the data we provide to them. This post focuses on the large volume of high-quality data stored in Axion, our fact store that is leveraged to compute ML features offline. We built Axion primarily to remove training-serving skew and to make offline experimentation faster. We will share how its design has evolved over the years and the lessons we learned while building it.
Terminology
The Axion fact store is part of our Machine Learning Platform, which serves machine learning needs across Netflix. Figure 1 below shows how Axion interacts with Netflix’s ML platform. The overall ML platform has tens of components, and the diagram shows only the ones relevant to this post. To understand Axion’s design, we need to know the components that interact with it.
Figure 1: Netflix ML Architecture
- Fact: A fact is data about our members or videos. An example of member data is the videos they have watched or added to their My List. An example of video data is video metadata, such as the length of a video. Time is a critical component of Axion: when we talk about facts, we mean facts at a moment in time. These facts are managed and made available by services outside of Axion, such as the viewing history and video metadata services.
- Compute application: These applications generate recommenda...