Recommending for Long-Term Member Satisfaction at Netflix
By Jiangwei Pan, Gary Tang, Henry Wang, and Justin Basilico
Introduction
Our mission at Netflix is to entertain the world. Our personalization algorithms play a crucial role in delivering on this mission for all members by recommending the right shows, movies, and games at the right time. This goal extends beyond immediate engagement; we aim to create an experience that brings lasting enjoyment to our members. Traditional recommender systems often optimize for short-term metrics like clicks or engagement, which may not fully capture long-term satisfaction. We strive to recommend content that not only engages members in the moment but also enhances their long-term satisfaction. That increases the value they get from Netflix and makes them more likely to remain members.
Recommendations as Contextual Bandit
One simple way to view recommendations is as a contextual bandit problem. When a member visits, the visit becomes the context for our system, which selects an action (which recommendations to show); the member then provides various types of feedback. These feedback signals can be immediate (skips, plays, thumbs up/down, or adding items to their playlist) or delayed (completing a show or renewing their subscription). We can define reward functions that use these feedback signals to reflect the quality of the recommendations, and then train a contextual bandit policy on historical data to maximize the expected reward.
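To make the contextual bandit framing concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a description of Netflix's production system: the reward weights, the `LoggedImpression` record, the context and action names, and the logged propensities are all hypothetical. It maps feedback signals to a scalar reward, fits a greedy policy from logged data by averaging reward per (context, action) pair, and scores that policy with the standard inverse propensity scoring (IPS) estimator.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical reward weights, chosen only for illustration.
REWARD_WEIGHTS = {"play": 1.0, "thumbs_up": 2.0, "skip": -0.5, "completed": 3.0}


@dataclass(frozen=True)
class LoggedImpression:
    context: str       # coarse description of the visit (assumed feature)
    action: str        # the recommendation that was shown
    propensity: float  # probability the logging policy chose this action
    feedback: tuple    # observed feedback signals, immediate or delayed


def reward(feedback):
    """Reward function: a weighted sum of the observed feedback signals."""
    return sum(REWARD_WEIGHTS.get(signal, 0.0) for signal in feedback)


def fit_greedy_policy(logs):
    """Pick, for each context, the action with the highest mean logged reward."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for imp in logs:
        key = (imp.context, imp.action)
        totals[key] += reward(imp.feedback)
        counts[key] += 1
    best = {}  # context -> (action, mean reward)
    for (context, action), total in totals.items():
        mean = total / counts[(context, action)]
        if context not in best or mean > best[context][1]:
            best[context] = (action, mean)
    return {context: action for context, (action, _) in best.items()}


def ips_value(policy, logs):
    """IPS estimate of a deterministic policy's expected reward on logged data."""
    total = 0.0
    for imp in logs:
        if policy.get(imp.context) == imp.action:
            total += reward(imp.feedback) / imp.propensity
    return total / len(logs)


# Toy historical data (fabricated for the example only).
logs = [
    LoggedImpression("weeknight", "comedy_series", 0.5, ("play", "completed")),
    LoggedImpression("weeknight", "documentary", 0.5, ("skip",)),
    LoggedImpression("weekend", "documentary", 0.25, ("play", "thumbs_up")),
    LoggedImpression("weekend", "comedy_series", 0.75, ("play",)),
]

policy = fit_greedy_policy(logs)
print(policy)
print(ips_value(policy, logs))
```

A real system would replace the lookup table with a learned model over rich context features. The sketch only shows how the pieces named above (context, action, feedback, reward function, and policy trained on historical data) fit together.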
Improving Recommendations: Models and Objectives
There are many ways that a recommendation model can be improved. They may come from more informative...