Post-Training Generative Recommender Systems with Advantage-Weighted Supervised Fine-Tuning
Author: Keertana Chidambaram, Qiuling Xu, Ko-Jen Hsiao, Moumita Bhattacharya
(*The work was done when Keertana interned at Netflix.)
Introduction
This blog focuses on post-training generative recommender systems. Generative recommenders (GRs) represent a new paradigm in the field of recommendation systems (e.g. HSTU, OneRec). These models draw inspiration from recent advancements in transformer architectures used for language and vision tasks. They approach the recommendation problem, including both ranking and retrieval, as a sequential transduction task. This perspective enables generative training, where the model learns by imitating the next event in a sequence of user activities, thereby effectively modeling user behavior over time.
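To make the next-event imitation objective concrete, below is a minimal, hypothetical PyTorch sketch of generative training for a recommender: a small causal transformer over a user's interaction sequence, trained with a standard next-item cross-entropy loss. The model name, dimensions, and toy data are illustrative assumptions, not the HSTU or OneRec architectures.

```python
import torch
import torch.nn as nn

# Illustrative sketch only (not the production model): treat a user's
# interaction history as a token sequence and train the model to imitate
# the next event (next-item prediction). Positional encodings are omitted
# for brevity.
class TinyGenerativeRecommender(nn.Module):
    def __init__(self, num_items: int, dim: int = 64, num_layers: int = 2):
        super().__init__()
        self.item_emb = nn.Embedding(num_items, dim)
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(dim, num_items)  # scores over the item catalog

    def forward(self, item_ids: torch.Tensor) -> torch.Tensor:
        # item_ids: (batch, seq_len) of past interactions
        x = self.item_emb(item_ids)
        # causal mask so each position only attends to earlier events
        mask = nn.Transformer.generate_square_subsequent_mask(item_ids.size(1))
        h = self.encoder(x, mask=mask)
        return self.head(h)  # (batch, seq_len, num_items)

# Generative (imitation) training step: predict event t+1 from events <= t.
model = TinyGenerativeRecommender(num_items=1000)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

history = torch.randint(0, 1000, (8, 20))   # toy interaction sequences
logits = model(history[:, :-1])              # predict each next event
loss = loss_fn(logits.reshape(-1, 1000), history[:, 1:].reshape(-1))
loss.backward()
optimizer.step()
```

The post-training approach named in the title, advantage-weighted supervised fine-tuning, would modify this objective by weighting observed events according to feedback-derived advantages rather than treating every event in the sequence as equally worth imitating.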
However, a key challenge with simply replicating observed user patterns is that it may not always lead to the best possible recommendations. User interactions are influenced by a variety of factors, such as trends or external suggestions, and the system’s view of these interactions is inherently limited. For example, if a user tries a popular show but later indicates it wasn’t a good fit, a model that only imitates this behavior might continue to recommend similar content, missing the chance to enhance the user’s experience.
This highlights the importance of incorporating user preferences and feedback, rather than solely relying on observed behavior, to improve recommendation quality. In the context of recommendation systems, we benefit from a wealth of user feedback, which includes explicit signals such as ratings and reviews, as well as implicit signal...