Advancements in Embedding-Based Retrieval at Pinterest Homefeed
Zhibo Fan | Machine Learning Engineer, Homefeed Candidate Generation; Bowen Deng | Machine Learning Engineer, Homefeed Candidate Generation; Hedi Xia | Machine Learning Engineer, Homefeed Candidate Generation; Yuke Yan | Machine Learning Engineer, Homefeed Candidate Generation; Hongtao Lin | Machine Learning Engineer, ATG Applied Science; Haoyu Chen | Machine Learning Engineer, ATG Applied Science; Dafang He | Machine Learning Engineer, Homefeed Relevance; Jay Adams | Principal Engineer, Pinner Curation & Growth; Raymond Hsu | Engineering Manager, Homefeed CG Product Enablement; James Li | Engineering Manager, Homefeed Candidate Generation; Dylan Wang | Engineering Manager, Homefeed Relevance
Introduction
At Pinterest Homefeed, embedding-based retrieval (a.k.a. Learned Retrieval) is a key candidate generator that retrieves highly personalized, engaging, and diverse content to fulfill various user intents and support multiple types of action, such as Pin saving and shopping. We previously introduced this two-tower model, including its modeling basics and serving details. In this blog, we focus on the improvements we have made to embedding-based retrieval: how we scale up with advanced feature crossing and ID embeddings, how we upgraded the serving corpus, and our ongoing journey toward revolutionizing machine-learning-based retrieval with state-of-the-art modeling.
Feature Crossing
We have various features provided to the model in the hope that it can reveal the latent patt...