Bridging the Gap: Diagnosing Online–Offline Discrepancy in Pinterest’s L1 Conversion Models
Authors: Yao Cheng | Senior Machine Learning Engineer; Qingmengting Wang | Machine Learning Engineer II; Yuanlu Bai | Machine Learning Engineer II; Yuan Wang | Machine Learning Engineer II; Zhaohong Han | Machine Learning Engineer Manager; Jinfeng Zhuang | Senior Machine Learning Engineer Manager

Introduction
The L1 ranking stage sits in the middle of Pinterest’s ads funnel. It filters and prioritizes candidates under tight latency constraints so that downstream ranking and auction systems only see a manageable set of ads.
When we started pushing new L1 conversion (CVR) models, we saw the same pattern repeatedly:
- Offline: strong, consistent gains on loss and calibration across log sources and pCVR buckets.
- Online: neutral or negative A/B results, plus surprising mix‑shifts for oCPM traffic.
This gap between offline evaluation and online A/B performance, which we call our Online–Offline (O/O) discrepancy, kept promising models from launching.
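To make the offline side of this comparison concrete, here is a minimal sketch of per-bucket evaluation: log loss and calibration computed separately for each pCVR bucket. It is illustrative only and not our production evaluation code. The function name `bucket_metrics`, the bucket edges, and the definition of calibration as the sum of predictions divided by the sum of observed conversions are all assumptions made for this example.

```python
import math
from collections import defaultdict


def bucket_metrics(preds, labels, edges):
    """Per-bucket log loss and calibration for predicted conversion rates.

    Hypothetical helper: `edges` are ascending pCVR bucket boundaries, and
    calibration is assumed to mean sum(predictions) / sum(conversions).
    """
    stats = defaultdict(lambda: {"n": 0, "loss": 0.0, "pred": 0.0, "pos": 0})
    for p, y in zip(preds, labels):
        # Bucket index = number of edges that p has reached or passed.
        b = sum(p >= e for e in edges)
        # Clip to avoid log(0) on extreme predictions.
        pc = min(max(p, 1e-7), 1 - 1e-7)
        s = stats[b]
        s["n"] += 1
        s["loss"] += -(y * math.log(pc) + (1 - y) * math.log(1 - pc))
        s["pred"] += p
        s["pos"] += y
    return {
        b: {
            "n": s["n"],
            "log_loss": s["loss"] / s["n"],
            # Undefined when a bucket has no conversions.
            "calibration": s["pred"] / s["pos"] if s["pos"] else float("nan"),
        }
        for b, s in sorted(stats.items())
    }


# Toy data, invented for illustration.
metrics = bucket_metrics(
    preds=[0.005, 0.02, 0.02, 0.3],
    labels=[0, 0, 1, 1],
    edges=[0.01, 0.1],
)
```

A calibration value near 1.0 in every bucket means predicted and observed conversion volumes agree. Running the same computation per log source shows whether an offline gain is consistent or concentrated in one slice.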
In this post, we’ll walk through:
- How we structured the investigation, instead of chasing one‑off bugs
- What actually went wrong in features, embeddings, and funnel design