Bridging the Gap: Diagnosing Online–Offline Discrepancy in Pinterest’s L1 Conversion Models


Introduction

The L1 ranking stage sits in the middle of Pinterest’s ads funnel. It filters and prioritizes candidates under tight latency constraints so that downstream ranking and auction systems only see a manageable set of ads.

When we started pushing new L1 conversion (CVR) models, we saw the same pattern repeatedly:

- Offline: strong, consistent gains on loss and calibration across log sources and pCVR buckets.
- Online: neutral or negative A/B results, plus surprising mix‑shifts for oCPM traffic.

This gap between offline evaluation and online A/B performance, which we call our Online–Offline (O/O) discrepancy, kept promising models from launching.

In this post, we’ll walk through:

- How we structured the investigation, instead of chasing one‑off bugs
- What actually went wrong in features, embeddings, and funnel design

Background: Two Ways to Judge an L1 Model
