Improve Your Next Experiment by Learning Better Proxy Metrics From Past Experiments

We are excited to share our KDD 2024 work on how to learn good proxy metrics from historical experiments. This work addresses a fundamental question for technology companies and academic researchers alike: how do we establish that a treatment that improves short-term (statistically sensitive) outcomes also improves long-term (statistically insensitive) outcomes? Or, faced with multiple short-term outcomes, how do we optimally trade them off for long-term benefit?

For example, in an A/B test, you may observe that a product change improves the click-through rate. However, the test does not provide enough signal to measure a change in long-term retention, leaving you in the dark as to whether this treatment makes users more satisfied with your service. The click-through rate is a proxy metric (S, for surrogate, in our paper) while retention is a downstream business outcome or north star metric (Y). We may even have several proxy metrics, such as other types of clicks or the length of engagement after click. Taken together, these form a vector of proxy metrics.
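
To make the sensitivity gap concrete, here is a minimal sketch of a two-arm comparison using a standard z-statistic for a difference in means. Every number in it (effect sizes, standard deviations, sample size) is a hypothetical value chosen for illustration, and the helper `z_statistic` is our own name, not something from the paper. It only shows how the same experiment can detect a change in the proxy S while remaining inconclusive about the north star Y.

```python
import math


def z_statistic(effect: float, sd: float, n_per_arm: int) -> float:
    """z-score for a difference in means between two equal-sized arms
    that share the same per-user standard deviation."""
    standard_error = sd * math.sqrt(2.0 / n_per_arm)
    return effect / standard_error


# All values below are hypothetical, for illustration only.
N_PER_ARM = 10_000
CRITICAL = 1.96  # two-sided 5% significance threshold

# Proxy S (click-through rate): larger effect relative to its noise.
z_proxy = z_statistic(effect=0.010, sd=0.30, n_per_arm=N_PER_ARM)

# North star Y (retention): small effect relative to its noise.
z_north = z_statistic(effect=0.002, sd=0.50, n_per_arm=N_PER_ARM)

print(f"proxy S:      z = {z_proxy:.2f}, detectable = {abs(z_proxy) > CRITICAL}")
print(f"north star Y: z = {z_north:.2f}, detectable = {abs(z_north) > CRITICAL}")
```

Under these assumed numbers the proxy clears the significance threshold and the north star does not, even though both true effects are positive. That is the situation in which a trustworthy proxy is valuable.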

The goal of our work is to understand the true relationship between the proxy metric(s) and the north star metric — so that we can assess a proxy’s ability to stand in for the north star metric, learn how to combine multiple metrics into a single best one, and better explore and compare different proxies.

Several intuitive approaches to understanding this relationship have surprising pitfalls:

  • Looking only at user-level correlations between the proxy S and north star Y. Continuing the example from above, you may ...