Experiment Without the Wait: Speeding Up the Iteration Cycle with Offline Replay Experimentation
Maxine Qian | Data Scientist, Experimentation and Metric Sciences
Ideas fuel innovation. Innovation drives our product toward our mission of bringing everyone the inspiration to create a life they love. The speed of innovation is determined by how quickly we can get a signal or feedback on the promise of an idea so we can learn whether to pursue or pivot. Online experimentation is often used to evaluate product ideas, but it is costly and time-consuming. Could we predict experiment outcomes without even running an experiment? Could it be done in hours instead of weeks? Could we rapidly pick only the best ideas to run an online experiment? This post will describe how Pinterest uses offline replay experimentation to predict experiment results in advance.
Online Experimentation Limitations
Data-supported decisions shape the evolution of our products at Pinterest. All product teams are empowered to test their product changes with online experimentation (A/B testing), a process to measure the impact on Pinterest users, aka Pinners. However, online experiments have several limitations:
- Slow data collection: An experiment must run for at least seven days, and often longer, to reach sufficient statistical power and capture any weekly patterns.
- Limited simultaneous arms: There can only be a limited number of variations running at the same time to allow a sufficient sample size for each.
- Risk-averse treatments: To minimize potential negative impact, there is an incentive to deploy safer, more conservative ideas instead of riskier but potentially highly impactful ideas.
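To make the first two limitations concrete, here is a minimal sketch of how sample-size requirements tie run time to the number of arms. It is not Pinterest's tooling. It assumes a two-sided two-proportion z-test, fixed daily traffic split evenly across arms, and every daily user being new to the experiment. The function names `users_per_arm` and `days_to_power` and all parameter values are hypothetical.

```python
import math
from statistics import NormalDist


def users_per_arm(baseline_rate, relative_lift, alpha=0.05, power=0.8):
    """Approximate users needed per arm for a two-sided two-proportion z-test.

    Assumption: the metric is a simple conversion rate and the normal
    approximation applies. Real experiment metrics are often more complex.
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)


def days_to_power(n_arms, daily_users, baseline_rate, relative_lift, min_days=7):
    """Days until every arm (control included) reaches its required sample size.

    Assumption: traffic is split evenly and each day's users are new to the
    experiment. The result is never below min_days, so that at least one
    full weekly cycle is observed.
    """
    per_arm_daily = daily_users / n_arms
    needed = users_per_arm(baseline_rate, relative_lift)
    return max(min_days, math.ceil(needed / per_arm_daily))


if __name__ == "__main__":
    # Hypothetical traffic and effect size, for illustration only.
    for arms in (2, 5, 10):
        print(arms, "arms:", days_to_power(arms, 20_000, 0.10, 0.05), "days")
```

With traffic held fixed, each extra arm shrinks the daily sample per arm, so the required run time grows with the number of variations tested at once.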