Interpreting A/B Test Results: False Positives and Statistical Significance

Martin Tingley with Wenjing Zheng, Simon Ejdemyr, Stephanie Lane, and Colin McFarland

This is the third post in a multi-part series on how Netflix uses A/B tests to inform decisions and continuously innovate on our products. Need to catch up? Have a look at Part 1 (Decision Making at Netflix) and Part 2 (What is an A/B Test?). Subsequent posts will go into more details on experimentation across Netflix, how Netflix has invested in infrastructure to support and scale experimentation, and the importance of the culture of experimentation within Netflix.

In Part 2 (What is an A/B Test?), we talked about testing the Top 10 lists on Netflix, and how the primary decision metric for this test was a measure of member satisfaction with Netflix. If a test like this shows a statistically significant improvement in the primary decision metric, the feature is a strong candidate for a rollout to all of our members. But how do we know if we’ve made the right decision, given the results of the test? It’s important to acknowledge that no approach to decision making can entirely eliminate uncertainty and the possibility of making mistakes. Using a framework based on hypothesis generation, A/B testing, and statistical analysis allows us to carefully quantify uncertainties and understand the probabilities of making different types of mistakes.

There are two types of mistakes we can make in acting on test results. A false positive (also called a Type I error) occurs when the data from the test indicates a meaningful difference between the control and treatment experiences, but in truth there is no differ...
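To make the idea of a false positive concrete, here is a minimal simulation sketch. It is not taken from the post: the function names, the pooled two-proportion z-test, the 0.05 significance threshold, and the 30% base rate are all assumptions chosen for illustration. It runs many simulated tests in which control and treatment share exactly the same true rate, so every "significant" result is a false positive by construction.

```python
import math
import random
from statistics import NormalDist


def two_proportion_p_value(successes_a, n_a, successes_b, n_b):
    """Two-sided p-value for a pooled two-proportion z-test."""
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Both groups are all-success or all-failure: no evidence of a difference.
        return 1.0
    z = (successes_b / n_b - successes_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate_false_positive_rate(n_tests=1000, n_per_group=1000,
                                 true_rate=0.3, alpha=0.05, seed=0):
    """Fraction of simulated tests flagged as significant when there is
    no true difference between control and treatment.

    All defaults are hypothetical values for illustration only.
    """
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(n_tests):
        # Both groups are drawn from the same true rate.
        control = sum(rng.random() < true_rate for _ in range(n_per_group))
        treatment = sum(rng.random() < true_rate for _ in range(n_per_group))
        p = two_proportion_p_value(control, n_per_group, treatment, n_per_group)
        if p < alpha:
            false_positives += 1
    return false_positives / n_tests
```

Calling `simulate_false_positive_rate()` should return a fraction close to the chosen `alpha`: under these assumptions, the significance threshold is what sets the long-run rate of false positives.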
