Supercharging A/B Testing at Uber
摘要
In early 2020, we took a deeper look at this ecosystem. We discovered that a large percentage of the experiments had fatal problems and often needed to be rerun. Obtaining high-quality results required an expert-level understanding of experimentation and statistics, and an inordinate amount of toil (custom analysis, pipelining, etc.). This slowed down decision-making, and re-running poorly conducted experiments was common.
After assessing the customer problems and the internals of Morpheus we concluded that the core abstractions supported only a very narrow set of experiment designs correctly, and even minute deviations from such designs resulted in incomparable cohorts of users in control and treatment and compromised experiment results. To give a very simple example, while gradually rolling out an experiment that was split 30/70 between control and treatment, due to peculiarities of rollout and treatment assignment logic it would be ok to roll it out to 10% of users but not to 5%. Furthermore, the system was not able to support advanced experiment configurations needed to support Uber’s diverse use cases, or other advanced functionality at scale such as monitoring/rolling back experiments that were negatively impacting business metrics. So we decided to build a new platform from scratch with correct abstractions.
欢迎在评论区写下你对这篇文章的看法。