Causal Forecasting at Lyft (Part 2)
By Sameer Manek and Duane Rich
In our last post, we discussed how managing our business effectively comes down, in large part, to making causally valid forecasts based on our decisions. Such forecasts accurately predict the future while still agreeing with experimental results (e.g., increasing prices by X will decrease conversion by Y). With them, we can optimize our decisions to yield a desirable future.
But there remains a gap between theory and the implementation that makes it a reality. In this blog, we will discuss the design of software and algorithms we use to bridge this gap.
Problems Between Theory and Application
A powerful way to reason is to imagine the end state and enumerate the issues we anticipate along the way. Our goal is a causally valid forecasting system that predicts the whole of our business. As mentioned, there are theoretical advantages to framing this as a DAG (directed acyclic graph) of DAGs, each of which represents an input/output mapping from one set of variables to another.
From here, it’s clear we need to enable data scientists to develop models in parallel and combine them later using some protocol to guarantee compatibility and accuracy. Given the complexity of our business, we consider this modeling parallelism a hard requirement.
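To make the idea of independently developed models joined by a compatibility protocol more concrete, here is a minimal Python sketch. It is not Lyft's implementation: the `Model` class, the `compose` and `run` helpers, the variable names, and the toy coefficients are all hypothetical. The only assumption is that each model declares the variables it consumes and produces, so a combiner can check that no two models produce the same variable, order the models topologically, and reject cycles.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Model:
    """A hypothetical unit of the DAG: a named input/output mapping."""
    name: str
    inputs: Tuple[str, ...]
    outputs: Tuple[str, ...]
    fn: Callable[[Dict[str, float]], Dict[str, float]]


def compose(models: List[Model]) -> List[Model]:
    """Return the models in an order where every input is produced first."""
    producer: Dict[str, str] = {}
    for m in models:
        for var in m.outputs:
            if var in producer:
                raise ValueError(f"{var!r} is produced by both "
                                 f"{producer[var]!r} and {m.name!r}")
            producer[var] = m.name
    by_name = {m.name: m for m in models}
    # Map each model to the set of models it depends on.
    graph = {m.name: {producer[v] for v in m.inputs if v in producer}
             for m in models}
    # static_order() raises graphlib.CycleError if the models form a cycle.
    return [by_name[n] for n in TopologicalSorter(graph).static_order()]


def run(models: List[Model], exogenous: Dict[str, float]) -> Dict[str, float]:
    """Evaluate the composed models, starting from exogenous inputs."""
    values = dict(exogenous)
    for m in compose(models):
        missing = [v for v in m.inputs if v not in values]
        if missing:
            raise KeyError(f"{m.name!r} is missing inputs: {missing}")
        out = m.fn({v: values[v] for v in m.inputs})
        if set(out) != set(m.outputs):
            raise ValueError(f"{m.name!r} did not honor its declared outputs")
        values.update(out)
    return values


# Two toy models that could be written by different people.
# The coefficients are illustrative only.
conversion_model = Model(
    name="conversion",
    inputs=("price",),
    outputs=("conversion",),
    fn=lambda x: {"conversion": 0.5 - 0.01 * x["price"]},
)
rides_model = Model(
    name="rides",
    inputs=("sessions", "conversion"),
    outputs=("rides",),
    fn=lambda x: {"rides": x["sessions"] * x["conversion"]},
)

# The list order does not matter; compose() sorts by dependency.
forecast = run([rides_model, conversion_model],
               {"price": 10.0, "sessions": 1000.0})
```

In this sketch the declared inputs and outputs act as the protocol: authors can work in parallel, and incompatibilities such as duplicate producers, missing inputs, or cycles surface when the models are combined rather than silently at forecast time.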
We also require useful aggregations. We know a priori that it is impossible to model bottom-up from the session level to the national net revenue level. This necessitates a metrics framework that anticipates the use of sums and averages to measure our business's most important dimensions. Doing so alleviates issues of computational tractability up front.
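As a rough illustration of why a metrics framework benefits from treating sums and averages explicitly, consider the Python sketch below. The field names (`region`, `sessions`, `rides`) and the numbers are made up for illustration, and the `aggregate` and `conversion` helpers are hypothetical. The sketch assumes that an average such as conversion is stored as a pair of sums (numerator and denominator), so it can be rolled up to any coarser level without revisiting session-level data.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical pre-aggregated rows; in practice these would come from
# a much finer grain such as sessions.
rows: List[dict] = [
    {"region": "A", "sessions": 100, "rides": 50},
    {"region": "A", "sessions": 300, "rides": 90},
    {"region": "B", "sessions": 200, "rides": 40},
]


def aggregate(data: Iterable[dict], keys: Tuple[str, ...]) -> Dict[tuple, dict]:
    """Sum the additive metrics within each group defined by `keys`."""
    totals: Dict[tuple, dict] = defaultdict(lambda: {"sessions": 0, "rides": 0})
    for row in data:
        group = tuple(row[k] for k in keys)
        totals[group]["sessions"] += row["sessions"]
        totals[group]["rides"] += row["rides"]
    return dict(totals)


def conversion(total: dict) -> float:
    """An average expressed as a ratio of two sums."""
    return total["rides"] / total["sessions"]


by_region = aggregate(rows, ("region",))
national = aggregate(rows, ())[()]  # empty key tuple: one national group

# Averaging the per-row rates ignores how many sessions each row covers.
naive_mean = sum(r["rides"] / r["sessions"] for r in rows) / len(rows)
```

Here `conversion(national)` is the ratio of total rides to total sessions, while `naive_mean` weights every row equally and so gives a different number. Keeping numerators and denominators as sums lets the same metric be computed consistently at the regional and national levels.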