DeepETA: How Uber Predicts Arrival Times Using Deep Learning

At Uber, magical customer experiences depend on accurate arrival time predictions (ETAs). We use ETAs to calculate fares, estimate pickup times, match riders to drivers, plan deliveries, and more. Traditional routing engines compute ETAs by dividing the road network into small road segments, represented as weighted edges in a graph. They use shortest-path algorithms to find the best path through the graph and add up the edge weights along that path to derive an ETA. But as we all know, the map is not the terrain: a road graph is just a model, and it can’t perfectly capture conditions on the ground. Moreover, we may not know which route a particular rider and driver will take to their destination. By training machine learning (ML) models on top of the road graph prediction, using historical data in combination with real-time signals, we can refine ETAs so that they better predict real-world outcomes.
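The following minimal Python sketch makes the routing-engine baseline concrete. It assumes a toy road graph stored as an adjacency list whose edge weights are per-segment travel times in seconds. Dijkstra's algorithm serves as the shortest-path method, and the edge weights along the best path are summed into an ETA. The `refined_eta` helper only illustrates the idea of an ML model adjusting that graph estimate with a learned correction. The graph, node names, `routing_eta`, `refined_eta`, and the correction value are all hypothetical. They are not Uber's routing engine or model.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical road graph: node -> list of (neighbor, travel_time_seconds).
# Each edge stands for one small road segment weighted by its travel time.
RoadGraph = Dict[str, List[Tuple[str, float]]]

EXAMPLE_GRAPH: RoadGraph = {
    "A": [("B", 60.0), ("C", 200.0)],
    "B": [("C", 90.0), ("D", 300.0)],
    "C": [("D", 120.0)],
    "D": [],
}


def routing_eta(graph: RoadGraph, origin: str, destination: str) -> float:
    """Shortest-path travel time (seconds) from origin to destination.

    Dijkstra's algorithm finds the best path; the returned value is the
    sum of the segment weights along that path.
    """
    best = {origin: 0.0}
    frontier = [(0.0, origin)]
    while frontier:
        elapsed, node = heapq.heappop(frontier)
        if node == destination:
            return elapsed
        if elapsed > best.get(node, float("inf")):
            continue  # stale queue entry; a shorter path was already found
        for neighbor, seconds in graph.get(node, []):
            candidate = elapsed + seconds
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(frontier, (candidate, neighbor))
    raise ValueError(f"no route from {origin} to {destination}")


def refined_eta(graph_eta: float, predicted_correction: float) -> float:
    """Hypothetical ML refinement: add a model-predicted correction
    (for example, learned from historical trips) to the graph ETA."""
    return graph_eta + predicted_correction


if __name__ == "__main__":
    eta = routing_eta(EXAMPLE_GRAPH, "A", "D")
    print(eta)  # 270.0 via A -> B -> C -> D
    print(refined_eta(eta, 35.0))  # 305.0 with a made-up +35 s correction
```

Dijkstra's algorithm fits this illustration because segment travel times are non-negative, which is the condition it needs to be correct. The graph estimate stays a model of the road network. The correction term is where historical and real-time signals would enter.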

For several years, Uber used ensembles of gradient-boosted decision trees to refine ETA predictions. The ETA model and its training dataset grew steadily larger with each release. To keep pace with this growth, Uber’s Apache Spark™ team contributed upstream improvements [1, 2] to XGBoost that allowed the model to grow ever deeper, making it one of the largest and deepest XGBoost ensembles in the world at that time. Eventually, we reached a point where further increasing the dataset and model size with XGBoost became untenable. To continue scaling the model and improving accuracy, we decided to explore deep learning because of the relative ease of scaling to large datasets using data-parallel SGD [3]. To justify switching to deep learning, we needed to overcome three main challenges:
