Reinforcement Learning for Modeling Marketplace Balance

This blog post describes how we apply reinforcement learning to make the Uber network more efficient, helping the world move and creating magical experiences for riders and drivers. Specifically, we discuss how using reinforcement learning in our matching algorithm improves the balance between drivers and demand in our mobility marketplace.

In a real-time two-sided marketplace like Uber, the balance between drivers and ride demand fluctuates constantly. It depends on external factors, such as variations in demand, and on internal ones, such as how Uber moves drivers around by matching them to riders. The core challenge for a matching algorithm is to match riders with drivers as efficiently as possible, minimizing wait times for riders while maximizing earnings for drivers. Matching drivers to the right places at the right time is difficult, especially when optimizing for both immediate and long-term efficiency.

We view this problem specifically through the lens of balance. A greedy matching algorithm that does not account for likely subsequent outcomes might create balance at the time of the match, but it may cause imbalances in other parts of the city later, leading to longer wait times or surge pricing elsewhere. This sequential decision-making problem creates an opportunity to use reinforcement learning techniques in the ridesharing marketplace.
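To make the contrast between myopic and forward-looking matching concrete, here is a minimal toy sketch. It is not Uber's matching algorithm: the drivers, riders, zones, ETAs, and the `ZONE_VALUE` table are all made-up assumptions for illustration. `ZONE_VALUE` stands in for the kind of estimate a reinforcement learning model might provide, namely how valuable one more idle driver in a zone is expected to be; here it is simply hard-coded. Setting `weight=0` scores matches by pickup ETA alone, while `weight=1` also credits a match for moving a driver from a low-value zone to a high-value one.

```python
from itertools import permutations

# All values below are hypothetical and exist only for illustration.
# Pickup ETA in minutes: ETA[driver][rider].
ETA = {
    "d1": {"r1": 2.0, "r2": 3.0},
    "d2": {"r1": 4.0, "r2": 6.0},
    "d3": {"r1": 5.0, "r2": 4.0},
}
DRIVER_ZONE = {"d1": "A", "d2": "B", "d3": "B"}  # where each driver is now
RIDER_DEST = {"r1": "C", "r2": "A"}              # where each trip ends
# Assumed value (in minute-equivalents) of one extra idle driver per zone.
# A high value means the zone is expected to be under-supplied.
ZONE_VALUE = {"A": 5.0, "B": 1.0, "C": 3.0}


def match_score(driver, rider, weight):
    """Immediate reward plus a weighted estimate of the future effect."""
    immediate = -ETA[driver][rider]
    # Matching removes a driver from the origin zone and adds one at the
    # trip's destination zone.
    future = ZONE_VALUE[RIDER_DEST[rider]] - ZONE_VALUE[DRIVER_ZONE[driver]]
    return immediate + weight * future


def best_matching(weight):
    """Brute-force the rider -> driver matching with the highest total score."""
    riders = sorted(RIDER_DEST)
    drivers = sorted(DRIVER_ZONE)
    best, best_score = None, float("-inf")
    for chosen in permutations(drivers, len(riders)):
        score = sum(match_score(d, r, weight) for d, r in zip(chosen, riders))
        if score > best_score:
            best, best_score = dict(zip(riders, chosen)), score
    return best


def total_eta(matching):
    """Sum of pickup ETAs for a rider -> driver matching."""
    return sum(ETA[driver][rider] for rider, driver in matching.items())


myopic = best_matching(weight=0.0)       # considers pickup ETA only
value_aware = best_matching(weight=1.0)  # also considers zone balance

print("myopic:     ", myopic, "total ETA:", total_eta(myopic))
print("value-aware:", value_aware, "total ETA:", total_eta(value_aware))
```

With these numbers, the myopic matching dispatches `d1`, the only driver in the under-supplied zone A, because that driver is closest. The value-aware matching accepts two extra minutes of total pickup time to dispatch both drivers from zone B instead, leaving `d1` available where supply is scarce. The brute-force search is only practical at toy scale; it is used here to keep the example self-contained.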

We model the Uber matching system as a Markov decision process (MDP) in which the agent makes collective decisions to match drivers to riders in a particular order. The environment is the market, which reacts to this sequence of collective decisions. The MDP system tracks colle...