Demand and ETR Forecasting at Airports

Airports currently hold a significant portion of Uber’s supply and open supply hours (i.e., supply that is not utilized, but open for dispatch) across the globe. At most airports, drivers are obligated to join a “first-in-first-out” (FIFO) queue from which they are dispatched. When the demand for trips is high relative to the supply of drivers in the queue (“undersupply”), this queue moves quickly and wait times for drivers can be quite low. However, when demand is low relative to the amount of available supply (“oversupply”), the queue moves slowly and wait times can be very high. Undersupply creates a poor experience for riders, as they are less likely to get a suitable ride. On the other hand, oversupply creates a poor experience for drivers as they are spending more time waiting for each ride and less time driving. What’s more, drivers don’t currently have a way to see when airports are under- or over-supplied, which perpetuates this problem.

One way to tackle this undersupply/oversupply issue at airports is to forecast the supply balance and use it to optimize resource allocation. Our first application of these models is estimating the time to request (ETR) for the airport driver queue: the length of time a driver would have to wait before receiving a trip request. This gives drivers the information they need to reposition to the airport during periods of undersupply (short waits), or to remain in the city during periods of oversupply (long waits).

ETR Venue Marker 

When not on-trip, drivers can preview information about an airport by clicking on the airport venue marker in the driver app. Once clicked, the venue marker displays a tile on the primary screen that contains the estimated wait time, the number of drivers in queue, and the number of flights arriving in the next hour (together these provide context for the estimated wait time). Wait times are classified as short (0-15 minutes), medium (15-30 minutes), or long (>30 minutes). These thresholds follow from UX research. See Figure 1 for an example of the ETR tile.

Figure 1: Example ETR tiles in the venue marker.
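
For reference, this classification can be expressed as a simple threshold function. The sketch below uses our own naming, and the handling of the exact 15- and 30-minute boundaries is an assumption:

```python
def classify_wait_time(minutes: float) -> str:
    """Map an estimated wait time in minutes to the ETR tile class.

    Thresholds follow the short/medium/long buckets on the tile; the
    boundary handling (half-open intervals) is our assumption.
    """
    if minutes < 15:
        return "short"   # 0-15 minutes
    elif minutes < 30:
        return "medium"  # 15-30 minutes
    return "long"        # >30 minutes
```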

Architecture

We built new models for demand forecasting and effective queue length on top of the Michelangelo platform, productionized them, and integrated them with the current Driver app, as detailed below:

ETR Model Design

To estimate wait times we develop three models (Figure 2):

  1. We estimate the “true” position of the last person in the FIFO queue, accounting for queue dynamics (e.g., abandonment, joining of priority pass holders, etc.). We refer to this as our “supply” model.
  2. We estimate the queue consumption rate in 15-minute intervals, up to one hour into the future. We refer to this as our “demand” model.
  3. We simulate the consumption of the queue up to the driver’s estimated position according to the queue consumption rate, using a simple arithmetic algorithm. This results in an estimate of the wait time classification.

Figure 2: Diagram representing the relationship between the three constituent models.

“Supply” Model

Airport FIFO queue positions are very dynamic and do not decrease monotonically; that is, because of queue abandonment, multiple different ways of getting matched with rides, and increases in queue length due to drivers with priority passes, the true position of the driver at the back of the queue may be further back or further forward than initially observed. We therefore train a gradient boosted tree on the following features to derive an estimate of the true position of the last person in the queue every minute:

  • Observed queue position (i.e. queue length)
  • Driver dynamics (e.g., rate of abandonment, rate of priority pass additions, rate of trip radar matching, etc.)
  • Temporal elements (e.g., day of the week, hour of the day)

The output of this model is an integer that is accessed through a dedicated endpoint, and that we refer to as $s_t$, where $t$ refers to the minute of observation.
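
For illustration, a supply model along these lines could be trained as in the sketch below, with scikit-learn’s gradient boosting as a stand-in for the production trainer; the column names and data source are hypothetical, not Uber’s actual schema:

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

FEATURES = [
    "observed_queue_length",   # queue length as reported by the queue service
    "abandonment_rate",        # drivers leaving the queue per minute
    "priority_pass_rate",      # priority-pass drivers joining per minute
    "trip_radar_match_rate",   # matches made outside strict FIFO order
    "day_of_week",
    "hour_of_day",
]
TARGET = "true_last_position"  # label: realized position of the last driver

# Hypothetical training frame: one row per (queue, minute) observation.
df = pd.read_parquet("airport_queue_training.parquet")

model = GradientBoostingRegressor(n_estimators=200, max_depth=4)
model.fit(df[FEATURES], df[TARGET])

# Served once per minute; s_t is rounded to an integer queue position.
s_t = int(round(model.predict(df[FEATURES].tail(1))[0]))
```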

“Demand” Model

Demand on the FIFO queue is somewhat different from demand for drivers in the general marketplace because it largely depends on arriving flights. These flights are susceptible to exogenous factors at the origin airport, such as inclement weather and congestion. These factors can substantially change the distribution of demand at the destination airport and, in the extreme, lead to significant surges or drops in demand. To account for this, the queue consumption model leverages a gradient boosted tree trained on the following features to estimate the distribution of demand to the queue:

  • Distribution of arriving flights
  • Weather at destination airport
  • Engagement with the Uber app (e.g., number of people searching for trips, number of people with the app open, etc.)
  • Temporal features (e.g., hour of the day, day of the week, etc.)

The queue consumption rate is estimated in 15-minute increments up to 1 hour into the future. Each increment is estimated independently and accessed through its own endpoint. The result is an ordered array of the form $\{d_{t,\,t:t+15},\ d_{t,\,t+15:t+30},\ d_{t,\,t+30:t+45},\ d_{t,\,t+45:t+60}\}$, where $t$ refers to the minute of observation. To align with the short/medium/long classification for the ETR venue marker, we sum over the last two increments: $\{d_{t,\,t:t+15},\ d_{t,\,t+15:t+30},\ d_{t,\,t+30:t+45} + d_{t,\,t+45:t+60}\}$.
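
At serving time, the four increments can be fetched from their endpoints and collapsed into the three tile buckets. A minimal sketch follows, with the per-increment endpoint call abstracted behind a caller-supplied function (all names here are illustrative, not the production API):

```python
from typing import Callable, List

def fetch_demand_array(
    queue_id: str,
    t: int,
    query_increment: Callable[[str, int, str], float],
) -> List[float]:
    """Return [d_{t,t:t+15}, d_{t,t+15:t+30}, d_{t,t+30:t+45}, d_{t,t+45:t+60}].

    query_increment stands in for the per-increment endpoint clients;
    each 15-minute increment is served independently.
    """
    increments = ["0_15", "15_30", "30_45", "45_60"]
    return [query_increment(queue_id, t, inc) for inc in increments]

def align_to_tile_buckets(d: List[float]) -> List[float]:
    """Collapse the four increments into the three tile buckets by
    summing the last two 15-minute increments."""
    return [d[0], d[1], d[2] + d[3]]
```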

Queue Consumption Logic

In the last stage of estimating ETR, we apply a simple algorithm to the outputs of the supply and demand models to estimate the wait time class. We iterate through the demand array, comparing it to the supply estimate, as a crude simulation of queue consumption. Once the driver at the position estimated by the supply model has been “consumed”, we return the timestep of that iteration as the classification.
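
A minimal sketch of this arithmetic follows; the names are ours rather than the production implementation’s:

```python
from typing import List

# Buckets aligned to the collapsed demand array from the demand model.
WAIT_CLASSES = ["short", "medium", "long"]

def estimate_wait_class(s_t: int, demand_buckets: List[float]) -> str:
    """Crude queue-consumption simulation: subtract each bucket's demand
    from the estimated position of the last driver (s_t) and return the
    class of the bucket in which that driver is consumed."""
    remaining = s_t
    for wait_class, demand in zip(WAIT_CLASSES, demand_buckets):
        remaining -= demand
        if remaining <= 0:
            return wait_class
    return "long"  # not consumed within the one-hour horizon

# Example: 40 drivers ahead, ~25 requests expected in the first 15 minutes
# and ~20 in the next, so the last driver is consumed in the second bucket.
print(estimate_wait_class(40, [25.0, 20.0, 12.0]))  # -> "medium"
```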

At this point, “Why not estimate ETR directly?” is a reasonable question to ask. The approach outlined above was deliberately modularized to mitigate the risks that come with complexity: to facilitate the maintenance, troubleshooting, and upgrading of each component as necessary. More importantly, it also allows us to reuse individual elements in future Airports projects that are not related to estimated wait times but require some notion of airport demand and supply. This improves our efficiency and aligns the technology underpinning multiple different products. In the next section we discuss how this was implemented.

Engineering Design

This work resulted in reusable Michelangelo (MA) Palette features (mostly near-real-time, some batch-based) that can be shared with many Rider-vertical ML use cases and BI analytics purposes (e.g., the actual time to receive an offer, a key metric for evaluating whether an airport is undersupplied or oversupplied).

Below are diagrams of the architecture for the ETR and demand forecasting flows. The first shows the overall architecture: how the venue marker service interacts with the prediction endpoints to calculate the ETR prediction and then pushes the driver inspection sheet to the driver app, along with the effective queue length (EQL) training and serving data flows (Figure 3). The second illustrates the training and serving data flows for the demand forecasting model (Figure 4).

Figure 3: Diagram for ETR calculation and EQL model (simplified)

Figure 4: Diagram for Demand Forecasting Prediction (simplified) 

There are a few key components and technical challenges/trade-offs from the engineering process that we would like to highlight:

Feature Ingestion with Batch and Real-Time Data

In our project, many signals are ingested to make up-to-date predictions based on real-time data. One alternative was to use batch data (Hive) for training purposes and, at serving time, pass the feature values for prediction directly from the prediction client that resides in the venue marker service. We ended up leveraging near-real-time (NRT) features for most of the needed signals to speed up feature ingestion, which simplified the client logic at serving time (the client only needs to pass a join key, such as a timestamp, with each request). It also made many NRT features reusable and shareable with many other use cases and teams.
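
Conceptually, the client logic at serving time then reduces to passing join keys rather than raw feature values. The sketch below is a hypothetical illustration of that request shape, not Michelangelo’s actual client API:

```python
import time
from dataclasses import dataclass

@dataclass
class PredictionRequest:
    """Hypothetical request shape: with NRT features joined server-side,
    the client supplies only join keys, not raw feature values."""
    model_name: str
    join_keys: dict

def build_etr_request(airport_id: str, queue_id: str) -> PredictionRequest:
    # Feature values (flights, weather, eyeballs, queue dynamics) are
    # looked up in the NRT feature store by these keys at serving time.
    return PredictionRequest(
        model_name="airport_supply_model",  # illustrative name
        join_keys={
            "airport_id": airport_id,
            "queue_id": queue_id,
            "timestamp_minute": int(time.time() // 60),
        },
    )
```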

Some of the challenges involved onboarding various signals from different platforms (especially the NRT signals), including flights, weather, and rider app real-time engagement (“eyeballs”). At the platform level there are many limitations on how we can enrich the data via joins with dimensional tables or aggregate it, especially during the streaming process. Aligning all the signals in temporal order for the time series model was also quite challenging. We eventually overcame these challenges through a number of engineering workarounds and trade-offs.

Single Pipeline for All Airports vs. Multiple Pipelines Partitioned by Airport for Training

Many statistical learning methods are prone to developing a bias toward the dominant values of a categorical variable when a few categories dominate, even when the other categories contain a significant number of samples. Partitioned models reduce this kind of bias, but the downside is the operational cost of building and maintaining multiple pipelines partitioned by airport ID or queue ID. For the MVP, we started with a single, non-partitioned pipeline; we will then identify underperforming airports and single them out for standalone partitioned pipelines to further improve performance.
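
The trade-off can be sketched as follows, again with scikit-learn as an illustrative stand-in: a single global pipeline encodes the airport ID as a feature, while partitioned pipelines train one model per airport:

```python
from typing import Dict, List

import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

def train_global(df: pd.DataFrame, features: List[str], target: str):
    """Single pipeline: one model for all airports, with airport_id
    one-hot encoded. Dominant airports can bias the fit."""
    X = pd.get_dummies(df[features + ["airport_id"]], columns=["airport_id"])
    return GradientBoostingRegressor().fit(X, df[target])

def train_partitioned(df: pd.DataFrame, features: List[str], target: str) -> Dict:
    """Partitioned pipelines: one model per airport. Reduces cross-airport
    bias at the cost of building and operating many pipelines."""
    return {
        airport: GradientBoostingRegressor().fit(g[features], g[target])
        for airport, g in df.groupby("airport_id")
    }
```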

Driver queue signals are a key feature for modeling airport driver supply. Due to airport regulations, drivers need to be staged in the waiting lot before being dispatched, and some drivers can obtain a driver queue token for priority dispatch. To instrument and ingest those real-time signals from our entity queue service we significantly refactored the legacy codebase, which caused a number of scaling issues prior to launch.

Kafka Emission Scaling Issues for Driver Queue Data

The volume of emitted data was too large for the Kafka client to handle, which is a hard technical constraint of our infrastructure. To make the proper trade-off, we switched from our original plan of emitting on write operations to emitting on peek operations, although this can cause data under-sampling during overnight hours and may cause some duplicated events during peak hours.
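
To make the trade-off concrete, here is a hypothetical sketch of the two emission strategies, using the open-source kafka-python client as a stand-in for our internal Kafka client (topic name and event shapes are illustrative):

```python
import json
from kafka import KafkaProducer  # stand-in for the internal Kafka client

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def on_queue_write(event: dict) -> None:
    """Original plan: emit on every queue mutation. Emission volume scales
    with writes, which exceeded the client's throughput limits."""
    producer.send("driver-queue-events", value=event)

def on_queue_peek(snapshot: dict) -> None:
    """Adopted plan: emit only on peek operations. Volume scales with reads
    instead: sparse overnight (under-sampling) and dense at peak hours
    (possible duplicate snapshots)."""
    producer.send("driver-queue-events", value=snapshot)
```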

Database Scaling Issues

The airport driver team uses Uber’s own NoSQL database, Docstore, to store the entity queue data and manage queue state changes. There were scaling issues with two culprits. The first was an unbalanced partition scheme: the implementation partitioned by queue_id (at most Uber airports, drivers need to be staged in the queue in the waiting lot), which corrupted the data, and Docstore started throwing errors and triggered rate limiting. We then switched back to partitioning by entity_id (driver_id), which resolved the issue, but with the trade-off that we needed to build a materialized view to handle queue-level queries, which occasionally causes data sync delays. The second issue was that, without further horizontally scaling the entity queue out to more hosts, high CPU/memory usage caused Docstore client hiccups and eventually introduced timeout errors.
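
Conceptually, the partitioning change looks like the sketch below (hypothetical key functions, not Docstore’s actual API):

```python
# Before: partition by queue_id. Busy airports concentrate all of their
# traffic on a handful of partitions, which triggered rate limiting.
def shard_key_v1(record: dict) -> str:
    return record["queue_id"]

# After: partition by entity_id (driver_id). Load spreads evenly across
# partitions, but queue-level queries now require a materialized view
# keyed by queue_id, which can lag behind the base table.
def shard_key_v2(record: dict) -> str:
    return record["entity_id"]
```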

ETR Display and Prediction Serving Logic on Venue Marker Backend Service

Having the ETR displayed in the driver app was relatively straightforward. The Uber Driver app, Carbon, was built on an architecture in which the majority of displayed UI elements are backend-driven. Venue marker services are responsible for serving the airport map marker and relevant info to drivers to guide them to the airport pickup.

Two Michelangelo clients are embedded in the venue marker service; they call two different Michelangelo services and use an algorithmic heuristic to calculate the final ETR value. Because the design is modularized, in the first launch phase we were able to swap out the effective queue length model for a simpler heuristic to catch up on the project timeline without significant prediction degradation.
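
Putting the pieces together, the serving path inside the venue marker service might look like the sketch below; the client interfaces and the first-launch heuristic stand-in are our own illustrations:

```python
from typing import Callable, Dict, List, Protocol

class PredictionClient(Protocol):
    """Hypothetical interface for the embedded Michelangelo clients."""
    def predict(self, join_keys: Dict[str, str]) -> List[float]: ...

class HeuristicSupplyClient:
    """Stand-in for the first launch phase, where a simpler heuristic
    replaced the effective queue length model behind the same interface."""
    def __init__(self, observed_queue_length: Callable[[str], int]):
        self.observed_queue_length = observed_queue_length

    def predict(self, join_keys: Dict[str, str]) -> List[float]:
        # e.g., use the raw observed queue length as the supply estimate
        return [float(self.observed_queue_length(join_keys["queue_id"]))]

def serve_etr(supply: PredictionClient, demand: PredictionClient,
              join_keys: Dict[str, str]) -> str:
    """Call both model endpoints and combine them with the queue
    consumption heuristic to produce the tile classification."""
    remaining = supply.predict(join_keys)[0]   # estimated last position
    d = demand.predict(join_keys)              # four 15-minute increments
    buckets = [d[0], d[1], d[2] + d[3]]        # align to tile classes
    for wait_class, consumed in zip(["short", "medium", "long"], buckets):
        remaining -= consumed
        if remaining <= 0:
            return wait_class
    return "long"
```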

What’s Next?

We are exploring a few different options to further improve ETR precision/recall and to extend ETR functionality to other Uber products. These include leveraging a time-series-based deep learning model to expand the prediction horizon across the entire day, in-queue ETR calculation, and better integration with current supply summoning tools to solve undersupply challenges at airports.

Acknowledgements

The project, launched right before the Christmas and New Year holidays, achieved great success across most business metrics thanks to cross-functional efforts by the Airports team within Rider Verticals in the Uber Mobility org.

Uber’s Airports team is responsible for managing and optimizing the pickup and drop-off experience for riders and drivers at airports, and works with airport authorities to develop solutions that improve the pickup and drop-off process, such as dedicated pickup zones and efficient routing systems. The team is also expanding the business beyond airports into ancillary travel services, capturing an even greater share of travel spend.

Many thanks for the support from Uber’s internal ML platform (Michelangelo) team, which helped us onboard the training features (batch and near-real-time) onto their platform and leverage their serving, monitoring, and alerting capabilities at scale. Michelangelo is a self-service ML platform that Uber teams use to train, manage, and deploy machine learning systems at scale.
