ETA (Estimated Time of Arrival) Reliability at Lyft

Imagine this: you’ve got a crucial early-morning flight to catch. You open the Lyft app and see an estimated pickup time on the screen. But here’s the million-dollar question: will the driver you are paired with arrive at the estimated time? One of Lyft’s simplest yet most profound goals is to provide riders with the most accurate ETAs we can.

Before you even hit ‘request’ and summon a ride, complex algorithms sift through historical and real-time data, leveraging machine learning alongside traffic and weather insights to predict the ETA (or pickup time) shown on the rider’s screen for the destination they input. This got us thinking: how do we determine ETA estimates as accurately as possible before a rider requests a ride?

Enter: Reliability

What does ‘reliable’ really mean in this context?

In ridesharing, ‘reliability’ takes on different dimensions depending on the phase of the ride: before the rider requests a ride, after the ride has been requested, or after the details of the driver who accepted the request appear. For instance, once a driver’s information is visible, reliability is the accuracy of the predicted driver arrival time (displayed on the screen) compared to the driver’s actual arrival time.

So, what encompasses reliability before a ride is even requested?

Simply put: given an ETA shown before a ride request, reliability is the likelihood that a driver will arrive within a reasonable timeframe around that ETA, should the ride be booked.
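
To make this definition concrete, here is a minimal sketch of how reliability could be measured on past rides. The `Ride` fields, the one-sided lateness rule, and the 2-minute tolerance are all assumptions made for illustration; the article does not specify what a ‘reasonable timeframe’ is.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Ride:
    shown_eta_min: float       # ETA displayed before the request, in minutes
    actual_arrival_min: float  # observed request-to-arrival time, in minutes


def is_reliable(ride: Ride, tolerance_min: float = 2.0) -> bool:
    """Assumed rule: the pickup is reliable if the driver arrives no later
    than the shown ETA plus a tolerance (the tolerance is hypothetical)."""
    return ride.actual_arrival_min <= ride.shown_eta_min + tolerance_min


def empirical_reliability(rides: Sequence[Ride], tolerance_min: float = 2.0) -> float:
    """Share of rides whose pickup landed inside the tolerance window."""
    if not rides:
        raise ValueError("need at least one ride")
    return sum(is_reliable(r, tolerance_min) for r in rides) / len(rides)


# Toy data, not real Lyft rides.
rides = [Ride(4, 5), Ride(4, 9), Ride(6, 6.5), Ride(3, 5.5)]
reliability = empirical_reliability(rides)
print(f"empirical reliability: {reliability:.2f}")
```

The model described later predicts this quantity ahead of time, per ride and per candidate ETA, instead of measuring it after the fact.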

Rider Screen pre-request

Understanding and estimating this aspect of reliability is crucial for setting accurate ETAs, as it directly affects the likelihood of riders canceling their bookings. The simulated graph below depicts the relationship between reliability and rider cancellation rates: higher reliability goes with a lower cancellation rate.

Reliability vs Cancels graph

Our objective, for each ride option presented to our users, is to showcase an accurate ETA with a strong likelihood of a swift, reliable pickup.

Unpacking ETA Uncertainties

Before tackling this problem, it is important to understand the reasons for unreliability, i.e., why estimated times of arrival (ETAs) might differ from actual arrival times.

Unpredictability of Driver Availability: At the heart of the ETA challenge is the inherent uncertainty around driver availability at the time a rider requests a ride. Our system endeavors to predict the closest and most suitable drivers who may choose to accept a ride request. Yet, there are still many variables at play:
– Driver Preference: Drivers have the autonomy to reject or cancel a ride based on personal preferences, impacting the ETA estimate.
– Driver Contention: Several requests vying for the same driver at once complicate the matching of each ride request with a driver.
– Changes in Driver Status: Drivers may log off unexpectedly, which could alter ETAs.
Organic ETA uncertainty/mapping volatility: Beyond the unpredictability of driver availability, there are other factors at play that can skew ETA accuracy:
– Traffic Conditions: Traffic can unpredictably affect travel times.
– Navigation Challenges: Unexpected detours, such as missed turns or road closures, can add time to the journey.
– GPS Volatility: GPS inaccuracies can affect the exact location of a driver or rider, impacting ETA predictions prior to request.
Marketplace Dynamics: Another layer of complexity is the supply and demand dynamics within specific neighborhoods. There are instances where requests are made from areas with a lower density of available drivers. Additionally, marketplace conditions are in constant flux, with the balance of supply and demand shifting within minutes, further impacting ETA reliability.

Harnessing Machine Learning (ML) for Reliability Prediction

The rest of the article delves into some technical aspects of ML, so familiarity with fundamental concepts is recommended!

To predict the effect of selecting a certain ETA on ride reliability, we could potentially use Causal Inference methods leveraging historical data to predict the causal effect of different ETA settings on reliability, given actual arrival times, rider cancellations and other relevant metrics.

However, in order to automatically detect complex interactions between multiple variables (like driver behavior, ETA patterns, demand and supply conditions) without explicit specifications, we decided to harness ML. This approach enhances our ability to accurately predict ride fulfillment reliability by analyzing rich datasets, while also ensuring scalability and efficiency in our processes.

We started with the objective of developing a classification model capable of predicting the probability that a given ETA estimate is reliable. The goal? To arm downstream services with reliability scores for all possible ETA brackets, enabling the selection of the most accurate ETAs for our riders.

Fig 1. Example classification model prototype

But how are these reliability estimates used for ETA selection?

Using product requirements, data insights, and UX research, we set a stringent reliability Service Level Agreement (SLA) for every possible ETA estimate that we can present to our users (possible ETA brackets are predetermined per ride type). This ensures we hit reliability targets by only selecting ETAs that meet the desired SLAs. Fig 1 illustrates a model prototype: sample inputs include possible ETAs (ranging from 1 to 10 minutes) along with ride-level and marketplace features. The outputs are ETAs annotated with model-based reliability estimates. Finally, an ETA whose reliability exceeds the SLA is selected. In this article, we focus on our approach to reliability estimation.
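
The selection step can be sketched as follows. This is an illustration under stated assumptions, not the production logic: the function name `select_eta`, the single SLA value shared by all ETAs, and the tie-breaking rule (pick the smallest ETA that clears the SLA) are all hypothetical, since the article only says that an ETA meeting the SLA is chosen.

```python
from typing import Dict, Optional


def select_eta(reliability_by_eta: Dict[int, float], sla: float) -> Optional[int]:
    """Return the smallest candidate ETA (minutes) whose predicted reliability
    meets the SLA, or None if no candidate qualifies.

    Choosing the *smallest* qualifying ETA is an assumption for this sketch.
    """
    eligible = [eta for eta, p in reliability_by_eta.items() if p >= sla]
    return min(eligible) if eligible else None


# Hypothetical model outputs: candidate ETA (minutes) -> reliability estimate.
scores = {1: 0.35, 2: 0.52, 3: 0.71, 4: 0.86, 5: 0.93, 6: 0.97}
chosen = select_eta(scores, sla=0.85)
print(chosen)
```

Returning `None` when nothing qualifies leaves the fallback behaviour to the caller; how that case is handled in practice is not covered here.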

The Model

At the core of our solution lies a tree-based classification model. While deep learning models have their advantages and are increasingly used in ridesharing for tasks like demand forecasting, route optimization, and image recognition (e.g. identifying road conditions), they are often overkill for everyday, lightweight business classification tasks.

Gradient-boosted tree models have been a long-standing choice at Lyft for these purposes due to their interpretability, efficiency with smaller datasets, and robustness to outliers and missing values. These models excel at handling the structured tabular data common in ridesharing, capturing complex, non-linear relationships and feature interactions without extensive feature preprocessing or scaling. They require fewer computational resources and are straightforward to implement and maintain, facilitating rapid deployment in production environments.

Features and Training

Along with the ETA estimate we want to predict reliability for, we need features that capture as much of the marketplace uncertainty as possible at prediction time:

Nearby Available Drivers: We identify a list of the closest drivers to a ride request and use their characteristics, such as estimated driving time, distance, and driver status (online, offline, or completing a trip) as model features. This data helps the model gauge the likelihood of each driver matching with the ride, should the ride be requested in the future.
Harnessing Historical Insights: Our model integrates historical data at the regional and granular geohash level to offer a broader perspective on performance trends. Recent driver ETA estimates and match times, along with the number of completed and canceled rides, establish historical benchmarks that help adjust predictions based on recent performance.
Marketplace Features (Capturing the Pulse of Demand and Supply): Real-time neighborhood-level demand and supply indicators, such as the number of app opens, unassigned rides, and driver pool counts, offer a granular view of market conditions.
To further refine model predictions, we incorporate features such as pickup and dropoff location, temporal elements, and categorical data like the region in which the ride was requested.

Innovative Training Approach: Our training label is generated by comparing the actual request-to-driver arrival time against the ETA to produce a binary label for reliability. A unique aspect of our approach is the decision to train the model on all possible ETA estimates for each ride, rather than just the ETA estimates actually shown to riders (prior to request). In other words, each ride in the training data is duplicated n times, where n is the number of possible ETA estimates (e.g. 1, 2, 3, … 10 minutes). This strategy helps us:

Avoid negative feedback loops during training: a model trained only on the ETAs actually shown could progressively degrade over time.
Ensure equal representation of all possible ETA estimates that could be seen during inference.
Let the model learn variances in driver ETA estimation (the driver ETAs used as model features are generated upstream by another service and may not always be accurate).
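
The row duplication described above can be sketched in a few lines. The feature names, the helper `expand_ride`, and the exact labeling rule (arrival no later than the candidate ETA plus an optional tolerance) are assumptions for illustration; the article states only that the label compares the actual arrival time against the ETA.

```python
from typing import Any, Dict, Iterable, List


def expand_ride(
    ride_features: Dict[str, Any],
    actual_arrival_min: float,
    candidate_etas: Iterable[int],
    tolerance_min: float = 0.0,
) -> List[Dict[str, Any]]:
    """Duplicate one historical ride into one training row per candidate ETA.

    Each row keeps the ride's features, adds the candidate ETA as a feature,
    and gets a binary label: 1 if the driver actually arrived within the
    candidate ETA (plus tolerance), else 0. The labeling rule is assumed.
    """
    rows = []
    for eta in candidate_etas:
        row = dict(ride_features)  # copy so rows do not share state
        row["candidate_eta_min"] = eta
        row["label"] = int(actual_arrival_min <= eta + tolerance_min)
        rows.append(row)
    return rows


CANDIDATE_ETAS = range(1, 11)  # 1, 2, ..., 10 minutes, as in the example above

# Hypothetical features for a single ride whose driver arrived after 4.2 minutes.
rows = expand_ride(
    {"nearest_driver_eta_min": 3.5, "unassigned_rides": 12},
    actual_arrival_min=4.2,
    candidate_etas=CANDIDATE_ETAS,
)
for row in rows:
    print(row["candidate_eta_min"], row["label"])
```

Because every ride contributes a row for every candidate ETA, the label flips from 0 to 1 exactly once along the ETA axis, which gives the model a direct view of how reliability grows as the promised ETA gets longer.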

Evaluating Model Performance: We use the Area Under the Curve (AUC) metric to evaluate model performance because it measures performance across all thresholds rather than at a single one, which is useful since our use case consumes raw probabilities.

AUC Curve for the Reliability Model
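
For readers less familiar with the metric, AUC can be read as the probability that a randomly chosen reliable ride receives a higher score than a randomly chosen unreliable one. The sketch below computes it directly from that definition using only the standard library; it is a quadratic-time illustration on toy data, and the function name `roc_auc` is our own rather than part of any Lyft tooling.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a win. O(n_pos * n_neg), fine for small examples.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy labels (1 = reliable) and predicted reliability probabilities.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.3, 0.6, 0.4, 0.5, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

Since the definition depends only on how examples are ranked, no decision threshold enters the computation.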

We also look at performance per ETA bracket: the model bias is generally small but increases for larger ETAs, owing to the smaller share of rides with, say, an ETA above 15 minutes.
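
One way to inspect per-bracket behaviour is sketched below. It assumes that ‘bias’ means the mean predicted reliability minus the observed reliable rate within each ETA bracket; that reading, the helper name `bias_by_eta`, and the toy numbers are ours and are not taken from the actual evaluation.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def bias_by_eta(rows: Iterable[Tuple[int, float, int]]) -> Dict[int, float]:
    """rows: (eta_min, predicted_probability, label) triples.

    Returns, per ETA bracket, mean(predicted) - mean(label). Positive values
    mean the model is over-confident for that bracket (assumed definition).
    """
    sums = defaultdict(lambda: [0.0, 0, 0])  # [sum of predictions, sum of labels, count]
    for eta, prob, label in rows:
        acc = sums[eta]
        acc[0] += prob
        acc[1] += label
        acc[2] += 1
    return {eta: (ps - ys) / n for eta, (ps, ys, n) in sorted(sums.items())}


# Toy data: a well-populated 3-minute bracket and a sparse 15-minute bracket.
rows = [
    (3, 0.8, 1), (3, 0.6, 1), (3, 0.7, 0), (3, 0.7, 1),
    (15, 0.9, 1), (15, 0.7, 0),
]
bias = bias_by_eta(rows)
for eta, b in bias.items():
    print(f"ETA {eta:>2} min: bias {b:+.2f}")
```

With few rides in a bracket, a single outcome moves the observed rate a lot, which is one reason estimates for rare, large ETAs are noisier.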

Beyond Prediction: Ensuring Sustained Performance

Ensuring that a model meets our use case upon training and deployment is crucial, but maintaining its performance over time in the dynamic rideshare environment presents a unique challenge. Consider how a model trained before significant societal shifts, such as the pandemic, would struggle as commuting patterns evolve dramatically. Similarly, updates to the Lyft app itself can impact the functionality of its components, including predictive models. This necessitates a robust system for continuous monitoring of features and performance to identify and address any degradation promptly.

It’s often hard to pinpoint the root cause of model degradation, but most of the time a simple retrain on fresh data can mitigate performance declines. Thankfully, LyftLearn, Lyft’s ML platform comprising model serving, training, CI/CD, feature serving, and model monitoring, equips us with the tools to set up drift detection alarms and automated retraining pipelines.
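
As an illustration of what a drift alarm might compute, the sketch below uses the Population Stability Index (PSI), a common feature-drift statistic. This is a generic technique chosen for the example, not a description of how LyftLearn detects drift; the bin edges, the 0.2 alert threshold (a widely used rule of thumb), and the function name `psi` are all assumptions.

```python
import math
from typing import List, Sequence


def psi(expected: Sequence[float], actual: Sequence[float],
        edges: Sequence[float], eps: float = 1e-6) -> float:
    """Population Stability Index between a training-time sample (expected)
    and a live sample (actual) of one feature, using fixed bin edges."""

    def shares(values: Sequence[float]) -> List[float]:
        if not values:
            raise ValueError("need at least one value")
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[sum(v >= e for e in edges)] += 1  # index of the bin holding v
        # Floor at eps so empty bins do not produce log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Toy feature values, e.g. nearest-driver ETA in minutes.
train = [2, 3, 3, 4, 5, 5, 6, 7, 8, 9]
live_same = list(train)
live_shifted = [v + 4 for v in train]
edges = [4, 7]

ALERT_THRESHOLD = 0.2  # rule-of-thumb value, assumed for this sketch
for name, live in [("same", live_same), ("shifted", live_shifted)]:
    drift = psi(train, live, edges)
    print(name, round(drift, 3), "retrain" if drift > ALERT_THRESHOLD else "ok")
```

A check like this would run per feature on a schedule, with a threshold breach raising an alarm or triggering a retrain on fresh data.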

What’s Next?

In addition to focusing on our primary regions, we are broadening our analysis to include unique markets (e.g. complex marketplaces like airports) to incorporate more nuanced signals into the model. We are also integrating more real-time signals to better capture dynamic marketplace conditions. As we continue to refine our predictive models and strategies, our goal remains clear: to enhance the reliability of our service and uphold our commitment to providing riders with accurate and trustworthy information.