ML Feature Serving Infrastructure at Lyft

Vinay Kakade
Published in Lyft Engineering
Mar 16, 2021



By Vinay Kakade and Shiraz Zaman

Introduction

Machine Learning forms the backbone of the Lyft app. Some examples of ML at Lyft include matching drivers and passengers optimally, pricing rides, distributing coupons and incentives to riders and drivers, detecting fraud, planning routes, and automating support. The ML models for these use cases need features that are computed via batch jobs on the data warehouse or via event streams. Additionally, regardless of how they were computed, these features need to be made available via batch queries for model training and via low-latency point lookups for online inference.

Feature Computation and Feature Access Examples

For example, consider a model (say, a Cancels Model) that predicts whether a ride is likely to be canceled when a ride request is made in the Lyft app. The model may need the requesting user's Cancels History over the last year as a feature, which can only be computed by running a batch job on the Hive data warehouse, which holds this historical data. Another feature for this model could be the Estimated Price for the Current Ride, computed in real time from the user's actions in the app via an event stream such as Kafka. Additionally, (a) historical values of both features need to be available via batch queries for training the Cancels Model, and (b) current values of both features need to be available as a point lookup for inference once the model is deployed.

We built a service for ML applications at Lyft, unsurprisingly called the Feature Service, to make all features available for both training and low-latency online inference, regardless of whether they were computed via batch queries or via event streams. The Feature Service is a battle-tested, mission-critical service that hosts several thousand features across a large number of models, serves millions of requests per minute with single-digit-millisecond latency, and has 99.99%+ availability. We share its architecture in this article.

Architecture

The Feature Service consists of three parts: Feature Definitions, Feature Ingestion, and Feature Processing and Retrieval.

Feature Service Architecture

Feature Definitions

Given the familiarity with SQL among ML practitioners at Lyft, we decided on SQL as the language for feature definitions. The SQL is expected to have one column designated as the entity ID (an identifier for a business entity such as a driver, passenger, or ride); the rest of the columns are features. Feature definitions range in complexity from querying a single table to a few thousand lines of SQL comprising complex joins across multiple tables along with transformations. Multiple features can be grouped together into a feature group (aka entity type) for ease of managing a large number of features.
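
For illustration, here is a minimal sketch of what such a feature definition might look like, expressed as a SQL string inside Python; the table, column, and feature names are hypothetical, not Lyft's actual schema.

# Hypothetical feature definition: one entity-ID column plus one column per feature.
# Table and column names are illustrative only.
CANCELS_HISTORY_SQL = """
SELECT
    passenger_id AS entity_id,                             -- business entity ID column
    SUM(CASE WHEN status = 'canceled' THEN 1 ELSE 0 END)
        / CAST(COUNT(*) AS DOUBLE) AS cancel_rate_1y,      -- feature column
    COUNT(*) AS total_rides_1y                             -- feature column
FROM rides
WHERE requested_at >= date_sub(current_date, 365)
GROUP BY passenger_id
"""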

Features are versioned, which is especially important because feature definitions go through several iterations during ML model development. We keep versioning at the feature level rather than at the group level so that feature iterations are quicker. If the version is omitted from a feature definition, it defaults to 1.

In addition to the SQL feature definitions, we require users to provide metadata in JSON, consisting of the feature group, feature name, version, owner, the data type of the feature, validation information, and operational information such as the team to alert when feature generation is broken or produces invalid data. An example of feature metadata for the features gh6_no_show_rate and gh6_total_rides, with the entity type ride, is below.

Example Feature Metadata
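
As a rough, hypothetical illustration of what that JSON could look like (the listed fields, feature names, and entity type come from the description above; the exact key names, owner, and validation values are assumptions), expressed here as a Python dict:

# Hypothetical metadata for two features in the "ride" entity type.
# Key names, owner, and validation fields are illustrative, not Lyft's actual schema.
FEATURE_METADATA = [
    {
        "feature_group": "ride",              # aka entity type
        "feature_name": "gh6_no_show_rate",
        "version": 1,                         # defaults to 1 if omitted
        "owner": "ml-platform@example.com",   # placeholder owner
        "data_type": "float",
        "validation": {"min": 0.0, "max": 1.0},   # reject out-of-range values
        "alert_team": "marketplace-oncall",       # alerted when generation breaks
    },
    {
        "feature_group": "ride",
        "feature_name": "gh6_total_rides",
        "version": 1,
        "owner": "ml-platform@example.com",
        "data_type": "int",
        "validation": {"min": 0},
        "alert_team": "marketplace-oncall",
    },
]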

Feature Ingestion

For features defined on batch data, we run scheduled feature extraction jobs using Flyte. The frequency of the run can be controlled per feature, and the job executes the SQL against Lyft’s Data Warehouse and then writes to the Feature Service.
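
As a rough sketch of what such a scheduled extraction could look like with flytekit: the warehouse_client and feature_service_client modules below are hypothetical stand-ins, not Lyft's actual pipeline code.

from flytekit import task, workflow

# Placeholder for the feature SQL (see the earlier feature-definition sketch).
CANCELS_HISTORY_SQL = "SELECT passenger_id AS entity_id, ... FROM rides ..."

@task
def extract_and_write(feature_sql: str) -> int:
    """Run the feature SQL against the warehouse and write rows to the Feature Service."""
    import warehouse_client          # hypothetical warehouse query client
    import feature_service_client    # hypothetical Feature Service write client

    rows = warehouse_client.run_query(feature_sql)      # one row per entity ID
    for row in rows:
        feature_service_client.write_features(
            entity_id=row.pop("entity_id"),
            features=row,                                # remaining columns are features
        )
    return len(rows)

@workflow
def cancels_history_extraction() -> int:
    # The per-feature run frequency would be attached via a Flyte launch plan schedule.
    return extract_and_write(feature_sql=CANCELS_HISTORY_SQL)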

For features defined on streaming data, ingestion happens via a custom Flink job that uses in-house technology to run SQL queries over the stream. The job executes the SQL against a stream window and then writes the results to the Feature Service.
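
Conceptually, the streaming job evaluates a windowed query and writes each result row to the Feature Service. The sketch below shows a hypothetical Flink-style SQL string; the table, columns, and window size are illustrative, and the in-house runner that executes it isn't shown.

# Hypothetical Flink-style windowed SQL: recompute the estimated-price feature
# over a short tumbling window of app events. Names are illustrative only.
ESTIMATED_PRICE_STREAM_SQL = """
SELECT
    ride_id AS entity_id,                               -- business entity ID
    LAST_VALUE(estimated_price) AS estimated_price      -- feature column
FROM ride_pricing_events
GROUP BY
    ride_id,
    TUMBLE(event_time, INTERVAL '1' MINUTE)             -- per-minute stream window
"""
# Each emitted row would be written to the Feature Service exactly like the
# batch path: an entity_id plus one value per feature column.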

Feature Processing and Retrieval

The service supports gRPC and REST endpoints for writing and reading features. When the server receives an add or update request for a feature value, it validates the value against the feature metadata and then writes it to DynamoDB, where each row holds the most recent value for a particular feature. From DynamoDB, feature values are replicated to Hive and Elasticsearch. We use Redis as a write-through cache for both feature values and feature metadata to increase read throughput, updating the cached values as new feature values are written. Additionally, since callers of the service may themselves be distributed, we use DynamoDB conditional checks to implement optimistic locking, so that the write request with the latest timestamp wins in case of a conflict.
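
A simplified sketch of this write path using boto3 and redis-py is below; the table name, key scheme, and validation hook are assumptions, and only the idea of a timestamp-based conditional write comes from the description above.

import json
import boto3
import redis
from botocore.exceptions import ClientError

dynamodb = boto3.client("dynamodb")
cache = redis.Redis()

TABLE = "feature_values"  # hypothetical table: one row per (entity, feature) pair

def write_feature(entity_id: str, feature: str, value: float, ts_ms: int) -> bool:
    """Validate, then write the latest value; the newest timestamp wins on conflict."""
    # validate_against_metadata() would check type/range per the feature metadata:
    # if not validate_against_metadata(feature, value): raise ValueError(...)
    try:
        dynamodb.put_item(
            TableName=TABLE,
            Item={
                "entity_feature": {"S": f"{entity_id}#{feature}"},
                "value": {"N": str(value)},
                "updated_at_ms": {"N": str(ts_ms)},
            },
            # Optimistic locking: only overwrite if our timestamp is newer.
            ConditionExpression=(
                "attribute_not_exists(entity_feature) OR updated_at_ms < :ts"
            ),
            ExpressionAttributeValues={":ts": {"N": str(ts_ms)}},
        )
    except ClientError as e:
        if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return False  # a newer value already won; drop this stale write
        raise
    # Write-through cache so reads see the new value immediately.
    cache.set(f"{entity_id}#{feature}", json.dumps({"value": value, "ts": ts_ms}))
    return True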

The service supports reads via a batch-get call, where a user can request a number of feature values for given entities. Reads go first to the Redis cache and fall back to DynamoDB on a cache miss. The read path is highly optimized for low latency and high throughput, with no locking involved.
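
A matching read-path sketch, using the same hypothetical key scheme as above: one MGET against Redis, then a DynamoDB batch get only for the misses.

import json
import boto3
import redis

dynamodb = boto3.client("dynamodb")
cache = redis.Redis()
TABLE = "feature_values"  # hypothetical table from the write-path sketch

def batch_get_features(entity_id: str, features: list[str]) -> dict[str, float]:
    keys = [f"{entity_id}#{f}" for f in features]
    results: dict[str, float] = {}

    # 1) Single round trip to the Redis cache for all requested features.
    cached = cache.mget(keys)
    misses = []
    for feature, key, raw in zip(features, keys, cached):
        if raw is not None:
            results[feature] = json.loads(raw)["value"]
        else:
            misses.append((feature, key))

    # 2) Fall back to DynamoDB only for the cache misses.
    if misses:
        resp = dynamodb.batch_get_item(
            RequestItems={
                TABLE: {"Keys": [{"entity_feature": {"S": k}} for _, k in misses]}
            }
        )
        by_key = {
            item["entity_feature"]["S"]: float(item["value"]["N"])
            for item in resp["Responses"].get(TABLE, [])
        }
        for feature, key in misses:
            if key in by_key:
                results[feature] = by_key[key]
    return results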

For model training, we need to read a large number of feature values together, say for the past year. In these cases, the data replicated to Hive is used. The features are stored as Hive tables, which are queried directly by the training process.
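
A rough sketch of that training-side access, assuming a PyHive connection and an illustrative table layout (the host, database, and column names are not the actual replicated schema):

import pandas as pd
from pyhive import hive  # assumes a reachable Hive/Thrift endpoint

conn = hive.connect(host="hive.example.internal", port=10000)

# Pull a year of historical values for selected features of the "ride" entity type.
training_df = pd.read_sql(
    """
    SELECT entity_id, feature_name, feature_value, ds
    FROM feature_store.ride_features
    WHERE ds >= date_sub(current_date, 365)
      AND feature_name IN ('gh6_no_show_rate', 'gh6_total_rides')
    """,
    conn,
)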

Thus, all four combinations of feature computation (batch jobs or event streams) and feature access (training or online inference) are satisfied, as shown below:

Feature Computation and Feature Access Summary

Online-Offline Parity: Note that in the above model, a feature is defined only once, and both the training and serving systems use the same feature definitions, along with the same validations and transformations. Thus, feature definitions, validations, and transformations stay in sync between training and serving. That said, given the replication lags involved, the feature values seen by the training and serving systems are eventually consistent rather than strongly consistent, with replication delay within the tolerable limits of ML applications at Lyft.

Summary

Modern ML applications compute features on both batch and streaming data, and these features need to be available in batch mode for model training as well as in point-lookup mode for online inference. In this article, we described the architecture of Lyft's Feature Service, which makes features available in both modes regardless of how they were computed. The Feature Service was built in Q4 2017 and has since been widely adopted by Lyft teams for a variety of ML models, such as fraud detection, driver dispatch, location projections, growth platforms, pricing, and customer support, among many others.

Acknowledgments

Huge thanks to Willie Williams, Alex Jaffe, Nitin Aggarwal, Gregory Fee, Arup Malakar, Ravi Kiran Magham, Dev Tagare, Craig Martell, and Balaji Raghavan for making this work possible.

Interested in working with us? Please see https://www.lyft.com/careers for openings.
