LyftLearn is Lyft’s ML platform: machine learning infrastructure built on top of Kubernetes that powers diverse applications such as dispatch, pricing, ETAs, fraud detection, and support. In a previous blog post, we explored the architecture and challenges of the platform.
In this post, we focus on how we use the compute layer of LyftLearn to profile model features and predictions and to perform anomaly detection at scale.
This article is divided into two parts. In this first part, we cover how we profile model features and predictions at scale. In part 2, we cover how we use the profiled data for anomaly detection.
Machine learning forms the backbone of the Lyft app and is used in diverse applications such as dispatch, pricing, fraud detection, support, and many more. Anomaly detection has several potential applications for improving machine learning models at Lyft. By identifying faults or changing trends in the features and predictions of a model, we can quickly determine whether it is experiencing feature drift or concept drift. Identifying anomalies in critical business metrics allows teams to streamline operations and plan corrective actions.
In our previous blog post, we discussed the challenges we faced in model monitoring and our strategy for addressing some of them. We briefly discussed using z-scores to find anomalies. One challenge with z-scores was that they tend to produce many false positives, because features and predictions can deviate statistically without im...