Building a Large-Scale Unsupervised Model Anomaly Detection System: Part 1

By Anindya Saha, Han Wang, Rajeev Prabhakar

Introduction

LyftLearn is Lyft’s ML Platform. It is a machine learning infrastructure built on top of Kubernetes that powers diverse applications such as dispatch, pricing, ETAs, fraud detection, and support. In a previous blog post, we explored the architecture and challenges of the platform.

In this post, we will focus on how we utilize the compute layer of LyftLearn to profile model features and predictions and perform anomaly detection at scale.

This article is divided into two parts. In this first part, we focus on how we profile model features and predictions at scale. In Part 2, we focus on how we use this profiled data for anomaly detection.

Motivation

Machine learning forms the backbone of the Lyft app and is used in diverse applications such as dispatch, pricing, fraud detection, support, and many more. Anomaly detection has several potential applications for improving machine learning models at Lyft. By identifying faults or changing trends in the features and predictions of the models, we can quickly identify whether there is feature drift or concept drift. Identifying anomalies in critical business metrics allows teams to streamline operations and plan corrective actions.

In our previous blog, we discussed the various challenges we faced in model monitoring and our strategy to address some of these issues. We briefly discussed using z-scores to find anomalies. One challenge we faced with z-scores was that there tend to be many false positives because features and predictions can deviate statistically without im...