Anomaly Detection in Time Series Using Statistical Analysis
Setting up alerts for metrics isn’t always straightforward. In some cases, a simple threshold works just fine — for example, monitoring disk space on a device. You can just set an alert at 10% remaining, and you’re covered. The same goes for tracking available memory on a server.
But what if we need to monitor something like user behavior on a website? Imagine running a web store where you sell products. One approach might be to set a minimum threshold for daily sales and check it once a day. But what if something goes wrong, and you need to catch the issue much sooner — within hours or even minutes? In that case, a static threshold won’t cut it because user activity fluctuates throughout the day. This is where anomaly detection comes in.
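To see why a fixed threshold struggles with a metric that follows a daily cycle, here is a minimal sketch. The traffic curve, the incident, and the threshold of 30 are all made-up illustrative values, not data from the system described in this article, and the function names are hypothetical.

```python
import math


def hourly_orders(hour: int) -> float:
    """Synthetic daily cycle (assumed shape): about 20 orders/hour at midnight, 100 at noon."""
    return 60 - 40 * math.cos(2 * math.pi * hour / 24)


def static_alerts(values, threshold):
    """Return the indices (hours) whose value falls below a fixed threshold."""
    return [i for i, v in enumerate(values) if v < threshold]


# A perfectly healthy day under the synthetic cycle.
normal_day = [hourly_orders(h) for h in range(24)]

# Hypothetical incident: orders drop by half at 12:00 and 13:00.
incident_day = list(normal_day)
for h in (12, 13):
    incident_day[h] *= 0.5

threshold = 30  # assumed static threshold
false_alarms = static_alerts(normal_day, threshold)
incident_alerts = static_alerts(incident_day, threshold)

print("alerts on a normal day:", false_alarms)
print("alerts on the incident day:", incident_alerts)
```

In this sketch the threshold fires during the quiet night hours of a healthy day, yet stays silent when midday orders are cut in half, because the halved value is still above the fixed limit.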
What exactly is anomaly detection? Instead of relying on simple rules, it involves analyzing historical data to spot unusual patterns. There are various ways to implement anomaly detection, including machine learning and statistical analysis. In this article, we’ll focus on the statistical approach and walk through how we built our own anomaly detection system for time series data from scratch at Booking.
The Naïve Approach
One common mistake I’ve seen across different companies and teams is trying to detect anomalies by simply comparing a business metric to its value exactly one week ago.
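The week-over-week comparison described above can be sketched as follows. This is an illustration only: the function name, the relative-change rule, the 20% tolerance, and the sample values are assumptions made for the example, not details taken from any particular team's setup.

```python
def week_over_week_anomalies(values, period, tolerance):
    """Flag index i when values[i] differs from values[i - period]
    by more than `tolerance`, measured as a relative change."""
    flagged = []
    for i in range(period, len(values)):
        baseline = values[i - period]
        if baseline == 0:
            continue  # relative change is undefined; skipped in this sketch
        change = abs(values[i] - baseline) / baseline
        if change > tolerance:
            flagged.append(i)
    return flagged


# Made-up daily values for two consecutive weeks (period = 7 days).
week1 = [100, 110, 105, 120, 130, 90, 80]
week2 = [102, 108, 50, 118, 131, 92, 79]  # third day of week 2 drops sharply

flagged = week_over_week_anomalies(week1 + week2, period=7, tolerance=0.2)
print(flagged)
```

Only the third day of the second week (index 9) differs from its value one week earlier by more than 20%, so it is the only point flagged.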
This week vs previous week
At first glance, this approach isn’t entirely useless — you can catch some anomalies, as shown in the image above. But is it a reliable long-term solution? Not really. The big flaw is that today’s anomaly becomes next week’s baseline. That means if the same issue occ...