Fighting spam using clustering and automated rule creation
Cathy Yang | Software Engineer, Trust & Safety
One of our biggest priorities at Pinterest is keeping Pinners safe, and that includes protecting them from spam. The Trust & Safety team’s goal is not only to catch spam, but to remove it as quickly as possible to minimize Pinner impact.
The goal of spammers is to make money, and the best way to do this is to spam at scale. It’s a numbers game: one million spam emails are much more effective than one spam email. In order to remove spam quickly, we look at common trends in spam attacks to identify suspect behavior.
To achieve the scale required to be effective, spammers must automate their actions, and each of these “attacks” can be thought of as a cluster. Each event within the attack cluster may share some common features, but different clusters will have a different set of common features.
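To make the "attack as a cluster" idea concrete, here is a minimal sketch that groups events sharing identical values for a chosen set of features and keeps only groups large enough to suggest automation. The feature names (`domain`, `user_agent`), the `min_size` cutoff, and the function `find_clusters` are hypothetical illustrations, not Pinterest's actual features or code; real attacks are usually found with fuzzier matching than exact equality.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# An event is modeled as a flat mapping of feature name -> value (assumption).
Event = Dict[str, str]


def find_clusters(
    events: Iterable[Event],
    keys: Tuple[str, ...],
    min_size: int,
) -> Dict[Tuple[str, ...], List[Event]]:
    """Group events with identical values for every feature in `keys`,
    keeping only groups with at least `min_size` members."""
    groups: Dict[Tuple[str, ...], List[Event]] = defaultdict(list)
    for event in events:
        # Events missing any grouping feature cannot join a cluster.
        if all(k in event for k in keys):
            signature = tuple(event[k] for k in keys)
            groups[signature].append(event)
    return {sig: evs for sig, evs in groups.items() if len(evs) >= min_size}


events = [
    {"domain": "spam.example", "user_agent": "bot/1.0"},
    {"domain": "spam.example", "user_agent": "bot/1.0"},
    {"domain": "spam.example", "user_agent": "bot/1.0"},
    {"domain": "recipes.example", "user_agent": "Mozilla/5.0"},
]
for signature, members in find_clusters(events, ("domain", "user_agent"), min_size=3).items():
    print(signature, len(members))  # prints: ('spam.example', 'bot/1.0') 3
```

Different attacks would surface under different signatures, which matches the observation that each cluster has its own set of common features.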
For example, during an attack where a large number of Pins are created, a spammer might point all Pins to the same domain. While the domain may change between attacks, spammers are still trying to direct traffic to the same spam site.
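The shared-domain pattern can be sketched by counting how many Pins in a batch point at each destination host and flagging hosts above a threshold. This is an illustrative assumption, not Pinterest's detection logic: the function name `hot_domains` and the `threshold` value are hypothetical, and it compares full hostnames, whereas a production system might normalize to the registrable domain so that subdomain rotation is also caught.

```python
from collections import Counter
from typing import Iterable, List
from urllib.parse import urlsplit


def hot_domains(pin_urls: Iterable[str], threshold: int) -> List[str]:
    """Return hostnames targeted by at least `threshold` Pins, sorted."""
    counts = Counter()
    for url in pin_urls:
        # .hostname is lowercased by urllib and is None when there is no host.
        host = urlsplit(url).hostname
        if host:
            counts[host] += 1
    return sorted(host for host, n in counts.items() if n >= threshold)


urls = [
    "https://Spam.example/a",
    "https://spam.example/b?x=1",
    "http://spam.example/c",
    "https://blog.example/post",
]
print(hot_domains(urls, threshold=3))  # prints: ['spam.example']
```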
One of our spam mitigation tactics is our rule engine, Guardian, which helps to identify common features in spam attacks.
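A rule engine of this kind can be pictured as a list of named conditions over event features, each paired with an action. The sketch below is a hypothetical stand-in, not Guardian's real interface: `Rule`, `evaluate`, the feature names, and the `block`/`review` action labels are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Events are modeled as feature dictionaries (assumption).
Event = Dict[str, object]


@dataclass(frozen=True)
class Rule:
    name: str
    condition: Callable[[Event], bool]
    action: str  # hypothetical action label, e.g. "block" or "review"


def evaluate(event: Event, rules: List[Rule]) -> List[str]:
    """Return 'name:action' for every rule whose condition matches the event."""
    hits = []
    for rule in rules:
        try:
            matched = rule.condition(event)
        except KeyError:
            # A rule that references a feature the event lacks does not match.
            matched = False
        if matched:
            hits.append(f"{rule.name}:{rule.action}")
    return hits


rules = [
    Rule("spam_domain", lambda e: e["domain"] == "spam.example", "block"),
    Rule(
        "burst_new_account",
        lambda e: e["account_age_days"] < 1 and e["pins_last_hour"] > 50,
        "review",
    ),
]

print(evaluate({"domain": "spam.example", "account_age_days": 0, "pins_last_hour": 120}, rules))
# prints: ['spam_domain:block', 'burst_new_account:review']
```

Keeping rules as data rather than hard-coded branches is what lets an analyst add or retire a narrowly targeted rule without redeploying the rest of the system.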
Motivation
Previously, when a spam attack happened:
- An alert would fire in our monitoring system, Statsboard, and an on-call analyst would investigate
- The analyst would identify the common trends of that attack
- The analyst would create a “patch rule” (a specific and temporary rule to address the attack) then apply it retroactively to catch old ...