View Counting at Reddit

Krishnan Chandra (u/shrink_and_an_arch)
Senior Software Engineer

We want to better communicate the scale of Reddit to our users. Up to this point, vote score and number of comments were the main indicators of activity on a given post. However, Reddit has many visitors who consume content without voting or commenting. We wanted to build a system that could capture this activity by counting the number of views a post received. This number is then shown to content creators and moderators to give them better insight into the activity on specific posts.

In this post, we’re going to talk about how we implemented counting at scale.

Counting Methodology

We had four main requirements for counting views:

  • Counts must be real time or near-real time. No daily or hourly aggregates.
  • Each user must only be counted once within a short time window.
  • The displayed count must be within a few percentage points of the actual tally.
  • The system must be able to run at production scale and process events within a few seconds of their occurrence.
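To make the "counted once within a short time window" requirement concrete, here is a minimal Python sketch of windowed de-duplication. It is an illustration under stated assumptions, not Reddit's implementation: the 60-second window, the class name WindowedViewCounter, and the rule that a repeated view inside the window does not extend the window are all hypothetical choices. It also keeps per-(post, user) state in memory with no eviction, so it only shows the counting semantics.

```python
import time
from typing import Dict, Optional, Tuple

# Hypothetical window length; the requirement only says "a short time window".
DEDUP_WINDOW_SECONDS = 60.0


class WindowedViewCounter:
    """Counts a view only if the same user has not already been counted
    for the same post within the last `window` seconds (hypothetical sketch)."""

    def __init__(self, window: float = DEDUP_WINDOW_SECONDS) -> None:
        self.window = window
        self.counts: Dict[str, int] = {}
        # (post_id, user_id) -> timestamp of the last view that was counted.
        # Never evicted here, so memory grows with every distinct pair.
        self.last_counted: Dict[Tuple[str, str], float] = {}

    def record_view(self, post_id: str, user_id: str,
                    now: Optional[float] = None) -> bool:
        """Returns True if this view was counted, False if it was a duplicate."""
        now = time.time() if now is None else now
        key = (post_id, user_id)
        last = self.last_counted.get(key)
        if last is not None and now - last < self.window:
            # Duplicate inside the window: ignored, and the window is NOT extended.
            return False
        self.last_counted[key] = now
        self.counts[post_id] = self.counts.get(post_id, 0) + 1
        return True

    def count(self, post_id: str) -> int:
        return self.counts.get(post_id, 0)
```

For example, with a 60-second window, views by the same user on the same post at t = 0, 30 and 90 seconds are counted twice: at 0 and at 90.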

Satisfying all four of these requirements is trickier than it sounds. To maintain an exact count in real time, we would need to know whether a specific user had visited the post before. To know that, we would need to store the set of users who had previously visited each post, and then check that set every time we processed a new view on a post. A naive implementation would keep an in-memory hash table keyed by post ID, where each value is the set of unique users who have viewed that post.
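The naive approach described above can be sketched in Python as a dict (hash table) from post ID to a set of user IDs. The class and method names are hypothetical; this is only an illustration of the idea in the paragraph, not Reddit's code.

```python
from collections import defaultdict
from typing import DefaultDict, Set


class NaiveUniqueViewCounter:
    """Exact unique-viewer counts: a hash table from post ID to the set of
    user IDs that have viewed that post (hypothetical sketch)."""

    def __init__(self) -> None:
        self.viewers: DefaultDict[str, Set[str]] = defaultdict(set)

    def record_view(self, post_id: str, user_id: str) -> None:
        # Set insert doubles as the "has this user visited before?" check:
        # a repeat view by the same user leaves the set unchanged.
        self.viewers[post_id].add(user_id)

    def unique_views(self, post_id: str) -> int:
        # .get() avoids creating an empty entry when querying an unseen post.
        return len(self.viewers.get(post_id, ()))
```

Recording views by alice, bob and alice again on post p1 yields 2 unique views. The count is exact, but each post's set holds every distinct viewer ID, so memory grows linearly with the number of unique viewers per post.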

This approach work...