A *bit* about content ratings

Video is power, and with great power comes great responsibility: the responsibility to make sure that no one stumbles across something they didn’t sign up to see. Vimeo enables creators to rate their videos and viewers to filter by ratings to keep our incredibly wide audience safe. Content is consumed by millions of viewers of different ages, from different countries, all with different viewing preferences. Because Vimeo is such a global platform, we have to keep abiding by the ever-changing viewership laws around the world.

Viewing unwanted content can be an unpleasant surprise. We’re doing our part to keep audiences safe.

A recent change in UK law spurred us to revisit our content rating logic and process. The new law stipulates that when certain users attempt to view a mature or unrated video that is publicly available, we must require the user to log in as age validation (see Figure 1). This was put in place to keep kids on the internet safe from mature content.

Figure 1. To keep kids on the internet safe from mature content, the user is required to validate their age by logging in.

With this change, we made several product updates to tie the whole rating process together from an uploader and video manager’s perspective.

  • The uploader is now prompted to rate their video and indicate whether it includes an advertisement immediately after upload (see Figure 2).

Figure 2. Video managers are now prompted to rate their video content and indicate whether their video contains an advertisement.

  • The uploader or video manager can now manage their content ratings in the same place where they manage their privacy settings.
  • The uploader or video manager now sees mature or unrated badges or an indicator when their video’s content rating is locked by a Vimeo moderator.

Although I was primarily responsible for the UI changes, I went down the rabbit hole of content ratings in our backend system to discover our absolutely brilliant way of storing and filtering by content ratings.

The back end

The concept of content ratings was introduced into Vimeo quite a while ago — in fact, the earliest commit I could find dates back to almost a decade ago. With a stroke of luck (and, let’s be real, Vimeo loyalty), the developer who set up content ratings in our backend system 10 years ago, Sean, was on my team! We hopped on a Zoom to break down the legacy code’s complex bitwise operations that I’d been trying to decipher.

Before we delve into the thick of it, I want to mention that we made the technical decision to add advertisements as part of the content ratings data structure. This ultimately means treating the advertisements flag the same way we treat a violence, nudity, or unrated flag — all under the same matrix. This decision avoided adding an entire column to our already very large video table.

When I first looked at the code, I was expecting some sort of mapping for every rating on a video, maybe something like in Figure 3…

Figure 3. An example of a rating map that captures a video’s content rating by using boolean values.

…or even an array that contained just the values of the selected ratings, like in Figure 4.

Figure 4. This array of ratings would only contain the ratings that are applied to the video.
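As a rough sketch, the two shapes I had in mind would look something like this in Python. The key names are made up for illustration and are not Vimeo’s actual schema.

```python
# Hypothetical shapes only: the key names are illustrative, not Vimeo's schema.

# Option 1: a map with one boolean per rating.
rating_map = {
    "unrated": False,
    "language": False,
    "drugs": True,
    "violence": False,
    "nudity": True,
}

# Option 2: a list holding only the ratings that apply to the video.
rating_list = [name for name, applied in rating_map.items() if applied]

print(rating_list)  # ['drugs', 'nudity']
```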

I had never considered a data structure that could condense all of these booleans into one single compact value.

In reality, Vimeo’s back end stores a video’s content rating as one value: the sum of all its ratings. This is called a bit field. Each rating is assigned its own power of two, so every rating occupies exactly one bit and any combination of ratings produces a unique sum. Instead of storing a bunch of booleans for every rating on a user or video, or having an array of selected values like I had assumed, we have one compact bit field to represent all the trues and falses for each rating. This is an extraordinary space saver, as we avoid a hashmap or array (although, I do admit, these are much more readable). Our method can also easily accommodate additional ratings, such as advertisements in our case. Checking whether a video passes the user’s content rating acceptance criteria takes a single bitwise operation and a comparison, and “packing” all the information this way reduces memory consumption.

Although advertisements are treated like a content rating in the back end, they cannot be filtered in the viewing preferences. This is why, by default, all viewing preferences include advertisements, a value of 256 (see Figure 5).

Figure 5. Every rating maps to its own bit, that is, a distinct power of two. When we add a new rating, such as `advertisements`, we assign it the next unused bit.
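To make the mapping concrete, here is a minimal Python sketch of the flags using the values mentioned in this post. The `Rating` class and most member names are my own assumptions for illustration. Only `RATING_EXPLICIT_DRUGS` and `RATING_EXPLICIT_NUDITY` appear in the original code, and the value 128 is left out because this post doesn’t describe it.

```python
from enum import IntFlag


class Rating(IntFlag):
    """Hypothetical flag set; values are the ones quoted in this post."""
    UNRATED = 1
    NOT_LOCKED = 2
    SAFE = 4
    LANGUAGE = 8
    EXPLICIT_DRUGS = 16    # RATING_EXPLICIT_DRUGS in the post
    VIOLENCE = 32
    EXPLICIT_NUDITY = 64   # RATING_EXPLICIT_NUDITY in the post
    # 128 is not described in the post, so it is intentionally omitted.
    ADVERTISEMENTS = 256


# Print each flag with its decimal value and its 9-bit binary form.
for flag in Rating:
    print(f"{flag.name:<16} {flag.value:>3} {flag.value:09b}")
```

Every value has exactly one bit set, which is what lets any combination of ratings live in a single integer.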

To understand further how a video’s content rating is accepted by a viewer’s content rating preferences, we’ll look at a simple scenario of Radhika and David, each with unique viewing preferences, encountering the same video. We will ultimately be able to determine how the back end accepts or rejects a video based on the user’s viewing preference.

The binary and bitwise

The first premise is that a fresh new video not yet assigned a content rating has a default rating of 3. By default, a video is not rated and the rating is not locked by a moderator, so we get the sum of these two applicable properties, as shown in Figure 6.

Figure 6. An unrated video (value of 1) that is not locked by a moderator (value of 2) has a combined value of 3.

Let’s say that this particular video that is being uploaded and rated contains drugs [RATING_EXPLICIT_DRUGS] and nudity [RATING_EXPLICIT_NUDITY]. To calculate this video’s rating, as shown below in Figure 7, we would add up all the new ratings and subtract what no longer applies — in our case, it is no longer unrated once we give it a rating.

Figure 7. The addition of `drugs` (a value of 16) and `nudity` (a value of 64), and the removal of `unrated` (value of 1), yields a value of 82.
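The same arithmetic can be sketched with bitwise operators in Python. This is an illustration under my own assumptions rather than the back end’s actual code: `UNRATED` and `NOT_LOCKED` are hypothetical names, and only the two `RATING_EXPLICIT_*` names come from this post.

```python
# Values are the ones quoted in this post; UNRATED and NOT_LOCKED are
# hypothetical names chosen for this sketch.
UNRATED = 1
NOT_LOCKED = 2
RATING_EXPLICIT_DRUGS = 16
RATING_EXPLICIT_NUDITY = 64

# A fresh upload: unrated and not locked by a moderator.
default_rating = UNRATED | NOT_LOCKED

# Rate the video: set the two new bits, then clear the unrated bit.
rating = default_rating | RATING_EXPLICIT_DRUGS | RATING_EXPLICIT_NUDITY
rating &= ~UNRATED

print(default_rating, rating)  # 3 82
```

Using `|` to set a bit and `& ~` to clear one gives the same result as adding and subtracting here, with the bonus that setting an already-set bit (or clearing an already-cleared one) leaves the value unchanged.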

Next, let’s convert our value to binary (see Figure 8).

Figure 8. The decimal value of 82 converted to binary is 1010010.
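If you want to check the conversion yourself, a few lines of Python using only built-ins will do it. The second half lists which individual bits are set in the value.

```python
rating = 82

# Decimal to binary string, and back again.
as_binary = format(rating, "b")
print(as_binary)          # 1010010
print(int(as_binary, 2))  # 82

# List the individual flag values (powers of two) that make up the rating.
set_bits = [1 << i for i in range(rating.bit_length()) if rating & (1 << i)]
print(set_bits)           # [2, 16, 64]
```

The three set bits correspond to not locked (2), drugs (16), and nudity (64).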

Our video’s content rating comes out to 82, or 1010010 in binary form, once we rate this video with drugs and nudity. Now, let’s see what happens when two viewers with different viewing preferences come across this video. This is where the bitwise fun happens!

A user’s content rating filter preference is calculated in the exact same way: as a single value that is the summation of all the ratings that the user is okay with seeing.

Let’s do a quick scenario with our two different users, whose names, you’ll recall, are Radhika and David.

Figure 9 shows that Radhika is okay with watching videos that have the ratings of violence (32), drugs (16), language (8), safe (4), not locked (2), unrated (1), and advertisements (256), yielding a value of 319, which is 100111111 in binary.

Figure 9. This mature content filter shows that Radhika doesn’t want to view content that contains nudity but is okay with viewing all other content types.

You see in Figure 10 that David is okay with watching videos that have the ratings of nudity (64), violence (32), drugs (16), language (8), safe (4), not locked (2), unrated (1), and advertisements (256), yielding a value of 383, which is 101111111 in binary.

Figure 10. This mature content filter shows that David wants to view all available content types.
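Here is a small Python sketch of how those two preference values come together. The `preference` helper and the constant names are hypothetical, introduced only for this example. The values are the ones listed above.

```python
# Flag values as quoted in this post; the names are hypothetical.
UNRATED, NOT_LOCKED, SAFE, LANGUAGE = 1, 2, 4, 8
DRUGS, VIOLENCE, NUDITY, ADVERTISEMENTS = 16, 32, 64, 256


def preference(*accepted: int) -> int:
    """Combine accepted ratings into one value (hypothetical helper)."""
    value = ADVERTISEMENTS  # advertisements are always included by default
    for flag in accepted:
        value |= flag
    return value


radhika = preference(VIOLENCE, DRUGS, LANGUAGE, SAFE, NOT_LOCKED, UNRATED)
david = radhika | NUDITY  # David accepts everything Radhika does, plus nudity

print(f"{radhika} {radhika:b}")  # 319 100111111
print(f"{david} {david:b}")      # 383 101111111
```

The two values differ in exactly one bit, the nudity bit (64).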

Radhika and David each come across a video containing drugs and nudity.

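Now that we have David’s and Radhika’s viewing preference values, we can use the & bitwise operator to determine whether the video’s content rating meets their viewing preference criteria. The & operator keeps only the bits that are set in both values, so the result equals the video rating only when every rating on the video is also one that the user accepts. If CLIP RATING & USER PREFERENCE results in the video rating (see Figure 11), the user should be able to view the video.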

Figure 11. The acceptance criteria for a user to be able to watch a video is if `CLIP RATING & USER PREFERENCE` results in `CLIP RATING`.

Let’s do the math! See Figure 12.

Figure 12. For Radhika, `CLIP RATING & USER PREFERENCE` yields 18, but for David it yields 82. The video’s rating is 82, which is why David passes the acceptance criteria while Radhika does not.
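The whole check fits in a one-line function. This is a minimal Python sketch of the acceptance rule described above, not the back end’s actual code, and `can_view` is a name I made up for it.

```python
def can_view(clip_rating: int, user_preference: int) -> bool:
    """True when every bit set on the clip is also set in the preference."""
    return (clip_rating & user_preference) == clip_rating


CLIP_RATING = 82   # not locked + drugs + nudity
RADHIKA = 319      # everything except nudity
DAVID = 383        # everything

print(CLIP_RATING & RADHIKA, can_view(CLIP_RATING, RADHIKA))  # 18 False
print(CLIP_RATING & DAVID, can_view(CLIP_RATING, DAVID))      # 82 True
```

For Radhika, the nudity bit (64) is dropped by the `&`, leaving 18, which no longer matches the clip’s rating of 82.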

The numbers don’t lie! Looks like Radhika’s viewing preferences don’t let her view the video, while David’s do!

The beauty

Let’s take a second to soak in the beauty of this data structure. We are able to represent the user’s entire content rating preference in one single value instead of having to keep track of nine different rating variables. We not only save space by grouping multiple booleans into one integer value, but we can also quickly determine whether a video meets the user’s viewing preferences using bitwise operations. This lets us skip iterating through the video’s ratings and the user’s preferences and comparing them one by one. Think of all the iterations and comparisons we get to avoid by using bit fields and bitwise operations instead of iterating over a hashmap. What’s notable is the scale at which these operations occur: imagine the millions of users who are searching and browsing through videos on Vimeo, constantly loading large quantities of videos. Every video loaded requires a content rating calculation to determine whether it should be displayed to each user. Bit fields let us do this at an incredible speed.
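To picture what we get to skip, here is a hedged Python sketch that puts a map-based check next to the bitwise one and confirms they agree on a few made-up clip ratings. All names and the sample catalog are hypothetical, and the sketch only shows that the two checks are equivalent. It is not a benchmark.

```python
# Hypothetical flag names; the values are the ones quoted in this post.
FLAGS = {"unrated": 1, "not_locked": 2, "safe": 4, "language": 8,
         "drugs": 16, "violence": 32, "nudity": 64, "advertisements": 256}


def to_map(value: int) -> dict:
    """Expand a bit field into one boolean per rating."""
    return {name: bool(value & bit) for name, bit in FLAGS.items()}


def allowed_by_map(video: dict, prefs: dict) -> bool:
    # One lookup and comparison per rating applied to the video.
    return all(prefs.get(name, False) for name, on in video.items() if on)


def allowed_by_bits(video: int, prefs: int) -> bool:
    # One AND and one comparison, no matter how many ratings exist.
    return (video & prefs) == video


catalog = [3, 6, 82, 34, 258]  # made-up clip ratings
radhika = 319

visible = [clip for clip in catalog if allowed_by_bits(clip, radhika)]
print(visible)  # [3, 6, 34, 258]

# Both approaches reach the same verdict for every clip.
assert all(
    allowed_by_map(to_map(clip), to_map(radhika)) == allowed_by_bits(clip, radhika)
    for clip in catalog
)
```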

Thus, I was pleasantly surprised to learn that bitwise operators and bit fields, a concept my college professors barely touched on during perhaps a single lecture, had an extremely applicable use case in something I was working on. Bit fields offer a compact, fast alternative to a data structure that would otherwise consist of sometimes dozens of boolean values. The subject of content ratings is a beast of its own, but at least it’s a performant beast that keeps users safe!
