Faster Creator Content Distribution at Pinterest

Pinterest Engineering
Pinterest Engineering Blog
Mar 17, 2021

Jiayi Lu | Software Engineer, Creator Distribution

At Pinterest, we are committed to building the best experience for creators to reach and inspire new audiences and for Pinners to easily discover the best ideas for them. We believe more diverse Pin formats and an efficient distribution system that builds a healthy creator content marketplace are the keys to our success.

For example, Story Pins is a new Pin format that makes it possible for creators to bring their original content to Pinterest by publishing directly to the platform. Story Pins enable content creators to compose and upload their own creative ideas to Pinterest through videos, images, and text.

As more creator content comes to Pinterest, we need to keep evolving an efficient distribution system that best serves both creators’ and Pinners’ interests. To achieve this goal, we had to tackle a series of unique challenges.

First, timely distribution has become more important than ever: fresh Pins should be shown to Pinners as soon as creators share them.

Second, audience relevance is critical for distribution efficiency. For example, surfacing a fresh recipe Pin both to Pinners interested in cooking and to those with adjacent interests can help us better understand the Pin’s performance and quality.

To address these needs, we built a new, fast, real-time content distribution system with creators in focus.

System Overview

The real-time creator content distribution system consists of three main components: indexing, retrieval, and exploration.

During the indexing stage, indexing jobs are triggered by Pin creation and engagement events. Once a Pin is created, we convert it into a document and index it into our real-time Manas index, together with content signals generated in real time.

During the retrieval stage, we find matching documents in the real-time Manas index based on various user signals.

On top of indexing and retrieval, we also built an exploration framework that controls the exploration budget for each Pin based on real-time performance and quality.

Figure 1. Overview of creator content fast distribution system

Above is an overview of the real-time creator content distribution system. Now let’s take a closer look at each component.

Detailed Design

In this section, we discuss the design of our distribution system in more detail, focusing on its three most important components: real-time indexing, candidate retrieval, and exploration & budget control.

Real-time Indexing: Enabling Fast Distribution

The first problem we needed to solve was indexing fresh creator Pins so they could be served within a few seconds of creation. Because the index consists primarily of fresh content, we faced a cold start problem: fresh Pins have fewer engagement signals than evergreen content. We therefore rely more on real-time content signals (such as text annotations and visual embeddings) for targeting. Moreover, as we improve distribution, we plan to experiment with different retrieval strategies while making minimal changes to the underlying infrastructure. That means a unified indexing and serving framework that supports indexing and retrieval by different signals and configurations is strongly preferred. With that in mind, we implemented a new real-time indexing infrastructure on top of our in-house search indexing platform. The system architecture is illustrated below:

Figure 2. How real-time indexing works for fresh creator Pins

Real-time indexing starts by listening to Pin event Kafka topics. Each time a Pin is created, a Kafka message is written to the event topic, triggering the following chain of actions in order:

  1. Parse the Pin data from the event and augment it with additional signals (e.g. Board and User information);
  2. Compute real-time content signals (e.g. annotations, interests, embeddings) from the Pin and image data;
  3. Parse the signals and build a document for each Pin;
  4. Write the documents to the real-time indexing Kafka topic, from which Manas consumes them.
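The four steps above can be sketched as a single event handler. This is a minimal, hypothetical Python sketch: `PinEvent`, `augment`, `compute_signals`, `build_document`, and `handle_event` are illustrative names, a plain list stands in for the real-time indexing Kafka topic, and lowercase title words are a toy substitute for the real annotation, interest, and embedding signals.

```python
import json
from dataclasses import dataclass


@dataclass
class PinEvent:
    # Hypothetical shape of a Pin creation event.
    pin_id: str
    image_signature: str
    board_id: str
    user_id: str
    title: str


def augment(event: PinEvent) -> dict:
    # Step 1: parse the event and attach extra context (Board and User info).
    return {
        "pin_id": event.pin_id,
        "image_signature": event.image_signature,
        "board": {"id": event.board_id},
        "user": {"id": event.user_id},
        "title": event.title,
    }


def compute_signals(pin: dict) -> dict:
    # Step 2: toy "annotations" (distinct lowercase title words) plus creator ID.
    return {
        "annotations": sorted(set(pin["title"].lower().split())),
        "creator": [pin["user"]["id"]],
    }


def build_document(pin: dict, signals: dict) -> dict:
    # Step 3: one document per Pin, keyed by its image signature.
    return {"id": pin["image_signature"], "signals": signals}


def handle_event(event: PinEvent, indexing_topic: list) -> None:
    pin = augment(event)
    signals = compute_signals(pin)
    doc = build_document(pin, signals)
    # Step 4: publish the serialized document for the index to consume.
    indexing_topic.append(json.dumps(doc))
```

For example, handling `PinEvent("p1", "abc123", "b1", "u1", "Easy Weeknight Pasta")` appends one JSON document whose `annotations` are `["easy", "pasta", "weeknight"]`.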

The same process is also triggered by engagement events (e.g. Saves) to refresh the indexed signals whenever they are updated.
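Because engagement events rerun the same pipeline, one simple way to model the refresh is an upsert keyed by document ID, so that the newest signals overwrite the old row. The sketch below assumes that behavior; `forward_index`, `save_counts`, `on_event`, and the `"create"`/`"save"` event names are hypothetical, and the post does not say how Manas applies updates internally.

```python
from collections import defaultdict

# Hypothetical in-memory forward index: document ID -> latest signals.
forward_index: dict[str, dict] = {}
# Running Save counts, used here as a toy engagement signal.
save_counts: defaultdict[str, int] = defaultdict(int)


def on_event(event_type: str, doc_id: str, annotations: list[str]) -> None:
    if event_type == "save":
        save_counts[doc_id] += 1
    elif event_type != "create":
        return  # ignore event types this sketch does not model
    # Creation and engagement events share one path: rebuild and upsert,
    # so the newest signals replace whatever was indexed before.
    forward_index[doc_id] = {
        "annotations": annotations,
        "saves": save_counts[doc_id],
    }
```

A `"create"` event indexes the document with `saves` at 0; a later `"save"` event for the same ID overwrites the row with refreshed annotations and `saves` set to 1.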

Each index document takes the form of a forward index row: the document ID (usually the image signature, an MD5 hash of the Pin image) is the key, followed by a map from signal attributes to their values (tokens).
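A forward index row of this shape might look like the following sketch. It assumes the image signature is the hex MD5 digest of the raw image bytes. The `interests` attribute mirrors the token example in the retrieval section of this post; `creator`, `user_42`, `image_signature`, and `forward_row` are hypothetical.

```python
import hashlib


def image_signature(image_bytes: bytes) -> str:
    # Document ID: hex MD5 digest of the Pin image bytes (assumed encoding).
    return hashlib.md5(image_bytes).hexdigest()


def forward_row(
    image_bytes: bytes, signals: dict[str, list[str]]
) -> tuple[str, dict[str, list[str]]]:
    # Key = image signature; value = map of signal attribute -> tokens.
    return image_signature(image_bytes), signals


doc_id, tokens = forward_row(
    b"fake image bytes",
    {"interests": ["travel photos", "hiking trails"], "creator": ["user_42"]},
)
```

The resulting `doc_id` is a 32-character hex string, and `tokens` holds the attribute-to-token map unchanged.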

On the real-time index side, each document consumed from the indexing Kafka topic starts a post-processing task that merges it into an inverted index. This involves building a set of posting lists, each keyed by a token and holding the list of IDs of the documents that contain that token.
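Inverting forward rows into posting lists can be sketched in a few lines of Python. This in-memory version is an illustrative assumption, not the Manas implementation. It flattens each attribute and value into an `attribute:value` token, matching the format of the `interests:travel photos` token in the retrieval example. `build_posting_lists` is a hypothetical helper.

```python
from collections import defaultdict


def build_posting_lists(
    forward_rows: dict[str, dict[str, list[str]]]
) -> dict[str, list[str]]:
    # token -> list of document IDs whose forward row contains that token
    postings: defaultdict[str, list[str]] = defaultdict(list)
    for doc_id in sorted(forward_rows):  # sorted for deterministic order
        for attribute, values in forward_rows[doc_id].items():
            for value in values:
                postings[f"{attribute}:{value}"].append(doc_id)
    return dict(postings)
```

For rows `{"d1": {"interests": ["travel photos"]}, "d2": {"interests": ["travel photos", "hiking trails"]}}`, the posting list for `interests:travel photos` is `["d1", "d2"]` and the one for `interests:hiking trails` is `["d2"]`.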

Candidate Retrieval

Candidate retrieval is the stage where we select hundreds of Pins from the real-time creator content index. It consists of three major steps: resource fetching, query composing, and query processing. In the resource fetching step, we fetch the relevant user signals (e.g. interests) from various services. The query composer then uses the tokenized user signal terms to compose Manas search queries. Finally, we send the query to the Manas index and parse the response to get the Pin IDs to return for downstream ranking and blending.
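The three steps can be sketched as three small functions. Everything here is a hypothetical stand-in: `USER_SIGNALS` replaces calls to user signal services, `POSTINGS` replaces the Manas index, and the query is modeled as a plain list of OR-ed terms rather than a real Manas query.

```python
# Hypothetical stand-ins for the user signal services and the Manas index.
USER_SIGNALS = {"user_1": {"interests": ["travel photos", "hiking trails"]}}
POSTINGS = {
    "interests:travel photos": ["pin_a", "pin_b"],
    "interests:hiking trails": ["pin_b", "pin_c"],
}


def fetch_resources(user_id: str) -> dict[str, list[str]]:
    # Step 1: fetch user signals (a dict lookup stands in for RPCs).
    return USER_SIGNALS.get(user_id, {})


def compose_query(signals: dict[str, list[str]]) -> list[str]:
    # Step 2: turn each signal value into an "attribute:value" term.
    return [f"{attr}:{value}" for attr, values in signals.items() for value in values]


def process_query(terms: list[str], limit: int = 100) -> list[str]:
    # Step 3: OR the terms by unioning their posting lists,
    # dropping duplicates while keeping first-seen order.
    seen: dict[str, None] = {}
    for term in terms:
        for pin_id in POSTINGS.get(term, []):
            seen.setdefault(pin_id, None)
    return list(seen)[:limit]


candidates = process_query(compose_query(fetch_resources("user_1")))
```

With the sample data, `candidates` is `["pin_a", "pin_b", "pin_c"]`: `pin_b` matches both terms but is returned once. A query built from a Pinner’s engaged creators would follow the same path with a different attribute, such as a hypothetical `creator:` token.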

Figure 3. How Creator Pin Retrieval Works

For fresh creator Pin distribution, candidate retrieval can be treated as a search problem: we compose a query for each user from different token types and match tokens between users and documents (i.e. Pins). We use various candidate retrieval strategies, such as matching users’ interests, searching by embeddings, and finding Pins from creators that users recently engaged with.
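The post does not say how candidates from different strategies are combined before downstream ranking. One simple option, shown purely as an assumption, is to interleave each strategy’s results round-robin and drop duplicates so that no single strategy crowds out the others. `blend_strategies` and the strategy names are hypothetical.

```python
from itertools import zip_longest


def blend_strategies(results: dict[str, list[str]], limit: int) -> list[str]:
    # Take the 1st candidate of every strategy, then the 2nd, and so on,
    # skipping Pins already contributed by another strategy.
    blended: list[str] = []
    seen: set[str] = set()
    for row in zip_longest(*results.values()):
        for pin_id in row:
            if pin_id is not None and pin_id not in seen:
                seen.add(pin_id)
                blended.append(pin_id)
                if len(blended) == limit:
                    return blended
    return blended
```

For `{"interest": ["a", "b", "c"], "embedding": ["b", "d"], "creator": ["e"]}`, the blended order is `["a", "b", "e", "d", "c"]`; with `limit=3` it stops at `["a", "b", "e"]`.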

Figure 4. Candidate retrieval strategies for creator content retrieval

For example, suppose a user interested in travel content visits Pinterest. We fetch this user’s interests from the user signal service (e.g. “travel photos” and “hiking trails”) and send the interest tokens to Manas, which returns the documents from the posting lists of “interests:travel photos” and “interests:hiking trails”.

Similarly, we can use a Pinner’s engaged creators to compose a Manas search query that retrieves Pins from those creators, or use Pinner embeddings to retrieve candidates by finding the nearest Pin neighbors in the Pin embedding space.
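Embedding-based retrieval can be illustrated with a brute-force nearest-neighbor search using cosine similarity. This sketch assumes Pinner and Pin embeddings live in the same vector space, as the paragraph above implies. A production system would use an approximate nearest-neighbor index rather than scoring every Pin. `cosine` and `nearest_pins` are hypothetical helpers.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    # Cosine similarity; returns 0.0 if either vector has zero length.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_pins(
    user_vec: list[float], pin_vecs: dict[str, list[float]], k: int
) -> list[str]:
    # Score every Pin against the Pinner embedding and keep the top k.
    scored = sorted(
        pin_vecs.items(), key=lambda item: cosine(user_vec, item[1]), reverse=True
    )
    return [pin_id for pin_id, _ in scored[:k]]
```

With Pins `{"pin_a": [1.0, 0.0], "pin_b": [0.7, 0.7], "pin_c": [0.0, 1.0]}` and a Pinner vector `[1.0, 0.1]`, `nearest_pins(..., k=2)` returns `["pin_a", "pin_b"]`.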

Exploration & Budget Control

To distribute fresh creator content efficiently, we also built a simple but effective exploration system that allocates an exploration budget to each Pin and dynamically adjusts exploration decisions. To support this, each Pin document in the Manas index carries flags indicating its exploration status, such as whether the Pin has been effectively explored. Real-time flows adjust each Pin’s exploration budget and decisions based on its real-time performance and feedback as it accumulates impressions and engagements. At retrieval time, we use the exploration flags to determine whether a Pin has been sufficiently explored and retrieve candidates accordingly.
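The post does not describe the exact budget policy, so the following is only a hypothetical sketch of how an exploration flag could be driven by real-time feedback. Every fresh Pin gets a base impression budget. Pins whose engagement rate clears a threshold earn one extra budget, and all others are flagged as explored once their budget is spent. The constants, `ExplorationState`, and `exploration_candidates` are illustrative assumptions.

```python
from dataclasses import dataclass

# Hypothetical policy constants; real values are not given in the post.
BASE_BUDGET = 1000           # impressions granted to every fresh Pin
BONUS_BUDGET = 2000          # one-time extension for well-performing Pins
GOOD_ENGAGEMENT_RATE = 0.02  # engagements per impression


@dataclass
class ExplorationState:
    impressions: int = 0
    engagements: int = 0
    budget: int = BASE_BUDGET
    explored: bool = False  # stands in for the flag on the index document

    def record(self, impressions: int, engagements: int) -> None:
        # Real-time feedback updates the counters, then re-evaluates the flag.
        self.impressions += impressions
        self.engagements += engagements
        if self.impressions >= self.budget:
            rate = self.engagements / self.impressions
            if rate >= GOOD_ENGAGEMENT_RATE and self.budget == BASE_BUDGET:
                self.budget += BONUS_BUDGET  # promising Pin: keep exploring
            else:
                self.explored = True


def exploration_candidates(states: dict[str, ExplorationState]) -> list[str]:
    # At retrieval time, only Pins not yet flagged as explored are eligible.
    return [pin_id for pin_id, s in states.items() if not s.explored]
```

Under these assumptions, a Pin with 30 engagements on its first 1,000 impressions has its budget raised to 3,000 and stays eligible, while one with 5 engagements is flagged as explored and drops out of exploration retrieval.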

Figure 5. Creator content exploration and budget control system

Future Works & Challenges

Creator content distribution is a priority focus area at Pinterest, and it comes with many unique challenges, such as how to target fresh content effectively with more accurate real-time signals and better ranking models, and how to distribute creator content so that creators can build and grow an audience on Pinterest more efficiently. With the new real-time distribution system, we are able to reduce the end-to-end serving delay for fresh creator content from more than 24 hours to a few seconds, and to distribute 10x more creator content shortly after it is published. However, this is just the beginning: we are actively improving our distribution systems and products to help creators succeed on Pinterest. If you are interested in building the next generation of creator content distribution systems, join us!

Acknowledgements

Huge thanks to Cosmin Negruseri, Jiacheng Hong, Nan Zhang, Lingzhi Luo, Sam Liu, Ting Chen, Tamara Louie, Haibin Xie, Tim Koh, Michael Mi, Le Zhang, Stephanie DeWet, and Yuliang Yin for their great contributions! The project is a joint effort across multiple teams at Pinterest. Thanks to the Interest Understanding, Personalization, Content Quality, and Applied Science teams and the rest of the Home Team for their constant help and support! Also thanks to the Legal and Corporate Communications teams for their support in publishing this post!
