Improving ABR Video Performance at Pinterest

Summary

Video content has emerged as a favored format for people to discover inspiration on Pinterest. In this blog post, we outline recent enhancements to Adaptive Bitrate (ABR) video performance and their positive impact on user engagement.

Terms:

ABR: Adaptive Bitrate (ABR) streaming protocol.
HLS: HTTP Live Streaming (HLS) is an ABR protocol developed by Apple that supports both live and on-demand streaming.
DASH: Dynamic Adaptive Streaming over HTTP (DASH) is another ABR protocol that works similarly to HLS.

Context

ABR streaming is widely adopted in the industry for delivering video content. The content is encoded at multiple bitrates and resolutions, producing several renditions of the same video. During playback, players enhance the user experience by selecting the best possible quality and dynamically adjusting it based on network conditions.
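To make the selection step concrete, here is a minimal sketch of throughput-based rendition selection. It is not how AVPlayer or ExoPlayer actually choose renditions (their logic is considerably more involved); the `Rendition` type, the `selectRendition` function, the 0.8 safety factor, and the sample bitrate ladder are all assumptions made for illustration.

```kotlin
// Hypothetical rendition descriptor; names and fields are illustrative only.
data class Rendition(val name: String, val bitrateBps: Long)

// Pick the highest-bitrate rendition that fits within a safety fraction of the
// measured throughput; if none fits, fall back to the lowest-bitrate rendition.
fun selectRendition(
    renditions: List<Rendition>,
    measuredThroughputBps: Long,
    safetyFactor: Double = 0.8, // assumed headroom, not a value from the post
): Rendition {
    require(renditions.isNotEmpty()) { "at least one rendition is required" }
    val budget = measuredThroughputBps * safetyFactor
    return renditions
        .filter { it.bitrateBps <= budget }
        .maxByOrNull { it.bitrateBps }
        ?: renditions.minByOrNull { it.bitrateBps }!!
}

// Assumed sample ladder for the usage example.
val ladder = listOf(
    Rendition("240p", 400_000),
    Rendition("480p", 1_200_000),
    Rendition("720p", 2_500_000),
)
```

Calling `selectRendition(ladder, 2_000_000)` leaves a budget of 1.6 Mbps under these assumptions, so the 480p rendition is chosen; re-running the selection as throughput estimates change is what makes the stream adaptive.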

At Pinterest, we utilize HLS and DASH for video streaming on iOS and Android platforms, respectively. HLS is supported through iOS’s AVPlayer, while DASH is supported by Android’s ExoPlayer. Currently, HLS accounts for approximately 70% of video playback sessions on iOS apps, and DASH accounts for around 55% of video sessions on Android.

Steps to start video playbacks

Startup latency is a key metric for evaluating video performance at Pinterest. Therefore, we will first examine the steps of starting an ABR video. To initiate playback, clients must obtain both the manifest files and partial media files from the CDN through network requests. Manifest files are typically small, but they provide essential metadata about the video streams, such as their resolutions and bitrates. In contrast, the actual video and audio content is encoded in the media files. Downloading both types of files takes time and is the primary contributor to users’ perceived latency.

Our work aims to reduce latency in manifest loading (highlighted in Figure 1), thereby improving overall video startup performance. Although manifest files are small in size, downloading them adds a non-trivial amount of overhead on video startup. This is particularly noticeable with HLS, where each video contains multiple manifests, resulting in multiple network round trips. As illustrated in Figure 1, after receiving video URLs from the API endpoint, players begin the streaming process by downloading the main manifest playlist from the CDN. The content within the main manifest provides URLs for the rendition-level manifest playlists, prompting players to send subsequent requests to download them. Finally, based on network conditions, players download the appropriate media files to start playback. Acquiring manifests is a prerequisite for downloading media bytes, and reducing this latency leads to faster loading times and a better viewing experience. In the next section, we will share our solution to this problem.
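The extra round trip comes from the structure of the main manifest itself: it only lists the rendition-level playlists, which must then be fetched separately. The sketch below parses a main manifest and extracts those entries. It handles only the `#EXT-X-STREAM-INF` tag and its `BANDWIDTH` attribute from the HLS specification; the `Variant` type, the `parseVariants` function, and the sample playlist file names are assumptions for illustration.

```kotlin
// One variant entry from an HLS main manifest (hypothetical type).
data class Variant(val bandwidthBps: Long, val uri: String)

// Each #EXT-X-STREAM-INF tag line is followed by the URI of a rendition-level
// playlist. A standard player must issue a second request for each one it uses.
fun parseVariants(mainManifest: String): List<Variant> {
    val lines = mainManifest.lines().map { it.trim() }.filter { it.isNotEmpty() }
    // Match BANDWIDTH but not AVERAGE-BANDWIDTH.
    val bandwidthPattern = Regex("""(?:^|[:,])BANDWIDTH=(\d+)""")
    val variants = mutableListOf<Variant>()
    for ((index, line) in lines.withIndex()) {
        if (!line.startsWith("#EXT-X-STREAM-INF:")) continue
        val bandwidth = bandwidthPattern.find(line)?.groupValues?.get(1)?.toLong() ?: continue
        val uri = lines.getOrNull(index + 1)?.takeUnless { it.startsWith("#") } ?: continue
        variants += Variant(bandwidth, uri)
    }
    return variants
}

// Assumed sample manifest; file names are placeholders.
val sampleMainManifest = """
    #EXTM3U
    #EXT-X-STREAM-INF:BANDWIDTH=400000,RESOLUTION=426x240
    240p.m3u8
    #EXT-X-STREAM-INF:AVERAGE-BANDWIDTH=1000000,BANDWIDTH=1200000,RESOLUTION=854x480
    480p.m3u8
""".trimIndent()
```

Every URI returned by `parseVariants(sampleMainManifest)` represents a potential additional network request before any media bytes can be downloaded.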

Figure 1. Standard HLS Video Startup Flow (asset source1, source2)

Cutting round trip latency on manifest download

At a high level, our solution eliminates the round trip latency associated with fetching multiple manifests by embedding them in the API response. When clients request Pin metadata through endpoints, the manifest file bytes are serialized and integrated into the response payload along with other metadata. During playback, players can swiftly access the manifest information locally, enabling them to proceed directly to the final media download stage. This method allows players to bypass the process of downloading manifests, thereby reducing video startup latency.
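As a rough illustration of the idea, the sketch below packs manifest bytes into a metadata payload and resolves them locally on the client. The actual Pinterest payload format is not described in this post, so the `VideoMetadata` shape, the field names, the helper functions, and the choice of Base64 encoding are all assumptions.

```kotlin
import java.util.Base64

// Hypothetical payload shape: manifest URL -> Base64-encoded manifest bytes.
data class VideoMetadata(
    val mainManifestUrl: String,
    val embeddedManifests: Map<String, String>,
)

// Server side (assumed): serialize raw manifest bytes into the response payload.
fun embedManifests(mainManifestUrl: String, manifests: Map<String, ByteArray>): VideoMetadata =
    VideoMetadata(
        mainManifestUrl = mainManifestUrl,
        embeddedManifests = manifests.mapValues { (_, bytes) ->
            Base64.getEncoder().encodeToString(bytes)
        },
    )

// Client side (assumed): resolve a manifest locally without a network request.
// A null result means the manifest was not embedded and must be downloaded.
fun lookupManifest(metadata: VideoMetadata, url: String): ByteArray? =
    metadata.embeddedManifests[url]?.let { Base64.getDecoder().decode(it) }
```

Keying the embedded manifests by URL lets the client answer each manifest request the player makes with a simple lookup, and a miss can still fall back to the regular CDN download.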

Figure 2. Video Startup Flow with Manifest Embedding (asset source)

Now let’s dive into some of the implementation details and the challenges we encountered along the way.

Overcoming API response overhead

One of the primary hurdles we faced was the overhead imposed on the API endpoint. To process API requests, the backend is required to retrieve video manifest files, causing a rise in the overall latency of API responses. This issue is particularly prominent for HLS, where numerous manifest playlists are needed for each video, resulting in a significant increase in latency due to multiple network calls. Although an initial attempt to parallelize network calls provided some relief, the latency regression persisted.

We successfully tackled this issue by incorporating a Memcache layer into the manifest serving process. On a cache hit, Memcache provides significantly lower latency than network calls. It is effective for platforms like Pinterest, where popular content is consistently served to many clients, resulting in a high cache hit rate. Following the implementation of Memcache, API overhead was effectively managed.

We also iterated on the backend mechanism for retrieving manifests, comparing fetching from the CDN with fetching directly from the origin, S3. We ultimately settled on S3 fetching as the optimal solution: compared with CDN fetching, it offers better performance and cost efficiency.
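The cache-backed retrieval described above follows the familiar cache-aside pattern. The sketch below shows that pattern only; it is not the production implementation. `ManifestCache` is an in-memory stand-in for a Memcache client, and `ManifestService` and `fetchFromOrigin` are hypothetical names, with the origin fetch (S3 in our case) passed in as a plain function.

```kotlin
import java.util.concurrent.ConcurrentHashMap

// In-memory stand-in for a Memcache client (assumption for this sketch);
// a real deployment would talk to a networked cache with expiry.
class ManifestCache {
    private val entries = ConcurrentHashMap<String, String>()
    fun get(key: String): String? = entries[key]
    fun set(key: String, value: String) { entries[key] = value }
}

// Cache-aside lookup: serve from the cache on a hit; on a miss, fetch from the
// origin and populate the cache so later requests avoid the network call.
class ManifestService(
    private val cache: ManifestCache,
    private val fetchFromOrigin: (String) -> String,
) {
    fun getManifest(key: String): String {
        cache.get(key)?.let { return it }
        val manifest = fetchFromOrigin(key)
        cache.set(key, manifest)
        return manifest
    }
}
```

Because popular videos are requested repeatedly, most calls to `getManifest` would take the cache-hit path and skip the origin fetch entirely.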

After several iterations, the final backend flow looks like this:

Figure 3. Backend Manifest Retrieval Process

Customizing player’s manifest loading process

For the players’ side, we utilize the AVAssetResourceLoaderDelegate APIs on iOS to customize the loading process of manifest files. These APIs enable applications to incorporate their own logic for managing resource loading.

Figure 4 illustrates the manifest loading process between AVPlayer and our code. Manifest files are delivered from the backend and stored on the client. When playback is requested, AVPlayer sends multiple AVAssetResourceLoadingRequests to obtain information about the manifest data. In response to these requests, we locate the serialized data from the API response and supply them accordingly. The loading requests for media files are redirected to the CDN address as usual.

Figure 4. Manifest Loading Process with AVPlayer (asset source1, source2)

ExoPlayer

Android employs a comparable approach by consolidating the loading requests at ExoPlayer. The process starts with getting the contents of the .mpd manifest file from the API. These contents are saved in the tag property of MediaItem.LocalConfiguration.

    data class MediaItemTag(
        val manifest: String?,
    )

    val mediaItemTag = MediaItemTag(
        manifest = apiResponse.dashManifest,
    )

    val mediaItem = MediaItem.Builder()
        .setTag(mediaItemTag)
        .setUri(...)
        .build()

After the MediaItem is set and we are ready to call prepare, the notable customization begins when ExoPlayer creates a MediaSource. Specifically, our Android client implements the MediaSource.Factory interface, which specifies a createMediaSource method.

In createMediaSource, we can retrieve the DASH manifest that was stored in the tag property from the MediaItem and transform it into an ExoPlayer primitive using DashManifestParser:

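    override fun createMediaSource(mediaItem: MediaItem): MediaSource {
        val localConfiguration = requireNotNull(mediaItem.localConfiguration)
        val mediaItemTag = localConfiguration.tag as MediaItemTag
        // The manifest is nullable, so fail fast if it was not embedded.
        val manifest = requireNotNull(mediaItemTag.manifest)

        // Parse the side-loaded manifest text into ExoPlayer's DashManifest.
        val dashManifest = DashManifestParser()
            .parse(localConfiguration.uri, manifest.byteInputStream())
        // ... dashManifest is then used to build the MediaSource
    }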

With the contents of the manifest available in memory, utilizing DashMediaSource.Factory’s API for side-loading a manifest is the last step required:

    class PinterestMediaSourceFactory(
        private val dataSourceFactory: DataSource.Factory,
    ) : MediaSource.Factory {

        override fun createMediaSource(mediaItem: MediaItem): MediaSource {
            // dashManifest is parsed from the MediaItem tag, as shown earlier.
            val dashChunkSourceFactory = DefaultDashChunkSource.Factory(dataSourceFactory)
            val dashMediaSourceFactory = DashMediaSource.Factory(
                dashChunkSourceFactory,
                null, // no manifest data source: the manifest is side-loaded
            )
            return dashMediaSourceFactory.createMediaSource(dashManifest, mediaItem)
        }
    }

Now, instead of having to first fetch the .mpd file from the CDN, ExoPlayer can skip that request and immediately begin fetching media.

Showcase

As of this writing, both Android and iOS platforms have fully implemented the above solution. This has led to a significant improvement in video performance metrics, particularly in startup latency. Consequently, user engagement on Pinterest was boosted due to the faster viewing experience.

Future work

With the ability to manipulate the manifest loading process, clients can make local adjustments to achieve more fine-grained video quality control based on the UI surface. By removing unwanted bitrate renditions from the original manifest file before providing it to the player, we can limit the renditions available to the player and thereby gain more control over playback quality. For instance, on a large UI surface such as full screen, high-quality video is preferable. By removing the lower bitrate renditions from the original manifest, we can ensure that players only play high-quality video, without preparing multiple sets of manifests on the backend.
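A minimal sketch of this kind of client-side trimming for an HLS main manifest is shown below. This is future work, so it is not an existing implementation; the `dropLowBitrateVariants` function and its threshold parameter are hypothetical, and the sketch handles only `#EXT-X-STREAM-INF` entries and their `BANDWIDTH` attribute.

```kotlin
// Remove variants whose BANDWIDTH is below minBandwidthBps from an HLS main
// manifest. Each dropped #EXT-X-STREAM-INF tag is removed together with the
// URI line that follows it; all other lines are kept unchanged.
fun dropLowBitrateVariants(mainManifest: String, minBandwidthBps: Long): String {
    // Match BANDWIDTH but not AVERAGE-BANDWIDTH.
    val bandwidthPattern = Regex("""(?:^|[:,])BANDWIDTH=(\d+)""")
    val output = mutableListOf<String>()
    var skipNextUri = false
    for (line in mainManifest.lines()) {
        val trimmed = line.trim()
        if (trimmed.startsWith("#EXT-X-STREAM-INF:")) {
            val bandwidth = bandwidthPattern.find(trimmed)?.groupValues?.get(1)?.toLong()
            skipNextUri = bandwidth != null && bandwidth < minBandwidthBps
            if (!skipNextUri) output += line
        } else if (skipNextUri && trimmed.isNotEmpty() && !trimmed.startsWith("#")) {
            skipNextUri = false // this line is the URI of the dropped variant
        } else {
            output += line
        }
    }
    return output.joinToString("\n")
}
```

A real implementation would likely also guard against removing every variant, for example by keeping the highest-bitrate rendition when none meets the threshold.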

Acknowledgement

We would like to extend our sincere gratitude to Liang Ma and Sterling Li for their significant contributions to this project and exceptional technical leadership. Their expertise and dedication were instrumental in overcoming numerous challenges and driving the project to success. We are deeply appreciative of their efforts and the positive impact they have had on this initiative.

To learn more about engineering at Pinterest, check out the rest of our Engineering Blog and visit our Pinterest Labs site. To explore and apply to open roles, visit our Careers page.