Pinterest由美国加州帕罗奥图的一个名为Cold Brew Labs的团队营运，创办人为Ben Silbermann、 Paul Sciarra 及 Evan Sharp。2010年正式上线。“Pinterest”是由“Pin”及“interest”两个字组成，在社交网站中的访问量仅次于Facebook、Youtube、VKontakte以及Twitter。
As part of our device authentication and compliance initiative, Pinterest has implemented employee-facing mutual TLS with a custom identity provider in a way that results in a positive user experience.
You may have heard of, or experienced first hand, some unpleasant behavior while attempting to authenticate with a certificate within a browser or application. Even the Wikipedia page for mutual TLS mentions that mTLS is a “..less user-friendly experience, [and] it’s rarely used in end-user applications…”.
At Pinterest, we needed to use Mutual TLS as part of our employee SSO authentication, using a custom identity provider. This means that we needed to support authentication across all major platforms, as well as from within browsers and native applications.
In this blog post, we’ll talk about some of the changes that we’ve made to ensure that user-facing mTLS is a seamless experience for our employees.
In early 2020, during a critical iOS out of memory incident (we have a blogpost for that), we realized that we didn’t have much visibility of how the app is running or a good system to look up for monitoring and troubleshooting.
Pinterest Android App offers a rare experience with a mix of images and videos on a two-column grid. In order to maintain a performant video experience on Android devices, we focused on:
- Warming up
- Pooling players
The Ads Intelligence team at Pinterest builds products that help advertisers maximize the value they get out of their ad campaigns. As part of that initiative, we have recently launched Flexible Daily Budgets (FDB) to US advertisers in open beta.
In this blog post, we will demonstrate how we improved Pinterest Homefeed engagement volume from a machine learning model design perspective — by leveraging realtime user action features in Homefeed recommender system.
It’s a well-known fact for Android developers that an app’s manifest (AndroidManifest.xml) holds crucial application declarations. It is rarely monitored after being set up because we assume it hardly ever changes. At Pinterest, however, we have been actively monitoring the manifest after realizing it does change every so often.
While building an app, Gradle downloads all the dependent libraries to compile and link them with the app. These dependent libraries each have their own mini manifest. During the build process, Android Gradle Plugin (AGP) merges them with the app’s main manifest to form the final manifest. Because of this merging process, the final manifest often looks quite different from the original one and contains additional declarations. In most cases, these extra declarations are necessary for dependent libraries to function. However, sometimes they can have unintended behaviors.
In Homefeed, ~30% of recommended pins come from pin to pin-based retrieval. This means that during the retrieval stage, we use a batch of query pins to call our retrieval system to generate pin recommendations. We typically use a user’s previously engaged pins, and a user may have hundreds (or thousands!) of engaged pins, so a key problem for us is: how do we select the right query pins from the user’s profile?
In early 2015 Pinterest engineers ran an experiment that improved mobile web home landing page performance by 60 percent and mobile signup conversion rate by 40 percent. However, the experiment was a hacky solution that used a lot of shortcuts like serving pre-rendered HTML pages without using any internal template rendering engines or common resources (JS, CSS). To productionize learnings from this experiment, the entire front end engine, all page templates and common elements had to be rewritten. It was a huge effort, and to achieve it, we needed to start from building robust metrics to track our progress for all parts of the serving system. In this post, we’ll cover how we improved performance on Pinterest pages, and how it led to the biggest increase in user acquisition of 2016.
At Pinterest, HBase is one of the most critical storage backends, powering many online storage services like Zen (graph database), UMS (wide column datastore), and Ixia (near real time secondary indexing service). The HBase Ecosystem, though having various advantages like strong consistency at row level in high volume requests, flexible schema, low latency access to data, Hadoop integration, etc. cannot serve the needs of our clients for the next 3–5 years. This is due to high operational cost, excessive complexity, and missing functionalities like secondary indexes, support for transactions, etc.
After evaluating 10+ different storage backends and benchmarking three shortlisted backends with shadow traffic (asynchronously copying production traffic to non production environment) and in-depth performance evaluation, we have decided to use TiDB as the final candidate for Unified Storage Service.
The adoption of Unified Storage Service powered by TiDB is a major challenging project spanning over multiple quarters. It involves data migration from HBase to TiDB, design and implementation of Unified Storage Service, API migration from Ixia/Zen/UMS to Unified Storage Service, and Offline Jobs migration from HBase/Hadoop ecosystem to TiSpark ecosystem while maintaining our availability and latency SLA.
In this blog post, we will first learn the various approaches considered for data migration with their trade offs. We will then do a deep dive on how the data migration was conducted from HBase to TiDB for one of the first use cases having 4 TB table size serving 14k read qps and 400 write qps with zero downtime. Lastly we will learn how the verification was done to achieve 99.999% data consistency and how the data consistency was measured between the two tables.
Unlocking 16% Homefeed Engagement by Serving 100x Bigger Recommender Models.
Understanding the size of the potential audience of an ad is an important consideration for an advertiser. It enables advertisers to estimate the total population who might be interested in the products or services they advertise and plan their budgets ahead of time. The Ads Intelligence team at Pinterest provides a service called Potential Audience Size in the Ads Manager, so the advertisers can understand their target audience size while they configure their ad groups. The service updates the estimate in real time as the audience targeting is updated.
Pinterest’s distributed caching system, built on top of open source technologies memcached and mcrouter, is a critical component of the production infrastructure stack. Pinterest’s cache-as-a-service platform is responsible for driving down application latency across the board, reducing the overall cloud cost footprint, and ensuring adherence to strict sitewide availability targets.
Embedding-based retrieval is a core center piece of our recommendations engine at Pinterest. We support a myriad of use cases, from retrieval based on content similarity to learned retrieval. It’s powered by our in-house search engine — Manas — which provides Approximate Nearest Neighbor (ANN) search as a service, primarily using Hierarchical Navigable Small World graphs (HNSW).
While traditional token-based search retrieves documents on term matching on a tree of terms with logical connectives like ANDs and ORs, ANN search retrieves based on embedding similarity. Oftentimes we’d like to do a hybrid search query that combines the two. For example, “find similar products to this pair of shoes that are less than $100, rated 4 stars or more, and ship to the UK.” This is a common problem, and it’s not entirely unsolved, but the solutions each have their own caveats and trade-offs.
At Pinterest, the Logging Platform team manages the PubSub layer and provides support for clients that interact with it.
One major issue that put us at risk of dependency confusion was using multiple index endpoints for our Python “pip” config, using the configuration flag
— extra-index-url. Pinterest Python artifacts were partially stored on our own custom repository, open-sourced as Pinrepo, and some of our Python packages were stored in JFrog’s Artifactory.
There is a major danger in the usage of the
— extra-index-url flag: it will not honor any sort of priority ordering. This has been extensively discussed on Github and Stack Overflow. The short summary is that the volunteer team that manages the pip open-source project does not consider repository index prioritization within the scope of the pip tool. They instead recommend using a single server endpoint that manages priorities on the backend.
Solving Engineering Problems as Doing Research.