Company: Pinterest
Representation online matters: practical end-to-end diversification in search and recommender systems
Pinterest is a platform designed to bring everyone the inspiration to create a life they love. This is not only our company’s core mission but something that has become increasingly important in today’s interconnected world. As technology becomes increasingly integrated into the daily lives of billions of people globally, it is crucial for online platforms to reflect the diverse communities they serve. Improving representation online can facilitate content discovery for a more diverse user base by reflecting their inclusion on the platform. This, in turn, demonstrates the platform’s ability to meet their needs and preferences. In addition to improved user experience and satisfaction, this can have a positive business impact through increased engagement, retention, and trust in the platform.
In this post, we show how we improved diversification on Pinterest for three different surfaces: Search, Related Products, and New User Homefeed. Specifically, we have developed and deployed scalable diversification mechanisms that utilize a visual skin tone signal to support representation of a wide range of skin tones in recommendations, as shown in Figure 1 for fashion recommendations in the Related Products surface.
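As a rough illustration of one diversified re-ranking strategy (not necessarily the exact mechanism deployed in production), the sketch below cycles the top slots through skin tone buckets attached to each candidate; the item fields, bucket count, and function names are hypothetical.

```python
from collections import defaultdict, deque

def diversify_round_robin(items, bucket_of, top_k):
    """Re-rank a relevance-ordered candidate list so the top slots cycle
    through skin tone buckets instead of being dominated by one bucket.
    `bucket_of(item)` returns a bucket id (or None if the signal is missing)."""
    buckets = defaultdict(deque)
    for item in items:                      # items arrive in relevance order
        buckets[bucket_of(item)].append(item)

    reranked = []
    target = min(top_k, len(items))
    while len(reranked) < target:
        for bucket in list(buckets):        # cycle buckets in a fixed order
            if buckets[bucket] and len(reranked) < target:
                reranked.append(buckets[bucket].popleft())
    return reranked

# Hypothetical usage: each candidate carries a predicted skin tone bucket.
candidates = [{"id": i, "tone_bucket": i % 4} for i in range(20)]
print([c["id"] for c in diversify_round_robin(candidates, lambda c: c["tone_bucket"], 8)])
```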
Pacer: Pinterest’s New Generation of Asynchronous Computing Platform
Pinterest’s asynchronous job execution platform, Pinlater, suffered from scalability bottlenecks, poor hardware efficiency, and a lack of isolation and availability. Pacer redesigns the architecture and introduces new components and mechanisms. By dividing job queues into partitions managed through Helix and ZooKeeper, Pacer addresses Pinlater’s problems, improving the independence and performance of job execution, reducing lock contention, and raising hardware utilization. Pacer’s dequeue broker service eliminates lock contention and uses buffering to make job fetching more efficient, while Helix manages the large number of partitions and assigns them to the appropriate dequeue brokers to optimize resource management. This improvement was a collaboration across multiple teams, with contributions from Core Services, Data Org, Storage and Caching, Cloud Runtime, and Notifications.
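To make the partitioned design concrete, here is a minimal in-memory sketch of a partitioned job queue and a dequeue broker that owns a subset of partitions and buffers fetched jobs. The class names and partition count are hypothetical, and the Helix/ZooKeeper partition assignment that Pacer relies on is omitted.

```python
import hashlib
from collections import deque

NUM_PARTITIONS = 8  # illustrative; in Pacer, partitions are managed via Helix and ZooKeeper

class PartitionedQueue:
    """Toy in-memory stand-in for a partitioned job queue."""
    def __init__(self, num_partitions=NUM_PARTITIONS):
        self.partitions = [deque() for _ in range(num_partitions)]

    def enqueue(self, job_key, payload):
        # Hash the job key to a partition so load spreads across partitions.
        p = int(hashlib.md5(job_key.encode()).hexdigest(), 16) % len(self.partitions)
        self.partitions[p].append(payload)

class DequeueBroker:
    """Each broker owns a disjoint set of partitions (assignment done by Helix
    in Pacer), so workers never contend on the same partition."""
    def __init__(self, queue, owned_partitions, buffer_size=4):
        self.queue = queue
        self.owned = owned_partitions
        self.buffer = deque()
        self.buffer_size = buffer_size

    def _refill(self):
        # Pre-fetch a small batch from owned partitions to amortize fetch cost.
        for p in self.owned:
            while self.queue.partitions[p] and len(self.buffer) < self.buffer_size:
                self.buffer.append(self.queue.partitions[p].popleft())

    def dequeue(self):
        if not self.buffer:
            self._refill()
        return self.buffer.popleft() if self.buffer else None
```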
Warden: Real Time Anomaly Detection at Pinterest
Detecting anomalous events has been becoming increasingly important in recent years at Pinterest. Anomalous events, broadly defined, are rare occurrences that deviate from normal or expected behavior. Because these types of events can be found almost anywhere, opportunities and applications for anomaly detection are vast. At Pinterest, we have explored leveraging anomaly detection, specifically our Warden Anomaly Detection Platform, for several use cases (which we’ll get into in this post). With the positive results we are seeing, we are planning to continue to expand our anomaly detection work and use cases.
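As a rough baseline for what “deviating from normal or expected behavior” can mean in practice, the sketch below flags points that fall far outside a rolling window of recent values. This is a generic illustration only, not Warden’s actual detection algorithm, and the window and threshold are arbitrary.

```python
import statistics

def detect_anomalies(series, window=24, z_threshold=3.0):
    """Flag points that deviate strongly from the recent rolling window."""
    anomalies = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mean = statistics.fmean(history)
        stdev = statistics.pstdev(history) or 1e-9  # avoid division by zero on flat history
        z = abs(series[i] - mean) / stdev
        if z > z_threshold:
            anomalies.append((i, series[i], round(z, 2)))
    return anomalies

# Hypothetical hourly metric with one injected spike.
metric = [100.0] * 48
metric[40] = 400.0
print(detect_anomalies(metric))
```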
An ML based approach to proactive advertiser churn prevention
In this blog post, we describe a Machine Learning (ML) powered proactive churn prevention solution that was prototyped with our small & medium business (SMB) advertisers. Results from our initial experiment suggest that we can detect future churn with a high degree of predictive power and consequently empower our sales partners in mitigating churn. ML-powered proactive churn prevention can achieve better results than traditional reactive manual effort.
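As an illustration only (the production features, model, and prediction horizon are not described here), a churn detector of this kind can be framed as a binary classifier over advertiser activity features. The sketch below uses synthetic data and a standard scikit-learn model.

```python
# Minimal sketch of a churn classifier on hypothetical advertiser features.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
# Synthetic stand-ins for features such as spend trend, days since last edit, CTR, tickets.
X = rng.normal(size=(5000, 4))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=5000) < -0.5).astype(int)  # churn label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = GradientBoostingClassifier().fit(X_tr, y_tr)
print("AUC:", roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))
```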
Large-scale User Sequences at Pinterest
Understanding and responding to user actions and preferences is critical to delivering a personalized, high-quality user experience. In this blog post, we’ll discuss how multiple teams joined together to build a new large-scale, highly flexible, and cost-efficient user signal platform service, which indexes the relevant user events in near real-time, constructs them into user sequences, and makes them super easy to use both for online service requests and for ML training & inference.
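A minimal sketch of the core idea, folding a stream of user events into bounded, time-ordered per-user sequences, is shown below. The class, field names, and sequence cap are illustrative rather than the platform’s actual API.

```python
from collections import defaultdict

MAX_SEQ_LEN = 100  # illustrative cap on the number of retained events per user

class UserSequenceIndex:
    """Toy near-real-time index of per-user event sequences."""
    def __init__(self):
        self._sequences = defaultdict(list)

    def ingest(self, event):
        """event: {"user_id": ..., "action": ..., "pin_id": ..., "ts": ...}"""
        seq = self._sequences[event["user_id"]]
        seq.append((event["ts"], event["action"], event["pin_id"]))
        seq.sort(key=lambda e: e[0])            # keep events in time order
        if len(seq) > MAX_SEQ_LEN:
            del seq[: len(seq) - MAX_SEQ_LEN]   # retain only the most recent events

    def get_sequence(self, user_id, last_n=None):
        seq = self._sequences.get(user_id, [])
        return seq[-last_n:] if last_n else seq
```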
Pinterest is now on HTTP/3
Now Pinterest operates on HTTP/3. We have enabled HTTP/3 for major Pinterest production domains on our multi-CDN edge network, and we’ve upgraded client apps’ network stack to support the new protocol. This allows us to catch up with industry trends. Most importantly, faster and more reliable networking improves Pinners’ experience and business metrics.
Enforcing Device AuthN & Compliance at Pinterest
Pinterest has enforced the use of managed and compliant devices in our Okta authentication flow, using a passwordless implementation, so that access to our tools always requires a healthy Pinterest device.
Following the phishing-based attacks against our peers in the tech industry, Pinterest decided to take a two-pronged approach to defend against similar attacks. We decided to:
- Require that a managed and healthy Pinterest device be used to access all Pinterest resources, even when valid credentials are presented
- Require FIDO2 credentials for user authentication
In this post, we’ll be focusing on how we required the use of Pinterest managed devices in our Okta authentication flow.
Employee-facing Mutual TLS
As part of our device authentication and compliance initiative, Pinterest has implemented employee-facing mutual TLS with a custom identity provider in a way that results in a positive user experience.
You may have heard of, or experienced firsthand, some unpleasant behavior while attempting to authenticate with a certificate within a browser or application. Even the Wikipedia page for mutual TLS mentions that mTLS is a “…less user-friendly experience, [and] it’s rarely used in end-user applications…”.
At Pinterest, we needed to use Mutual TLS as part of our employee SSO authentication, using a custom identity provider. This means that we needed to support authentication across all major platforms, as well as from within browsers and native applications.
In this blog post, we’ll talk about some of the changes that we’ve made to ensure that user-facing mTLS is a seamless experience for our employees.
Build an end-to-end JSON logging system for client apps
In early 2020, during a critical iOS out-of-memory incident (we have a blog post for that), we realized that we didn’t have much visibility into how the app was running, nor a good system for monitoring and troubleshooting.
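The core idea is to emit every client event as a self-describing JSON record that downstream tooling can index and query. The sketch below shows one hypothetical shape for such a record; the function and field names are illustrative, not the system’s actual schema.

```python
import json, sys, time, uuid

def log_event(event_type, payload, session_id=None, stream=sys.stdout):
    """Emit one structured client log event as a single JSON line."""
    record = {
        "event_id": str(uuid.uuid4()),
        "event_type": event_type,        # e.g. "memory_warning", "network_error"
        "ts_ms": int(time.time() * 1000),
        "session_id": session_id,
        "payload": payload,              # arbitrary, schema-free key/value details
    }
    stream.write(json.dumps(record) + "\n")  # one JSON object per line for easy ingestion

log_event("memory_warning", {"free_mb": 42, "screen": "home_feed"}, session_id="abc123")
```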
Improving the Player on Android
The Pinterest Android app offers a distinctive experience with a mix of images and videos on a two-column grid. In order to maintain a performant video experience on Android devices, we focused on:
- Warming up
- Configurations
- Pooling players
Flexible Daily Budgeting at Pinterest
The Ads Intelligence team at Pinterest builds products that help advertisers maximize the value they get out of their ad campaigns. As part of that initiative, we have recently launched Flexible Daily Budgets (FDB) to US advertisers in open beta.
How Pinterest Leverages Realtime User Actions in Recommendation to Boost Homefeed Engagement Volume
In this blog post, we will demonstrate how we improved Pinterest Homefeed engagement volume from a machine learning model design perspective — by leveraging realtime user action features in Homefeed recommender system.
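A simplified sketch of how realtime user actions can be turned into fixed-length model features is shown below. The action vocabulary, sequence length, and embedding size are illustrative assumptions, not the production feature definitions.

```python
ACTION_VOCAB = {"repin": 0, "click": 1, "hide": 2, "closeup": 3}
SEQ_LEN = 16   # assumed number of recent actions fed to the model
EMB_DIM = 8    # assumed pin embedding size

def realtime_action_features(recent_actions):
    """recent_actions: list of (action, pin_embedding) pairs, newest last.
    Returns fixed-length action-id and embedding sequences, padded at the front."""
    actions = recent_actions[-SEQ_LEN:]
    # Unknown actions and padding share the same out-of-vocabulary id for simplicity.
    action_ids = [ACTION_VOCAB.get(a, len(ACTION_VOCAB)) for a, _ in actions]
    embeddings = [emb for _, emb in actions]
    pad = SEQ_LEN - len(actions)
    action_ids = [len(ACTION_VOCAB)] * pad + action_ids
    embeddings = [[0.0] * EMB_DIM] * pad + embeddings
    return action_ids, embeddings

# Hypothetical usage with two recent actions.
ids, embs = realtime_action_features([("repin", [0.1] * EMB_DIM), ("click", [0.2] * EMB_DIM)])
print(ids)
```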
Watch your Manifest
It’s a well-known fact among Android developers that an app’s manifest (AndroidManifest.xml) holds crucial application declarations. It is rarely monitored after being set up because we assume it hardly ever changes. At Pinterest, however, we have been actively monitoring the manifest after realizing it does change every so often.
While building an app, Gradle downloads all the dependent libraries to compile and link them with the app. These dependent libraries each have their own mini manifest. During the build process, Android Gradle Plugin (AGP) merges them with the app’s main manifest to form the final manifest. Because of this merging process, the final manifest often looks quite different from the original one and contains additional declarations. In most cases, these extra declarations are necessary for dependent libraries to function. However, sometimes they can have unintended behaviors.
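One way to catch such unintended additions is to diff the AGP-merged manifest against a reviewed baseline in CI. The sketch below checks declared permissions only; the file paths are assumptions that depend on the AGP version and build setup, and this is not Pinterest’s actual tooling.

```python
import xml.etree.ElementTree as ET

ANDROID_NS = "{http://schemas.android.com/apk/res/android}"

def declared_permissions(manifest_path):
    """Collect the permission names declared in a manifest file."""
    root = ET.parse(manifest_path).getroot()
    return {el.get(f"{ANDROID_NS}name") for el in root.iter("uses-permission")}

def check_manifest(merged_path, baseline_path):
    """Fail the build if the merged manifest declares permissions absent from the baseline."""
    added = declared_permissions(merged_path) - declared_permissions(baseline_path)
    if added:
        raise SystemExit(f"Unreviewed permissions in merged manifest: {sorted(added)}")

# Hypothetical usage; the merged manifest location varies by AGP version and variant.
# check_manifest("app/build/intermediates/merged_manifests/release/AndroidManifest.xml",
#                "config/manifest_baseline.xml")
```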
Query Rewards: Building a Recommendation Feedback Loop During Query Selection
In Homefeed, ~30% of recommended pins come from pin-to-pin retrieval. This means that during the retrieval stage, we use a batch of query pins to call our retrieval system and generate pin recommendations. We typically use a user’s previously engaged pins, and a user may have hundreds (or thousands!) of engaged pins, so a key problem for us is: how do we select the right query pins from the user’s profile?
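One way to close the loop is to attribute downstream engagement back to the query pin that retrieved it and favor high-reward query pins when selecting the next batch. The sketch below is a toy reward-weighted sampler, not the production selection logic; the class and parameter names are hypothetical.

```python
import random
from collections import defaultdict

class QueryRewardSelector:
    """Toy feedback loop: query pins whose past recommendations earned more
    engagement are sampled more often, with a small exploration floor so new
    query pins still get tried."""
    def __init__(self, exploration=0.1):
        self.rewards = defaultdict(float)     # query_pin_id -> accumulated reward
        self.exploration = exploration

    def record_reward(self, query_pin_id, engagement):
        # Attribute downstream engagement back to the originating query pin.
        self.rewards[query_pin_id] += engagement

    def select(self, candidate_query_pins, k):
        weights = [self.rewards[q] + self.exploration for q in candidate_query_pins]
        # Samples with replacement for simplicity.
        return random.choices(candidate_query_pins, weights=weights, k=k)

# Hypothetical usage over a user's engaged pins.
selector = QueryRewardSelector()
selector.record_reward("pin_42", engagement=3.0)
print(selector.select(["pin_42", "pin_7", "pin_19"], k=2))
```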
Driving user growth with performance improvements
In early 2015 Pinterest engineers ran an experiment that improved mobile web home landing page performance by 60 percent and mobile signup conversion rate by 40 percent. However, the experiment was a hacky solution that used a lot of shortcuts like serving pre-rendered HTML pages without using any internal template rendering engines or common resources (JS, CSS). To productionize learnings from this experiment, the entire front end engine, all page templates and common elements had to be rewritten. It was a huge effort, and to achieve it, we needed to start from building robust metrics to track our progress for all parts of the serving system. In this post, we’ll cover how we improved performance on Pinterest pages, and how it led to the biggest increase in user acquisition of 2016.
Online Data Migration from HBase to TiDB with Zero Downtime
At Pinterest, HBase is one of the most critical storage backends, powering many online storage services like Zen (graph database), UMS (wide column datastore), and Ixia (near-real-time secondary indexing service). Although the HBase ecosystem has advantages such as strong row-level consistency under high request volumes, a flexible schema, low-latency access to data, and Hadoop integration, it cannot serve our clients’ needs for the next 3–5 years due to its high operational cost, excessive complexity, and missing functionality such as secondary indexes and transaction support.
After evaluating 10+ different storage backends and benchmarking three shortlisted backends with shadow traffic (asynchronously copying production traffic to non production environment) and in-depth performance evaluation, we have decided to use TiDB as the final candidate for Unified Storage Service.
The adoption of the Unified Storage Service powered by TiDB is a major, challenging project spanning multiple quarters. It involves data migration from HBase to TiDB, the design and implementation of the Unified Storage Service, API migration from Ixia/Zen/UMS to the Unified Storage Service, and offline job migration from the HBase/Hadoop ecosystem to the TiSpark ecosystem, all while maintaining our availability and latency SLA.
In this blog post, we will first walk through the various approaches considered for data migration and their trade-offs. We will then do a deep dive into how the data migration from HBase to TiDB was conducted, with zero downtime, for one of the first use cases: a 4 TB table serving 14k read QPS and 400 write QPS. Lastly, we will cover how verification was done to achieve 99.999% data consistency and how data consistency was measured between the two tables.
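As a simplified illustration of the verification step, row-level consistency can be measured by fetching each sampled key from both stores and comparing the results. The fetch functions below are placeholders for the real HBase and TiDB client calls, and the sampling and retry logic used in a production check is omitted.

```python
def verify_consistency(row_keys, fetch_from_hbase, fetch_from_tidb):
    """Compare each row between source and target and report a consistency percentage."""
    mismatches = []
    for key in row_keys:
        src = fetch_from_hbase(key)   # e.g. dict of column -> value from the source table
        dst = fetch_from_tidb(key)    # corresponding row from the target table
        if src != dst:
            mismatches.append(key)
    checked = len(row_keys)
    consistency = 100.0 * (checked - len(mismatches)) / checked if checked else 100.0
    return consistency, mismatches

# Hypothetical usage over a sampled batch of keys:
# consistency_pct, bad_keys = verify_consistency(sampled_keys, hbase_get, tidb_get)
```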