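Company: Pinterest
Pinterest (Chinese name: 缤趣) is a web and mobile application that lets users treat the platform as a visual discovery tool for personal creative ideas and project work. Some also regard it as an image-sharing social network: users can add and organize their image collections by topic and share them with friends. Its site uses a waterfall (masonry) layout, also known as a Pinterest-style layout.
Pinterest is operated by a team named Cold Brew Labs in Palo Alto, California, and was founded by Ben Silbermann, Paul Sciarra, and Evan Sharp. It officially launched in 2010. The name "Pinterest" combines "pin" and "interest". Among social networking sites, its traffic ranks behind only Facebook, YouTube, VKontakte, and Twitter.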
Query Rewards: Building a Recommendation Feedback Loop During Query Selection
In Homefeed, ~30% of recommended pins come from pin-to-pin retrieval. This means that during the retrieval stage, we use a batch of query pins to call our retrieval system to generate pin recommendations. We typically use a user’s previously engaged pins, and a user may have hundreds (or thousands!) of engaged pins, so a key problem for us is: how do we select the right query pins from the user’s profile?
Driving user growth with performance improvements
In early 2015 Pinterest engineers ran an experiment that improved mobile web home landing page performance by 60 percent and mobile signup conversion rate by 40 percent. However, the experiment was a hacky solution that used a lot of shortcuts like serving pre-rendered HTML pages without using any internal template rendering engines or common resources (JS, CSS). To productionize learnings from this experiment, the entire front end engine, all page templates and common elements had to be rewritten. It was a huge effort, and to achieve it, we needed to start from building robust metrics to track our progress for all parts of the serving system. In this post, we’ll cover how we improved performance on Pinterest pages, and how it led to the biggest increase in user acquisition of 2016.
Online Data Migration from HBase to TiDB with Zero Downtime
At Pinterest, HBase is one of the most critical storage backends, powering many online storage services like Zen (graph database), UMS (wide column datastore), and Ixia (near real time secondary indexing service). The HBase ecosystem, despite advantages such as strong row-level consistency under high request volume, flexible schema, low-latency access to data, and Hadoop integration, cannot serve the needs of our clients for the next 3–5 years. This is due to high operational cost, excessive complexity, and missing functionality such as secondary indexes and support for transactions.
After evaluating 10+ different storage backends and benchmarking three shortlisted backends with shadow traffic (asynchronously copying production traffic to a non-production environment) and in-depth performance evaluation, we have decided to use TiDB as the final candidate for Unified Storage Service.
The adoption of Unified Storage Service powered by TiDB is a challenging project spanning multiple quarters. It involves data migration from HBase to TiDB, design and implementation of Unified Storage Service, API migration from Ixia/Zen/UMS to Unified Storage Service, and migration of offline jobs from the HBase/Hadoop ecosystem to the TiSpark ecosystem, all while maintaining our availability and latency SLAs.
In this blog post, we will first cover the approaches considered for data migration and their trade-offs. We will then do a deep dive on how the data migration from HBase to TiDB was conducted with zero downtime for one of the first use cases, a 4 TB table serving 14k read QPS and 400 write QPS. Lastly, we will cover how verification was done to achieve 99.999% data consistency and how data consistency was measured between the two tables.
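The excerpt does not say how consistency between the two tables was measured. As a rough illustration only, one possible definition is the fraction of row keys whose values are identical in both stores. The function name `consistency_ratio` and the sample rows below are hypothetical and are not taken from the Pinterest post.

```python
from typing import Any, Dict, Hashable


def consistency_ratio(source: Dict[Hashable, Any], target: Dict[Hashable, Any]) -> float:
    """Return the fraction of keys (union of both tables) with identical values.

    A key that is missing from either table counts as a mismatch.
    This definition is an assumption made for illustration.
    """
    all_keys = set(source) | set(target)
    if not all_keys:
        # Two empty tables are trivially consistent.
        return 1.0
    matching = sum(
        1
        for key in all_keys
        if key in source and key in target and source[key] == target[key]
    )
    return matching / len(all_keys)


# Hypothetical snapshots of the same logical table in the old and new store.
hbase_rows = {"pin:1": "a", "pin:2": "b", "pin:3": "c", "pin:4": "d"}
tidb_rows = {"pin:1": "a", "pin:2": "b", "pin:3": "X", "pin:4": "d"}

ratio = consistency_ratio(hbase_rows, tidb_rows)
print(f"consistency: {ratio:.3%}")
```

On a table that is still receiving writes, a comparison like this would also need to account for rows that change between the two reads, which this sketch ignores.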
GPU-accelerated ML Inference at Pinterest
Unlocking 16% Homefeed Engagement by Serving 100x Bigger Recommender Models.
Estimating Potential Audience Size of an Ad at Pinterest
Understanding the size of the potential audience of an ad is an important consideration for an advertiser. It enables advertisers to estimate the total population who might be interested in the products or services they advertise and plan their budgets ahead of time. The Ads Intelligence team at Pinterest provides a service called Potential Audience Size in the Ads Manager, so the advertisers can understand their target audience size while they configure their ad groups. The service updates the estimate in real time as the audience targeting is updated.
Improving Distributed Caching Performance and Efficiency at Pinterest
Pinterest’s distributed caching system, built on top of open source technologies memcached and mcrouter, is a critical component of the production infrastructure stack. Pinterest’s cache-as-a-service platform is responsible for driving down application latency across the board, reducing the overall cloud cost footprint, and ensuring adherence to strict sitewide availability targets.
Manas HNSW Streaming Filters
Embedding-based retrieval is a centerpiece of our recommendations engine at Pinterest. We support a myriad of use cases, from retrieval based on content similarity to learned retrieval. It’s powered by our in-house search engine, Manas, which provides Approximate Nearest Neighbor (ANN) search as a service, primarily using Hierarchical Navigable Small World graphs (HNSW).
While traditional token-based search retrieves documents by matching terms against a tree of terms joined by logical connectives such as AND and OR, ANN search retrieves documents based on embedding similarity. Oftentimes we’d like to run a hybrid search query that combines the two. For example, “find similar products to this pair of shoes that are less than $100, rated 4 stars or more, and ship to the UK.” This is a common problem, and it’s not entirely unsolved, but the solutions each have their own caveats and trade-offs.
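To make the hybrid query concrete, here is a minimal sketch of the simplest baseline: apply the structured filter first, then rank the surviving items by cosine similarity with an exhaustive scan. This is not how Manas or HNSW works; it only shows the semantics a hybrid query is expected to have. All names (`filtered_search`, `matches`) and the product data are hypothetical.

```python
import math
from typing import Any, Callable, Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def filtered_search(
    query: Sequence[float],
    items: List[Dict[str, Any]],
    predicate: Callable[[Dict[str, Any]], bool],
    k: int,
) -> List[str]:
    """Pre-filter items with the predicate, then return the ids of the top-k by similarity."""
    candidates = [item for item in items if predicate(item)]
    candidates.sort(key=lambda item: cosine(query, item["embedding"]), reverse=True)
    return [item["id"] for item in candidates[:k]]


# Hypothetical catalog with tiny 2-dimensional embeddings.
products = [
    {"id": "shoe_a", "embedding": [1.0, 0.0], "price": 80, "rating": 4.5, "ships_to": {"UK", "US"}},
    {"id": "shoe_b", "embedding": [0.9, 0.1], "price": 150, "rating": 4.8, "ships_to": {"UK"}},
    {"id": "shoe_c", "embedding": [0.0, 1.0], "price": 60, "rating": 4.2, "ships_to": {"UK"}},
    {"id": "shoe_d", "embedding": [0.8, 0.2], "price": 90, "rating": 3.0, "ships_to": {"UK"}},
]


def matches(product: Dict[str, Any]) -> bool:
    """Under $100, rated 4 stars or more, and ships to the UK."""
    return product["price"] < 100 and product["rating"] >= 4 and "UK" in product["ships_to"]


results = filtered_search([1.0, 0.0], products, matches, k=2)
print(results)
```

An exhaustive scan like this is exact but scales linearly with the number of items that pass the filter, which is one reason approximate indexes need a different way to combine filters with graph traversal.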
Unified PubSub Client at Pinterest
At Pinterest, the Logging Platform team manages the PubSub layer and provides support for clients that interact with it.
Addressing Python Dependency Confusion at Pinterest
One major issue that put us at risk of dependency confusion was using multiple index endpoints in our Python “pip” config via the configuration flag --extra-index-url. Pinterest Python artifacts were partially stored on our own custom repository, open-sourced as Pinrepo, and some of our Python packages were stored in JFrog’s Artifactory.
There is a major danger in using the --extra-index-url flag: it does not honor any sort of priority ordering. This has been extensively discussed on GitHub and Stack Overflow. The short summary is that the volunteer team that manages the pip open-source project does not consider repository index prioritization within the scope of the pip tool. They instead recommend using a single server endpoint that manages priorities on the backend.
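As a small illustration of auditing for this risk, the sketch below scans the text of a pip configuration file (INI format) and reports any section that sets `extra-index-url`. It is not Pinterest's tooling; the helper name `find_extra_index_sections` and the example URLs are hypothetical, and the check only covers config files, not command-line flags, environment variables, or requirements files.

```python
import configparser
from typing import List


def find_extra_index_sections(config_text: str) -> List[str]:
    """Return the names of config sections that define extra-index-url."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return [
        section
        for section in parser.sections()
        if parser.has_option(section, "extra-index-url")
    ]


# Hypothetical config that mixes an internal index with a second index.
risky_config = """
[global]
index-url = https://internal.example.com/simple
extra-index-url = https://pypi.org/simple
"""

# Hypothetical config that points at a single endpoint only.
single_index_config = """
[global]
index-url = https://internal.example.com/simple
"""

print(find_extra_index_sections(risky_config))
print(find_extra_index_sections(single_index_config))
```

A config that passes this check relies on one endpoint, which matches the recommendation quoted above to let a single server manage priorities on the backend.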
Debugging Deadlock in PininfoService Ubuntu18 Upgrade: Part 2 of 2
Solving Engineering Problems as Doing Research.
Spinner: Pinterest’s Workflow Platform
Since its inception, Pinterest’s philosophy has always been centered around data. As a data-driven company, that means all data ingested is stored for further use. In practice, this amounts to 600 terabytes of new data every day and over 500 petabytes of data in total. At this scale, big data tooling plays a critical role in enabling our company to gather meaningful insights. This is where the workflow team comes in. We help facilitate over 4000 workflows, which produce 10,000 daily flow executions and 38,000 daily job executions on average.
Spinner: The Mass Migration to Pinterest’s New Workflow Platform
In our last blog post, we discussed how we made the decision and took the actions to move from our legacy system, Pinball, to our new system, Spinner, which is built on top of the Apache Airflow project. As a reminder, Spinner is based on a custom branch forked from Airflow version 1.10-stable, with some features cherry-picked from the master branch.
In this post, we will explain how we approached and designed the migration, identified requirements, and coordinated with all our engineering teams to seamlessly migrate 3000+ workflows to Airflow. We will deep dive into the trade-offs made, but before we do that, we want to share our learnings.
3 Innovations While Unifying Pinterest’s Key-Value Storage
Engineers hate migrations. What do engineers hate more than migrations? Data migrations. Especially critical, terabyte-scale, online serving migrations which, if done badly, could bring down the site, enrage customers, or cripple hundreds of critical internal services.
So why did the Key-Value Systems Team at Pinterest embark on a two-year realtime migration of all our online key-value serving data to a single unified storage system? Because the cost of not migrating was too high. In 2019, Pinterest had four separate key-value systems owned by different teams with different APIs and feature sets. This resulted in duplicated development effort, high operational overhead and incident counts, and confusion among engineering customers.
In unifying all of Pinterest’s 500+ key-value use cases (over 4PB of unique data serving 100Ms of QPS) onto a single interface, not only did we make huge gains in reducing system complexity and lowering operational overhead, we also achieved a 40–90% performance improvement by moving to the most efficient storage engine, and we saved the company a significant amount in costs per year by moving to the optimal replication and versioning architecture.
In this blog post, we selected three (out of many more) innovations to dive into that helped us notch all these wins.
PinPoint: A Neural Inductive Attribute Extractor for Web Pages
Despite the explosive growth of the internet over the past couple of decades, much of the digitized knowledge has been curated for human understanding and has stayed unfriendly for machine comprehension. Even promising efforts towards creating a semantic web, like the Resource Description Framework in Attributes (RDFa), Web Ontology Language (OWL), JSON-LD, and Open Graph Protocol, are in their infancy and fall short for commercial applications due to data sparsity and high variance in data quality across websites. Hence Web Information Extraction (WIE), colloquially known as scraping, is the dominant knowledge acquisition strategy for several organizations in advertising, commerce, search engines, travel, etc. For our purposes, Pinterest uses this approach to bring high-level information (like price and product description) from saved websites to the Pin level, to help provide Pinners with more information, along with a link back to the original website for more details, and ultimately to help them take action.
Debugging Deadlock in PininfoService Ubuntu18 Upgrade: Part 1 of 2
Reading both parts of this series will give you insight into some debugging techniques we are using in the Pinterest Engineering Key Value Systems team (a team derived from the previous Serving System). Related projects owned by this team can be seen in blogs and presentations on Terrapin, Rocksplicator (1 and 2), Aperture and Realpin.
Experiment without the wait: Speeding up the iteration cycle with Offline Replay Experimentation
Ideas fuel innovation. Innovation drives our product toward our mission of bringing everyone the inspiration to create a life they love. The speed of innovation is determined by how quickly we can get a signal or feedback on the promise of an idea so we can learn whether to pursue or pivot. Online experimentation is often used to evaluate product ideas, but it is costly and time-consuming. Could we predict experiment outcomes without even running an experiment? Could it be done in hours instead of weeks? Could we rapidly pick only the best ideas to run an online experiment? This post will describe how Pinterest uses offline replay experimentation to predict experiment results in advance.