
Company: Pinterest

Pinterest (Chinese name: 缤趣) is a web and mobile application that lets users treat the platform as a visual discovery tool for personal creative ideas and projects. It is also seen as an image-sharing social network: users can add and organize image collections by topic and share them with friends. The site uses a waterfall-flow (Pinterest-style) layout.

Pinterest is operated by Cold Brew Labs, a team based in Palo Alto, California, and was founded by Ben Silbermann, Paul Sciarra, and Evan Sharp. It officially launched in 2010. The name "Pinterest" combines the words "Pin" and "interest". Among social networking sites, its traffic is second only to Facebook, YouTube, VKontakte, and Twitter.

Flexible Daily Budgeting at Pinterest

The Ads Intelligence team at Pinterest builds products that help advertisers maximize the value they get out of their ad campaigns. As part of that initiative, we have recently launched Flexible Daily Budgets (FDB) to US advertisers in open beta.

How Pinterest Leverages Realtime User Actions in Recommendation to Boost Homefeed Engagement Volume

In this blog post, we will demonstrate how we improved Pinterest Homefeed engagement volume from a machine learning model design perspective: by leveraging realtime user action features in the Homefeed recommender system.
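
As a rough illustration of the general idea (a hedged sketch, not Pinterest's actual model design; the function and shapes are hypothetical), realtime user actions can be turned into a feature by pooling the embeddings of a user's most recent engaged pins and feeding the result to the ranking model alongside batch features:

```python
import numpy as np

def realtime_action_feature(recent_action_embeddings, max_len=50, dim=64):
    """Pool a user's most recent engaged-pin embeddings into one feature vector.

    A minimal, hypothetical sketch: real systems typically use learned sequence
    models rather than mean pooling.
    """
    if not recent_action_embeddings:
        return np.zeros(dim)
    # Keep only the most recent `max_len` actions (newest last) and average them.
    window = np.asarray(recent_action_embeddings[-max_len:])
    return window.mean(axis=0)

# Example: three recent engagements, 64-dimensional pin embeddings.
recent = [np.random.rand(64) for _ in range(3)]
feature = realtime_action_feature(recent)
print(feature.shape)  # (64,)
```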

Watch your Manifest

It’s a well-known fact for Android developers that an app’s manifest (AndroidManifest.xml) holds crucial application declarations. It is rarely monitored after being set up because we assume it hardly ever changes. At Pinterest, however, we have been actively monitoring the manifest after realizing it does change every so often.

While building an app, Gradle downloads all the dependent libraries to compile and link them with the app. These dependent libraries each have their own mini manifest. During the build process, the Android Gradle Plugin (AGP) merges them with the app's main manifest to form the final manifest. Because of this merging process, the final manifest often looks quite different from the original one and contains additional declarations. In most cases, these extra declarations are necessary for the dependent libraries to function. However, sometimes they can introduce unintended behavior.
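
One simple way to monitor the merged manifest (a hypothetical sketch, not Pinterest's actual tooling) is to diff the declarations in the final merged AndroidManifest.xml against a checked-in baseline and fail the build when something unexpected appears, such as a new permission:

```python
import sys
import xml.etree.ElementTree as ET

ANDROID_NS = "{http://schemas.android.com/apk/res/android}"

def declared_permissions(manifest_path):
    """Return the set of <uses-permission> names declared in a manifest."""
    root = ET.parse(manifest_path).getroot()
    return {
        elem.attrib.get(f"{ANDROID_NS}name")
        for elem in root.iter("uses-permission")
    }

def main(baseline_path, merged_path):
    baseline = declared_permissions(baseline_path)
    merged = declared_permissions(merged_path)
    unexpected = merged - baseline
    if unexpected:
        print("Unexpected permissions in merged manifest:")
        for name in sorted(unexpected):
            print(f"  {name}")
        sys.exit(1)

if __name__ == "__main__":
    # Example (merged manifest path varies by AGP version and variant):
    #   python check_manifest.py baseline.xml \
    #       app/build/intermediates/merged_manifests/release/AndroidManifest.xml
    main(sys.argv[1], sys.argv[2])
```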

Query Rewards: Building a Recommendation Feedback Loop During Query Selection

In Homefeed, ~30% of recommended pins come from pin-to-pin-based retrieval: during the retrieval stage, we use a batch of query pins to call our retrieval system and generate pin recommendations. We typically use a user's previously engaged pins, and a user may have hundreds (or thousands!) of engaged pins, so a key problem for us is: how do we select the right query pins from the user's profile?
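
Purely as an illustrative sketch of a feedback loop over query selection (the names and scoring below are assumptions, not Pinterest's algorithm), one could credit each candidate query pin with the downstream engagement its retrieved recommendations earned, then sample queries in proportion to that reward while leaving room for exploration:

```python
import numpy as np

def select_query_pins(engaged_pins, rewards, num_queries=10, smoothing=1.0):
    """Sample query pins in proportion to their past downstream reward.

    engaged_pins: pin ids from the user's engagement history.
    rewards: pin id -> accumulated engagement attributed to recommendations
             retrieved from that query pin (the feedback signal).
    smoothing: keeps never-tried pins selectable (a simple exploration knob).
    """
    if not engaged_pins:
        return []
    weights = np.array([rewards.get(p, 0.0) + smoothing for p in engaged_pins])
    probs = weights / weights.sum()
    k = min(num_queries, len(engaged_pins))
    idx = np.random.choice(len(engaged_pins), size=k, replace=False, p=probs)
    return [engaged_pins[i] for i in idx]

history = ["pin_a", "pin_b", "pin_c", "pin_d"]
reward = {"pin_a": 12.0, "pin_b": 0.5, "pin_c": 3.0}
print(select_query_pins(history, reward, num_queries=2))
```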

Driving user growth with performance improvements

In early 2015, Pinterest engineers ran an experiment that improved mobile web home landing page performance by 60 percent and the mobile signup conversion rate by 40 percent. However, the experiment was a hacky solution that relied on shortcuts such as serving pre-rendered HTML pages without using any internal template-rendering engines or common resources (JS, CSS). To productionize the learnings from this experiment, the entire front-end engine, all page templates, and the common elements had to be rewritten. It was a huge effort, and to achieve it we needed to start by building robust metrics to track our progress across all parts of the serving system. In this post, we'll cover how we improved performance on Pinterest pages and how it led to the biggest increase in user acquisition of 2016.

Online Data Migration from HBase to TiDB with Zero Downtime

At Pinterest, HBase is one of the most critical storage backends, powering many online storage services such as Zen (graph database), UMS (wide column datastore), and Ixia (near-realtime secondary indexing service). Although the HBase ecosystem has many advantages, such as row-level strong consistency under high request volumes, a flexible schema, low-latency access to data, and Hadoop integration, it cannot serve the needs of our clients for the next 3-5 years because of high operational cost, excessive complexity, and missing functionality such as secondary indexes and transaction support.

After evaluating 10+ different storage backends, benchmarking three shortlisted backends with shadow traffic (asynchronously copying production traffic to a non-production environment), and performing an in-depth performance evaluation, we decided to use TiDB as the final candidate for Unified Storage Service.

The adoption of Unified Storage Service powered by TiDB is a major, challenging project spanning multiple quarters. It involves data migration from HBase to TiDB, the design and implementation of Unified Storage Service, API migration from Ixia/Zen/UMS to Unified Storage Service, and offline job migration from the HBase/Hadoop ecosystem to the TiSpark ecosystem, all while maintaining our availability and latency SLAs.

In this blog post, we will first look at the various approaches considered for data migration and their trade-offs. We will then do a deep dive on how the data migration from HBase to TiDB was conducted for one of the first use cases, a 4 TB table serving 14k read QPS and 400 write QPS, with zero downtime. Lastly, we will look at how verification was done to achieve 99.999% data consistency and how data consistency was measured between the two tables.
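
As a hedged sketch of the kind of verification described (the client functions and row format here are hypothetical stand-ins, not Pinterest's actual tooling), one can sample keys, read each row from both stores, and report a consistency rate:

```python
import random

def measure_consistency(keys, hbase_get, tidb_get, sample_size=10_000):
    """Estimate row-level consistency between two stores on a random key sample.

    hbase_get / tidb_get are caller-supplied functions key -> row (or None);
    they stand in for real HBase and TiDB clients in this sketch.
    """
    sample = random.sample(keys, min(sample_size, len(keys)))
    mismatches = [key for key in sample if hbase_get(key) != tidb_get(key)]
    consistency = 1.0 - len(mismatches) / len(sample)
    return consistency, mismatches

# Toy usage with in-memory dicts standing in for the two stores.
hbase = {f"k{i}": {"v": i} for i in range(1000)}
tidb = dict(hbase)
tidb["k7"] = {"v": -1}  # simulate one divergent row
rate, bad = measure_consistency(list(hbase), hbase.get, tidb.get, sample_size=1000)
print(f"consistency={rate:.5f}, mismatched keys={bad}")
```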

GPU-accelerated ML Inference at Pinterest

Unlocking 16% Homefeed Engagement by Serving 100x Bigger Recommender Models.

Estimating Potential Audience Size of an Ad at Pinterest

Understanding the size of the potential audience of an ad is an important consideration for an advertiser. It enables advertisers to estimate the total population who might be interested in the products or services they advertise and to plan their budgets ahead of time. The Ads Intelligence team at Pinterest provides a service called Potential Audience Size in the Ads Manager so that advertisers can understand their target audience size while they configure their ad groups. The service updates the estimate in real time as the audience targeting is updated.
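
Purely as an illustrative sketch (not the actual Potential Audience Size service), an audience estimate can be expressed as a set operation over targeting segments; here exact sets stand in for the approximate structures a realtime system at this scale would more likely use:

```python
def estimate_audience(segment_users, include_segments, exclude_segments=()):
    """Estimate the audience of an ad group as a set expression over segments.

    segment_users maps a targeting segment (e.g. "interest:home_decor",
    "geo:US") to the set of user ids in it. The names are illustrative; a
    production system would typically use approximate distinct-count sketches
    instead of exact sets to keep the estimate realtime.
    """
    if not include_segments:
        return 0
    audience = set.intersection(*(segment_users[s] for s in include_segments))
    for s in exclude_segments:
        audience -= segment_users[s]
    return len(audience)

segments = {
    "geo:US": {1, 2, 3, 4, 5},
    "interest:home_decor": {2, 3, 5, 8},
    "age:18-24": {3, 5, 9},
}
print(estimate_audience(segments, ["geo:US", "interest:home_decor"]))  # 3
```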

Improving Distributed Caching Performance and Efficiency at Pinterest

Pinterest’s distributed caching system, built on top of open source technologies memcached and mcrouter, is a critical component of the production infrastructure stack. Pinterest’s cache-as-a-service platform is responsible for driving down application latency across the board, reducing the overall cloud cost footprint, and ensuring adherence to strict sitewide availability targets.

Manas HNSW Streaming Filters

Embedding-based retrieval is a core centerpiece of our recommendation engine at Pinterest. We support a myriad of use cases, from retrieval based on content similarity to learned retrieval. It is powered by Manas, our in-house search engine, which provides Approximate Nearest Neighbor (ANN) search as a service, primarily using Hierarchical Navigable Small World (HNSW) graphs.

While traditional token-based search retrieves documents by matching terms in a tree of terms combined with logical connectives such as ANDs and ORs, ANN search retrieves documents based on embedding similarity. Often we'd like to run a hybrid query that combines the two, for example: "find products similar to this pair of shoes that are less than $100, rated 4 stars or more, and ship to the UK." This is a common problem, and it is not entirely unsolved, but the existing solutions each come with their own caveats and trade-offs.
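
To make the trade-off concrete, here is a hedged toy sketch (not Manas itself, and brute-force scanning stands in for HNSW graph traversal) contrasting post-filtering, which can starve the result set when the predicate is selective, with a streaming filter that applies the predicate while candidates are being scored:

```python
import numpy as np

def ann_post_filter(query, index, predicate, k=10, overfetch=4):
    """Retrieve k * overfetch nearest neighbors, then drop ones failing the
    predicate. If the predicate is selective, fewer than k results survive."""
    scored = sorted(index, key=lambda d: np.linalg.norm(query - d["emb"]))
    return [d for d in scored[: k * overfetch] if predicate(d)][:k]

def ann_streaming_filter(query, index, predicate, k=10):
    """Apply the predicate while scanning candidates, so the search keeps
    going until k matching documents are found (conceptually, filtering
    during graph traversal rather than after retrieval)."""
    results = []
    for d in sorted(index, key=lambda d: np.linalg.norm(query - d["emb"])):
        if predicate(d):
            results.append(d)
            if len(results) == k:
                break
    return results

docs = [{"id": i, "emb": np.random.rand(8), "price": i % 200} for i in range(1000)]
q = np.random.rand(8)
cheap = lambda d: d["price"] < 100
print(len(ann_post_filter(q, docs, cheap, k=10)),
      len(ann_streaming_filter(q, docs, cheap, k=10)))
```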

Unified PubSub Client at Pinterest

At Pinterest, the Logging Platform team manages the PubSub layer and provides support for clients that interact with it.

Addressing Python Dependency Confusion at Pinterest

One major issue that put us at risk of dependency confusion was our use of multiple index endpoints in our Python pip configuration via the --extra-index-url flag. Pinterest Python artifacts were partially stored in our own custom repository, open-sourced as Pinrepo, and some of our Python packages were stored in JFrog's Artifactory.

There is a major danger in using the --extra-index-url flag: it does not honor any sort of priority ordering. This has been discussed extensively on GitHub and Stack Overflow. The short summary is that the volunteer team that maintains the pip open-source project does not consider repository index prioritization to be within the scope of the pip tool. They instead recommend using a single server endpoint that manages priorities on the backend.
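
The mechanism behind the risk can be shown with a small toy model (a simplified sketch, not pip's actual resolver code; the package and index names are placeholders): when all configured indexes are treated as equal peers, the "best" version wins regardless of which index published it, so a public name-squatted package can shadow the internal one:

```python
def resolve(package, indexes):
    """Toy model of multi-index resolution with no priority ordering: collect
    candidates from every index and pick the highest version, whoever
    published it. This is the core of the dependency confusion risk."""
    candidates = []
    for index_name, listing in indexes.items():
        for version in listing.get(package, []):
            candidates.append((version, index_name))
    if not candidates:
        raise LookupError(f"{package} not found on any index")
    return max(candidates)  # highest version wins

indexes = {
    "internal": {"companylib": [(1, 2, 0)]},
    "public":   {"companylib": [(99, 0, 0)]},  # attacker-published name squat
}
print(resolve("companylib", indexes))  # ((99, 0, 0), 'public') -- confusion!
```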

Debugging Deadlock in PininfoService Ubuntu18 Upgrade: Part 2 of 2

Solving engineering problems the way one approaches research.

Spinner: Pinterest’s Workflow Platform

Since its inception, Pinterest's philosophy has always been centered around data. As a data-driven company, this means all ingested data is stored for further use, amounting to 600 terabytes of new data every day and over 500 petabytes of data in total. At this scale, big data tooling plays a critical role in enabling our company to gather meaningful insights. This is where the workflow team comes in. We facilitate over 4,000 workflows, which produce 10,000 daily flow executions and 38,000 daily job executions on average.

Spinner: The Mass Migration to Pinterest’s New Workflow Platform

In our last blog post, we discussed how we made the decision and took the steps to move from our legacy system, Pinball, to our new system, Spinner, which is built on top of the Apache Airflow project. As a reminder, Spinner is based on a custom branch cut from Airflow 1.10-stable, with some features cherry-picked from the master branch.

In this post, we will explain how we approached and designed the migration, identified requirements, and coordinated with all our engineering teams to seamlessly migrate 3,000+ workflows to Airflow. We will also deep dive into the trade-offs we made, but before that, we want to share our learnings.
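
For readers unfamiliar with the target system, a minimal Airflow 1.10-style DAG looks roughly like the sketch below (a generic example with placeholder task names; Spinner workflows have their own translation layer and conventions not shown here):

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python_operator import PythonOperator  # 1.10-style import path

def extract(**context):
    print("pull a day of data")

def load(**context):
    print("write results to the warehouse")

default_args = {"owner": "data-eng", "retries": 2, "retry_delay": timedelta(minutes=5)}

with DAG(
    dag_id="example_migrated_workflow",
    default_args=default_args,
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract,
                               provide_context=True)
    t_load = PythonOperator(task_id="load", python_callable=load,
                            provide_context=True)
    t_extract >> t_load  # load runs only after extract succeeds
```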

3 Innovations While Unifying Pinterest’s Key-Value Storage

Engineers hate migrations. What do engineers hate more than migrations? Data migrations. Especially critical, terabyte-scale, online serving migrations which, if done badly, could bring down the site, enrage customers, or cripple hundreds of critical internal services.

So why did the Key-Value Systems team at Pinterest embark on a two-year realtime migration of all our online key-value serving data to a single unified storage system? Because the cost of not migrating was too high. In 2019, Pinterest had four separate key-value systems owned by different teams, with different APIs and feature sets. This resulted in duplicated development effort, high operational overhead and incident counts, and confusion among engineering customers.

In unifying all of Pinterest's 500+ key-value use cases (over 4 PB of unique data serving hundreds of millions of QPS) onto one single interface, not only did we make huge gains in reducing system complexity and lowering operational overhead, but we also achieved a 40-90% performance improvement by moving to the most efficient storage engine, and we saved the company a significant amount in costs per year by moving to the most optimal replication and versioning architecture.

In this blog post, we selected three (out of many more) innovations to dive into that helped us notch all these wins.
