话题公司 › pinterest

公司:pinterest

Pinterest(中文译名:缤趣),是一个网络与手机的应用程序,可以让用户利用其平台作为个人创意及项目工作所需的视觉探索工具,同时也有人把它视为一个图片分享类的社交网站,用户可以按主题分类添加和管理自己的图片收藏,并与好友分享。其使用的网站布局为瀑布流(Pinterest-style layout)。

Pinterest由美国加州帕罗奥图的一个名为Cold Brew Labs的团队营运,创办人为Ben Silbermann、 Paul Sciarra 及 Evan Sharp。2010年正式上线。“Pinterest”是由“Pin”及“interest”两个字组成,在社交网站中的访问量仅次于Facebook、Youtube、VKontakte以及Twitter。

Manas HNSW Streaming Filters

Embedding-based retrieval is a core center piece of our recommendations engine at Pinterest. We support a myriad of use cases, from retrieval based on content similarity to learned retrieval. It’s powered by our in-house search engine — Manas — which provides Approximate Nearest Neighbor (ANN) search as a service, primarily using Hierarchical Navigable Small World graphs (HNSW).

While traditional token-based search retrieves documents on term matching on a tree of terms with logical connectives like ANDs and ORs, ANN search retrieves based on embedding similarity. Oftentimes we’d like to do a hybrid search query that combines the two. For example, “find similar products to this pair of shoes that are less than $100, rated 4 stars or more, and ship to the UK.” This is a common problem, and it’s not entirely unsolved, but the solutions each have their own caveats and trade-offs.

Unified PubSub Client at Pinterest

At Pinterest, the Logging Platform team manages the PubSub layer and provides support for clients that interact with it.

Addressing Python Dependency Confusion at Pinterest

One major issue that put us at risk of dependency confusion was using multiple index endpoints for our Python “pip” config, using the configuration flag — extra-index-url. Pinterest Python artifacts were partially stored on our own custom repository, open-sourced as Pinrepo, and some of our Python packages were stored in JFrog’s Artifactory.

There is a major danger in the usage of the — extra-index-url flag: it will not honor any sort of priority ordering. This has been extensively discussed on Github and Stack Overflow. The short summary is that the volunteer team that manages the pip open-source project does not consider repository index prioritization within the scope of the pip tool. They instead recommend using a single server endpoint that manages priorities on the backend.

Debugging Deadlock in PininfoService Ubuntu18 Upgrade: Part 2 of 2

Solving Engineering Problems as Doing Research.

Spinner: Pinterest’s Workflow Platform

Since its inception, Pinterest’s philosophy has always been centered around data. As a data driven company, that means all data ingested is stored for further use. This looks like 600 terabytes of new data every day, encompassing over 500 petabytes of total data. At this scale, big data tooling plays a critical role in enabling our company to gather meaningful insights. This is where the workflow team comes in. We help facilitate over 4000 workflows, which produce 10,000 daily flow executions and 38,000 daily job executions on average.

Spinner: The Mass Migration to Pinterest’s New Workflow Platform

In our last blog post, we discussed how we made the decision and took the actions to move from our legacy system, Pinball, to our new system, Spinner, which is built on top of the Apache Airflow project. As a reminder, this is based off of a custom branch that branched off of Airflow version 1.10-stable with some features cherry picked from the master branch.

In this post, we will explain how we approached and designed the migration, identified requirements, and coordinated with all our engineer teams to seamlessly migrate 3000+ workflows to Airflow. We will deep dive into trade offs made, but before we do that, we want to give our learnings.

3 Innovations While Unifying Pinterest’s Key-Value Storage

Engineers hate migrations. What do engineers hate more than migrations? Data migrations. Especially critical, terabyte-scale, online serving migrations which, if done badly, could bring down the site, enrage customers, or cripple hundreds of critical internal services.

So why did the Key-Value Systems Team at Pinterest embark on a two-year realtime migration of all our online key-value serving data to a single unified storage system? Because the cost of not migrating was too high. In 2019, Pinterest had four separate key-value systems owned by different teams with different APIs and featuresets. This resulted in duplicated development effort, high operational overhead and incident counts, and confusion among engineering customers.

In unifying all of Pinterest’s 500+ key-value use cases (over 4PB of unique data serving 100Ms of QPS) onto one single interface, not only did we make huge gains in reducing system complexity and lowering operational overhead, we achieved a 40–90% performance improvement by moving to the most efficient storage engine, and we saved the company a significant amount in costs per year by moving to the most optimal replication and versioning architecture.

In this blog post, we selected three (out of many more) innovations to dive into that helped us notch all these wins.

PinPoint: A Neural Inductive Attribute Extractor for Web Pages

Despite the explosive growth of the internet over the past couple of decades, much of the digitized knowledge has been curated for human understanding and has stayed unfriendly for machine comprehension. Even promising efforts towards creating semantic web like the Resource Description Framework in Attributes (RDFA), Ontology Web Language (OWL), JSON-LD, and Open Graph Protocol are in infancy and fall short for commercial applications due to data sparsity and high variance in data quality across websites. Hence Web Information Extraction (WIE), colloquially known as scraping, is the dominant knowledge acquisition strategy for several organizations in advertising, commerce, search engines, travel, etc. For our purposes, Pinterest uses this approach to bring high-level information (like price and product description) from saved websites to the Pin-level, to help provide Pinners with more information, along with a link back to the original website for more details, and to ultimately take action.

Debugging Deadlock in PininfoService Ubuntu18 Upgrade: Part 1 of 2

Reading both parts of this series will give you insight into some debugging techniques we are using in the Pinterest Engineering Key Value Systems team (a team derived from the previous Serving System). Related projects owned by this team can be seen in blogs and presentations on Terrapin, Rocksplicator (1 and 2), Aperture and Realpin.

Experiment without the wait: Speeding up the iteration cycle with Offline Replay Experimentation

Ideas fuel innovation. Innovation drives our product toward our mission of bringing everyone the inspiration to create a life they love. The speed of innovation is determined by how quickly we can get a signal or feedback on the promise of an idea so we can learn whether to pursue or pivot. Online experimentation is often used to evaluate product ideas, but it is costly and time-consuming. Could we predict experiment outcomes without even running an experiment? Could it be done in hours instead of weeks? Could we rapidly pick only the best ideas to run an online experiment? This post will describe how Pinterest uses offline replay experimentation to predict experiment results in advance.

Pre-Submit UI Tests at Pinterest

In our efforts to shift left (in which testing is performed earlier, or moved left on the project timeline), this blog covers how we began running a large end-to-end UI test suite before every commit to our Android and iOS repositories. This project involved careful coordination of UI testing, test infrastructure, and developer productivity.

Pinterest Druid Holiday Load Testing

Like many companies, Pinterest sees an increase in traffic in the last three months of the year. We need to make sure our systems are ready for this increase in traffic so we don’t run into any unexpected problems. This is especially important as Pinners come to Pinterest at this time for holiday planning and shopping. Therefore, we do a yearly exercise of testing our systems with additional load. During this time, we verify that our systems are able to handle the expected traffic increase. On Druid we look at several checks to verify:

  • Queries: We make sure the service is able to handle the expected increase in QPS while at the same time supporting the P99 Latency SLA our clients need.
  • Ingestion: We verify that the real-time ingestion is able to handle the increase in data.
  • Increase in Data size: We confirm that the storage system has sufficient capacity to handle the increased data volume.

In this post, we’ll provide details about how we run the holiday load test and verify Druid is able to handle the expected increases mentioned above.

How Pinterest powers a healthy comment ecosystem with machine learning

As Pinterest continues to evolve from a place to just save ideas to a platform for discovering content that inspires action, there’s been an increase in native content from creators publishing directly to Pinterest. With the creator ecosystem on Pinterest growing, we’re committed to ensuring Pinterest remains a positive and inspiring environment through initiatives like the Creator Code, a content policy that enforces the acceptance of guidelines (such as “be kind” and “check facts”) before creators can publish Idea Pins. We also have guardrails in place on Idea Pin comments including positivity reminders, tools for comment removal and keyword filtering, and spam prevention signals. On the technical side, we use cutting edge techniques in machine learning to identify and enforce against community policy-violating comments in near real-time. We also use these techniques to surface the most inspiring and highest quality comments first in order to bring a more productive experience and drive engagement.

Since machine learning solutions were introduced in March to automatically detect potentially policy-violating comments before they’re reported and take appropriate action, we’ve seen a 53% decline in comment report rates (user comment reports per 1 million comment impressions).

Here, we share how we built a scalable near-real time machine learning solution to identify policy-violating comments and rank comments by quality.

Campaign Budgets at Pinterest

Pinterest is a visual discovery engine that helps Pinners find inspirational ideas. Advertisers use Pinterest to connect with Pinners on these journeys to inspiration, and seek to promote products or services efficiently.

The Ads Intelligence team at Pinterest builds products that help advertisers maximize the value they get out of their ad campaigns. As part of that initiative, we have recently launched the Campaign Budget Optimization product for Pinterest Ads.

Campaign Budget Optimization, or CBO, is an automated ads product that benefits advertisers by distributing the advertising budget for each campaign across the underlying ad groups in an automated manner. The goal of Campaign Budget Optimization is to:

  • Maximize advertiser value, for example driving clicks or conversions, depending on the campaign
  • Improve the budget utilization of the campaign by allowing the budget to be shared across ad groups
  • Simplify the advertiser experience and eliminate the need for manual budget adjustments

MemQ: An efficient, scalable cloud native PubSub system

The Logging Platform powers all data ingestion and transportation at Pinterest. At the heart of the Pinterest Logging Platform are Distributed PubSub systems that help our customers transport / buffer data and consume asynchronously.

In this blog we introduce MemQ (pronounced mem — queue), an efficient, scalable PubSub system developed for the cloud at Pinterest that has been powering Near Real-Time data transportation use cases for us since mid-2020 and complements Kafka while being up to 90% more cost efficient.

SearchSage: Learning Search Query Representations at Pinterest

Pinterest surfaces billions of ideas to people every day, and the neural modeling of embeddings for content, users, and search queries are key in the constant improvement of these machine learning-powered recommendations. Good embeddings — representations of discrete entities as vectors of numbers — enable fast candidate generation and are strong signals to models that classify, retrieve and rank relevant content.

We began our representation learning workstream with Visual Embeddings, a convolutional neural network (CNN) based Image representation, then moved toward PinSage, a graph-based multi-modal Pin representation. We expanded into more use cases such as PinnerSage, a user representation based on clustering a user’s past Pin actions, and have since worked with even more entities including search queries, Idea Pins, shopping items and content creators.

In this blog post we focus on SearchSage, our search query representation, and detail how we built and launched SearchSage for search retrieval and ranking to increase relevance of recommendations and engagement in search across organic Pins, Product Pins, and ads. Now used for 15+ use cases, this embedding is one of the most important features in both our organic and ads relevance models, and has led to metric wins such as an 11% increase in 35s+ click-throughs on product Pins in search, and a 42% increase in related searches.

Главная - Вики-сайт
Copyright © 2011-2024 iteam. Current version is 2.129.0. UTC+08:00, 2024-07-01 10:37
浙ICP备14020137号-1 $Гость$