
Company: Pinterest

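Pinterest (Chinese name: 缤趣) is a web and mobile application that lets users treat the platform as a visual discovery tool for personal creative and project work. Some also regard it as an image-sharing social network: users can add and manage their own image collections organized by theme and share them with friends. Its website uses a waterfall (masonry) layout, known as the Pinterest-style layout.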

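Pinterest was originally operated by a team called Cold Brew Labs in Palo Alto, California, and was founded by Ben Silbermann, Paul Sciarra, and Evan Sharp. It officially launched in 2010. The name "Pinterest" combines the words "pin" and "interest". Among social networking sites, its traffic ranks behind only Facebook, YouTube, VKontakte, and Twitter.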

Pre-Submit UI Tests at Pinterest

As part of our effort to shift left (performing testing earlier, i.e., moving it left on the project timeline), this post covers how we began running a large end-to-end UI test suite before every commit to our Android and iOS repositories. This project involved careful coordination of UI testing, test infrastructure, and developer productivity.

Pinterest Druid Holiday Load Testing

Like many companies, Pinterest sees an increase in traffic in the last three months of the year. We need to make sure our systems are ready for this increase so we don’t run into unexpected problems, which is especially important because Pinners come to Pinterest at this time for holiday planning and shopping. Therefore, every year we test our systems under additional load to verify that they can handle the expected traffic increase. For Druid, we run the following checks:

  • Queries: We make sure the service can handle the expected increase in QPS while still meeting the P99 latency SLA our clients need.
  • Ingestion: We verify that real-time ingestion can handle the increase in data.
  • Data size: We confirm that the storage system has sufficient capacity for the increased data volume.
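The query check boils down to comparing observed load and tail latency against targets. A minimal sketch, assuming per-query latency samples are collected during the load test; the function names, the nearest-rank percentile method, and the thresholds are illustrative assumptions, not Pinterest’s tooling:

```python
def p99(latencies_ms: list[float]) -> float:
    """Nearest-rank 99th percentile of a non-empty latency sample."""
    ordered = sorted(latencies_ms)
    rank = -(-99 * len(ordered) // 100)  # ceil(0.99 * n) using integer arithmetic
    return ordered[rank - 1]


def passes_query_check(latencies_ms, observed_qps, target_qps, p99_sla_ms):
    """True if the test sustained the target QPS while staying within the P99 SLA.

    All thresholds are hypothetical inputs supplied by the caller.
    """
    return observed_qps >= target_qps and p99(latencies_ms) <= p99_sla_ms
```

For example, with 100 samples of 1 to 100 ms, the nearest-rank P99 is the 99th sorted sample, so a 100 ms SLA passes and a 50 ms SLA fails.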

In this post, we’ll provide details about how we run the holiday load test and verify Druid is able to handle the expected increases mentioned above.

How Pinterest powers a healthy comment ecosystem with machine learning

As Pinterest continues to evolve from a place to simply save ideas into a platform for discovering content that inspires action, there’s been an increase in native content from creators publishing directly to Pinterest. As the creator ecosystem on Pinterest grows, we’re committed to keeping Pinterest a positive and inspiring environment through initiatives like the Creator Code, a content policy that requires creators to accept guidelines (such as “be kind” and “check facts”) before they can publish Idea Pins. We also have guardrails in place on Idea Pin comments, including positivity reminders, tools for comment removal and keyword filtering, and spam prevention signals. On the technical side, we use cutting-edge machine learning techniques to identify and take action against comments that violate our community policies in near real time. We also use these techniques to surface the most inspiring, highest-quality comments first, creating a more productive experience and driving engagement.

Since machine learning solutions were introduced in March to automatically detect potentially policy-violating comments before they’re reported and take appropriate action, we’ve seen a 53% decline in comment report rates (user comment reports per 1 million comment impressions).

Here, we share how we built a scalable near-real-time machine learning solution to identify policy-violating comments and rank comments by quality.

Campaign Budgets at Pinterest

Pinterest is a visual discovery engine that helps Pinners find inspirational ideas. Advertisers use Pinterest to connect with Pinners on these journeys to inspiration, and seek to promote products or services efficiently.

The Ads Intelligence team at Pinterest builds products that help advertisers maximize the value they get out of their ad campaigns. As part of that initiative, we have recently launched the Campaign Budget Optimization product for Pinterest Ads.

Campaign Budget Optimization, or CBO, is an automated ads product that benefits advertisers by distributing the advertising budget for each campaign across the underlying ad groups in an automated manner. The goal of Campaign Budget Optimization is to:

  • Maximize advertiser value, for example driving clicks or conversions, depending on the campaign
  • Improve the budget utilization of the campaign by allowing the budget to be shared across ad groups
  • Simplify the advertiser experience and eliminate the need for manual budget adjustments
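Sharing one campaign budget across ad groups can be illustrated with a deliberately simple heuristic: split the budget in proportion to each ad group’s predicted value. This is a hypothetical sketch for intuition only; the post does not describe CBO’s actual allocation algorithm, and `allocate_budget` and its predicted-value inputs are assumptions.

```python
def allocate_budget(campaign_budget: float, predicted_value: dict[str, float]) -> dict[str, float]:
    """Hypothetical heuristic: split a campaign budget across ad groups
    in proportion to each group's predicted value (e.g. expected conversions)."""
    total = sum(predicted_value.values())
    if total <= 0:
        # No value signal yet: fall back to an even split.
        even = campaign_budget / len(predicted_value)
        return {group: even for group in predicted_value}
    return {group: campaign_budget * value / total for group, value in predicted_value.items()}
```

With a budget of 100 and predicted values of 3 and 1, the groups receive 75 and 25. Unlike fixed per-ad-group budgets, the split shifts automatically as predictions change, which is the kind of manual adjustment the product aims to remove.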

MemQ: An efficient, scalable cloud native PubSub system

The Logging Platform powers all data ingestion and transportation at Pinterest. At the heart of the Pinterest Logging Platform are distributed PubSub systems that help our customers transport and buffer data and consume it asynchronously.

In this blog post, we introduce MemQ (pronounced "mem-queue"), an efficient, scalable PubSub system developed for the cloud at Pinterest. It has powered near-real-time data transportation use cases for us since mid-2020 and complements Kafka while being up to 90% more cost efficient.

SearchSage: Learning Search Query Representations at Pinterest

Pinterest surfaces billions of ideas to people every day, and neural modeling of embeddings for content, users, and search queries is key to continually improving these machine-learning-powered recommendations. Good embeddings (representations of discrete entities as vectors of numbers) enable fast candidate generation and are strong signals for models that classify, retrieve, and rank relevant content.
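To make "fast candidate generation" concrete, here is a minimal sketch of embedding-based retrieval: score each candidate by cosine similarity to a query embedding and keep the top k. The vectors and IDs are made up, and the brute-force scan is an assumption for illustration; large-scale systems usually rely on approximate nearest-neighbor indexes instead.

```python
import heapq
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k(query_emb: list[float], candidates: dict[str, list[float]], k: int) -> list[str]:
    """Brute-force retrieval: the k candidate IDs most similar to the query embedding."""
    return heapq.nlargest(k, candidates, key=lambda cid: cosine(query_emb, candidates[cid]))
```

Usage: `top_k([1.0, 0.0], {"pin_a": [1.0, 0.0], "pin_b": [0.0, 1.0], "pin_c": [0.9, 0.1]}, 2)` returns `["pin_a", "pin_c"]`.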

We began our representation learning workstream with Visual Embeddings, a convolutional neural network (CNN)-based image representation, then moved on to PinSage, a graph-based, multi-modal Pin representation. We expanded into more use cases such as PinnerSage, a user representation based on clustering a user’s past Pin actions, and have since worked with even more entities, including search queries, Idea Pins, shopping items, and content creators.

In this blog post, we focus on SearchSage, our search query representation, and detail how we built and launched SearchSage for search retrieval and ranking to increase the relevance of recommendations and engagement in search across organic Pins, Product Pins, and ads. Now used for 15+ use cases, this embedding is one of the most important features in both our organic and ads relevance models, and has led to metric wins such as an 11% increase in 35s+ click-throughs on Product Pins in search and a 42% increase in related searches.

Efficient Resource Management at Pinterest’s Batch Processing Platform

Pinterest’s Batch Processing Platform, Monarch, runs most of the company’s batch processing workflows. At the scale shown in Table 1, it is important to manage platform resources so as to provide quality of service (QoS) while achieving cost efficiency. This article shares how we do that, along with our future work.

Ensuring High Availability of Ads Realtime Streaming Services

The Pinterest ads business has grown multifold over the past couple of years, in terms of both advertisers and users. As we scale our revenue, it becomes imperative to:

  • Distribute advertiser spend smoothly over the course of the day
  • Avoid over-spending beyond the advertiser’s daily / lifetime budget
  • Maximize advertiser value

Pinterest Home Feed Unified Lightweight Scoring: A Two-tower Approach

Pinterest is a place where users (Pinners) can save and discover content on both web and mobile platforms, and where, increasingly, Creators can publish native content directly to Pinterest. We hold billions of pieces of content (Pins) in our corpus and serve personalized recommendations that inspire Pinners to create a life they love. One of the most important and most complex surfaces on Pinterest is the home feed, where Pinners see personalized feeds based on their engagement and interests. In this blog post, we discuss how we unify our lightweight scoring layer across the various candidate generators that power home feed recommendations.
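A two-tower model encodes users and Pins separately and scores a pair with the dot product of the two embeddings. The sketch below assumes single linear layers as towers and hypothetical hand-set weights; real towers are trained neural networks. Because the towers only interact at the final dot product, Pin embeddings can be computed ahead of time, which is what keeps this kind of scoring lightweight.

```python
def tower(weights: list[list[float]], features: list[float]) -> list[float]:
    """Hypothetical one-layer tower: project features into the shared embedding space."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def dot(u: list[float], v: list[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def rank_candidates(user_features, user_weights, pin_features, pin_weights) -> list[str]:
    """Rank candidate Pin IDs for one user by dot(user embedding, Pin embedding)."""
    # Pin embeddings do not depend on the user, so they could be precomputed offline.
    pin_embs = {pid: tower(pin_weights, f) for pid, f in pin_features.items()}
    # The user embedding is computed once per request and reused for every candidate.
    user_emb = tower(user_weights, user_features)
    scores = {pid: dot(user_emb, emb) for pid, emb in pin_embs.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

Since every candidate generator’s output can be scored by the same two embeddings, one scoring function can serve all of them.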

Pinterest’s Analytics as a Platform on Druid (Part 3 of 3)

In this blog post series, we discuss Pinterest’s Analytics as a Platform on Druid and share some learnings on using Druid. This third post in the series discusses learnings on optimizing Druid for real-time use cases.

Pinterest’s Analytics as a Platform on Druid (Part 2 of 3)

In this blog post series, we discuss Pinterest’s Analytics as a Platform on Druid and share some learnings on using Druid. This second post in the series discusses learnings on optimizing Druid for batch use cases.

Pinterest’s Analytics as a Platform on Druid (Part 1 of 3)

In this blog post series, we discuss Pinterest’s Analytics as a Platform on Druid and share some learnings on using Druid. This first post in the series covers a short history of our switch to Druid, our system architecture with Druid, and learnings on optimizing host types for mmap.

Improving efficiency and reducing runtime using S3 read optimization

We describe a novel approach we took to improve S3 read throughput and how we used it to improve the efficiency of our production jobs. The results have been very encouraging. A standalone benchmark showed a 12x improvement in S3 read throughput (from 21 MB/s to 269 MB/s). The increased throughput allowed our production jobs to finish sooner: we saw a 22% reduction in vcore-hours, a 23% reduction in memory-hours, and a similar reduction in the run time of a typical production job. Although we are happy with these results, we are exploring additional enhancements, which are briefly described at the end of this post.
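As a quick sanity check on the benchmark figures, the throughput gain works out to:

```latex
\frac{269\ \text{MB/s}}{21\ \text{MB/s}} \approx 12.8
```

That is slightly better than the rounded 12x quoted for the benchmark.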

How we scaled the size of Pinterest’s ad corpus by 60x

In May 2020, Pinterest launched a partnership with Shopify that allowed merchants to easily upload their catalogs to the Pinterest platform and create Product Pins and shopping ads. This vastly increased the number of shopping ads in our corpus available for our recommendation engine to choose from, when serving an ad on Pinterest. In order to continue to support this rapid growth, we leveraged a key-value (KV) store and some memory optimizations in Go to scale the size of our ad corpus by 60x.

Fighting Spam using Clustering and Automated Rule Creation

One of our biggest priorities at Pinterest is keeping Pinners safe, and that includes protecting them from spam. The Trust & Safety team’s goal is not only to catch spam, but to remove it as quickly as possible to minimize Pinner impact.

The goal of spammers is to make money, and the best way to do this is to spam at scale. It’s a numbers game: one million spam emails are much more effective than one spam email. In order to remove spam quickly, we look at common trends in spam attacks to identify suspect behavior.

To achieve the scale required to be effective, spammers must automate their actions, and each of these “attacks” can be thought of as a cluster. Each event within the attack cluster may share some common features, but different clusters will have a different set of common features.

For example, during an attack where a large number of Pins are created, a spammer might point all Pins to the same domain. While the domain may change between attacks, spammers are still trying to direct traffic to the same spam site.

One of our spam mitigation tactics is our rule engine, Guardian, which helps to identify common features in spam attacks.
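The domain example suggests a simple way to surface candidate attack clusters: group newly created Pins by the hostname they link to and flag hostnames with unusually many Pins. This is a hypothetical sketch, not how Guardian works; the event fields (`pin_id`, `link`) and the `min_pins` threshold are assumptions, and a real rule would likely also consider time windows and account signals.

```python
from collections import defaultdict
from urllib.parse import urlparse


def suspicious_domain_clusters(pins: list[dict], min_pins: int) -> dict[str, list[str]]:
    """Group Pins by linked hostname; keep hostnames with at least `min_pins` Pins."""
    clusters = defaultdict(list)
    for pin in pins:
        host = urlparse(pin["link"]).hostname  # lowercased; None if the link has no host
        if host:
            clusters[host].append(pin["pin_id"])
    return {host: ids for host, ids in clusters.items() if len(ids) >= min_pins}
```

Each flagged hostname is one shared feature of a cluster; a rule could then act on every Pin that links to it, even after the spammer rotates paths or query strings.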

The machine learning behind delivering relevant ads

Pinterest is where people go to plan and shop, making ideas and ads from brands helpful in taking Pinners from inspiration to action. It’s our goal to ensure ads continue to be additive and not intrusive on Pinterest. Because of the unique and powerful first-party signals on Pinterest, advertisers can reach Pinners based on their interests, intent, and engagement on the platform.

To help in delivering the right ads to the right Pinners in an audience of hundreds of millions of people, we offer advertisers features to achieve relevance including Actalike (AAL) audiences, also known in the industry as Lookalike audiences. AAL audiences help advertisers reach potentially new users via audience expansion.

In this blog, we’ll focus on the machine learning model component of relevant ads delivery and explain how we achieve high quality audience expansion through universal user embedding representations together with per-advertiser classifier models. We demonstrate the power of the proposed combined approach by showing better performance over both regression-based and similarity-based approaches.
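The pairing of a universal user embedding with a per-advertiser classifier can be sketched as follows: train a small classifier that separates an advertiser’s seed users from other users in embedding space, then expand the audience with candidates the classifier scores highly. Logistic regression trained with plain gradient descent is an assumption chosen for brevity; the post does not specify the classifier, and all names and data here are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_classifier(positives, negatives, epochs=200, lr=0.5):
    """Fit logistic regression on user embeddings: seed users = 1, others = 0."""
    dim = len(positives[0])
    w, b = [0.0] * dim, 0.0
    data = [(x, 1.0) for x in positives] + [(x, 0.0) for x in negatives]
    for _ in range(epochs):
        for x, y in data:
            # Gradient of the log loss w.r.t. the logit is (prediction - label).
            g = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


def expand_audience(model, candidates: dict[str, list[float]], threshold=0.5) -> list[str]:
    """Return candidate user IDs scoring at or above the threshold, best first."""
    w, b = model
    scores = {uid: sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) for uid, x in candidates.items()}
    return sorted((uid for uid, s in scores.items() if s >= threshold), key=scores.get, reverse=True)
```

The embedding is shared across all advertisers, so only the small per-advertiser model needs to be trained for each audience.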

Copyright © 2011-2024 iteam. Current version is 2.131.0. UTC+08:00, 2024-09-08 08:05