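Company: Pinterest
Pinterest (Chinese name: 缤趣) is a web and mobile application that lets users treat its platform as a visual discovery tool for personal creative ideas and projects. Some also regard it as an image-sharing social network: users can add and organize their image collections by topic and share them with friends. The site uses a waterfall layout, also known as a Pinterest-style layout.
Pinterest is operated by a team called Cold Brew Labs in Palo Alto, California, and was founded by Ben Silbermann, Paul Sciarra, and Evan Sharp. It officially launched in 2010. The name "Pinterest" combines the words "pin" and "interest". Among social networking sites, its traffic ranks behind only Facebook, YouTube, VKontakte, and Twitter.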
Detecting Image Similarity in (Near) Real-time Using Apache Flink
Pinterest is a visual platform at its core, so the need to understand and act on images is paramount. A couple of years ago, the Content Quality team designed and implemented our own batch pipeline to detect similar images. The similarity signal is widely used at Pinterest for use cases varying from improving recommendations based on similar images to taking down spam and abusive content. However, it was taking several hours for the signal to be computed for newly created images, which was a long window for spammers and abusers to harm the platform. So recently, the team implemented a streaming pipeline to detect similar images in near-real-time.
Faster Creator Content Distribution at Pinterest
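At Pinterest, we are committed to building the best experience for creators, helping them reach and inspire new audiences, and making it easy for Pinners to discover the ideas that suit them best. We believe that more diverse Pin formats and an efficient distribution system, which together build a healthy creator content marketplace, are key to our success.
As more and more creator content arrives on Pinterest, we need to keep evolving an efficient distribution system that better serves the interests of both creators and Pinners. Achieving this means tackling a set of unique challenges.
First, the timeliness of content distribution matters more than ever, so that fresh Pins are shown to Pinners as soon as possible after creators share them.
Audience relevance is also critical to distribution efficiency. For example, surfacing a fresh recipe Pin to people interested in cooking, and to people with adjacent interests, helps us better understand the Pin's performance and quality.
To address these needs, we built a brand-new, fast, real-time content distribution system centered on creators.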
Pinterest Flink Deployment Framework
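Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. It offers exactly-once guarantees, low latency, high throughput, and a powerful computation model. At Pinterest, we have adopted Flink as our unified stream processing engine.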
How Pinterest fights misinformation, hate speech, and self-harm content with machine learning
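To keep Pinterest safe and inspiring, in addition to fighting spam with a real-time rules engine, we proactively take action on content that violates our community guidelines through moderator investigations and automated systems. Our machine learning models identify content that violates our policies, from health misinformation to hate speech, self-harm, and graphic violence. Over the years, we have also improved our ability to detect similar images using Spark, LSH, and TensorFlow, and have applied it to our trust and safety work.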
Fighting spam with Guardian, a real-time analytics and rules engine
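As the Trust and Safety team, one of our main responsibilities is protecting Pinners from spam. Without safeguards, spam could spread all over Pinterest.
At Pinterest, one of our most valuable tools for fighting spam is our rules engine. The rules engine lets us look at a stream of events and, when the criteria in a rule are met, take action (such as blocking a message, deactivating the sender, or flagging the user for manual review). Here we share how our spam-fighting rules and queries have evolved, and what we learned along the way.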
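The idea of matching events against rules and emitting actions can be illustrated with a minimal sketch. This is not Guardian's actual design: the `Rule` class, the event field names, the thresholds, and the action names below are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, List, Tuple

Event = Dict[str, Any]


@dataclass
class Rule:
    name: str
    condition: Callable[[Event], bool]  # criteria an event must meet
    action: str                         # hypothetical action label


def evaluate(events: Iterable[Event], rules: List[Rule]) -> List[Tuple[str, str, str]]:
    """Return (user, rule name, action) for every rule that an event matches."""
    decisions = []
    for event in events:
        for rule in rules:
            if rule.condition(event):
                decisions.append((event["user"], rule.name, rule.action))
    return decisions


# Hypothetical rules; field names and thresholds are made up.
RULES = [
    Rule("too_many_links", lambda e: e.get("link_count", 0) > 5, "block_message"),
    Rule(
        "new_account_burst",
        lambda e: e.get("account_age_days", 0) < 1 and e.get("messages_last_hour", 0) > 100,
        "flag_for_review",
    ),
]

EVENTS = [
    {"user": "alice", "link_count": 1, "account_age_days": 400, "messages_last_hour": 2},
    {"user": "bob", "link_count": 9, "account_age_days": 0, "messages_last_hour": 250},
]

decisions = evaluate(EVENTS, RULES)
for user, rule_name, action in decisions:
    print(f"{user}: matched {rule_name} -> {action}")
```

Keeping conditions as plain predicates makes it cheap to add or retire a rule without touching the evaluation loop; a production engine would also need windowed aggregates and rule versioning, which this sketch leaves out.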
How we designed our Continuous Integration System to be more than 50% Faster
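The Engineering Productivity team's vision is to "build a developer platform that inspires developers to do their best work." Continuous Integration (CI) is an indispensable part of that platform. CI verifies code changes and produces release artifacts that can be deployed to the continuous delivery platform. With more than 1,000 engineers at the company, our team faces the challenge of providing reliable, fast CI for large code repositories.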
Manas Two-stage Retrieval — The efficient architecture for hierarchical documents
As more use cases are onboarded to Manas, one special scalability and efficiency challenge emerges when serving documents with a hierarchical structure. Manas, as a traditional search engine, was designed and optimized to support flattened documents. As a result, we have to flatten attributes of root documents all the way to the leaf level regardless of the hierarchical structure, leading to inefficiencies in both indexing and serving pipelines.
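To make the flattening cost concrete, here is a small sketch of what copying root attributes down to the leaves might look like. The two-level document shape, the `children` key, and the `root_` prefix are assumptions for illustration, not Manas's actual schema.

```python
from typing import Any, Dict, List


def flatten(root: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Copy every root-level attribute onto each leaf document."""
    root_attrs = {k: v for k, v in root.items() if k != "children"}
    flat = []
    for child in root.get("children", []):
        # Root attributes are duplicated once per leaf.
        doc = {f"root_{k}": v for k, v in root_attrs.items()}
        doc.update(child)
        flat.append(doc)
    return flat


# Hypothetical hierarchical document: one root with two leaves.
product_group = {
    "brand": "Acme",
    "category": "shoes",
    "children": [
        {"id": "v1", "color": "red"},
        {"id": "v2", "color": "blue"},
    ],
}

flat_docs = flatten(product_group)
for doc in flat_docs:
    print(doc)
```

Every root attribute is stored and indexed once per leaf, so a change to a single root attribute means rewriting all of its leaves. That duplication is the kind of inefficiency the abstract refers to.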
Manas HNSW Realtime: Powering Realtime Embedding-Based Retrieval
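In a previous post, we introduced Manas, our in-house search engine, and shared how we serve term-based search at scale. Since its launch, Manas has grown into one of Pinterest's key candidate generators, serving many use cases beyond its original purpose.
In particular, embedding-based retrieval is a key component of Pinterest's discovery and recommendation engines. Manas has traditionally supported approximate nearest neighbor (ANN) search through locality-sensitive hashing (LSH) over the inverted index, a natural extension of a term-based search engine. After new state-of-the-art techniques such as Hierarchical Navigable Small World graphs (HNSW) were released, we built a flexible embedding-based retrieval framework in Manas that lets us easily onboard new ANN technologies. Using the new framework, we launched HNSW on our batch indexing clusters (with indexing latency ranging from minutes to days) and, compared with LSH, achieved large savings in serving cost along with lower latency.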
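One way LSH can ride on an inverted index is to turn each embedding into a few hash "terms" and look them up like ordinary search terms. The sketch below uses random-hyperplane (sign) hashing with banding; the class name, parameters, and term format are assumptions for illustration and do not describe Manas's implementation.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Set


def make_hyperplanes(num_bits: int, dim: int, seed: int = 0) -> List[List[float]]:
    """Draw random Gaussian hyperplanes, one per signature bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def lsh_terms(vec: Sequence[float], planes: List[List[float]], bits_per_band: int) -> List[str]:
    """Hash a vector to one term per band of sign bits."""
    bits = ["1" if sum(p * v for p, v in zip(plane, vec)) >= 0 else "0" for plane in planes]
    return [
        f"b{band}:" + "".join(bits[start:start + bits_per_band])
        for band, start in enumerate(range(0, len(bits), bits_per_band))
    ]


class LshIndex:
    """Inverted index from LSH terms to document ids (illustrative only)."""

    def __init__(self, dim: int, num_bits: int = 16, bits_per_band: int = 4, seed: int = 0):
        self.planes = make_hyperplanes(num_bits, dim, seed)
        self.bits_per_band = bits_per_band
        self.postings: Dict[str, Set[str]] = defaultdict(set)

    def add(self, doc_id: str, vec: Sequence[float]) -> None:
        for term in lsh_terms(vec, self.planes, self.bits_per_band):
            self.postings[term].add(doc_id)

    def candidates(self, vec: Sequence[float]) -> Set[str]:
        """Union of posting lists for the query's terms (an OR query)."""
        found: Set[str] = set()
        for term in lsh_terms(vec, self.planes, self.bits_per_band):
            found |= self.postings.get(term, set())
        return found


index = LshIndex(dim=3)
index.add("pin_a", [1.0, 0.0, 0.5])
index.add("pin_b", [-1.0, 0.2, -0.5])
print(index.candidates([2.0, 0.0, 1.0]))  # same direction as pin_a, so pin_a is a candidate
```

The candidates are only approximate: vectors pointing in similar directions tend to share at least one band, and a real system would re-rank them by exact distance. A graph-based method such as HNSW avoids this hashing step altogether.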
Manas Realtime — Enabling changes to be searchable in a blink of an eye
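Manas is Pinterest's in-house search engine, a general-purpose information retrieval platform. It was designed as a search framework with high performance, availability, and scalability. Today, Manas powers search for most Pinterest products, including Ads, Search, Homefeed, Related Pins, Visual, and Shopping. One of the key metrics of a search system is indexing latency, the time it takes to update the search index to reflect a change. As the system's capabilities keep growing and new use cases are onboarded, the ability to index new documents instantly has become more important. Manas already supports incremental indexing, which provides indexing latency on the order of tens of minutes. Unfortunately, this cannot meet the growing business needs of Ads and follow feeds. So we built a new module in Manas that further reduces indexing latency to a fraction of a second. This post describes the system's architecture and key challenges, and details the tradeoffs we made.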
Improve user experience: solving core data inconsistencies at Pinterest
Challenges naturally occur with Pinterest’s rapid growth. As a Pinner, you might have noticed some instances where your data doesn’t look “correct,” and you may have had a negative experience because of it. For example: the “Pin count” in your profile shows the incorrect number of Pins.
Scaling Cache Infrastructure at Pinterest
Demand on Pinterest’s core infrastructure systems is accelerating faster than ever as more Pinners come to Pinterest to find inspiration. A distributed cache layer fronting many services and databases is one of our core storage systems that sits near the bottom of Pinterest’s infrastructure stack, responsible for absorbing the vast majority of backend traffic driven by this growth.
Multi-task Learning for Related Products Recommendations at Pinterest
People have always come to Pinterest for shopping inspiration, and we’ve made big strides over the years to make that as seamless as possible so Pinners (users) can go from inspiration to purchase, including evolving shoppable Product Pins, improving recommendations and making it easier for merchants to upload their catalogs to curate and feature their products.
Pinterest Visual Signals Infrastructure: Evolution from Lambda to Kappa Architecture
Ankit Patel | Software engineer, Content Acquisition and Media Platform
A better clickthrough rate: How Pinterest upgraded everyone’s favorite engagement metric
Philip Apps | Data Scientist, Ads Quality
Redesigning the Pinterest Homepage
How experimentation and cross-functional collaboration are key to making a redesign successful
How a one line change decreased our clone times by 99%
Urvashi Reddy | Software Engineer, Engineering Productivity Team Adam Berry | Tech Lead, Engineering Productivity Team Rui Li | Software…