Ray Batch Inference at Pinterest (Part 3)

Introduction

In Part 1 of our blog series, we discussed why we chose Ray(™) as a last mile data processing framework and how it enabled us to solve critical business problems. In Part 2, we described how we integrated Ray(™) into our existing ML infrastructure. In this blog post, we discuss a second popular application of Ray(™) at Pinterest: offline batch inference of ML models. We also share how our implementation delivered a 4.5x throughput increase and 30x cost savings.

Background

Offline batch inference operates over a large dataset, passing the data in batches to an ML model that generates a result for each batch. Offline batch inference jobs generally consist of a series of steps: dataloading, preprocessing, inference, postprocessing, and result writing. These jobs can be both I/O intensive and compute intensive.
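
To make the five steps concrete, here is a minimal, framework-free sketch in plain Python. It is not Pinterest's implementation and does not use Ray or Spark; every name in it (`load_rows`, `preprocess`, `toy_model`, `postprocess`, `run_batch_inference`) is hypothetical, and the "model" is a stand-in linear function chosen only so the example runs anywhere.

```python
from typing import Dict, Iterable, Iterator, List


def load_rows(n: int) -> Iterator[Dict[str, float]]:
    """Dataloading: yield rows one at a time (stand-in for reading a dataset)."""
    for i in range(n):
        yield {"id": i, "value": float(i)}


def batched(rows: Iterable[dict], batch_size: int) -> Iterator[List[dict]]:
    """Group a stream of rows into lists of at most batch_size rows."""
    batch: List[dict] = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly smaller, batch
        yield batch


def preprocess(batch: List[dict]) -> List[float]:
    """Preprocessing: turn raw rows into model features (here, simple scaling)."""
    return [row["value"] / 10.0 for row in batch]


def toy_model(features: List[float]) -> List[float]:
    """Inference: hypothetical stand-in model, y = 2x + 1 per feature."""
    return [2.0 * x + 1.0 for x in features]


def postprocess(batch: List[dict], scores: List[float]) -> List[dict]:
    """Postprocessing: attach each score to the id of the row it came from."""
    return [{"id": row["id"], "score": round(s, 3)} for row, s in zip(batch, scores)]


def run_batch_inference(rows: Iterable[dict], batch_size: int, sink: List[dict]) -> int:
    """Run all steps batch by batch; 'sink' stands in for result writing.

    Returns the number of batches processed.
    """
    batches_run = 0
    for batch in batched(rows, batch_size):
        features = preprocess(batch)
        scores = toy_model(features)
        sink.extend(postprocess(batch, scores))  # result writing
        batches_run += 1
    return batches_run


if __name__ == "__main__":
    results: List[dict] = []
    count = run_batch_inference(load_rows(5), batch_size=2, sink=results)
    print(count, results)
```

Note that this loop runs the steps strictly one after another: while a batch is in the model, nothing is being loaded or written. Overlapping those steps is the part that a sequential sketch like this cannot show.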

At Pinterest, previous batch inference solutions were built using Apache Spark(™) or Torch Dataloader. These solutions had several drawbacks.

Lack of heterogeneous node instance types for preprocessing and inference
Difficulty achieving pipelining and overlapping steps such as dataloading, inference, and result writing
Spark required b...