How image search works at Dropbox

Photos are among the most common types of files in Dropbox, but searching for them by filename is even less productive than it is for text-based files. When you're looking for that photo from a picnic a few years ago, you surely don't remember that the filename set by your camera was 2017-07-04 12.37.54.jpg.

Instead, you look at individual photos, or thumbnails of them, and try to identify objects or aspects that match what you’re searching for—whether that’s to recover a photo you’ve stored, or perhaps discover the perfect shot for a new campaign in your company’s archives. Wouldn’t it be great if Dropbox could pore through all those images for you instead, and call out those which best match a few descriptive words that you dictated? That’s pretty much what our image search does.

Image content search results for “picnic”

In this post we’ll describe the core idea behind our image content search method, based on techniques from machine learning, then discuss how we built a performant implementation on Dropbox’s existing search infrastructure.

Here’s a simple way to state the image search problem: find a relevance function that takes a (text) query q and an image j, and returns a relevance score s indicating how well the image matches the query.

s = f(q, j)

Given this function, when a user runs a search, we evaluate it on all of their images and return those whose score exceeds a threshold, sorted by score. We build this function using two key developments in machine learning: accurate image classification and word vectors.
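To make the score, threshold, and sort step concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not our production code: the name `search_images`, the threshold value, and the toy relevance function (a lookup in made-up per-image label scores) are all hypothetical stand-ins for the real f(q, j).

```python
from typing import Callable, Iterable, List, Tuple


def search_images(
    query: str,
    images: Iterable[str],
    relevance: Callable[[str, str], float],
    threshold: float,
) -> List[Tuple[float, str]]:
    """Score every image against the query, keep scores above the
    threshold, and return (score, image) pairs, best match first."""
    scored = [(relevance(query, image), image) for image in images]
    hits = [(score, image) for score, image in scored if score > threshold]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return hits


# Hypothetical data: per-image label scores, invented for illustration.
toy_images = {
    "a.jpg": {"picnic": 0.9, "grass": 0.7},
    "b.jpg": {"beach": 0.8},
    "c.jpg": {"picnic": 0.4},
}


def toy_relevance(query: str, image: str) -> float:
    # Stand-in for f(q, j): look up the query word among the image's labels.
    return toy_images[image].get(query, 0.0)


results = search_images("picnic", toy_images, toy_relevance, threshold=0.3)
for score, image in results:
    print(f"{image}: {score:.2f}")
```

The relevance function is passed in as a parameter, so the filtering and ranking logic stays the same whatever model ends up computing the score.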
