How to Build a Simple AI Photo Search Engine
Have you ever wondered how you could build your own Google Image Search? This tutorial walks you through building one in about 15 minutes.
The most obvious kind of search engine to build is one that searches text. However, for data that cannot easily be converted to text, such as music, text-style searching is not always feasible.
So how would you build an image search engine? A few ideas come to mind. One is to compare the pixels of the photos directly, but this will only find identical photos, or at best the same photo under different lighting conditions. Another is to use AI to convert each photo into text that describes it, but this loses detail: if the model can only infer that a photo contains a cat, and not which breed, it will struggle to differentiate between different kinds of cats.
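To make the weakness of pixel comparison concrete, here is a minimal sketch using NumPy. The "photo" is random noise rather than a real image, which exaggerates the effect, and the helper name `pixel_distance` is made up for this illustration. It compares a photo against a slightly brighter copy and against a copy shifted three pixels sideways.

```python
import numpy as np

def pixel_distance(a, b):
    """Mean absolute per-pixel difference between two same-shaped images."""
    return float(np.mean(np.abs(a.astype(np.float64) - b.astype(np.float64))))

rng = np.random.default_rng(0)

# Assumption: a random 32x32 grayscale array stands in for a real photo.
photo = rng.integers(0, 256, size=(32, 32), dtype=np.uint8)

# Same photo under slightly different lighting (every pixel up to 10 brighter).
brighter = np.clip(photo.astype(np.int32) + 10, 0, 255).astype(np.uint8)

# Same photo shifted three pixels to the right (wrapping around the edge).
shifted = np.roll(photo, shift=3, axis=1)

d_same = pixel_distance(photo, photo)
d_bright = pixel_distance(photo, brighter)
d_shift = pixel_distance(photo, shifted)

print(d_same, d_bright, d_shift)
```

The brightened copy stays close to the original, while the shifted copy looks like a completely different image to a pixel-by-pixel comparison, even though a person would call it the same photo.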
Instead, if we can somehow convert each photo into a vector, we can measure the distance between the query photo's vector and every other photo's vector, rank those distances, and return the closest photos.
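The ranking step might look something like the sketch below. It assumes Euclidean distance, which the text does not specify (cosine distance is another common choice), and the tiny three-dimensional vectors and the function name `rank_by_distance` are hypothetical stand-ins for real photo vectors.

```python
import numpy as np

def rank_by_distance(query, vectors):
    """Return photo indices sorted from closest to farthest, with their distances."""
    # Euclidean distance from the query to every row of `vectors`.
    distances = np.linalg.norm(vectors - query, axis=1)
    order = np.argsort(distances)
    return order, distances[order]

# Hypothetical vectors standing in for four photos.
photo_vectors = np.array([
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 0.1],
    [0.0, 0.0, 1.0],
])
query_vector = np.array([1.0, 0.0, 0.0])

order, sorted_distances = rank_by_distance(query_vector, photo_vectors)
closest_two = order[:2].tolist()
print(closest_two)
```

Here photos 2 and 0 come back first because their vectors lie nearest to the query. A brute-force scan like this is fine for a few thousand photos; larger collections usually call for an approximate nearest-neighbour index.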
We can do this with almost any image neural network; the most common choices, for good reasons, are ones trained on ImageNet.
ImageNet models are trained to classify images into the 1000 object classes of the ImageNet dataset. The last layer of an ImageNet model is therefore a 1000-dimensional layer of classification logits. To get an image encoding from an ImageNet model, we can take the outputs of the second-to-last layer instead, which contain the information derived from all the upstream layers that matters for the classification.
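The idea of reading the second-to-last layer can be illustrated with a toy network in NumPy. This is not a real ImageNet model: the weights are random, and the layer sizes (64 input features, a 16-dimensional penultimate layer) are invented for the sketch. Only the 1000-dimensional output mirrors the description above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 64 input features, 16-dim penultimate layer, 1000 classes.
W_hidden = rng.normal(size=(64, 16))
W_logits = rng.normal(size=(16, 1000))

def penultimate(x):
    """Activations of the second-to-last layer, used as the image encoding."""
    return np.maximum(x @ W_hidden, 0.0)  # ReLU

def logits(x):
    """Final 1000-dimensional classification logits."""
    return penultimate(x) @ W_logits

# Stand-in for one preprocessed image.
image_features = rng.normal(size=(1, 64))

encoding = penultimate(image_features)
class_logits = logits(image_features)

print(encoding.shape, class_logits.shape)
```

The logits are computed entirely from the encoding, which is why the encoding has to carry whatever the network needs to tell the classes apart, in a much more compact form than the raw pixels.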
For illustration, I'm going to use MobileNet - one of the most lightweight ImageNet-traine...