Building an Image Hashing Search Engine with VP-Trees and OpenCV
In this tutorial, you will learn how to build a scalable image hashing search engine using OpenCV, Python, and VP-Trees.
Image hashing algorithms are used to:
- Uniquely quantify the contents of an image using only a single integer.
- Find duplicate or near-duplicate images in a dataset of images based on their computed hashes.
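To make the idea of "a single integer per image" concrete, here is a minimal sketch of a difference hash. It assumes the image has already been converted to grayscale and resized to 8 rows by 9 columns (in practice you would do that with OpenCV first); the function names `dhash` and `hamming` are illustrative, and the bit ordering is one possible choice, not necessarily the one used in the 2017 tutorial.

```python
def dhash(pixels):
    """Difference hash of a grayscale grid with N rows and N + 1 columns.

    `pixels` is assumed to be a list of rows of brightness values that has
    already been resized (8 rows of 9 values gives a 64-bit hash).
    """
    h = 0
    for row in pixels:
        for left, right in zip(row, row[1:]):
            # One bit per adjacent pair: 1 if the right pixel is brighter.
            h = (h << 1) | (1 if right > left else 0)
    return h


def hamming(a, b):
    """Number of bit positions in which two integer hashes differ."""
    return bin(a ^ b).count("1")
```

Two images with a Hamming distance of 0 between their hashes are treated as duplicates, while a small non-zero distance suggests a near-duplicate.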
Back in 2017, I wrote a tutorial on image hashing with OpenCV and Python (which is required reading for this tutorial). That guide showed you how to find identical/duplicate images in a given dataset.
However, there was a scalability problem with that original tutorial — namely that it did not scale!
To find near-duplicate images, our original image hashing method would require us to perform a linear search, comparing the query hash to each individual image hash in our dataset.
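For reference, the linear search described above might look like the following sketch. The `index` layout (a dictionary mapping file names to integer hashes) and the function name `linear_search` are assumptions made for illustration only.

```python
def hamming(a, b):
    """Number of bit positions in which two integer hashes differ."""
    return bin(a ^ b).count("1")


def linear_search(query_hash, index, max_distance):
    """Compare the query against every stored hash: O(n) per query.

    `index` is assumed to map an image name to its integer hash.
    Returns (distance, name) pairs within `max_distance`, nearest first.
    """
    results = []
    for name, image_hash in index.items():
        d = hamming(query_hash, image_hash)
        if d <= max_distance:
            results.append((d, name))
    return sorted(results)
```

Every query touches every hash in the dataset, which is exactly the cost we want to avoid as the dataset grows.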
In a practical, real-world application that’s far too slow — we need to find a way to reduce that search to sub-linear time complexity.
But how can we reduce search time so dramatically?
The answer is a specialized data structure called a VP-Tree.
Using a VP-Tree, we can reduce our search complexity from O(n) to O(log n) on average, achieving our sub-linear goal!
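To give a feel for why a VP-Tree can skip most of the dataset, here is a from-scratch sketch, not the implementation of any particular library. Each node picks a vantage point, splits the remaining hashes into those closer than a threshold `mu` and those farther away, and at query time uses the triangle inequality to prune whole branches. The class name `VPNode` and its methods are hypothetical.

```python
import random


def hamming(a, b):
    """Number of bit positions in which two integer hashes differ."""
    return bin(a ^ b).count("1")


class VPNode:
    """Illustrative vantage-point tree over items compared with `dist`."""

    def __init__(self, points, dist, rng=random):
        self.dist = dist
        # Pick a random vantage point and remove it from the working set.
        i = rng.randrange(len(points))
        self.vp = points[i]
        rest = points[:i] + points[i + 1:]
        self.mu = None
        self.inside = None
        self.outside = None
        if not rest:
            return
        # Use the median distance to the vantage point as the threshold.
        distances = sorted(dist(self.vp, p) for p in rest)
        self.mu = distances[len(distances) // 2]
        inner = [p for p in rest if dist(self.vp, p) < self.mu]
        outer = [p for p in rest if dist(self.vp, p) >= self.mu]
        if inner:
            self.inside = VPNode(inner, dist, rng)
        if outer:
            self.outside = VPNode(outer, dist, rng)

    def within(self, query, radius, out=None):
        """Collect (distance, point) pairs no farther than `radius`."""
        if out is None:
            out = []
        d = self.dist(query, self.vp)
        if d <= radius:
            out.append((d, self.vp))
        # Triangle inequality: a match p satisfies d - radius <= dist(vp, p)
        # <= d + radius, so a branch is visited only if it can hold such p.
        if self.inside is not None and d - radius < self.mu:
            self.inside.within(query, radius, out)
        if self.outside is not None and d + radius >= self.mu:
            self.outside.within(query, radius, out)
        return out
```

With a small search radius, most queries descend into only one branch per node, which is where the average-case logarithmic behavior comes from; a large radius forces both branches to be visited and the search degrades toward a linear scan.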
In the remainder of this tutorial you will learn how to:
- Build an image hashing search engine to find both identical and near-identical images in a dataset.
- Utilize a specialized data structure, called a VP-Tree, that can be used to scale image hashing search engines to millions of images.
To learn how to build your first im...