Elasticsearch Neural Search Tutorial
Hi readers!
Here we are with a new episode exploring how open-source technologies approach text vectorization and vector-based search (also known as neural search, because deep neural networks are used to encode text into vectors).
We have already published three blog posts about:
- OpenSearch Neural Search Plugin Tutorial: Indexing and Searching
- Apache Solr Neural Search Tutorial: Indexing and Searching
- Vespa Neural Search Tutorial
Now it is Elasticsearch’s turn!
This blog post explores how vector search is handled in Elasticsearch, describing in detail what is already available through an end-to-end tutorial.
Elasticsearch Neural Search Pipeline
Vector-based search and NLP (natural language processing) capabilities are available from Elasticsearch version 8.0, released in February 2022.
The following diagram shows how a vector search engine works; it involves:
- transforming the original entity, such as a song, an image, or some text, into a numeric representation (vector embeddings)
- using distance metrics to represent the similarity between vectors
- searching for data related to your query using approximate nearest neighbor (ANN) algorithms
Vector search diagram (Source: https://www.elastic.co/what-is/vector-search)
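To make the three steps above concrete, here is a minimal sketch in plain Python. It is not how Elasticsearch implements vector search: the embeddings are hand-written toy vectors rather than the output of a neural model, the document IDs and function names are hypothetical, and the search is an exact brute-force scan. An ANN algorithm approximates this ranking without comparing the query against every stored vector.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_neighbors(query, docs, k=2):
    """Exact k-nearest-neighbor search: score every document, keep the top k.

    ANN algorithms trade a little accuracy for speed by avoiding
    this full scan over all stored vectors.
    """
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Step 1 (assumed): the entities have already been encoded as vectors.
# These toy 3-dimensional embeddings are made up for illustration.
docs = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.0, 1.0, 0.2],
    "doc_c": [0.8, 0.3, 0.1],
}
query = [1.0, 0.2, 0.0]

# Steps 2 and 3: measure similarity and return the closest documents.
top_hits = nearest_neighbors(query, docs, k=2)
for doc_id, score in top_hits:
    print(f"{doc_id}: {score:.4f}")
```

With these toy vectors, doc_a and doc_c point in nearly the same direction as the query, so they rank ahead of doc_b.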
Like Apache Solr, Elasticsearch also uses Apache Lucene internally as its search engine, so many of the low-level concepts, data structures, and algorithms apply equally to both.
Even in this case, vector-based search is built on top of Apache Lucene HNSW (Hierarchical Nav...