Elasticsearch Neural Search Tutorial

Hi readers!
Here we are with a new episode exploring how open-source technologies approach text vectorization and vector-based search (also known as Neural Search, because deep neural networks are used to encode text into vectors).
We have already published three blog posts about:

Now it is Elasticsearch’s turn!
This blog post explores how Vector Search is handled in Elasticsearch and walks through the features that are already available with an end-to-end tutorial.

Elasticsearch Neural Search Pipeline

Vector-based search and NLP (natural language processing) capabilities are available starting with Elasticsearch 8.0, released in February 2022.
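As a rough illustration of what "vector-based search" means at the index level, the sketch below builds an index mapping with a `dense_vector` field and the body of a kNN query as plain Python dicts. The field names `text` and `text_vector`, the 3 dimensions, and the query vector are hypothetical placeholders; in a real setup `dims` must match the output size of your encoder. In 8.0 a body like `knn_query` was sent to the technical-preview `_knn_search` endpoint, while later 8.x releases also accept a `knn` section in the regular `_search` request, so check the reference for your exact version.

```python
import json

# Hypothetical index layout: one text field plus its embedding.
# "dims" must equal the encoder's output size (3 here only to keep it short).
mapping = {
    "mappings": {
        "properties": {
            "text": {"type": "text"},
            "text_vector": {
                "type": "dense_vector",
                "dims": 3,
                "index": True,           # index the vectors so kNN search can use them
                "similarity": "cosine",  # metric used to compare vectors
            },
        }
    }
}

# Hypothetical kNN query body targeting the field above.
knn_query = {
    "knn": {
        "field": "text_vector",
        "query_vector": [0.9, 0.1, 0.0],  # embedding of the user's query text
        "k": 10,                           # nearest neighbours to return
        "num_candidates": 100,             # candidates considered per shard
    }
}

print(json.dumps(mapping, indent=2))
print(json.dumps(knn_query, indent=2))
```

Raising `num_candidates` relative to `k` generally trades speed for a more accurate approximate result.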

The following diagram gives a simplified view of how a vector search engine works; it involves:

  • transforming the original entity, such as a song, an image, or some text, into a numeric representation (vector embeddings)
  • using distance metrics to represent the similarity between vectors
  • searching for data related to your query using approximate nearest neighbor (ANN) algorithms
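To make the last two steps concrete, here is a minimal sketch that scores toy embeddings with cosine similarity and returns the top-k matches by brute force. The embeddings and document names are made up for illustration, and this is exact nearest-neighbor search: ANN algorithms such as HNSW aim to return (nearly) the same ranking without scoring every stored vector.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def exact_top_k(query, docs, k):
    """Brute-force nearest neighbours: score every document, keep the k best."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical 3-dimensional embeddings; real text encoders output hundreds of dimensions.
docs = {
    "doc_cats": [0.9, 0.1, 0.0],
    "doc_dogs": [0.8, 0.3, 0.1],
    "doc_cars": [0.0, 0.2, 0.9],
}
query = [1.0, 0.2, 0.0]

for doc_id, score in exact_top_k(query, docs, k=2):
    print(doc_id, round(score, 3))
```

Brute force costs one similarity computation per stored vector for every query, which is exactly the cost that an ANN index is designed to avoid on large collections.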

Vector search diagram (Source: https://www.elastic.co/what-is/vector-search)

Like Apache Solr, Elasticsearch also uses Apache Lucene internally as its search engine, so many of the low-level concepts, data structures, and algorithms apply equally to both.
Here too, vector-based search is built on top of Apache Lucene HNSW (Hierarchical Nav...