
Hi readers!
Here we are with a new episode exploring how open source technologies approach text vectorization and vector-based search (also known as Neural Search, because deep neural networks are used to encode text into vectors).
We have already published three blog posts about:


Now it is Elasticsearch’s turn!
This blog post explores how Vector Search is managed in Elasticsearch, providing a detailed description of what is already available through an end-to-end tutorial.


Elasticsearch Neural Search Pipeline


Vector-based search and NLP (natural language processing) capabilities are available from Elastic version 8.0, released in February 2022.

The following diagram shows how a vector search engine works; it involves:


  • transforming the original entity, such as a song, an image, or some text, into a numeric representation (vector embeddings)
  • using distance metrics to represent the similarity between vectors
  • searching for data related to your query using approximate nearest neighbor (ANN) algorithms

Vector search diagram (Source: https://www.elastic.co/what-is/vector-search)
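
The three steps above can be sketched in a few lines of Python. This is only a toy illustration, not how Elasticsearch implements them: the `embed` function is a hypothetical stand-in for a neural encoder (it just counts words from a tiny made-up vocabulary), cosine similarity is chosen as one possible metric, and the search is an exact brute-force scan, which is the result that ANN algorithms try to approximate more cheaply on large indexes.

```python
import math


def embed(text, vocabulary):
    """Toy stand-in for a neural encoder (an assumption for illustration):
    counts how often each vocabulary word occurs in the text."""
    tokens = text.lower().split()
    return [float(tokens.count(word)) for word in vocabulary]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbors(query_vector, indexed_vectors, k):
    """Exact (brute-force) k-nearest-neighbor search by cosine similarity.
    ANN algorithms trade a little accuracy to avoid scanning every vector."""
    scored = [
        (doc_id, cosine_similarity(query_vector, vector))
        for doc_id, vector in indexed_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical vocabulary and documents, invented for this sketch.
VOCABULARY = ["vector", "search", "song", "image"]
DOCUMENTS = {
    "doc1": "vector search with vector embeddings",
    "doc2": "a song about an image",
    "doc3": "search for a song",
}

# Step 1: transform each entity into a vector.
INDEX = {doc_id: embed(text, VOCABULARY) for doc_id, text in DOCUMENTS.items()}

# Steps 2 and 3: score the query against the index and keep the top k.
results = nearest_neighbors(embed("vector search", VOCABULARY), INDEX, k=2)
for doc_id, score in results:
    print(doc_id, round(score, 3))
```

In a real setup the vectors would come from a trained language model and have hundreds of dimensions, which is why an approximate search structure is preferred over a full scan.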

Like Apache Solr, Elasticsearch also uses Apache Lucene internally as its search engine, so many of the low-level concepts, data structures, and algorithms apply equally to both.
Even in this case, vector-based search is built on top of Apache Lucene HNSW (Hierarchical Nav...

