More efficient multi-vector embeddings with MUVERA

Weaviate 1.31 implements the MUVERA encoding algorithm for multi-vector embeddings. In this blog, we dive into the algorithm in detail, including what MUVERA is, how it works, and whether it might make sense for you.

Let's start by reviewing what multi-vector models are, and the challenges that MUVERA looks to solve.

Challenges with multi-vector embeddings

State-of-the-art multi-vector models can dramatically improve retrieval performance by capturing more semantic information than single-vector models. ColBERT models preserve token-level meaning in text, while ColPali/ColQwen models identify and preserve information from different parts of an image, such as figures in PDFs, alongside textual information.

Single vector to multi-vector comparison

These advantages make multi-vector models a great fit for many use cases. However, multi-vector embeddings carry two potential disadvantages compared with their single-vector cousins, owing to their size and relative complexity.

Multi-vector embeddings comprise multiple vectors, each one representing a part of the object, such as a token (text) or a patch (image). Although each vector in a multi-vector embedding is smaller, the whole embedding tends to be larger than a typical single-vector embedding.
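
To make the size difference concrete, here is a minimal sketch that compares the raw storage of one single-vector embedding with one multi-vector embedding. The shapes (one 768-dimensional vector versus 100 vectors of 128 dimensions) and the float32 storage are illustrative assumptions, not figures from any specific model, and `embedding_bytes` is a hypothetical helper.

```python
BYTES_PER_FLOAT32 = 4  # assumption: every value is stored as a 32-bit float


def embedding_bytes(num_vectors: int, dims: int) -> int:
    """Raw size in bytes of one embedding made of `num_vectors` vectors."""
    return num_vectors * dims * BYTES_PER_FLOAT32


# Assumed, illustrative shapes (not taken from any specific model).
single = embedding_bytes(num_vectors=1, dims=768)    # one large vector
multi = embedding_bytes(num_vectors=100, dims=128)   # many small vectors

print(f"single-vector embedding: {single} bytes")
print(f"multi-vector embedding:  {multi} bytes")
print(f"multi is larger: {multi > single}")
```

Under these assumptions, each vector in the multi-vector embedding is smaller (128 versus 768 dimensions), yet the embedding as a whole takes more space because there are many of them.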

Multi-vector embeddings memory comparison

This can lead to a higher memory footprint in use, as many vector search systems use in-memory indexes such as HNSW.
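
As a rough, back-of-the-envelope sketch of what this means at index scale, the snippet below estimates the raw vector memory for a whole collection. The object count, vectors per object, dimensions, and float32 storage are all assumed values for illustration, and `raw_vector_memory` is a hypothetical helper. The estimate covers vector data only and ignores any overhead from the index structure itself.

```python
def raw_vector_memory(num_objects: int, vectors_per_object: int,
                      dims: int, bytes_per_value: int = 4) -> tuple[int, float]:
    """Return (total vectors in the index, raw vector memory in GiB).

    Counts vector data only; index overhead is deliberately left out.
    """
    total_vectors = num_objects * vectors_per_object
    total_bytes = total_vectors * dims * bytes_per_value
    return total_vectors, total_bytes / 2**30


NUM_OBJECTS = 1_000_000  # assumed collection size

# Assumed shapes: 1 x 768 for single-vector, 100 x 128 for multi-vector.
single_count, single_gib = raw_vector_memory(NUM_OBJECTS, 1, 768)
multi_count, multi_gib = raw_vector_memory(NUM_OBJECTS, 100, 128)

print(f"single-vector index: {single_count:,} vectors, {single_gib:.1f} GiB")
print(f"multi-vector index:  {multi_count:,} vectors, {multi_gib:.1f} GiB")
```

Because an in-memory index has to hold all of these vectors at once, both the vector count and the raw memory grow with the number of vectors per object.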

How much larger? Well, as you can see from the image above, the total number of vectors in a multi-vector index will be greater than the single-vector index by average_vectors_per_embeddin...
