Building a Powerful Multimodal Search with Voyager-3 and LangGraph

Embedding images and text in the same vector space makes it possible to run accurate searches over multimodal content such as web pages, PDF files, magazines, books, brochures, and papers. What makes this technique interesting is that you can retrieve text related to a particular image, and vice versa. For example, a search for a cat returns pictures of cats, but it also returns the text that accompanies those pictures, even when that text never uses the word “cat”.

The following example shows the difference between a traditional text-embedding similarity search and a search over a multimodal embedding space:

Example Question: What does the magazine say about cats?

Screenshot from a photography magazine — OUTDOOR

Regular Similarity Search Answer

The search results provided do not contain specific information about cats. They mention animal portraits and photography techniques, but there is no explicit mention of cats or details related to them.

As the screenshot above shows, the page never uses the word “cat”; it contains only a photograph and a passage on how to photograph animals. Because the text never mentions cats, the regular similarity search returned nothing relevant to the question.

Multi-Modal Search Answer

The magazine features a portrait of a cat, highlighting the detailed capture of its facial features and character. The text emphasizes how well-done animal portraits can delve into the subject’s soul and create an emotional connection with the viewer through compelling eye contact.

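The contrast between the two answers can be reproduced in miniature. The sketch below is not the article's pipeline and does not call any embedding model: the three-dimensional vectors, the file name `cat_portrait.jpg`, and the helper names `cosine` and `search` are all hypothetical, hand-picked only to illustrate what happens when images and text share one embedding space.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# ASSUMPTION: toy vectors standing in for real embeddings.
# The axes loosely mean [cat, photography technique, landscape].
index = [
    {"kind": "image", "content": "cat_portrait.jpg",
     "vector": [0.9, 0.3, 0.0]},
    {"kind": "text", "content": "How to take compelling animal portraits",
     "vector": [0.5, 0.8, 0.1]},
    {"kind": "text", "content": "Choosing a tripod for mountain landscapes",
     "vector": [0.0, 0.4, 0.9]},
]

# ASSUMPTION: toy embedding of "What does the magazine say about cats?"
query_vector = [1.0, 0.1, 0.0]

def search(query, items, top_k=2):
    """Return the top_k items ranked by cosine similarity to the query."""
    ranked = sorted(items, key=lambda item: cosine(query, item["vector"]),
                    reverse=True)
    return ranked[:top_k]

# Shared space: images and text are ranked together.
multimodal_hits = search(query_vector, index)

# Text-only index: the cat photograph is simply not searchable.
text_only_hits = search(query_vector,
                        [item for item in index if item["kind"] == "text"])

for hit in multimodal_hits:
    print("multimodal:", hit["kind"], "-", hit["content"])
for hit in text_only_hits:
    print("text only: ", hit["kind"], "-", hit["content"])
```

With these made-up vectors, the shared index ranks the cat photograph first and the animal-portrait passage second, while the text-only index can only return passages that never mention cats. A real system would obtain the vectors from a multimodal embedding model instead of writing them by hand.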

...