Embeddings with sqlite-vector

About a year and a half ago I wrote about using sqlite-vss to store and query embedding vectors in a SQLite database. Much has changed since then and I’m working on a project that motivated another pass at querying embeddings on a local system for smallish datasets. The sqlite-vector project seemed like an interesting one to try for this purpose.
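For small datasets, querying embeddings comes down to nearest-neighbor search: find the stored vectors closest to a query vector. The plain-Python sketch below is not how sqlite-vector works internally. It is an exact, brute-force cosine-similarity search, which is a useful mental model and a handy baseline for sanity-checking an extension's results. The function names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (assumes neither is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, vectors, k=3):
    """Score every stored vector against the query and keep the k best (index, score) pairs."""
    scored = [(i, cosine_similarity(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy 3-dimensional vectors stand in for real 768-dimensional embeddings
vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
print(top_k([1.0, 0.1, 0.0], vectors, k=2))  # indices 0 and 2, in that order
```

Every query touches every row, so the cost grows linearly with the dataset. That is fine for "smallish" data and is exactly the work a vector extension aims to do faster inside the database.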

I am going to use the same news dataset as last time and the nomic-embed-text-v1.5 model to generate 768-dimensional embeddings.
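A 768-dimensional embedding is just 768 numbers, and to keep it in a SQLite column it has to become bytes. The sketch below assumes the vectors are stored as little-endian float32 BLOBs; check the sqlite-vector README for the exact format it expects. Under that assumption each vector takes 768 × 4 = 3,072 bytes. The helper names are hypothetical.

```python
import struct

DIMENSIONS = 768  # embedding size used in this post (nomic-embed-text-v1.5)

def to_blob(vector):
    """Pack a list of floats into a little-endian float32 byte string."""
    return struct.pack(f"<{len(vector)}f", *vector)

def from_blob(blob):
    """Unpack a little-endian float32 byte string back into a list of floats."""
    count = len(blob) // 4  # 4 bytes per float32
    return list(struct.unpack(f"<{count}f", blob))

embedding = [0.5] * DIMENSIONS  # placeholder values, not a real embedding
blob = to_blob(embedding)
print(len(blob))  # → 3072
```

Values that are not exactly representable in float32 will round slightly on the way through, which is normally harmless for similarity search.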

I also downloaded the vector.dylib file from the sqlite-vector GitHub repo and placed it in my working directory for this example. I’ve tried exercises similar to this one with both the macOS and Linux versions of the library.
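Because the library file differs by platform, a small lookup keeps the loading code portable. This is a sketch: only `vector.dylib` is confirmed by this post, and the Linux and Windows file names in the mapping are assumptions based on the usual shared-library suffixes.

```python
import platform

# Hypothetical OS-to-file mapping; only vector.dylib is confirmed here,
# the .so and .dll names are assumed from platform conventions.
EXTENSION_FILES = {
    "Darwin": "./vector.dylib",
    "Linux": "./vector.so",
    "Windows": "./vector.dll",
}

def vector_extension_path(system=None):
    """Return the expected sqlite-vector library path for an OS name (default: this machine)."""
    system = system or platform.system()
    try:
        return EXTENSION_FILES[system]
    except KeyError:
        raise RuntimeError(f"No sqlite-vector build configured for {system!r}") from None

print(vector_extension_path("Darwin"))  # → ./vector.dylib
```

Alternatively, SQLite itself retries a failed load with OS-specific suffixes appended, so passing a path without a suffix (for example `'./vector'`) may be enough.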

To get started, we'll install the libraries needed to load the data and generate embeddings. The sqlite3 module used to create the database ships with Python, so it doesn't need to be installed.

```shell
pip install -q pandas scikit-learn sentence-transformers
```
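Not every Python build can load SQLite extensions. CPython only exposes `enable_load_extension` when it is compiled with `--enable-loadable-sqlite-extensions`, and some builds (notably on macOS) leave it out. A quick check before going further, as a minimal sketch with a hypothetical helper name:

```python
import sqlite3

def supports_loadable_extensions():
    """True if this Python's sqlite3 module was built with extension loading."""
    # Builds without loadable-extension support simply lack this method.
    return hasattr(sqlite3.Connection, "enable_load_extension")

print("SQLite library version:", sqlite3.sqlite_version)
print("Loadable extensions supported:", supports_loadable_extensions())
```

If this prints `False`, a Python built with extension support (or a third-party SQLite binding) is needed before the extension can be loaded.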

Next, we'll create the database, load the sqlite-vector extension, and verify that it loaded correctly.

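```python
import sqlite3

# Open (or create) the database file
conn = sqlite3.connect('news.db')

# Extension loading is off by default; enable it only while loading
conn.enable_load_extension(True)
conn.load_extension('./vector.dylib')
conn.enable_load_extension(False)

# Call one of the extension's functions to confirm it is available
version = conn.execute("SELECT vector_version()").fetchone()[0]
print(f"SQLite Vector extension version: {version}")
```

```
SQLite Vector extension version: 0.9.37
```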

Looks good! The query shows the same version I downloaded.
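With the extension loaded, the articles and their embeddings need a table. The schema below is a hypothetical sketch: the column names are my assumptions based on the dataset, the float32 BLOB layout is assumed, and it uses an in-memory database without the extension so it runs anywhere. Registering the embedding column with sqlite-vector would be a separate step (see the project's README) and is not shown.

```python
import sqlite3
import struct

def to_blob(vector):
    """Pack floats as little-endian float32 bytes (assumed storage format)."""
    return struct.pack(f"<{len(vector)}f", *vector)

# In-memory database so the sketch runs without news.db or the extension
conn = sqlite3.connect(":memory:")

# Hypothetical schema: article text plus its embedding stored as a BLOB
conn.execute(
    """
    CREATE TABLE IF NOT EXISTS articles (
        id INTEGER PRIMARY KEY,
        headline TEXT NOT NULL,
        category TEXT,
        embedding BLOB
    )
    """
)

# Placeholder rows with tiny 3-dimensional vectors, not real data
rows = [
    ("Example headline A", "EXAMPLE", to_blob([0.1, 0.2, 0.3])),
    ("Example headline B", "EXAMPLE", to_blob([0.3, 0.2, 0.1])),
]
# ? placeholders let sqlite3 bind the bytes objects as BLOBs
conn.executemany(
    "INSERT INTO articles (headline, category, embedding) VALUES (?, ?, ?)",
    rows,
)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM articles").fetchone()[0]
print(count)  # → 2
```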

If you haven’t taken a look at the dataset yet, here’s a sample:

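```python
import json
from pathlib import Path

import pandas as pd

# Load the first few records to see the structure
data_path = Path('data/News_Category_Dataset_v3.json')
articles = []  # one dict per JSON line
with open(data_path, 'r') as f:
    for i, line in enumerate(f):
        articles.append(json.loads(line))
df = pd.DataFrame(arti...
```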
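The dataset is in JSON Lines format (one JSON object per line), so previewing it doesn't require parsing the whole file. A minimal sketch with a hypothetical helper that stops after the first `n` non-blank lines:

```python
import json
from itertools import islice
from pathlib import Path

def first_records(lines, n=5):
    """Parse the first n non-blank JSON Lines entries from an iterable of lines."""
    non_blank = (line for line in lines if line.strip())
    # islice stops after n items, so the remaining lines are never parsed
    return [json.loads(line) for line in islice(non_blank, n)]

data_path = Path("data/News_Category_Dataset_v3.json")
if data_path.exists():
    with open(data_path, "r", encoding="utf-8") as f:
        for record in first_records(f, n=3):
            print(record)
```

Taking any iterable of lines (rather than a path) keeps the helper easy to test. If pandas is already imported, `pd.read_json(data_path, lines=True, nrows=5)` should give a similar preview directly as a DataFrame.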