How to Build a Deep Researcher
A 100% open-source, self-hostable Deep Research Stack That Beat OpenAI, Gemini, and Perplexity
If you need AI to do research for you today, you're probably using ChatGPT Deep Research, Claude, or Perplexity. All three are genuinely capable. All three are also closed-source SaaS running in someone else's cloud.
Every query you send and every internal document you connect sits on their servers, not yours.
For most teams, that's been the trade-off: accept it, or don't use AI for serious research.
In this article, you'll see a third option: a fully open-source deep research stack that runs on your own infrastructure.
Three tools, all open source: Onyx for retrieval, CrewAI for orchestration, Voxtral for voice.
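As a rough mental model of how the three roles fit together, the sketch below wires a voice stage, a retrieval stage, and an orchestration stage into one pipeline. It is a minimal illustration under stated assumptions: `ResearchPipeline` and the `fake_*` functions are hypothetical names, the stages are plain stubs, and nothing here calls the real Onyx, CrewAI, or Voxtral APIs. The final narration step is left out.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ResearchPipeline:
    """Hypothetical wiring of the three roles; each stage is a swappable callable."""
    transcribe: Callable[[bytes], str]              # voice stage: audio -> question text
    retrieve: Callable[[str], List[str]]            # retrieval stage: question -> passages
    write_report: Callable[[str, List[str]], str]   # orchestration stage: question + passages -> report

    def run(self, audio: bytes) -> str:
        question = self.transcribe(audio)
        passages = self.retrieve(question)
        return self.write_report(question, passages)


# Stub stages for illustration only. A real build would replace each one
# with a client for the corresponding self-hosted service.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_retrieve(question: str) -> List[str]:
    return [f"passage about {question}"]


def fake_write_report(question: str, passages: List[str]) -> str:
    return f"Report on {question}: {len(passages)} source(s)"


pipeline = ResearchPipeline(fake_transcribe, fake_retrieve, fake_write_report)
report = pipeline.run(b"self-hosted search")
print(report)
```

Keeping each stage behind a plain callable means any one of them can be swapped or tested in isolation without touching the other two.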
Here's the full system running end-to-end, from voice query to narrated research report:
The rest of this article breaks down how it works and walks you through building the same stack yourself. Before any of that, though, it helps to be clear about why this is worth building at all.
Why self-hosting actually matters
Every major AI research tool is a closed cloud service. That has real consequences:
- Your queries go to their servers. The questions you ask reveal what you're working on.
- Your connected data gets indexed on their infrastructure. Integration is convenient, but the index lives on their side.
- Retention, logging, and audit are their call, not yours. Enterprise tiers soften this but don't eliminate it.
- Quotas and prici...