Building a Virtual Filesystem for Mintlify's AI Assistant
RAG is great, until it isn't.
Our assistant could only retrieve chunks of text that matched a query. If the answer lived across multiple pages, or the user needed exact syntax that didn't land in a top-K result, it was stuck. We wanted it to explore docs the way you'd explore a codebase.
Agents are converging on filesystems as their primary interface because grep, cat, ls, and find are all an agent needs. If each doc page is a file and each section is a directory, the agent can search for exact strings, read full pages, and traverse the structure on its own. We just needed a filesystem that mirrored the live docs site.
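To make the page-as-file, section-as-directory idea concrete, here is a minimal sketch of what such an interface could look like. The paths, page contents, and function names are all hypothetical illustrations, not Mintlify's actual API:

```python
# Hypothetical sketch: an in-memory "filesystem" over doc pages.
# Every path and page body here is made up for illustration.

docs = {
    "api-reference/authentication.mdx": "# Authentication\nPass your API key in the Authorization header.",
    "guides/quickstart.mdx": "# Quickstart\nInstall the CLI, then run the dev server.",
}

def ls(prefix: str = "") -> list[str]:
    """List page paths under a section, like `ls` on a directory."""
    return sorted(p for p in docs if p.startswith(prefix))

def cat(path: str) -> str:
    """Read a full page, like `cat` on a file."""
    return docs[path]

def grep(pattern: str) -> list[tuple[str, str]]:
    """Find exact strings across all pages, like `grep -r`."""
    return [
        (path, line)
        for path, body in docs.items()
        for line in body.splitlines()
        if pattern in line
    ]

print(ls("guides/"))          # pages under the guides/ section
print(grep("Authorization"))  # exact-string hits with their source path
```

The point is that these three primitives are enough for an agent to find exact syntax (`grep`), read a whole page rather than a retrieved chunk (`cat`), and discover structure it was never told about (`ls`).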
The Container Bottleneck
The obvious way to do this is to just give the agent a real filesystem. Most harnesses solve this by spinning up an isolated sandbox and cloning the repo. We already use sandboxes for asynchronous background agents where latency is an afterthought, but for a frontend assistant where a user is staring at a loading spinner, the approach falls apart. Our p90 session creation time (including GitHub clone and other setup) was ~46 seconds.
Beyond latency, spinning up dedicated micro-VMs just to read static documentation would have meant a serious infrastructure bill.
At 850,000 conversations a month, even a minimal setup (1 vCPU, 2 GiB RAM, 5-minute session lifetime) would put us north of $70,000 a year based on Daytona's per-second sandbox pricing ($0.0504/h per vCPU, $0.0162/h per GiB RAM). Longer session times double that. (This is based on a purely naive approach, a true production workflow would probably hav...
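As a sanity check on that estimate, the per-session arithmetic works out like this, using the quoted Daytona rates, the minimal sandbox shape, and the monthly conversation volume from above:

```python
# Back-of-the-envelope check on the naive sandbox cost estimate.
# Rates and volumes are the figures quoted above; the rest is arithmetic.

vcpu_rate = 0.0504   # $ per vCPU-hour
ram_rate = 0.0162    # $ per GiB-hour

vcpus, ram_gib = 1, 2        # minimal sandbox: 1 vCPU, 2 GiB RAM
session_hours = 5 / 60       # 5-minute session lifetime
sessions_per_month = 850_000

cost_per_session = session_hours * (vcpus * vcpu_rate + ram_gib * ram_rate)
annual_cost = cost_per_session * sessions_per_month * 12

print(f"${cost_per_session:.4f} per session")  # $0.0069
print(f"${annual_cost:,.0f} per year")         # $70,380
```

Roughly $0.007 per session sounds harmless until you multiply it by ten million sessions a year, which is how "read a static file" ends up north of $70,000.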