Building Agents That Never Forget

A first-principles walk through agent memory: from Python lists to markdown files to vector search to graph-vector hybrids, and finally, a clean, open-source solution for all of this.



An LLM is stateless by design. Every API call starts fresh. The "memory" you feel when chatting with ChatGPT is an illusion created by re-sending the entire conversation history with every request.

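A minimal sketch of that illusion (`fake_llm` is a hypothetical stand-in for a real chat-completion API): the model keeps nothing between calls, so the client re-sends the whole history every turn.

```python
def fake_llm(messages):
    """Stand-in for a chat-completion call; reports how much context it saw."""
    return f"(reply based on {len(messages)} messages)"

history = [{"role": "system", "content": "You are a helpful agent."}]

def chat(user_text):
    history.append({"role": "user", "content": user_text})
    reply = fake_llm(history)  # the ENTIRE history goes out on every call
    history.append({"role": "assistant", "content": reply})
    return reply

chat("My name is Ada.")
print(chat("What is my name?"))  # answerable only because turn 1 was re-sent
```

Drop the re-send and the second question becomes unanswerable: the "memory" lives entirely in the client's list, not in the model.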

That trick works for casual chat. It falls apart the moment you try to build a real agent.


Here are seven failure modes that show up the instant you skip memory:


  1. Context amnesia: the agent asks for information you already gave it
  2. Zero personalization: every interaction feels generic
  3. Multi-step task failure: intermediate state silently drops mid-task
  4. Repeated mistakes: no episodic recall means the same errors, forever
  5. No knowledge accumulation: every session starts from scratch
  6. Hallucination from gaps: when context overflows, the model invents
  7. Identity collapse: no continuity, no trust
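A toy mitigation for failure mode 1 (context amnesia) hints at where the rest of the article is headed: persist facts outside the context window and consult them before asking again. `MemoryStore` and `ask_user_or_memory` are illustrative names, not from any library.

```python
class MemoryStore:
    """A trivially simple persistent fact store (a dict standing in for disk)."""

    def __init__(self):
        self.facts = {}

    def remember(self, key, value):
        self.facts[key] = value

    def recall(self, key):
        return self.facts.get(key)

def ask_user_or_memory(memory, key, ask_fn):
    """Only bother the user if the fact is not already stored."""
    cached = memory.recall(key)
    if cached is not None:
        return cached
    value = ask_fn(key)  # falls through to the user only on a true miss
    memory.remember(key, value)
    return value
```

The second time the agent needs `key`, it never reaches the user; that single indirection is the seed of every memory architecture discussed below.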

The obvious response is "throw more context at it." That's why 128K and 200K token windows feel like they should solve everything.


They don't.


Accuracy drops by over 30% when relevant information sits in the middle of a long context. This is the well-documented "lost in the middle" effect.

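One common mitigation, sketched here with an illustrative function name: interleave retrieval results so the highest-ranked documents land at the edges of the context, pushing the least relevant material into the middle where "lost in the middle" hurts least.

```python
def edge_first_order(docs_by_rank):
    """Place rank 1 first, rank 2 last, rank 3 second, rank 4 next-to-last..."""
    front, back = [], []
    for i, doc in enumerate(docs_by_rank):
        (front if i % 2 == 0 else back).append(doc)
    return front + back[::-1]

print(edge_first_order(["d1", "d2", "d3", "d4", "d5"]))
# the least relevant document ("d5") ends up in the middle
```

Reordering is a band-aid, not a fix: it spends the same tokens, just in better positions.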

Context is a shared budget: system prompts, retrieved docs, conversation history, and output all fight for the same tokens.

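A back-of-the-envelope sketch of that shared budget (all numbers and the trimming policy are illustrative): whatever the window size, the pieces must sum to at most the window, and something has to give when they don't.

```python
WINDOW = 128_000  # tokens

budget = {
    "system_prompt": 2_000,
    "retrieved_docs": 60_000,
    "history": 50_000,
    "output_reserve": 8_000,
}

def fits(budget, window=WINDOW):
    return sum(budget.values()) <= window

def trim_history(budget, window=WINDOW):
    """Shrink history first when over budget -- a common (and lossy) policy."""
    overflow = sum(budget.values()) - window
    if overflow > 0:
        budget["history"] = max(0, budget["history"] - overflow)
    return budget
```

Even the "fitting" allocation above uses 120K of 128K; one large retrieval result forces history to be thrown away, which is exactly how multi-step tasks silently lose state.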

Even at 100K tokens, the abs...
