从构建 Claude Code 中学到的教训:Prompt Caching 就是一切

[

[

](https://x.com/trq212)

](https://x.com/trq212)

[

[

Image

](https://x.com/trq212/article/2024574133011673516/media/2024562809271726080)

](https://x.com/trq212/article/2024574133011673516/media/2024562809271726080)

Lessons from Building Claude Code: Prompt Caching Is Everything

Lessons from Building Claude Code: Prompt Caching Is Everything

It is often said in engineering that "Cache Rules Everything Around Me", and the same rule holds for agents.

工程领域常说 "Cache Rules Everything Around Me",这个规则同样适用于 agents。

Long running agentic products like Claude Code are made feasible by prompt caching which allows us to reuse computation from previous roundtrips and significantly decrease latency and cost.

像 Claude Code 这样的长期运行的 agentic 产品是通过 prompt caching 实现的,它允许我们重用前几轮的计算,从而显著降低延迟和成本。

At Claude Code, we build our entire harness around prompt caching. A high prompt cache hit rate decreases costs and helps us create more generous rate limits for our subscription plans, so we run alerts on our prompt cache hit rate and declare SEVs if they're too low.

在 Claude Code,我们将整个 harness 围绕 prompt caching 构建。高 prompt cache hit rate 可以降低成本,并帮助我们为订阅计划创建更慷慨的 rate limits,因此我们对 prompt cache hit rate 运行 alerts,并在它们太低时声明 SEVs。

These are the (often unintuitive) lessons we've learned from optimizing prompt caching at scale.

这些是我们从大规模优化 prompt caching 中学到的(往往不直观的)教训。

[

[

Image

](https://x.com/trq212/article/2024574133011673516/media/2024553977430646784)

](https://x.com/trq212/article/2024574133011673516/media/2024553977430646784)

Prompt caching works by prefix matching — the API caches everything from the start of the request up to each cache_control breakpoint. This means the order you put things in matters enormously, you want as many of your requests to share a prefix as possible.

Prompt caching 通过前缀匹配工作 — API 从请求开始缓存到每个 cache_control 断点的一切。这意味着您放置事物的顺序极其重要,您希望尽可能多的请求共享前缀。

The best way to do this is static content first, dynamic content last. For Claude Code this looks like:

这样做的最佳方式是静态内容优先,动态内容最后。对于 Claude Code,这看起来像:

  1. Static system prompt & Tools (globally...

开通本站会员,查看完整译文。

首页 - Wiki
Copyright © 2011-2026 iteam. Current version is 2.155.0. UTC+08:00, 2026-03-24 03:16
浙ICP备14020137号-1 $访客地图$