Building a C compiler with a team of parallel Claudes
*Written by Nicholas Carlini, a researcher on our Safeguards team.*
I've been experimenting with a new approach to supervising language models that we’re calling "agent teams."
With agent teams, multiple Claude instances work in parallel on a shared codebase without active human intervention. This approach dramatically expands the scope of what's achievable with LLM agents.
To stress test it, I tasked 16 agents with writing a Rust-based C compiler, from scratch, capable of compiling the Linux kernel. Over nearly 2,000 Claude Code sessions and $20,000 in API costs, the agent team produced a 100,000-line compiler that can build Linux 6.9 on x86, ARM, and RISC-V.
The compiler is an interesting artifact on its own, but I focus here on what I learned about designing harnesses for long-running autonomous agent teams: how to write tests that keep agents on track without human oversight, how to structure work so multiple agents can make progress in parallel, and where this approach hits its ceiling.
Enabling long-running Claudes
Existing agent scaffolds like Claude Code require an operator to be online and available to work jointly. If you ask for a solution to a long and complex problem, the model may solve part of it, but eventually it will stop and wait for continued input—a question, a status update, or a request for clarification.
To elicit sustained, autonomous progress, I built a harness that sticks Claude in a simple loop (if you’ve seen Ralph-loop, this should look fami...