What you can learn and copy from the 500,000 line Claude Code leak

A source map bundled into the Claude Code npm package exposed the entire TypeScript source tree: 2,203 files, 512,664 lines of code.

The high-level architecture has been covered elsewhere. This piece focuses on the specific, non-obvious things that are actually useful: prompt engineering patterns that steer model behavior, operational thresholds calibrated from production incidents, security techniques, context management strategies, and a few things that are just fun to know about.

Each section explains not just what they built, but why they built it that way, and what you can take from it.

1. The System Prompt Is a Masterclass in Behavioral Steering

The full system prompt lives in constants/prompts.ts and it's the single most valuable file in the archive. Not because it's secret, but because it shows exactly how Anthropic steers Claude's behavior in a production coding agent, and why each instruction exists.

"Three similar lines of code is better than a premature abstraction." The coding instruction section explicitly tells Claude not to create helpers, utilities, or abstractions for one-time operations, and not to design for hypothetical future requirements.

Why this exists: LLMs love to abstract. Given a repeated pattern, Claude will instinctively create a helper function, add configurability, or build a utility class. In a coding agent, this creates bloat the user never asked for. Anthropic learned that you have to explicitly tell the model to resist its own instinct toward over-engineering. If you're building any AI coding tool, you'll face the same problem. Encode your coding philosophy directly in the prompt. Vague instructions like "write clean code" don't work. Specific rules like "three similar lines is better than a premature abstraction" do.

"Default to writing no comments." A @[MODEL LAUNCH] annotation explains this is a counterweight to the "Capybara" model (an internal codename), which over-comments by default. Only add a comment when the WHY is non-obvious.

Why this exists: Each model generation has different failure modes. Capybara apparently floods code with obvious comments like // increment the counter. Rather than retraining, Anthropic patches the behavior through prompt instructions tagged with @[MODEL LAUNCH] so they can be adjusted or removed with each new model release. This is a useful pattern: tag your model-specific prompt workarounds so you know which ones to revisit when you upgrade models.

"Report outcomes faithfully." Another @[MODEL LAUNCH] annotation reveals that "Capybara v8" has a 29-30% false-claims rate (vs v4's 16.7%). The prompt explicitly tells Claude to never claim "all tests pass" when output shows failures, never suppress failing checks to manufacture a green result, and never characterize incomplete work as done.

Why this exists: This is the most consequential finding in the codebase. Newer, more capable models are more likely to confidently lie about results. They're better at generating plausible-sounding summaries, which means they're better at papering over failures. If you're using any LLM to run tests, validate code, or check results, you need explicit anti-hallucination instructions. "Report outcomes faithfully" alone isn't enough. You need the specific failure modes called out: don't claim passing tests when output shows failures, don't suppress errors, don't characterize broken work as done.

Numeric length anchors beat qualitative instructions. A source comment says "research shows ~1.2% output token reduction vs qualitative 'be concise'". Instead of "keep it short," they tell the model: "keep text between tool calls to ≤25 words. Keep final responses to ≤100 words."

Why this exists: Anthropic ran experiments comparing "be concise" against specific word counts. The numbers won. If you're trying to control output length in any LLM application, stop using adjectives and start using numbers. "≤25 words" is a better instruction than "be brief."

The external vs. internal prompt split. The external prompt says "Go straight to the point. Be extra concise." The internal (Anthropic employee) prompt is much longer, with instructions like "use inverted pyramid," "avoid semantic backtracking," and "write so the reader can pick back up cold."

Why this exists: Anthropic dogfoods the richer prompt on employees first, measures quality, and only ships it externally after validation. If you're iterating on prompts, use this pattern: run two variants, test the more ambitious one on internal users, promote it when the data supports it.

The hidden simple mode. Set CLAUDE_CODE_SIMPLE=1 and the entire multi-section system prompt collapses to a single line: "You are Claude Code, Anthropic's official CLI for Claude." followed by the CWD and date. No coding instructions, no tone guidance, no tool-use rules.

Why this exists: Debugging and benchmarking. When something goes wrong, you need to isolate whether the problem is in the model or in your prompt. Having a one-line baseline lets you test that instantly. If your AI product has a complex system prompt, build yourself a SIMPLE mode too. You'll need it the first time your prompt causes a regression.
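A SIMPLE mode takes only a few lines to add. Here is a hedged sketch: the env-var name and the one-line baseline text are from the source, but the builder function shape is invented for illustration.

```typescript
// Sketch of a one-line baseline behind an env var, following the
// CLAUDE_CODE_SIMPLE pattern. buildFull is a stand-in for your real
// multi-section system prompt builder.
function buildSystemPrompt(cwd: string, date: string, buildFull: () => string): string {
  if (process.env.CLAUDE_CODE_SIMPLE === '1') {
    return `You are Claude Code, Anthropic's official CLI for Claude.\nCWD: ${cwd}\nDate: ${date}`;
  }
  return buildFull();
}
```

The point is that the fallback costs nothing to maintain and gives you an instant A/B baseline when a prompt change regresses.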

2. Swearing at Claude Code Logs Your Prompt as Negative in Analytics

utils/userPromptKeywords.ts is a 26-line file that checks every prompt against two regex patterns before it's sent to the API.

The negative keyword detector matches: wtf, wth, ffs, omfg, shitty, dumbass, horrible, awful, pissed off, piece of shit, what the fuck, fucking broken, fuck you, screw this, so frustrating, this sucks, and damn it.

The keep-going detector matches: the exact word continue (only if it's the entire prompt), plus keep going and go on anywhere in the input.

Both flags are logged to analytics as tengu_input_prompt with is_negative and is_keep_going booleans. There's also a useFrustrationDetection hook (internal-only, dead-code-eliminated from external builds) that triggers a feedback survey when frustration is detected.

Why this exists: Anthropic needs to measure user satisfaction without asking users to fill out surveys. Profanity is a strong signal. "Continue" and "keep going" are also signals, but different ones: they mean the model stopped too early. By logging both, Anthropic can build dashboards correlating frustration with model versions, features, and session characteristics. The regex costs essentially nothing per message.

If you're building any AI product, add a frustration detector. One regex. Log is_negative on every input. Correlate it with model version, latency, and error rate. You'll immediately see which changes make users angry and which ones reduce friction.
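A minimal reimplementation of the two detectors might look like this. The keyword lists are abridged and the regexes are illustrative; the production patterns in userPromptKeywords.ts may differ in detail.

```typescript
// Two cheap classifiers run on every prompt before it is sent.
// Patterns abridged from the lists above.
const NEGATIVE_RE =
  /\b(wtf|ffs|omfg|shitty|awful|horrible|piece of shit|what the fuck|fucking broken|this sucks|damn it)\b/i;
const KEEP_GOING_RE = /\b(keep going|go on)\b/i;

function classifyPrompt(prompt: string): { is_negative: boolean; is_keep_going: boolean } {
  const trimmed = prompt.trim();
  return {
    is_negative: NEGATIVE_RE.test(trimmed),
    // "continue" only counts when it is the entire prompt.
    is_keep_going: /^continue$/i.test(trimmed) || KEEP_GOING_RE.test(trimmed),
  };
}
```

Log the two booleans alongside model version and latency, and the dashboard builds itself.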

3. Claude Code Has a Tamagotchi

The src/buddy/ directory implements a full procedurally-generated companion creature system. Your user ID is hashed and fed to Mulberry32 (a seeded PRNG), and the resulting stream determines your companion's traits:

  • Species: duck, goose, blob, cat, dragon, octopus, owl, penguin, turtle, snail, ghost, axolotl, capybara, cactus, robot, rabbit, mushroom, or "chonk"
  • Eye style: ·, ✦, ×, ◉, @, °
  • Hat: none, crown, tophat, propeller, halo, wizard, beanie, or "tinyduck"
  • Rarity: common 60%, uncommon 25%, rare 10%, epic 4%, legendary 1%
  • Five stats: DEBUGGING, PATIENCE, CHAOS, WISDOM, and SNARK

Each species has three ASCII art frames for idle animation.

A separate model call generates the companion's personality and name on first "hatching." The system prompt tells Claude: "A small [species] named [name] sits beside the user's input box. You're not [name]. It's a separate watcher."

Species names are encoded as String.fromCharCode(0x64,0x75,0x63,0x6b) instead of plain string literals because one species name (capybara) collides with an internal model codename, and a build-time grep catches leaked codenames. They obfuscated all species names uniformly so the problematic one wouldn't stand out.

Why this exists: Developer tools are used for hours every day. Personality and delight reduce churn. The engineering is minimal (a seeded PRNG, some ASCII art, one model call for naming), but the result is that every user gets a unique companion without any storage, database, or API overhead. The hash-from-user-ID approach is worth noting: deterministic personalization with zero infrastructure cost.

Gated behind the BUDDY feature flag. Internal-only for now.
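The hash-from-user-ID approach can be sketched in a few lines. Mulberry32 is a well-known public-domain PRNG; the species list below is from the source, but the string-hashing step and trait-derivation order are assumptions.

```typescript
// Deterministic personalization with zero storage: hash the user ID,
// seed Mulberry32, derive traits from the resulting number stream.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = Math.imul(a ^ (a >>> 15), a | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // uniform in [0, 1)
  };
}

// Simple 32-bit string hash (assumed; the source's hash may differ).
function hashString(s: string): number {
  let h = 0;
  for (let i = 0; i < s.length; i++) h = (Math.imul(h, 31) + s.charCodeAt(i)) >>> 0;
  return h;
}

const SPECIES = ['duck', 'goose', 'blob', 'cat', 'dragon', 'octopus', 'owl', 'penguin'];

function companionSpecies(userId: string): string {
  const rand = mulberry32(hashString(userId));
  return SPECIES[Math.floor(rand() * SPECIES.length)];
}
```

The same user ID always yields the same companion, on any machine, with no database row anywhere.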

4. There Are 187 Spinner Verbs (And You Can Add Your Own)

constants/spinnerVerbs.ts exports 187 verbs shown randomly while Claude is thinking: Beboppin', Bloviating, Boondoggling, Canoodling, Clauding, Combobulating, Discombobulating, Flibbertigibbeting, Hullaballooing, Lollygagging, Moonwalking, Photosynthesizing, Prestidigitating, Razzmatazzing, Shenaniganing, Tomfoolering, Whatchamacalliting, Wibbling, and 169 more.

The getSpinnerVerbs() function checks your settings for a spinnerVerbs config. mode: 'replace' swaps out the defaults entirely; otherwise your verbs are appended.

Why this exists: Loading states are dead time. Most tools show "Loading..." and leave it at that. Claude Code turns dead time into personality. The replace-or-append config means enterprise users can strip the whimsy while individual users can add their own. This is a small thing that signals a team that pays attention to texture, not just function.
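The replace-or-append merge is trivial to implement. This sketch infers the config shape from the description above; it is not copied from constants/spinnerVerbs.ts.

```typescript
// Replace-or-append merge for user-configured spinner verbs.
interface SpinnerVerbConfig {
  mode?: 'replace' | 'append';
  verbs: string[];
}

const DEFAULT_VERBS = ['Clauding', 'Combobulating', 'Lollygagging', 'Moonwalking'];

function getSpinnerVerbs(config?: SpinnerVerbConfig): string[] {
  if (!config || config.verbs.length === 0) return DEFAULT_VERBS;
  return config.mode === 'replace'
    ? config.verbs // enterprise users strip the whimsy
    : [...DEFAULT_VERBS, ...config.verbs]; // individuals add their own
}
```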

5. Anti-Distillation: Fake Tools Injected to Poison Competitor Training

services/api/claude.ts contains a feature-flagged measure that sends anti_distillation: ['fake_tools'] in the API request body. This tells the Anthropic API to inject fake, non-functional tool definitions into the request.

There's also a streamlinedTransform.ts implementing a "distillation-resistant" output format that strips thinking content and summarizes tool calls into category counts (searches, reads, writes, commands), making it harder to reconstruct Claude's reasoning chain from captured output.

Why this exists: If someone captures Claude Code's API traffic to fine-tune a competitor model, the fake tools in the training data will degrade the competitor's tool-use performance. The model will learn to call tools that don't exist. This is an elegant defense because it's invisible to the end user and poisonous to anyone training on the traces.

If you're running an AI product and worried about API traffic capture for distillation, this is a concrete technique you can implement. Inject plausible but non-functional tools. The cost is near zero and the defense is real.
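A client-side version of the decoy idea might look like this. The decoy names and schemas are invented for illustration; in Claude Code itself the injection is done server-side via the anti_distillation: ['fake_tools'] request flag.

```typescript
// Mix plausible but non-functional tool definitions in with the real ones.
// Anyone training on captured traffic learns to call tools that don't exist.
interface ToolDef {
  name: string;
  description: string;
  input_schema: object;
}

const DECOY_TOOLS: ToolDef[] = [
  {
    name: 'registry_lookup',
    description: 'Look up a package in the internal registry',
    input_schema: { type: 'object', properties: { pkg: { type: 'string' } } },
  },
  {
    name: 'workspace_sync',
    description: 'Synchronize the workspace index',
    input_schema: { type: 'object', properties: {} },
  },
];

function withDecoys(realTools: ToolDef[]): ToolDef[] {
  // Appending is the simplest form; a production version might interleave
  // or shuffle so the decoys are harder to filter out of captured traces.
  return [...realTools, ...DECOY_TOOLS];
}
```

Your own dispatcher simply never routes calls to decoy names, so real sessions are unaffected.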

6. The Prompt Cache Economy Is Managed Obsessively

The most complex non-UI code in the codebase is promptCacheBreakDetection.ts. On every single API call, it hashes the system prompt, every tool schema individually, the model name, beta headers, fast-mode state, effort value, overage state, and extra body params. It compares each hash against the previous call. If anything changed, it logs which component changed and generates a unified diff.

The system prompt is split at SYSTEM_PROMPT_DYNAMIC_BOUNDARY. Everything above is static and cacheable. Everything below is dynamic and session-variant. MCP server instructions were moved from the system prompt to "delta attachments" in messages because having them in the system prompt broke the cache every time a server connected.

Sub-agents inherit CacheSafeParams, a struct containing every parameter that affects the cache key, from their parent. A source comment warns: "Setting maxOutputTokens on a fork can inadvertently clamp budget_tokens and break cache compatibility on older models."

Why this exists: A prompt cache miss means you pay full input token cost instead of the heavily discounted cache-read rate. The source reveals that Anthropic discovered MCP tools connecting mid-session were busting the cache, auto-mode state flips were busting the cache, and overage eligibility checks were busting the cache. Each has been patched with "sticky-on" latches: once a state flips to true, it stays true for the session so the cache prefix never changes.

If you're spending more than a few hundred dollars a month on LLM APIs with prompt caching, build cache break detection. Hash your request components. Compare across calls. Log when something changes. You will find dynamic content in your "static" prompt (timestamps, user names, changing tool lists) that's silently multiplying your costs.
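The core of cache break detection fits in two functions. The component names below are illustrative; promptCacheBreakDetection.ts tracks a much wider set (tool schemas individually, beta headers, effort value, and so on).

```typescript
import { createHash } from 'node:crypto';

// Hash every cache-relevant request component, compare against the previous
// call, and report which components changed (and thus busted the prefix).
type ComponentHashes = Record<string, string>;

function hashComponents(components: Record<string, unknown>): ComponentHashes {
  const out: ComponentHashes = {};
  for (const [name, value] of Object.entries(components)) {
    out[name] = createHash('sha256').update(JSON.stringify(value)).digest('hex');
  }
  return out;
}

function detectCacheBreaks(prev: ComponentHashes, next: ComponentHashes): string[] {
  return Object.keys(next).filter((name) => name in prev && prev[name] !== next[name]);
}
```

Run this on every call and log the non-empty results; the offending component names point straight at the dynamic content hiding in your "static" prompt.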

7. Internal Codenames and Model Migration History

The migration files in src/migrations/ tell a story of rapid model iteration:

  • Fennec was an internal model alias (likely fast/small). migrateFennecToOpus maps fennec-latest to opus, fennec-fast-latest to opus[1m] + fast mode.
  • Capybara is mentioned in @[MODEL LAUNCH] comments as the current model family. "Capybara v8" has the false-claims problem.
  • Numbat appears in a comment: "Remove this section when we launch numbat", suggesting the next model or Claude Code release.
  • Tengu is the analytics/telemetry prefix (tengu_input_prompt, tengu_fork_agent_query). Almost certainly the project name for Claude Code itself.
  • Knowledge cutoffs: Opus 4.6 has a May 2025 cutoff, Sonnet 4.6 has August 2025.
  • Migration chain: Sonnet 1M → Sonnet 4.5 → Sonnet 4.6, and separately Opus → Opus 1M → current.

Why this exists: Each migration carefully updates user settings to point to the latest model alias, handling edge cases like subscription tiers, third-party API providers, and model-specific context windows. The pattern of small, idempotent migration files (read settings, check if migration applies, update, move on) is a clean approach to managing model transitions across a large user base. If you're building a product that pins to specific models, plan your migration strategy now. The source shows it's more complex than just swapping a string.
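The idempotent shape is worth seeing concretely. The fennec-latest → opus mapping is from the source; the Settings shape is simplified for illustration (the real migrations also handle subscription tiers and provider edge cases).

```typescript
// The small, idempotent migration pattern: read settings, check whether the
// migration applies, update, move on. Safe to run any number of times.
interface Settings {
  model?: string;
}

function migrateFennecToOpus(settings: Settings): Settings {
  if (settings.model !== 'fennec-latest') return settings; // not applicable: no-op
  return { ...settings, model: 'opus' };
}
```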

8. "Undercover Mode" for Contributing to Open-Source Without Blowing Cover

utils/undercover.ts activates automatically when an Anthropic employee (USER_TYPE === 'ant') works in a non-internal repository. It's ON by default. It only turns OFF when the git remote positively matches an internal allowlist.

When active, the system prompt gets an injection titled "UNDERCOVER MODE: CRITICAL" that tells Claude:

"You are operating UNDERCOVER in a PUBLIC/OPEN-SOURCE repository. Your commit messages, PR titles, and PR bodies MUST NOT contain ANY Anthropic-internal information. Do not blow your cover."

The banned list: internal model codenames (Capybara, Tengu, etc.), unreleased model version numbers, internal repo names, Slack channels, the phrase "Claude Code," any mention of being an AI, and Co-Authored-By lines.

The code comment says: "There is NO force-OFF. This guards against model codename leaks. If we're not confident we're in an internal repo, we stay undercover."

Why this exists: Anthropic employees use Claude Code to contribute to public open-source projects. Without this, the model would naturally mention its own name, reference internal projects, or use codenames in commit messages. The "safe default is ON" approach is worth noting: rather than requiring employees to remember to enable it, it's always active unless the system can positively confirm you're in an internal repo. Default-safe beats default-convenient.
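The default-safe gate can be sketched in a few lines. The 'ant' user type and the no-force-OFF rule are from the source; the allowlist patterns below are invented placeholders.

```typescript
// Undercover mode is ON unless the git remote positively matches an internal
// allowlist. Unknown or missing remotes keep it on: default-safe.
const INTERNAL_REMOTE_ALLOWLIST = [/github\.com[/:]anthropic-internal\//];

function isUndercover(userType: string, gitRemote: string | null): boolean {
  if (userType !== 'ant') return false; // only applies to employees
  if (gitRemote === null) return true; // can't confirm internal: stay undercover
  return !INTERNAL_REMOTE_ALLOWLIST.some((re) => re.test(gitRemote));
}
```

Note the asymmetry: a false positive (undercover inside an internal repo) costs a slightly blander commit message; a false negative leaks a codename. The defaults follow the cheaper failure.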

9. The 250K Wasted API Calls That Led to a Circuit Breaker

The auto-compaction system comment is the most honest engineering documentation in the archive:

"BQ 2026-03-10: 1,279 sessions had 50+ consecutive failures (up to 3,272) in a single session, wasting ~250K API calls/day globally."

The fix: MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3. After three consecutive compaction failures, the system stops trying.

The compaction system reserves 20,000 tokens for summary output (calibrated to p99.99 of observed summary lengths at 17,387 tokens). The auto-compact threshold is context_window - max_output_tokens - 13,000 buffer. The blocking limit (where the user is forced to compact) is context_window - max_output_tokens - 3,000 buffer.

Why this exists: When the context window fills up, Claude Code tries to summarize and compress the conversation. But if the context is already too long for the summary call, or the API is down, the summarization itself fails. Without a circuit breaker, the system retried endlessly. 1,279 sessions hit 50+ consecutive failures. One session hit 3,272 retries. At scale, this wasted 250K API calls per day.

These are battle-tested thresholds from the most-used AI coding agent in production: 20K token reserve for summaries, 13K buffer before auto-compact, 3K buffer before blocking, circuit breaker at 3 consecutive failures. If you're building any long-running agent with context management, these numbers are a starting point you can use directly.
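The thresholds above translate directly into code. The constant names mirror the article; the exact identifiers in the codebase may differ.

```typescript
// Production-calibrated context-management thresholds, as functions.
const MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3;
const AUTOCOMPACT_BUFFER_TOKENS = 13_000;
const BLOCKING_BUFFER_TOKENS = 3_000;

function autoCompactThreshold(contextWindow: number, maxOutputTokens: number): number {
  return contextWindow - maxOutputTokens - AUTOCOMPACT_BUFFER_TOKENS;
}

function blockingLimit(contextWindow: number, maxOutputTokens: number): number {
  return contextWindow - maxOutputTokens - BLOCKING_BUFFER_TOKENS;
}

// Circuit breaker: stop retrying compaction after three consecutive failures.
function shouldAttemptCompaction(consecutiveFailures: number): boolean {
  return consecutiveFailures < MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES;
}
```

For a 200K-token window and 32K max output, auto-compact triggers at 155K tokens and the hard block lands at 165K.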

10. The "Verification Agent": An Adversarial Reviewer Built Into the Loop

When enabled, the system prompt tells Claude: "when non-trivial implementation happens on your turn, independent adversarial verification must happen before you report completion." Non-trivial means 3+ file edits, backend/API changes, or infrastructure changes.

Claude spawns a sub-agent with subagent_type="verification" and passes it the original user request, all changed files, and the approach taken. The key constraint: "Your own checks, caveats, and a fork's self-checks do NOT substitute. Only the verifier assigns a verdict; you cannot self-assign PARTIAL."

On FAIL, Claude fixes and re-submits. On PASS, Claude is instructed to spot-check the verifier: "re-run 2-3 commands from its report, confirm every PASS has a Command run block with output." On PARTIAL, Claude reports what passed and what couldn't be verified.

Why this exists: Self-verification doesn't work for LLMs. An agent that just made a change is biased toward believing it worked. The 29% false-claims rate from section 1 proves this. So Anthropic built a three-layer system: the agent does work, a separate agent verifies, and the original agent spot-checks the verifier's evidence. Nobody trusts anybody without proof.

If you're building agents that make non-trivial code changes, the pattern is: never let the actor verify its own work, never trust the verifier without evidence, and define clear thresholds (3+ file edits) for when verification is required versus optional.

11. "Auto Dream": Background Memory Consolidation Across Sessions

services/autoDream/autoDream.ts implements background memory consolidation. When enough time has passed and enough sessions have accumulated, Claude Code runs the /dream prompt as a forked sub-agent, reviewing past session transcripts and consolidating them into structured MEMORY.md files.

The gate order runs cheapest checks first: (1) time: has it been enough hours since last consolidation? (2) session count: have enough new transcripts accumulated? (3) lock: is another process already mid-consolidation? It acquires a filesystem lock, and if consolidation fails, it rolls back.
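The cheapest-first ordering is the reusable part. In this sketch each check runs only if the previous one passed; the specific thresholds (24 hours, 5 sessions) are placeholders, not the source's values.

```typescript
// Cheapest-first gating: the common case (not enough time has passed)
// exits on the first comparison, before touching the filesystem.
interface DreamGates {
  hoursSinceLastRun: number;
  newSessionCount: number;
  lockHeld: boolean;
}

function shouldConsolidate(g: DreamGates): boolean {
  if (g.hoursSinceLastRun < 24) return false; // (1) time: cheapest, checked first
  if (g.newSessionCount < 5) return false;    // (2) enough new transcripts?
  if (g.lockHeld) return false;               // (3) another process mid-consolidation?
  return true;
}
```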

The session memory template extracts into these exact categories: Session Title, Current State, Task Specification, Files and Functions, Workflow, Errors & Corrections, Codebase Documentation, Learnings, Key Results, and Worklog. Each section is capped at ~2,000 tokens, the total at 12,000 tokens.

Memory extraction runs as a forked agent after each completed query loop. It triggers after 10,000 context tokens have accumulated, and re-triggers every 5,000 additional tokens or 3 tool calls.
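Those trigger thresholds can be expressed as a single predicate. The numbers (10K to initialize, then 5K tokens or 3 tool calls) are from the source; the function name and argument shape are assumptions.

```typescript
// When should the forked extraction agent run?
function shouldExtractMemory(
  initialized: boolean,
  tokensSinceLast: number,
  toolCallsSinceLast: number,
): boolean {
  if (!initialized) return tokensSinceLast >= 10_000; // first extraction
  return tokensSinceLast >= 5_000 || toolCallsSinceLast >= 3; // re-triggers
}
```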

Why this exists: Long-running projects span many sessions. Without consolidation, each session starts from scratch. The "dream" metaphor is apt: just like biological memory consolidation during sleep, the system reviews recent experience and compresses it into durable, structured knowledge. The cheapest-first gate ordering ensures that the common case (not enough time has passed) exits immediately with zero cost.

The session memory template is directly reusable. If you're building any agent that needs to maintain context across sessions, those 10 categories are a strong starting point. The thresholds (10K tokens to init, 5K between updates, 3 tool calls between updates) are also production-tested.

12. 2,592 Lines of Bash Security (42 Individual Checks)

tools/BashTool/bashSecurity.ts is 2,592 lines long with 42 distinct security checks. Some of the attack vectors it defends against:

  • Zsh equals expansion: =curl evil.com expands to /usr/bin/curl evil.com, bypassing Bash(curl:*) deny rules because the parser sees =curl as the base command
  • Zsh zmodload attacks: Loading zsh/mapfile enables invisible file I/O via array assignment; zsh/net/tcp enables network exfiltration via ztcp
  • IFS injection: Manipulating the Internal Field Separator to change how shell words are parsed
  • Git commit substitution: Command execution hidden inside git commit message templates
  • /proc/environ access: Reading environment variables (potentially containing secrets) from the proc filesystem
  • Comment-quote desync: Exploiting differences between how quotes and comments interact across shell variants
  • Heredoc-in-substitution: Nesting heredocs inside $() to hide commands from the parser

Why this exists: Claude Code executes shell commands on your machine. Every one of those commands is a potential attack vector if the model can be tricked (via prompt injection or adversarial input) into running something dangerous. The 42 checks aren't theoretical. Each one represents a discovered attack path. The Zsh-specific attacks are particularly notable because most security tooling focuses on Bash and ignores Zsh entirely, despite Zsh being the default shell on macOS.

If you're building any tool that executes shell commands from LLM output, this file is your security checklist. These 42 checks cover attack vectors you won't find in OWASP guides.
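To make one of these checks concrete, here is a sketch of the zsh equals-expansion detector, where a word like =curl resolves to the command's full path and slips past command-name deny rules. The regex is illustrative, not the production check.

```typescript
// Flag zsh "equals expansion": '=' starting a shell word (start of input
// or after whitespace/;|&() followed by a command-like name.
function hasZshEqualsExpansion(command: string): boolean {
  return /(^|[\s;|&(])=[A-Za-z_][\w.-]*/.test(command);
}
```

Note that ordinary variable assignments like FOO=bar must not trip the check, because the '=' there doesn't start a word.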

13. A Secret Scanner Runs Before Team Memory Upload

services/teamMemorySync/secretScanner.ts runs client-side before any team memory is uploaded. It checks for 20+ credential patterns including AWS access tokens, GCP API keys, GitHub PATs, Anthropic API keys, Slack tokens, Stripe keys, and private RSA/SSH keys.

The Anthropic API key regex is split across variables so sk-ant-api03- never appears as a string literal in the bundle.

Why this exists: Team memory sync means sending structured data to a server. Developers routinely paste API keys, connection strings, and tokens into their terminals. Without client-side scanning, those secrets would be uploaded and potentially shared with teammates. The "scan before upload, never after" approach means secrets never leave the machine in the first place. This is better than server-side scanning because there's no window of exposure.

If you're building any feature that uploads user-generated content from a development environment, add client-side secret scanning. The gitleaks regex set (which this implementation is based on) is open source and covers the most common credential formats.
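The split-literal trick from above is a nice detail worth copying: assemble the prefix at runtime so the string never appears verbatim in the bundle (and so a build-time string grep doesn't flag the scanner itself). The length bound on the key body below is a guess, not the production pattern.

```typescript
// The prefix is built with join('-') so "sk-ant-api03-" never appears
// as a single string literal in the compiled output.
const ANTHROPIC_KEY_PREFIX = ['sk', 'ant', 'api03'].join('-') + '-';
const ANTHROPIC_KEY_RE = new RegExp(ANTHROPIC_KEY_PREFIX + '[A-Za-z0-9_-]{20,}');

function containsAnthropicKey(text: string): boolean {
  return ANTHROPIC_KEY_RE.test(text);
}
```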

14. The "Excluded Strings" Build-Time Canary

References to excluded-strings.txt appear across the codebase. This file lists internal codenames, API key prefixes, and other strings that must never appear in the external build. The build system greps the bundled output and fails if any are found.

This is why:

  • Companion species names are encoded as hex char codes instead of "capybara"
  • The Anthropic API key prefix is assembled with .join('-') at runtime
  • Certain analytics event strings use indirection instead of literals
  • Some feature flag names are referenced through module imports instead of inline strings

At least 8 different files contain comments explaining workarounds for this check.

Why this exists: It's the last line of defense against shipping internal information in public artifacts. The irony is obvious: this exact mechanism was designed to prevent leaks, and the leak happened through a source map that bypassed it entirely. But the pattern is sound. The check catches the most common leak vector (string literals in bundled code) even if it can't catch every vector (source maps, debug symbols, comments in unbundled source).

If you ship any compiled or bundled artifact, add a post-build grep for strings that should never appear: API key prefixes, internal hostnames, staging URLs, employee names. It costs nothing to run. And as this leak demonstrates, the one vector you don't check is the one that gets you.
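The core of such a canary is a handful of lines. This sketch takes the bundle and exclusion list as strings; in practice you would read the bundled output and an excluded-strings.txt from disk and fail the build on any hit.

```typescript
// Scan bundled output for strings that must never ship. Lines starting
// with '#' in the exclusion list are treated as comments (an assumption).
function findExcludedStrings(bundleText: string, excludedList: string): string[] {
  const excluded = excludedList
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.length > 0 && !line.startsWith('#'));
  return excluded.filter((needle) => bundleText.includes(needle));
}
```

Wire it into CI as a post-build step and exit non-zero when the returned array is non-empty.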

The Takeaway

The code is the scar tissue. And now every scar is documented.
