Architecture: The Engine Room

The Heart: tt Control Loop

The entire Claude Code system revolves around a single async generator function called tt. This function orchestrates every interaction, from user input to LLM communication to tool execution. Let's dissect this remarkable piece of engineering:

```typescript
async function* tt(
  currentMessages: CliMessage[],
  baseSystemPromptString: string,
  currentGitContext: GitContext,
  currentClaudeMdContents: ClaudeMdContent[],
  permissionGranterFn: PermissionGranter,
  toolUseContext: ToolUseContext,
  activeStreamingToolUse?: ToolUseBlock,
  loopState: {
    turnId: string,
    turnCounter: number,
    compacted?: boolean,
    isResuming?: boolean
  }
): AsyncGenerator<CliMessage, void, void>
```

This signature reveals the sophisticated state management at play. The function yields CliMessage objects that drive UI updates while maintaining conversation flow. Let's examine each phase:

```typescript
yield {
  type: "ui_state_update",
  uuid: `uistate-${loopState.turnId}-${Date.now()}`,
  timestamp: new Date().toISOString(),
  data: { status: "thinking", turnId: loopState.turnId }
};

let messagesForLlm = currentMessages;
let wasCompactedThisIteration = false;

if (await shouldAutoCompact(currentMessages)) {
  yield {
    type: "ui_notification",
    data: { message: "Context is large, attempting to compact..." }
  };
  try {
    const compactionResult = await compactAndStoreConversation(
      currentMessages,
      toolUseContext,
      true
    );
    messagesForLlm = compactionResult.messagesAfterCompacting;
    wasCompactedThisIteration = true;
    loopState.compacted = true;
    yield createSystemNotificationMessage(
      `Conversation history automatically compacted. Summary: ${
        compactionResult.summaryMessage.message.content[0].text
      }`
    );
  } catch (compactionError) {
    yield createSystemErrorMessage(
      `Failed to compact conversation: ${compactionError.message}`
    );
  }
}
```
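The `shouldAutoCompact` check itself isn't recoverable from the bundle, but the observed behavior is consistent with a simple token-budget heuristic. A minimal sketch, assuming a rough characters-per-token estimate and a threshold near 80% of the context window (the helper name's body, the threshold, and the window size are all assumptions):

```typescript
// Hypothetical reconstruction: thresholds and heuristics are assumptions,
// not recovered from the bundle.
const AUTO_COMPACT_THRESHOLD = 0.8; // fraction of the model's context window

function countApproxTokens(messages: CliMessage[]): number {
  // Rough heuristic: ~4 characters per token for English text and code.
  const chars = messages.reduce(
    (sum, m) => sum + JSON.stringify(m.message?.content ?? "").length,
    0
  );
  return Math.ceil(chars / 4);
}

async function shouldAutoCompact(messages: CliMessage[]): Promise<boolean> {
  const contextWindow = 200_000; // assumed context size for current Claude models
  return countApproxTokens(messages) > contextWindow * AUTO_COMPACT_THRESHOLD;
}
```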

The system prompt isn't static—it's assembled fresh for each turn:

```typescript
const [toolSpecs, dirStructure] = await Promise.all([
  Promise.all(
    toolUseContext.options.tools
      .filter(tool => tool.isEnabled ? tool.isEnabled() : true)
      .map(async (tool) =>
        convertToolDefinitionToToolSpecification(tool, toolUseContext))
  ),
  getDirectoryStructureSnapshot(toolUseContext)
]);

const systemPromptForLlm = assembleSystemPrompt(
  baseSystemPromptString,
  currentClaudeMdContents,
  currentGitContext,
  dirStructure,
  toolSpecs
);

const apiMessages = prepareMessagesForApi(messagesForLlm, true);
```

The assembly process follows a strict priority order:

```
Priority 1: Base Instructions (~2KB)
        ↓
Priority 2: Model-Specific Adaptations (~500B)
        ↓
Priority 3: CLAUDE.md Content (Variable, typically 5-50KB)
        ↓
Priority 4: Git Context (~1-5KB)
        ↓
Priority 5: Directory Structure (Truncated to fit)
        ↓
Priority 6: Tool Specifications (~10-20KB)
```
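The exact assembly code isn't recoverable, but the priority order implies a straightforward concatenation, with truncation applied to the one elastic section. A hedged sketch; the section renderers (`renderModelAdaptations`, `renderGitContext`, `renderToolSpecs`, `truncateToFit`) and the byte budget are invented for illustration:

```typescript
// Illustrative only: the real section renderers and budgets are unknown.
function assembleSystemPrompt(
  base: string,
  claudeMd: ClaudeMdContent[],
  git: GitContext,
  dirStructure: string,
  toolSpecs: ToolSpecification[]
): string {
  const sections = [
    base,                                      // Priority 1: base instructions
    renderModelAdaptations(),                  // Priority 2: model-specific tweaks
    claudeMd.map(c => c.content).join("\n\n"), // Priority 3: CLAUDE.md content
    renderGitContext(git),                     // Priority 4: branch, status, recent commits
    truncateToFit(dirStructure, 8_000),        // Priority 5: the only truncated section
    renderToolSpecs(toolSpecs)                 // Priority 6: tool specifications
  ];
  return sections.filter(Boolean).join("\n\n---\n\n");
}
```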

```typescript
const llmStream = callLlmStreamApi(
  apiMessages,
  systemPromptForLlm,
  toolSpecs,
  toolUseContext.options.mainLoopModel,
  toolUseContext.abortController.signal
);

let accumulatedAssistantMessage: Partial<CliMessage> & {
  message: Partial<ApiMessage> & { content: ContentBlock[] }
} = {
  type: "assistant",
  uuid: `assistant-${loopState.turnId}-${loopState.turnCounter}-${Date.now()}`,
  timestamp: new Date().toISOString(),
  message: { role: "assistant", content: [] }
};

let currentToolUsesFromLlm: ToolUseBlock[] = [];
let currentThinkingContent: string = "";
let currentToolInputJsonBuffer: string = "";
```

This is where the magic happens—processing streaming events in real-time:

```typescript
let finalAssistantMessage: CliMessage | undefined;

for await (const streamEvent of llmStream) {
  if (toolUseContext.abortController.signal.aborted) {
    yield createSystemNotificationMessage("LLM stream processing aborted by user.");
    return;
  }

  switch (streamEvent.type) {
    case "message_start": {
      accumulatedAssistantMessage.message.id = streamEvent.message.id;
      accumulatedAssistantMessage.message.model = streamEvent.message.model;
      accumulatedAssistantMessage.message.usage = streamEvent.message.usage;
      yield {
        type: "ui_state_update",
        data: { status: "assistant_responding", model: streamEvent.message.model }
      };
      break;
    }

    case "content_block_start": {
      const newBlockPlaceholder = { ...streamEvent.content_block };
      if (streamEvent.content_block.type === "thinking") {
        currentThinkingContent = "";
        newBlockPlaceholder.thinking = "";
      } else if (streamEvent.content_block.type === "tool_use") {
        currentToolInputJsonBuffer = "";
        newBlockPlaceholder.input = "";
      } else if (streamEvent.content_block.type === "text") {
        newBlockPlaceholder.text = "";
      }
      accumulatedAssistantMessage.message.content.push(newBlockPlaceholder);
      break;
    }

    case "content_block_delta": {
      const lastBlockIndex = accumulatedAssistantMessage.message.content.length - 1;
      if (lastBlockIndex < 0) continue;
      const currentBlock = accumulatedAssistantMessage.message.content[lastBlockIndex];

      if (streamEvent.delta.type === "text_delta" && currentBlock.type === "text") {
        currentBlock.text += streamEvent.delta.text;
        yield {
          type: "ui_text_delta",
          data: { textDelta: streamEvent.delta.text, blockIndex: lastBlockIndex }
        };
      } else if (streamEvent.delta.type === "input_json_delta" && currentBlock.type === "tool_use") {
        currentToolInputJsonBuffer += streamEvent.delta.partial_json;
        currentBlock.input = currentToolInputJsonBuffer;
        const parseAttempt = tryParsePartialJson(currentToolInputJsonBuffer);
        if (parseAttempt.complete) {
          yield {
            type: "ui_tool_preview",
            data: { toolId: currentBlock.id, preview: parseAttempt.value }
          };
        }
      }
      break;
    }

    case "content_block_stop": {
      const completedBlock = accumulatedAssistantMessage.message.content[streamEvent.index];
      if (completedBlock.type === "tool_use") {
        try {
          const parsedInput = JSON.parse(currentToolInputJsonBuffer);
          completedBlock.input = parsedInput;
          currentToolUsesFromLlm.push({
            type: "tool_use",
            id: completedBlock.id,
            name: completedBlock.name,
            input: parsedInput
          });
        } catch (e) {
          completedBlock.input = {
            error: "Invalid JSON input from LLM",
            raw_json_string: currentToolInputJsonBuffer,
            parse_error: e.message
          };
        }
        currentToolInputJsonBuffer = "";
      }
      yield {
        type: "ui_content_block_complete",
        data: { block: completedBlock, blockIndex: streamEvent.index }
      };
      break;
    }

    case "message_stop": {
      finalAssistantMessage = finalizeCliMessage(
        accumulatedAssistantMessage,
        loopState.turnId,
        loopState.turnCounter
      );
      yield finalAssistantMessage;
      break;
    }
  }
}
```
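The loop leans on a `tryParsePartialJson` helper to surface tool-input previews before the JSON is complete. The name comes from the call site above; the body below is a guess at one plausible implementation, repairing dangling strings and unclosed containers before re-parsing:

```typescript
// Guessed implementation: the bundle only shows the call site.
function tryParsePartialJson(buffer: string): { complete: boolean; value?: unknown } {
  try {
    // Fast path: the buffer already forms valid JSON.
    return { complete: true, value: JSON.parse(buffer) };
  } catch {
    // Repair attempt: close an unterminated string, then close containers
    // in reverse order of opening.
    let repaired = buffer;
    const stack: string[] = [];
    let inString = false;
    for (let i = 0; i < buffer.length; i++) {
      const ch = buffer[i];
      if (inString) {
        if (ch === "\\") i++;              // skip the escaped character
        else if (ch === '"') inString = false;
      } else if (ch === '"') inString = true;
      else if (ch === "{" || ch === "[") stack.push(ch);
      else if (ch === "}" || ch === "]") stack.pop();
    }
    if (inString) repaired += '"';
    while (stack.length) repaired += stack.pop() === "{" ? "}" : "]";
    try {
      return { complete: true, value: JSON.parse(repaired) };
    } catch {
      return { complete: false };  // truly incomplete, e.g. cut mid-key
    }
  }
}
```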

Stream Processing Performance:

First token latency: 300-800ms (varies by model)

Token throughput: 50-100 tokens/second

UI update frequency: Every token for text, batched for tool inputs

Memory usage: Constant regardless of response length

When the LLM requests tool use, the architecture shifts into execution mode:

```typescript
if (finalAssistantMessage.message.stop_reason === "tool_use" && currentToolUsesFromLlm.length > 0) {
  yield { type: "ui_state_update", data: { status: "executing_tools" } };

  let toolResultMessages: CliMessage[] = [];
  for await (const toolOutcomeMessage of processToolCallsInParallelBatches(
    currentToolUsesFromLlm,
    finalAssistantMessage,
    permissionGranterFn,
    toolUseContext
  )) {
    yield toolOutcomeMessage;
    if (toolOutcomeMessage.type === 'user' && toolOutcomeMessage.isMeta) {
      toolResultMessages.push(toolOutcomeMessage);
    }
  }

  if (toolUseContext.abortController.signal.aborted) {
    yield createSystemNotificationMessage("Tool execution aborted by user.");
    return;
  }

  const sortedToolResultMessages = sortToolResultsByOriginalRequestOrder(
    toolResultMessages,
    currentToolUsesFromLlm
  );

  yield* tt(
    [...messagesForLlm, finalAssistantMessage, ...sortedToolResultMessages],
    baseSystemPromptString,
    currentGitContext,
    currentClaudeMdContents,
    permissionGranterFn,
    toolUseContext,
    undefined,
    { ...loopState, turnCounter: loopState.turnCounter + 1 }
  );
  return;
}
```

The tt function calls itself recursively via yield* delegation, one invocation per tool-use turn. JavaScript engines don't perform tail-call optimization, so conversation depth isn't truly unlimited; it is bounded by explicit safeguards:

```typescript
if (loopState.turnCounter >= 10) {
  yield createSystemMessage(
    "Maximum conversation depth reached. Please start a new query."
  );
  return;
}

const estimatedMemory = estimateConversationMemory(messagesForLlm);
if (estimatedMemory > MEMORY_THRESHOLD) {
  const compacted = await forceCompaction(messagesForLlm);
  messagesForLlm = compacted;
}
```
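Neither `estimateConversationMemory` nor `MEMORY_THRESHOLD` is visible in the decompiled output. A plausible reconstruction treats serialized message size as a proxy for retained heap:

```typescript
// Reconstruction: the real heuristic and threshold are unknown.
const MEMORY_THRESHOLD = 256 * 1024 * 1024; // assume a 256 MB budget

function estimateConversationMemory(messages: CliMessage[]): number {
  // JSON length roughly tracks retained heap for string-heavy messages;
  // the 2x factor allows for UTF-16 string storage and object overhead.
  return messages.reduce((sum, m) => sum + JSON.stringify(m).length * 2, 0);
}
```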

Claude Code implements a clean layered architecture in which each layer has distinct responsibilities.

Communication between layers follows strict patterns:

Downward Communication: Direct function calls

Upward Communication: Events and callbacks

Cross-Layer Communication: Shared context objects
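The third pattern, shared context objects, centers on the ToolUseContext that tt threads through every phase and tool call. Its shape can be partially inferred from the property accesses seen throughout this chapter; anything beyond those accesses is a guess:

```typescript
// Inferred from usage: toolUseContext.options.tools,
// toolUseContext.options.mainLoopModel, toolUseContext.abortController.
interface ToolUseContext {
  options: {
    tools: ToolDefinition[];        // tools registered for this session
    mainLoopModel: string;          // model id used by the main loop
    [key: string]: unknown;         // other options not recovered
  };
  abortController: AbortController; // single cancellation point for the turn
  // Likely also: permission context, file-read state, session accessors.
}
```

The first two patterns, direct downward calls and upward event streams, are visible in the bridge classes below.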

```typescript
class UIToAgentBridge {
  async handleUserInput(input: string) {
    const action = await pd(input, this.context);
    switch (action.type) {
      case 'normal_prompt':
        for await (const message of tt(...)) {
          this.uiRenderer.handleMessage(message);
        }
        break;
    }
  }
}

class ToolToUIBridge {
  async *executeWithProgress(tool: ToolDefinition, input: any) {
    for await (const event of tool.call(input, this.context)) {
      if (event.type === 'progress') {
        yield {
          type: 'ui_progress',
          toolName: tool.name,
          progress: event.data
        };
      }
    }
  }
}
```

The entire system is built on streaming primitives:

```typescript
class StreamBackpressureController {
  private queue: StreamEvent[] = [];
  private processing = false;
  private pressure = { high: 1000, critical: 5000 };

  async handleEvent(event: StreamEvent) {
    this.queue.push(event);

    if (this.queue.length > this.pressure.critical) {
      // Critical: drop everything except errors and terminal events
      this.queue = this.queue.filter(e =>
        e.type === 'error' || e.type === 'message_stop'
      );
    } else if (this.queue.length > this.pressure.high) {
      // High: shed text deltas; the accumulator still holds the full text
      this.queue = this.queue.filter(e =>
        e.type !== 'content_block_delta' || e.delta.type !== 'text_delta'
      );
    }

    if (!this.processing) {
      await this.processQueue();
    }
  }

  private async processQueue() {
    this.processing = true;
    while (this.queue.length > 0) {
      const batch = this.queue.splice(0, 100);
      await this.processBatch(batch);
      // Yield to the event loop between batches so input stays responsive
      await new Promise(resolve => setImmediate(resolve));
    }
    this.processing = false;
  }
}
```
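How the controller sits in the pipeline isn't shown; presumably every stream event passes through it before rendering, with `processBatch` (referenced above but not recovered) handing surviving events to the UI. A sketch of that wiring:

```typescript
// Hypothetical wiring: events flow through the controller rather than
// going straight to the renderer.
const controller = new StreamBackpressureController();
for await (const streamEvent of llmStream) {
  await controller.handleEvent(streamEvent); // may shed text deltas under load
}
```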

Multiple concurrent operations need coordinated progress reporting:

```typescript
class ProgressAggregator {
  private progressStreams = new Map<string, AsyncIterator<ProgressEvent>>();
  private pending = new Map<string, Promise<{ id: string; value: any; done?: boolean }>>();

  async *aggregateProgress(
    operations: Array<{ id: string, operation: AsyncIterator<any> }>
  ): AsyncGenerator<AggregatedProgress> {
    for (const { id, operation } of operations) {
      this.progressStreams.set(id, operation);
    }

    while (this.progressStreams.size > 0) {
      // Request the next value only from streams with nothing in flight;
      // racing a fresh next() on every stream each pass would silently
      // drop the results of the losing promises.
      for (const [id, stream] of this.progressStreams) {
        if (!this.pending.has(id)) {
          this.pending.set(
            id,
            stream.next().then(({ value, done }) => ({ id, value, done }))
          );
        }
      }

      const result = await Promise.race(this.pending.values());
      this.pending.delete(result.id);

      if (result.done) {
        this.progressStreams.delete(result.id);
      } else if (result.value.type === 'progress') {
        yield {
          type: 'aggregated_progress',
          source: result.id,
          progress: result.value
        };
      }
    }
  }
}
```
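Usage is straightforward: hand the aggregator id-tagged iterators and forward the merged stream to the UI. A hypothetical example with two concurrent tool runs (`runGrepTool` and `runBashTool` are invented stand-ins for async-iterable tool executions):

```typescript
// Illustrative: the operation factories are not from the bundle.
const aggregator = new ProgressAggregator();
for await (const update of aggregator.aggregateProgress([
  { id: 'grep-1', operation: runGrepTool(input1) },
  { id: 'bash-2', operation: runBashTool(input2) }
])) {
  console.log(`[${update.source}]`, update.progress);
}
```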

Claude Code uses a pragmatic approach to state management:

```typescript
class SessionState {
  private static instance: SessionState;
  private persistTimer?: NodeJS.Timeout;

  sessionId: string = crypto.randomUUID();
  cwd: string = process.cwd();
  totalCostUSD: number = 0;
  totalAPIDuration: number = 0;
  modelTokens: Record<string, {
    inputTokens: number;
    outputTokens: number;
    cacheReadInputTokens: number;
    cacheCreationInputTokens: number;
  }> = {};

  incrementCost(amount: number) {
    this.totalCostUSD += amount;
    this.persistToDisk();
  }

  updateTokenUsage(model: string, usage: TokenUsage) {
    if (!this.modelTokens[model]) {
      this.modelTokens[model] = {
        inputTokens: 0,
        outputTokens: 0,
        cacheReadInputTokens: 0,
        cacheCreationInputTokens: 0
      };
    }
    const tokens = this.modelTokens[model];
    tokens.inputTokens += usage.input_tokens || 0;
    tokens.outputTokens += usage.output_tokens || 0;
    tokens.cacheReadInputTokens += usage.cache_read_input_tokens || 0;
    tokens.cacheCreationInputTokens += usage.cache_creation_input_tokens || 0;
  }

  private async persistToDisk() {
    // Debounced write: coalesce bursts of updates into one disk write
    clearTimeout(this.persistTimer);
    this.persistTimer = setTimeout(async () => {
      await fs.writeFile(
        '.claude/session.json',
        JSON.stringify(this, null, 2)
      );
    }, 1000);
  }
}
```

```typescript
class ReadFileState {
  private cache = new Map<string, WeakRef<FileContent>>();
  private registry = new FinalizationRegistry((path: string) => {
    this.cache.delete(path);
  });

  set(path: string, content: FileContent) {
    const ref = new WeakRef(content);
    this.cache.set(path, ref);
    this.registry.register(content, path);
  }

  get(path: string): FileContent | undefined {
    const ref = this.cache.get(path);
    if (ref) {
      const content = ref.deref();
      if (!content) {
        this.cache.delete(path);
        return undefined;
      }
      return content;
    }
  }

  checkFreshness(path: string): 'fresh' | 'stale' | 'unknown' {
    const cached = this.get(path);
    if (!cached) return 'unknown';
    const stats = fs.statSync(path);
    if (stats.mtimeMs !== cached.timestamp) {
      return 'stale';
    }
    return 'fresh';
  }
}
```
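The freshness check is presumably what lets an edit refuse to patch a file that changed on disk since it was last read. A hypothetical guard built on this class (`requireFreshRead` is an invented name, not from the bundle):

```typescript
// Hypothetical guard; shows how checkFreshness would gate a file edit.
function requireFreshRead(state: ReadFileState, path: string): FileContent {
  switch (state.checkFreshness(path)) {
    case 'fresh':
      return state.get(path)!;
    case 'stale':
      throw new Error(`${path} changed on disk since it was read; re-read before editing`);
    case 'unknown':
      throw new Error(`${path} has not been read in this session`);
  }
}
```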

Security is implemented through multiple independent layers:

```typescript
class PermissionEvaluator {
  private ruleCache = new Map<string, CompiledRule>();

  async evaluate(
    tool: ToolDefinition,
    input: any,
    context: ToolPermissionContext
  ): Promise<PermissionDecision> {
    const scopes: PermissionRuleScope[] = [
      'cliArg',
      'localSettings',
      'projectSettings',
      'policySettings',
      'userSettings'
    ];

    for (const scope of scopes) {
      const decision = await this.evaluateScope(tool, input, context, scope);
      if (decision.behavior !== 'continue') {
        return decision;
      }
    }

    return {
      behavior: 'ask',
      suggestions: this.generateSuggestions(tool, input)
    };
  }

  private compileRule(rule: string): CompiledRule {
    if (this.ruleCache.has(rule)) {
      return this.ruleCache.get(rule)!;
    }

    // Rules take the form ToolName or ToolName(pattern)
    const match = rule.match(/^(\w+)(?:\((.+)\))?$/);
    if (!match) throw new Error(`Invalid rule: ${rule}`);

    const [, toolPattern, pathPattern] = match;
    const compiled = {
      toolMatcher: new RegExp(`^${toolPattern.replace(/\*/g, '.*')}$`),
      pathMatcher: pathPattern ? picomatch(pathPattern) : null
    };
    this.ruleCache.set(rule, compiled);
    return compiled;
  }
}
```
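The grammar accepted by compileRule lines up with the ToolName(pattern) rule strings users write in settings files. Some illustrative examples of what such rules would compile to (these particular rules are not recovered from the bundle):

```typescript
// Hypothetical rules in the ToolName(pattern) grammar accepted above:
//
//   "Edit(src/**)"          → toolMatcher /^Edit$/, pathMatcher picomatch("src/**")
//   "Bash(npm run test:*)"  → toolMatcher /^Bash$/, pathMatcher picomatch("npm run test:*")
//   "WebFetch"              → toolMatcher /^WebFetch$/, no path constraint
```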

```typescript
class MacOSSandboxManager {
  generateProfile(
    command: string,
    restrictions: SandboxRestrictions
  ): string {
    const profile = `
(version 1)
(deny default)

; Base permissions
(allow process-exec (literal "/bin/bash"))
(allow process-exec (literal "/usr/bin/env"))

; File system access
${restrictions.allowRead ? '(allow file-read*)' : '(deny file-read*)'}
${restrictions.allowWrite ? '(allow file-write*)' : '(deny file-write*)'}

; Network access
${restrictions.allowNetwork ? `
(allow network-outbound)
(allow network-inbound)
` : `
(deny network*)
`}

; System operations
(allow sysctl-read)
(allow mach-lookup)

; Temporary files
(allow file-write* (subpath "/tmp"))
(allow file-write* (subpath "/var/tmp"))
`.trim();
    return profile;
  }

  async executeSandboxed(
    command: string,
    profile: string
  ): Promise<ExecutionResult> {
    const profilePath = await this.writeTemporaryProfile(profile);
    try {
      // -f loads the profile from a file; -p expects the profile text inline
      const result = await exec(
        `sandbox-exec -f '${profilePath}' ${command}`
      );
      return result;
    } finally {
      await fs.unlink(profilePath);
    }
  }
}
```

```typescript
class PathValidator {
  private boundaries: Set<string>;
  private deniedPatterns: RegExp[];

  constructor(context: SecurityContext) {
    this.boundaries = new Set([
      context.projectRoot,
      ...context.additionalWorkingDirectories
    ]);
    this.deniedPatterns = [
      /\/\.(ssh|gnupg)\//,
      /\/(etc|sys|proc)\//,
      /\.pem$|\.key$/,
      /\.(env|envrc)$/
    ];
  }

  validate(requestedPath: string): ValidationResult {
    const absolute = path.resolve(requestedPath);

    // Compare against boundary + separator so /project can't match /project-other
    const inBoundary = Array.from(this.boundaries).some(
      boundary => absolute === boundary || absolute.startsWith(boundary + path.sep)
    );
    if (!inBoundary) {
      return { allowed: false, reason: 'Path outside allowed directories' };
    }

    for (const pattern of this.deniedPatterns) {
      if (pattern.test(absolute)) {
        return {
          allowed: false,
          reason: `Path matches denied pattern: ${pattern}`
        };
      }
    }

    return { allowed: true };
  }
}
```

The ANR system uses a worker thread to monitor the main event loop:

```typescript
const anrWorkerScript = `
const { parentPort } = require('worker_threads');

let config = { anrThreshold: 5000, captureStackTrace: false };
let lastPing = Date.now();
let anrTimer = null;

function checkANR() {
  const elapsed = Date.now() - lastPing;
  if (elapsed > config.anrThreshold) {
    // Main thread is not responding
    parentPort.postMessage({
      type: 'anr',
      payload: {
        elapsed,
        stackTrace: config.captureStackTrace ? captureMainThreadStack() : null
      }
    });
  }
  // Schedule next check
  anrTimer = setTimeout(checkANR, 100);
}

async function captureMainThreadStack() {
  // Use inspector protocol if available
  try {
    const { Session } = require('inspector');
    const session = new Session();
    session.connect();
    const { result } = await session.post('Debugger.enable');
    const stack = await session.post('Debugger.getStackTrace');
    session.disconnect();
    return stack;
  } catch (e) {
    return null;
  }
}

parentPort.on('message', (msg) => {
  if (msg.type === 'config') {
    config = msg.payload;
    lastPing = Date.now();
    checkANR(); // Start monitoring
  } else if (msg.type === 'ping') {
    lastPing = Date.now();
  }
});
`;

class ANRMonitor {
  private worker: Worker;
  private pingInterval: NodeJS.Timeout;

  constructor(options: ANROptions = {}) {
    this.worker = new Worker(anrWorkerScript, { eval: true });

    this.worker.postMessage({
      type: 'config',
      payload: {
        anrThreshold: options.threshold || 5000,
        captureStackTrace: options.captureStackTrace !== false
      }
    });

    // Heartbeat: if the main event loop stalls, these pings stop arriving
    this.pingInterval = setInterval(() => {
      this.worker.postMessage({ type: 'ping' });
    }, options.pollInterval || 50);

    this.worker.on('message', (msg) => {
      if (msg.type === 'anr') {
        this.handleANR(msg.payload);
      }
    });
  }

  private handleANR(data: ANRData) {
    Sentry.captureException(
      new Error(`Application not responding for ${data.elapsed}ms`),
      {
        extra: {
          stackTrace: data.stackTrace,
          eventLoopDelay: this.getEventLoopDelay()
        }
      }
    );
  }
}
```

```typescript
class CacheArchitecture {
  private schemaCache = new LRUCache<string, JSONSchema>(100);
  private patternCache = new LRUCache<string, CompiledPattern>(500);
  private gitContextCache = new TTLCache<string, GitContext>(30_000);
  private fileContentCache = new WeakRefCache<FileContent>();
  private diskCache = new DiskCache('.claude/cache');

  async get<T>(
    key: string,
    generator: () => Promise<T>,
    options: CacheOptions = {}
  ): Promise<T> {
    if (this.schemaCache.has(key)) {
      return this.schemaCache.get(key) as T;
    }

    const weakRef = this.fileContentCache.get(key);
    if (weakRef) {
      return weakRef as T;
    }

    if (options.persistent) {
      const diskValue = await this.diskCache.get(key);
      if (diskValue) {
        return diskValue;
      }
    }

    const value = await generator();

    if (options.weak) {
      this.fileContentCache.set(key, value);
    } else if (options.persistent) {
      await this.diskCache.set(key, value, options.ttl);
    } else {
      this.schemaCache.set(key, value as any);
    }

    return value;
  }
}
```
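The options flags select the storage tier on write. A hypothetical caller for each tier (the keys and generator functions here are invented for illustration):

```typescript
// Illustrative callers; keys and generators are not from the bundle.
const cache = new CacheArchitecture();

// Hot, bounded data: compiled tool input schema → LRU tier
const schema = await cache.get('schema:Edit', () => compileEditSchema());

// GC-sensitive data: large file contents → WeakRef tier
const file = await cache.get('file:/src/index.ts', () => readFileContent(p), { weak: true });

// Restart-surviving data: expensive analysis → disk tier with TTL
const deps = await cache.get('deps:lockfile', () => analyzeDeps(), { persistent: true, ttl: 3_600_000 });
```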

The three-pillar approach provides comprehensive visibility:

```typescript
class ErrorBoundary {
  static wrap<T extends (...args: any[]) => any>(
    fn: T,
    context: ErrorContext
  ): T {
    return (async (...args: Parameters<T>) => {
      const span = Sentry.startTransaction({
        name: context.operation,
        op: context.category
      });
      try {
        const result = await fn(...args);
        span.setStatus('ok');
        return result;
      } catch (error) {
        span.setStatus('internal_error');
        Sentry.captureException(error, {
          contexts: {
            operation: context,
            state: this.captureState()
          },
          fingerprint: this.generateFingerprint(error, context)
        });
        throw error;
      } finally {
        span.finish();
      }
    }) as T;
  }

  private static captureState() {
    return {
      sessionId: SessionState.instance.sessionId,
      conversationDepth: undefined, // value elided in the decompiled source
      activeTools: undefined,       // value elided in the decompiled source
      memoryUsage: process.memoryUsage(),
      eventLoopDelay: this.getEventLoopDelay()
    };
  }
}
```

```typescript
class MetricsCollector {
  private meters = {
    api: meter.createCounter('api_calls_total'),
    tokens: meter.createHistogram('token_usage'),
    tools: meter.createHistogram('tool_execution_duration'),
    streaming: meter.createHistogram('streaming_latency')
  };

  recordApiCall(result: ApiCallResult) {
    this.meters.api.add(1, {
      model: result.model,
      status: result.status,
      provider: result.provider
    });
    this.meters.tokens.record(result.totalTokens, {
      model: result.model,
      type: 'total'
    });
  }

  recordToolExecution(tool: string, duration: number, success: boolean) {
    this.meters.tools.record(duration, {
      tool,
      success: String(success),
      concurrent: undefined // value elided in the decompiled source
    });
  }
}
```

```typescript
class FeatureManager {
  async checkGate(
    gate: string,
    context?: FeatureContext
  ): Promise<boolean> {
    return statsig.checkGate(gate, {
      userID: SessionState.instance.sessionId,
      custom: {
        model: context?.model,
        toolsEnabled: context?.tools,
        platform: process.platform
      }
    });
  }

  async getConfig<T>(
    config: string,
    defaultValue: T
  ): Promise<T> {
    const dynamicConfig = statsig.getConfig(config);
    return dynamicConfig.get('value', defaultValue);
  }
}
```

```typescript
class ProcessManager {
  private processes = new Map<string, ChildProcess>();
  private limits = {
    maxProcesses: 10,
    maxMemoryPerProcess: 512 * 1024 * 1024,
    maxTotalMemory: 2 * 1024 * 1024 * 1024
  };

  async spawn(
    id: string,
    command: string,
    options: SpawnOptions
  ): Promise<ManagedProcess> {
    if (this.processes.size >= this.limits.maxProcesses) {
      await this.killOldestProcess();
    }

    const child = spawn('bash', ['-c', command], {
      ...options,
      env: {
        ...options.env,
        NODE_OPTIONS: `--max-old-space-size=${this.limits.maxMemoryPerProcess / 1024 / 1024}`
      }
    });

    const monitor = setInterval(() => {
      this.checkProcessHealth(id, child);
    }, 1000);

    this.processes.set(id, child);
    return new ManagedProcess(child, monitor);
  }

  private async checkProcessHealth(id: string, proc: ChildProcess) {
    try {
      const usage = await pidusage(proc.pid);
      if (usage.memory > this.limits.maxMemoryPerProcess) {
        console.warn(`Process ${id} exceeding memory limit`);
        proc.kill('SIGTERM');
      }
    } catch (e) {
      this.processes.delete(id);
    }
  }
}
```

```typescript
class NetworkPool {
  private pools = new Map<string, ConnectionPool>();

  getPool(provider: string): ConnectionPool {
    if (!this.pools.has(provider)) {
      this.pools.set(provider, new ConnectionPool({
        maxConnections: provider === 'anthropic' ? 10 : 5,
        maxIdleTime: 30_000,
        keepAlive: true
      }));
    }
    return this.pools.get(provider)!;
  }

  async request(
    provider: string,
    options: RequestOptions
  ): Promise<Response> {
    const pool = this.getPool(provider);
    const connection = await pool.acquire();
    try {
      return await connection.request(options);
    } finally {
      pool.release(connection);
    }
  }
}
```

This architecture analysis is based on reverse engineering and decompilation. The actual implementation may vary. The patterns presented represent inferred architectural decisions based on observable behavior and common practices in high-performance Node.js applications.
