How Meta Uses AI to Map Tribal Knowledge in Large-Scale Data Pipelines
AI coding assistants are powerful, but they are only as good as their understanding of your codebase. When we pointed AI agents at one of Meta’s large-scale data processing pipelines, which spans four repositories, three languages, and over 4,100 files, we found that they could not make useful edits fast enough.
We fixed this by building a pre-compute engine: a swarm of 50+ specialized AI agents that systematically read every file and produced 59 concise context files encoding tribal knowledge that previously lived only in engineers’ heads. The result: AI agents now have structured navigation guides for 100% of our code modules (up from 5%, covering all 4,100+ files across three repositories). We also documented 50+ “non-obvious patterns,” or underlying design choices and relationships not immediately apparent from the code, and preliminary tests show 40% fewer AI agent tool calls per task. The system works with most leading models because the knowledge layer is model-agnostic.
The system also maintains itself. Every few weeks, automated jobs validate file paths, detect coverage gaps, re-run quality critics, and auto-fix stale references. The AI isn’t just a consumer of this infrastructure; it’s the engine that runs it.
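To make the path-validation step concrete, here is a minimal sketch of what one such check might look like. It is not Meta's implementation: the backtick convention for paths, the file extensions, and the name `find_stale_references` are all assumptions made for illustration.

```python
import re
from pathlib import Path

# Assumption: context files cite repo-relative paths in backticks,
# e.g. `configs/registry.py`. The extension list is hypothetical.
PATH_PATTERN = re.compile(r"`([\w./-]+\.(?:py|cpp|h|php))`")


def find_stale_references(context_text: str, repo_root: Path) -> list[str]:
    """Return referenced paths that no longer exist under repo_root.

    Paths are reported once each, in order of first appearance.
    """
    seen: set[str] = set()
    stale: list[str] = []
    for rel_path in PATH_PATTERN.findall(context_text):
        if rel_path in seen:
            continue
        seen.add(rel_path)
        # A reference is stale if the file it points to is gone.
        if not (repo_root / rel_path).is_file():
            stale.append(rel_path)
    return stale
```

A periodic job could run a check like this over every context file and hand the stale paths to an agent for repair, which is one plausible reading of the "auto-fix stale references" step.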
The Problem: AI Tools Without a Map
Our pipeline is config-as-code: Python configurations, C++ services, and Hack automation scripts working together across multiple repositories. A single data field onboarding touches configuration registries, routing logic, DAG composition, validation rules, C++ code gen...