Exploring Dynamic and Parametric RAG Techniques



1. Exploring Dynamic and Parametric RAG Techniques. Qingyao Ai (艾清遥)

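2. Outline: 00 Background and motivation · 01 Modeling dynamic information needs during generation · 02 Dynamically decoupling retrieval from generation · 03 Retrieval augmentation via parametric knowledge injection · 04 Future outlook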

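4. The rise of large language models • As information tools, LLMs have many strengths: • A natural way for users to interact • Natural-language understanding and reasoning • Strong task generalization [Figure: time taken to reach 100 million users]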

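5. The rise of large language models • Compared with traditional retrieval systems, LLMs also have critical flaws: • Severe hallucination • Information sources that cannot be traced • Very high cost and very low flexibility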

6. The retrieval-augmented generation paradigm (RAG) • Retrieval-augmented generation (RAG) is the most effective way to address the LLM problems above. [Example exchange] "What is RAG?" RAG stands for Retrieval-Augmented Generation. It's an AI framework that combines information retrieval and text generation to improve the quality and accuracy of generated content. How RAG Works: …

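7. Retrieval-augmented generation and knowledge injection • At its core, retrieval-augmented generation is a way of injecting external knowledge into a generative LLM. [Figure: the user input is sent to the retrieval systems; the retrieved docs and the instruction form the LLM's prompt context; the LLM produces the response]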

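8. Retrieval-augmented generation and knowledge injection • When does an LLM need external knowledge injected? • Common approaches • Assume the LLM always needs external knowledge • Let the user toggle whether the LLM needs external knowledge • "Deep Research" • Define a workflow by hand and call retrieval according to it • Treat retrieval as a tool and train the LLM to use it [Figure: the RAG pipeline, with a question mark on the link from the user input to the retrieval systems]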

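9. Retrieval-augmented generation and knowledge injection • How does an LLM generate a suitable query for retrieving external knowledge? • Common approaches • Use the user input directly as the query • "Deep Research" • Define a query-generation workflow by hand • Prompt or train the LLM to propose queries [Figure: the RAG pipeline, with a question mark marking the query-generation step]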

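10. Retrieval-augmented generation and knowledge injection • How is retrieved external knowledge injected into the LLM? • Common approaches • Put the external documents directly into the prompt • "Deep Research" • Define a workflow by hand for multi-step decomposition • In the end, still place the external knowledge into the prompt as text [Figure: the RAG pipeline, with a question mark on the step that places retrieved docs into the prompt context]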

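11. Retrieval-augmented generation and knowledge injection • In essence, existing retrieval-augmentation techniques all treat the LLM as a static black box. Question: can we look deeper into the LLM's internal states and information flow to build a retrieval-augmentation framework that performs better and runs more efficiently? [Figure: the RAG pipeline: user input, retrieval systems, retrieved docs, instruction and prompt context, LLM, response]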

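12. Dynamic and parametric retrieval-augmented generation • Probe the LLM's internal states to achieve dynamic, parametric, and efficient external knowledge injection • Retrieve when, and only when, the LLM needs it • Analyze the LLM's internal activations [Su et al. 2024] [Quevedo et al.] [Herrlein et al.] … • Quantify the uncertainty during generation • Monitor generation in real time for possible hallucinations [Su et al. 2024] [Ji et al. 2024] [Song et al. 2024] [Figure: the LLM generating from its prompt context, with part of the response flagged "Hallucination!"]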

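13. Dynamic and parametric retrieval-augmented generation • Probe the LLM's internal states to achieve dynamic, parametric, and efficient external knowledge injection • Precisely model the query information the LLM needs • Capture the LLM's information needs in real time during reasoning [Su et al. 2024] [Dong et al. 2025] • Build query terms or query vectors from the LLM's internal states [Su et al. 2024] [Dong et al. 2025] [Jiang et al. 2024] [Figure: the LLM sends a query to the retrieval systems; callouts read "Ask this!" and "Emm, this is what I need next"]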

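14. Dynamic and parametric retrieval-augmented generation • Probe the LLM's internal states to achieve dynamic, parametric, and efficient external knowledge injection • Inject external knowledge into the generation process dynamically, accurately, and efficiently • Run retrieval and generation as decoupled processes [Dong et al. 2025] • Build parametric knowledge modules [Su et al. 2025] [Figure: the RAG pipeline: user input, retrieval systems, retrieved docs, LLM prompt context, response]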

15. 01 Modeling dynamic information needs during generation

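16. Static and dynamic paradigms of retrieval augmentation. Retrieval augmentation falls into two paradigms: • Static paradigm: • Before generation starts, retrieve based on the user instruction or a predefined agent workflow. Each generation run triggers retrieval exactly once. • Dynamic paradigm: • During generation, call retrieval dynamically according to the LLM's needs. Each generation run may trigger retrieval zero times or several times.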
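
The difference between the two paradigms comes down to where the retrieval call sits relative to the generation loop. The sketch below is a minimal illustration under stated assumptions, not the speaker's implementation: `retrieve`, `next_token`, `needs_retrieval`, and `make_query` are hypothetical callables, and the scripted "model" and fake retriever exist only so the example runs.

```python
def static_rag(instruction, retrieve, next_token, max_steps=64):
    """Static paradigm: retrieve exactly once, before generation starts."""
    docs = retrieve(instruction)
    calls = 1
    text = []
    for _ in range(max_steps):
        token = next_token(instruction, docs, text)
        if token is None:  # the generator signals end of output
            break
        text.append(token)
    return text, calls


def dynamic_rag(instruction, retrieve, next_token, needs_retrieval,
                make_query, max_steps=64):
    """Dynamic paradigm: retrieve zero or more times, during generation."""
    docs, text, calls = [], [], 0
    for _ in range(max_steps):
        # Check the (hypothetical) information-need signal before each token.
        if needs_retrieval(text):
            docs = docs + retrieve(make_query(instruction, text))
            calls += 1
        token = next_token(instruction, docs, text)
        if token is None:
            break
        text.append(token)
    return text, calls


# Toy stand-ins (hypothetical): a scripted "model" and a fake retriever.
SCRIPT = ["Argentina", "won", ".", "XXX", "once", "said", "..."]


def scripted_next_token(instruction, docs, text):
    return SCRIPT[len(text)] if len(text) < len(SCRIPT) else None


def fake_retrieve(query):
    return [f"doc for: {query}"]


def last_two_tokens(instruction, text):
    return " ".join(text[-2:])
```

With these stand-ins, `static_rag` always reports one retrieval call, while `dynamic_rag` reports however many times `needs_retrieval` fired, including zero.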

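17. Limitations of static retrieval-augmented generation. Simple information tasks: the LLM's information need can be determined directly from the initial instruction. Complex information tasks: the LLM's information need may change during reasoning. Instruction: Write a long commentary on Argentina's victory at the Qatar World Cup. LLM: Argentina's victory at the 2022 Qatar World Cup was a triumph of drama, passion, and …. XXX once said: "There is only one true heroism in the world: to see the world as it is, and to love it." Initial information needs: • The Argentina team • The Qatar World Cup. Information need arising during reasoning: "There is only one true heroism in the world..." → Who said this?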

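18. Dynamic retrieval-augmented generation. Core question: how do we discover and resolve a large language model's information needs during reasoning? • When should retrieval be triggered during generation? • Once retrieval is triggered, how do we identify the target information and generate the corresponding query?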

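19. DRAGIN: a dynamic retrieval-augmented generation framework based on the LLM's information needs • While the LLM is generating, model its information needs in real time from its internal states (such as logits, token entropy, and attention distributions). [Su et al. 2024]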

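20. DRAGIN: predicting retrieval timing by monitoring internal states. Notation: $v$: a token in the vocabulary; $i$: a position; $t_i$: the token at position $i$; $d_k$: the vector dimension; $S$: the stopword list. The score combines three factors: uncertainty, importance, and semantic value. Retrieval is triggered when $S_{RIND}(t_i) > \theta$, where $\theta$ is a threshold. [Su et al. 2024]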
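
The slide's formula is not reproduced in the text, so the sketch below makes loud assumptions: uncertainty is taken as the entropy of the next-token distribution, importance as the largest attention weight a token receives from later tokens, semantic value as a binary stopword filter, and the three are combined by multiplication. The function names and the toy numbers are hypothetical.

```python
import math


def token_entropy(probs):
    """Shannon entropy of one next-token distribution (the uncertainty term)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def rind_scores(tokens, probs_per_pos, attn, stopwords):
    """Score each generated token.

    attn[j][i] is the attention weight from position j to position i (j > i).
    Assumption: the three factors are multiplied together.
    """
    n = len(tokens)
    scores = []
    for i, tok in enumerate(tokens):
        uncertainty = token_entropy(probs_per_pos[i])
        # Importance: strongest attention any later token pays to token i.
        # Simplification: the last token has no later tokens, so it scores 0.
        later = [attn[j][i] for j in range(i + 1, n)]
        importance = max(later) if later else 0.0
        # Semantic value: stopwords never trigger retrieval.
        semantic = 0.0 if tok.lower() in stopwords else 1.0
        scores.append(uncertainty * importance * semantic)
    return scores


def first_trigger(scores, theta):
    """Return the first position whose score exceeds theta, or None."""
    for i, score in enumerate(scores):
        if score > theta:
            return i
    return None


tokens = ["Einstein", "said", "the", "quote"]
probs = [[0.25] * 4, [0.97, 0.01, 0.01, 0.01], [0.25] * 4, [0.25] * 4]
attn = [
    [1.0, 0.0, 0.0, 0.0],
    [0.6, 0.4, 0.0, 0.0],
    [0.5, 0.3, 0.2, 0.0],
    [0.4, 0.2, 0.1, 0.3],
]
scores = rind_scores(tokens, probs, attn, stopwords={"the"})
trigger_at = first_trigger(scores, theta=0.5)
```

In this toy run the uncertain, heavily attended name at position 0 is the only token above the threshold, while the stopword scores exactly zero however uncertain the model is about it.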

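21. DRAGIN: generating dynamic retrieval queries from the attention distribution. Step 1: extract each token's attention score in the last layer of the network. Step 2: rank the tokens by attention score and select the top $n$. Step 3: build the query from the top $n$ tokens. Step 4: add the retrieved knowledge to the prompt context and let the LLM continue generating from the truncation point. [Su et al. 2024]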
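
Steps 1 to 3 can be sketched as a top-n selection over attention weights. This is a minimal illustration, not the DRAGIN code: `attn_from_trigger` stands for the last-layer attention weights from the position that triggered retrieval to each earlier token, and restoring the original token order in the query is an assumption made here for readability.

```python
def build_query(tokens, attn_from_trigger, n):
    """Build a query from the n most-attended preceding tokens."""
    # Step 2: rank token positions by attention weight, keep the top n.
    ranked = sorted(range(len(tokens)),
                    key=lambda i: attn_from_trigger[i],
                    reverse=True)[:n]
    # Step 3: join the selected tokens (assumption: in their original order).
    return " ".join(tokens[i] for i in sorted(ranked))


tokens = ["Argentina", "won", "the", "World", "Cup", "in", "Qatar"]
attn_from_trigger = [0.30, 0.05, 0.02, 0.20, 0.18, 0.03, 0.22]
query = build_query(tokens, attn_from_trigger, n=4)
```

Here the four most-attended tokens produce the query `Argentina World Cup Qatar`; the low-attention function words drop out without any explicit filtering.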

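22. DRAGIN: experimental results. DRAGIN performs significantly better than the other frameworks, both those that use RAG and those that do not.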

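23. DRAGIN: efficiency analysis. How often DRAGIN triggers the retrieval module: • Less often than RALM and IR-CoT • Those methods decide when to retrieve using fixed rules • More often than FLARE • FLARE triggers on token uncertainty alone [Su et al. 2024]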

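24. DRAGIN: selected ablation experiments • Retrieval timing: better retrieval timing yields higher performance (with the last generated sentence used as the query) • Query generation: better query generation yields higher performance (with DRAGIN's method deciding when to retrieve)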

25. 02 Dynamically decoupling retrieval from generation

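26. Knowledge injection in static retrieval-augmentation frameworks • Traditional RAG: in-context knowledge injection • Define an instruction template in advance • Retrieve documents using the user's question • Put the question and the documents into the prompt • Feed the prompt to the LLM to start generation [Figure: online computation. Prompt template: "You are a helpful assistant. Your response must be in JSON format: {"answer": {response}}." Question: "Who is the President of the United States?" The template, question, and external knowledge go to the LLM, which answers "The current President of the United States is Joe Biden. He is......" Outcome: factually correct, but ignored the instruction and responded slowly]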

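27. Knowledge injection in static retrieval-augmentation frameworks • Efficiency: • Context-length bottleneck • Documents can be long • Higher inference cost (FLOPs) • Longer response time (latency) • Frequent interruption of generation and reasoning • Documents must be fed into the prompt • Every retrieval has to stop generation and restart from the beginning once the retrieved content is injected • Large numbers of tokens are processed repeatedly [Figure: the same in-context RAG example: JSON-format prompt template, the question "Who is the President of the United States?", external knowledge, and an answer that is factually correct but ignores the instruction and arrives slowly]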

28. Knowledge injection in static retrieval-augmentation frameworks • Performance: • Information loss • "Lost in the middle" • Sensitivity to input order • Model degradation • Interference with the LLM's internal reasoning • Weaker instruction following. Question: can we decouple the injection of retrieved knowledge from the context input, to achieve dynamic, real-time retrieval augmentation? [Figure: the same in-context RAG example: JSON-format prompt template, the question "Who is the President of the United States?", external knowledge, and an answer that is factually correct but ignores the instruction and arrives slowly]

29. Decoupling the retrieval and generation processes • Our goal • Decouple the retrieved knowledge text from the input prompt. [Figure: offline, the LLM encodes external knowledge into knowledge representations. Online, the LLM receives only the prompt template ("You are a helpful assistant. Your response must be in JSON format: {"answer": {response}}.") and the question "Who is the President of the United States?", and aggregates knowledge via cross attention to produce {"answer": {Joe Biden}}. Outcome: factually correct, followed the instruction, responded swiftly]

30. Dynamic knowledge injection via the cross-attention mechanism [Figure: architecture overview. Knowledge encoding: the LLM encodes external documents offline and caches them in a knowledge representation database. Knowledge injection and aggregation: across layers 1 … l … L, a self-attention operation runs over the tokens from the instruction and the question, and a cross-attention operation with trainable parameters attends from the last token in context to tokens from distinct external documents. The next token w.r.t. external knowledge and the next token w.r.t. internal context are aggregated into the next token w.r.t. both. Example output: {"answer": {Joe Biden}}, which is factually correct, follows the instruction, and is produced swiftly]

31. DecoupledRAG: dynamically decoupling retrieval from generation • Knowledge encoding • Pre-encode external knowledge into key-value (KV) representations [Figure: the knowledge-encoding part of the architecture: the LLM encodes documents offline and caches them in the knowledge representation database]

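32. DecoupledRAG: dynamically decoupling retrieval from generation • Knowledge aggregation via cross attention • Inject the key-value representations of the relevant knowledge through cross attention. [Figure: the knowledge-injection part of the architecture: in layers 1 … l … L of the LLM, a self-attention operation and a cross-attention operation with trainable parameters feed the knowledge aggregation step]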
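
The two DecoupledRAG steps (offline KV encoding, then cross-attention at generation time) can be illustrated with a single-head, single-layer toy. This is a sketch under explicit assumptions, not the DecoupledRAG implementation: `encode_offline` simply reuses the token vectors as keys and values, there are no trained projections, and the internal and external results are aggregated by a plain sum.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(query))
    weights = softmax(
        [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    )
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


def encode_offline(doc_vectors):
    """Stand-in for offline knowledge encoding: cache keys and values once."""
    return {"keys": doc_vectors, "values": doc_vectors}


def decoupled_step(query, ctx_keys, ctx_values, doc_caches):
    # Self-attention over the prompt context (instruction and question only).
    internal = attend(query, ctx_keys, ctx_values)
    # Cross-attention over the cached KV representations of all documents.
    ext_keys = [k for cache in doc_caches for k in cache["keys"]]
    ext_values = [v for cache in doc_caches for v in cache["values"]]
    external = attend(query, ext_keys, ext_values)
    # Assumed aggregation: a simple sum of the two results.
    return [a + b for a, b in zip(internal, external)]


ctx = [[1.0, 0.0], [0.0, 1.0]]
doc_a = encode_offline([[1.0, 1.0], [0.5, -0.5]])
doc_b = encode_offline([[-1.0, 2.0]])
q = [0.3, 0.7]
out_ab = decoupled_step(q, ctx, ctx, [doc_a, doc_b])
out_ba = decoupled_step(q, ctx, ctx, [doc_b, doc_a])
```

Because the documents never enter the prompt, swapping `doc_a` and `doc_b` leaves the output unchanged up to floating-point rounding, which is the order-insensitivity the deck attributes to this design.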

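33. DecoupledRAG: dynamically decoupling retrieval from generation. Comparison:
Aspect | In-context injection (traditional static RAG) | Our method (DecoupledRAG)
Efficiency | Online document encoding | Offline document encoding
Scalability | Serial document processing | Parallel document processing
Robustness | Sensitive to input order; documents and instructions mixed in one input | Insensitive to input order; separate input streams for documents and instructions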

34. DecoupledRAG: experimental setup • Baseline • RAG FT [Jin et al. 2025]: • A RAG model fine-tuned on each dataset. • Datasets • Multi-hop QA • 2WikiMultihopQA, ComplexWebQuestions • Cloze-style slot filling • Zero-Shot RE, T-REx • Knowledge-grounded dialogue generation • Wizard of Wikipedia (WoW) • Models tested • Llama3-8B-Instruct • Llama2-7B-Chat • Implementation details: • Corpus: Wikipedia • Chunking: 256 words • Retriever: RetroMAE

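35. DecoupledRAG: experimental results • Multi-task performance comparison • As more knowledge is injected, DecoupledRAG significantly outperforms RAG FT (fine-tuning). The more documents, the larger the gain.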

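36. DecoupledRAG: efficiency analysis. Notation: $N$: number of documents; $|D|$: document length; $|A|$: length of the output answer. • Traditional static RAG • Encoding happens online • Complexity grows quadratically with the number of documents • DecoupledRAG • Encoding happens offline • Complexity grows linearly with the number of documents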
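
The quadratic-versus-linear contrast can be checked with a back-of-the-envelope count of attended token pairs. The slide's exact formulas are not in the text, so this cost model is an assumption: it counts only attention pairs, ignores feed-forward cost and constant factors, and treats every document as having the same length.

```python
def in_context_prefill_pairs(n_docs, doc_len):
    """Token pairs attended when all documents share one prompt."""
    total = n_docs * doc_len
    return total * total


def decoupled_encoding_pairs(n_docs, doc_len):
    """Token pairs attended when each document is encoded on its own."""
    return n_docs * doc_len * doc_len


def cross_attention_pairs(n_docs, doc_len, answer_len):
    """Pairs between generated answer tokens and cached document tokens."""
    return answer_len * n_docs * doc_len


for n in (1, 2, 4, 8):
    print(n,
          in_context_prefill_pairs(n, 256),
          decoupled_encoding_pairs(n, 256),
          cross_attention_pairs(n, 256, answer_len=64))
```

Under this model, doubling the number of documents quadruples the in-context count but only doubles the per-document encoding count and the cross-attention count.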

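37. DecoupledRAG: efficiency analysis. Settings compared: • VanillaRAG: encodes all document knowledge online • Offline DecoupledRAG: encodes document knowledge offline • Online DecoupledRAG: encodes document knowledge online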

38. 03 Retrieval augmentation via parametric knowledge injection

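39. The knowledge architecture of LLMs • Feed-forward networks (FFN) • Store internal knowledge • Form the basis of reasoning ability [Nanda et al. 2023, Yu and Ananiadou 2024] • Attention networks • Encode external knowledge • Model dynamic context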

40. Knowledge injection through the attention network • Knowledge injection via the in-context prompt • Retrieved documents enter the LLM's attention network through the prompt input • Almost all existing retrieval-augmented generation (RAG) methods rely on in-context knowledge injection. [Figure: the retrieved documents and the user's question fill a prompt template, "{Retrieved Documents} …… Answer the following Question based on the provided information: Question: {Question}", which the original LLM (weights $\theta$) turns into a response]

41. Knowledge injection through the attention network. Question: beyond prompt input, can external knowledge be injected directly into the LLM's internal parameters? • Drawbacks of in-context knowledge injection • Limited by the model's context length • Extra token computation overhead • A structural flaw: • The LLM may never be able to use external knowledge the way it uses its internal knowledge [Figure: the in-context pipeline: retrieved documents and the user's question fill the prompt template, and the original LLM (weights $\theta$) produces the response]

42. Parametric retrieval-augmented generation (Parametric RAG) [Figure: "Traditional RAG: Inject Retrieved Documents to the Input Context". The retrieved documents and the user's question fill in the prompt template, "{Retrieved Documents} … Answer the following Question based on the provided information: Question: {Question}"; the prompt is tokenized and fed to the original LLM (weights $\theta$), which produces the response]

43. Parametric retrieval-augmented generation (Parametric RAG). Goal: let the LLM use external knowledge the same way it uses internal knowledge, by injecting documents directly into the feed-forward network. [Figure, top: "Traditional RAG: Inject Retrieved Documents to the Input Context". The retrieved documents and the question fill in the prompt template, which is tokenized and fed to the original LLM (weights $\theta$). Figure, bottom: "Parametric RAG: Inject Retrieved Documents to the LLM's Parameter". The parametric representations of the retrieved documents are merged into $\Delta\theta = f(k, \theta)$; the original weights $\theta$ are updated to $\theta' = \theta + \Delta\theta$; the updated LLM answers the question]

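44. Parametric RAG: core framework • Offline document encoding • Build a parametric representation of each document • Online inference • Retrieve the relevant documents, merge their parameters, and generate the final response [Figure: for a user question, the merged document representation $\Delta\theta = f(k, \theta)$ is added to the original LLM weights $\theta$, giving $\theta' = \theta + \Delta\theta$, and the updated LLM produces the response]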

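45. Parametric RAG: offline document encoding • Two steps: • Document augmentation • Build parallel rewrites of the document and question-answer examples • Parametric document encoding • Learn a LoRA representation of the document from the augmented data [Figure: document augmentation: the LLM (weights $\theta$) rewrites the input document $d_i$ and generates QA pairs $(q_1, a_1), \dots, (q_m, a_m)$, which together form the augmented document $D_i$. Parametric document encoding: starting from random LoRA weights, the LLM (weights $\theta$) is fine-tuned on $D_i$ to obtain $p_i$, the parametric representation of $d_i$]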

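46. Parametric RAG: offline document encoding • Document augmentation • Document rewriting: • Generate parallel text while keeping the factual content unchanged. • QA example generation: • From the original document, use the LLM to generate candidate questions and answers. [Figure: the document augmentation stage: the LLM (weights $\theta$) rewrites $d_i$ and generates QA pairs $(q_1, a_1), \dots, (q_m, a_m)$ to form the augmented document $D_i$, which is then used for fine-tuning from random LoRA weights]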
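
One way to turn the rewrites and QA pairs into training samples for the later LoRA step is sketched below. The slide does not say how the two are combined, so pairing every rewrite with every QA pair is an assumption here, as are the function name and the `Question:` / `Answer:` text format. In practice the rewrites and QA pairs would come from an LLM; the literals below are placeholders.

```python
from itertools import product


def build_augmented_document(rewrites, qa_pairs):
    """Combine document rewrites with QA pairs into training strings.

    Assumption: each rewrite is paired with each (question, answer) tuple.
    """
    samples = []
    for text, (question, answer) in product(rewrites, qa_pairs):
        samples.append(f"{text}\nQuestion: {question}\nAnswer: {answer}")
    return samples


rewrites = [
    "Argentina won the 2022 World Cup in Qatar.",               # original d_i
    "The 2022 World Cup, held in Qatar, was won by Argentina.",  # a rewrite
]
qa_pairs = [
    ("Who won the 2022 World Cup?", "Argentina"),
    ("Where was the 2022 World Cup held?", "Qatar"),
]
augmented = build_augmented_document(rewrites, qa_pairs)
```

With two versions of the document and two QA pairs, `augmented` holds four samples, each of which restates the document's facts in a slightly different context.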

47. Parametric RAG: offline document encoding • Parametric document encoding • Built with a pluggable parameter-efficient fine-tuning module (such as LoRA). • Trained for 1 epoch with a next-token prediction loss. [Figure: the parametric document encoding stage: starting from random LoRA weights, the LLM (weights $\theta$) is fine-tuned on the augmented document $D_i$ to obtain $p_i$, the parametric representation of $d_i$]

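48. Parametric RAG: online inference • Three steps • Retrieve: retrieve the top-k documents and fetch their parametric representations • Update: merge the document parameters and use them to update the LLM's parameters • Generate: produce the answer with the updated LLM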
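
The retrieve and update steps can be sketched with plain matrices. This is an illustration under assumptions, not the Parametric RAG code: each document's parametric representation is taken to be one LoRA pair `(B, A)` for a single weight matrix, merging is taken to be a scaled sum of the products `B @ A`, and the retrieval scores, `scale` value, and function names are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def top_k_ids(scores, k):
    """Retrieve: ids of the k highest-scoring documents."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


def apply_documents(W, lora_store, doc_ids, scale=1.0):
    """Update: return W' = W + scale * sum_i(B_i @ A_i), leaving W untouched."""
    updated = [row[:] for row in W]
    for doc_id in doc_ids:
        B, A = lora_store[doc_id]
        delta = matmul(B, A)
        for i, row in enumerate(delta):
            for j, value in enumerate(row):
                updated[i][j] += scale * value
    return updated


# Hypothetical store of per-document LoRA pairs (rank 1, 2x2 weight).
lora_store = {
    "d1": ([[1.0], [0.0]], [[0.0, 2.0]]),
    "d2": ([[0.0], [1.0]], [[3.0, 0.0]]),
    "d3": ([[1.0], [1.0]], [[1.0, 1.0]]),
}
scores = {"d1": 2.0, "d2": 1.5, "d3": 0.1}  # hypothetical retrieval scores
W = [[1.0, 0.0], [0.0, 1.0]]

selected = top_k_ids(scores, k=2)
W_prime = apply_documents(W, lora_store, selected, scale=0.5)
```

Returning a copy rather than editing `W` in place keeps the original weights available, so the next question can start again from the unmodified model with a different set of documents.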

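49. Parametric RAG: online inference • Multi-document information aggregation [Figure: the parametric representations of several documents are combined by addition]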

50. Parametric RAG: experimental setup • Baselines • Standard RAG • DARAG: standard RAG + a data augmentation module • FLARE [Jiang et al. 2023] • DRAGIN [Su et al. 2024] • Datasets • 2WikiMultihopQA • HotpotQA • ComplexWebQuestions • PopQA • Models tested • LLaMA-3-8B-Instruct • LLaMA-3.1-1.5B-Instruct • Qwen-2.5-1.5B-Instruct • Implementation details: • Corpus: Wikipedia • Retriever: BM25, top-3 documents

51. Parametric RAG: experimental results

52. Parametric RAG: experimental results. The gains come from parametric encoding, not merely from data augmentation.

53. Parametric RAG: experimental results. Parametric injection and in-context injection are complementary; combining them performs better.

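54. Parametric RAG: efficiency analysis • Theoretical complexity. Notation: $|d|$: document length; $|q|$: user input length; $h$: model hidden size. • Offline encoding cost: • Complexity of data augmentation and LoRA module training: $O(|d|^2 h + |d| h^2)$ • Roughly equivalent to the cost of decoding $12|d|$ tokens. • Online cost: • Parametric RAG inference complexity: $O(|q|^2 h + |q| h^2)$ • No in-context input cost, so it is markedly more efficient than traditional RAG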
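
To get a feel for the online saving, the complexity expression can be evaluated for both setups. The sketch assumes the same form $n^2 h + n h^2$ for a forward pass over $n$ tokens, and it assumes that traditional RAG processes the question plus $k$ documents in one prompt; constants, decoding cost, and the cost of merging LoRA weights are ignored, and the sizes below are illustrative only.

```python
def forward_cost(n_tokens, hidden):
    """Cost model n^2 * h + n * h^2 for processing n_tokens input tokens."""
    return n_tokens ** 2 * hidden + n_tokens * hidden ** 2


def in_context_online_cost(q_len, doc_len, k_docs, hidden):
    """Assumed traditional RAG cost: question and k documents in one prompt."""
    return forward_cost(q_len + k_docs * doc_len, hidden)


def parametric_online_cost(q_len, hidden):
    """Parametric RAG cost: only the question enters the prompt."""
    return forward_cost(q_len, hidden)


# Illustrative sizes (assumptions): 32-token question, three 256-token docs.
q_len, doc_len, k_docs, hidden = 32, 256, 3, 4096
traditional = in_context_online_cost(q_len, doc_len, k_docs, hidden)
parametric = parametric_online_cost(q_len, hidden)
print(f"input-processing cost ratio: {traditional / parametric:.1f}x")
```

The ratio covers input processing only, which is why it is much larger than the end-to-end latency gain reported for the full system.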

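55. Parametric RAG: efficiency analysis • Empirical latency • PRAG reduces inference latency: 29%–36% faster than traditional RAG • When the traditional and parametric methods are combined: • Efficiency is on par with traditional RAG • Task performance improves markedly. Figure: Inference time comparison of LLaMA3-8B across different baselines, measuring online latency per question.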

56. 04 Future outlook

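57. Future outlook. The times shape information retrieval, and information retrieval shapes the times. [Timeline: 1960, the information management era; 1990, the internet era; 2020, the (generative) AI era; after that, "?"]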

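58. Future outlook. [Figure: a capability map modeled on Brodmann's brain areas, labeled with learning, cognition, memory, retrieval, logic, perception, and creativity] Themes: synchronized thinking, breaking down communication barriers between knowledge structures; building an efficient and sustainable learning framework; general agent architecture design; automated customization and application adaptation; stronger dynamic analysis and planning; coordination of thought and action. Directions listed, in slide order: • Parameter sharing between retrieval models and generation models • Parametric retrieval-augmented generation • Aligning generation-model output with retrieval needs • Multi-module collaborative architecture design • Joint optimization of retrieval and generation performance • Decomposing complex user intent and aligning it with modules • Balancing dynamic performance and efficiency • Knowledge editing based on parametric and non-parametric optimization • Learning routing that accounts for the characteristics of the data and the knowledge structure • Mechanisms for collecting, filtering, and using user feedback • An evaluation framework for automatically customized applications • Workflow planning from task descriptions • Prompt rewriting driven by user feedback • Real-time identification of the model's information needs • Continuous system iteration through non-parametric optimization • Staged decomposition of information tasks • Chain-of-thought reconstruction based on real-time retrieval results • …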


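60. THANKS. Large language models are redefining the development of information retrieval, and information retrieval is in turn reshaping the future of large language models. Website: www.thuir.cn Twitter: @thuir LinkedIn: THUIR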