MAD-Logic: Multi-Agent Debate Enhances Symbolic Translation and Reasoning


1. MAD-Logic: Multi-Agent Debate Enhances Symbolic Translation and Reasoning
Presenter: Fengxiang Cheng, Meituan Search and Recommendation Platform Department

2. Logical Question-Answering (QA) Tasks
Decide whether a statement can be logically deduced from the given information.
• Premises: Metals conduct electricity. If something is made of iron, it is metal. Nails are made of iron.
  (in Symbolic Language) ∀x (Metal(x) → Conductive(x)). ∀x (Iron(x) → Metal(x)). Iron(Nails).
• Question: Is the following statement true, false, or unknown? “Nails cannot conduct electricity.”
  (in Symbolic Language) ¬Conductive(Nails)
• Logical Reasoning Chain: nails → made of iron → metal → conduct electricity.
• Answer: False.
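The reasoning chain on this slide can be reproduced mechanically. The following is a minimal sketch, not the solver used in the talk: it assumes every rule has the shape ∀x (P(x) → Q(x)) and every fact is a ground atom, and it applies plain forward chaining. The function names `forward_chain` and `judge_negated_atom` are hypothetical.

```python
# Minimal forward-chaining sketch for the slide's example (illustrative only).
# A rule (P, Q) is read as "for all x: P(x) -> Q(x)".
# A fact (P, c) is read as the ground atom P(c).

def forward_chain(facts, rules):
    """Return the closure of `facts` under the universally quantified rules."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            for predicate, constant in list(derived):
                new_fact = (conclusion, constant)
                if predicate == premise and new_fact not in derived:
                    derived.add(new_fact)
                    changed = True
    return derived


def judge_negated_atom(facts, rules, atom):
    """Label the statement "not atom".

    If the atom itself is derivable, its negation is "False". With only
    positive rules nothing can prove the negation, so the label is
    otherwise "Unknown".
    """
    return "False" if atom in forward_chain(facts, rules) else "Unknown"


rules = [("Metal", "Conductive"), ("Iron", "Metal")]
facts = {("Iron", "Nails")}

# Statement under test: ¬Conductive(Nails)
verdict = judge_negated_atom(facts, rules, ("Conductive", "Nails"))
print(verdict)
```

The closure contains Metal(Nails) and Conductive(Nails), which mirrors the chain nails → made of iron → metal → conduct electricity.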

3. Solver-Based Approaches
  1. Prompt LLMs to translate natural language problems into symbolic formulas.
  2. Leverage a corresponding logic solver to infer the answer.
  3. Generate the answer using ensemble methods, e.g., majority voting.
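The three steps can be sketched as a small pipeline. This is an illustration under stated assumptions, not the authors' implementation: the translators and `toy_solver` are hypothetical stand-ins for LLM calls and a real logic solver, a failed translation is modelled as a `ValueError`, and falling back to "Unknown" when nothing executes is a choice made here.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most common answer; on a tie, the answer seen first wins."""
    if not answers:
        raise ValueError("no answers to vote on")
    return Counter(answers).most_common(1)[0][0]


def solve_with_ensemble(problem, translators, solver):
    """Translate with each translator, run the solver, then vote on the answers."""
    answers = []
    for translate in translators:
        try:
            answers.append(solver(translate(problem)))
        except ValueError:
            # The translation could not be executed by the solver; skip it.
            continue
    # Assumption: fall back to "Unknown" when no translation was executable.
    return majority_vote(answers) if answers else "Unknown"


def toy_solver(formula):
    """Hypothetical stand-in for a logic solver: a lookup table of known formulas."""
    table = {"¬Conductive(Nails)": "False"}
    if formula not in table:
        raise ValueError(f"cannot execute: {formula}")
    return table[formula]


# Hypothetical stand-ins for three LLM translation calls.
translators = [
    lambda problem: "¬Conductive(Nails)",
    lambda problem: "¬Conductive(Nails)",
    lambda problem: "Conductive(Nails",  # malformed translation, fails in the solver
]

result = solve_with_ensemble("Nails cannot conduct electricity.", translators, toy_solver)
print(result)
```

The malformed third translation shows why translation errors lower the execution rate: the solver rejects it, so only two votes are counted.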

4. Prompt-Based Approaches
• The second strategy uses prompts to guide LLMs in symbolizing natural language, reasoning step by step, and verifying results.
[Diagram: Input (Premise + Conclusion) → LLM (Step 1: Translation; Step 2: Plan and solve; Step 3: Verify) → Output (True or False)]
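A prompt following the three steps on this slide might be assembled as below. This is only a sketch: the wording of the instructions and the function name `build_prompt` are assumptions, and no LLM is called.

```python
def build_prompt(premises, conclusion):
    """Assemble a single prompt that walks the LLM through the three steps."""
    lines = ["Premises:"]
    lines += [f"- {premise}" for premise in premises]
    lines += [
        f"Conclusion: {conclusion}",
        "Step 1: Translate the premises and the conclusion into symbolic formulas.",
        "Step 2: Plan the derivation and solve it step by step.",
        "Step 3: Verify each step, then answer True or False.",
    ]
    return "\n".join(lines)


prompt = build_prompt(
    [
        "Metals conduct electricity.",
        "If something is made of iron, it is metal.",
        "Nails are made of iron.",
    ],
    "Nails cannot conduct electricity.",
)
print(prompt)
```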

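5. Motivation
Prompt-based pipeline: guiding prompt → reasoning + translation.
  Strength: no information is lost during translation. Weakness: the reasoning is prone to hallucination.
Solver-based pipeline: Step 1: translation; solver; Step 2: plan and solve.
  Strength: strong reasoning ability. Weakness: low execution rate caused by translation errors.
Motivation of our method: bring multi-agent systems into logical reasoning with large language models (LLMs).
• Translation stage: agents cross-check and collaboratively edit one another's output while translating natural language (NL) into symbolic language (SL), widening the information bottleneck → better translation quality.
• Solving stage: agents debate using both answers and reasoning traces (solver proofs), then take a majority vote → higher accuracy and stability.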

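6. Translation Stage and Reasoning Stage
Translation stage
• Prior work: translates a natural-language (NL) problem into one specific symbolic language (SL), e.g., LP, FOL, or SAT. Each symbolic language, however, has its own strengths and weaknesses, so we adopt multi-agent debate.
• Motivation and contribution: different symbolic languages capture different important features of the original natural language. When the same NL problem is translated into several SLs, the translations can be used to improve one another.
Reasoning stage
• Prior work: performs neuro-symbolic reasoning either with an SL solver or by prompting an LLM.
• Motivation and contribution: SL solvers reason strongly but are not robust; LLM prompting is robust but reasons weakly. We therefore introduce multi-agent debate to combine the strengths of both and achieve the best performance in the reasoning stage.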

7. Method Flowchart

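8. An Efficient Algorithm with Sparse Communication
Sparse communication algorithm:
• Formal symbolic translation: convert the natural-language logic problem into first-order logic expressions.
• Communication pruning by preference score: the score is a weighted combination of the LLM's confidence ratio and its information gain (quantified as the semantic dissimilarity from the other LLMs' outputs). The score decides whether a communication link is open, i.e., dynamic pruning.
• Selective memory update: each agent selectively integrates the "valuable" information from the previous round of communication into its memory.
• Aggregate the results to produce the final answer.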
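The gating and memory-update steps can be illustrated with a small sketch. The slide does not give the exact formulas, so everything below is an assumption: the confidence ratio is taken as the sender's confidence divided by the sum of sender and receiver confidences, the information gain is approximated by a token-level Jaccard distance, the two are mixed with a weight `alpha`, and a link opens when the score reaches `threshold`. All function and parameter names are hypothetical.

```python
def jaccard_distance(text_a, text_b):
    """Token-level dissimilarity in [0, 1]; a crude proxy for semantic difference."""
    tokens_a, tokens_b = set(text_a.split()), set(text_b.split())
    if not tokens_a and not tokens_b:
        return 0.0
    return 1.0 - len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def preference_score(conf_sender, conf_receiver, out_sender, out_receiver, alpha=0.5):
    """Weighted mix of confidence ratio and information gain (assumed form)."""
    total = conf_sender + conf_receiver
    ratio = conf_sender / total if total > 0 else 0.5
    gain = jaccard_distance(out_sender, out_receiver)
    return alpha * ratio + (1 - alpha) * gain


def communication_gates(confidences, outputs, threshold=0.5, alpha=0.5):
    """gates[i][j] == 1 means the link from agent i to agent j is open."""
    n = len(outputs)
    gates = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                gates[i][j] = 1  # assumption: an agent always keeps its own output
                continue
            score = preference_score(
                confidences[i], confidences[j], outputs[i], outputs[j], alpha
            )
            gates[i][j] = 1 if score >= threshold else 0
    return gates


def update_memories(memories, outputs, gates):
    """Append to each receiver's memory only the outputs that arrive on open links."""
    for j, memory in enumerate(memories):
        for i, output in enumerate(outputs):
            if gates[i][j]:
                memory.append(output)


confidences = [0.9, 0.6, 0.3]
outputs = [
    "∀x (Iron(x) → Metal(x))",
    "∀x (Iron(x) → Metal(x))",
    "∀x (Made_of_iron(x) → Metal(x))",
]
gates = communication_gates(confidences, outputs)
memories = [[], [], []]
update_memories(memories, outputs, gates)
print(gates)
```

In this toy run the two identical translations do not exchange messages because they carry no information gain for each other, while the least confident agent receives both of the others' outputs.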

9. Sparse Communication Structure Diagram
[Figure: three agents (agent1, agent2, agent3), each initialized with an instructional prompt (P1, P2, P3) and an empty memory. Over Round 1 (d=1), Round 2 (d=2), and Round 3 (d=3), every agent produces a response, the preference score Pre(i→j) and the binary gate O(i→j) are computed for all (i, j), each gate is shown with a value of 0 or 1, and the agents' memories M1, M2, M3 are updated before the next round.]

10. Results on Synthetic Datasets
Across a range of models, the multi-agent debate approach consistently outperforms the other baseline methods.

11. Results on Real-World Datasets and Smaller Models

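12. Efficiency Gains
• Across a range of models, our multi-agent and sparse methods consistently outperform the other baseline methods.
• While improving efficiency, the sparse communication method also outperforms the global (fully connected) communication method.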

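13. Ablation Study
• The two-stage debate process achieves the best results.
• Results are best when the largest variety of symbolic languages and reasoning modes take part in the debate.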

14. Case Study

15. Q&A