Putting Database Copilot into Practice in the Database Field

1. Applications of Database Copilot in the Database Field. Li Li (李粒), PingCAP AI Lab

3. Agenda • Challenges in database operations • Opportunities brought by LLMs • PingCAP's practice • Future challenges for Database Copilot

4. Challenges in Database Operations

5. Challenges from the business • Data volume growth • Lower latency • Online analytics • Failure recovery

6. Challenges from the business • Data volume growth • Lower latency

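7. Challenges from the business: data volume growth, lower latency. Best practices: • Choose a more suitable database • Design database and table schemas • Index optimization • Concurrency control • Memory management • Hint • Binding • …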

8. Challenges from the business: data volume growth, lower latency • TiDB Cloud documentation: 582 • TiDB documentation: 1095

9. Challenges from the business • Online analytics

10. Challenges from the business: online analytics. Roles shown around HTAP: PM • Finance • Operation • Marketing. Not familiar with SQL.

11. Challenges from the business • Failure recovery

12. Challenges from the business: failure recovery. TiDB diagnostic data: Metrics • Logs • Queries (Slow Queries, SQL Statement, TopSQL) • Profiling. Figures shown on the slide: 1000+ • 1000+/s • 1min

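13. Challenges from the business • Data volume growth • Lower latency • Online analytics • Failure recovery. A database is a complex system that users can hardly master completely.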

14. Opportunities Brought by LLMs

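15. Past approaches (traditional AIOps). Chart axes: task generalization ability, task complexity. Rule-Based / ML / DL: change-point analysis, correlation analysis, drill-down analysis. Rule-Based / DL / RL: anomaly analysis, fault classification. https://github.com/TsinghuaDatabaseGroup/AIDB/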

16. New capabilities brought by LLMs • In-context Learning • Coding / SQL • Reasoning • Function Call. Chart axes: task generalization ability, task complexity. LLM Capability (Corpus + API + Code + Loop) • Rule-Based / DL / RL

17. Some Concepts

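18. Concepts - LLM Application Types https://zhuanlan.zhihu.com/p/679371205
Each type is described by: output of each step; who decides what to do next; the action workflow; the action goal.
• Wrapper. Output of each step: produced by a single call. Deciding the next step: human-written business code. Workflow: human-written business code. Goal: provided by humans.
• Flow (DAG). Output of each step: produced by multiple calls. Deciding the next step: the LLM acts as a router and takes part in flow decisions, choosing the tools, retrieval, logic paths, and so on. Workflow: human-written business code. Goal: provided by humans.
• Agent (Loop). Output of each step: produced by multiple calls. Deciding the next step: the LLM can repeat certain loop steps so that the flow keeps going or the result meets the human's original goal. Workflow: humans provide the basic logic and the prompts; the LLM optimizes and adjusts based on that logic and completes the flow autonomously.
• Autonomous Agent. Output of each step: produced by multiple calls. Deciding the next step: the LLM can repeat certain loop steps so that the flow keeps going or the result meets the human's original goal. Workflow: the LLM autonomously handles initialization, tool integration, and execution of the whole workflow. Goal: provided by humans.
• Silicon-based Life. Output of each step: produced by multiple calls. Deciding the next step: the LLM can repeat certain loop steps so that the flow keeps going or the result meets the human's original goal. Workflow: the LLM autonomously handles initialization, tool integration, and execution of the whole workflow. Goal: based on feedback, the LLM initiates and confirms the goal of every action on its own.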

19. Concepts - What Is a Copilot? https://www.emcap.com/thoughts/your-success-with-generative-ai-may-come-down-to-these-ux-decisions/

20. PingCAP's Practice: application scenarios / existing work / data flywheel

21. Common LLM application scenarios for databases (user-facing) • ChatBot • NL2SQL • Database diagnosis

22. PingCAP AI Lab - LLM Technical Logic Framework

23. Business Architecture of PingCAP's LLM Applications

24. Chatbot - Challenges from the business: data volume growth, lower latency • TiDB Cloud documentation: 582 • TiDB documentation: 1095

25. Chatbot App - Flow

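26. Toxicity Detection in the Chatbot App
• Alignment: making artificial general intelligence (AGI) aligned with human values and following human intent. (OpenAI, 2022, Our approach to alignment research)
• Toxicity: contemporary text generation models can produce harmful language, including hate speech, insults, profanity, and threats. These harms are usually grouped under the umbrella term "toxicity". (DeepMind, 2021, Challenges in Detoxifying Language Models)
• Toxicity detection: Plug and Play Language Model (PPLM)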

27. Toxicity Detection in the Chatbot App

28. Toxicity Detection in the Chatbot

29. Toxicity Detection in the Chatbot App

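30. Corpus Enhancement for the Chatbot App: feedback classification for the first 300 questions (category, count, share)
• Out of answer scope: 9 (8.74%)
• Wrong retrieval results: 60 (58.25%)
• No relevant documentation: 23 (22.33%)
• LLM hallucination: 11 (10.68%)
• Total: 103 (100%)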

31. Corpus Enhancement for the Chatbot App: RAG
Human: What’s TiDB Cloud?
Retrieve (rank, chunk, score):
• 1, TiKV is ..., 0.91
• 2, TiDB is ..., 0.87
• 3, TiFlash is ..., 0.83
• 4, PD is ..., 0.81
• 5, TiUP is ..., 0.79
• 6, TiDB Cloud is ..., 0.77

32. Corpus Enhancement for the Chatbot App: ReRank
Documentation Corpora (rank, chunk, score):
• 1, TiKV is ..., 0.91
• 2, TiDB is ..., 0.87
• 3, TiFlash is ..., 0.83
• 4, PD is ..., 0.81
• 5, TiUP is ..., 0.79
• 6, TiDB Cloud is ..., 0.77
Adjusted Question-Chunk Pairs (rank, question, chunk, score):
• 1, What is TiDB Cloud?, TiDB Cloud is ..., 1
• 2, ..., ..., 0.87
• 3, ..., ..., 0.83
• 4, ..., ..., 0.81
• 5, ..., ..., 0.79
• 6, ..., ..., 0.77
Merge and sort the two lists, then take the top 5 corpus chunks by total score.
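The merge-and-sort step on this slide can be sketched roughly as follows. This is a minimal illustration, not PingCAP's implementation: the slide does not say how the "total score" is computed, so the sketch assumes it is the sum of a chunk's scores across both lists. The function name `merge_and_rank` is hypothetical, and only the one question-chunk pair that is legible on the slide is used.

```python
from collections import defaultdict


def merge_and_rank(doc_hits, pair_hits, top_k=5):
    """Merge two (chunk, score) lists and keep the top_k chunks.

    Assumption: a chunk's total score is the sum of its scores in both lists.
    """
    totals = defaultdict(float)
    for chunk, score in doc_hits + pair_hits:
        totals[chunk] += score
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


# Scores retrieved from the documentation corpora (values from the slide).
doc_hits = [
    ("TiKV is ...", 0.91),
    ("TiDB is ...", 0.87),
    ("TiFlash is ...", 0.83),
    ("PD is ...", 0.81),
    ("TiUP is ...", 0.79),
    ("TiDB Cloud is ...", 0.77),
]
# The adjusted question-chunk pair that matched "What is TiDB Cloud?".
pair_hits = [("TiDB Cloud is ...", 1.0)]

top_chunks = merge_and_rank(doc_hits, pair_hits)
for chunk, total in top_chunks:
    print(f"{total:.2f}  {chunk}")
```

Under this assumption the "TiDB Cloud is ..." chunk moves from rank 6 to rank 1, which is the effect the slide illustrates: a curated question-chunk pair lifts the correct chunk into the context passed to the LLM.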

33. Corpus Enhancement for the Chatbot App

34. Chatbot App - Flow

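35. Chatbot - Copilot Type. When the user asks a question, the Copilot also captures the page the user is on and the cluster information, and uses them together to help the user reach the goal more sensibly.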

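36. Chatbot results • TiDB Community activity increased by 30%. • Answers TiDB and TiDB Cloud questions across all channels. • The thumbs-down rate is below 2%. • Has become the standard tool inside the company for learning TiDB and looking things up at work.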

37. NL2SQL - Challenges from the business: online analytics. Roles shown around HTAP: PM • Finance • Operation • Marketing. Not familiar with SQL.

38. NL2SQL App - Agent

39. Schema Enhancement in the NL2SQL App

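40. Prompt of the NL2SQL App: Self-Ask (Ofir Press, 2022) • Rewrite the question the user asked. • Extract the keywords of the question. • Extract any potential repo, user, and so on. • Try to decompose the user's question and answer the sub-questions itself. • Finally, write the corresponding SQL based on all of the reasoning.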
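The five steps above could be assembled into a prompt along these lines. This is only a sketch of the idea: the wording of the instructions, the function name `build_self_ask_prompt`, and the example schema are all assumptions made for illustration, not the prompt used in the actual app.

```python
# The five Self-Ask style steps listed on the slide, in order.
SELF_ASK_STEPS = [
    "Rewrite the user's question.",
    "Extract the keywords of the question.",
    "Extract any potential repo, user, and similar entities.",
    "Decompose the question into sub-questions and answer them yourself.",
    "Based on all of the reasoning above, write the corresponding SQL.",
]


def build_self_ask_prompt(question: str, schema: str) -> str:
    """Build a single prompt that walks the model through the steps in order."""
    numbered = "\n".join(
        f"{i}. {step}" for i, step in enumerate(SELF_ASK_STEPS, start=1)
    )
    return (
        "You translate questions into SQL.\n\n"
        f"Schema:\n{schema}\n\n"
        f"Question: {question}\n\n"
        f"Work through these steps in order:\n{numbered}\n"
    )


# Hypothetical schema and question, used only to show the output shape.
prompt = build_self_ask_prompt(
    "Which repo got the most stars last month?",
    "CREATE TABLE github_events (repo_name TEXT, actor_login TEXT, "
    "type TEXT, created_at TEXT);",
)
print(prompt)
```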

41. Automatic Error Correction in the NL2SQL App
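The slide gives only the title, so the following is one plausible reading rather than the app's actual design: execute the generated SQL, and if the database reports an error, hand the error message back to the generator and try again. The sketch uses SQLite from the Python standard library instead of TiDB, and `fake_generate_sql` is a hypothetical stand-in for the LLM call.

```python
import sqlite3


def run_with_repair(conn, question, generate_sql, max_attempts=3):
    """Execute generated SQL, feeding any database error into the next attempt."""
    error = None
    for _ in range(max_attempts):
        sql = generate_sql(question, error)
        try:
            rows = conn.execute(sql).fetchall()
            return sql, rows
        except sqlite3.Error as exc:
            error = str(exc)  # becomes extra context for the next generation
    raise RuntimeError(f"no executable SQL after {max_attempts} attempts: {error}")


def fake_generate_sql(question, error):
    """Hypothetical stand-in for the LLM: wrong column first, fixed on retry."""
    if error is None:
        return "SELECT nam FROM users"
    return "SELECT name FROM users"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

sql, rows = run_with_repair(conn, "List all user names", fake_generate_sql)
print(sql, rows)
```

Capping the number of attempts keeps a question that cannot be answered from looping forever.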

42. NL2SQL App - Agent

43. NL2SQL - Copilot https://tiinsight.vercel.app/

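44. NL2SQL results • For end-user questions, the executable rate exceeds 95% and the accuracy exceeds 90%. • Used internally so that business staff can get information quickly.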

45. Diagnosis - Challenges from the business: failure recovery. TiDB diagnostic data: Metrics • Logs • Queries (Slow Queries, SQL Statement, TopSQL) • Profiling. Figures shown on the slide: 1000+ • 1000+/s • 1min

46. Diagnosis App - Agent

47. Diagnosis App - Multi Agents

48. Diagnosis App - Agents Define
Each agent is defined by a System Prompt and a Description.
• Planner
  System Prompt: Develop a plan based on the Task and Standard Operating Procedure (SOP). If we consider SOP as a class, then the plan serves as an instance of SOP for a specific Task. Incorporate pertinent details from the Task into the steps of SOP. Ensure clarity, specificity, and unambiguity in each step of the plan to facilitate execution by the `Engineer`. …
  Description: I am tasked with creating the plan. I am only allowed to speak immediately after `User` or `Critic`. Only `Engineer` is allowed to speak immediately after `Planner`. If `Planner` is succeeded by `Critic`, the `Critic` is not permitted to directly utter "TERMINATE" at this time. …
• Engineer
  System Prompt: Accountable for implementing plans of `Planner`. …
  Description: I am responsible for executing the plan from `Planner`. Following `Engineer`, only `Executor` or `Critic` is allowed to speak immediately. If `Engineer` outputs "TERMINATE", only `Critic` can speak immediately. …
• Executor
  System Prompt: Execute only the functions explicitly specified by the Engineer; refrain from engaging in additional tasks. …
  Description: I can only immediately speak when prompted by the `Engineer` for a function call. Post `Executor`, only the `Engineer` is permitted to speak immediately!!! `Executor` is prohibited from speaking immediately after `Critic`!!! …
• Critic
  System Prompt: After the `Engineer` outputs "TERMINATE", evaluate whether the final message resolves the tasks mentioned in the first message. In case of affirmation, replicate the output `Engineer` and add "FINALLYTERMINATE" to the end. If negative, provide details on the pending tasks. …
  Description: Do not select me unless the final non-system-role message from `Engineer` concludes with "TERMINATE"!!! Only `Planner` is allowed to speak after `Critic` immediately. If there is no GAP, `Critic` duplicates the output `Engineer` and appends 'FINALLYTERMINATE' at the end immediately. …
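The speaking rules in these descriptions amount to a small state machine over the four agents. The sketch below encodes them in plain Python to make the allowed transitions explicit. It is an illustration only: it does not use any agent framework's API, the helper name `next_speakers` is hypothetical, and two points are assumptions not stated on the slide, namely that `Planner` is the only agent to follow `User`, and that a `Critic` message ending in "FINALLYTERMINATE" ends the conversation.

```python
# Who may speak immediately after whom, taken from the agent descriptions.
# Assumption: only Planner follows User (the slide says Planner may speak
# after User or Critic, but does not list who else could follow User).
ALLOWED_NEXT = {
    "User": {"Planner"},
    "Planner": {"Engineer"},
    "Engineer": {"Executor", "Critic"},
    "Executor": {"Engineer"},
    "Critic": {"Planner"},
}


def next_speakers(last_speaker: str, last_message: str) -> set:
    """Return the agents allowed to speak right after the given message."""
    text = last_message.rstrip()
    if last_speaker == "Critic" and text.endswith("FINALLYTERMINATE"):
        return set()  # assumption: the conversation is over
    candidates = set(ALLOWED_NEXT[last_speaker])
    if last_speaker == "Engineer":
        # Critic is selected only when Engineer ended with TERMINATE;
        # in that case nobody else may speak.
        if text.endswith("TERMINATE"):
            candidates = {"Critic"}
        else:
            candidates.discard("Critic")
    return candidates


print(next_speakers("Planner", "Step 1: check slow queries"))
print(next_speakers("Engineer", "Call get_slow_queries()"))
print(next_speakers("Engineer", "Root cause found. TERMINATE"))
print(next_speakers("Critic", "Pending: index check is missing"))
```

Restricting the speaker order this way is one route to more stable multi-agent output, since the model picks the next speaker from a small allowed set instead of from all agents.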

49. Diagnosis App - SOP https://tidb.net/blog/3cf4615e https://tidb.net/blog/d3d4465f https://tidb.net/blog/5e10a92c

50. Diagnosis App - Multi-Agents & Loop

51. Diagnosis App - Output Stability https://microsoft.github.io/autogen/blog/2024/02/11/FSM-GroupChat/ https://zhuanlan.zhihu.com/p/682218860

52. Diagnosis App - Generating the Knowledge Base / SOPs https://github.com/pingcap/LinguFlow

53. Diagnosis - Alpha Version. In the future it will be offered inside TiDB Cloud in the form of a Copilot.

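54. Diagnosis results • For every round of conversation between customers and technical support, it provides related knowledge search, analysis of the current state, and next-step suggestions. • Handles all tickets for overseas community questions, fully freeing up human effort. • Writes the first-round reply to TiDB Cloud tickets, improving first-response efficiency and overall ticket resolution time. • Gives real-time suggestions in the internal diagnosis system, shortening MTTR.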

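55. Data Flywheel - Accuracy (NL2SQL as an example) • Different user behaviors trigger automated, asynchronous optimization: • Like or share: the result is added to the corpus. • Dislike or edited SQL: an Agent interprets the user's question several times and generates multiple SQL statements. The SQL statements are executed, and the LLM scores whether each execution result answers the user's question. The highest-scoring one is added to the corpus. • After multiple SQL statements are generated and executed, their execution time and execution plans are scored, and the most efficient execution plan is chosen. • An Agent automatically optimizes indexes for popular SQL.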
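The dislike path above (generate several candidates, execute them, score the results, keep the best) can be sketched as follows. This is a simplified illustration under stated assumptions: SQLite stands in for TiDB, `score_result` is a hypothetical placeholder for the LLM judge, and execution time is used only as a tie-breaker rather than a full execution-plan score.

```python
import sqlite3
import time


def pick_best_sql(conn, candidates, score_result):
    """Run each candidate and return the one whose result scores highest.

    Candidates that fail to execute are dropped. Among equal scores, the
    faster statement wins (a simplification of the slide's efficiency score).
    """
    scored = []
    for sql in candidates:
        start = time.perf_counter()
        try:
            rows = conn.execute(sql).fetchall()
        except sqlite3.Error:
            continue
        elapsed = time.perf_counter() - start
        scored.append((score_result(rows), -elapsed, sql))
    if not scored:
        return None
    return max(scored)[2]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?)", [(1,), (2,)])

candidates = [
    "SELECT COUNT(*) FROM order_items",  # fails: no such table
    "SELECT 0",                          # runs, but does not answer the question
    "SELECT COUNT(*) FROM orders",       # runs and answers the question
]


def score_result(rows):
    """Hypothetical judge: 1.0 if the result matches the known answer."""
    return 1.0 if rows == [(2,)] else 0.0


best_sql = pick_best_sql(conn, candidates, score_result)
print(best_sql)
```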

56. Data Flywheel - Embedding Using Agents

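57. Data Flywheel - Testing (NL2SQL as an example) • Positive feedback goes into the test set, so that future iterations do not break earlier cases. • For negative feedback, the Agents' handling is improved and the correct result is also stored in the test set. • All run results are grouped by a classification algorithm, and question-augmentation methods are used to supplement the test set for each category of question.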

58. PingCAP AI Lab - Tech Stack: Embracing the Community • TiDB • TiDB Vector • LlamaIndex • LinguFlow • Langfuse • AutoGen • NL2SQL: Spider • GPT • Llama

59. PingCAP AI Lab - Framework Extensibility • ChatBot • NL2SQL • Diagnosis • More Scenarios: test enhancement, code checking, information organization, knowledge base management

60. Future Challenges for Database Copilot

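61. Future challenges for Database Copilot • Human-computer interaction: fully integrated into the workflow • Evaluation and optimization: for Flows and Agents • Knowledge accumulation: beyond human efficiency • Diagnosis and operations: autonomously find the root cause and fix it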
