Building an AI-Native Global Data Analytics Architecture: Bridging the Real-Time, Cost, and Compliance Gaps
1. Speaker: Yang Yongqiang (杨勇强)
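2. 01 The Real-Time, Globalization, Cost, and Complexity Gaps in the AI Era
02 The Evolution of Real-Time Data Analytics in the AI Era
03 Real-Time AI Data Analytics: Case Studies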
3.
4. 01
The Real-Time, Globalization, Cost, and Complexity Gaps in the AI Era
5. Apache Doris: Open-Source Real-Time Analytics and Search Database
2013 Project Creation
2017 Open Source
2022 ASF Top-Level Project
13k+ GitHub Stars
640+ Contributors
5000+ Enterprises
#1 Daily Page Views of Apache Project Websites
#1 Monthly Active Contributors
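6. The Real-Time, Globalization, Cost, and Complexity Gaps in the AI Era
AI demands ever fresher data and ever faster queries
LLMs introduce new data security and compliance risks
Storing and analyzing massive data volumes is expensive
The data technology stack is bloated and complex to maintain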
7. 02
The Evolution of Real-Time Data Analytics in the AI Era
8. The Evolution of Data Analytics in the AI Era
AI-boosted existing analytical workloads
AI-driven new analytical workloads
9. AI-boosted existing analytical workload
Faster, more real-time, more data, more unstructured text
Real-Time Analytics
Data Warehousing
Observability
10. Real-Time Analytics
From User-Facing Analytics to Agent-Facing Analytics, more real-time and faster
User-Facing Analytics
As cloud computing and SaaS software become more popular, embedding analytics into applications is crucial. This is also called customer-facing or user-facing analytics.
Order Analytics | Advertising Analytics | Inventory Analytics
Agent-Facing Analytics
With the rise of AI technologies, especially AI Agents, more analytical decisions will be made automatically by AI. This will improve efficiency and accuracy in decision-making.
Fraud Detection | Ad Serving | Personalized Recommendation
Why Choose Doris
- Real-Time Ingestion & Update: ~1s minimum data latency
- Blazing-Fast Analytics: < 100ms average query latency
- High-Concurrency Queries: > 10,000 QPS maximum query concurrency
- Agent Native: Doris MCP Server
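As a rough sizing illustration, Little's law (mean concurrency = arrival rate × mean time in system) connects the throughput and latency figures on this slide. The sketch below plugs in the slide's two numbers; treating them as steady-state values that hold at the same time is an assumption made here, not a claim from the talk, and the function name is illustrative only.

```python
def in_flight_queries(qps: float, avg_latency_ms: float) -> float:
    """Little's law: L = lambda * W, with W converted from milliseconds to seconds."""
    return qps * avg_latency_ms / 1000.0


# Figures quoted on the slide, assumed (for illustration) to hold simultaneously.
SLIDE_QPS = 10_000         # "> 10,000 QPS maximum query concurrency"
SLIDE_LATENCY_MS = 100     # "< 100ms average query latency"

concurrent = in_flight_queries(SLIDE_QPS, SLIDE_LATENCY_MS)
print(f"roughly {concurrent:.0f} queries in flight at any moment")
```

Under these assumptions a serving tier for agents has to hold on the order of a thousand queries open at once, which is why per-query latency and concurrency are listed together.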
11. Data Warehousing
From traditional warehouse to Lakehouse + Real-Time Analytics Database
Open Data Lakehouse
Lakehouse is designed for both AI and analytics workloads
Why Choose Doris
Build the Fastest Real-Time Data Warehouse in the Lakehouse (Doris), replacing Trino/Presto, SparkSQL, ...
- Fast: 3x faster than Trino/Presto
- Open
- Unified
12. Observability
From Cloud Native + Microservice to LLM + AI Agent, more data and text
History of Observability
2013: The concept of observability began to gain traction as companies like Twitter started to adopt it to manage their increasingly complex distributed systems.
2020: Observability became a hot topic in DevOps, with more companies recognizing its importance in managing complex IT architectures.
Logs | Metrics | Traces
Why Choose Doris
- 10x more cost-effective than Elasticsearch
- Flexible VARIANT type for semi-structured data
- Open integration with ELK, OpenTelemetry, Grafana, and more
13. AI-driven New Analytical Workload
AI-boosted existing analytical workloads
AI-driven new analytical workloads
14. Hybrid Search
Why Vector-Capable General-Purpose Databases are Better for Enterprise GenAI?
- Lack of Hybrid Query Capabilities
- Limited Integration with Structured Data
- Operational Complexity and Increased Costs
- Large-scale data volume, e.g., PBs
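To make "hybrid query" concrete, here is a minimal in-memory sketch that combines a structured filter, a keyword ranking, and a vector ranking in one request. It is not Doris code: the `Doc` fields, the term-count keyword score, and the use of reciprocal rank fusion (RRF) to merge the two rankings are all assumptions chosen for illustration; the talk does not say how results are fused.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Doc:
    doc_id: int
    tenant: str              # structured column (hypothetical)
    text: str                # full-text column (hypothetical)
    embedding: List[float]   # vector column (hypothetical)


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def keyword_score(query: str, text: str) -> int:
    # Naive term-count score; a real engine would use something like BM25.
    words = text.lower().split()
    return sum(words.count(term) for term in query.lower().split())


def rrf(rankings: List[List[int]], k: int = 60) -> List[int]:
    # Reciprocal rank fusion: each list contributes 1 / (k + rank) per document.
    scores: Dict[int, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))


def hybrid_search(docs: List[Doc], tenant: str, query_text: str,
                  query_vec: List[float], top_k: int = 2) -> List[int]:
    # 1) Structured predicate narrows the candidate set first.
    candidates = [d for d in docs if d.tenant == tenant]
    # 2) Rank the same candidates by keyword match and by vector similarity.
    by_keyword = [d.doc_id for d in sorted(
        candidates, key=lambda d: -keyword_score(query_text, d.text))]
    by_vector = [d.doc_id for d in sorted(
        candidates, key=lambda d: -cosine(query_vec, d.embedding))]
    # 3) Fuse both rankings into one result list.
    return rrf([by_keyword, by_vector])[:top_k]


docs = [
    Doc(1, "acme", "refund policy for late orders", [1.0, 0.0]),
    Doc(2, "acme", "shipping times and order tracking", [0.6, 0.8]),
    Doc(3, "acme", "refund status tracking", [0.9, 0.1]),
    Doc(4, "other", "refund policy refund policy", [1.0, 0.0]),
]
print(hybrid_search(docs, "acme", "refund policy", [1.0, 0.0]))
```

The point of the sketch is the shape of the request: when structured columns, text, and vectors live in one system, the tenant filter, keyword match, and similarity ranking run over the same candidate set instead of being stitched together across separate stores.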
15. AI Operators in SQL
Integrating GenAI for Enhanced Text Analysis in Database
LLM_SUMMARIZE
LLM_TRANSLATE
LLM_SENTIMENT
LLM_CLASSIFY
LLM_GENERATE
LLM_EXTRACT
LLM_FILTER
LLM_SIMILARITY
LLM_FIXGRAMMAR
LLM_MASK
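The operator names above suggest row-wise functions that a query can mix with ordinary predicates. The slide does not show their signatures, so the sketch below does not call Doris at all. It models a hypothetical filter-style and mask-style operator in plain Python, with a keyword-matching stub standing in for the model, to illustrate one design consideration: applying the cheap structured predicate first limits how many rows reach the model. All names here (`StubModel`, `mask_emails`, `run_query`) are invented for illustration.

```python
import re
from typing import Dict, List

Row = Dict[str, str]


class StubModel:
    """Stand-in for an LLM endpoint; counts calls so the cost is visible."""

    def __init__(self) -> None:
        self.calls = 0

    def is_complaint(self, text: str) -> bool:
        self.calls += 1
        return any(w in text.lower() for w in ("broken", "refund", "late"))


def mask_emails(text: str) -> str:
    # Deterministic stand-in for a masking operator (hypothetical behaviour).
    return re.sub(r"[\w.+-]+@[\w-]+\.[\w.]+", "[MASKED]", text)


def run_query(rows: List[Row], region: str, model: StubModel) -> List[str]:
    # 1) Cheap structured predicate first.
    in_region = [r for r in rows if r["region"] == region]
    # 2) Model-backed filter only on the surviving rows.
    kept = [r for r in in_region if model.is_complaint(r["comment"])]
    # 3) Masking applied as a projection on the final rows.
    return [mask_emails(r["comment"]) for r in kept]


rows = [
    {"region": "EU", "comment": "Package arrived broken, contact me at ann@example.com"},
    {"region": "EU", "comment": "Great service"},
    {"region": "US", "comment": "Refund please"},
]
model = StubModel()
result = run_query(rows, "EU", model)
print(result, "model calls:", model.calls)
```

With three input rows the stub is invoked only for the two rows that pass the region predicate, which is the kind of saving an in-database operator can exploit when it sits next to ordinary filters.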
16. 03
Case Studies
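17. "The new SelectDB-based system now ingests log data from all of MiniMax's internal business lines, meeting the requirements of 10 GB/s high-throughput real-time writes and query responses within seconds.
With compute-storage separation, compute resources are 40% lower and hot-data storage resources 50% lower than with self-managed Doris."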
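18. "Tencent Music's content library replaced ES (Elasticsearch) and CK (ClickHouse) with Doris, serving both search and analytics needs; storage costs fell by 80% and write performance improved 4x.
It also improved the efficiency of interaction between large models and OLAP and the accuracy of the output, ultimately delivering a more intelligent Q&A service."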
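19. "ByteDance built vector indexing and full-text search relevance scoring on top of Doris, improving vector search performance 10x.
It supports HybridSearch and meets the RAG and GraphRAG needs of multiple business lines."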
20.
21.