Pinterest's Analytics as a Platform on Druid (Part 1 of 3)
Jian Wang, Jiaqi Gu, Yi Yang, Isabel Tallam, Lakshmi Narayana Namala, Kapil Bajaj | Real Time Analytics Team
In this blog post series, we’ll discuss Pinterest’s Analytics as a Platform on Druid and share some learnings from using Druid. This first post covers a short history of our switch to Druid, the system architecture with Druid, and learnings on optimizing host types for Mmap.
A Short History on Switching to Druid
Historically, most of the analytical use cases at Pinterest were powered by HBase, which was then a well-supported key-value store in the company. All the reporting metrics were precomputed in hourly or daily batch jobs, transformed into a key-value data model, and stored in HBase. This approach worked fine for a while, but eventually the cons of the HBase-based precomputed key-value lookup system became increasingly visible:
- The key-value data model doesn’t naturally fit the analytics query pattern, so the application side has to do more work to aggregate results
- Cardinality explodes any time a new column is added
- It’s too expensive to precompute all filter combinations, leading to limited filter choices in the UI
- HBase cluster stability and operational cost became increasingly unmanageable as data sets grew larger
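To make the cardinality and filter-combination points concrete, here is a minimal sketch in Python. The dimension names and distinct-value counts are hypothetical, not Pinterest's actual schema, and the model assumes each filter is either left unset or pins a dimension to a single value. Under those assumptions, the number of precomputed keys per time bucket is the product of (distinct values + 1) across dimensions, so every added column multiplies the total.

```python
from math import prod


def precomputed_key_count(cardinalities):
    """Count the key-value rows needed to serve every filter combination.

    Each dimension is either unfiltered ("all") or pinned to one of its
    values, so a dimension with c distinct values contributes a factor
    of (c + 1) to the total.
    """
    return prod(c + 1 for c in cardinalities.values())


# Hypothetical dimensions and distinct-value counts, for illustration only.
dims = {"country": 200, "device": 5, "gender": 3}
before = precomputed_key_count(dims)

# Adding one more filterable column multiplies the key count.
dims["age_bucket"] = 7
after = precomputed_key_count(dims)

print(before, after)  # 4824 38592
```

In this toy setup, one extra column with seven values grows the precomputed key space eightfold for every hour or day of data, which is one way to read why the filter choices in the UI had to stay limited.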
For all these reasons, we evaluated and decided to adopt Druid as Pinterest’s next-gen analytical data store. Since then, we have onboarded many critical use cases including reporting partner and advertiser business metrics, organic pin stats, experiment metrics, spam metrics analy...