Pinterest’s Analytics as a Platform on Druid (Part 3)

Jian Wang, Jiaqi Gu, Yi Yang, Isabel Tallam, Lakshmi Narayana Namala, Kapil Bajaj | Real Time Analytics Team

This is a three-part blog series. Click to read part 1 and part 2.

In this blog post series, we discuss Pinterest’s Analytics as a Platform on Druid and share some of what we have learned from using Druid. This third post in the series covers what we learned about optimizing Druid for real-time use cases.

Learnings on Optimizing Druid for Real-Time Use Cases

When we first brought Druid to Pinterest, it was mainly used to serve queries over batch-ingested data. Over time, we have been shifting to a real-time reporting system so that metrics become queryable within minutes of arrival. More and more use cases are onboarded to a lambda architecture, with streaming pipelines on Flink in addition to the source-of-truth batch pipeline. This creates a big challenge for the reporting layer on Druid. The biggest real-time use case we onboarded had the following requirements: the upstream streaming ETL pipeline produces to the Kafka topic that Druid consumes at over 500k QPS, and it expects the ingestion delay on Druid to be within one minute. The expected query load is ~1,000 QPS, with an expected P99 latency of ~250 ms.
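
The following back-of-envelope calculation puts these requirements in perspective. It makes two assumptions. First, the ingestion rate holds at a steady 500k events/s for the whole one-minute delay window. Second, it applies Little's law (in-flight = arrival rate × time in system). Little's law is normally stated with mean latency, so plugging in the P99 figure gives only a rough, pessimistic concurrency estimate, not a measured value. The function names are illustrative and do not come from the original post.

```python
# Back-of-envelope sizing for the requirements quoted above.
# Inputs come from the text; the derived numbers are rough illustrations only.

INGEST_QPS = 500_000        # events/s produced to the Kafka topic ("over 500k")
MAX_INGEST_DELAY_S = 60     # ingestion delay target: within one minute
QUERY_QPS = 1_000           # expected query rate ("~1,000")
P99_LATENCY_S = 0.250       # expected P99 query latency ("~250 ms")


def events_within_delay(ingest_qps: int, delay_s: int) -> int:
    """Events that can arrive during one full ingestion-delay window,
    assuming a constant ingestion rate."""
    return ingest_qps * delay_s


def queries_in_flight(query_qps: float, latency_s: float) -> float:
    """Little's law (L = lambda * W): average number of concurrent queries.
    Using P99 latency instead of mean latency gives a pessimistic estimate."""
    return query_qps * latency_s


if __name__ == "__main__":
    print(events_within_delay(INGEST_QPS, MAX_INGEST_DELAY_S))  # 30000000
    print(queries_in_flight(QUERY_QPS, P99_LATENCY_S))          # 250.0
```

Roughly 30 million events can land within one delay window while they are still only in real-time segments. At the same time, on the order of a couple hundred queries may be in flight against those segments.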

In-memory bitmap

While onboarding the use case, we first ran into bottlenecks serving the real-time segments held by the peon processes on the middle managers. Initially, we added a considerable number of hosts to meet the SLA, but the infrastructure cost soon grew non-linearly relative to the gain. In addition, with such a high task count and so many replicas, Ov...
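
The text of this section is cut off before it describes Pinterest's implementation, so the sketch below illustrates only the general idea behind an in-memory bitmap (inverted) index over the rows of a real-time segment. For each dimension value, the index keeps a bitmap of matching row ids. An equality filter can then be answered by ANDing bitmaps instead of scanning every row. This is not Druid's API or Pinterest's code. The class and method names are hypothetical, and Python integers stand in for the compressed bitmaps (such as Roaring, which Druid supports) that a production system would use.

```python
from collections import defaultdict
from typing import Dict, List


class InMemoryBitmapIndex:
    """Hypothetical sketch: per-dimension inverted index (value -> row bitmap),
    updated as rows are appended to an in-memory segment.
    Python ints act as arbitrary-length bitsets; bit i set means row i matches."""

    def __init__(self, dimensions: List[str]) -> None:
        self.dimensions = dimensions
        self.rows: List[Dict[str, str]] = []
        # dimension -> value -> bitmap of row ids holding that value
        self.bitmaps: Dict[str, Dict[str, int]] = {d: defaultdict(int) for d in dimensions}

    def add_row(self, row: Dict[str, str]) -> int:
        """Append a row and set its bit in the bitmap of each dimension value."""
        row_id = len(self.rows)
        self.rows.append(row)
        for dim in self.dimensions:
            value = row.get(dim)
            if value is not None:
                self.bitmaps[dim][value] |= 1 << row_id
        return row_id

    def match(self, filters: Dict[str, str]) -> List[int]:
        """Row ids matching all equality filters, found by ANDing bitmaps
        rather than scanning the stored rows."""
        result = (1 << len(self.rows)) - 1  # start with every row selected
        for dim, value in filters.items():
            # Unknown dimension or value -> empty bitmap -> no matches.
            result &= self.bitmaps.get(dim, {}).get(value, 0)
        # Decode the set bits of the result bitmap into row ids.
        ids: List[int] = []
        row_id = 0
        while result:
            if result & 1:
                ids.append(row_id)
            result >>= 1
            row_id += 1
        return ids


if __name__ == "__main__":
    idx = InMemoryBitmapIndex(["country", "platform"])
    idx.add_row({"country": "US", "platform": "ios"})  # row 0
    idx.add_row({"country": "FR", "platform": "ios"})  # row 1
    idx.add_row({"country": "US", "platform": "web"})  # row 2
    print(idx.match({"country": "US"}))                     # [0, 2]
    print(idx.match({"country": "US", "platform": "ios"}))  # [0]
```

In this sketch, each appended row costs a few bit updates at ingestion time, and in exchange a filtered query touches only the bitmaps for the filtered values. Whether that trade-off pays off for a given workload depends on filter selectivity and ingestion rate.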