Columnar DB File Reader V2: A Complete Rewrite

One of the main pillars of Mixpanel is ARB, our proprietary columnar store database, which we designed specifically to meet the needs of our customers. In this blog post, we walk through a complete rewrite of the event reader code responsible for parsing the columnar files. The primary objective is to significantly improve query performance, particularly for queries with selective filters.

Given that the new abstractions deviated quite substantially from the old ones, we seized this opportunity to also migrate from C to C++ to modernize our codebase. With the introduction of the V2 reader, queries are now 12% faster on average, with some of the slower ones showing improvements of up to 75%.

Slower and more selective queries benefit the most from the V2 implementation.

Reader V1

Imagine a Mixpanel customer running a streaming service who wants to count how many times users played romantic movies released before they were born. Typically, the user’s year of birth is fetched from user profiles, and the movie year and genre come from a lookup table keyed by movie name/ID. For the sake of simplicity, however, let’s assume for now that they are all plain event properties. An eval node (a recursive tree-like structure for representing a selector expression) for the given filter would look like this:

A filter’s root node always outputs a boolean. The property eval nodes are highlighted in yellow.
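To make the tree shape concrete, here is a minimal C++ sketch of such an eval node for the romantic-movies filter. It is an illustration only, not ARB's actual implementation: the `EvalNode`, `Value`, and `Event` types, the helper functions, and the property names `genre`, `movie_year`, and `birth_year` are all assumptions made for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical types: a property value is an integer or a string,
// and an event maps property names to values.
using Value = std::variant<std::int64_t, std::string>;
using Event = std::unordered_map<std::string, Value>;

// Hypothetical recursive eval node: leaves are properties or literals,
// inner nodes are operators over exactly two children.
struct EvalNode {
    enum class Kind { Property, Literal, Equal, Less, And };
    Kind kind = Kind::Literal;
    std::string name;                                 // used by Property
    Value literal;                                    // used by Literal
    std::vector<std::unique_ptr<EvalNode>> children;  // used by operators
};

std::unique_ptr<EvalNode> property(std::string name) {
    auto n = std::make_unique<EvalNode>();
    n->kind = EvalNode::Kind::Property;
    n->name = std::move(name);
    return n;
}

std::unique_ptr<EvalNode> literal(Value v) {
    auto n = std::make_unique<EvalNode>();
    n->kind = EvalNode::Kind::Literal;
    n->literal = std::move(v);
    return n;
}

std::unique_ptr<EvalNode> binary(EvalNode::Kind kind,
                                 std::unique_ptr<EvalNode> lhs,
                                 std::unique_ptr<EvalNode> rhs) {
    auto n = std::make_unique<EvalNode>();
    n->kind = kind;
    n->children.push_back(std::move(lhs));
    n->children.push_back(std::move(rhs));
    return n;
}

// Evaluates a leaf node. Throws std::out_of_range if the event lacks the property.
Value eval_value(const EvalNode& n, const Event& e) {
    switch (n.kind) {
        case EvalNode::Kind::Property: return e.at(n.name);
        case EvalNode::Kind::Literal:  return n.literal;
        default: throw std::logic_error("not a value node");
    }
}

// Evaluates an operator node; a filter's root is always one of these.
bool eval_bool(const EvalNode& n, const Event& e) {
    if (n.kind != EvalNode::Kind::And && n.kind != EvalNode::Kind::Equal &&
        n.kind != EvalNode::Kind::Less) {
        throw std::logic_error("not a boolean node");
    }
    assert(n.children.size() == 2);
    const EvalNode& lhs = *n.children[0];
    const EvalNode& rhs = *n.children[1];
    if (n.kind == EvalNode::Kind::And) return eval_bool(lhs, e) && eval_bool(rhs, e);
    if (n.kind == EvalNode::Kind::Equal) return eval_value(lhs, e) == eval_value(rhs, e);
    return eval_value(lhs, e) < eval_value(rhs, e);
}

// genre == "romance" AND movie_year < birth_year
std::unique_ptr<EvalNode> romance_before_birth_filter() {
    using Kind = EvalNode::Kind;
    return binary(Kind::And,
                  binary(Kind::Equal, property("genre"), literal(std::string("romance"))),
                  binary(Kind::Less, property("movie_year"), property("birth_year")));
}
```

Calling `eval_bool(*romance_before_birth_filter(), event)` returns true only for events whose genre is `romance` and whose movie year is earlier than the birth year. The three `property(...)` leaves correspond to the property eval nodes in the figure.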

In V1, the ARB reader first loads all necessary columns for filter evaluation via file memory mapping (mmap). It then proceeds to instantiate column cursor objects for each loaded column. Subsequently, the ARB reader iterates...