Shopify's Path to a Faster Trino Query Execution: Custom Verification, Benchmarking, and Profiling Tooling

Abstract

Data scientists at Shopify expect fast results when querying large datasets across multiple data sources. We use Trino, a distributed SQL query engine, to provide quick access to our data lake, and we recently invested in speeding up query execution.

On top of handling over 500 Gbps of data, we strive to deliver p95 query results in five seconds or less. To achieve this, we’re constantly tuning our infrastructure. But with each change comes a risk to our system. A disruptive change could stall the work of our data scientists and interrupt our engineers on call.

That’s why Shopify’s Data Reliability team built custom verification, benchmarking, and profiling tooling for testing and analyzing Trino. Our tooling is designed to minimize the risk of various changes at scale.

Below we’ll walk you through how we developed our tooling. We’ll share simple concepts to use in your own Trino deployment or any other complex system involving frequent iterations.
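
To make the p95 target concrete, here is a minimal sketch of checking a batch of query latencies against a five-second goal. It is not Shopify's tooling: the function names `p95` and `meets_target`, the nearest-rank percentile method, and the sample latencies are all assumptions made for illustration.

```python
def p95(latencies_s):
    """Nearest-rank 95th percentile of query latencies, in seconds.

    Hypothetical helper: the nearest-rank method is an assumption,
    not necessarily how any production system computes p95.
    """
    if not latencies_s:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_s)
    # 1-based nearest rank = ceil(0.95 * n), done in integer arithmetic
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def meets_target(latencies_s, target_s=5.0):
    """Return True if the p95 latency is at or below the target."""
    return p95(latencies_s) <= target_s


# Illustrative (made-up) samples: one slow outlier among twenty queries
# does not move the p95, but two slow queries do.
one_outlier = [1.0] * 19 + [10.0]
two_outliers = [1.0] * 18 + [10.0, 10.0]
print(meets_target(one_outlier), meets_target(two_outliers))
```

A check like this could be run on the same set of queries before and after an infrastructure change to see whether the change pushes tail latency past the target.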
