Use the Compression, Luke: Cut Cloud Costs by 20% with a Single Code Change!
Okay, yes, the title is a bit clickbaity, but stick with me, because this is a real story about SRE work, cost optimization, Golang, and open source.
An Introduction and Context
I apologize for the clickbait title, but I promise this is a real story that gives you a glimpse into what SREs do daily at Booking.com. It’s based on a talk I gave at one of our internal engineering meetups, adapted for a blog format. Some numbers have been replaced with percentages to protect the innocent. Let’s start by introducing the main character of our story:
Grafana Mimir
Grafana Mimir is an open-source, scalable metrics storage system that’s compatible with Prometheus. It’s a fork of the Cortex project and is governed by Grafana Labs. The Booking.com Observability team (which I’m proud to be part of) uses it to store metrics at scale. Mimir is a complex distributed system with multiple components, and we deploy and run it on Amazon Elastic Kubernetes Service (EKS), the managed Kubernetes service on AWS.
Here’s a simplified excerpt from the Mimir architecture diagram showing the metrics write path:
(Picture copyright © 2025 Grafana Labs)
Incoming writes go to a component called the distributor, which checks various limits and then sends the data to another component called the ingester. For resiliency, the distributor sends three copies of the data to three separate ingesters. Our Mimir installation is deployed on AWS, and these three ingesters are spread across separate availability zones (AZs). Think of AZs as geographically close but independent data centers: this way, the system keeps r...
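To make the write path above a bit more concrete, here is a minimal Go sketch of the replication step: picking one ingester per availability zone until three replicas are chosen. This is an illustration only, not Mimir's actual code. The real distributor selects ingesters through a hash ring, which this sketch leaves out, and the `Ingester` type, the `pickReplicas` function, and the instance and zone names are all hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// Ingester is a hypothetical stand-in for one ingester instance.
type Ingester struct {
	Name string
	Zone string // availability zone the instance runs in
}

var errNotEnoughZones = errors.New("not enough availability zones for the replication factor")

// pickReplicas walks the candidate list in order and takes the first
// ingester found in each zone not used yet, until replicationFactor
// replicas are selected. (Assumption: a plain ordered list replaces
// the hash ring a real system would consult.)
func pickReplicas(ingesters []Ingester, replicationFactor int) ([]Ingester, error) {
	if replicationFactor <= 0 {
		return nil, errors.New("replication factor must be positive")
	}
	seenZones := make(map[string]bool)
	replicas := make([]Ingester, 0, replicationFactor)
	for _, ing := range ingesters {
		if len(replicas) == replicationFactor {
			break
		}
		if seenZones[ing.Zone] {
			continue // this zone already holds a copy
		}
		seenZones[ing.Zone] = true
		replicas = append(replicas, ing)
	}
	if len(replicas) < replicationFactor {
		return nil, errNotEnoughZones
	}
	return replicas, nil
}

func main() {
	// Hypothetical fleet: two ingesters in zone-a, one each in zone-b and zone-c.
	ingesters := []Ingester{
		{Name: "ingester-a-0", Zone: "zone-a"},
		{Name: "ingester-a-1", Zone: "zone-a"},
		{Name: "ingester-b-0", Zone: "zone-b"},
		{Name: "ingester-c-0", Zone: "zone-c"},
	}
	replicas, err := pickReplicas(ingesters, 3)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, r := range replicas {
		fmt.Printf("send copy to %s in %s\n", r.Name, r.Zone)
	}
}
```

With the sample fleet, the second ingester in zone-a is skipped, so each of the three copies lands in a different zone. If fewer than three zones are available, the sketch returns an error instead of placing two copies in the same zone.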