Use Compression, Luke: Cut 20% of the Cloud Cost with a Single Code Change!

Okay, yes, the title is a bit clickbaity — but stick with me because this is a real story about SRE work, cost optimization, Golang, and open source.

An Introduction and Context

I apologize for the clickbait title, but I promise this is a real story that gives you a glimpse into what SREs do daily at Booking.com. It’s based on a talk I gave at one of our internal engineering meetups, adapted for a blog format. Some numbers have been replaced with percentages to protect the innocent. Let’s start by introducing the main character of our story:

Grafana Mimir

Grafana Mimir is an open-source, scalable metrics storage system that’s compatible with Prometheus. It’s a fork of the Cortex project and is governed by Grafana Labs. The Booking.com Observability team (which I’m proud to be part of) uses it to store metrics at scale. Mimir is a complex distributed system with multiple components, and we deploy and run it on Amazon’s managed Kubernetes service, Elastic Kubernetes Service (EKS).

Here’s a simplified excerpt from the Mimir architecture diagram showing the metrics write path:

Grafana Mimir write path architecture — traffic arrives at the distributor, is duplicated to three ingesters, and then goes to object storage, where it’s also accessible by the compactor.

(picture Copyright 2025 © Grafana Labs)

Incoming writes go to a component called the distributor, which checks various limits and then sends the data to another component called the ingester. For resiliency, the distributor sends three copies of the data to three separate ingesters. Our Mimir installation is deployed in Amazon AWS, and these three ingesters are distributed across separate availability zones (AZ). Think of AZs as geographically close but independent data centers — this way, the system keeps running even if one AZ goes down.

Components in Mimir (including distributors and ingesters) communicate using the gRPC protocol. I won’t dive into the details of what gRPC is, but for this story, all you need to know is that it’s a popular Remote Procedure Call (RPC) framework used in modern software development to build and scale distributed systems. One thing to note is that gRPC doesn’t compress data by default — so while our incoming writes arrive already compressed with Snappy, the traffic between Mimir components is sent uncompressed.

These two facts — no compression by default and sending three copies of the data — create an effect similar to write amplification in storage technologies. The internal traffic between distributors and ingesters ends up being 30–40 times larger than the incoming data!

That said, this is a known issue, and Grafana Labs recommends enabling gRPC compression in production systems. When we set up Mimir, we followed this advice and enabled gRPC compression using Snappy, as it’s the best compromise between speed and compression ratio.
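As a side note, here is a minimal sketch of what opting into gRPC compression looks like in grpc-go. It uses the stock gzip codec that ships with the library purely for illustration; Mimir wires its Snappy (and later S2/ZSTD) codecs in through its own configuration rather than raw dial options, and the target address below is hypothetical.

```go
package main

import (
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	"google.golang.org/grpc/encoding/gzip" // registers the "gzip" compressor via init()
)

func main() {
	// Every RPC on this connection will use the named compressor;
	// without this call option, gRPC sends message payloads uncompressed.
	conn, err := grpc.Dial(
		"ingester.example:9095", // hypothetical target
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithDefaultCallOptions(grpc.UseCompressor(gzip.Name)),
	)
	if err != nil {
		panic(err)
	}
	defer conn.Close()
}
```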

Okay, enough context — let’s get to the story.

An Interesting Case of Cost Optimization

Cloud technologies have become the de-facto standard for hosting solutions, offering scalability and flexibility for deploying software. However, with the cloud comes a whole new set of challenges that weren’t as prominent in the days of bare metal servers. One of these challenges is cost optimization, also known as FinOps. As part of the Site Reliability Engineering (SRE) team at Booking.com, one of our responsibilities is to ensure we’re spending our budget wisely and not burning money unnecessarily.

So, one day, I opened the AWS Cost Explorer page for one of our Mimir installations and noticed something interesting:

AWS cost breakdown screenshot showing that the majority of the budget (47% on August 16) was spent on something called “EC2-Other”.

The majority of the budget (47% on August 16) was spent on something called “EC2-Other”. If you’re familiar with AWS, you might already have some guesses. EC2 is Amazon’s virtual machines service, so you might assume this is related to CPU or memory usage. But then, why is it labeled “other”?

Let’s dig deeper. On the right panel of the Cost Explorer, we can choose a new “Dimension.” If we select “Usage Type”, we can see what exactly “EC2-Other” refers to:

AWS cost breakdown screenshot showing that the majority of the budget (38% on August 16) was spent on a usage type called “EUC1-DataTransfer-Regional-Bytes”.

Okay, so the majority of the “EC2-Other” cost comes from a usage type called “EUC1-DataTransfer-Regional-Bytes”. Still not clear? Don’t worry — after some Googling, I made an interesting discovery:

EC2-Other is cross-AZ Traffic!

Wait, what? Yes, really!

Remember, AWS availability zones (AZs) are separate locations, and AWS charges you for traffic that crosses AZ boundaries. How much? According to AWS pricing, it’s $0.01 per GB for each direction. That doesn’t sound too bad, right?

Well, not so fast. As I later found out, AWS’ documentation can be a bit misleading. The $0.01 per GB is charged in each direction: once on the virtual machine sending the data and once on the one receiving it. For example, if one of your Kubernetes nodes sends 1 GB of data to a node in another AZ, you’ll pay $0.01 for the outgoing data AND another $0.01 for the incoming data, so the real price is $0.02 per GB. Now, imagine this happening with terabytes of internal traffic every month in a real production system. Ouch.
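To make the arithmetic concrete, here is a back-of-the-envelope estimate in Go. The monthly volume is a made-up, hypothetical figure; only the $0.01 per GB per direction rate comes from the AWS pricing page.

```go
package main

import "fmt"

func main() {
	const (
		ratePerGBPerDirection = 0.01   // USD, AWS inter-AZ data transfer rate
		monthlyCrossAZGB      = 500000 // hypothetical: 500 TB of cross-AZ traffic per month
	)

	// Each byte is billed twice: once leaving the source AZ, once entering the destination AZ.
	monthlyCost := monthlyCrossAZGB * ratePerGBPerDirection * 2

	fmt.Printf("Estimated cross-AZ bill: $%.0f per month\n", monthlyCost) // $10000
}
```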

Compression to the Rescue

The issue is clear: we need to reduce cross-AZ traffic somehow. Some articles on the internet suggest ditching multi-AZ Kubernetes deployments and running everything in a single AZ. While this might save costs, it would drastically reduce resiliency — and AZ outages aren’t exactly rare in AWS.

When I checked the Mimir GitHub repo, I realized we weren’t alone in facing this issue. I stumbled upon issue 8522, where a Mimir user proposed using ZSTD compression between distributors and ingesters.

What is ZSTD?

ZSTD, or Zstandard, is a state-of-the-art lossless compression algorithm. It was developed at Facebook and open sourced in August 2016. ZSTD offers compression ratios comparable to GZIP but uses far fewer resources and is much faster, especially during decompression. It’s been widely adopted — integrated into the Linux kernel in 2017, added to Google Chrome in March 2024, and implemented in Cloudflare’s services in September 2024.

So, What’s the Catch?

There could be a few. In the case of gRPC compression, Grafana engineers reported severe memory allocation issues in tests. Additionally, the whole problem might become irrelevant with the upcoming Kafka-based Mimir architecture. That sounds promising, but implementing and polishing a new architecture could take months.

I decided not to wait. Instead, I rolled up my sleeves and tried implementing and testing ZSTD compression myself.

On the shoulders of giants

How? As Sir Isaac Newton once said, “If I have seen further, it is by standing on the shoulders of giants.”

My thought process was simple: gRPC is open source, and so is ZSTD. Maybe someone has already implemented this before? And guess what — I found out that someone indeed had! I discovered that gRPC now supports a pluggable compression interface, and in a GitHub repository named go-grpc-compression there’s a perfect, Apache-licensed ZSTD gRPC compression implementation!
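For context, grpc-go’s pluggable compression boils down to implementing the encoding.Compressor interface and registering it under a name. The interface used below is the real grpc-go API; the zstd wrapper, however, is a deliberately naive sketch built on klauspost/compress, not the actual go-grpc-compression or dskit code. Real implementations pool encoders and decoders to keep allocations under control.

```go
package zstdgrpc

import (
	"io"

	"github.com/klauspost/compress/zstd"
	"google.golang.org/grpc/encoding"
)

// Name is the identifier clients pass to grpc.UseCompressor.
const Name = "zstd"

func init() {
	encoding.RegisterCompressor(&compressor{})
}

// compressor satisfies grpc-go's encoding.Compressor interface:
// Compress, Decompress, and Name.
type compressor struct{}

func (*compressor) Name() string { return Name }

// Compress wraps the outgoing message writer with a zstd encoder.
// NOTE: creating a fresh encoder per message is the simple-but-wasteful
// approach; production implementations reuse encoders via sync.Pool.
func (*compressor) Compress(w io.Writer) (io.WriteCloser, error) {
	return zstd.NewWriter(w)
}

// Decompress wraps the incoming message reader with a zstd decoder.
func (*compressor) Decompress(r io.Reader) (io.Reader, error) {
	return zstd.NewReader(r)
}
```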

The rest wasn’t too hard. I took that implementation, integrated it into Mimir (well, technically into dskit, a library Grafana uses for all its projects), and tested it in one of our test Mimir clusters. Let’s take a look at the graph of network traffic between the distributor and ingester:

Screenshot of traffic bandwidth graphs; traffic decreased significantly after switching to ZSTD compression.

Wow! The results are impressive — as you can see, traffic after enabling ZSTD (the restart was initiated at 10:50, so after stabilization, i.e., after 11:10 on both graphs) dropped to approximately 53% of its original volume! Additionally, CPU usage for both ingesters and distributors didn’t change much (at least at first glance), and the 99th percentile of write latency dropped from 90 ms to 50 ms. So, can we pop the champagne and start preparing this change for production?

Unfortunately, the memory graph shattered our hopes:

Screenshot of pod memory consumption graphs; memory usage increased significantly after switching to ZSTD compression.

As you can see, the Grafana engineers were right — memory usage didn’t just increase; it exploded. Ingesters were consuming three times more memory than before! My first idea was to try a different ZSTD implementation: ZSTD was originally written in C, but Mimir is written in Go and doesn’t support CGO — meaning it can’t call C functions from Go code — so we were stuck with a pure Go ZSTD implementation, and that approach wouldn’t work.

Is this the end of the story? Not exactly.

S2 to the rescue

While exploring the go-grpc-compression repo, I stumbled upon another compressor in the experimental section: S2. Intrigued, I started reading more about it.

As it turns out, Klaus Post — the same engineer who implemented ZSTD and other compression algorithms in Go (famous for his work in the klauspost/compress repo) — published a Go implementation of a slightly tuned Snappy algorithm called S2 in 2019. You can check out his announcement: ANN: S2, Go data compression at GBs per second.

According to his article:

“S2 is an extension of Snappy. It can decode Snappy content, but S2 content cannot be decoded by Snappy. S2 is aimed at high throughput, so it features concurrent compression for bigger payloads.”

This sounded promising, so I decided to give it a try. Plus, it was already in the repo, and replacing ZSTD with S2 was a trivial task. I quickly built a new Mimir version and deployed it to the same test cluster.
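To show why the swap was trivial: with the klauspost/compress library, an S2 variant of the sketch above mostly comes down to swapping the encoder and decoder constructors. Again, this is a hedged sketch, not the actual go-grpc-compression experimental codec, which also pools its writers and readers.

```go
package s2grpc

import (
	"io"

	"github.com/klauspost/compress/s2"
	"google.golang.org/grpc/encoding"
)

const Name = "s2"

func init() {
	encoding.RegisterCompressor(&compressor{})
}

type compressor struct{}

func (*compressor) Name() string { return Name }

// s2.NewWriter never fails, so the error is always nil here.
func (*compressor) Compress(w io.Writer) (io.WriteCloser, error) {
	return s2.NewWriter(w), nil
}

func (*compressor) Decompress(r io.Reader) (io.Reader, error) {
	return s2.NewReader(r), nil
}
```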

Here’s the traffic graph:

Screenshot of traffic bandwidth graphs; traffic decreased significantly after switching to ZSTD compression and increased only slightly after switching to S2 compression.

As you can see, S2 showed only a slightly worse compression ratio than ZSTD (the left part of the graph is Snappy, ZSTD is in the middle, and S2 is on the right). After crunching the numbers, I found that S2 reduced traffic by 45% compared to Snappy (ZSTD had achieved 54%) — still a very good result!

But what about CPU and memory? CPU usage didn’t change, which makes sense since S2 is based on Snappy internally. And memory usage? It actually dropped a bit, likely due to S2’s optimizations over the original Snappy implementation.

So, let’s talk about money

Okay, that was an impressive journey, but what about the real test — and, more importantly, the real budget impact? I deployed the patched Mimir binary to the same cluster I checked at the beginning of this article and waited a couple of days for the results:

Screenshot of the AWS cost and usage report, showing a significant usage decrease in the test cluster between September 2 and September 9.

As you can see above, EUC1-DataTransfer-Regional-Bytes dropped — exactly what we were aiming for! The rest of the parameters looked good too, so I decided to roll it out to one of our smaller production installations.

But then, you can imagine my disappointment when I saw this:

Screenshot of the AWS cost and usage report, showing no significant usage decrease in the production cluster between September 2 and September 9.

In production, the traffic didn’t drop — it actually increased slightly!

However, the mystery was solved pretty quickly. I compared the data with the graph of incoming traffic for the same period:

Screenshot of the incoming traffic graph, showing a significant increase between September 2 and September 9.

As you can see, between September 2 and September 9, the incoming traffic increased by roughly 45%. This means that S2 compression made that increase invisible, and the actual traffic savings matched my test results — around 45%!

The real monetary impact in both cases was also significant — around 20% of the entire budget!

And yes, that’s exactly what I promised in the clickbait title. But that’s not the end of our journey yet.

Road to upstream

Okay, that was an impressive journey, but what’s next? Running a patched version of Mimir is cool and all, but manually applying the patch before every upgrade is tedious and could significantly delay updates.

Luckily, after showing the graphs and results above, it was pretty easy to convince Grafana that this change could be included in Mimir as an experimental feature. Unfortunately for me, they decided to implement it differently, so my changes didn’t make it into the final implementation.

That said, S2 gRPC compression is now part of Mimir and was included in version 2.15.0, released in January 2025.

PS: ZSTD, again

I’m still subscribed to notifications for the initial issue 8522, even though it’s already closed. Recently, Mostyn Bramley-Moore, the author of the go-grpc-compression repo, mentioned that he’s testing a change to the ZSTD implementation that should help with memory consumption. I quickly tried to test it too, but for some reason, the unit tests in my code were failing.

So, I decided to tackle the problem differently — I implemented a new zstd gRPC compressor, mimicking the same S2 implementation that worked so well for us.

And the results were promising! For the same distributor/ingester traffic, memory consumption remained stable, and ZSTD reduced traffic by 22% compared to S2:

Screenshot of traffic bandwidth graphs; traffic decreased significantly after switching to ZSTD compression.

But this time, there was a clear trade-off: CPU consumption. Those 22% savings in traffic came at the cost of a 40% increase in CPU usage for the distributor component. Still, that’s better than GZIP, which costs 65% more CPU. The graph below shows CPU consumption normalized to S2 compression (left), with ZSTD using 40% more (middle) and GZIP using 65% more (right):

Screenshot of CPU consumption graphs; CPU usage increased significantly after switching to ZSTD compression, and even more after switching to GZIP compression.

Even with the 40% higher CPU usage, ZSTD is still better than GZIP and can be beneficial for some installations — especially since cross-AZ traffic in AWS is crazy expensive. That’s why I created a PR for this change, and it’s been preliminarily approved.
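For anyone who wants to explore that CPU-versus-ratio tradeoff themselves, the klauspost/compress zstd encoder exposes an encoder level option. The snippet below only illustrates that library knob with a toy payload; it is not how Mimir or the PR configures things.

```go
package main

import (
	"bytes"
	"fmt"

	"github.com/klauspost/compress/zstd"
)

func main() {
	payload := bytes.Repeat([]byte("metric_name{label=\"value\"} 42\n"), 1000)

	// SpeedFastest trades some compression ratio for lower CPU usage;
	// SpeedDefault and SpeedBetterCompression move the other way.
	enc, err := zstd.NewWriter(nil, zstd.WithEncoderLevel(zstd.SpeedFastest))
	if err != nil {
		panic(err)
	}
	defer enc.Close()

	compressed := enc.EncodeAll(payload, nil)
	fmt.Printf("raw: %d bytes, compressed: %d bytes\n", len(payload), len(compressed))
}
```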

Conclusion

Cloud computing is very different from bare metal infrastructure — it’s crucial to understand what you’re paying for.

  • Be bold in experimenting.
  • Feel confident reusing code from experienced developers.
  • Don’t be afraid to contribute to open-source projects.
  • Be courageous in challenging the status quo.

If you want to learn more about how compression algorithms work, check out the 90-minute YouTube playlist titled Introducing Compressor Head.
