How Pinterest Leverages Honeycomb to Enhance CI Observability and Improve CI Build Stability

Oliver Koo | Staff Software Engineer

Optimizing Mobile Builds and Continuous Integration Observability at Pinterest with Honeycomb

At Pinterest, our mobile infrastructure is core to delivering a high-quality experience for our users. In this post, I’ll show how the Pinterest Mobile Builds team has used Honeycomb since 2021 to improve observability and performance in our mobile builds and continuous integration (CI) workflows.

Building a Data-Driven Approach to Observability

Our mobile builds team relies on Honeycomb as a robust data engine to visualize build metrics, analyze trends, and make data-driven decisions. From tracking build times to categorizing errors, Honeycomb gives us critical insight into our CI workflows, so we can address issues proactively and optimize performance.

We’ve built dashboards that establish baseline metrics, tracking key CI indicators such as build times, pipeline success rates, and cluster usage for both iOS and Android builds. Many data platforms and CI providers offer these capabilities out of the box. The real value shows up when we need to go deeper: when trends look abnormal, or when nuanced analysis is required to uncover hidden issues.

This is where Honeycomb truly excels. Its intuitive query builder makes slicing and dicing data seamless, letting us drill into granular details with ease. Features like derived columns let us create dynamic metrics on the fly. Its query performance means that most queries complete in under a second, even though our CI build dataset alone receives 1 million events daily[1].

This visibility transforms how we understand and improve our CI processes. We can pinpoint bottlenecks, diagnose issues in near real time, and ship improvements faster, confident that our decisions are grounded in data. Honeycomb doesn’t just give us data; it gives us clarity.

Using Honeycomb to Pinpoint Problematic Builds and Analyze Root Causes

Here are a couple of examples of how I used Honeycomb to uncover interesting patterns in our CI builds and pinpoint bottlenecks.

Spotting Bottleneck Jobs in Builds

When querying our build counts along with p95 and p50 build times in a CI pipeline, I noticed two distinct scenarios:

On the left, there’s a spike in build count, but the p95 and p50 metrics remain unchanged. Since build times aren’t impacted, there’s no need to investigate further, allowing me to save time and focus elsewhere.
On the right, the build volume stays consistent, but there’s a noticeable spike in the p95 build time. This deviation is worth investigating further.

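[Figure: build count with p50 and p95 build times for a CI pipeline. Left: a spike in build volume. Right: a spike in p95 build time.] [2]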
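To make the comparison above concrete, here is a minimal Python sketch of the same reasoning applied locally. It assumes build durations have already been bucketed by time. The nearest-rank percentile and the 20% tolerance are illustrative choices. They do not reflect how Honeycomb computes P95 or how our dashboards define a spike.

```python
import math
from dataclasses import dataclass


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest value with at least pct% of the data at or below it."""
    if not values:
        raise ValueError("percentile of an empty list")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


@dataclass(frozen=True)
class BucketStats:
    count: int   # builds that finished in the time bucket
    p50: float   # median build time
    p95: float   # tail build time


def summarize(durations: list[float]) -> BucketStats:
    return BucketStats(len(durations), percentile(durations, 50), percentile(durations, 95))


def classify(baseline: BucketStats, current: BucketStats, tolerance: float = 0.2) -> str:
    """Label a time bucket relative to a baseline bucket.

    tolerance is a hypothetical threshold: 0.2 means "more than 20% above baseline".
    """
    p95_spike = current.p95 > baseline.p95 * (1 + tolerance)
    count_spike = current.count > baseline.count * (1 + tolerance)
    if p95_spike:
        return "investigate"   # the slow tail moved: open the traces
    if count_spike:
        return "volume-only"   # more builds, unchanged latency: safe to skip
    return "normal"


if __name__ == "__main__":
    durations = [300, 320, 310, 900, 305]
    baseline = summarize(durations)
    print(classify(baseline, summarize(durations * 3)))  # prints "volume-only"
```

The left-hand scenario maps to "volume-only" and the right-hand one to "investigate". The p95 is checked first, so a bucket with both more builds and a slower tail still gets investigated.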

By clicking into the specific build causing the spike, I can view its build trace. In Honeycomb terms, a “trace” represents a complete unit of work for one or more services in an environment. In our case, each trace corresponds to one CI build, and its child spans represent the individual jobs within that build. Each job span can in turn have its own child spans for job steps, such as script execution or other tasks within the job.

[Figure: Honeycomb trace view of a single CI build, with jobs shown as child spans] [3]
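Here is a minimal sketch of how one build could map onto that trace structure. It assumes a hypothetical build record with nested jobs and steps. It uses Honeycomb's default trace field names: trace.trace_id, trace.span_id, trace.parent_id, name, service.name, and duration_ms. The service name and the record shape are assumptions. Start timestamps are omitted for brevity, even though the trace view needs them to lay spans out.

```python
import uuid
from typing import Any, Optional


def span(trace_id: str, name: str, duration_ms: float,
         parent_id: Optional[str] = None, **fields: Any) -> dict[str, Any]:
    """Build one Honeycomb-style span event using the default trace field names."""
    event: dict[str, Any] = {
        "trace.trace_id": trace_id,
        "trace.span_id": uuid.uuid4().hex,
        "name": name,
        "service.name": "ci",  # hypothetical service name
        "duration_ms": duration_ms,
        **fields,
    }
    if parent_id is not None:
        event["trace.parent_id"] = parent_id  # root spans have no parent
    return event


def build_trace(build: dict[str, Any]) -> list[dict[str, Any]]:
    """Turn a (hypothetical) build record into root, job, and step spans."""
    trace_id = build["id"]
    root = span(trace_id, "build", build["duration_ms"], web_url=build["web_url"])
    events = [root]
    for job in build["jobs"]:
        job_span = span(trace_id, job["name"], job["duration_ms"],
                        parent_id=root["trace.span_id"])
        events.append(job_span)
        for step in job.get("steps", []):
            events.append(span(trace_id, step["name"], step["duration_ms"],
                               parent_id=job_span["trace.span_id"]))
    return events
```

Every span shares the build's trace ID. Parentage is expressed only through trace.parent_id, so adding a deeper level (for example, Bazel targets under a step) means adding one more nested loop or call.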

The trace view revealed that one job, “super secretive tests,” was taking significantly longer to complete than the others and was the bottleneck in this build. A single slow build isn’t enough to move the p95 metric, so I hypothesized that similar slowdowns were occurring across other builds. To investigate further, I used the web_url attribute in Honeycomb to pull the Buildkite URLs of other slow builds and analyzed them directly in Buildkite.
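The check described above can be sketched roughly as follows: pick the longest job in each slow build, then group the slow builds' web_url values by that job. The record shape and threshold are hypothetical. Treating the longest job as the bottleneck is a simplification: it holds when jobs run in parallel but ignores dependency chains between jobs.

```python
from typing import Any


def bottleneck_job(build: dict[str, Any]) -> str:
    """Name of the longest-running job in a build."""
    return max(build["jobs"], key=lambda job: job["duration_ms"])["name"]


def slow_builds_by_bottleneck(builds: list[dict[str, Any]],
                              threshold_ms: float) -> dict[str, list[str]]:
    """Group the web_url of each build slower than threshold_ms by its bottleneck job."""
    grouped: dict[str, list[str]] = {}
    for build in builds:
        if build["duration_ms"] > threshold_ms:
            grouped.setdefault(bottleneck_job(build), []).append(build["web_url"])
    return grouped
```

If most slow builds share the same bottleneck job, the slowdown is systemic rather than a one-off. Each URL in that group is a candidate to open in Buildkite.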

You might notice this trace view is very similar to the “Waterfall View” that Buildkite introduced in 2023. However, we continue to use Honeycomb’s trace view for several reasons:

Seamless Integration with Our Metrics: Because the trace view lives in Honeycomb alongside our build metrics, we can move directly from analyzing build trends to zooming into specific builds for a deeper dive.
Flexibility and Customization: Honeycomb’s trace view lets us break Buildkite builds down into more than just jobs. We can split them into specific segments such as agent wait time and script execution, and log and analyze the parts of a build and job that matter most to our workflows, such as the execution of various build hooks or environment setup. We can even go deeper and instrument build scripts to log the time spent in specific segments of a script. To illustrate, I created the demo image below using dummy data. It shows an example build in which each Buildkite job is broken down into a sequence of executions within the job. Inside a Bazel build script, we also instrumented the process to log the execution time of specific Bazel targets. If desired, you could even log the build time of every target individually. The possibilities are endless!

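[Figure: demo trace built from dummy data, with each Buildkite job broken into a sequence of executions and a Bazel build script broken into target timings] [4]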

Later, we can aggregate these segments to answer questions like “What is the average repo cloning time across different pipelines?” or “What are the p50 and p95 times for the Bazel build and test stages in my PR pipeline?” These observability metrics can help your team prioritize optimizations, reduce build times, and improve overall developer productivity.

Established Habits: We’ve been using Honeycomb’s trace view since 2021, long before the Waterfall View was introduced. By now, it’s become a familiar and trusted part of our process.
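Instrumenting segments inside a build script could look something like the sketch below. It uses a context manager that times each named segment and records it as a child span of the job's span. The class, the start_time field, and the example Bazel target are hypothetical. Sending the collected events to Honeycomb is left out.

```python
import time
import uuid
from contextlib import contextmanager
from typing import Any, Iterator


class SegmentRecorder:
    """Times named segments of a build script and records them as child spans."""

    def __init__(self, trace_id: str, parent_id: str) -> None:
        self.trace_id = trace_id      # the CI build's trace ID
        self.parent_id = parent_id    # span ID of the Buildkite job running this script
        self.events: list[dict[str, Any]] = []

    @contextmanager
    def segment(self, name: str, **fields: Any) -> Iterator[None]:
        start_wall = time.time()        # wall-clock start, to place the span on the timeline
        start = time.perf_counter()     # monotonic clock, to measure the duration
        status = "ok"
        try:
            yield
        except BaseException:
            status = "error"            # still record the segment, then re-raise
            raise
        finally:
            self.events.append({
                "trace.trace_id": self.trace_id,
                "trace.parent_id": self.parent_id,
                "trace.span_id": uuid.uuid4().hex,
                "name": name,
                "start_time": start_wall,  # hypothetical; a sender would use it as the event time
                "duration_ms": (time.perf_counter() - start) * 1000,
                "status": status,
                **fields,
            })


if __name__ == "__main__":
    recorder = SegmentRecorder(trace_id="build-123", parent_id="job-span-456")
    with recorder.segment("repo clone"):
        time.sleep(0.01)
    with recorder.segment("bazel build", target="//app:ios"):  # hypothetical Bazel target
        time.sleep(0.01)
    for event in recorder.events:
        print(event["name"], round(event["duration_ms"]), "ms")
```

The segment is recorded in a finally block, so a failing step still produces a span with status "error" instead of silently disappearing from the trace.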

These advantages make Honeycomb’s trace view an invaluable tool for understanding our CI processes, diagnosing issues, and improving efficiency.
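Once segment spans exist, questions like the repo-cloning example become ordinary group-by queries. In Honeycomb's query builder, that is roughly VISUALIZE AVG, P50, and P95 of duration_ms, WHERE name matches the segment, GROUP BY pipeline. Below is a minimal local equivalent in Python. It assumes each segment event carries a hypothetical pipeline field and uses nearest-rank percentiles.

```python
import math
from collections import defaultdict
from statistics import mean
from typing import Any


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile (values must be non-empty)."""
    ordered = sorted(values)
    return ordered[max(1, math.ceil(pct * len(ordered) / 100)) - 1]


def segment_stats(events: list[dict[str, Any]], segment: str) -> dict[str, dict[str, float]]:
    """Average, p50, and p95 duration of one named segment, grouped by pipeline."""
    by_pipeline: dict[str, list[float]] = defaultdict(list)
    for event in events:
        if event["name"] == segment:
            by_pipeline[event["pipeline"]].append(event["duration_ms"])
    return {
        pipeline: {
            "avg": mean(durations),
            "p50": percentile(durations, 50),
            "p95": percentile(durations, 95),
        }
        for pipeline, durations in by_pipeline.items()
    }
```

For example, segment_stats(events, "repo clone") answers the average cloning-time question per pipeline. Passing a Bazel stage name gives its p50 and p95.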

Using Correlation to Identify Potential Root Causes

Honeycomb’s correlation feature is another game changer. It allows us to overlay query results with other dashboards, creating a breadcrumb trail to identify abnormalities or outliers.

For instance, I observed a spike in p95 build times for iOS CI jobs. Using correlation, I compared the p95 data to CI cluster usage graphs and noticed a simultaneous spike in job wait times. Honeycomb’s synchronized dotted line across the graphs showed that the two spikes lined up in time. That led to a strong hypothesis: long CI agent wait times were causing the build time spike.

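[Figure: p95 build time for iOS CI jobs shown alongside correlated CI cluster usage and job wait time graphs] [5]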

From there, I clicked into the build trace to confirm my hypothesis. Sure enough, the trace revealed that the build experienced unusually long wait times for CI agents. By sampling additional builds from the same time period, I could confirm the root cause and focus on solutions.

[Figure: build trace showing unusually long CI agent wait times] [6]
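To check agent wait time across many builds at once, a sketch like the following could compute it from Buildkite job timestamps. It assumes each job record carries runnable_at, started_at, and web_url, which we take to match the fields on Buildkite's job objects. Two things are our own assumptions: defining wait time as started_at minus runnable_at, and the 300-second threshold.

```python
from datetime import datetime
from typing import Any, Optional


def parse_ts(value: str) -> datetime:
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def agent_wait_seconds(job: dict[str, Any]) -> Optional[float]:
    """Seconds between a job becoming runnable and an agent starting it."""
    if not job.get("runnable_at") or not job.get("started_at"):
        return None  # skipped, cancelled, or still waiting
    return (parse_ts(job["started_at"]) - parse_ts(job["runnable_at"])).total_seconds()


def long_waits(jobs: list[dict[str, Any]], threshold_s: float = 300) -> list[tuple[str, float]]:
    """(web_url, wait) for jobs that waited longer than threshold_s, longest first."""
    waits = []
    for job in jobs:
        wait = agent_wait_seconds(job)
        if wait is not None and wait > threshold_s:
            waits.append((job["web_url"], wait))
    return sorted(waits, key=lambda item: item[1], reverse=True)
```

Running this over the jobs from the spike window approximates the "sample more builds" step. If most long waits cluster in that window, the agent-capacity hypothesis gets much stronger.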

Without Honeycomb, this type of investigation would be incredibly tedious, requiring manual, build-by-build analysis. Honeycomb gives us a holistic view that lets us pinpoint root causes quickly, saving time and effort and improving the efficiency of our CI process.

Error Categorization: Deeper Insight into Build Failures and Streamlining On-Call

One of our recent initiatives with Honeycomb is error categorization for mobile builds. While still in its early stages, the results have been promising. Our primary goals are:

Deeper Insight into Build Failures: CI build failures can stem from various causes, such as compilation errors, flaky tests, or network issues. By analyzing logs and extracting specific errors, we’ve identified the top contributors to CI instability. This insight allows us to prioritize resources and address critical issues more effectively.
Streamlining On-Call and Reducing Noise: Historically, our team was notified of every CI issue, regardless of the root cause. With error categorization, we can now classify failure types in real time and automate alerts, routing them to the appropriate team’s on-call channel. This streamlines on-call duties and minimizes interruptions. For instance, test failures now automatically notify the responsible team without requiring our intervention.
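A first pass at categorization can be as simple as an ordered list of regular expressions matched against the job log, each mapped to an owning on-call channel. The sketch below is a hypothetical illustration, not our production rules. The patterns target common Swift/Xcode, XCTest, Gradle, Kotlin, and network error formats, and the channel names are placeholders.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    category: str
    pattern: "re.Pattern[str]"
    route_to: str  # hypothetical on-call channel


# Hypothetical rules, checked in order: the first match wins, so list the most specific first.
RULES = [
    Rule("test_failure",
         re.compile(r"Test Case '.+' failed|^\S.* > .+ FAILED$", re.M),
         "#owning-team-oncall"),
    Rule("compilation_error",
         re.compile(r"\.(?:swift|m|mm):\d+:\d+: error:|^e: .+\.kt", re.M),
         "#mobile-builds-oncall"),
    Rule("network",
         re.compile(r"could not resolve host|connection (?:reset|refused|timed out)", re.I),
         "#ci-infra-oncall"),
]
FALLBACK = ("uncategorized", "#mobile-builds-oncall")


def categorize(log: str) -> tuple[str, str]:
    """Return (category, channel) for the first rule whose pattern appears in the log."""
    for rule in RULES:
        if rule.pattern.search(log):
            return rule.category, rule.route_to
    return FALLBACK
```

Keeping the rules as data makes it cheap to add a category when a new failure mode shows up. Routing test failures to the owning team would, in practice, also need an ownership lookup that this sketch does not model.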

While the system is still being refined, it has already proven to be a valuable tool for enhancing CI management efficiency. The diagram below illustrates the architecture of our error categorization system, showcasing how we integrate Buildkite logs with Honeycomb by leveraging AWS EventBridge and the Buildkite Jobs API.
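The pipeline described above might be wired roughly like this. An EventBridge-triggered handler ignores everything except failed jobs, builds the Buildkite REST API log URL for the job, and turns the categorized log into an event for Honeycomb. The payload field names (detail-type, detail.job.uuid, detail.build.number, and so on) are assumptions about the Buildkite EventBridge payload. fetch_log and categorize are passed in as functions so the HTTP client and classifier stay out of the sketch. A real fetch_log would need a Buildkite API token.

```python
from typing import Any, Callable, Optional

BUILDKITE_API = "https://api.buildkite.com/v2"


def job_log_url(detail: dict[str, Any]) -> str:
    """Buildkite REST API URL for a job's log, built from the (assumed) event detail."""
    org = detail["organization"]["slug"]
    pipeline = detail["pipeline"]["slug"]
    number = detail["build"]["number"]
    job_id = detail["job"]["uuid"]
    return (f"{BUILDKITE_API}/organizations/{org}/pipelines/{pipeline}"
            f"/builds/{number}/jobs/{job_id}/log")


def handle_event(
    event: dict[str, Any],
    fetch_log: Callable[[str], str],
    categorize: Callable[[str], tuple[str, str]],
) -> Optional[dict[str, Any]]:
    """Turn a failed-job EventBridge event into a Honeycomb-ready error event."""
    if event.get("detail-type") != "Job Finished":
        return None  # only finished jobs are interesting here
    detail = event["detail"]
    job = detail["job"]
    if job.get("exit_status") == 0:
        return None  # only failed jobs are categorized
    category, route_to = categorize(fetch_log(job_log_url(detail)))
    return {
        "name": job.get("label"),
        "pipeline": detail["pipeline"]["slug"],
        "build_number": detail["build"]["number"],
        "exit_status": job.get("exit_status"),
        "error_category": category,
        "route_to": route_to,
    }
```

Returning None for events that need no action keeps the handler cheap for the large majority of passing jobs. The log is only fetched for failures.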

A Versatile Tool Beyond CI Metrics

While Honeycomb is essential for CI observability, its applications extend beyond build metrics. Teams across Pinterest use it to gain real-time insights into performance data and tailor observability to their needs.

For instance, we track iOS local build metrics alongside machine details in Honeycomb, which helps us prioritize laptop upgrades for developers. Another use case involves analyzing Android Develocity build data (read more about this in another Pinterest Engineering blog post).
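Here is a sketch of the laptop-upgrade analysis. It assumes hypothetical machine_model and duration_s fields on each local build event. It ranks machine models by median local build time and ignores models with too few builds to be meaningful.

```python
from collections import defaultdict
from statistics import median
from typing import Any


def upgrade_candidates(builds: list[dict[str, Any]],
                       min_builds: int = 5) -> list[tuple[str, float, int]]:
    """(machine_model, median build seconds, build count), slowest model first."""
    by_model: dict[str, list[float]] = defaultdict(list)
    for build in builds:
        by_model[build["machine_model"]].append(build["duration_s"])
    ranked = [
        (model, median(durations), len(durations))
        for model, durations in by_model.items()
        if len(durations) >= min_builds  # too few builds make the median noisy
    ]
    return sorted(ranked, key=lambda row: row[1], reverse=True)
```

The median is used instead of the mean so that a handful of unusually long clean builds on one machine does not push an entire model to the top of the list.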

Looking Forward

At Pinterest, we’re continuously improving our build processes, and Honeycomb has been a crucial partner in this journey. We’re excited to explore new use cases and expand our data-driven observability practices, enabling our teams to focus on delivering exceptional user experiences.

Sources

[1] Pinterest Internal Data

[2] Pinterest Internal Data

[3] Pinterest Internal Data

[4] Pinterest Internal Data

[5] Pinterest Internal Data

[6] Pinterest Internal Data
