From Custom to Open: Scalable Network Probing and HTTP/3 Readiness with Prometheus
Rafael Elvira, Staff Software Engineer, Infrastructure
Sebastian Feliciano, Software Engineer Intern, Infrastructure
Carlo Preciado, Software Engineer, Infrastructure

The Problem: Legacy Tooling and Its Limitations
Currently, Slack uses a hybrid approach to network measurement, combining internal solutions (such as monitoring traffic between AWS Availability Zones) with external ones (monitoring traffic from the public internet into Slack’s infrastructure). These tools are a mix of commercial SaaS offerings and custom network testing solutions that our internal teams have built over time. This setup was sufficient for our needs.
When we began rolling out HTTP/3 support at the edge, we encountered a significant challenge: a lack of client-side observability.
Because HTTP/3 is built on top of the QUIC transport protocol, it runs over UDP instead of the traditional TCP. This fundamental shift in transport meant that our existing monitoring tools and SaaS solutions could not probe our new HTTP/3 endpoints for metrics.
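The following is a minimal Go sketch of the transport shift. Go is the language Prometheus and the Blackbox Exporter are written in.

The sketch binds a plain UDP socket on loopback as a hypothetical stand-in for an HTTP/3 endpoint; it does not implement QUIC. It then probes the same port number twice:

- once over TCP, the way a TCP-based HTTP/1.1 or HTTP/2 check would;
- once over UDP.

The names `probeUDPOnlyEndpoint` and `transportResult` are illustrative only. They are not part of any Slack or Prometheus code.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// transportResult records what each transport observed when probing
// an endpoint that only listens on UDP.
type transportResult struct {
	TCPErr      error  // error from the TCP connect attempt (expected to be non-nil)
	UDPReceived string // payload the UDP listener actually received
}

// probeUDPOnlyEndpoint starts a UDP listener on loopback (a hypothetical
// stand-in for a QUIC/HTTP/3 endpoint; it does not speak QUIC) and then
// probes the same port number over TCP and over UDP.
func probeUDPOnlyEndpoint() (transportResult, error) {
	var res transportResult

	// Bind a UDP socket on an OS-chosen port, analogous to a QUIC listener on UDP/443.
	pc, err := net.ListenPacket("udp", "127.0.0.1:0")
	if err != nil {
		return res, err
	}
	defer pc.Close()
	addr := pc.LocalAddr().String()

	// A TCP-based prober targets the same host:port. Nothing is listening
	// on TCP there, so the connection attempt is expected to fail.
	if conn, dialErr := net.DialTimeout("tcp", addr, time.Second); dialErr != nil {
		res.TCPErr = dialErr
	} else {
		conn.Close()
	}

	// A UDP-capable prober can reach the listener.
	uc, err := net.Dial("udp", addr)
	if err != nil {
		return res, err
	}
	defer uc.Close()
	if _, err := uc.Write([]byte("probe")); err != nil {
		return res, err
	}

	// Read the datagram on the listener side, with a deadline so the sketch cannot hang.
	buf := make([]byte, 64)
	_ = pc.SetReadDeadline(time.Now().Add(time.Second))
	n, _, err := pc.ReadFrom(buf)
	if err != nil {
		return res, err
	}
	res.UDPReceived = string(buf[:n])
	return res, nil
}

func main() {
	res, err := probeUDPOnlyEndpoint()
	if err != nil {
		fmt.Println("setup error:", err)
		return
	}
	fmt.Println("TCP probe error:", res.TCPErr)
	fmt.Println("UDP probe delivered:", res.UDPReceived)
}
```

On a typical Linux or macOS host, the TCP attempt fails with a "connection refused" error while the UDP datagram arrives. This mirrors why a TCP-only prober cannot observe an endpoint that serves only over UDP. A real HTTP/3 probe would also need to complete a QUIC handshake over that UDP path, which this sketch deliberately leaves out.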
At that time, there was a major gap in the market:
- None of the SaaS observability tools we investigated supported HTTP/3 probing out of the box.
- Our internal Prometheus Blackbox Exporter (BBE), a cornerstone of our monitoring, didn’t have native support for QUIC.
Without the ability to probe hundreds of thousands of HTTP/3 endpoints in our new infrastructure, we couldn’t get the client-side visibility we needed to monitor regressions to HTTP/2 or accurate rou...
