Strobelight: A profiling service built on open source technology
- We’re sharing details about Strobelight, Meta’s profiling orchestrator.
- Strobelight combines several technologies, many open source, into a single service that helps engineers at Meta improve efficiency and utilization across our fleet.
- Using Strobelight, we’ve seen significant efficiency wins, including one that has resulted in an estimated 15,000 servers’ worth of annual capacity savings.
Strobelight, Meta’s profiling orchestrator, is not really one technology. It’s several (many open source) combined to make something that unlocks truly amazing efficiency wins. Strobelight is also not a single profiler but an orchestrator of many different profilers (even ad-hoc ones) that runs on all production hosts at Meta, collecting detailed information about CPU usage, memory allocations, and other performance metrics from running processes. Engineers and developers can use this information to identify performance and resource bottlenecks, optimize their code, and improve utilization.
When you combine talented engineers with rich performance data you can get efficiency wins by both creating tooling to identify issues before they reach production and finding opportunities in already running code. Let’s say an engineer makes a code change that introduces an unintended copy of some large object on a service’s critical path. Meta’s existing tools can identify the issue and query Strobelight data to estimate the impact on compute cost. Then Meta’s code review tool can notify the engineer that they’re about to waste, say, 20,000 servers.
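To make the "estimate the impact on compute cost" step concrete, here is a minimal sketch of the arithmetic, assuming the cost of a code path is approximated by its share of sampled CPU profiles scaled by the number of servers running the service. The function name, the sample counts, and the server count are all hypothetical illustrations; they are not Strobelight's API or real Meta figures.

```python
def servers_worth(samples_in_symbol: int, total_samples: int, servers: int) -> float:
    """Rough servers' worth of CPU attributed to one symbol.

    Assumption (not from the source): CPU cost is proportional to the
    fraction of profile samples that land in the symbol.
    """
    if total_samples <= 0:
        raise ValueError("total_samples must be positive")
    if not 0 <= samples_in_symbol <= total_samples:
        raise ValueError("samples_in_symbol must be within [0, total_samples]")
    # Multiply before dividing so whole-number inputs stay exact.
    return servers * samples_in_symbol / total_samples


# Hypothetical inputs: an unintended copy shows up in 200 of 10,000 samples
# (2%) for a service running on 1,000,000 servers.
estimate = servers_worth(samples_in_symbol=200, total_samples=10_000, servers=1_000_000)
print(f"Estimated cost: {estimate:,.0f} servers")
```

A code review integration could compare this estimate before and after a change and warn the author when the difference crosses a threshold.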