OpenTelemetry for JavaScript Observability at Zalando

"What’s happening inside my application?" is an age-old question for anyone who deploys a software service. Packaging an application's source code turns it into a black box for its users, who can only interact with it through explicitly exposed APIs. Fortunately, several developments in the field of observability in recent years help us peek into this black box and react to anomalies.

OpenTelemetry has become the widely accepted open standard for application observability across the software engineering community. It evolved from the earlier OpenTracing project, which introduced standards for distributed tracing, and it brings all observability signals under one umbrella with specifications and implementations. At Zalando, too, OpenTelemetry is the adopted standard for observability, and our platform teams provide SDKs in several languages so that engineers can instrument their applications.

For applications running in a JavaScript environment, though, the story was quite different. We have a significant number of Node.js applications, and before 2023 their observability was quite poor. During an incident, on-call responders would try to locate the root cause only to find that some applications in the request flow had no instrumentation at all. In one specific, very interesting example, we had almost zero visibility into what the affected application was doing, which made understanding the root cause more difficult than it should have been.

Often, the reason for the miss...
