CRISP: Critical Path Analysis for Microservice Architectures

Uber’s backend is an exemplar of microservice architecture. Each microservice is a small, individually deployable program that performs a specific piece of business logic (an operation). Microservice architecture is a type of distributed computing system well suited to deploying and scaling software programs independently, and so it is widely used across modern service-oriented industries. Uber has a few thousand microservices that interact with one another via remote procedure calls (RPCs).

A service request arriving at an entry point (also known as an endpoint) of the Uber backend undergoes multiple “hops” through numerous microservice operations before it is fully serviced. Over its lifetime, a request triggers complex microservice interactions that are deeply nested, asynchronous, and invoke numerous other downstream operations. Because of this complexity, it is very hard to identify which underlying services contribute to the end-to-end latency experienced by a top-level request. Answering this question is critical in many situations, for example:

  • Identifying optimization opportunities for a top-level microservice
  • Identifying common bottleneck operations affecting many services
  • Setting appropriate time-to-live values for downstream RPCs
  • Diagnosing outages and error conditions
  • Capacity planning and reduction

Latency is one metric of interest, but other metrics, such as time-to-live and error rates, also fall within scope.
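
To make the attribution problem concrete, here is a minimal sketch of one way to attribute a request's end-to-end latency to the operations on its critical path. It is an illustration under stated assumptions, not a description of any particular tool: the `Span` type, the `critical_path` function, and the service names (`gateway`, `auth`, `pricing`, `audit`) are all hypothetical, and every child span is assumed to lie within its parent's time range.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    """One operation in a trace (hypothetical structure; times in ms)."""
    name: str
    start: float
    end: float
    children: list["Span"] = field(default_factory=list)


def critical_path(span, out=None):
    """Return {operation name: ms on the critical path} for `span`.

    Walks backward from the span's end. At each step the child that
    finished last (at or before the cursor) is on the critical path;
    any gap between that child's end and the cursor is the parent's
    own time. Children that overlap an already chosen child are skipped.
    """
    if out is None:
        out = {}
    cursor = span.end
    for child in sorted(span.children, key=lambda c: c.end, reverse=True):
        if child.end > cursor:
            continue  # ran in parallel with a later-finishing sibling
        out[span.name] = out.get(span.name, 0) + (cursor - child.end)
        critical_path(child, out)
        cursor = child.start
    # Remaining time before the earliest critical child is self time.
    out[span.name] = out.get(span.name, 0) + (cursor - span.start)
    return out


# Hypothetical trace: `auth` and `pricing` run in parallel, then `audit`.
trace = Span("gateway", 0, 100, [
    Span("auth", 10, 40),
    Span("pricing", 10, 90),
    Span("audit", 92, 98),
])

result = critical_path(trace)
print(result)  # {'gateway': 14, 'pricing': 80, 'audit': 6}
```

In this toy trace, `auth` takes 30 ms but contributes nothing to end-to-end latency because `pricing` runs in parallel and finishes later. Speeding up `auth` would not help the top-level request, while speeding up `pricing` would.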

We have developed a tool, CRISP (the name takes letters from “critical” and “span”), to pinpoint and quantify underlying services that impact the overall latency...
