Managing Availability in Service-Based Deployments with Continuous Testing
At Salesforce, trust is our number one value. In practice, this means our customers need to trust us: to safeguard their data, to keep our services up and running, and to be there for them when they need us.
In the world of Software as a Service (SaaS), trust and availability have become synonymous. Availability represents the percentage of time and/or requests successfully handled.
Availability can be calculated as the number of successful requests divided by the total number of requests.
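The ratio above can be sketched in a few lines of Python. This is a minimal illustration, not the Salesforce implementation: the function name `availability` and the choice to report 100% when there is no traffic are assumptions made for the example.

```python
def availability(successful_requests: int, total_requests: int) -> float:
    """Return availability as a fraction between 0.0 and 1.0."""
    if total_requests < 0 or not 0 <= successful_requests <= total_requests:
        raise ValueError("counts must satisfy 0 <= successful <= total")
    if total_requests == 0:
        # Assumption: a window with no traffic is treated as fully available.
        return 1.0
    return successful_requests / total_requests


if __name__ == "__main__":
    # 999,500 successful requests out of 1,000,000 total.
    print(f"{availability(999_500, 1_000_000):.2%}")
```

How an empty window is treated is a policy decision; some teams prefer to report "no data" instead of 100%.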
As a result, a few observations and questions come to the fore:
- In order to have high availability, you must have low mean time to recovery (MTTR).
- In order to have low MTTR, you must have low mean time to detection (MTTD).
- How can we distinguish between server errors and client errors? Is our availability penalized for client errors?
- How do we calculate the availability of our service with regard to dependent services? Should our availability metrics show when a dependent service is down?
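To make the client-versus-server question concrete, here is a minimal sketch that computes availability two ways from a list of response codes. It assumes the usual HTTP convention (4xx for client errors, 5xx for server errors); the names `classify`, `availability_views`, `strict`, and `server_only` are hypothetical and chosen only for illustration. The article does not state which view is the right one.

```python
from collections import Counter
from typing import Iterable


def classify(status_code: int) -> str:
    """Bucket an HTTP status code, assuming 4xx = client and 5xx = server."""
    if 500 <= status_code <= 599:
        return "server_error"
    if 400 <= status_code <= 499:
        return "client_error"
    return "success"


def availability_views(status_codes: Iterable[int]) -> dict[str, float]:
    """Compute availability with and without penalizing client errors."""
    counts = Counter(classify(code) for code in status_codes)
    total = sum(counts.values())
    if total == 0:
        # Assumption: no traffic is reported as fully available.
        return {"strict": 1.0, "server_only": 1.0}
    return {
        # Every non-success response counts against availability.
        "strict": counts["success"] / total,
        # Only server errors count against availability.
        "server_only": (total - counts["server_error"]) / total,
    }


if __name__ == "__main__":
    codes = [200] * 96 + [404] * 3 + [503]
    print(availability_views(codes))
```

With 96 successes, 3 client errors, and 1 server error, the strict view yields 0.96 while the server-only view yields 0.99, which shows how much the choice of definition can move the reported number.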
The Solution
To tackle this issue comprehensively, we implemented a three-pronged strategy for the Salesforce Commerce APIs: monitoring, continuous testing, and alerting. Additionally, availability is broken down into multiple categories so that problem areas can be pinpointed and addressed.
Monitoring
At the core of this problem is the ability to observe what is happening in the system. For a given API call, we need to know:
- The overall latency
- The response code
- The latency and response codes for calls to any dependent services