Onboarding SLOs for Salesforce Services
At Salesforce, we operate thousands of services of various sizes: monoliths and microservices, both customer-facing and internal, across multiple substrates, i.e., first-party and public cloud infrastructure. In our earlier blog post, “READS: Service Health Metrics,” we described READS, the Service Level Objective (SLO) framework we developed at Salesforce to standardize SLO tracking for Salesforce services using a minimal set of indicators. Note that, at Salesforce, we consider SLO tracking for features to be as critical to our customers and our success as SLO tracking for the services that serve those features. So, from here on, when we use the word “service,” we mean both services and features. A natural question to ask in this context is: how do we manage the SLO onboarding process for services at scale? How do we simplify the Developer Experience (DX) for service owners? And what are the key takeaways from our approaches?
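To make "SLO tracking" concrete, here is a minimal sketch of how a ratio-based indicator (good events divided by total events) might be checked against an SLO target and converted into a remaining error budget. This is not the READS implementation; the names `SloDefinition`, `SloReport`, and `evaluate_slo`, and the "checkout" service in the example, are hypothetical, and the good/total formulation is a common SLO convention assumed here for illustration.

```python
from dataclasses import dataclass


# Hypothetical types for illustration only; not part of any Salesforce API.
@dataclass(frozen=True)
class SloDefinition:
    service: str
    indicator: str  # e.g. "availability"; the name is illustrative
    target: float   # required fraction of good events, e.g. 0.999


@dataclass(frozen=True)
class SloReport:
    sli: float               # observed fraction of good events
    met: bool                # True if the SLI meets or exceeds the target
    budget_remaining: float  # fraction of error budget left (negative once overspent)


def evaluate_slo(slo: SloDefinition, good_events: int, total_events: int) -> SloReport:
    """Evaluate a ratio-based SLI (good / total) against an SLO target."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    if not 0 <= good_events <= total_events:
        raise ValueError("good_events must be between 0 and total_events")

    sli = good_events / total_events
    # The error budget is the fraction of events the SLO allows to be bad.
    allowed_bad = 1.0 - slo.target
    actual_bad = 1.0 - sli
    if allowed_bad > 0:
        budget_remaining = 1.0 - actual_bad / allowed_bad
    else:
        # A 100% target leaves no budget: any bad event exhausts it.
        budget_remaining = 1.0 if actual_bad == 0 else 0.0
    return SloReport(sli=sli, met=sli >= slo.target, budget_remaining=budget_remaining)


# Example usage: 500 failed requests out of 1,000,000 against a 99.9% target
# spends half of the error budget.
slo = SloDefinition(service="checkout", indicator="availability", target=0.999)
report = evaluate_slo(slo, good_events=999_500, total_events=1_000_000)
print(report.met, round(report.budget_remaining, 3))  # True 0.5
```

Expressing the result as remaining error budget, rather than only as a pass/fail flag, is one way a standardized framework can let owners of very different services read SLO health on the same scale.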
Before we get to that, let us take a high-level look at the service ownership lifecycle at Salesforce. (We delve deeper into this topic in our recent post, “Transforming Service Reliability Through an SLOs-Driven Culture and Platform.”) Service ownership encompasses the following steps:
- Architect and define a service in a Service Registry
- Instrument the service to emit READS and other operational telemetry
- Define SLIs/SLOs and SLO alerts for the service
- Deploy the service to production, monitor it, and visualize its health
- Analyze service health an...