Cadence Multi-Tenant Task Processing

Cadence is a multi-tenant orchestration framework that helps developers at Uber write fault-tolerant, long-running applications, also known as workflows. It scales horizontally to handle millions of concurrent executions from various customers, and hundreds of teams within Uber currently rely on it to run their business logic. As we onboarded more use cases to Cadence, resource isolation became a pressing issue: bursty traffic or a flood of workflow tasks from one customer often consumed all the resources in a cluster, causing significant task processing delays for everyone else. This was especially common, and especially harmful, when batch processing jobs were triggered at month-end, delaying human-interactive and latency-sensitive workflows. In the worst cases, cluster traffic grew heavy enough to overload our downstream database.
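To make the isolation problem concrete, the following Go sketch simulates a single shared FIFO task queue drained by a fixed-capacity worker pool. It is an illustrative model only, not Cadence's task processing code: the `task` type, the `drainFIFO` helper, and the capacity of 100 tasks per tick are all hypothetical. Because tasks are served strictly in arrival order, one tenant's burst of 10,000 tasks holds back a single task from a latency-sensitive tenant until the entire burst has drained.

```go
package main

import "fmt"

// task is one unit of background work tagged with the tenant that produced it.
// Hypothetical type for illustration; not Cadence's internal task struct.
type task struct {
	tenant string
}

// drainFIFO models one shared FIFO queue drained by a worker pool that
// completes `capacity` tasks per tick. It returns, for each tenant, the tick
// at which that tenant's last task completes.
func drainFIFO(queue []task, capacity int) map[string]int {
	finished := make(map[string]int)
	for i, t := range queue {
		// Strict arrival order: the i-th task completes in tick i/capacity + 1.
		// Ticks never decrease, so the last write per tenant is its final tick.
		finished[t.tenant] = i/capacity + 1
	}
	return finished
}

func main() {
	var queue []task
	// A month-end batch job enqueues 10,000 tasks...
	for i := 0; i < 10000; i++ {
		queue = append(queue, task{tenant: "batch"})
	}
	// ...just before a latency-sensitive tenant enqueues a single task.
	queue = append(queue, task{tenant: "interactive"})

	done := drainFIFO(queue, 100) // hypothetical capacity: 100 tasks per tick
	fmt.Println("batch finished at tick", done["batch"])
	fmt.Println("interactive finished at tick", done["interactive"])
}
```

Running it reports that the batch burst finishes at tick 100 and the lone interactive task at tick 101: the interactive tenant waits for the whole backlog even though it submitted only one task.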

A typical solution to resource isolation is to apply rate limiters to APIs. For example, one may limit the rate at which workflows are started or signaled, or even the rate at which they make progress, by throttling certain worker-related APIs. However, this alone is not enough for Cadence. While processing workflows, Cadence generates tasks that must be processed asynchronously, so rate limiting customer requests directly does not throttle the background task processing load. Consider a customer that gradually schedules a large number of timers, all set to fire at the same time: the external request rate (RPS) stays low, but when the timers fire, the background task processing load explodes.
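The timer example can be sketched as a small deterministic simulation in Go. It assumes a hand-rolled per-domain token bucket (not Cadence's actual rate limiter) guarding a hypothetical timer-scheduling request, with a made-up limit of 10 requests per second. The customer sends one request per second for an hour, each scheduling a timer due at the same instant; every request passes the API limiter, yet all of the resulting timer tasks become due in the same second.

```go
package main

import "fmt"

// tokenBucket is a minimal rate limiter driven by an explicit clock (seconds
// as float64) so the simulation is deterministic. Illustrative only; it is not
// the limiter Cadence uses.
type tokenBucket struct {
	rate   float64 // tokens refilled per second
	burst  float64 // bucket capacity
	tokens float64 // tokens currently available
	last   float64 // clock value at the previous call
}

func newTokenBucket(rate, burst float64) *tokenBucket {
	return &tokenBucket{rate: rate, burst: burst, tokens: burst}
}

// allow refills the bucket for the elapsed time, then spends one token if
// one is available.
func (b *tokenBucket) allow(now float64) bool {
	b.tokens += (now - b.last) * b.rate
	if b.tokens > b.burst {
		b.tokens = b.burst
	}
	b.last = now
	if b.tokens >= 1 {
		b.tokens--
		return true
	}
	return false
}

// simulate sends one hypothetical timer-scheduling request per second for
// `seconds` seconds. Each admitted request creates a timer task that becomes
// due at `fireAt`, not at the time of the request. It returns the number of
// admitted requests and the number of timer tasks due in each second.
func simulate(seconds, fireAt int, limiter *tokenBucket) (int, map[int]int) {
	admitted := 0
	due := make(map[int]int)
	for s := 0; s < seconds; s++ {
		if limiter.allow(float64(s)) {
			admitted++
			due[fireAt]++
		}
	}
	return admitted, due
}

func main() {
	limiter := newTokenBucket(10, 10) // hypothetical limit: 10 requests/s per domain
	admitted, due := simulate(3600, 3600, limiter)
	fmt.Println("requests admitted:", admitted)
	fmt.Println("timer tasks due at t=3600s:", due[3600])
}
```

The program reports 3600 admitted requests and 3600 timer tasks due at t=3600s, far above the 10 RPS the limiter was configured to enforce: the limiter bounds how fast requests arrive, not when the work they create comes due.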