Consistent caching mechanism in Titus Gateway
by Tomasz Bak and Fabio Kung
Introduction
Titus is the Netflix cloud container runtime that runs and manages containers at scale. In the time since it was first presented as an advanced Mesos framework, Titus has transparently evolved from being built on top of Mesos to Kubernetes, handling an ever-increasing volume of containers. As the number of Titus users increased over the years, the load and pressure on the system increased substantially. The original assumptions and architectural choices were no longer viable. This blog post presents how our current iteration of Titus deals with high API call volumes by scaling out horizontally.
We introduce a caching mechanism in the API gateway layer that lets us offload processing from the singleton leader-elected controllers without giving up the strict data consistency guarantees that clients observe. Titus API clients always see the latest (not stale) version of the data, regardless of which gateway node serves their requests and in which order.
Overview
The figure below depicts a simplified high-level architecture of a single Titus cluster (a.k.a. cell):
Titus Job Coordinator is a leader-elected process that manages the active state of the system. Active data includes jobs and tasks that are currently running. When a new leader is elected, it loads all data from external storage. Mutations are first persisted to the active data store before the in-memory state is changed. Data for completed jobs and tasks is moved to the archive store first, and only then removed from the active data store and from the leader's memory.
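The ordering rules described above can be illustrated with a minimal sketch. This is not Titus code: the class name `JobCoordinatorSketch`, its methods, and the use of plain dictionaries as stand-ins for the active and archive stores are all assumptions made for illustration.

```python
class JobCoordinatorSketch:
    """Hypothetical leader process; plain dicts stand in for the external stores."""

    def __init__(self, active_store, archive_store):
        self.active_store = active_store
        self.archive_store = archive_store
        # A newly elected leader loads all active data from external storage.
        self.memory = dict(active_store)

    def mutate(self, job_id, record):
        # Persist to the active store first, then change the in-memory state.
        self.active_store[job_id] = record
        self.memory[job_id] = record

    def complete(self, job_id):
        # Archive first; only then remove from the active store and from memory.
        record = self.memory[job_id]
        self.archive_store[job_id] = record
        del self.active_store[job_id]
        del self.memory[job_id]


active, archive = {"job-1": {"state": "Running"}}, {}
leader = JobCoordinatorSketch(active, archive)
leader.mutate("job-2", {"state": "Accepted"})
leader.complete("job-1")
```

Because every write reaches durable storage before the leader's memory changes, a failure between the two steps leaves the store ahead of memory rather than behind it, so a newly elected leader that reloads from storage does not lose an acknowledged mutation.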