Odin: Uber's Stateful Platform

Uber stores data with a range of technologies, including well-known open-source products such as Kafka, Cassandra, and MySQL, alongside internally developed solutions. In 2014, Uber was expanding rapidly. As at many startups, the technology teams provisioned and maintained these systems manually, following runbooks. As storage demands grew, this approach generated increasing operational toil. Uber created a technology-agnostic management platform called Odin to raise operational throughput through automation and allow the teams to manage thousands of databases effortlessly.

The Odin platform aims to provide a unified operational experience by covering every aspect of managing stateful workloads: host lifecycle, workload scheduling, cluster management, monitoring, state propagation, operational user interfaces, alerting, auto-scaling, and automation. Uber deploys stateful systems at global, regional, and zonal levels, and Odin is designed to manage all of them consistently and in a technology-agnostic manner. Odin also supports co-location to increase hardware cost efficiency. All stateful workloads must be fully containerized, a requirement that was relatively novel and controversial when the platform was created.

This blog post is the first in a series on Uber’s stateful platform. The series aims to be accessible and engaging both for readers with no prior knowledge of building container platforms and for those with extensive expertise. This post gives an overview of Odin’s origins, its fundamental principles, and the challenges encountered early on. The next post will explore how we have safely scaled operat...
