Jellyfish: Cost-Effective Data Tiering for Uber’s Largest Storage System

Uber uses several storage technologies for its business data, each chosen to fit the application’s data model. One of them is Schemaless, which models related entries as a single row with multiple columns and supports versioning per column.
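To make the row, column, and version structure concrete, here is a minimal in-memory sketch. It is an illustration only and not the Schemaless API: the class `VersionedRow`, its `put` and `get` methods, and the sample keys are hypothetical names chosen for this example, and integer version numbers are an assumption.

```python
from collections import defaultdict
from typing import Any, Dict, Optional


class VersionedRow:
    """Hypothetical model: one row holding several columns,
    where each column keeps its own version history."""

    def __init__(self, row_key: str) -> None:
        self.row_key = row_key
        # column name -> {version number -> value}
        self._columns: Dict[str, Dict[int, Any]] = defaultdict(dict)

    def put(self, column: str, version: int, value: Any) -> None:
        """Store a value under the given column and version."""
        self._columns[column][version] = value

    def get(self, column: str, version: Optional[int] = None) -> Any:
        """Return the requested version, or the highest version if none is given."""
        versions = self._columns.get(column)
        if not versions:
            raise KeyError(column)
        if version is None:
            version = max(versions)
        return versions[version]


# Related entries live in one row; each column is versioned independently.
row = VersionedRow("trip-123")
row.put("status", 1, "requested")
row.put("status", 2, "completed")
row.put("fare", 1, {"amount": 12.5})
```

Here `row.get("status")` returns the latest version of that column, while `row.get("status", 1)` returns the earlier one; the `fare` column keeps its own separate history.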

Schemaless has been around for a couple of years, amassing Uber’s data. While Uber is consolidating all use cases on Docstore, Schemaless remains the source of truth for various pre-existing customer pipelines. To serve them, Schemaless uses fast (but expensive) underlying storage technology to deliver millisecond-order latency at high QPS. Furthermore, Schemaless deploys a number of replicas per region to ensure data durability and availability in the face of different failure modes.

Because it keeps accumulating data on expensive storage, Schemaless has become a growing cost concern that required attention. To address it, we measured data access patterns. We found that data is accessed frequently for a period of time, after which it is accessed less often. The exact period varies from one use case to another; however, old data must still be readily available upon request.

Requirements

To sketch the right solution for the problem, we set out four main requirements:

Backward Compatibility

Schemaless has been around for so long that it is integral to many of Uber’s services and even the hierarchy of services. Consequently, changing the behaviour of existing APIs or introducing a new set of APIs were not options, since they would require a chain of changes across Uber product services, delaying la...
