Everything in its write place: Cloud storage abstraction with Object Store
Dropbox originally used Amazon S3 and the Hadoop Distributed File System (HDFS) as the backbone of its data storage infrastructure. Although we migrated user file data to our internal block storage system Magic Pocket in 2015, Dropbox continued to use S3 and HDFS as a general-purpose store for other internal products and tools. Among these use cases were crash traces, build artifacts, test logs, and image caching.
Using these two legacy systems as generic blob storage caused many pain points—the worst of which was the cost inefficiency of using S3’s API. For instance, crash traces wrote many objects which were rarely accessed unless specifically needed for an investigation, generating a large PUT bill. Caches built against S3 burned pricey GET requests with each cache miss.
Looking at the bigger picture, S3 was simply an expensive default choice among many competitors—including our own Magic Pocket block store. What we really desired was the ability to expose a meta-store, transparently backed by different cloud providers’ storage offerings. As pricing plans, access patterns, and security requirements change over time and across use cases, having this extra layer would allow us to flexibly route traffic between options without migrations.
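To make the idea of a meta-store concrete, here is a minimal sketch of a routing layer that sits in front of several backends. It is an illustration only, not the actual Object Store implementation: the names `MetaStore`, `InMemoryBackend`, and `set_route` are hypothetical, the backends are in-memory stand-ins for real providers, and the routing rule (new writes follow the current route, existing objects are read from wherever they were written) is just one assumed way to change backends without migrating data.

```python
from typing import Dict, Tuple


class InMemoryBackend:
    """Hypothetical stand-in for a real provider such as S3, HDFS, or Magic Pocket."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._blobs: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


class MetaStore:
    """Routes blob traffic to one of several backends behind a single API."""

    def __init__(self, backends: Dict[str, InMemoryBackend], default: str) -> None:
        if default not in backends:
            raise KeyError(f"unknown default backend: {default}")
        self._backends = backends
        self._default = default
        # bucket -> backend name used for NEW writes to that bucket
        self._routes: Dict[str, str] = {}
        # (bucket, key) -> backend name where the object was actually written
        self._locations: Dict[Tuple[str, str], str] = {}

    def set_route(self, bucket: str, backend_name: str) -> None:
        """Point future writes for a bucket at a different backend."""
        if backend_name not in self._backends:
            raise KeyError(f"unknown backend: {backend_name}")
        self._routes[bucket] = backend_name

    def put(self, bucket: str, key: str, data: bytes) -> None:
        name = self._routes.get(bucket, self._default)
        self._backends[name].put(f"{bucket}/{key}", data)
        self._locations[(bucket, key)] = name

    def get(self, bucket: str, key: str) -> bytes:
        # Reads follow the recorded location, so rerouting needs no migration.
        name = self._locations[(bucket, key)]
        return self._backends[name].get(f"{bucket}/{key}")

    def location(self, bucket: str, key: str) -> str:
        return self._locations[(bucket, key)]
```

In this sketch, callers only ever see `put` and `get` on the meta-store. Changing a bucket's route affects where later writes land, while objects written earlier remain readable from their original backend.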
Another desirable side effect of routing all blob traffic through a single service was centralization. At the service layer, we could provide additional features like granular per-object encryption, usage and performance monitoring, retention policies, and dedicated support for on-call upkeep.