Online Data Migration from HBase to TiDB with Zero Downtime

Ankita Girish Wagh | Senior Software Engineer, Storage and Caching

Introduction and Motivation

At Pinterest, HBase is one of the most critical storage backends, powering many online storage services such as Zen (graph database), UMS (wide-column datastore), and Ixia (near-real-time secondary indexing service). The HBase ecosystem has various advantages, including row-level strong consistency under high request volume, a flexible schema, low-latency data access, and Hadoop integration, but it cannot serve the needs of our clients for the next 3–5 years. This is due to high operational cost, excessive complexity, and missing functionality such as secondary indexes and transaction support.

After evaluating 10+ storage backends and benchmarking three shortlisted backends with shadow traffic (asynchronously copying production traffic to a non-production environment) and in-depth performance evaluation, we selected TiDB as the final candidate for Unified Storage Service.

Adopting Unified Storage Service powered by TiDB is a major, challenging project spanning multiple quarters. It involves data migration from HBase to TiDB, design and implementation of Unified Storage Service, API migration from Ixia/Zen/UMS to Unified Storage Service, and migration of offline jobs from the HBase/Hadoop ecosystem to the TiSpark ecosystem, all while maintaining our availability and latency SLAs.

In this blog post, we will first cover the various approaches considered for data migration and their trade-offs. We will then do a deep dive on how the d...
