From Archive to Access: Config-Driven Data Pipelines
Uber processes vast amounts of data daily, across multiple verticals, using technologies like Apache Hadoop™, Apache Hive™, and Apache Spark™. Each data team at Uber must operate within resource constraints while managing ever-growing data volumes. Our team, CDS (Compliance Data Store), serves as Uber’s central repository for regulatory reporting: we share data with regulators in accordance with applicable laws and requirements. Managing this extensive data poses significant challenges because we rely on HDFS™ (Hadoop Distributed File System), which imposes two kinds of storage quota. The first is a namespace quota (NSQ), which limits the number of files and directories. The second is a space quota, which limits overall storage capacity.
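To make the two quotas concrete, here is a minimal sketch of the checks they imply before a write. It uses a simplified model:

- Every new file or directory consumes one name against the namespace quota.
- Every replica of every byte counts against the space quota.

This model ignores HDFS's block-granular reservation during writes. `QuotaUsage` and `check_write` are hypothetical names, not a Hadoop API, and the numbers in the example are illustrative rather than Uber's actual quotas.

```python
from dataclasses import dataclass


@dataclass
class QuotaUsage:
    """Hypothetical snapshot of a directory's quota usage (not a Hadoop API)."""
    names_used: int   # files + directories already counted against the namespace quota
    bytes_used: int   # raw bytes already consumed, i.e. including all replicas
    names_quota: int  # namespace quota (NSQ)
    space_quota: int  # space quota, in raw bytes


def check_write(usage: QuotaUsage, new_files: int, new_dirs: int,
                logical_bytes: int, replication: int = 3) -> list[str]:
    """Return the quotas a planned write would exceed (empty list if none).

    Simplified model: ignores block-granular space reservation during writes.
    """
    violations = []
    # Each new file and each new directory consumes one name in the namespace.
    if usage.names_used + new_files + new_dirs > usage.names_quota:
        violations.append("namespace")
    # The space quota counts every replica, so scale logical size by replication.
    if usage.bytes_used + logical_bytes * replication > usage.space_quota:
        violations.append("space")
    return violations


# Illustrative numbers: 20,000 small files totalling 1 GiB exhaust the NSQ
# long before they come close to a 10 TiB space quota.
usage = QuotaUsage(names_used=990_000, bytes_used=0,
                   names_quota=1_000_000, space_quota=10 * 1024**4)
print(check_write(usage, new_files=20_000, new_dirs=0, logical_bytes=1024**3))
# → ['namespace']
```

The example shows why archiving many small partitions can hit the namespace quota even when plenty of raw storage remains.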
In 2021, our team managed 65 regulatory reports, consuming terabytes of storage. By Q2 2024, this number had surged to over 500 reports, mostly covering trips within a given jurisdiction, which significantly increased resource consumption. Although existing solutions could archive and retrieve data, they often risked mutating it, especially during backfills, which isn’t ideal for regulatory and audit purposes. Additionally, the existing solutions could retrieve neither smaller partitions nor a range of partitions, which made efficient data access difficult.
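Range-based retrieval can be illustrated by a minimal sketch that selects archived partitions falling within an inclusive date range. It assumes Hive-style daily partition names of the form `datestr=YYYY-MM-DD`. That naming, and the helpers `parse_partition` and `select_partitions`, are hypothetical, not taken from the CDS implementation.

```python
from datetime import date


def parse_partition(name: str) -> date:
    """Parse an assumed Hive-style partition name such as 'datestr=2024-04-01'."""
    key, _, value = name.partition("=")
    if key != "datestr":
        raise ValueError(f"unexpected partition key: {key!r}")
    return date.fromisoformat(value)


def select_partitions(archived: list[str], start: date, end: date) -> list[str]:
    """Return archived partitions whose date lies in [start, end], sorted by date."""
    if start > end:
        raise ValueError("start must not be after end")
    selected = [p for p in archived if start <= parse_partition(p) <= end]
    return sorted(selected, key=parse_partition)


archived = ["datestr=2024-04-05", "datestr=2024-03-30",
            "datestr=2024-04-02", "datestr=2024-04-01"]
print(select_partitions(archived, date(2024, 4, 1), date(2024, 4, 3)))
# → ['datestr=2024-04-01', 'datestr=2024-04-02']
```

Passing the same date as both `start` and `end` retrieves a single small partition. Only the selected partitions then need to be restored, rather than an entire archived table.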
To address this, we implemented an archival and retrieval mechanism to efficiently store, retrieve, and manage sensitive regulatory data. The archival mechanism ensures compliance by retaining previously submitted data, while the retrieval mechanism allows this data to be accessed whenever needed. This solution enables Uber, particularly the Compliance ...