Data Reprocessing Pipeline in Asset Management Platform @Netflix
Overview
At Netflix, we built the asset management platform (AMP) as a centralized service to organize, store, and discover the digital media assets created during movie production. Studio applications use this service to store their media assets, which then go through an asset cycle of schema validation, versioning, access control, sharing, and triggering of configured workflows such as inspection and proxy generation. The platform has evolved from supporting studio applications to also supporting data science and machine-learning applications that discover asset metadata and build various data facts.
During this evolution, we often receive requests to update existing asset metadata or to add new metadata for newly added features. The need to access and update existing asset metadata keeps growing over time. Hence we built a data pipeline that can extract the existing asset metadata and process it specifically for each new use case. This framework lets us evolve and adapt the application to the unpredictable but inevitable changes requested by our platform clients without any downtime: production asset operations run in parallel with the reprocessing of older data. Some of the commonly supported data reprocessing use cases are listed below.
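The extract-then-process idea described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the AMP implementation: the `Asset` type, the in-memory `store`, the `reprocess` function, and the `add_schema_version` processor are all hypothetical names chosen for the example, and a plain dictionary stands in for the real asset store.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List


@dataclass
class Asset:
    # Hypothetical, simplified asset record: an id plus free-form metadata.
    asset_id: str
    metadata: Dict[str, str] = field(default_factory=dict)


# A use-case-specific processor maps an existing asset to its updated form.
Processor = Callable[[Asset], Asset]


def extract_assets(store: Dict[str, Asset], batch_size: int) -> Iterator[List[Asset]]:
    """Yield the existing assets in fixed-size batches."""
    snapshot = list(store.values())  # snapshot so writes do not disturb iteration
    for start in range(0, len(snapshot), batch_size):
        yield snapshot[start:start + batch_size]


def reprocess(store: Dict[str, Asset], processor: Processor, batch_size: int = 2) -> int:
    """Run one processor over every existing asset and return the number processed."""
    processed = 0
    for batch in extract_assets(store, batch_size):
        for asset in batch:
            updated = processor(asset)
            store[updated.asset_id] = updated  # write the updated metadata back
            processed += 1
    return processed


def add_schema_version(asset: Asset) -> Asset:
    """Example use case: backfill a new metadata field on older assets."""
    merged = {"schema_version": "2", **asset.metadata}  # existing values win
    return Asset(asset.asset_id, merged)
```

Each new use case would supply its own processor function while the extraction step stays the same, which is the property the paragraph above attributes to the pipeline. For example, `reprocess(store, add_schema_version)` backfills the hypothetical `schema_version` field on every asset in `store`.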
Production Use Cases
- Real-time APIs (backed by the Cassandra database) for asset metadata access don’t fit the analytics use cases of data science or machine learning teams. We built the data pipeline to persist the asset data in Iceberg in parallel with cassa...