Tulip: Schematizing Meta's data platform
- We’re sharing Tulip, a binary serialization protocol supporting schema evolution.
- Tulip assists with data schematization by addressing protocol reliability and other issues simultaneously.
- It replaces multiple legacy formats used in Meta’s data platform and has achieved significant performance and efficiency gains.
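To make "schema evolution" concrete, here is a minimal sketch of how a binary format can stay readable when the writer and reader use different schema versions. It is not Tulip's wire format, which this section does not describe. The layout (a 1-byte field ID and a 2-byte length before each payload) and the names `encode`, `decode`, and `known` are assumptions made purely for illustration.

```python
import struct
from typing import Dict


def encode(fields: Dict[int, bytes]) -> bytes:
    """Serialize fields as repeated (field_id, length, payload) records.

    Hypothetical layout for illustration only: 1-byte field ID,
    2-byte big-endian payload length, then the payload bytes.
    """
    out = bytearray()
    for field_id, payload in sorted(fields.items()):
        out += struct.pack(">BH", field_id, len(payload))
        out += payload
    return bytes(out)


def decode(data: bytes, known: Dict[int, str]) -> Dict[str, bytes]:
    """Deserialize records, skipping any field ID the reader's schema lacks."""
    result: Dict[str, bytes] = {}
    offset = 0
    while offset < len(data):
        field_id, length = struct.unpack_from(">BH", data, offset)
        offset += 3  # size of the (field_id, length) header
        payload = data[offset:offset + length]
        offset += length
        name = known.get(field_id)
        if name is not None:  # unknown fields are skipped, not treated as errors
            result[name] = payload
    return result


# A newer writer adds field 3; an older reader only knows fields 1 and 2.
new_message = encode({1: b"page_view", 2: b"42", 3: b"ios"})
old_reader_view = decode(new_message, {1: "event", 2: "user_id"})

# An older writer omits field 3; a newer reader simply sees it as absent.
old_message = encode({1: b"page_view", 2: b"42"})
new_reader_view = decode(old_message, {1: "event", 2: "user_id", 3: "platform"})
```

Because every record carries its own ID and length, a reader can step over fields it does not recognize, so writers and readers in this sketch can be upgraded independently.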
There are numerous heterogeneous services, such as warehouse data storage and various real-time systems, that make up Meta’s data platform — all exchanging large amounts of data among themselves as they communicate via service APIs. As we continue to grow the number of AI- and machine learning (ML)–related workloads in our systems that leverage data for tasks such as training ML models, we’re continually working to make our data logging systems more efficient.
Schematization of data plays an important role in a data platform at Meta’s scale. These systems are designed with the knowledge that every decision and trade-off can impact the reliability, performance, and efficiency of data processing, as well as our engineers’ developer experience.
Making huge bets, like changing serialization formats for the entire data infrastructure, is challenging in the short term, but offers greater long-term benefits that help the platform evolve over time.
The challenge of a data platform at exabyte scale
The data analytics logging library is present in the web tier as well as in internal services. It is responsible for logging analytical and operational data via Scribe (Meta’s persistent and durable message queuing system). Various services read and ingest data from Scribe, including (but not limited ...