Introducing OpenZL: An Open Source Format-Aware Compression Framework

Today, we are excited to announce the public release of OpenZL, a new data compression framework. OpenZL offers lossless compression for structured data, with performance comparable to specialized compressors. It accomplishes this by applying a configurable sequence of transforms to the input, revealing hidden order in the data, which can then be compressed more easily. Although each file type gets its own sequence of transforms, every OpenZL file can be decompressed with the same universal OpenZL decompressor.
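
To make the idea of a configurable transform sequence with a single decoder concrete, here is a minimal Python sketch. It is not OpenZL's API or file format: the transform IDs, the one-byte frame header, and the `compress`/`decompress` helpers are all assumptions made for illustration, with `zlib` standing in as the backend compressor. The point it shows is that if the frame records which transforms were applied, one decoder can undo any chain.

```python
import struct
import zlib

def delta_encode(data: bytes) -> bytes:
    """Replace each byte with its difference from the previous byte (mod 256)."""
    out, prev = bytearray(), 0
    for b in data:
        out.append((b - prev) & 0xFF)
        prev = b
    return bytes(out)

def delta_decode(data: bytes) -> bytes:
    """Invert delta_encode by accumulating the differences (mod 256)."""
    out, prev = bytearray(), 0
    for d in data:
        prev = (prev + d) & 0xFF
        out.append(prev)
    return bytes(out)

def transpose4(data: bytes) -> bytes:
    """Group byte 0 of every 4-byte record together, then byte 1, and so on."""
    if len(data) % 4:
        raise ValueError("input length must be a multiple of 4")
    return b"".join(data[j::4] for j in range(4))

def untranspose4(data: bytes) -> bytes:
    """Invert transpose4 by re-interleaving the four byte planes."""
    n = len(data) // 4
    planes = [data[j * n:(j + 1) * n] for j in range(4)]
    return bytes(b for record in zip(*planes) for b in record)

# Hypothetical registry: transform ID -> (forward, inverse).
TRANSFORMS = {1: (delta_encode, delta_decode), 2: (transpose4, untranspose4)}

def compress(data: bytes, chain: list) -> bytes:
    """Apply the chosen transforms in order, then record the chain in the frame."""
    for tid in chain:
        data = TRANSFORMS[tid][0](data)
    # Assumed toy header: one count byte followed by one byte per transform ID.
    return bytes([len(chain), *chain]) + zlib.compress(data)

def decompress(frame: bytes) -> bytes:
    """Universal decoder: read the chain from the frame and undo it in reverse."""
    count = frame[0]
    chain = frame[1:1 + count]
    data = zlib.decompress(frame[1 + count:])
    for tid in reversed(chain):
        data = TRANSFORMS[tid][1](data)
    return data

# 1,000 little-endian 32-bit integers, compressed with three different chains.
sample = struct.pack("<1000I", *range(0, 3000, 3))
for chain in ([], [1], [2, 1]):
    assert decompress(compress(sample, chain)) == sample
```

The compressor side is free to pick a different chain per input, while `decompress` never needs to know the file type in advance, because everything it needs is in the frame.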

A Decade of Lessons

When Zstandard was announced, it came with a simple pitch: the same or better compression ratio than the prior default, but at the much higher speed required by datacenter workloads. By pairing strong entropy coding with a design that fully utilized modern CPU capabilities, Zstandard offered a substantial improvement that justified its presence in datacenters.

However, although Zstandard has been improved over time, staying within its framework offers diminishing returns. So we started looking for the next great leap in data compression.

In this quest, one pattern kept repeating: Using generic methods on structured data leaves compression gains on the table. Data isn’t just byte soup. It can be columnar, encode enums, be restricted to specific ranges, or carry highly repetitive fields. More importantly, it has predictable shapes. A bespoke compressor that leans into that structure can beat general-purpose tools on both ratio and speed. But there’s a catch — every bespoke scheme means another compressor and decompressor to create, ship, audit, patch, and trust.
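
The gap between generic and structure-aware compression is easy to reproduce on synthetic data. The sketch below is an illustration under stated assumptions, not a benchmark from the article: the record layout (a 32-bit timestamp plus a one-byte status enum), the step sizes, and the use of `zlib` as the generic compressor are all made up for the example. It compresses the same records twice, once as interleaved rows and once as separate columns with delta-encoded timestamps.

```python
import random
import struct
import zlib

random.seed(0)  # fixed seed so the comparison is reproducible

# Hypothetical log records: (uint32 timestamp, uint8 status enum).
STATUSES = [0, 1, 2, 3]
records = []
ts = 1_700_000_000
for _ in range(10_000):
    ts += random.randint(1, 4)  # timestamps rise in small steps
    records.append((ts, random.choice(STATUSES)))

# Baseline: serialize row by row and hand the bytes to a generic compressor.
rows = b"".join(struct.pack("<IB", t, s) for t, s in records)
generic_size = len(zlib.compress(rows, 9))

# Structure-aware: split into columns and delta-encode the timestamps,
# so the compressor sees many small, repetitive values.
timestamps = [t for t, _ in records]
deltas = [timestamps[0]] + [b - a for a, b in zip(timestamps, timestamps[1:])]
ts_column = struct.pack(f"<{len(deltas)}I", *deltas)
status_column = bytes(s for _, s in records)
structured_size = (
    len(zlib.compress(ts_column, 9)) + len(zlib.compress(status_column, 9))
)

print("generic:", generic_size, "bytes; structured:", structured_size, "bytes")
```

Both variants feed the same backend compressor; only the arrangement of the bytes changes. The catch named above also shows up here: the column split and delta step are specific to this record layout, so a decoder has to know about them to restore the rows.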
