Improving data processing efficiency using partial deserialization of Thrift
摘要
At Pinterest we’ve worked to greatly improve data processing efficiency. One quote that resonates with our unique approach is from writer Antoine de Saint-Exupéry: “Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away.”
Ultimately, we process petabytes of Thrift encoded data at Pinterest. Most jobs that access this data need only a part of it. To meet our unique needs, we devised a way to efficiently deserialize only the desired subsets of Thrift structures in each job. Our solution enabled us to significantly decrease our data processing resource usage: about 20% reduction in vcore usage, 27% reduction in memory usage, and 36% reduction in intermediate data (mapper output).
欢迎在评论区写下你对这篇文章的看法。