Docker lazy loading at Grab: Accelerating container startup times
At Grab, we’ve been exploring ways to dramatically reduce container startup times for our data platforms. Large container images for services like Airflow and Spark Connect were taking minutes to download, causing slow cold starts and poor auto-scaling performance. This blog post shares our journey implementing Docker image lazy loading using eStargz and Seekable OCI (SOCI) technologies, the results we achieved, and the lessons learned along the way.
Results: The numbers speak for themselves
Benchmark results
Our initial testing on fresh nodes (nodes without cached images) showed dramatic improvements in image pull times (Figure 1).

Figure 1. Table of results.
The key advantage of lazy loading is the reduction in image pull time, especially on “fresh” nodes that do not have the image cached. By analyzing detailed pod events, we can see the precise impact of using the stargz snapshotter.
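For each image pull, the kubelet records a `Pulled` pod event whose message reports how long the pull took. As a minimal sketch (not Grab's actual tooling), the Python below extracts the image name and pull duration from such a message. It assumes the message wording of recent Kubernetes releases (`Successfully pulled image "…" in <duration> …`), which differs between versions, and `parse_pull_event` / `parse_go_duration` are hypothetical helper names introduced for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional

# One <number><unit> part of a Go time.Duration string, e.g. "850ms" or "3.5s".
# "ms" is listed before "m" so that milliseconds are not read as minutes.
_DURATION_PART = r"[0-9]+(?:\.[0-9]+)?(?:ns|us|µs|ms|s|m|h)"
DURATION_RE = re.compile(rf"(?:{_DURATION_PART})+")
DURATION_PART_RE = re.compile(r"([0-9]+(?:\.[0-9]+)?)(ns|us|µs|ms|s|m|h)")
UNIT_SECONDS = {"ns": 1e-9, "us": 1e-6, "µs": 1e-6, "ms": 1e-3,
                "s": 1.0, "m": 60.0, "h": 3600.0}

# Assumed shape of a kubelet "Pulled" event message (varies by Kubernetes version), e.g.
#   Successfully pulled image "registry/app:tag" in 2m3.5s (2m3.5s including waiting)
PULLED_EVENT_RE = re.compile(
    rf'Successfully pulled image "(?P<image>[^"]+)" in (?P<duration>(?:{_DURATION_PART})+)'
)


@dataclass
class PullTiming:
    image: str
    seconds: float


def parse_go_duration(text: str) -> float:
    """Convert a Go duration string such as '2m3.5s' or '850ms' into seconds."""
    if DURATION_RE.fullmatch(text) is None:
        raise ValueError(f"not a Go duration: {text!r}")
    return sum(float(value) * UNIT_SECONDS[unit]
               for value, unit in DURATION_PART_RE.findall(text))


def parse_pull_event(message: str) -> Optional[PullTiming]:
    """Return the image and pull duration from a 'Pulled' event message, or None if it doesn't match."""
    match = PULLED_EVENT_RE.search(message)
    if match is None:
        return None
    return PullTiming(image=match.group("image"),
                      seconds=parse_go_duration(match.group("duration")))
```

The messages can be collected with, for example, `kubectl get events --field-selector reason=Pulled`. Comparing the parsed durations for the same image on fresh nodes with and without a lazy-loading snapshotter gives the kind of pull-time comparison discussed here.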
During our SOCI benchmark testing, we observed an important difference between SOCI and eStargz: SOCI kept application startup time the same as with standard images, while eStargz increased it. For example, with Airflow, both overlayFS (standard images) and SOCI achieved a 5.0-second startup time, while eStargz took 25.0 seconds. The eStargz result shows that lazy loading doesn't eliminate download time; it redistributes it, moving the fetching of file contents out of the image pull and into application startup. SOCI's approach of maintaining its indexes separately from the image allowed it to handle this trade-off more effectively, keeping application startup on par with standard images while still dramatically reducing image pull time.
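The redistribution effect can be illustrated with a deliberately simple timing model. This is a sketch with made-up parameters, not a model of how eStargz or SOCI actually fetch data. With an eager pull, the whole image is downloaded before the container starts. With lazy loading, only a small index is fetched up front, and the bytes the application reads during startup are fetched on demand, with each fetch paying some round-trip latency. The model makes no attempt to explain why SOCI kept startup time unchanged; the functions and parameter names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Phases:
    """Time before the container starts (pull) and from container start until the app is ready."""
    pull_s: float
    startup_s: float

    @property
    def total_s(self) -> float:
        return self.pull_s + self.startup_s


def eager_pull(image_mb: float, bandwidth_mb_s: float, app_init_s: float) -> Phases:
    # The full image is downloaded before the container can start,
    # so the app later reads everything from local disk.
    return Phases(pull_s=image_mb / bandwidth_mb_s, startup_s=app_init_s)


def lazy_pull(index_mb: float, accessed_mb: float, bandwidth_mb_s: float,
              app_init_s: float, fetches: int, fetch_latency_s: float) -> Phases:
    # Only the index/metadata is fetched before the container starts...
    pull_s = index_mb / bandwidth_mb_s
    # ...and the bytes the app actually reads are fetched on demand during startup.
    # Fetches are assumed to be serial, each adding one round-trip of latency.
    on_demand_s = accessed_mb / bandwidth_mb_s + fetches * fetch_latency_s
    return Phases(pull_s=pull_s, startup_s=app_init_s + on_demand_s)


if __name__ == "__main__":
    # Hypothetical parameters for illustration only; these are not measured values.
    eager = eager_pull(image_mb=2000, bandwidth_mb_s=100, app_init_s=5.0)
    lazy = lazy_pull(index_mb=2, accessed_mb=200, bandwidth_mb_s=100,
                     app_init_s=5.0, fetches=100, fetch_latency_s=0.02)
    print(f"eager: pull={eager.pull_s:.2f}s startup={eager.startup_s:.2f}s")
    print(f"lazy:  pull={lazy.pull_s:.2f}s startup={lazy.startup_s:.2f}s")
```

With these hypothetical numbers, the eager pull spends 20 s before the container starts and 5 s in startup. The lazy pull starts after 0.02 s but takes 9 s to become ready, because the 200 MB the application reads (plus per-fetch latency) is now downloaded during startup rather than during the pull.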