Advancing Our Chef Infrastructure: Safety Without Disruption
Last year, I wrote a blog post titled "Advancing Our Chef Infrastructure", which explored how our Chef infrastructure has evolved over the years. It covered the shift from a single Chef stack to a multi-stack model and the challenges that came with it, from updating how we handle cookbook uploads to navigating the limitations around Chef searches.
If you haven’t had a chance to read that post yet, I highly recommend checking it out first to get the full context for this post.
At Slack, keeping our service reliable is always the top priority. In my last post, I talked about the first phase of our work to make Chef and EC2 provisioning safer. With that behind us, we started looking at what else we could do to make deploys even safer and more reliable.
One idea we explored was moving to Chef Policyfiles. That would have meant replacing roles and environments and asking dozens of teams to change their cookbooks. In the long run, it might have made things safer, but in the short term it would have been a huge effort and would have introduced more risk than it removed.
So instead, this post is about the path we chose: improving our existing EC2 framework in a way that doesn’t disrupt cookbooks or roles, while still giving us more safety in our Chef deployments.
Splitting Chef Environments
Previously, each instance had a cron job that triggered a Chef run every few hours on a set schedule. These scheduled runs were primarily for compliance purposes — to ensure our fleet remained in a consistent and defined con...