In 2020, Figma’s infrastructure hit some growing pains due to a combination of new features, preparations to launch a second product, and user growth (database traffic grows approximately 3x annually). We knew that the infrastructure that supported Figma in its early years wouldn’t be able to scale to meet our demands. We were still using a single, large Amazon RDS database to persist most of our metadata (permissions, file information, comments, and more). While it seamlessly handled many of our core collaborative features, one machine has its limits. Most visibly, we observed upwards of 65% CPU utilization during peak traffic, driven by the sheer volume of queries served by a single database. Database latencies become increasingly unpredictable as usage edges closer to the limit, and that unpredictability affects core user experiences.
If our database became completely saturated, Figma would stop working.
We were far from that, but as an infrastructure team, our goal is to identify and fix scalability issues proactively, before they become imminent threats. We needed to devise a solution that would reduce potential instability and pave the way for future scale. Performance and reliability also had to stay top of mind while we implemented that solution; our team aims to build a sustainable platform that allows engineers to rapidly iterate on Figma’s products without impacting the user experience. If Figma’s infrastructure is a series of roads, we can’t just shut down the highways while we work on them.
We started with a few tactica...