January 23, 2024

Improving performance with incremental frame loading

Matthew Huang, Software Engineer, Figma

Andrew Chan, Software Engineer, Figma

Inside Figma Quality & performance Engineering

Large prototypes took minutes to load, and users took notice. Here’s how we overhauled our prototype player to improve load time and stability.

When we first built our prototyping features, we had one goal: Allow users to create interactive flows in Figma. From a technical perspective, we built a simple loading strategy to match. We’d load the entire document containing the prototype into memory before displaying the starting screen.

Mobile devices (especially iPhones) tend to have less memory than desktop devices, and the operating system would often kill our process instead of expanding our application’s memory budget.

This was reasonable at the time. But as Figma became more advanced, documents became larger and larger, featuring many pages and design systems with rapidly growing numbers of component variants. Prototypes multiplied in size and scale, while load times and stability lagged. Prototypes also often crashed on mobile, because loading the whole document exceeded mobile memory limits.

Incremental loading is a concept used to describe loading only a new or updated piece of data, rather than loading the entire file. By syncing just the update, processes move more efficiently. Incremental “frame” loading is a term the team coined to describe how we apply those processes to prototypes at Figma.

It was immediately clear that the sheer amount of data was the culprit. Instead of loading the entire prototype into memory, what if we could load just the content needed to display what was currently on the screen? With this incremental frame loading strategy, we could kill two birds with one stone, fixing both load times and memory usage by downloading and storing in memory only what was needed.

Careful considerations

Maintaining a smooth user experience at our scale would involve augmenting our multiplayer system to allow for piecewise syncing of document content.

This sounds easy, but what makes Figma, Figma—multiplayer technology, built in the browser, with a wide feature set—actually complicated things. And if we were moving away from loading the whole prototype, how much of the prototype should we actually load upfront? When should we load more?

Time to interactive measures how long it takes for a page to become fully interactive. It is defined as the earliest time at which the page displays useful content and begins responding to clicks and other user interactions within 50ms.

To answer that question, we needed to think about the experience of viewing a prototype. We wanted users to be able to interact with their prototype as quickly as possible. To start interacting with the fewest screens possible, we just needed the first screen and any adjacent screens that could be reached by a prototype interaction. So, the time to interactive should be blocked only by loading this subset of screens. When the user navigates to the next screen, we’ll load its adjacent screens, and so on!

User starts viewing the first frame; we pre-load its immediately-reachable frames.

User navigates and views the next frame; we pre-load that new frame's immediately-reachable frames. We also keep the first frame loaded, because users can navigate back to it.
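The pre-loading rule described above can be sketched in a few lines. This is only an illustration, not Figma's implementation: the graph shape, the frame names, and the helpers `frames_to_load` and `frames_to_fetch` are all hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical prototype graph: frame ID -> frames reachable by one interaction.
Interactions = Dict[str, List[str]]


def frames_to_load(graph: Interactions, current: str) -> Set[str]:
    """Return the current frame plus every frame one interaction away."""
    return {current, *graph.get(current, [])}


def frames_to_fetch(graph: Interactions, loaded: Set[str], target: str) -> Set[str]:
    """Return the frames still missing after navigating to `target`."""
    return frames_to_load(graph, target) - loaded


# Made-up prototype with four frames.
graph = {
    "home": ["list", "settings"],
    "list": ["detail", "home"],
    "detail": ["list"],
    "settings": ["home"],
}

initial = frames_to_load(graph, "home")          # blocks time to interactive
missing = frames_to_fetch(graph, initial, "list")  # fetched after navigating
loaded = initial | missing                        # "home" stays loaded
```

Only the `initial` set has to arrive before the first screen becomes interactive; everything in `missing` is requested after the user has already moved on.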

Another consideration was that Figma files can be updated in real time, with any changes immediately synced to all active viewers of the document. This makes it easy for designers to do things like view a flow on its intended device while iterating on the underlying document on another. In our previous system, real-time updates worked by associating each file with a real-time datastore on our servers. To allow clients to load only a subset of a file, we added support for querying specific parts of a file.

Queries were made real-time by specifying a protocol for how the database would communicate with the client, given changes in file content or client queries. At a high level, this document syncing protocol looks like:

A server session starts with an empty subscription.
A client sends a query message specifying the subtrees in the document tree it wants to subscribe to.
The server responds with a reply message containing a snapshot of those subtrees and confirming that the query is fulfilled.
After the initial response, the server syncs down any subsequent updates to the subscribed subset via additional “changes” messages.
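To make the message flow concrete, here is a minimal sketch of the three message kinds and a server-side session. The class and field names (`Query`, `Reply`, `Changes`, `Session`) are assumptions made for illustration, and the expansion of a query into whole subtrees is left out to keep the example short.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Query:  # client -> server
    node_ids: List[str]


@dataclass
class Reply:  # server -> client, sent once per query
    fulfilled: List[str]
    snapshot: Dict[str, dict]


@dataclass
class Changes:  # server -> client, sent after later edits
    updated: Dict[str, dict]


@dataclass
class Session:
    document: Dict[str, dict]  # node ID -> properties (server copy)
    subscribed: Set[str] = field(default_factory=set)  # starts empty

    def handle_query(self, query: Query) -> Reply:
        """Extend the subscription and reply with a snapshot of the nodes."""
        self.subscribed |= set(query.node_ids)
        snapshot = {
            n: dict(self.document[n]) for n in query.node_ids if n in self.document
        }
        return Reply(fulfilled=list(query.node_ids), snapshot=snapshot)

    def apply_edit(self, node_id: str, props: dict) -> Optional[Changes]:
        """Apply an edit; emit Changes only if the client is subscribed."""
        self.document.setdefault(node_id, {}).update(props)
        if node_id in self.subscribed:
            return Changes(updated={node_id: dict(props)})
        return None


session = Session(document={"a": {"name": "Frame"}, "z": {"name": "Other"}})
reply = session.handle_query(Query(["a"]))
change = session.apply_edit("a", {"name": "Renamed"})
ignored = session.apply_edit("z", {"name": "Unseen"})  # not subscribed
```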

Here’s an example:

Client sends a query of a. This means “subscribe me to node with ID a, all of its ancestors, and all of its descendants.”
Server sends a reply confirming that the query for a is fulfilled along with the contents of the query.
Whenever these nodes change in the future, the server will send more changes messages.

We start off with the client subscribing to a subset of the full file represented on the server (a, b, c, x). So when a property change is made to c, the client receives a message about that change.

Next, a change occurs that re-parents node y underneath node a. Since the client is subscribed to a, the server sends a message that informs the client about the existence of node y under a.

Lastly, c is re-parented under x, and the client isn’t subscribed to x's descendants. The client receives a message that c is removed.
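The three steps above can be replayed with a small simulation. It assumes a tree in which x is the parent of a, and a is the parent of b and c, which is consistent with the subscribed subset (a, b, c, x) but is not spelled out in the text. The helper names are hypothetical, and the sketch assumes every edit leaves a valid tree.

```python
from typing import Dict, Optional, Set, Tuple

Parents = Dict[str, Optional[str]]  # node ID -> parent ID (None for the root)


def ancestors(parents: Parents, node: str) -> Set[str]:
    out: Set[str] = set()
    parent = parents.get(node)
    while parent is not None:
        out.add(parent)
        parent = parents.get(parent)
    return out


def descendants(parents: Parents, node: str) -> Set[str]:
    out: Set[str] = set()
    frontier = [node]
    while frontier:
        current = frontier.pop()
        for child, parent in parents.items():
            if parent == current:
                out.add(child)
                frontier.append(child)
    return out


def visible(parents: Parents, queried: str) -> Set[str]:
    """Nodes covered by a query: its ancestors, itself, and its descendants."""
    return ancestors(parents, queried) | {queried} | descendants(parents, queried)


def reparent(
    parents: Parents, queried: str, node: str, new_parent: str
) -> Tuple[Set[str], Set[str]]:
    """Move `node`; return (created, removed) node IDs from the client's view."""
    before = visible(parents, queried)
    parents[node] = new_parent
    after = visible(parents, queried)
    return after - before, before - after


# Assumed layout: x is the root; a and y sit under x; b and c sit under a.
parents: Parents = {"x": None, "a": "x", "b": "a", "c": "a", "y": "x"}

start = visible(parents, "a")
step_y = reparent(parents, "a", "y", "a")  # y moves under a
step_c = reparent(parents, "a", "c", "x")  # c moves under x
```

Computing the visible set before and after each edit and diffing the two is a simple way to decide which "created" and "removed" messages a client should see; a production server would presumably track this incrementally rather than rescanning the tree.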

Other noteworthy dives

The above protocol is the heart and soul of our incremental frame loading scheme, but there were a few additional details needed to support all files and cases. Here are a couple that we think are the most interesting.

Dependency edges

Some objects in Figma have properties that need to be inferred from other objects. For example, instances are copies of their backing component: Whenever the backing component changes, instances need to be updated as well. Similarly, consumers of styles or variables need to be updated when the style or variable itself changes. To support this, when a client subscribes to an object, our protocol also subscribes them to its dependencies (and any of their dependencies, transitively).

To make things more complicated, dependencies can change as the document is edited: Nodes can be reparented, can switch the style they use, and so on.

When new dependencies are added, which weren’t already present in a client’s subscription, the server sends node created changes, so that the client learns about those nodes.
When old dependencies are removed from a subscription, the server sends node removed changes, effectively allowing the client to “evict” those nodes from memory.
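One way to picture this is as a transitive closure over dependency edges, recomputed when an edit changes those edges. The sketch below is an assumption-laden illustration: the node names, the `Deps` mapping, and the functions `closure` and `diff_after_edit` are invented for the example.

```python
from typing import Dict, List, Set, Tuple

Deps = Dict[str, List[str]]  # node ID -> IDs it depends on


def closure(deps: Deps, roots: Set[str]) -> Set[str]:
    """Return the roots plus everything they depend on, transitively."""
    seen = set(roots)
    stack = list(roots)
    while stack:
        node = stack.pop()
        for dep in deps.get(node, []):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


def diff_after_edit(
    deps: Deps, roots: Set[str], edit: Deps
) -> Tuple[Set[str], Set[str]]:
    """Apply `edit` to the dependency edges; return (created, removed) IDs."""
    before = closure(deps, roots)
    deps.update(edit)
    after = closure(deps, roots)
    return after - before, before - after


# Made-up document: an instance backed by a component that uses a style.
deps: Deps = {
    "instance": ["component"],
    "component": ["style_red"],
    "style_red": [],
    "style_blue": ["color_var"],
    "color_var": [],
}

subscribed = closure(deps, {"instance"})
# The component switches from style_red to style_blue.
created, removed = diff_after_edit(deps, {"instance"}, {"component": ["style_blue"]})
```

Here the client would receive node created changes for `style_blue` and `color_var`, and a node removed change for `style_red`.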

Changing subscriptions

As clients navigate through a Figma prototype, they change the set of nodes they are subscribed to. This means subtrees become “unsubscribed,” too, as the client navigates. What happens if a viewer unsubscribes from a node, but then re-subscribes to it later on? If the affected node is updated, or even deleted, during the in-between period, the client may end up with stale nodes if we’re not careful. By always evicting unsubscribed nodes from memory and making sure that the server sends a fresh version of those nodes down on re-subscription, we can ensure that this never happens.
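A minimal sketch of that rule, under the assumption that the client can always ask the server for a fresh copy: unsubscribed nodes are dropped immediately, and newly subscribed nodes are always copied from the server's current state. The `ClientCache` class is hypothetical, and it does not model live changes to nodes that stay subscribed.

```python
from typing import Dict, Set


class ClientCache:
    def __init__(self, server_doc: Dict[str, dict]):
        self.server_doc = server_doc  # stand-in for the server's copy
        self.nodes: Dict[str, dict] = {}

    def set_subscription(self, wanted: Set[str]) -> None:
        # Evict everything that is no longer subscribed.
        for node_id in list(self.nodes):
            if node_id not in wanted:
                del self.nodes[node_id]
        # Fetch a fresh copy of every newly subscribed node that still exists.
        for node_id in wanted:
            if node_id not in self.nodes and node_id in self.server_doc:
                self.nodes[node_id] = dict(self.server_doc[node_id])


server = {"s1": {"title": "v1"}, "s2": {"title": "v1"}}
cache = ClientCache(server)

cache.set_subscription({"s1"})
cache.set_subscription({"s2"})        # s1 is evicted here
evicted_s1 = "s1" not in cache.nodes

server["s1"]["title"] = "v2"          # edited while the client was unsubscribed
cache.set_subscription({"s1", "s2"})  # re-subscribe: s1 arrives fresh
```

Because the old copy of `s1` was thrown away rather than kept around, there is nothing stale for the client to reuse when it subscribes again.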

This is also important for keeping memory usage low, as we’ll see below.

Optimizing the prototyping experience

This new protocol for syncing Figma documents made it possible to incrementally load prototypes while still supporting real-time updates. All that remained was to hook our clients into this protocol to deliver an incrementally loaded prototype experience. We designed our shiny new client strategy with the goal of reducing memory usage and load times, while also preserving existing, snappy navigation.

Reducing memory usage and load times

Incremental loading allows us to subscribe to the minimum number of screens needed for the prototype to function properly. Remember: This means that we only need the first screen of the prototype, along with any adjacent screens.

Next, we want to make sure that even as a client navigates around the prototype, they don’t accumulate unnecessary data in memory. We make sure that clients evict any screens they aren’t directly subscribed to, so clients are constantly cleaning up unused screens as they request new ones while navigating prototypes. However, as we’ll see in the next objective, constantly allocating and evicting screens was at odds with maintaining a snappy navigation experience.

Reducing memory usage doesn’t come for free! Since Figma clients now initially load fewer screens, this means we need to repay that price later—when users navigate forward and need to download new screens. Recall that our client strategy only loads the screens adjacent to what someone is currently viewing. This means that if users do two quick navigations in succession, they’ll need to download more screens and will hit a loading spinner.

So, why don’t we just keep loading more of the prototype in the background? Well, this is a direct tradeoff with one of our primary goals, which is to reduce memory usage on mobile devices. We’d eventually end up loading the entire prototype and crashing the application again. To solve this, we decided to fork the behavior between desktop and mobile devices: On desktop, we would continue loading all screens in the background to retain fast and snappy navigations; on mobile devices, we would load screens as a user navigates, evicting old-enough screens to ease memory consumption.
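The desktop and mobile fork can be sketched as a single cache with an optional size limit. The article only says that mobile evicts "old-enough" screens; the least-recently-used policy, the limit values, and the names `ScreenCache` and `make_cache` below are assumptions chosen for illustration.

```python
from collections import OrderedDict
from typing import List, Optional


class ScreenCache:
    def __init__(self, max_screens: Optional[int]):
        self.max_screens = max_screens  # None means unlimited (desktop)
        self.screens: "OrderedDict[str, bytes]" = OrderedDict()

    def touch(self, screen_id: str, data: bytes = b"") -> List[str]:
        """Record a load or visit; return the IDs evicted as a result."""
        if screen_id in self.screens:
            self.screens.move_to_end(screen_id)  # mark as most recently used
        else:
            self.screens[screen_id] = data
        evicted: List[str] = []
        while self.max_screens is not None and len(self.screens) > self.max_screens:
            old_id, _ = self.screens.popitem(last=False)  # oldest first
            evicted.append(old_id)
        return evicted


def make_cache(is_mobile: bool, mobile_limit: int = 3) -> ScreenCache:
    """Mobile gets a bounded cache; desktop keeps every screen it loads."""
    return ScreenCache(mobile_limit if is_mobile else None)


mobile = make_cache(is_mobile=True, mobile_limit=2)
mobile.touch("a")
mobile.touch("b")
mobile.touch("a")                 # revisiting "a" makes "b" the oldest
mobile_evicted = mobile.touch("c")

desktop = make_cache(is_mobile=False)
desktop_evicted = [e for s in ("a", "b", "c", "d") for e in desktop.touch(s)]
```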

Another hurdle involved lock-ups caused by processing loads in the background. With incremental loading, a lot of things happen as a user navigates from one screen to the next: Our renderer plays animations, we compute metadata about the new screen, we pre-load and process yet more screens. In tests, we learned that users’ devices would often lock up amidst all of this work. Animations and interactions weren’t running smoothly, and this wasn’t up to our standards of quality. This was challenging to solve because Figma has to run in the more-or-less single-threaded environment of a web browser. We couldn’t delegate processing of loads to a worker or otherwise leverage parallelism because we needed to read and write state only found in the main thread.

To mitigate the effect, we made broad optimizations to the prototype experience. First, we split each incremental load of the file into smaller chunks, which could be progressively loaded in the background more easily. We then skipped the computation of rendering metadata (such as layout and boolean operations) for content that wasn’t being shown to the user. Along with other targeted optimizations of expensive flows, these drastically reduced the amount of CPU lockup that our users experienced.
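The two optimizations, chunking and skipping work for hidden content, can be illustrated with a small single-threaded loop. This is a sketch under stated assumptions rather than the actual scheduler: the chunk size, the `yield_to_ui` callback (a stand-in for handing control back to the browser between chunks), and the node names are all made up.

```python
from typing import Callable, Iterator, List, Sequence


def chunked(nodes: Sequence[str], size: int) -> Iterator[List[str]]:
    """Split a load into consecutive chunks of at most `size` nodes."""
    for start in range(0, len(nodes), size):
        yield list(nodes[start:start + size])


def process_in_slices(
    nodes: Sequence[str],
    size: int,
    process: Callable[[str], None],
    yield_to_ui: Callable[[], None],
) -> int:
    """Process nodes chunk by chunk, handing control back after each chunk."""
    slices = 0
    for chunk in chunked(nodes, size):
        for node in chunk:
            process(node)
        yield_to_ui()  # animations and input handling get a turn here
        slices += 1
    return slices


visible = {"n1"}           # only n1 is on screen
computed: List[str] = []   # nodes whose rendering metadata was computed
ui_turns: List[int] = []


def process(node: str) -> None:
    # Skip expensive metadata (layout, boolean operations) for hidden nodes.
    if node in visible:
        computed.append(node)


count = process_in_slices(
    ["n1", "n2", "n3", "n4", "n5"], 2, process, lambda: ui_turns.append(1)
)
```

Smaller chunks mean more frequent chances for the renderer to run, at the cost of a little scheduling overhead per chunk.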

After careful planning and deciding which tradeoffs to make, we shipped these improvements to all of our users. Since then, we’ve been happy seeing not only the improvements to load times, but also the reduction to crash rates on mobile devices. Much like other engineering work at Figma, we prioritized the user experience, embraced the complexity that comes with that, and invested in creating the best architecture to fit those needs. If this sounds interesting to you, we’d love to have you come join us!

Illustration by Enle Li.
