An Interactive Intro to CRDTs

Have you heard about CRDTs and wondered what they are? Maybe you’ve looked into them a bit, but ran into a wall of academic papers and math jargon? That was me before I started my Recurse Center batch. But I’ve spent the past month or so doing research and writing code, and it turns out that you can build a lot with just a few simple things!

In this two-part series, we’ll learn what a CRDT is. Then we’ll write a primitive CRDT, compose it into more complex data structures, and finally use what we’ve learned to build a collaborative pixel art editor. All of this assumes no prior knowledge about CRDTs, and only a rudimentary knowledge of TypeScript.

To pique your curiosity, this is what we’ll end up with:

Draw by clicking and dragging with your mouse. Change the paint color by using the color input on the bottom left. You can draw on either canvas and your changes will show up on the other, as if they were collaborating on the same picture.

Clicking the network button prevents changes from reaching the other canvas (although they’ll sync up again when they come back “online”). The latency slider adds a delay before changes on one canvas show up on the other.

We’ll build that in the next post. First, we need to learn about CRDTs!

What is a CRDT?

Okay, let’s start from the top. CRDT stands for “Conflict-free Replicated Data Type”. That’s a long acronym, but the concept isn’t too complicated. It’s a kind of data structure that can be stored on different computers (peers). Each peer can update its own state instantly, without a network request to check with other peers. Peers may have different states at different points in time, but are guaranteed to eventually converge on a single agreed-upon state. That makes CRDTs great for building rich collaborative apps, like Google Docs and Figma — without requiring a central server to sync changes.

Broadly, there are two kinds of CRDTs: state-based and operation-based.1 State-based CRDTs transmit their full state between peers, and a new state is obtained by merging all the states together. Operation-based CRDTs transmit only the actions that users take, which can be used to calculate a new state.

They’re also known as CvRDTs (“Cv” standing for “convergent”) and CmRDTs (“Cm” standing for “commutative”), respectively, although I think “state-based” and “operation-based” are the preferred terms. ↩

That might make operation-based CRDTs sound way better. For example, if a user updates one item in a list, an operation-based CRDT can send a description of only that update, while a state-based CRDT has to send the whole list! The drawback is that operation-based CRDTs impose constraints on the communication channel: messages must be delivered exactly once, in causal order, to each peer.2

There are also delta CRDTs, or hybrid CRDTs, which allow peers to negotiate the subset of the state they need to send each other. That’s one example of blending operation-based and state-based CRDTs. But the fundamental tradeoff remains: sending less data between peers means more constraints on communication. ↩

This post will exclusively focus on on state-based CRDTs. For brevity, I’ll just say “CRDTs” from here on out, but know that I’m referring specifically to state-based CRDTs.

I’ve been talking about what CRDTs do, but what is a CRDT? Let’s make it concrete: a CRDT is any data structure that implements this interface:3

Technically, a CRDT can be anything that follows the merging rules described below. This is a working definition; in practice, implementations in object-oriented languages will end up looking something like this. ↩

interface CRDT<T, S> { value: T; state: S; merge(state: S): void;
}

That is to say, a CRDT contains at least three things:

A value, T. This is the part the rest of our program cares about. The entire point of the CRDT is to reliably sync the value between peers.
A state, S. This is the metadata needed for peers to agree on the same value. To update other peers, the whole state is serialized and sent to them.
A merge function. This is a function that takes some state (probably received from another peer) and merges it with the local state.

The merge function must satisfy three properties to ensure that all peers arrive at the same result (I’ll use the notation A ∨ B to indicate merging state A into state B):

Commutativity: states can be merged in any order; A ∨ B = B ∨ A. If Alice and Bob exchange states, they can each merge the other’s state into their own and arrive at the same result.
Associativity: when merging three (or more) states, it doesn’t matter which are merged first; (A ∨ B) ∨ C = A ∨ (B ∨ C). If Alice receives states from both Bob and Carol, she can merge them into her own state in any order and the result will be the same.4
Idempotence: merging a state with itself doesn’t change the state; A ∨ A = A. If Alice merges her own state with itself, the result will be the same state she started with.

Commutativity and associativity might sound the same, and indeed most commutative operations are also associative. But there are a few math operations that are only one or the other. Matrix multiplication, for example, is associative but not commutative. And surprisingly, floating point arithmetic — i.e. any math operator in JavaScript — is commutative but not associative! ↩

Mathematically proving that a merge function has all these properties might sound hard. But luckily, we don’t have to do that! Instead, we can just combine CRDTs that already exist, leaning on the fact that someone has proven these things for us.

Speaking of CRDTs that already exist: let’s learn about one!

Last Write Wins Register

A register is a CRDT that holds a single value. There are a couple kinds of registers, but the simplest is the Last Write Wins Register (or LWW Register).

LWW Registers, as the name suggests, simply overwrite their current value with the last value written. They determine which write occurred last using timestamps, represented here by integers that increment whenever the value is updated.5 Here’s the algorithm:

You might ask: why not use the actual time? Unfortunately, accurately syncing clocks between two computers is an extremely hard problem. Using incrementing integers like this is one simple version of a logical clock, which captures the order of events relative to each other rather than to the “wall clock”. ↩

If the received timestamp is less than the local timestamp, the register doesn’t change its state.
If the received timestamp is greater than the local timestamp, the register overwrites its local value with the received value. It also stores the received timestamp and some sort of identifier unique to the peer that last wrote the value (the peer ID).
Ties are broken by comparing the local peer ID to the peer ID in the received state.

Try it out with the playground below.

Did you get a sense for how LWW Registers work? Here are a couple specific scenarios to try:

Turn the network off, make a bunch of updates to bob, and then turn it back on. When you send updates from alice, they’ll be rejected until the timestamp exceeds bob’s timestamp.
Perform the same setup, but once you turn the network back on, send an update from bob to alice. Notice how the timestamps are now synced up and alice can write to bob again!
Increase the latency and send an update from both peers simultaneously. alice will accept bob’s update, but bob will reject alice‘s. Since bob’s peer ID is greater, it breaks the timestamp tie.

Here’s the code for the LWW Register:

class LWWRegister<T> { readonly id: string; state: [peer: string, timestamp: number, value: T]; get value() { return this.state[2]; } constructor(id: string, state: [string, number, T]) { this.id = id; this.state = state; } set(value: T) { this.state = [this.id, this.state[1] + 1, value]; } merge(state: [peer: string, timestamp: number, value: T]) { const [remotePeer, remoteTimestamp] = state; const [localPeer, localTimestamp] = this.state; if (localTimestamp > remoteTimestamp) return; if (localTimestamp === remoteTimestamp && localPeer > remotePeer) return; this.state = state; }
}

Let’s see how this stacks up to the CRDT interface:

state is a tuple of the peer ID that last wrote to the register, the timestamp of the last write and the value stored in the register.
value is simply the last element of the state tuple.
merge is a method that implements the algorithm described above.

LWW Registers have one more method named set, which is called locally to set the register’s value. It also updates the local metadata, recording the local peer ID as the last writer and incrementing the local timestamp by one.

That’s it! Although it appears simple, the humble LWW Register is a powerful building block with which we can create actual applications.

Last Write Wins Map

Most programs involve more than one value,6 which means we’ll need a more complex CRDT than the LWW Register. The one we’ll learn about today is called the Last Write Wins Map (or LWW Map).

[citation needed] ↩

Let’s start by defining a couple types. First, our value type:

type Value<T> = { [key: string]: T;
};

If each individual map value holds type T, then the value of the entire LWW Map is a mapping of string keys to T values.

Here’s our state type:

type State<T> = { [key: string]: LWWRegister<T | null>["state"];
};

Do you see the trick? From our application’s perspective, the LWW Map just holds normal values — but it actually holds LWW Registers. When we look at the full state, each key’s state is the state of the LWW Register at that key.7

If the value type is T, why is the state type a union of T | null? More on that in a bit! ↩

I want to pause on that for a moment, because it’s important. Composition lets us combine primitive CRDTs into more complex ones. When it’s time to merge, all the parent does is pass slices of incoming state to the appropriate child’s merge function. We can nest this process as many times as we want; each complex CRDT passing ever-smaller slices of state down to the next level, until we finally hit a primitive CRDT that performs the actual merge.

From this perspective, the LWW Map merge function is simple: iterate through each key and hand off the incoming state at that key to the corresponding LWW Register to merge. Try it out in the playground below:

It’s kind of difficult to trace what’s happening here, so let’s split up the state for each key. Note, though, that this is just a visualization aid; the full state is still being transmitted as a single unit.

Try increasing the latency and then updating different keys on each peer. You’ll see that each peer accepts the updated value with a higher timestamp, while rejecting the value with a lower timestamp.

The full LWW Map class is kinda beefy, so let’s go through each property one by one. Here’s the start of it:

class LWWMap<T> { readonly id = ""; #data = new Map<string, LWWRegister<T | null>>(); constructor(id: string, state: State<T>) { this.id = id; for (const [key, register] of Object.entries(state)) { this.#data.set(key, new LWWRegister(this.id, register)); } }
}

#data is a private property holding a map of the keys to LWW Register instances. To instantiate a LWW Map with preexisting state, we need to iterate through the state and instantiate each LWW Register.

Remember, CRDTs need three properties: value, state and merge. We’ll look at value first:

 get value() { const value: Value<T> = {}; for (const [key, register] of this.#data.entries()) { if (register.value !== null) value[key] = register.value; } return value; }

It’s a getter that iterates through the keys and gets each register’s value. As far as the rest of the app is concerned, it’s just normal map!

Now let’s look at state:

 get state() { const state: State<T> = {}; for (const [key, register] of this.#data.entries()) { if (register) state[key] = register.state; } return state; }

Similar to value, it’s a getter that builds up a map from each register’s state.

There’s a clear trend here: iterating through the keys in #data and handing things off to the register stored at that key. You’d think merge would work the same way, but it’s a little more involved:

 merge(state: State<T>) { for (const [key, remote] of Object.entries(state)) { const local = this.#data.get(key); if (local) local.merge(remote); else this.#data.set(key, new LWWRegister(this.id, remote)); } }

First, we iterate through the incoming state parameter rather than the local #data. That’s because if the incoming state is missing a key that #data has, we know that we don’t need to touch that key.8

You might be wondering: if another peer deleted a key from their map, shouldn’t we remove it from our local map as well? It’s the same reason the state holds T | null. We’re getting there! ↩

For each key in the incoming state, we get the local register at that key. If we find one, the peer is updating an existing key that we already know about, so we call that register’s merge method with the incoming state at that key. Otherwise, the peer has added a new key to the map, so we instantiate a new LWW Register using the incoming state at that key.

In addition to the CRDT methods, we need to implement methods more commonly found on maps: set, get, delete and has.

Let’s start with set:

 set(key: string, value: T) { const register = this.#data.get(key); if (register) register.set(value); else this.#data.set(key, new LWWRegister(this.id, [this.id, 1, value])); }

Just like in the merge method, we’re either calling the register’s set to update an existing key, or instantiating a new LWW Register to add a new key. The initial state uses the local peer ID, a timestamp of 1 and the value passed to set.

get is even simpler:

 get(key: string) { return this.#data.get(key)?.value ?? undefined; }

Get the register from the local map, and return its value if it has one.

Why coalesce to undefined? Because each register holds T | null. And with the delete method, we’re ready to explain why:

 delete(key: string) { this.#data.get(key)?.set(null); }

Rather than fully removing the key from the map, we set the register value to null. The metadata is kept around so we can disambiguate deletions from states that simply don’t have a key yet. These are called tombstones — the ghosts of CRDTs past.

Consider what would happen if we really did delete the keys from the map, rather than leaving a tombstone. Here’s a playground where peers can add keys, but not delete them. Can you figure out how to get a peer to delete a key?

Turn off the network, add a key to alice’s map, then turn the network back on. Finally, make a change to bob’s map. Since alice sees that the incoming state from bob is missing that key, she removes it from her own state — even though bob never knew about that key in the first place. Whoops!

Here’s a playground with the correct behavior. You can also see what happens when a key is deleted.

Notice how we never remove deleted keys from the map. This is one drawback to CRDTs — we can only ever add information, not remove it. Although from the application’s perspective the key has been fully deleted, the underlying state still records that the key was once there. In technical terms, we say that CRDTs are monotonically increasing data structures.9

There are a couple ways mitigate this drawback, both of which are outside the scope of this article. One is “garbage collection”: pruning tombstones from CRDTs, which prevents you from merging states with any changes made before the tombstones were removed. Another is creating an efficient format to encode the data. You can also combine these methods. Research suggests that this can result in as little as 50% overhead compared to the “plain” data. ↩

The final LWW Map method is has, which returns a boolean indicating whether the map contains a given key.

 has(key: string) { return this.#data.get(key)?.value !== null; }

There’s a special case here: if the map contains a register at the given key, but the register contains null, the map is considered to not contain the key.

For posterity, here’s the full LWW Map code:

class LWWMap<T> { readonly id: string; #data = new Map<string, LWWRegister<T | null>>(); constructor(id: string, state: State<T>) { this.id = id; for (const [key, register] of Object.entries(state)) { this.#data.set(key, new LWWRegister(this.id, register)); } } get value() { const value: Value<T> = {}; for (const [key, register] of this.#data.entries()) { if (register.value !== null) value[key] = register.value; } return value; } get state() { const state: State<T> = {}; for (const [key, register] of this.#data.entries()) { if (register) state[key] = register.state; } return state; } has(key: string) { return this.#data.get(key)?.value !== null; } get(key: string) { return this.#data.get(key)?.value; } set(key: string, value: T) { const register = this.#data.get(key); if (register) register.set(value); else this.#data.set(key, new LWWRegister(this.id, [this.id, 1, value])); } delete(key: string) { this.#data.get(key)?.set(null); } merge(state: State<T>) { for (const [key, remote] of Object.entries(state)) { const local = this.#data.get(key); if (local) local.merge(remote); else this.#data.set(key, new LWWRegister(this.id, remote)); } }
}

Next Steps

We now have two CRDTs in our toolbox: LWW Register and LWW Map. How do we actually use them? We’ll cover that in the next post: Building a Collaborative Pixel Art Editor with CRDTs.