Protocol Buffer Design: Principles and Practices for Collaborative Development

At Lyft Media, we’re obsessed with building flexible and highly reliable native ad products. Since our technical stack encompasses mobile clients on both iOS and Android, as well as multiple backend services, it is crucial to ensure robust and efficient communication between all involved entities. For this task we are leveraging Protocol Buffers, and we would like to share the best practices that are helping us achieve this goal. This article focuses on our experience addressing the challenges that come with collaborating on shared protocols in teams where people with different levels of familiarity and historical context, or even people outside the team, get to contribute. The problem of development process quality is prioritized over raw efficiency optimizations.

Note: This article focuses on the proto3 specification of the Protocol Buffers (protobuf) language. Code snippets illustrating the handling of generated protobuf messages are provided in Python.

Why Protocol Buffers?

In comparison to text-based data serialization formats like JSON, protobufs offer both higher serialization efficiency & performance, and better backwards compatibility.
Our team works with Python, Swift, and Kotlin codebases extensively, and all of these languages have extensive protobuf support.
Protobufs are extensible with rich validation capabilities that help us reach our reliability goals while avoiding writing boilerplate code across platforms.
Protobufs boast rich internal tooling at Lyft, and widespread use in both mobile-to-server and server-to-server domains. For additional context on the decision to adopt protobufs for mobile development at Lyft and the process of achieving this change, refer to this 2020 article by Michael Rebello in the Lyft Engineering Blog.

Protocol design is different from typical coding & implementation work in a couple of crucial aspects. To illustrate the rather abstract arguments presented here, let’s imagine we’re designing a message conveying a certain event, containing typical fields like event kind, id, timestamp, etc. With no prior protocol design experience, one might be tempted to approach it like writing code and use the familiar concept of enum to distinguish between different kinds of events:

message Event {
  enum Kind {
    EVENT_KIND_A = 0;
    EVENT_KIND_B = 1;
  }

  uint64 id = 1;
  uint64 timestamp = 2;
  Kind kind = 3;
}

Indeed, this setup will work well — until new events are added that are required to carry additional data. Let’s say, now a new rich event type appears, and we add a new enum value and the new associated field like so:

message Event {
  enum Kind {
    EVENT_KIND_A = 0;
    EVENT_KIND_B = 1;
    EVENT_KIND_C = 2;
  }

  uint64 id = 1;
  uint64 timestamp = 2;
  Kind kind = 3;
  uint32 payload_size = 4;
}

The problem with this setup is that it is implicit about which combinations of its fields are valid: if kind equals EVENT_KIND_C, should the presence of payload_size be enforced? What if kind is EVENT_KIND_A? Of course, these rules can be enforced in the logic handling these values, but with each new implicit relationship like this one, the code becomes more convoluted and can quickly grow unmaintainable. Avoiding such a pitfall brings us to our first principle: clarity. It’s nice to work with a protocol that is structured so that it explains itself, both at first glance and as it is iterated on over the long term.
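
To make the cost of such implicit relationships concrete, here is a minimal Python sketch of the hand-written checks a consumer might end up with. The Event dataclass is a hypothetical stand-in for the generated message (so the snippet runs without generated code), and the rule that only kind C carries a payload size is an assumption made purely for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class Kind(IntEnum):
    # Mirrors the Event.Kind enum from the definition above.
    EVENT_KIND_A = 0
    EVENT_KIND_B = 1
    EVENT_KIND_C = 2


@dataclass
class Event:
    # Hypothetical stand-in for the generated Event message.
    id: int = 0
    timestamp: int = 0
    kind: Kind = Kind.EVENT_KIND_A
    payload_size: int = 0


def check_event(event: Event) -> List[str]:
    """Collect violations of rules that the schema itself cannot express."""
    problems = []
    # Assumed rule: only kind C carries a payload size.
    if event.kind == Kind.EVENT_KIND_C and event.payload_size == 0:
        # 0 is also the proto3 default, so "missing" and "empty" look alike.
        problems.append("EVENT_KIND_C without payload_size")
    if event.kind != Kind.EVENT_KIND_C and event.payload_size != 0:
        problems.append("payload_size set for a kind that does not use it")
    return problems


print(check_event(Event(kind=Kind.EVENT_KIND_A, payload_size=5)))
```

Every new kind with its own fields adds another branch to a function like this, and the same checks have to be repeated on each platform that consumes the message.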

Clarity relates not only to semantic relationships between fields, as we just illustrated; it also applies to individual fields in their own context. Take for example the payload_size field that was just added: in what unit is this value expressed? One may assume bytes, but one shouldn’t have to, and this makes the difference between a good protocol and a lacking one. Likewise, what are the time unit and the timezone of timestamp? It proves much more practical to name these fields with these considerations in mind, e.g. payload_size_bytes and timestamp_ms_utc.

One other common point of ambiguity is whether a field is required or optional. The proto3 standard has dropped the ability to mark fields as required for backward compatibility reasons, effectively allowing any field to be left unset and interpreting such cases as carrying the default value for its type. Some practices covered later in this article help make your protocols more explicit and more appreciated by the engineers who will be using them.

To zoom out again, a more correct way to go about structuring this message is by using oneof, the protobuf version of the familiar concept of a union. A protobuf oneof unites a set of fields to ensure that no more than one of them is set at the same time, and also to improve data transfer efficiency by saving on the payload size (since unset oneof fields do not get serialized):

message Event {
  uint64 id = 1;
  uint64 timestamp_ms_utc = 2;

  oneof data_kind {
    EventDataA data_a = 3;
    EventDataB data_b = 4;
    EventDataC data_c = 5;
  }
}

message EventDataA {}

message EventDataB {}

message EventDataC {
  uint32 payload_size_bytes = 1;
}

Now the EventData messages serve both to distinguish between event kinds, and to contain the respective fields specific to a certain kind but not others. Even if some of them are going to share a common subset of fields, the duplication of declaring them individually is worth it compared to the ambiguity in the other approach. Unlike with enums, the oneof approach is self-documenting, and it doesn’t leave much room for error interpreting how the message should be formed.
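
On the consuming side, the oneof turns into a single dispatch on which member is set. The sketch below assumes Python generated code, where WhichOneof returns the name of the set member or None. The FakeEvent class is a hypothetical stand-in that mimics only that method so the snippet runs on its own, and describe_event is an illustrative helper rather than part of any API.

```python
from types import SimpleNamespace


def describe_event(event_pb) -> str:
    """Dispatch on the member of the data_kind oneof that is set."""
    kind = event_pb.WhichOneof("data_kind")
    if kind == "data_a":
        return "event A"
    if kind == "data_b":
        return "event B"
    if kind == "data_c":
        # Only EventDataC carries a payload size.
        return f"event C ({event_pb.data_c.payload_size_bytes} bytes)"
    # None means no member of the oneof was set.
    return "no event data"


class FakeEvent:
    """Hypothetical stand-in for the generated Event message (demo only)."""

    def __init__(self, **members):
        self._members = members
        for name, value in members.items():
            setattr(self, name, value)

    def WhichOneof(self, oneof_name):
        # A real message guarantees that at most one member is set.
        return next(iter(self._members), None)


event = FakeEvent(data_c=SimpleNamespace(payload_size_bytes=512))
print(describe_event(event))
```

There is no separate kind field to keep in sync with the data: the member that is set is the kind.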

Note, however, that the structure of our message had to change in a major way to allow this upgrade, which is another key thing to keep in mind; let’s call this the principle of extensibility. This doesn’t only apply to avoiding convoluted field semantics. Let’s say at some point the system is migrated from using integer IDs to UUIDs. This becomes a problem since the id field is already locked in as uint64, and we are forced to deprecate it and declare a new one, whereas having it as a string from the beginning would allow a smooth transition to virtually any ID scheme. While it’s impossible to predict and stay safe from all potential breaking changes, there are a few common pitfalls in protobuf, which often revolve around changing the type of a field and rearranging oneof groupings.

To recap, the key principles of protocol design that we just outlined are:

Clarity: A well-designed protocol should define its messages in a way that is explicit about which fields must be set and which combinations of them are valid. This prevents messages from being formed incorrectly during implementation. In other words, good protocols leave no ambiguity for their implementers.
Extensibility: It is crucial that protocol structure is built with future vision and potential roadmap in mind. This way, some foreseeable additions and breaking changes can be accounted for in advance.

These ideas are quite applicable to classic software development. However, protocol design features greater constraints in comparison and therefore puts greater emphasis on the above principles.

Best Practices

Besides the broad principles, let’s go over some practices that help avoid typical pitfalls in protocol design.

Unknown enum values

It’s always a good idea to declare the first element of an enum, the one with value 0, as “unknown” to ensure backward compatibility. When a field of an enum type without such a value is added to a message definition, messages produced by earlier implementations that predate the addition will not carry the field, and newer implementations will read it as 0, which is indistinguishable from the first declared value being set deliberately. To use the Event.Kind example from earlier:

enum Kind {
  EVENT_KIND_A = 0;
  EVENT_KIND_B = 1;
}

The above definition should become:

enum Kind {
  EVENT_KIND_UNKNOWN = 0;
  EVENT_KIND_A = 1;
  EVENT_KIND_B = 2;
}

This way, coming back to the clarity principle, it’s unambiguous and implementation-agnostic when the enum value is set.
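
The effect is easy to see by decoding the default value 0 under both definitions. This sketch uses Python’s IntEnum as a stand-in for the generated enums (the class names KindWithoutUnknown and KindWithUnknown are made up for the comparison); with generated code, the same lookup would go through the enum’s Name method.

```python
from enum import IntEnum


class KindWithoutUnknown(IntEnum):
    EVENT_KIND_A = 0
    EVENT_KIND_B = 1


class KindWithUnknown(IntEnum):
    EVENT_KIND_UNKNOWN = 0
    EVENT_KIND_A = 1
    EVENT_KIND_B = 2


# A message from an older producer never sets the field, so a newer
# consumer reads the proto3 default for enums: 0.
UNSET = 0

print(KindWithoutUnknown(UNSET).name)  # reads like a deliberate EVENT_KIND_A
print(KindWithUnknown(UNSET).name)     # clearly signals that nothing was set
```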

Well-known types

Once your team has used protobufs for some time, you might notice that some field types commonly pop up across your protocol surface. There are some that are commonplace enough that the Protocol Buffers development team made them part of the language itself, such as Duration and Timestamp among other, more specific ones. Indeed, going back to the event message example, the uint64 timestamp field can — and should — be replaced with a google.protobuf.Timestamp, fitting right in line with our clarity principle. Some might not be available out of the box, and it’ll be at your team’s discretion to add and standardize your usage of them, for example a reusable LatLng type for geospatial coordinates.

The full list of default well-known protobuf types is available here in the official documentation.
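
For teams migrating an existing millisecond field, it helps to know that google.protobuf.Timestamp stores a UTC instant as two integers: seconds since the Unix epoch and nanos within that second. The helper below is a standard-library sketch of that conversion; ms_to_timestamp_parts is a hypothetical name, and in real code the generated Timestamp class offers its own helpers such as FromMilliseconds.

```python
from datetime import datetime, timezone
from typing import Tuple


def ms_to_timestamp_parts(timestamp_ms_utc: int) -> Tuple[int, int]:
    """Split epoch milliseconds into the (seconds, nanos) pair used by Timestamp."""
    # divmod floors, so nanos stays non-negative even before the epoch.
    seconds, millis = divmod(timestamp_ms_utc, 1000)
    return seconds, millis * 1_000_000


seconds, nanos = ms_to_timestamp_parts(1_700_000_000_123)
print(seconds, nanos)
# The same instant as a timezone-aware datetime, for comparison.
print(datetime.fromtimestamp(seconds, tz=timezone.utc).isoformat())
```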

Explicit optional fields

A bit of historical context: in the proto2 protobuf standard, both required and optional fields could be marked with a namesake label. The required label was enforced strictly when messages were parsed, which proved hugely problematic in the long run because it was nearly impossible to safely change a required field to be optional. In the proto3 standard, the required label was dropped entirely, and starting with protobuf 3.15 (2021), the optional label was added. The distinction shifted to fields being explicitly optional vs. implicitly required (having no explicit label). The value of marking optional fields is in the ability to check them for presence in a serialized message. Let’s say that, in the event message example above, we need to distinguish between the payload size value being 0, and being absent. With its current state:

message EventDataC {
  uint32 payload_size_bytes = 1;
}

Querying from a formed message like this:

if event_pb.WhichOneof('data_kind') == 'data_c':
    if not event_pb.data_c.payload_size_bytes:
        handle_payload_size_absent()

Is error-prone, and can be quite misleading — since primitive types get initialized to a default value, it’s impossible to tell whether a field was absent or equal to default value. However, with an optional label like so:

message EventDataC {
  optional uint32 payload_size_bytes = 1;
}

The .HasField method can then be used on the EventDataC instance:

if event_pb.WhichOneof('data_kind') == 'data_c':
    if not event_pb.data_c.HasField('payload_size_bytes'):
        handle_payload_size_absent()
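
The difference comes down to what ends up on the wire. The sketch below hand-encodes field 1 of EventDataC with the standard library only, to show why a zero value is invisible without the optional label. The function names are hypothetical, and real code would rely on the generated SerializeToString method rather than on this simplified encoder.

```python
from typing import Optional


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_payload_size(value: Optional[int], explicit_presence: bool) -> bytes:
    """Serialize `uint32 payload_size_bytes = 1` the way proto3 does."""
    if explicit_presence:
        skip = value is None  # optional: skipped only when unset
    else:
        skip = not value      # no label: a zero value is never written
    if skip:
        return b""
    tag = (1 << 3) | 0        # field number 1, wire type 0 (varint)
    return encode_varint(tag) + encode_varint(value)


print(encode_payload_size(0, explicit_presence=False))    # b''
print(encode_payload_size(0, explicit_presence=True))     # b'\x08\x00'
print(encode_payload_size(None, explicit_presence=True))  # b''
```

Without the label, a zero and an unset field both serialize to nothing, so the receiver has no way to tell them apart. With the label, a zero that was set explicitly is written out, which is what HasField relies on.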

Note: Since protobufs were adopted at Lyft prior to the introduction of optionals to the language specification, our convention for optional primitive types is to use wrappers from the google.protobuf package.

Validation rules

When considering principles of protocol design, we are big fans of explicitly stating a field’s constraints within its message’s broader context. For this we’re taking advantage of the protoc-gen-validate plugin (PGV).

Note: PGV has recently reached a stable state and has been succeeded by protovalidate. While the general idea remains the same, consider using the modernized solution when getting started with validation.

A list of useful validation rules for common types is provided in the bulleted section below. Please note that some level of familiarity with protobufs is assumed.

oneof validation: By default, and somewhat counterintuitively, none of the fields declared under a oneof has to be set. A neat validation rule exists to enforce that one of the fields is present in a formed message: option (validate.required) = true; which needs to be declared inside the oneof, alongside its members.
Generic message validation: (validate.rules).message = { … }
· required with a boolean value is self-explanatory and extremely useful.
enum validation: (validate.rules).enum = { … }
· Prior to proto3, enums were treated as “closed” — meaning that fields of their type could only store the defined values. This produced undefined behaviors, and “open” enum behavior was introduced with proto3, making it valid for fields to be set to values other than the ones listed in the enum definition. defined_only is useful for enforcing that an enum field is effectively “closed” and will only carry expected values.
· in allows you to specify the collection of acceptable values for the given field.
· not_in is also extremely handy. The obvious example is to set it to [0] — enforcing cases when the unknown value is not acceptable.
string validation: (validate.rules).string = { … }
· min_len with value 1 is great for enforcing a non-empty value to be set for the field.
· Well-known string formats are handily available for validation, including email, ip (and ipv4 and ipv6), uri, uuid, among other ones.
· pattern allows you to define a bespoke regex to fit your validation needs.
repeated validation: (validate.rules).repeated = { … }
· min_len with value 1 is great for enforcing that the collection is not empty.
· items allows individual values to be validated against their given type, e.g. items: {enum: {not_in: [0]}}
· unique set to true is useful for validating set-like collections.
map validation: (validate.rules).map = { … }
· min_pairs and keys & values work exactly like min_len and items for repeated fields, respectively.
· no_sparse is good for validating that, for maps with non-primitive value type, values must be set.

Pro tip: Validation also works on the wrapper types with the same rules as for their respective wrapped types, e.g. a google.protobuf.StringValue field can be validated with (validate.rules).string = { … }.

An exhaustive definition of all validation rules (declared in protobuf syntax themselves!) is available in the validate.proto source.

Note: It is important to understand that the generated validation methods still need to be called manually — if a message is formed in violation of the stated rules, nothing will fail until its validator is invoked! A snippet for validating a formed protobuf message will look like this:

import protoc_gen_validate.validator

from your_protobuf_namespace_path.event_pb2 import Event as EventPB

event_pb = EventPB(...)

try:
    protoc_gen_validate.validator.validate(event_pb)
except protoc_gen_validate.validator.ValidationFailed as ex:
    raise ValueError(f'Protobuf validation error: {ex}')

Cross-entity constants

In some cases, various code points across different services or even domains (e.g. client app and server) may need to refer to the same constants. Protobuf definitions can lend great help in aligning these constants across all entities. Although it’s not an explicit feature of the language, this effect can be achieved using custom options:

import "google/protobuf/descriptor.proto";

extend google.protobuf.EnumValueOptions {
  string const_value = 11117;
}

enum EventTag {
  EVENT_TAG_UNKNOWN = 0 [(const_value) = ""];
  EVENT_TAG_1 = 1 [(const_value) = "#tag1"];
  EVENT_TAG_2 = 2 [(const_value) = "#tag2"];
}

Then the values can be accessed through the enum descriptor:

from your_protobuf_namespace_path import event_pb2

tag_name = event_pb2.EventTag.Name(event_pb2.EVENT_TAG_1)
tag_descriptor = event_pb2.EventTag.DESCRIPTOR.values_by_name[tag_name]
tag_options = tag_descriptor.GetOptions()
tag_value = tag_options.Extensions[event_pb2.const_value]

Or, compacted:

tag_value = (
    event_pb2.EventTag.DESCRIPTOR
    .values_by_name[event_pb2.EventTag.Name(event_pb2.EVENT_TAG_1)]
    .GetOptions()
    .Extensions[event_pb2.const_value]
)

Note: It’s recommended to exercise caution when using this technique. It is most suitable for cases where the constant values are never expected to change, or where you have complete control over deployment of entities that will be consuming the protocol.

Language-dependent behaviors

The “Getting started” section in the official documentation is a good entry point to language-specific protobuf work, covering the basic setup as well as more nuanced details like exact type mapping, ways of parsing messages, properties of the entities generated from the protocol definition, etc. This is important because certain behaviors differ across languages, from namespace structuring and naming to implementation details (e.g. a key in a map field that has no value being serialized with the default value in some languages and omitted in others). Knowing your target language stack, you can always find the right steps to ensure correct behavior.

Conclusion

In this article, we’ve explored the intricacies of working with Protocol Buffers from a collaboration standpoint. In the end, our protocol might end up looking like this:

syntax = "proto3";

import "google/protobuf/descriptor.proto";
import "google/protobuf/timestamp.proto";
import "validate/validate.proto";

extend google.protobuf.EnumValueOptions {
  string const_value = 11117;
}

enum EventTag {
  EVENT_TAG_UNKNOWN = 0 [(const_value) = ""];
  EVENT_TAG_1 = 1 [(const_value) = "#tag1"];
  EVENT_TAG_2 = 2 [(const_value) = "#tag2"];
}

message Event {
  string id = 1 [(validate.rules).string = {min_len: 1}];
  google.protobuf.Timestamp timestamp_utc = 2 [(validate.rules).timestamp = {required: true}];

  oneof data_kind {
    option (validate.required) = true;

    EventDataA data_a = 3;
    EventDataB data_b = 4;
    EventDataC data_c = 5;
  }
}

message EventDataA {}

message EventDataB {}

message EventDataC {
  optional uint32 payload_size_bytes = 1;
}

To recap the key takeaways:

Clarity and Extensibility: We’ve emphasized the importance of designing protocols that are self-explanatory and flexible enough to accommodate future changes. This approach minimizes ambiguity for implementers and reduces the likelihood of breaking changes.
Best Practices: We’ve covered several useful practices, including:
· Using unknown default enum values
· Leveraging standard well-known types
· Setting optional fields intentionally and explicitly
· Implementing validation rules
· Declaring cross-entity constants when appropriate

There are many other useful practices not mentioned in this article that may or may not apply to your team, depending on the use case and language stack. For an extensive list, refer to Proto Best Practices and API Best Practices from the official documentation.

And one more thing: Lyft is hiring! If you’re passionate about developing complex systems using state-of-the-art technologies or building the infrastructure that powers them, consider joining our team.