The evolution of event data collection at Vimeo, part 1: The Fatal Attraction era

Our original event tracker racks up a mixed record.

Traci Mathieu
Vimeo Engineering Blog

--

Here at Vimeo, we place a premium on data. Our data engineering organization uses event tracking to collect billions of data points about how our users interact with our products. This information informs the decisions that we make about how to improve our user experience.

Our original internal event tracking system, which is no longer in use but which I want to talk about anyway, went by the name of Fatal Attraction. If that sounds familiar, then you’ve probably seen the film where Glenn Close’s character takes a turn for the crazy after an extramarital affair with Michael Douglas’s character. Our own (data) love affair started off a little differently when the very first Fatal Attraction event was sent in January 2015.

At the center of this legacy system was a JavaScript client that we used to embed event tracking code into vimeo.com. The client sent event data via HTTP POST requests to an HTTP server that served as our internal logging framework. From there, the data landed in a Kafka topic, was cleaned and processed with Airflow, and was stored in Snowflake, our data warehouse of choice. The data was then used for analysis in Snowflake and Looker.
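
To make the first hop concrete, here is a minimal sketch of what a client-to-collector POST might have looked like. The endpoint path and payload shape are hypothetical placeholders for illustration, not the actual internal logging API.

// Conceptual sketch only: the '/internal/log' endpoint and the payload shape
// are invented placeholders, not the real internal logging framework's API.
function sendFAEvent(payload) {
  var xhr = new XMLHttpRequest();
  xhr.open('POST', '/internal/log', true); // fire-and-forget POST to the collector
  xhr.setRequestHeader('Content-Type', 'application/json');
  xhr.send(JSON.stringify(payload));
}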

Let’s roll back the clock about four years, take a look at Fatal Attraction, and explore why we eventually retired it.

About the FA specification

Fatal Attraction, or FA to us, was designed to collect data about page interactions like clicks, page views, and impressions. We had a specification for FA events where the payload consisted of a subset of the following event tracking elements:

  • Name. This was the name of the event.
  • Component. This was the component or page associated with the event.
  • Container. This was the container or UI element associated with the event.
  • Copy. This was the label text on the container.
  • Keyword. This optional value supplied additional information about the event not covered by the other event tracking elements.
  • Target. This was the destination page after the interaction.
  • Type. This was the interaction type, with values like pageview or click.

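To make the spec concrete, here is a hedged example of what a single FA event payload might have contained. The values are invented for illustration, and the exact wire format isn’t shown; this simply maps the tracking elements above onto an object.

// Illustrative only: field values are invented; this mirrors the FA spec's
// event tracking elements rather than the actual wire format.
var exampleEvent = {
  name: 'upgrade_cta_click',   // Name: the name of the event
  component: 'homepage',       // Component: the component or page involved
  container: 'topnav',         // Container: the UI element involved
  copy: 'Upgrade now',         // Copy: the label text on the container
  keyword: 'pro_plan',         // Keyword: optional additional context
  target: '/upgrade',          // Target: destination page after the interaction
  type: 'click',               // Type: interaction type, e.g. pageview or click
};
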
FA events were structured in the sense that this spec existed. However, there was no validation whatsoever on the values passed in for each attribute. Stakeholders were free to supply whatever values they wanted. The semi-structured nature of events was one of the key defining characteristics of FA, and this came with major pros and cons, which I’ll get into a little later.

About the JavaScript client

In the world of FA, there were two different ways to implement new events with the JavaScript client. The first was an attribute-based implementation methodology. The second, which we built in response to the limitations of the first, was a programmatic implementation methodology for FA events.

The attribute-based implementation methodology took advantage of HTML attributes to send FA events. A global tracking utility listened for clicks on the document; when it detected a click, it fired a tracking pixel based on the event target’s attributes.

An example of the use of HTML attributes to implement an FA event is as follows:

<a data-fatal-attraction="container:homepage|component:topnav|keyword:new_upload">
  Home
</a>
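
For context, a minimal sketch of the delegated listener pattern described above might look like the following. The parsing rules and the fireTrackingPixel helper are assumptions for illustration, not Vimeo’s actual tracking utility.

// Minimal sketch of a delegating click handler; fireTrackingPixel and the
// exact attribute-parsing rules are hypothetical stand-ins for the real utility.
document.addEventListener('click', function (event) {
  var el = event.target.closest('[data-fatal-attraction]');
  if (!el) {
    return;
  }

  // Parse the "key:value|key:value" attribute string into a payload object.
  var payload = {};
  el.getAttribute('data-fatal-attraction').split('|').forEach(function (pair) {
    var parts = pair.split(':');
    payload[parts[0]] = parts[1];
  });

  fireTrackingPixel(payload); // hypothetical helper that issues the pixel request
});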

The main advantages of this implementation methodology were that it was efficient, straightforward, and easy for developers to implement.

One major limitation was that it was difficult to send dynamic values in FA event payloads. Another was that it relied on a central delegating event handler on the document node and assumed that all clicks would bubble up through the Document Object Model (DOM). However, due to some of the tools we used and the nature of some of the code we wrote, we couldn’t always rely on events bubbling up. Because of these limitations, we created a second, more programmatic method of event tracking, which used JavaScript function calls to collect the different types of event data that could be generated by user actions:

  • FatalAttraction.trackPageview
  • FatalAttraction.trackClick
  • FatalAttraction.trackImpression
  • FatalAttraction.trackEvent

See Figure 1 for an example of the use of a JavaScript function call to implement an FA event.

Figure 1. This is an example of the implementation of a click event with the FatalAttraction.trackClick function call. Arguments are passed in for the parameters container, component, and keyword.
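
Since the figure shows a function call, here is a hedged approximation of the shape of such a call. The exact signature of FatalAttraction.trackClick isn’t documented here, so the object-style argument and values are assumptions that mirror the parameters named in the caption.

// Approximation only: the real signature of FatalAttraction.trackClick may have
// differed; the argument names follow the parameters mentioned in the caption.
FatalAttraction.trackClick({
  container: 'homepage',
  component: 'topnav',
  keyword: 'new_upload',
});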

Each time stakeholders wanted to send a new FA event, they would need to implement it with an explicit JavaScript function call.

This implementation methodology was more verbose than the attribute-based implementation, which led to code bloat. As more product teams started using FA, the size of the codebase increased quickly and significantly.

Weighing the pros and cons

One of the main advantages of FA was the ease of event implementation, which product managers appreciated whenever they wanted to collect data for a new product or feature release. And product teams loved FA because it made collecting data about page interactions a breeze.

However, the lack of validation and the limitations of the FA spec posed considerable challenges downstream. Our business intelligence and marketing organizations valued (and still value) having a single table with all the events and columns that they need for analysis. While all of the FA data was stored in a single table, the data was difficult to understand and not ready for analysis as is. Since the event payload fields were limited to those specified by the FA spec, such as component, container, and keyword, the data had to be parsed and processed before the required information could be extracted and used.

In an attempt to solve this problem, we created several extract, transform, and load operations, or ETLs, to clean up the FA event tracking data. As you can probably imagine, this was a recipe for disaster in the form of spaghetti code. The ETLs consisted of hundreds — sometimes thousands — of lines of repetitive SQL and many case statements. One ETL, for example, had a whopping 3,570 lines of SQL statements. Figure 2 shows a small snippet of code from this ETL.

Figure 2. This is an example of one of the many complex SQL statements from an ETL for processing FA event tracking data.
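
To give a flavor of what these ETLs looked like, here is a hedged, heavily simplified SQL fragment in the spirit of Figure 2. The table name, column names, and mappings are invented for illustration; the real ETLs chained far more branches like these.

-- Simplified illustration only: names and mappings are invented. The real ETLs
-- stacked many more CASE branches to derive analysis-ready columns from the raw
-- component/container/keyword values.
SELECT
  event_time,
  CASE
    WHEN component = 'homepage' AND container = 'topnav' AND keyword = 'new_upload'
      THEN 'upload_started'
    WHEN component = 'settings' AND container = 'billing' AND copy = 'Upgrade now'
      THEN 'upgrade_cta_click'
    ELSE 'unmapped'
  END AS derived_event_name
FROM fatal_attraction_events;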

On top of that, the ETLs were often slow to run. Even if we optimized them for speed, they would still be complex and difficult to maintain due to the nature of the data. These were some of the primary reasons we decided to work towards deprecating Fatal Attraction in favor of a new event tracking system.

Another challenge we faced with FA was the ownership of upstream regressions. The flexibility of FA allowed for the fast implementation of new events. However, when a new event was pushing bad data into the pipeline due to incorrect implementation, it was difficult to figure out which team to contact to begin resolving the issue.

But what finally convinced us to move away from FA was that it was a web-only system, with its single JavaScript client for event collection. We weren’t able to use it to collect event data that would help us improve user experience for our products on non-web platforms, such as mobile. (For that, we relied on Localytics, if you’re curious.)

Based on our experiences with FA, we knew that we wanted our new event tracking system to decrease the need for heavy processing of event tracking data downstream and make it easier to provide clean, quality data from multiple platforms to our product managers, data analysts, and data scientists. After much consideration, we felt that an event tracking system centered on structured events would make our lives, and those of our stakeholders, much easier.

Based on the lessons we learned from our journey with FA, we established the following requirements for whatever came next:

  • The ability for stakeholders to define custom fields for events beyond those given in the FA spec.
  • Event validation, such as data type validation.
  • Support for multiple platforms, including web and mobile.

To buy or to build?

Before we committed to the immense task of retiring FA, we decided to explore our options. Given that we collect tens of billions of events every month, it was incredibly important that whatever solution we selected could scale cost-effectively.

After seeking out quotes from several companies, we determined that an in-house solution was the right one for us. That way, we could handle a large volume of events without incurring excessive bandwidth costs, and we’d enjoy the added benefit of having an event tracking system uniquely built to address our specific needs.

Which brings us to Big Picture

When we conceived the solution that would become Big Picture, we envisioned a world where event fields are explicitly defined and regulated at the source. This would eliminate the need for maintaining and scaling complex ETLs that prepare data for analysis. Instead, our stakeholders would be able to spend more time doing analysis because of the inherent cleanliness and overall integrity of the data.

We also hoped that Big Picture would bring unity to our data org. We wanted a platform-agnostic event collection solution that would be flexible enough to send data to multiple third-party outputs like Amplitude, Snowflake, BigQuery, and Anodot.

Stay tuned to find out whether our brainchild lived up to all our expectations!

Care to join us?

Check out Jobs at Vimeo for our current offerings.
