Write tests smarter, not harder

Published in Booking.com Engineering · 6 min read · Nov 16, 2022

In my career, I’ve seen many teams get started with automated testing. Not all attempts were successful. In this post, I’m going to share a few tips on creating a culture of automated testing in your team, and shaping the journey from zero tests to a reliable set of tests at different levels.

A common way teams approach automated testing is to set a target, something like: “In this quarter, we will increase test coverage to X percent”.

I think this is a suboptimal way to achieve higher quality. The ultimate goal is not about having a percentage of lines covered with tests. The goal is to have a fast feedback loop for new changes made to the code, for the whole lifespan of that code base.

I’ll repeat this, since I’ll be referring to this throughout the post:

“The goal is to have a fast feedback loop to validate new changes made to the code, for the whole lifespan of the code base”

Let’s break this down with an analogy. The goal shouldn’t be to exercise in this quarter for the next beach season, the goal should be to stay in good shape for the rest of one’s life. It’s not about vigorously brushing your teeth for one day once a month, it’s about having a consistent practice of mouth hygiene.

Similarly, I see automated testing as a set of habits the team follows. And an important aspect of sustaining new habits is to identify the benefits of the new behaviour sooner.

For the rest of the article, I’ll share my learnings on how to develop these habits faster.

The very first thing you should do is…

Understand the expected lifespan of your code base.

What is the expected lifespan of the codebase your team is currently working on? How long do you expect the code to stay in production? These questions might seem unrelated to the discussion about testing, but they matter a lot, because the lifespan of the codebase defines where you should expect changes to happen. Based on my experience, it’s quite reasonable to assume that a business application will live for 5–7 years. Sometimes, it’s even more than 10 years. Now think about what may change in 10 years:

Hardware the code runs on
Operating system
All technologies your software depends on will get major updates
All libraries and frameworks will be updated
Language version
Tooling
Build process
Deployment process

This list is by no means complete, and it doesn’t even cover any changes made to the code itself due to the new requirements.

You might have seen how popular open-source projects take these changes into account: they run their tests on different OS versions and different hardware as part of their CI pipeline. That is totally reasonable for their lifespan. Another example could be writing tests for technologies your code heavily relies on. Not because you want to test someone else’s code, but because you want to have a safety net when you update your Kafka/Postgres/whatever to the next major version.
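A minimal sketch of such a safety net, under loud assumptions: a real Kafka or Postgres check needs a running service, so Go's standard `encoding/json` package stands in for the dependency here. The `Event` type and `checkJSONContract` function are hypothetical names made up for this illustration. The idea is to pin down the specific behaviours your code relies on, so that an upgrade that changes them fails fast.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Event is a hypothetical payload that our code stores and reads back.
type Event struct {
	ID   int    `json:"id"`
	Name string `json:"name,omitempty"`
}

// checkJSONContract verifies two dependency behaviours we rely on:
// omitempty drops empty fields, and unknown fields are ignored on decode.
func checkJSONContract() error {
	out, err := json.Marshal(Event{ID: 7})
	if err != nil {
		return fmt.Errorf("marshal: %w", err)
	}
	if string(out) != `{"id":7}` {
		return fmt.Errorf("unexpected encoding: %s", out)
	}

	var e Event
	if err := json.Unmarshal([]byte(`{"id":7,"extra":true}`), &e); err != nil {
		return fmt.Errorf("unmarshal: %w", err)
	}
	if e.ID != 7 {
		return fmt.Errorf("unexpected ID: %d", e.ID)
	}
	return nil
}

func main() {
	if err := checkJSONContract(); err != nil {
		fmt.Println("dependency contract broken:", err)
		return
	}
	fmt.Println("dependency contract holds")
}
```

In a real project the same checks would live in a test that runs against the actual dependency, so the next major-version bump is validated by CI rather than by production.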

The intent here is to make the team think about what they actually should be testing. Now, let’s get closer to the actual testing. Imagine you already have a codebase, and it has zero tests. Where would you start writing tests?

Identify hotspots that change often.

Transitioning from zero tests to a well-tested code base takes a lot of effort from the team. Therefore, it is important to get the benefits of automated tests as soon as possible. I’ve seen a recurring pattern: a team sets a target to reach 70% line coverage. After a while, once they reach that target, the team still doesn’t see any value from the tests they wrote: the post-launch defect rate is still high, and the rollout success rate remains the same. Eventually, the team develops resentment against automated testing: “we’ve tried this, it doesn’t work for our team”.

I understand why some folks give up. They had to carve out time for writing tests, negotiate technical debt with the product owner, and invest a considerable amount of effort. Yet, they didn’t get much from this investment.

An alternative approach is to identify where changes happen more often and start writing tests to cover these hotspots. Remember, the ultimate goal of automated tests is to provide a fast feedback loop for new changes. So, it’s worth thinking about where changes happen. Luckily, there is a tool for that: `git effort` from the `git-extras` package. Here is how it works:

~ git effort **/*.go --above 10

 path                                      commits active days

  assert/assertions.go........................ 223      163
  assert/assertions_test.go................... 145      108
  mock/mock.go................................ 106      81
  mock/mock_test.go........................... 62       54
  suite/suite.go.............................. 46       37
  require/require.go.......................... 45       39
  assert/assertion_forward.go................. 44       38
  require/require_forward.go.................. 43       37
  assert/assertion_format.go.................. 36       31
  suite/suite_test.go......................... 34       26
  assert/doc.go............................... 19       17
  assert/http_assertions.go................... 18       15
  assert/forward_assertions_test.go........... 17       17
  require/requirements.go..................... 16       15
  assert/forward_assertions.go................ 16       16
  assert/http_assertions_test.go.............. 15       9
  assert/assertion_order.go................... 11       10
  _codegen/main.go............................ 11       10

That’s an example from an open-source project. It shows how many commits each file received, and the number of days of active development for each file. From the example above, you can see that change frequency is not equally distributed. Some files change almost every day, while others change a few times per year.

And this is one of the reasons why code coverage is a misleading metric: it ignores the change frequency. You might achieve high test coverage, but it will give you nothing if hotspots remain uncovered.
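To make the gap concrete, here is a small arithmetic sketch. The "churn-weighted coverage" below is not an established metric, just an illustration I made up for this comparison, and the file names and numbers are hypothetical.

```go
package main

import "fmt"

// FileStats holds hypothetical per-file numbers: size, covered lines, churn.
type FileStats struct {
	Path         string
	Lines        int
	CoveredLines int
	Commits      int
}

// LineCoverage is the classic metric: covered lines divided by total lines.
func LineCoverage(files []FileStats) float64 {
	var covered, total int
	for _, f := range files {
		covered += f.CoveredLines
		total += f.Lines
	}
	if total == 0 {
		return 0
	}
	return float64(covered) / float64(total)
}

// ChurnWeightedCoverage weights each file's coverage by its commit count,
// so frequently changed files dominate the result.
func ChurnWeightedCoverage(files []FileStats) float64 {
	var weighted float64
	var commits int
	for _, f := range files {
		if f.Lines == 0 {
			continue
		}
		weighted += float64(f.Commits) * float64(f.CoveredLines) / float64(f.Lines)
		commits += f.Commits
	}
	if commits == 0 {
		return 0
	}
	return weighted / float64(commits)
}

func main() {
	files := []FileStats{
		{Path: "stable.go", Lines: 900, CoveredLines: 900, Commits: 2},
		{Path: "hotspot.go", Lines: 100, CoveredLines: 0, Commits: 198},
	}
	fmt.Printf("line coverage:           %.0f%%\n", 100*LineCoverage(files))
	fmt.Printf("churn-weighted coverage: %.0f%%\n", 100*ChurnWeightedCoverage(files))
}
```

With these made-up numbers, line coverage comes out at 90% while the churn-weighted figure is 1%: almost every change lands in code that has no tests at all.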

So, make sure you first write tests for the code that changes often. This way, your team will notice the benefits of automated tests sooner.
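If `git-extras` is not available, a rough approximation of the per-file commit count can be sketched in a few lines. This is not how `git effort` is implemented; it simply tallies paths from the output of `git log --name-only --pretty=format:`, and it does not compute active days. `FileChurn` and `CountCommits` are names I made up for this sketch.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// FileChurn pairs a file path with the number of commits that touched it.
type FileChurn struct {
	Path    string
	Commits int
}

// CountCommits tallies how often each path appears in the output of
// `git log --name-only --pretty=format:` (one path per line, blank lines
// between commits) and returns the files sorted by commit count, descending.
func CountCommits(gitLog string) []FileChurn {
	counts := map[string]int{}
	for _, line := range strings.Split(gitLog, "\n") {
		path := strings.TrimSpace(line)
		if path == "" {
			continue
		}
		counts[path]++
	}

	result := make([]FileChurn, 0, len(counts))
	for p, c := range counts {
		result = append(result, FileChurn{Path: p, Commits: c})
	}
	sort.Slice(result, func(i, j int) bool {
		if result[i].Commits != result[j].Commits {
			return result[i].Commits > result[j].Commits
		}
		return result[i].Path < result[j].Path // stable order for ties
	})
	return result
}

func main() {
	// Hypothetical sample input; in practice, feed in real git output.
	sample := "assert/assertions.go\nmock/mock.go\n\nassert/assertions.go\n\nassert/assertions.go\nsuite/suite.go\n"
	for _, f := range CountCommits(sample) {
		fmt.Printf("%-24s %d\n", f.Path, f.Commits)
	}
}
```

The files at the top of the resulting list are the first candidates for new tests.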

This tactic helps you choose where to start — but there is more work that needs to be done to sustain new habits. So, let’s talk about how to make sure your tests actually help you to validate incoming changes.

Do not limit your tests only to a happy path.

While accompanying several teams on the journey from no tests to good test automation, I noticed that in the beginning, folks think about tests as a binary thing: “we have tests for X” or “we don’t have tests for X”.

Later on, folks start asking themselves how good their tests are. Again, I think the binary view is a bad influence of the test coverage metric, which only captures whether a given line of code is executed by some test case. That is not the same as covering the possible variations that code execution can take due to logic branching or input parameters.
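Here is a small sketch of how branching hides behind line coverage. `Price` is a hypothetical function invented for this illustration.

```go
package main

import "fmt"

// Price returns the price in cents after an optional 10% member discount.
// Hypothetical function, used only to illustrate the point.
func Price(cents int, member bool) int {
	discount := 0
	if member {
		discount = cents / 10
	}
	return cents - discount
}

func main() {
	// This single call executes every line of Price,
	// so a line coverage report would show 100%.
	fmt.Println(Price(1000, true))

	// The non-member path runs no additional lines, yet it is a distinct
	// behaviour that the call above never checks.
	fmt.Println(Price(1000, false))
}
```

A test suite containing only the first call looks complete by the coverage number, while the behaviour for non-members is never verified.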

I’ll use the term “test completeness” to describe how well all possible paths of execution are covered with tests. To show the difference between code coverage and test completeness, let’s consider the following example:

I have a simple function that adds two numbers:

func Sum(a, b int) int {
    return a + b
}

To achieve 100% code coverage for this function, it’s enough to write a test that calls it with any parameters, for example `assert.Equal(t, 3, Sum(1, 2))`. 100% of lines covered! But does it cover all possible scenarios?

No, there are many more possible variations depending on input parameters:

// t is the *testing.T of the enclosing test; argument order is (t, expected, actual)

// these are just basic school math
assert.Equal(t, 3, Sum(1, 2))
assert.Equal(t, 1, Sum(1, 0))
assert.Equal(t, 2, Sum(3, -1))
assert.Equal(t, -2, Sum(-1, -1))
assert.Equal(t, -1, Sum(-1, 0))

// there are more language-specific test cases: int overflow wraps around in Go
assert.Equal(t, math.MinInt, Sum(math.MaxInt, 1))
assert.Equal(t, math.MaxInt, Sum(math.MinInt, -1))

So instead of asking whether something has tests or not, question how complete these test scenarios are.
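One idiomatic way to keep such scenarios visible in Go is a table-driven test. The sketch below uses only the standard `testing` package rather than an assertion library, and the `sumCases` table and case names are my own labels for this illustration. `math.MaxInt` and `math.MinInt` require Go 1.17 or newer.

```go
package main

import (
	"math"
	"testing"
)

// Sum mirrors the function from the example above.
func Sum(a, b int) int {
	return a + b
}

// sumCases lists inputs with their expected results,
// including the wrap-around behaviour on integer overflow.
var sumCases = []struct {
	name string
	a, b int
	want int
}{
	{"positive numbers", 1, 2, 3},
	{"zero operand", 1, 0, 1},
	{"mixed signs", 3, -1, 2},
	{"negative numbers", -1, -1, -2},
	{"negative and zero", -1, 0, -1},
	{"overflow wraps to MinInt", math.MaxInt, 1, math.MinInt},
	{"underflow wraps to MaxInt", math.MinInt, -1, math.MaxInt},
}

// TestSum runs every case as a named subtest, so a failure
// points directly at the scenario that broke.
func TestSum(t *testing.T) {
	for _, tc := range sumCases {
		t.Run(tc.name, func(t *testing.T) {
			if got := Sum(tc.a, tc.b); got != tc.want {
				t.Errorf("Sum(%d, %d) = %d, want %d", tc.a, tc.b, got, tc.want)
			}
		})
	}
}
```

Adding a newly discovered scenario is then a one-line change to the table, which makes it cheap to improve completeness over time.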

The example with summing up two numbers might look artificial, but according to the study “Simple Testing Can Prevent Most Critical Failures: An Analysis of Production Failures in Distributed Data-Intensive Systems” (Yuan et al., OSDI 2014):

Almost all catastrophic failures (92%) are the result of incorrect handling of non-fatal errors explicitly signaled in software.
A majority of the production failures (77%) can be reproduced by a unit test.

As engineers, we should develop an appropriate level of paranoia when it comes to thinking about what could go wrong and designing test cases to cover for those failures.
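As a sketch of what that paranoia can look like in practice, the example below checks the failure scenarios of a small parser before its happy path. `ParsePort` and `ErrInvalidPort` are hypothetical names invented for this illustration, not taken from any real code base.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// ErrInvalidPort is returned for any input that is not a usable TCP port.
var ErrInvalidPort = errors.New("invalid port")

// ParsePort converts s to a port number in the range 1-65535.
func ParsePort(s string) (int, error) {
	n, err := strconv.Atoi(s)
	if err != nil {
		return 0, fmt.Errorf("%w: %q is not a number", ErrInvalidPort, s)
	}
	if n < 1 || n > 65535 {
		return 0, fmt.Errorf("%w: %d is out of range", ErrInvalidPort, n)
	}
	return n, nil
}

func main() {
	// Failure scenarios first: these are the paths
	// a happy-path test never exercises.
	badInputs := []string{"", "http", "-1", "0", "65536", "99999999999999999999"}
	for _, in := range badInputs {
		if _, err := ParsePort(in); !errors.Is(err, ErrInvalidPort) {
			fmt.Printf("ParsePort(%q): expected ErrInvalidPort, got %v\n", in, err)
		}
	}

	// And the happy path.
	port, err := ParsePort("8080")
	fmt.Println(port, err)
}
```

The list of bad inputs is where most of the thinking goes: empty values, wrong types, boundary values, and numbers too large to parse.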

Conclusions

It might be tempting to set up a measurable target to see the progress in adoption of automated testing. However, this might lead you to hit the target but miss the goal.

An alternative approach is to not chase a line coverage metric, but instead ensure good test completeness on the code that changes often.

Then, when the team has a safety net for the code that changes often, they can focus on getting real value out of the automated tests:

Is it possible to improve the deployment success rate by adding more tests?
Can bugs be reproduced by an automated test, rather than following manual steps?
How might we catch issues earlier next time we do a major version update for our dependencies?

Answers to these questions might not produce easily measurable targets, but they will create more meaningful steps towards your own testing strategy.

Written by Maxim Schepelin