What does a good automated test look like?

Recently, I did a large refactor of a core component of an e-commerce single-page application (SPA) written in React. Notably, none of the functionality changed; this was purely a code cleanup, and the goal was for all of the original tests to pass once the changes were made.

Yet not a single automated test passed after the refactor was completed. Nothing was wrong with the changes: the component worked perfectly, and every piece of functionality behaved exactly as before. The tests were the problem.

This got me thinking: how do we know what is or isn’t a good test? How can we write tests so that they don’t break when code is moved around or renamed? That is the subject of this post.

Some definitions

Let’s start by defining some key terms around testing. This will let us grasp, on a more abstract level, what a test is and should do, and lay the foundation for an intuitive sense of what is or isn’t a good test.

If you’re familiar with most of these terms or just want to skip to the conclusion, feel free to go to the “What makes a test bad?” section.

Features

First off, we have to ask what it is we’re actually testing. At the end of the day, we want our features to work; that is the business ask we are fulfilling. Features, however, can almost never be covered by a few simple cases. Generally, there are several different flows a user can go through for a given feature.

User story

In the traditional use of the word, a user story is a flow that the user can go through to complete some action, and features are broken down into the various user stories. For instance, if a user can select a cash payment plan and continue, versus calculating a finance plan, adding it and then continuing, we have two different user stories for financing a purchase.

Test case

Now we get to the fundamental unit of testing: the test case. This is one particular instance of one particular flow (user story) in a feature. In other words, there can be multiple (usually infinitely many, in fact) test cases for a particular user story.

For instance, a user could select their finance plan in the story above and set the deposit to $5,000, or they could set it to $50,000. The second case might test a situation in which the user’s deposit is too high. These are two test cases for one user story.
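
To make the distinction concrete, here is a minimal TypeScript sketch of those two test cases. The validateDeposit function, the 50% deposit limit, and the $40,000 price are all hypothetical; they stand in for whatever rule the real application enforces.

```typescript
// Hypothetical domain rule: a deposit must be positive and may not exceed
// a fixed share of the purchase price. The 50% limit is an assumption.
const MAX_DEPOSIT_RATIO = 0.5;

type DepositResult = { ok: true } | { ok: false; reason: string };

function validateDeposit(deposit: number, price: number): DepositResult {
  if (deposit <= 0) {
    return { ok: false, reason: "Deposit must be greater than zero" };
  }
  if (deposit > price * MAX_DEPOSIT_RATIO) {
    return { ok: false, reason: "Deposit is too high" };
  }
  return { ok: true };
}

// One user story ("finance a purchase"), several test cases.
interface DepositTestCase {
  name: string;
  deposit: number;
  price: number;
  expectedOk: boolean;
}

const depositTestCases: DepositTestCase[] = [
  { name: "accepts a $5,000 deposit", deposit: 5000, price: 40000, expectedOk: true },
  { name: "rejects a $50,000 deposit as too high", deposit: 50000, price: 40000, expectedOk: false },
];

// Runs every case and returns the names of the ones that failed.
function runDepositTestCases(cases: DepositTestCase[]): string[] {
  return cases
    .filter((c) => validateDeposit(c.deposit, c.price).ok !== c.expectedOk)
    .map((c) => c.name);
}
```

Under these assumptions, runDepositTestCases(depositTestCases) returns an empty array. More cases for the same story (a zero deposit, a deposit exactly at the limit) are just more rows in the table.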

Types of tests

It’s important to know what kinds of tests exist so that you can write each test well for its type. While writing tests, keep in mind the test types outlined below and the purpose of each. There is some ambiguity in all of these terms, so don’t worry if there seems to be some overlap.

Unit test

First off, we have unit tests. Unit tests test one specific unit of code. This usually means one method, or even one branch of a method. The goal of these tests is to ensure that inputs produce the expected outputs, and to reach as many branches of the code as possible. They do not typically require formally documented test cases.
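
As a rough illustration, a unit test for a single function with one assertion per branch might look like the sketch below. The clampDeposit function and the expectEqual helper are hypothetical and exist only so the example needs no test framework.

```typescript
// Hypothetical unit under test: keeps a deposit inside an allowed range.
function clampDeposit(deposit: number, min: number, max: number): number {
  if (deposit < min) return min; // branch 1: below the range
  if (deposit > max) return max; // branch 2: above the range
  return deposit;                // branch 3: inside the range
}

// A tiny assertion helper that throws when the values differ.
function expectEqual<T>(actual: T, expected: T, label: string): void {
  if (actual !== expected) {
    throw new Error(label + ": expected " + String(expected) + ", got " + String(actual));
  }
}

// One check per branch: known input, expected output.
function testClampDeposit(): void {
  expectEqual(clampDeposit(100, 500, 20000), 500, "raises a deposit below the minimum");
  expectEqual(clampDeposit(50000, 500, 20000), 20000, "lowers a deposit above the maximum");
  expectEqual(clampDeposit(5000, 500, 20000), 5000, "leaves a deposit inside the range unchanged");
}
```

Calling testClampDeposit() throws on the first failing branch and returns quietly when all three behave as expected.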

Functional test

Functional tests test a specific piece of functionality, and they usually map one-to-one to the test cases outlined above. This means each one tests a single user flow through one part of a feature. Rather than trying to cover all branches of code, these tests typically try to cover the most important and common use cases.

Integration test

These are similar to unit tests, but they test several units working together. As with unit tests, “unit” typically refers to a unit of code such as a method or module. In the previous example, this could mean testing that your finance API applies finance data correctly to a user’s cart when they select a variety of products from a product feed. The two units being integrated are the finance and product APIs.
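
A minimal sketch of that idea follows, with two small hypothetical modules standing in for the product and finance APIs. The interest-free payment formula and all names here are assumptions made for illustration.

```typescript
// Unit 1 (hypothetical product module): total price of the selected products.
interface Product {
  name: string;
  price: number;
}

function cartTotal(products: Product[]): number {
  return products.reduce((sum, p) => sum + p.price, 0);
}

// Unit 2 (hypothetical finance module): interest-free monthly payment.
interface FinancePlan {
  deposit: number;
  months: number;
}

function monthlyPayment(total: number, plan: FinancePlan): number {
  return (total - plan.deposit) / plan.months;
}

// The integration point: the finance unit consumes the product unit's output.
function financeCart(products: Product[], plan: FinancePlan): number {
  return monthlyPayment(cartTotal(products), plan);
}

// Integration test: exercises both units together through financeCart.
function testFinanceCart(): void {
  const products: Product[] = [
    { name: "Sofa", price: 1200 },
    { name: "Lamp", price: 300 },
  ];
  const payment = financeCart(products, { deposit: 300, months: 12 });
  // (1200 + 300 - 300) / 12 = 100
  if (payment !== 100) {
    throw new Error("expected a monthly payment of 100, got " + String(payment));
  }
}
```

The test never calls cartTotal or monthlyPayment directly; it only checks that the two produce the right result when combined.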

What is the goal of a test?

To know how to write good tests, we should know what they are actually supposed to accomplish. Then we can define guidelines that point us in the right direction.

Prevent regressions

We don’t want new features to break old code, so we have to test that the basic functionality works with every major change. These are the most basic and fundamental types of tests.

Validate new features

Tests should cover as many cases as possible so that code is known to behave according to the business requirements before it sees a QA environment.

Catch mistakes and improve code quality

Tests are also there to CYA if you make a silly typo or forget something obvious while you’re working in the code base. Small mistakes like these shouldn’t end up in QA.

On the code quality side, writing tests will often make you consider cases you had previously overlooked, because as you write a test you notice other options and paths that could be taken. Writing things out formally in test cases helps with this.

What makes a test bad?

So given all of this, why were the tests I worked on so bad? How could the same problem be avoided in the future?

Tested internal implementation, not functionality

It’s often okay for unit tests to test the specific implementation of a feature, especially in libraries that change infrequently. However, if you’re doing front-end work, or building any kind of web application or SPA, your code will either change frequently or eventually fall apart. You’re not agile if your tests are preventing code changes.

In the case of the tests I reworked, each test modified the internal state of a UI component, then checked that specific parts of specific methods were called with specific values, then checked the internal state of the UI component again.

This is awful because these tests should have been functional tests, not unit tests. You have to consider the goal of the tests you’re writing. In the case of UI work, the goal is not to check the inputs and outputs of a component’s methods, nor is it to check the internal values of your UI code. The goal is to test the functionality of the component, because the internal implementation of that functionality is a detail that is unimportant at the end of the day. The business does not care how your code is written, and neither do users. Everyone cares whether it meets the stated requirements.

Testing the component’s internal methods this way made the tests extremely fragile. As soon as the component was refactored and its internal representation changed and became inaccessible, all the tests failed despite the functionality remaining the same.
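
Here is a simplified, framework-free sketch of the alternative. The DepositWidget interface and both implementations are hypothetical stand-ins for a real UI component; the point is that one test, written only against what the user can do and see, passes before and after an internal refactor.

```typescript
// The only thing the test may rely on: what a user can do and see.
interface DepositWidget {
  enterDeposit(amount: number): void;
  visibleMessage(): string;
}

// Original implementation: keeps the raw amount in a private field.
class DepositWidgetV1 implements DepositWidget {
  private amount = 0;
  private maxDeposit: number;

  constructor(maxDeposit: number) {
    this.maxDeposit = maxDeposit;
  }

  enterDeposit(amount: number): void {
    this.amount = amount;
  }

  visibleMessage(): string {
    return this.amount > this.maxDeposit ? "Deposit is too high" : "Deposit accepted";
  }
}

// Refactored implementation: stores the computed message instead.
class DepositWidgetV2 implements DepositWidget {
  private message = "Deposit accepted";
  private maxDeposit: number;

  constructor(maxDeposit: number) {
    this.maxDeposit = maxDeposit;
  }

  enterDeposit(amount: number): void {
    this.message = amount > this.maxDeposit ? "Deposit is too high" : "Deposit accepted";
  }

  visibleMessage(): string {
    return this.message;
  }
}

// Functional test: written once, against the public behaviour only.
function testShowsErrorWhenDepositIsTooHigh(make: (max: number) => DepositWidget): void {
  const widget = make(20000);
  widget.enterDeposit(50000);
  if (widget.visibleMessage() !== "Deposit is too high") {
    throw new Error("expected the too-high message to be shown");
  }
}
```

A test that read the private amount field of the first version would stop working with the second, even though the user sees no difference. The test above accepts either implementation, for example testShowsErrorWhenDepositIsTooHigh((max) => new DepositWidgetV2(max)).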

Tested the way the code was written, not functionality

This is similar to the last section, but a bit different. Rather than describing a single test case, such as a user entering a certain deposit, the tests had names like “set deposit, state set and methods called as expected”.

No one on Earth can tell what this means. No one has a clue whether this test passing means anything, and I certainly couldn’t rewrite this test to pass once the code was changed. Tests have to be meaningful and reflect functionality that can be tested regardless of the way the code is structured internally.

The assertions for these types of tests often just duplicated the code itself, checking that functions were called in a certain order with certain parameters.

Tested the libraries being used, not the domain specific code

The tests tended to test things like state being set by React, or props being passed through correctly. You do not need to test your underlying libraries, especially when they have a massive development team behind them. Test the domain-specific code instead. Validating that setState works or that Date.toString() formats correctly is the job of Facebook and Chrome respectively.

What makes a test good?

Well, let’s summarize. And keep in mind, we’re talking about automated tests for a UI-heavy application. These rules are not going to apply to dishwasher microprocessor code.

Tests should be written with their purpose in mind
Tests should test domain-specific code
Tests should be associated with user interactions / feature functionality
Tests should not duplicate the codebase or test implementation details
Tests should not be fragile

The last one is a bit more of an art than a science, but if you find that you’re constantly rewriting tests when the functionality hasn’t changed, that’s a strong sign that something is wrong with your tests.

How would I fix these tests?

Well, looking at the list above, including the “should not” items at the end of it, here’s what I would do.

I would focus on the functionality of the component and the features it implements
I would test only the code that is specific to the application
I would clearly label what the test is doing, and what the expected outcome is
I would not test that specific code is being run in a certain order, unless that is the best way to test underlying functionality
I would not test the internal state of the component, i.e. things not visible to the end user

I hope this helps! There is a lot more detail and nuance to go into, and many books have been written on just this topic, but hopefully this provides a good synopsis of common problems when automating UI tests, without going into technical details that can be very different from project to project. As always, feel free to leave feedback or to reach out with any questions.