Designing your test methods using a simple structure such as given-when-then will help you:

  • Communicate the purpose of your test more clearly
  • Focus your thinking while writing the test
  • Make test writing faster
  • Make it easier to re-use parts of your test
  • Highlight the assumptions you are making about the test preconditions
  • Highlight what outcomes you are expecting and testing against.

In this post I’ll be talking about designing your test cases/test methods using given-when-then.

It doesn’t matter if you are using pytest, unittest, nose, or something completely different; this post will help you write better tests.


Note: This was originally a writeup done after the Python Test Podcast episode 10. However, I think it stands pretty well on its own as a post.


Structuring your test not only makes it easier to read, it makes it easier to write and reuse.
I’m really excited to get into this.


Designing your test cases using given-when-then.

What I’m talking about here is the test functions and methods.
Not the structure of your entire suite, but the individual tests.

This applies to any test framework. But I’m going to assume pytest for now so I don’t have to keep saying “pytest or unittest or nose or something else”.

Pytest doesn’t care what you put in your test functions and methods.
And it doesn’t really care what goes into the setup and teardown functions and methods, or test fixtures.

The syntax and mechanics for all of that are pretty straightforward.
If the syntax or mechanics trip you up, don’t feel bad.
Just bookmark a good reference.

But once you’ve got the mechanics down, you can put whatever you want in there.
If a fixture hits an assert or an exception, then your test will end in Error.

If the test function or method hits an assert, your test will fail.
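Here is a minimal sketch of that Error-versus-Failure split. The `connect_to_device` helper and the dictionary it returns are hypothetical stand-ins for real setup code; only `pytest.fixture` comes from pytest itself.

```python
import pytest


def connect_to_device(available):
    """Hypothetical setup helper: raises if the device cannot be reached."""
    if not available:
        raise RuntimeError("device not reachable")
    return {"connected": True}


@pytest.fixture
def device():
    # An exception or failed assert raised here, during setup,
    # makes pytest report the test as an Error.
    return connect_to_device(available=True)


def test_device_is_connected(device):
    # A failed assert here, in the test body, is reported as a Failure.
    assert device["connected"]
```

Changing `available=True` to `available=False` in the fixture is a quick way to see the Error outcome in a test report.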

Great. There’s the mechanics.

But it still seems like a blank canvas.
An empty page.
What should you put there?

Well. Just like artists and writers are frequently aided in their creativity by following a familiar structure, so too can a test writer use structure to not only get past the blank page, but also achieve quite a few benefits.

Structure

Let’s talk about the structure first. And then we’ll cover some of the benefits.

There have been many different structures or outlines proposed as good models for writing tests.
The models I’m familiar with really all seem like the same thing with different names.

Given-When-Then

The model I use now and love the most is given-when-then.
It’s just so darned easy to remember.
And it puts me in the right mindset for thinking through my tests, expanding the tests, and reusing parts.

It’s pretty basic.
Given some context for your test to run in.
When some action happens.
Then some consequences are expected. Either output from the action, or side effects that can be tested.

A simplistic way to start is to just separate the code in the body of your test functions into 3 visually separate chunks.
I usually separate the sections with a blank line.
You can also put a comment at the top of the sections with these very keywords.
If you are writing whole sentences in the comment, maybe put given, when, and then in all caps.

def test_something():
    # GIVEN a mobile is registered
    ...  # some source code

    # WHEN a test mode data connection is initiated
    ...  # some source code

    # THEN the call should connect
    ...  # some source code
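To show the skeleton with real code in each chunk, here is a small sketch that uses a plain Python list as a stack. The scenario is just an illustration I picked because it needs no extra setup.

```python
def test_pop_returns_last_pushed_item():
    # GIVEN a stack holding two items
    stack = ["first", "second"]

    # WHEN the top item is popped
    popped = stack.pop()

    # THEN the most recently pushed item is returned
    assert popped == "second"
    # AND only the older item remains on the stack
    assert stack == ["first"]
```

The blank lines and the keyword comments do all of the structural work; nothing here depends on a particular test framework.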

This given-when-then structure is borrowed from BDD: Behavior Driven Development.
I think it’s the only thing I’ve taken from BDD.
BDD has a lot of baggage that I’m not quite ready to deal with yet.

But I love given-when-then.

This especially becomes super powerful if you don’t even put the GIVEN in the test function/method proper.
Put it in a fixture.
Put it in setup for a class or module.
Or better yet, put it in a named pytest fixture.

The power of putting the GIVEN in a fixture is that if you can’t get through the GIVEN portion (say an assert is hit), then the test doesn’t end in Failure, it ends in Error.

And also now you’ve just got two halves to your test body proper, the WHEN and the THEN.
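A sketch of moving the GIVEN into a named pytest fixture might look like this. The `Mobile` class and the `make_registered_mobile` helper are hypothetical, invented only to give the fixture something to build.

```python
import pytest


class Mobile:
    """Hypothetical stand-in for a device under test."""

    def __init__(self):
        self.registered = False
        self.connected = False

    def register(self):
        self.registered = True

    def start_data_connection(self):
        # In this sketch, only a registered mobile is allowed to connect.
        self.connected = self.registered
        return self.connected


def make_registered_mobile():
    mobile = Mobile()
    mobile.register()
    # If this assert trips while the fixture runs, the test ends in Error.
    assert mobile.registered
    return mobile


@pytest.fixture
def registered_mobile():
    # GIVEN a mobile is registered
    return make_registered_mobile()


def test_data_connection(registered_mobile):
    # WHEN a data connection is initiated
    connected = registered_mobile.start_data_connection()

    # THEN the call should connect
    assert connected
    assert registered_mobile.connected
```

The fixture name reads like the GIVEN sentence, so the test signature documents the precondition without a comment.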

For some tests, the GIVEN will be setting up test data.
But it could also be getting the system into the proper state.
For me, when testing embedded electronic instrument code, the GIVEN, or setup, is configuring RF ports, setting cable losses, loading arbitrary waveform generators, and many other fun things like that.

Or it could be empty.
If the action you are taking should have the same effect regardless of the state of the system under test, then there’s nothing to put there.
I suggest being explicit and putting a comment like

# GIVEN any state

before moving on to the next sections, so that future test maintainers know that you did think about what the preconditions were.

The WHEN portion is really what we’re testing with this test.
The WHEN section should be very readable, and it should be very obvious what’s going on.
This is the section that people are (or should be) referring to when they say “a test should test one and only one thing”.
The WHEN section should be doing one thing.

Even if that “thing” is complex. It should be something that a user would think of as doing one thing.

The THEN section is where you:

  • check the post conditions.
  • look for the observable side effects
  • and where all of the asserts are

Some people will tell you to only have one assert per test.
That’s rubbish.
They are talking about a very narrow definition of TDD, a definition that doesn’t include all of the levels of testing that I concern myself with.

If your action from the WHEN section has 15 observable side effects and a function output, then by all means, go for it and put 16 assert statements.

I usually only have a few really.
But this totally depends on the test, what you are testing, your domain, and many, many other factors.
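As a sketch of one action with several things worth checking, consider a hypothetical `Account` class (made up for this example) whose `withdraw` method returns a value and has two side effects.

```python
class Account:
    """Hypothetical account with several observable side effects."""

    def __init__(self, balance):
        self.balance = balance
        self.history = []

    def withdraw(self, amount):
        self.balance -= amount
        self.history.append(("withdraw", amount))
        return amount


def test_withdraw():
    # GIVEN an account holding 100
    account = Account(balance=100)

    # WHEN 30 is withdrawn
    cash = account.withdraw(30)

    # THEN the function output and every side effect are checked
    assert cash == 30
    assert account.balance == 70
    assert account.history == [("withdraw", 30)]
```

There is still only one action in the WHEN section; the three asserts all describe its consequences.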

Let’s take a break from Given-When-Then and discuss the other names for test design structure.


Setup-Test-Teardown (or also Setup-Exercise-Verify-Teardown)

This is for the most part just like given-when-then with an additional teardown step.

  • Setup == Given
  • Exercise == When
  • Verify == Then

When written as setup-test-teardown, the Test portion is both the WHEN and the THEN.

So what’s Teardown?

Well, for a lot of you, it’s nothing. Empty. Nothing to do.
It will be something important when you really need to undo whatever you did in the setup or in the exercise portion.
Let’s say you are testing a transactional system.
You can use the teardown to roll back the transactions to the state before the test started.
In my case, I might break a data connection with a mobile device, or make sure power levels in the system are at safe levels, or reset a switch matrix to safe paths through the system.
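For the transactional case, a rough sketch using the standard library `sqlite3` module could look like the following. The shared in-memory database, the `users` table, and the `count_users` helper are assumptions made up for the example.

```python
import sqlite3

# Hypothetical shared database that should look untouched after every test.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.commit()


def count_users():
    return conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]


def test_add_user():
    # SETUP: nothing beyond the shared, empty table
    try:
        # EXERCISE: insert a row without committing it
        conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))

        # VERIFY: the row is visible on this connection
        assert count_users() == 1
    finally:
        # TEARDOWN: roll back to the state before the test started,
        # even if the verify step raised
        conn.rollback()
```

The `try`/`finally` is what guarantees the teardown runs; a fixture-based teardown gives the same guarantee without cluttering the test body.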

The teardown step is present in given-when-then as I use it. It’s just not the hard part, so I don’t mind not having it explicitly part of the name.

When using pytest named fixtures, you will write the teardown as part of the fixture itself. Well, right with it anyway, in the form of a finalizer function, so the test proper doesn’t have to think about teardown.

Of course, if the test proper, the When section say, needs something undone in the teardown, we need to make sure that happens even if an exception or assert causes the test function to not complete.
A great way to do that is to have a fixture that doesn’t really have any setup action, but just has a finalizer.
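A finalizer-only fixture could be sketched like this. The `SwitchMatrix` class and the module-level `matrix` object are hypothetical stand-ins for real hardware control code; `request.addfinalizer` is the pytest call that registers the cleanup.

```python
import pytest


class SwitchMatrix:
    """Hypothetical hardware wrapper; only the names matter here."""

    def __init__(self):
        self.path = "safe"

    def route(self, path):
        self.path = path

    def reset_to_safe_paths(self):
        self.path = "safe"


matrix = SwitchMatrix()


@pytest.fixture
def safe_matrix_afterwards(request):
    # No setup action at all; the fixture exists only for its finalizer.
    # pytest runs the finalizer even if the test body raises.
    request.addfinalizer(matrix.reset_to_safe_paths)


def test_route_to_antenna(safe_matrix_afterwards):
    # WHEN the signal is routed to the antenna port
    matrix.route("antenna")

    # THEN the matrix reports the new path
    assert matrix.path == "antenna"
```

The test never mentions the reset, yet requesting the fixture is enough to get the switch matrix back to a safe path afterwards.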


Arrange-Act-Assert

Another common name for this structure is Arrange-Act-Assert.

Now this should be really obvious how it maps.

  • Arrange == Given
  • Act == When
  • Assert == Then

Come to think of it, I kinda like Act better than when.
Maybe “given-act-then”? No. Maybe “given-act-assert”? Well, we lose the alliteration, and I won’t remember it.
But I seem to have no trouble remembering given-when-then.
So I guess I’ll stick with that.


preconditions-trigger-postconditions

Let’s see. Any other names?
An older one is preconditions-trigger-postconditions.
That’s not bad.

Again:

  • given == preconditions
  • when == trigger
  • then == postconditions

That’s pretty good, actually.
But I’m not writing those in comments.
Too much typing.
And I can’t make a sentence out of it.


Benefits

Ok. I promised I’d talk about the benefits of using a pattern like given-when-then or whatever variation that we’ve discussed that makes the most sense to you.

Splitting up your test functions like this (let’s assume given-when-then and of course the optional teardowns or finalizers), has many benefits.

Hopefully these will make sense.

Communicate the purpose of your test more clearly

Having the WHEN section simple and separated by whitespace will highlight for you and for others reading your code what you are testing.

Especially if the test method name directly relates to the action in the WHEN section, it really helps clarify what you are trying to test.

If the name seems too long, or there’s too much code in your WHEN section, just review it.
Should this really be one test? Or should you split it up into more than one?
I’m not telling you what the right answer is.
I’ve got plenty of biggish tests that just make sense the way they are.

Just make sure it’s clear what’s going on.

Focus your thinking while writing the test

Only thinking about one section at a time really helps to clarify thinking and coding. Kinda hard to put in words. But it really does help make it easier to know what to write.

Make test writing faster

Working within the constraints of given-when-then and the focus you gain really does make it faster.
Also, you can look at the set of tests with the same GIVEN section, or using the same setup, and decide if you’ve tested all the actions available to the user with that GIVEN state.
If not, write more tests with the same GIVEN, but with different actions.
And of course, the THEN postconditions will need to be re-examined.

This is also related to the next benefit.

Make it easier to re-use parts of your test

You can look at your tests now and think about if the GIVEN really represents the only states in your system where the WHEN action can occur.
If not, then you can add more tests with the same action, but with different GIVEN states.
And of course, the THEN postconditions will probably need to be altered as well.
If not, you need to examine if the two tests really represent two tests, or if you can simplify the GIVEN section.

Behavior coverage

These two kinds of re-use to create new tests are part of what’s called behavior coverage. Specifically, I’m talking about state coverage and transition coverage.
I’m going to talk about behavior coverage, state coverage, and transition coverage in future posts.

For now, just realize that separating the given and when sections helps highlight the states being tested (the GIVEN), and the changes to the state (the WHEN).
And this separation allows you to review your tests and see if you’ve missed some obvious actions or starting states.

Highlight the assumptions you are making about the test preconditions

I think I’ve covered this already.
Do you have all reasonable GIVENs for the functionality you are testing in the WHEN section?

Highlight what outcomes you are expecting and testing against.

Highlight missing tests.

Highlight missing functionality.

This is a smidge harder to get your head around.
Look at the THEN sections associated with related tests, ones with either common GIVEN or common WHEN sections. Or the starting states and the transitions out of those states.

The THEN sections, the checks for postcondition state, represent the final states of the system after the actions.

If you’re pretty darn sure there is a final state that isn’t represented in the tests:

  • you might have a missing test
  • you might have missing functionality in the system

Let me give you a simple example:

def test_subscribe_when_not_registered():
    # GIVEN a user is not subscribed to a newsletter
    # WHEN a user subscribes to the newsletter
    # THEN the user's email is now part of the newsletter email list
    # and the user is told they are now subscribed
    ...

def test_subscribe_when_already_registered():
    # GIVEN a user is already subscribed to a newsletter
    # WHEN a user subscribes to the newsletter
    # THEN no changes are made to the email list
    # and the user is told they already are subscribed
    ...

Looking at this set of tests:

  • I have two tests with the same action (subscribing)
  • Coming from two different start states (subscribed or not subscribed).
  • But I don’t have any postcondition where the user is not subscribed.

If there is no unsubscribe functionality, then I’ve just noticed missing functionality in the system.
If there already exists the unsubscribe action, then I’ve just forgotten to write tests for it.
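If the unsubscribe action does exist, the missing test might be sketched like this. The `Newsletter` class, its return messages, and the example email address are all hypothetical, invented so the test has something to run against.

```python
class Newsletter:
    """Hypothetical newsletter; each action returns a message for the user."""

    def __init__(self):
        self.emails = set()

    def subscribe(self, email):
        if email in self.emails:
            return "already subscribed"
        self.emails.add(email)
        return "subscribed"

    def unsubscribe(self, email):
        if email not in self.emails:
            return "not subscribed"
        self.emails.remove(email)
        return "unsubscribed"


def test_unsubscribe_when_subscribed():
    # GIVEN a user is already subscribed to a newsletter
    newsletter = Newsletter()
    newsletter.subscribe("user@example.com")

    # WHEN the user unsubscribes from the newsletter
    message = newsletter.unsubscribe("user@example.com")

    # THEN the user's email is no longer on the email list
    assert "user@example.com" not in newsletter.emails
    # AND the user is told they are now unsubscribed
    assert message == "unsubscribed"
```

This test finally produces the "not subscribed" final state that the two subscribe tests never reach.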

Reuse can be aided by fixture parameterization or test parameterization.

It’s such a common case that you have lots and lots of tests with shared parts.
Both of these cases can be handled by pytest so that you can write fewer tests, but still cover all of the starting states and functionality you need to.
I’ll talk about parameterization in a future post as well.
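In the meantime, here is a small sketch of test parameterization with `pytest.mark.parametrize`, covering two starting states with one test function. The `subscribe` function and its messages are hypothetical and exist only for this example.

```python
import pytest


def subscribe(emails, email):
    """Hypothetical action: returns the updated email list and a message."""
    if email in emails:
        return emails, "already subscribed"
    return emails + [email], "subscribed"


@pytest.mark.parametrize(
    "starting_emails, expected_message",
    [
        ([], "subscribed"),
        (["user@example.com"], "already subscribed"),
    ],
)
def test_subscribe(starting_emails, expected_message):
    # GIVEN a starting state supplied by the parameter list
    # WHEN the user subscribes
    emails, message = subscribe(starting_emails, "user@example.com")

    # THEN the user is on the list exactly once
    assert emails == ["user@example.com"]
    # AND the message matches the starting state
    assert message == expected_message
```

Each entry in the parameter list is one GIVEN state, so adding a new starting state is a one-line change.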

I do touch on it in the Params section of a post called pytest fixtures nuts and bolts.


I think that’s enough from the benefits list.

Structuring your test not only makes it easier to read, it makes it easier to write and reuse.

But wow, I’ve highlighted a lot of areas I need to cover in more detail in future posts.