65 - one assert per test

Is it ok to have more than one assert statement in a test? I’ve seen articles that say no, you should never have more than one assert. I’ve also seen some test code made almost unreadable due to trying to avoid more than one assert per test.

Where did this recommendation even come from? What are the reasons? What are the downsides to both perspectives?

That’s what we’re going to talk about today.

Transcript for episode 65 of the Test & Code Podcast

This transcript starts as an auto generated transcript.
PRs welcome if you want to help fix any errors.

Welcome to Test and Code, a podcast about software development and software testing.

Tell me what you think. Is it okay to have more than one assert statement in a test? I’ve seen articles that say no, you should never have more than one assert. I’ve also seen some test code made unreadable due to people trying to avoid having more than one assert. Where did this recommendation come from? What are the reasons? What are the downsides to both perspectives? That’s what we’re going to talk about today. Thank you to PyCharm for sponsoring this episode.

This episode is sponsored by PyCharm. PyCharm saves me time. It’s also fun, but the time saving and the way it fits naturally into my workflow are why I really love it. In previous episodes, I’ve talked about the awesome support for pytest, Git, virtual environments, handling multiple file types, and Markdown and reStructuredText preview. Today let’s talk about coverage. I lean on coverage analysis when adding, removing, and refactoring both code and tests to make sure I don’t reduce my coverage. For me, I like using coverage analysis. It helps me.

I’m already used to running my test code from PyCharm. One option PyCharm gives you is to run the same tests with coverage, and when it’s done, PyCharm has added percentages of coverage next to all the file names. And in the editor window, there are color-coded lines in the gutter that show you what’s covered in green and what’s not in red. So cool, so easy. No tool switching. Thanks, PyCharm. Try this out yourself by going to testandcode.com/pycharm. That link will give you four months to try out PyCharm Pro and see if it’s right for you. Don’t wait. Save time now with PyCharm.

Now back to tests. One assert per test. I’ve seen that advice in many automated test tutorials.

It seemed to creep up more and more with the growth of test-driven development and the proliferation of tutorials on how to write good unit tests. This kind of coincides with an unfortunate bit of history where the Extreme Programming and test-driven development people were talking about unit tests when they meant all automated tests, because they were using frameworks with “unit” in the name: JUnit, et cetera. Whether or not the advice holds for unit tests at all, or even what a unit is, is not the topic of this conversation. I’m talking about automated tests, purposely not stating if they are unit oriented or whatever. Regardless, the blanket advice of one assert per test does start with good intentions, but it is bad advice, I think, mostly because it’s so easy to muck things up when you try to stick to it. That’s why I don’t think it’s great. But let’s dig a little deeper and cover a few things that might explain why I both agree and disagree with this advice.

By the way, I’m not the only one who disagrees. I put out a very unscientific survey on Twitter last week. I will link to it in the show notes. I asked, “Are multiple asserts or checks okay in an automated test?” The responses: 73% said yes, sure; 15% said it was okay for higher level tests only; 10% said no, never; and 2% said only in BDD, so behavior-driven development. I put that in there because behavior-driven development regularly uses the structure of given, when, then, and there are “and” statements so you can test multiple things. Also, I’m going to link to a 2015 article from Bill Wake entitled “Multiple Asserts Are OK”. It’s a good article that covers many of the reasons for multiple assertions and uses the phrase “probe with multiple assertions” for a set of assertions about aspects of an outcome, which I like.

Let’s talk about what we like to have as a good structure for a test. The two most common structures promoted for test functions are Arrange-Act-Assert and Given-When-Then.
I teach people to put these words as comments in their test code to remind maintainers to keep the structure.

Arrange, act, assert. The arrange part is where you set up or retrieve data and put the system in a particular state or whatever. It’s the part of the test that isn’t what you’re testing, but the getting ready to do something that you want to test. The act is the action you’re going to test. It might include capturing the output or return value or something. The assert part is where you check to see if the thing you did in the act part actually worked. This can be checking a return value, or checking standard out or standard error, or checking side effects, et cetera.

Given, when, then is really the same thing. Given some state or data or whatever, when some action occurs, then some outcome, return value, or side effect occurs. The assert calls go in the then part.

Cleanup. Oddly enough, even though the arrange and given sections are really setup, the teardown or clean up after part is just kind of implied in these models. It happens after the assert or the then stage.

I usually prefer to think about given, when, then because it fits my thinking better. But like I said, they are really the same thing, and given, when, then does not require you to use Cucumber or BDD. It’s just a way to arrange test functions. This structure is one way to keep our tests simple, expressive, easy to read, and easy to understand what we’re testing. When we keep to one of these structures, you also want to keep the focus of the test to really testing one thing. The act or when stage is usually just one function call. The name of the test should reflect this focus, even if the name is excessively long. The hope is that just seeing the name in the failing test list will tell you what’s wrong with the code under test. I think this is really where the advice for one assert per test comes from. Let’s take a couple of example tests.

Let’s say we’re trying to test a set and we’ve got a test that’s test_add_element_to_set.

That’s what the test name is.

test_add_element_to_set. What if it fails? If it fails, what went wrong? I can’t really tell from the test. Also, what was the state of the set beforehand? So let’s be more specific. Let’s break that test into two. The first one is test_add_new_element_to_set, and the second is test_add_duplicate_element_to_set. Okay, that covers a couple of possible states of the set before the add happens. What would you assert in those tests? Probably both the resulting size of the set and that it contains the new element.
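As a rough sketch of those two tests in pytest style (the set contents are made up for illustration, and Python’s built-in set stands in for the code under test), each one follows given, when, then and checks both the size and the membership:

```python
def test_add_new_element_to_set():
    # Given a set that does not contain "d"
    letters = {"a", "b", "c"}

    # When a new element is added
    letters.add("d")

    # Then the set grows by one and contains the new element
    assert len(letters) == 4
    assert "d" in letters


def test_add_duplicate_element_to_set():
    # Given a set that already contains "a"
    letters = {"a", "b", "c"}

    # When the same element is added again
    letters.add("a")

    # Then the size is unchanged and the element is still there
    assert len(letters) == 3
    assert "a" in letters
```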

That’s two assert statements. If we want to stick with one per test, we could further split up the tests into maybe test_add_new_element_to_set_results_in_set_size_increase. That’s all one name. Here’s another one: test_add_new_element_to_set_results_in_element_being_in_set, and test_add_duplicate_element_to_set_results_in_no_change_to_set. So if we include the expected state that we’re checking for in the name of the test, it gets really long and, well, it’s possibly just a matter of preference, but I think that’s going a bit too far.

This brings up the notion of a test suite being such that a failure helps you diagnose what’s wrong. I’m totally on board with that goal, but I think you can achieve that much by adding a message to an assert statement to add specificity to the failure output.

Test functions, modules, and suites are read a lot, and not just by the person who wrote the test. A test function that has one system state, one action, and one failure reason is simple and elegant, and that’s nice, and that’s one of the reasons why people advocate for a single assert. However, it’s misleading. If you really need to test lots of aspects of an action, and those are spread across many test functions, that actually makes your test suite harder to read. This also increases the code size and code maintenance burden.

On the other hand, asserts stop the test, therefore hiding possible later failures. If you test for a set containing the element and the increase or non-increase of the set size, and the first assert fails, then you don’t know what happens in the second assert. For test suites where failures are very rare and the tests are quick to run, this may not be a big deal. Fix the first assert, rerun the test, and keep going until all the asserts pass. If you’d like to know the full picture of all aspects being tested, you have a few options, which we’ll cover a bit later. But this quality of asserts, stopping execution of the test, leads to the problem with run-on tests.

Another place where we see multiple asserts is with run-on tests, or tests that read like the steps of a manual procedure. For instance, start with an empty set, add some stuff to it, making sure that each addition results in the set increasing in size and the items actually getting into the set correctly. Try to add some duplicate items to make sure that it doesn’t do anything to the set. Take the items out of the set one at a time, making sure the set decreases in size, et cetera. Basically, it’s a test with a structure of arrange, then act, then assert, then act, then assert, then act, then assert. Maybe some more arrange, maybe some more act, maybe some more assert. This totally makes sense, actually, for manual tests, because you want to test as much as you can with as little time commitment from the tester as possible. But for computers, it’s shortsighted and will drive you crazy in no time. If a test fails, why is it failing? You have to look at the specific assert and line number more closely to figure out what went wrong, and also mentally calculate what the system state was at the time of the failure. And none of the checks after the failure were run. Tests like this can happen with a hasty conversion of manual tests to automated tests, or with people new to automated testing, but these kinds of tests should be broken up as fast as possible. They are a liability.

I do have one exception to this. I personally like to have a very few act-like-a-customer tests that cover a ton of the system, possibly all of the system, in one or two tests. I then implement those one or two tests at multiple levels: full web interaction with Selenium or GUI automation if that’s feasible, and then through API or subcutaneous testing. But only one or two of these. They are horrible to maintain, but they actually make awesome smoke tests because they test a big chunk of the system.
They tell you a lot when they pass, but unfortunately they are not very helpful when they fail. If one fails and no other tests fail, however, that tells you that you’re missing some more specific, focused testing. Anyway, you can also just go with the idea of never running those types of tests. You’ll be fine with that too.
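To make the shape of a run-on test concrete, here is a hypothetical sketch using Python’s built-in set (the test name and values are invented for illustration). It passes as written, but if any one assert failed, none of the later steps would be checked. The message on each assert is one way to add some specificity to the failure output.

```python
def test_set_run_on():
    # Arrange
    items = set()

    # Act, then assert
    items.add("a")
    assert len(items) == 1, "adding to an empty set should give size 1"
    assert "a" in items, "'a' should be in the set after add"

    # Act again, then assert again
    items.add("a")
    assert len(items) == 1, "adding a duplicate should not change the size"

    # ...and again
    items.remove("a")
    assert len(items) == 0, "removing the only element should empty the set"
```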

Run-on tests will have asserts all over the place, are a bad smell, and should be avoided unless you are intentionally building a smoke test.

The given or arrange portion sets the system into a known state. That’s another place where asserts will come into a test, as asserts after the setup. Let’s take our set example. Given a set that already contains A, B, and C, should I assert that A, B, and C are there and that the size is three? This is where fixtures are awesome. I recommend pushing the given or arrange portion into fixtures and asserting about the state there. In pytest, an assert failure in a setup fixture will result in the test ending in a state of error instead of failure, helping you to pinpoint what’s wrong. If you’re using unittest, that doesn’t help. So in that case, I recommend you raise an exception without using assert for a setup failure. Also, if you’re using unittest, you should really consider switching to pytest. They say it’s more fun.

Let’s talk about remedies. Let’s say you have named all your tests well.

You either go with a Given-When-Then or Arrange-Act-Assert structure, and you’re focusing each test on testing one specific bit of functionality, and you still have more than one assert per test. This is probably due to testing multiple aspects of an action’s outcome: the change in state, the return value, the stream output, maybe multiple state changes. There are multiple aspects of the outcome that need to be tested. If one fails, you won’t see the others, and you want to. So if you’re trying to get rid of multiple asserts, what do you do? There are a few different ways. One way is to use object equality instead of aspect equality. This is neat.

Another one is to collect aspects into a structure and compare that with a pre-canned expected structure. This is okay too. One thing you can do is to push the action into a fixture or class setup and have multiple tests for it, so the test doesn’t have the action anymore, it just has the assert part. This is dangerous, but not horrible. Another thing you can do is to have multiple tests that duplicate everything except for the assert. I’ve actually seen this, and I think it’s yucky and a maintenance nightmare. So don’t do that. The last thing is to use non-blocking checks. pytest-check is a plugin for pytest that I wrote just for this reason. I personally like object equality and non-blocking checks the most.

Object equality. Let’s talk about that a little bit. Let’s take our set example. I’ve got A, B, and C in the set and I add D to the set. I can also, without using add, just construct a predefined set with A, B, C, and D in it and assert that my expected set is equal to my built-up set. That will be one assertion.
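A minimal sketch of the object equality approach, again using Python’s built-in set as a stand-in for the code under test:

```python
def test_add_new_element_to_set():
    # Given a set with a, b, and c
    letters = {"a", "b", "c"}

    # When d is added
    letters.add("d")

    # Then the whole set equals a pre-built expected set.
    # One assert covers both the size and the membership.
    expected = {"a", "b", "c", "d"}
    assert letters == expected
```

Because the comparison is between two whole sets, a mismatch in either size or content fails the same single assert.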

You can do this with non-object aspects as well by collecting them all into a list and then creating an expected outcome list and comparing against that. I really should write this up in a blog post too, because it’d be easier to look at, but hopefully you can visualize it. I take a list and, instead of asserting every time I get an output or something, I just collect it all into the list, and then I have a list of expected values and do one assert comparing the two. I don’t really like this a lot. It’s not terrible, but it does complicate the test code a little bit. And again, we’re trying to make the test code readable. So if I can compare native objects, like build a pre-canned set and then compare against the built-up set, I’ll do that. I don’t really like creating the result list.
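For comparison, here is one way the collect-into-a-list approach might look. This is only a sketch; the choice of which aspects to collect (size change and membership) is an assumption made for illustration.

```python
def test_add_new_element_to_set():
    letters = {"a", "b", "c"}
    size_before = len(letters)

    letters.add("d")

    # Collect the individual aspects of the outcome...
    actual = [len(letters) - size_before, "d" in letters]
    # ...and compare them with a pre-canned list of expected values.
    expected = [1, True]
    assert actual == expected
```

The reader has to match up positions in the two lists to know what each value means, which is the extra complication in the test code.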

That’s just a personal preference though, and I think there’s a lot of people that do like building up an assert outcome list. I’ve also used the method of pushing the action into the fixture, but the danger there is that if the action raises an exception, it will be listed as an error instead of a failure. If you can live with that, go for it.
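Here is a hedged sketch of pushing the action into a fixture, using a class-scoped pytest fixture defined inside a test class. The helper `add_new_element` and the class and test names are hypothetical, made up for this example.

```python
import pytest


def add_new_element():
    """The action under test: add "d" to a set holding a, b, and c."""
    letters = {"a", "b", "c"}
    letters.add("d")
    return letters


class TestAddNewElement:
    @pytest.fixture(scope="class")
    def letters(self):
        # The "when" step runs once for the class. If it raises,
        # pytest reports an error rather than a failure for each test.
        return add_new_element()

    def test_size_increased(self, letters):
        assert len(letters) == 4

    def test_element_in_set(self, letters):
        assert "d" in letters
```

Each test holds only the assert part, so both aspects get reported separately even if one of them fails.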

It’s also a great use of classes. In pytest, I think you can define a test class, put the fixture right in the class, make it class scope, and then put a bunch of tests around it that use that fixture. The downside is that it does involve more code and makes it a bit more complex to understand.

Then the last thing, which I like to use a lot, is non-blocking checks. A non-blocking check is great if you can use something like pytest-check or something else that does all the bookkeeping for you and doesn’t clutter up your test code. The downside is that you have an extra dependency, and you also give up some of the very nice syntax that pytest allows you to use.
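The idea behind non-blocking checks can be sketched without any plugin. The `Checker` class below is a hypothetical, hand-rolled illustration of the bookkeeping, not the pytest-check plugin’s implementation or its API.

```python
class Checker:
    """Record failed checks instead of stopping at the first one."""

    def __init__(self):
        self.failures = []

    def equal(self, actual, expected, msg=""):
        if actual != expected:
            self.failures.append(f"{actual!r} != {expected!r} {msg}".strip())

    def is_in(self, item, container, msg=""):
        if item not in container:
            self.failures.append(f"{item!r} not in {container!r} {msg}".strip())

    def assert_no_failures(self):
        # A single assert at the end reports every recorded failure at once.
        assert not self.failures, "\n".join(self.failures)


def test_add_new_element_to_set():
    check = Checker()
    letters = {"a", "b", "c"}

    letters.add("d")

    # Both checks always run, even if the first one records a failure.
    check.equal(len(letters), 4, "size after add")
    check.is_in("d", letters, "membership after add")
    check.assert_no_failures()
```

A plugin takes care of this bookkeeping for you, so the test does not need the explicit final call that this sketch uses.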

For instance, the pytest-check plugin includes a bunch of functions like equal, not equal, is in, and stuff like that. It kind of ends up looking a little like unittest code, but I can live with that because the upside is that all of the things I’m checking will get checked.

The real thing I want to stress here is that keeping the tests and the suite readable is really high up on my list of goals. I don’t like splitting up tests unnecessarily or creating extra structures where it isn’t obvious why they’re there. For instance, let’s say I’ve got a test with a bunch of asserts and I split it up into a bunch of different tests. Well, each one of those tests might be very small and readable, but it doesn’t allude to all the other things that are getting tested. I think that actually makes it less readable.

So in conclusion, the advice to test one thing per test to keep your tests focused, that’s great advice. If you have multiple asserts due to run-on tests, or because you’re really testing lots of things, that’s bad. If you have multiple asserts due to checking multiple aspects of the outcome, then it’s really up to you if it’s something you need to change or not. But don’t chop up your tests so that they’re unreadable and unmaintainable. For me personally, in most of my open source and personal projects, I just live with multiple asserts per test.

When I’m checking multiple aspects, I think it’s fine. But for a lot of things that I’m working with that are slower tests, or grabbing lots of data, or testing lots of aspects, where it’s really hard to understand what’s wrong with just one of those failures and it would be easier to understand with multiple failures, then I use pytest-check.

Thanks again to PyCharm for sponsoring the show. Get the extended four months to play with PyCharm at testandcode.com/pycharm. That link is also in the show notes at testandcode.com/65, along with links to the article I referenced and to the pytest-check plugin. Thank you to Patreon supporters. I really cannot express how cool it feels to have direct support from people like you. Even if you don’t support the show directly, thank you for listening and for spreading the word about the show and for spreading the word about testing. We need to test more and do it well. Even if you don’t agree with me, if you listen to the show, then you care about building quality software and about the importance of what we do.

We have more in common than not. That’s all for now. Go out and test something.