or Complete coverage testing
or More is Better testing
The setup
For the sake of this post, let’s say I’ve got a Python package that needs testing.
- It’s written completely in Python
- It has a specification fully describing the API
- The specification is so complete that it also covers behaviors and side effects
- The API is the only interface this program exposes
- It was written by us
- It was written recently
- It only uses base Python
Therefore:
- no GUI
- no API other than the well-defined and specified one
- we have full access to the source
- we remember all of the decisions for all of the code
- no third-party modules that might be flaky
This is the ideal program to both write and test.
You can tell this is made up, because I have never had all of the above be true.
In reality, many of those items will not be present. But let’s go with it for now and see where it takes us.
Test Strategy: Test everything
I want this package to be rock solid. I don’t want any defect reports coming back from customers.
So why not just go ahead and plan on testing everything?
Functionality:
- Every function or method in the API should have at least one test that verifies its correct behavior, and at least one test for each error condition that faulty input can trigger.
- Every feature or behavior in the specification should have at least one test that verifies the package meets that feature or behavior.
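As a rough sketch of what the functionality rule looks like in practice, here is one test for correct behavior plus one test per error condition. The function ‘average()’ is a hypothetical stand-in for an API function, not something from the package described here, and its error rules are assumptions made for the example.

```python
import unittest


def average(values):
    """Hypothetical API function: arithmetic mean of a non-empty list."""
    if not isinstance(values, list):
        raise TypeError("values must be a list")
    if not values:
        raise ValueError("values must not be empty")
    return sum(values) / len(values)


class AverageTests(unittest.TestCase):
    def test_correct_behavior(self):
        # One test for the documented, correct behavior.
        self.assertEqual(average([1, 2, 3]), 2)

    def test_empty_list_is_rejected(self):
        # One test for the first error condition.
        with self.assertRaises(ValueError):
            average([])

    def test_non_list_is_rejected(self):
        # One test for the second error condition.
        with self.assertRaises(TypeError):
            average("123")
```

Saved in a test module, this runs with ‘python -m unittest’. Already it is three tests for a three-line function, and that is before any of the rules below are applied.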
Independence of components:
- Every package/function/class/method/module should be tested in isolation with dependencies mocked.
Integration:
- Every package/function/class/method/module should also be tested with its dependencies NOT mocked.
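To make the isolation and integration rules concrete, here is a minimal sketch that tests the same unit both ways. The names ‘load_name()’ and ‘greeting()’ are made up for illustration. The mock is swapped in with ‘mock.patch.dict’ on the function's globals, which is one of several ways to do it.

```python
import unittest
from unittest import mock


def load_name(user_id):
    """Hypothetical dependency: look up a user's name."""
    names = {1: "ada", 2: "grace"}
    return names[user_id]


def greeting(user_id):
    """Hypothetical unit under test; it depends on load_name()."""
    return "Hello, " + load_name(user_id).title() + "!"


class GreetingTests(unittest.TestCase):
    def test_isolated_with_dependency_mocked(self):
        # Replace load_name in the namespace greeting() looks names up in.
        fake = mock.Mock(return_value="bob")
        with mock.patch.dict(greeting.__globals__, {"load_name": fake}):
            self.assertEqual(greeting(99), "Hello, Bob!")
        fake.assert_called_once_with(99)

    def test_integrated_with_real_dependency(self):
        # Same unit, this time with the real load_name() underneath.
        self.assertEqual(greeting(1), "Hello, Ada!")
```

Under the "test everything" plan, every unit in the package gets both kinds of test.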
The plan is to just test the bejeezus out of the package.
Some testing is good.
Full and complete testing must be better. Right? Wrong.
Why this is bad
In my mind there are quite a few problems with this kind of testing.
It takes too long
Surely my time is better spent developing the next super cool thing.
It’s overkill
I’d argue that it’s ok if some of the mid to low level functions don’t really satisfy their full internal interface.
As long as these deficiencies don’t result in errors at the API level, the users will never get bit by little bugs in helper functions.
For every internal function in your software, the set of valid inputs to the function is at least as large as, and usually much larger than, the set your software is actually going to pass to it.
This is obvious but important.
Let’s say your software sometimes uses the ‘pow()’ function to compute cubic volume.
The interface of ‘pow’ is ‘pow(x,y[,z])’.
Your software only ever sets y to 3, and never fills in z.
If you feel you need to test ‘pow()’, you only need to test it with y set to 3, and you don’t need to bother with any z input.
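A small sketch of that idea, assuming the software wraps the call in a helper. The name ‘cube_volume()’ is hypothetical. The test only exercises ‘pow()’ the way the software actually calls it.

```python
def cube_volume(side):
    """Hypothetical helper: the only way this software ever calls pow()."""
    return pow(side, 3)


def test_cube_volume():
    # y is always 3 and z is never supplied, so that is all we check.
    assert cube_volume(2) == 8
    assert cube_volume(0) == 0
    assert cube_volume(1.5) == 3.375
```

Nothing here touches the three-argument form of ‘pow()’ or any exponent other than 3, because the software never uses them.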
Let’s look at another example.
Let’s say I’ve got an internal function that takes a string and removes html tags from it.
Now, to fully test it in isolation I should probably at least do the following:
- Test it with passing in ‘None’
- Test an empty string
- Test a one character string
- Test something big, like a 2 MByte string.
- Test it with unicode strings
- Test xml tags or other non-html tags
- Test multiple levels of tags
- Test non-matching tags: ‘<h1>’ with no ‘</h1>’, etc.
- Test strings with random newlines inserted
- …
I can see right now that I’ll probably spend more time writing tests for ‘tag_strip()’ than the time it will take to actually write the function. However, if ‘tag_strip()’ is part of my API, it’s time well spent.
But ‘tag_strip()’ isn’t part of the API. It’s an internal function.
And upon inspection of the final software, I might notice that my software only ever strips html tags from strings it generates.
And I only ever use it for titles.
Real interface to tag_strip():
tag_strip('<h1>Hi there</h1>') -> 'Hi there'
tag_strip('<h2>Foo</h2>') -> 'Foo'
tag_strip('<h3></h3>') -> ''
So:
- I never pass in None
- I never pass the empty string
- I never pass in one character strings
- I never pass in large strings
- I never pass in unicode
- I never pass in xml tags or other non-html tags
- I never pass in multi-line strings
In the end, I may have a very robust ‘tag_strip()’ function.
But it’s way over-designed.
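For contrast, here is a minimal sketch of a ‘tag_strip()’ sized to the real interface, with tests that cover only the inputs the software actually generates. The regular expression is my assumption about what "titles" means here (a single-line string wrapped in one heading tag pair). It is an illustration, not the actual implementation.

```python
import re

# Assumption: the only tags ever seen are <h1>..<h6> and their closing tags.
_HEADING_TAG = re.compile(r"</?h[1-6]>")


def tag_strip(text):
    """Remove heading tags from a title string generated by our own code."""
    return _HEADING_TAG.sub("", text)


def test_tag_strip_real_inputs():
    # These mirror the real interface listed above, and nothing more.
    assert tag_strip("<h1>Hi there</h1>") == "Hi there"
    assert tag_strip("<h2>Foo</h2>") == "Foo"
    assert tag_strip("<h3></h3>") == ""
```

This version would fail most of the isolation tests listed earlier (None, non-matching tags, other tag types), and no user would ever notice.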
It’s inflexible
If absolutely every component of your software has a test harness around it, then any redesign of the code, any change of the code at all, will probably make many of your tests fail. You will have to go back and examine whether the test code is wrong, or your new code is wrong. And you’ll need to write new tests to accompany the new code.
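Here is one way that failure mode can show up, as a sketch with hypothetical names. ‘title_v1()’ and ‘title_v2()’ behave identically from the outside, but a check that is coupled to the internal helper passes for the first and fails for the refactored second.

```python
from unittest import mock


def _clean(text):
    """Hypothetical internal helper."""
    return text.strip()


def title_v1(text):
    """Original design: delegates to the helper."""
    return _clean(text).upper()


def title_v2(text):
    """Refactored design: same observable behavior, helper no longer used."""
    return text.strip().upper()


def helper_was_called(title_func):
    """Implementation-coupled check: did title_func call _clean()?"""
    spy = mock.Mock(side_effect=_clean)
    with mock.patch.dict(title_func.__globals__, {"_clean": spy}):
        title_func("  hi  ")
    return spy.called


def behaves_correctly(title_func):
    """Behavior-level check: only the result matters."""
    return title_func("  hi  ") == "HI"


print(helper_was_called(title_v1), behaves_correctly(title_v1))  # True True
print(helper_was_called(title_v2), behaves_correctly(title_v2))  # False True
```

The refactor did not break anything a user could observe, yet the helper-level check now fails and somebody has to go figure out why.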
Try it sometime. This is a crippling way to write software. And it’s not fun.
Many important design and implementation changes will be skipped due to this inflexibility, and the overhead attached to any and all changes.
It’s an illusion of complete testing
There are so many tests in the system that many people will think the test coverage is complete.
However, complete coverage is just not possible.
Complete coverage of every possible input AND combinations of input AND combinations of the order of method calls is JUST NOT POSSIBLE.
Unless you are providing an API with one function that takes no input.
Every function and every input parameter you add multiplies the number of input combinations possible in your system, so the total grows exponentially.
You can’t have complete exhaustive tests of your system.
If you thought you could, get over it.
It’s not possible.
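To put rough numbers on that growth, here is a back-of-the-envelope sketch. The function names and the counts fed into them are made up for illustration. Real parameters usually have far more than ten interesting values, and real call sequences can repeat methods, so these figures are lower bounds.

```python
from math import factorial, prod


def count_input_combinations(values_per_parameter):
    """Distinct argument tuples, given how many values each parameter can take."""
    return prod(values_per_parameter)


def count_call_orderings(method_count):
    """Orders in which each of the methods is called exactly once."""
    return factorial(method_count)


print(count_input_combinations([]))            # 1: a function with no input
print(count_input_combinations([10, 10, 10]))  # 1000
print(count_input_combinations([10] * 6))      # 1000000
print(count_call_orderings(5))                 # 120
print(count_call_orderings(10))                # 3628800
```

Note that the only case with a single combination is the function that takes no input at all.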
What should we do?
All hope is not lost. And testing is a wonderful thing.
Don’t throw in the towel yet.
Before I give my opinion on how testing should be done, I’m going to cover at least a couple more approaches that I feel sound reasonable at the start, but have serious problems.
And yes, I can only present my opinion.
I don’t think there is one right way to test.
However, I think it is useful to look at some of the ways that are seriously wrong.