Complete coverage testing from the bottom of the pyramid up is a bad idea

or Complete coverage testing
or More is Better testing
or How not to test, part 1

The setup

For the sake of this post, let’s say I’ve got a Python package that needs testing.

It’s written completely in Python
It has a specification fully describing the API
The specification is so complete that it also covers behaviors and side effects
The API is the only interface this program exposes
It was written by us
It was written recently
It only uses base Python

Therefore:

no GUI
no other API other than the well-defined and specified one
we have full access to the source
we remember all of the decisions for all of the code
no third-party modules that might be flaky

This is the ideal program to both write and test.
You can tell this is made up, because I have never had all of the above be true.
In reality, so many of those items will not be present. But let’s go with it for now and see where it takes us.

Test Strategy: Test everything

I want this package to be rock solid. I don’t want any defect reports coming back from customers.
So why not just go ahead and plan on testing everything?

Functionality:

Every function or method in the API should have at least one test verifying its correct behavior, and at least one test for each possible error condition when faulty input is passed in.
Every feature or behavior in the specification should have at least one test that verifies the package meets that feature or behavior.

Independence of components:

Every package/function/class/method/module should be tested in isolation with dependencies mocked.

Integration:

Every package/function/class/method/module should also be tested fully, with its dependencies NOT mocked.
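To make the isolation and integration items concrete, here is a minimal sketch of the same function tested both ways. The names normalize() and make_heading() are hypothetical, invented for illustration, and passing the dependency in as a parameter is just one way to make it mockable.

```python
from unittest import mock


def normalize(title):
    """Hypothetical low-level helper: trim and collapse whitespace."""
    return " ".join(title.split())


def make_heading(title, normalize=normalize):
    """Hypothetical higher-level function; its dependency is injectable."""
    return "<h1>" + normalize(title) + "</h1>"


def test_make_heading_isolated():
    # Isolation: a mock stands in for normalize(), so only the
    # wrapping logic of make_heading() is exercised.
    fake = mock.Mock(return_value="Hi")
    assert make_heading("  anything  ", normalize=fake) == "<h1>Hi</h1>"
    fake.assert_called_once_with("  anything  ")


def test_make_heading_integrated():
    # Integration: the real normalize() runs underneath.
    assert make_heading("  Hi   there ") == "<h1>Hi there</h1>"
```

Under this strategy, every component in the package gets both kinds of test, so the test count is at least double the component count before any error cases are added.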

The plan is to just test the bejeezus out of the package.
Some testing is good.
Full and complete testing must be better. Right? Wrong.

Why this is bad

In my mind there are quite a few problems with this kind of testing.

It takes too long

Surely my time is better spent developing the next super cool thing.

It’s overkill

I’d argue that it’s OK if some of the mid- to low-level functions don’t fully satisfy their internal interface.
As long as these deficiencies don’t result in errors at the API level, the users will never get bitten by little bugs in helper functions.

For every internal function in your software, the set of possible valid inputs to the function is larger than the set your software is actually going to pass to the function.
This is obvious but important.

Let’s say your software sometimes uses the ‘pow()’ function to compute cubic volume.
The interface of ‘pow’ is ‘pow(x,y[,z])’.
Your software only ever sets y to 3, and never fills in z.
If you feel you need to test ‘pow()’, you only need to test it with y set to 3, and you don’t need to bother with any z input.
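As a rough sketch, a test limited to how the software really uses pow() might look like this. The wrapper name cube_volume() is hypothetical; the point is only that the exponent is fixed at 3 and the third argument never appears.

```python
def cube_volume(side):
    """Hypothetical internal helper: the only way this package calls pow()."""
    return pow(side, 3)


def test_cube_volume():
    # Only the usage the software really has: y is always 3, z is never passed.
    assert cube_volume(0) == 0
    assert cube_volume(2) == 8
    assert cube_volume(1.5) == 3.375
```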

Let’s look at another example.
Let’s say I’ve got an internal function that takes a string and removes html tags from it.
Now, to fully test it in isolation I should probably at least do the following:

Test it with passing in ‘None’
Test an empty string
Test a one character string
Test something big, like a 2 MByte string.
Test it with unicode strings
Test xml tags or other non-html tags
Test multiple levels of tags
Test non-matching tags. <h1> with no </h1>, etc
Test strings with random newlines inserted
..

I can see right now that I’ll probably spend more time writing tests for ‘tag_strip()’ than the time it will take to actually write the function. However, if ‘tag_strip()’ is part of my API, it’s time well spent.
But ‘tag_strip()’ isn’t part of the API. It’s an internal function.
And upon inspection of the final software, I might notice that my software only ever strips html tags from strings it generates.
And I only ever use it for titles.

Real interface to tag_strip():

tag_strip('<h1>Hi there</h1>') -> 'Hi there'
tag_strip('<h2>Foo</h2>') -> 'Foo'
tag_strip('<h3></h3>') -> ''

So:

I never pass in None
I never pass the empty string
I never pass in one character strings
I never pass in large strings
I never pass in unicode
I never pass in xml tags or other non-html tags
I never pass in multi-line strings

In the end, I may have a very robust ‘tag_strip()’ function.
But it’s way over-designed.
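For comparison, here is a minimal sketch of a tag_strip() that is only as robust as its real interface requires, with a test covering just the three calls listed above. The regex-based implementation is an assumption made for illustration, not the actual function.

```python
import re

# Assumed implementation: drop anything that looks like <...>.
_TAG = re.compile(r"<[^>]+>")


def tag_strip(text):
    """Remove simple tags from a short, single-line, generated title."""
    return _TAG.sub("", text)


def test_tag_strip_real_usage():
    # Exactly the inputs the software generates, and nothing more.
    assert tag_strip("<h1>Hi there</h1>") == "Hi there"
    assert tag_strip("<h2>Foo</h2>") == "Foo"
    assert tag_strip("<h3></h3>") == ""
```

This version would likely fall over on None or on malformed markup, and for an internal function that never sees such input, that may be an acceptable trade.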

It’s inflexible

If absolutely every component of your software has a test harness around it, then any redesign of the code, any change of the code at all, will probably make many of your tests fail. You will have to go back and examine whether the test code is wrong, or your new code is wrong. And you’ll need to write new tests to accompany the new code.
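A small sketch of that failure mode, using hypothetical names throughout: page_title_v1() and page_title_v2() return the same result, but the second no longer calls the helper. A test that checks behavior survives the redesign, while a test that pins the internal call does not.

```python
import re
from unittest import mock


def tag_strip(text):
    """Hypothetical internal helper."""
    return re.sub(r"<[^>]+>", "", text)


def page_title_v1(html, strip=tag_strip):
    """First design: delegates to the helper."""
    return strip(html).strip()


def page_title_v2(html, strip=tag_strip):
    """Redesign: same result at the API level, helper no longer used."""
    return re.sub(r"<[^>]+>", "", html).strip()


def behavior_test(page_title):
    # Checks only what a caller can observe.
    return page_title("<h1> Hi there </h1>") == "Hi there"


def implementation_test(page_title):
    # Checks how the result is produced: the helper must be called once.
    fake = mock.Mock(return_value="Hi there")
    page_title("<h1> Hi there </h1>", strip=fake)
    return fake.call_count == 1


print(behavior_test(page_title_v1), behavior_test(page_title_v2))
print(implementation_test(page_title_v1), implementation_test(page_title_v2))
```

The first line printed is "True True" and the second is "True False": the redesign is correct, yet the implementation-level test now fails and someone has to work out why.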

Try it sometime. This is a crippling way to write software. And it’s not fun.

Many important design and implementation changes will be skipped due to this inflexibility, and the overhead attached to any and all changes.

It’s an illusion of complete testing

There are so many tests in the system that many people will think the test coverage is complete.
However, complete coverage is just not possible.

Complete coverage of every possible input AND combinations of input AND combinations of the order of method calls is JUST NOT POSSIBLE.
Unless you are providing an API with one function that takes no input.
Every function you add and every input parameter exponentially expands the combinations of input possible to your system.
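A back-of-the-envelope sketch of that growth: the function below counts distinct call sequences under the simplifying assumption that every function accepts the same number of distinct inputs. The function name and the numbers are illustrative, not measurements of any real package.

```python
def count_call_sequences(num_functions, inputs_per_function, max_calls):
    """Count distinct sequences of 1..max_calls API calls.

    Assumes each function accepts the same number of distinct inputs,
    which is a simplification made purely for illustration.
    """
    distinct_calls = num_functions * inputs_per_function
    return sum(distinct_calls ** n for n in range(1, max_calls + 1))


# One function, one possible input, one call: a single case to test.
print(count_call_sequences(1, 1, 1))   # 1

# Five functions, ten inputs each, sequences of up to four calls.
print(count_call_sequences(5, 10, 4))  # 6377550
```

Even this toy API produces millions of sequences, and real input spaces are far larger than ten values per function.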

You can’t have complete exhaustive tests of your system.
If you thought you could, get over it.
It’s not possible.

What should we do?

All hope is not lost. And testing is a wonderful thing.
Don’t throw in the towel yet.

Before I give my opinion on how testing should be done, I’m going to cover at least a couple more approaches that I feel sound reasonable at the start, but have serious problems.
And yes, I can only present my opinion.
I don’t think there is one right way to test.

However, I think it is useful to look at some of the ways that are seriously wrong.

As of July 2025, I still don’t think I’ve fully explained the approach to testing I recommend. However, that’s kinda part of why I’m reviewing old posts now. I’m trying to clean things up in order to get ready to write more, in part, about TDD and a better approach to testing.
