A recent Twitter thread by Simon Willison reminded me that I’ve been meaning to do an episode on the testing trophy. This discussion is about the distinction between unit and integration tests, what those terms mean, and where we should spend our testing time.


Transcript for episode 177 of the Test & Code Podcast

This transcript started as an auto-generated transcript.
PRs welcome if you want to help fix any errors.


00:00:00 A recent Twitter thread by Simon Willison reminded me that I’ve been meaning to do an episode on the Testing Trophy. This discussion is about the distinction between unit and integration tests, what those terms mean, and where we spend our time testing.

00:00:15 This episode is sponsored by Rollbar. Rollbar is a leading platform that enables developers to proactively discover and resolve issues in their code, allowing them to work on continuous code improvement throughout the software development lifecycle. Rollbar has plans for all situations, from free to large enterprise. With Rollbar, developers deploy better software faster and can quickly recover from critical errors as they happen. Learn more at rollbar.com, and if you use the link in the show notes, there’s a raffle you can enter.

00:00:44 That’s cool.

00:00:45 Thanks, Rollbar. Thank you, PyCharm, for sponsoring this episode.

00:00:50 PyCharm saves me time. Error highlighting lets me know as soon as I type a mistake. Code completion and parameter hint pop-ups save me from having to look stuff up. The Git integration makes all of my Git workflows super easy, and then you get a whole bunch of bonus features, like the database front end, that are super cool, and a super fast debugger. I wonder how PyCharm will save you time. Find out yourself at testandcode.com/pycharm. Welcome to Test and Code.

00:01:38 Let’s run through Simon’s thread. There’s a lot of great information in there and I like the topics that he brings up. And then we’ll look at some of the articles that he references in the thread. Simon’s first tweet is, not sure if this is a controversial opinion or not: unit tests should make up a minority segment in your overall testing suite, your automated testing suite. I’d absolutely take a project with integration tests and no unit tests over one with unit tests but no integration tests.

00:02:06 He goes on to say the usual challenge with testing language applies here: I’m confident different people have different mental models of what unit and integration tests actually mean. In this case, I mean unit tests as in tests that attempt to test a single function or class in isolation, mocking or stubbing the things that it might talk to, and integration tests as tests that don’t do that isolation, that exercise multiple components integrated together. And he goes on: this is a pretty great distillation of a testing philosophy I’m trying to get at here. And he references Kent C. Dodds’ article, Write tests. Not too many. Mostly integration. We’ll take a look at that later in the episode. And he goes on: evidently the JavaScript world has been discussing this stuff in a lot of depth over the past few years.

00:02:58 And he references somebody named swyx on Twitter who says in front end we now have the Testing Trophy, and in the back end, of course, the testing honeycomb from Spotify. So he references Kent Dodds. Apparently the Testing Trophy is from Kent C. Dodds and Guillermo Rauch, which we’ll talk about later.

00:03:27 Well, I’ll go ahead and describe it here, and the testing honeycomb too. The trophy is an image that has a base of static tests, and then on top of that is a little tiny bit of unit tests, and then a larger integration section, and then the top is a small end-to-end section. If you draw it in a little diagram like Kent does, it looks kind of like a trophy, maybe. The microservices testing honeycomb from Spotify is like a little triangle at the bottom with implementation detail tests, and then a larger rectangle of integration tests, and a small triangle of integrated tests at the top.

00:04:11 Simon goes on to reference an article by Sarah Mei. He says this piece by Sarah is fantastic; it defines a framework for discussing test strategy based on the following goals: one, verify the code is working correctly; two, prevent future regressions; three, document the code’s behavior; four, provide design guidance; and five, support refactoring. I love these things.

00:04:41 Sarah calls this five factor testing, and I’d actually like to focus on that in a completely different episode, so we’ll hold off for the moment. And then lastly, Simon says, I hadn’t seen this piece by Martin Fowler before, which matches my own concerns about how hard this conversation is to have thanks to the vagueness of the terms unit and integration with regards to testing. And I definitely agree.

00:05:08 So the article by Martin is called On the Diverse And Fantastical Shapes of Testing. First of all, I really like this overview that Simon links together.

00:05:21 I’m going to include the thread in the show notes, and I hope he keeps it up, because I think running through his ideas and the other things he refers to covers a lot: the difference between unit tests and integration tests, what those terms really mean, and the fact that we have to define what unit and integration mean.

00:05:41 To have this discussion, we talk about testing shapes, and also about paying attention to other programming worlds. I mostly sit in either the C world or the Python world, mostly Python on this podcast, but taking a look at what the best practices are in the JavaScript world and other communities is important, I think. Kent C. Dodds is usually in the JavaScript world; I think that’s what he writes about.

00:06:15 So we’ll cover Write tests. Not too many. Mostly integration. And then Martin Fowler’s article. Kent C. Dodds has several articles on this topic.

00:06:25 I’m going to first talk about a later one. This one’s from June of 2021, The Testing Trophy and Testing Classifications, where he talks about a lot of this stuff, but in particular his definitions for unit and integration, and that’s really what I’m going to talk about here. The definitions listed in the article are: unit tests test code which either has no dependencies (collaborators) or which has its dependencies and collaborators mocked for the test, and integration tests test code which tests multiple units integrated with one another. So Simon says he’s using that definition too, when he says we should do more integration tests and fewer unit tests. Now let’s jump into one of the articles referenced by Simon, and it’s an article called Write tests. Not too many. Mostly integration.
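
To make those two definitions concrete, here is a minimal pytest-style sketch (my own illustration, not from the episode or the article; the names are made up). The first test mocks the collaborator, so it is a unit test by this definition; the second lets the real pieces work together, so it is an integration test.

```python
# Hypothetical example: a unit test vs. an integration test by these definitions.
from unittest import mock


class InMemoryCatalog:
    """A real, if tiny, collaborator."""

    def __init__(self, prices):
        self.prices = prices

    def get_price(self, sku):
        return self.prices[sku]


def lookup_price(sku, catalog):
    """Ask the catalog collaborator for a price."""
    return catalog.get_price(sku)


def total(skus, catalog):
    """Combine lookup_price over several SKUs."""
    return sum(lookup_price(sku, catalog) for sku in skus)


def test_lookup_price_unit():
    # Unit test: the collaborator is mocked for the test.
    catalog = mock.Mock()
    catalog.get_price.return_value = 5
    assert lookup_price("apple", catalog) == 5


def test_total_integration():
    # Integration test: multiple units working together, nothing mocked.
    catalog = InMemoryCatalog({"apple": 5, "pear": 3})
    assert total(["apple", "pear"], catalog) == 8
```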

00:07:20 This one is also from Kent Dodds.

00:07:23 This one is from July 2019.

00:07:28 I’m going to link this in the show notes, of course. He has a link in here to a 2018 talk that he gave, I think at a JS conference, and it covers the entire discussion. It’s only 17 minutes; I encourage you to listen to it. It’s a good talk. I like this notion of write tests, not too many, mostly integration. The talk is titled that too, and he kind of goes through that.

00:07:59 He credits Guillermo Rauch for coming up with that. He took it from a tweet from Guillermo from 2016: Write tests. Not too many. Mostly integration.

00:08:11 I love that, too. It’s nice, short, simple, and it might be surprising to somebody listening that I actually agree with the not too many part, like not too many tests. But you shouldn’t be surprised, because I’ve mentioned this before.

00:08:28 I don’t brag about how many tests I’ve written or how many are in a code base, because I wouldn’t brag about how many lines of code I have in a project either. I mean, wow, I wrote Hello World with 15 lines of code. That would be terrible, just as it would be to say I’ve got 1,000 tests for my Hello World program. That seems ridiculous, but why do we think it’s not ridiculous for other things? Test code has to be maintained, too. So I’m going to quickly run through this, because it’s a 17 minute video. It’s a short article, too, but I’m going to jump around. We need automated tests to help us develop.

00:09:05 That shouldn’t be a tough one. The not too many is interesting. Kent actually talks about whether or not you should go for 100% code coverage. It’s actually not as clear in the written article as in his talk. 100% coverage totally makes sense for libraries. Of course you’d want 100% coverage for libraries, and it’s really not that hard to do. Even then, code coverage doesn’t tell you if it’s good testing, it just tells you that you’ve covered it. The idea around the not too many is that you have diminishing returns, especially on applications, if you try to hit 100%. So there isn’t a number you should hit; you need to look at it differently. We have some cool tools in Python, like coverage.py, where you can configure it so that it ignores the parts you don’t care about.

00:09:57 It’s not difficult to do that, and I think it’s a good idea for Python projects. Code coverage is good for telling you what parts of your code are not getting tested, so you can see whether maybe they should be tested. But if there are little corner cases that really are never going to get hit, trying to hit those just because, with unit tests or whatever, is silly. I use code coverage to tell me if there are parts of the code that I just forgot to test, and that’s a great use for it. But also, open source projects are different, which I think is a good thing to keep in mind. Open source projects are often not applications but libraries, and for those, hitting 100% totally makes sense.
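
Here is a small sketch of what excluding a corner case looks like (a made-up function, not from the episode). coverage.py recognizes the pragma: no cover comment by default, and you can add your own exclusion patterns under exclude_lines in a .coveragerc or pyproject.toml.

```python
# Hypothetical example: marking a defensive corner case so coverage.py skips it.

def read_settings(path):
    """Read a settings file, falling back to empty text if it's unreadable."""
    try:
        with open(path) as f:
            return f.read()
    except OSError:  # pragma: no cover - corner case we don't chase for coverage
        return ""
```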

00:10:38 It’s frustrating to work on an open source project that doesn’t have coverage set up, because it’s hard to tell whether or not you covered the code that you contributed. Now, the mostly integration is the meat of what he’s talking about. The idea is that unit tests are cheap to write and they’re fast to run. End-to-end tests are expensive to write, and they take a long time to run, so integration is in between. Is the integration level more costly than a simple unit test? Maybe, but I don’t think by much. And it’s less expensive than end-to-end. The idea around that is not just cost and speed, but value. If I test a little tiny function that I know works, it doesn’t really help me know that I’ve got a working application.

00:11:26 But if I test through the whole application that a workflow works, I have very good confidence. Kent’s idea is that integration is a nice sweet spot: it gives you really good confidence in your system without being too expensive, and I kind of like that. Anyway, those are Kent’s ideas. I do also want to talk about the Martin Fowler article, which I think is really interesting.

00:11:52 Martin Fowler’s article, On the Diverse And Fantastical Shapes of Testing.

00:11:57 This was written in June of 2021. The subtitle is pyramids, honeycombs, trophies, and the meaning of unit testing.

00:12:05 It’s a great article.

00:12:07 It talks about.

00:12:09 Basically, this discussion about the testing pyramid versus the honeycomb versus the trophy shape kind of falls apart if you examine the terms unit test and integration test a little more closely. I’m going to pull a couple of things out of this article. The point of these images is to indicate the amount of effort we should expend on various types of tests, in particular the balance between unit and broader tests. The pyramid argues that you should have most of your testing done in unit tests. The honeycomb and trophy instead say you should have a relatively small amount of unit tests and focus mostly on integration tests. Martin says the biggest issue he has with this discussion is that it’s rendered opaque by the fact that it’s not clear what people see as the difference between unit and integration tests. The terms unit and integration have been rather murky, even by the slippery standards of most software terminology. As he originally understood them, they were primarily an organizational issue, and they go back to the days of large waterfall projects. He describes really big projects where they thought of a unit as a unit of code: you’d have different libraries, like DLLs or something like that, and you’d have a team working on one subsystem, and that subsystem was the unit, and the larger system was the system. So a unit test was testing a subsystem, not testing a function. And that’s an interesting concept.

00:13:52 I’m still kind of quoting from this article. Many people today ran into unit tests as part of the xUnit family of testing tools, pioneered by Kent Beck with Extreme Programming and then on with Test Driven Development.

00:14:07 Kent Beck uses unit tests to indicate tests written by developers as part of their day-to-day work. That’s how I learned about the term unit test. I was using unit tests with this terminology for a long time, because that’s where I learned it from.

00:14:24 And then I was confused about all this mocking stuff. I’m like, what’s going on there? Kent Beck’s definition: programmers write unit tests so that their confidence in the operation of the program can be part of the program itself. Customers write functional tests so that their confidence in the operation of the program can be part of the program, too.

00:14:45 This is back in the first edition of Extreme Programming Explained. I don’t have customers that write functional tests; I write the functional tests for my software. So, yeah, we have to write both functional tests and unit tests.

00:14:59 Back to Martin’s article: notice that in Kent’s original formulation, unit test means anything written by the programmer, as opposed to a separate testing team. This is a good distinction. Martin says he remembers considerable discussion over this usage of unit test.

00:15:15 One test expert vigorously lambasted Kent for this usage. So Kent and Martin and others asked this person to define unit test, and he replied with: in the morning of my training course, I cover 24 different definitions of unit test. Okay?

00:15:33 And then Martin Fowler goes on to say, maybe we shouldn’t talk about unit tests and integration tests. Instead, let’s call them sociable tests and solitary tests. A sociable test is a test of a thing, like a class or an API or a library or something, with all of its dependencies attached to it. And a solitary test is the isolated thing. That distinction might be more interesting. He also talks about the original Testing Pyramid. In the original Testing Pyramid, unit test meant all developer tests. So that makes more sense. I totally buy the Testing Pyramid if it’s defined as: the bottom is developer-level tests; above that are integrated system tests that are difficult for developers to run; and above that are end-user GUI tests.

00:16:31 I buy the testing pyramid if we define unit tests as everything that developers write. I like that, actually. Martin even refers to a Justin Searls tweet that says: people love debating what percentage of which type of tests to write, but it’s a distraction. Nearly zero teams write expressive tests that establish clear boundaries, run quickly and reliably, and only fail for useful reasons. Focus on that instead.

00:17:01 That’s actually a decent place to leave this, but I’m not going to.

00:17:06 I love Justin’s sentiment.

00:17:08 It’s silly to discuss these shapes.

00:17:13 It’s an interesting discussion, but really what we want is expressive tests.

00:17:19 One of my big things around testing is that I think we should test behavior. We should test behavior, not implementation. The behavior could be a programmer-defined behavior, but the really important stuff is the behavior that we expose to the user. The functionality of our systems that we promised to the users, those are the things that we should be testing. Of course, there’s other stuff that we should test: contracts between us and different teammates, contracts between us and different sub-teams. Those are important too, and those are good to test, because if you have another team that you’re depending on for some library, you can just kind of assume their code works. We assume that they test it; they should test it according to what our expectations are. And the same vice versa.

00:18:08 Anyway, I like behavior over implementation, and that kind of falls in line with integration tests. So how do I define unit tests versus integration tests? I tried to do like Martin: for a while I used to define a unit test as anything a developer would write, which really is all the tests, but really as a unit of functionality.

00:18:38 This actually comes up in Martin’s article a little bit: you can write an integrated test, a test for something, but it’s really focusing on your class, or your function, or whatever the thing is you’re looking at. All of its dependencies, you just assume those work, and you focus on the functionality through your class. You don’t stub or mock them; you just assume everything else is correct. This is a decent way to test, and there are a lot of times where that works just fine.
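
Here is a tiny sketch of that style (hypothetical names, my own example, not from the episode): the test focuses on one class, lets its real dependencies do their thing, and asserts on the behavior the class promises, with nothing stubbed or mocked.

```python
# Hypothetical example: a sociable-style test focused on one class.
import json


class SettingsStore:
    """The thing under focus; it leans on the real json module and real dicts."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def dump(self):
        return json.dumps(self._data, sort_keys=True)


def test_settings_store_roundtrip():
    store = SettingsStore()
    store.set("theme", "dark")
    store.set("font_size", 12)
    # json isn't mocked; we assume it's correct and test our promised behavior.
    assert json.loads(store.dump()) == {"theme": "dark", "font_size": 12}
```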

00:19:10 So I used to think of a unit as a unit of functionality, but I’m a small fish in the testing community as a whole, so I decided that was just confusing, because almost everywhere somebody looks, there are so many consultants and people writing articles and such that refer to a unit test as a little tiny test around one class or function, and anything larger as an integration test. I don’t like that definition, but I’ve stopped fighting it. So I just generally don’t use the terms integration test and unit test anymore. That’s why I prefer terms like functional test or feature test, where people don’t even know what that means, which is good.

00:20:05 It’s actually better than them thinking they know what I mean and being wrong. So it just opens the discussion.

00:20:13 Anyway, I’m going to attach a link to the Twitter thread and links to the two articles that I covered, and they’re really quick reads. It will probably take you less time to go read through them than listening to this episode, but it’s a really interesting quick start on understanding some of the different types of testing terminology that are out there. Get back to me if you have any more questions and if you’d like me to cover other general categories like this. Thanks.

00:20:55 Thank you, Rollbar, for sponsoring. Rollbar enables developers to proactively discover and resolve issues in their code so they can work on continuous code improvement throughout the software lifecycle. Learn more at rollbar.com. Thank you, PyCharm, for sponsoring. Visit testandcode.com/pycharm for a four month free trial of PyCharm. Save time, use PyCharm. Thank you, Patreon supporters; join them at testandcode.com/support. All of those links, as well as links to the things I’ve discussed in the show, are in the show notes at testandcode.com. That’s all for now. Now go out and test something.