Subtests are a way to continue a test function after an assert fails. Paul Ganssle and Brian Okken discuss what subtests are, why you might want them, and what to watch out for if you use them.


Transcript for episode 111 of the Test & Code Podcast

This transcript starts as an auto-generated transcript.
PRs welcome if you want to help fix any errors.


00:00:00 In both unittest and pytest, when a test function hits a failing assert, the test stops and is marked as a failed test. What if you want to keep going and check more things? There are a few ways. One of them is subtests. Python's unittest introduced subtests, and pytest introduced support for subtests with changes in pytest 4.4 that allowed a plugin called pytest-subtests to work. Subtests are still not really used that much, but really, what are they? When could you use them? And more importantly, what should you watch out for if you decide to use them? That's what Paul Ganssle and I will be talking about today. This episode of Test & Code is brought to you by ConfigCat, and by Reuven Lerner's Weekly Python Exercise, and by listeners like you that support the show through Patreon. Thank you.

00:01:00 Welcome to Test & Code, Python Testing for Software Engineers.

00:01:07 Thanks for coming on the show, and you even sent me a thing. Your name is not difficult. It's just Paul Ganssle, right?

00:01:13 Yeah. I don't know why people tend to think that the "le" at the end is pronounced like "lee", which is why when that show Castle was on the air, I would always pronounce it "Castley".

00:01:24 Really?

00:01:24 Just as a joke, because people will say "Gansley". So if your name was Frank Castle and someone was like, "Is this Frank Castley?"

00:01:35 That’s funny. You’re one of the core Python people.

00:01:38 Yeah. As of somewhat recently, just last year. Okay.

00:01:41 Core Python developer, and you mostly are working within that capacity around...

00:01:49 Just the datetime stuff.

00:01:51 Yeah, I've been working on datetime. My latest big initiative has been PEP 615, which is adding the IANA time zones to the standard library. So instead of you having to pull in dateutil.tz, or pytz, which I still don't think you should use even before I added PEP 615, you can just do import zoneinfo and then be able to access whatever time zones are available on your computer, which you couldn't do for the entire history of the datetime module.
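
For reference, here is a minimal sketch of what that looks like on Python 3.9+, where PEP 615's zoneinfo module is available (on some platforms you may also need the first-party tzdata package to supply the time zone data itself):

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # standard library as of Python 3.9 (PEP 615)

# Attach an IANA time zone without pulling in pytz or dateutil.tz.
dt = datetime(2020, 10, 31, 12, 0, tzinfo=ZoneInfo("America/New_York"))

print(dt.tzname())     # 'EDT'
print(dt.utcoffset())  # -1 day, 20:00:00 (that is, UTC-4)
```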

00:02:23 Actually, up until Python 3.1 or so, there were just no concrete time zone implementations at all. Even if you wanted UTC, you had to write your own time zone class. So once we add these IANA time zones, I think we're pretty much covered for all the kinds of time zones that most people would want to see.

00:02:42 Okay, there’s a whole bunch of extra packages built on top of Python for dealing with time zones and date times and stuff like that. Are those all still necessary or less necessary now?

00:02:55 I think it depends on what you're using them for. Like, for example, dateutil is a library that I maintain, and that has time zone implementations. I think most of that still has some niche use cases. Like, if you have iCalendar-type time zones, you can use dateutil for that. pytz, I don't really think anyone needs to use anymore; I think it probably almost doesn't need to exist now. I have this whole blog post on it called "pytz: The Fastest Footgun in the West" about why it's not super...

00:03:29 It's somewhat dangerous to use pytz, because people often use it wrong. And that's one of the reasons why I think even the maintainer of pytz commented on PEP 615 that he's going to be happy to be able to sort of retire pytz and say, hey, you should really be using the standard library time zones. Okay, for things like Pendulum and Arrow, those are really sort of more about trying to improve the ergonomics of datetime, and so I wouldn't see those as being supplanted by this. I don't personally use them, but I also have spent an enormous amount of time understanding and working with Python's datetime, so I understand that my perspective on what is easy to use may be somewhat skewed.

00:04:15 Yeah. But I actually wanted to talk to you about subtests, and I'm smiling in the background because this is amazing. The prep work that you've done to get ready for this podcast, I think it's probably like the number one prep work. You get a gold star or something. You even wrote a blog post about it ahead of time so that we can link to it and refer to it and everything. And it's actually a really awesome discussion of subtests, so that's cool.

00:04:49 Yeah.

00:04:50 How did this start? I think it was some comment on Twitter where you mentioned subtests or something, is that right?

00:04:57 Yeah. I really started using subtests when I was working on CPython, because subtests are pretty much the only avenue for test parameterization that's available in unittest, and the standard library doesn't use pytest. Normally I just use pytest; I use pytest.mark.parametrize and things are great. I started using subtests because that wasn't available. And then I realized that I had all these kinds of other uses, and I started to miss it when I was using pytest in my normal other open source work. I was very excited when I saw that the developers of pytest had added a plugin called pytest-subtests that adds subtest functionality to pytest. And I think you were on Twitter talking about the episode you did about pytest plugins, and I sort of commented offhand like, oh hey, you didn't talk about pytest-subtests, which is super great, it brings this awesome unittest feature to pytest. Then of course you made me have to justify my opinion, which prior to that was just a vague aesthetic feeling like, oh, I should be using these subtests more.
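
As a rough illustration of using subtests as the parameterization avenue in unittest, here is a minimal sketch (the squaring example is made up, not from the CPython test suite):

```python
import unittest


class TestSquares(unittest.TestCase):
    def test_squares(self):
        cases = [(1, 1), (2, 4), (3, 9)]
        for n, expected in cases:
            # Each case is reported separately, and a failure in one
            # case does not stop the remaining cases from running.
            with self.subTest(n=n):
                self.assertEqual(n * n, expected)
```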

00:06:06 Thank you, ConfigCat, for sponsoring this episode. ConfigCat is a feature flag service. It has a central dashboard where you can toggle your feature flags visually. You can hide or expose features on your application without redeploying. You can set targeting rules to allow you to control who has access to new features easily. Use flags in your code with ConfigCat libraries for Python and nine other platforms. Get builds out faster, test in production, and do easy rollbacks. Release new features with less risk, and release more often. With ConfigCat's simple API and clear documentation, you'll have your initial proof of concept up and running in minutes. Train new team members in minutes also, and you don't have to pay extra for team size. With the simple UI, even product managers can use it effectively. Whether you are an individual or a team, you can try it out with their forever free plan, or get 35% off any paid plan with the special code TESTANDCODE, all one word. Release features faster with less risk with ConfigCat. Check them out today at configcat.com. A long, long time ago I did a lot of research trying to compare unittest and pytest. It wasn't good enough for me to just say, hey, one's cool, let's start using it; I compared them a lot. And one of the things that I was excited about with the pytest support for unittest subtests is that subtests were the one thing unittest could do that pytest didn't support for a long time. Before the pytest-subtests plugin, you could run a unittest test that has subtests in it, but the behavior was wrong. Without the plugin, what happens is it stops: if there's a failure, it just stops, it acts like a normal test failure, and it doesn't keep going. If everything passes, it'll merrily go through and pass everything just fine. But the reason why I am interested in this plugin is that now, with it, pytest fully supports all of unittest's features. So it behaves a little different, but actually I don't know if it behaves different, I haven't checked recently. If you write a unittest test that uses subtests, you can run it with pytest. But along with that, not only that, you don't have to write unittest tests: if you want to use subtests with the plugin, you can do a subtests.test() and it acts like a subtest within a pytest test.
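
Here is a minimal sketch of that subtests fixture from the pytest-subtests plugin; the dictionary being checked is just a made-up example:

```python
# Requires the pytest-subtests plugin: pip install pytest-subtests
def test_config_values(subtests):
    config = {"retries": 3, "timeout": 30, "verbose": False}
    expected = {"retries": 3, "timeout": 30, "verbose": False}
    for key, value in expected.items():
        # Each block is reported as its own result, and a failure here
        # does not stop the later blocks from running.
        with subtests.test(msg="checking key", key=key):
            assert config[key] == value
```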

00:08:39 But they're weird, and a lot of people don't like them. So you said you got into them because unittest doesn't have parameterization and you could use them that way. So they are a way to kind of loop, or to have multiple checks that keep going and going. So it's this idea of one assert per test sort of thing, right? Do you try to adhere to the one assert per test rule then? Yeah.

00:09:05 I mean I’m not doctrinaire about it, but I think it is a good heuristic for how to design your tests. And I think I should clarify. I would consider multiple assert statements to be a single assert if they’re basically testing one property, right?

00:09:22 If you could write a function that says assert X, and X is some meaningful quality of the object that you're testing, I think it still makes sense to be able to just sort of inline that. But given that caveat, I do think it makes sense to try and keep to one assert per test, just because it's nice to keep it logically separated in case you want to run small subsets of these tests. It also helps to identify the source of a bug by looking at the pattern of failures; that's probably the more common use. I do think subtests help a lot with that, and they actually help with it more than just in the sense of test parameterization, right? So with pytest, you use this decorator where you enumerate all your cases and you decorate the function, and then it passes each of the parameters to the function; it runs one test per element in the list of test cases. With subtests, you have one test, and then you can just mark a small section of it as another test. And I think you're right that it's weird in the sense that it has a whole bunch of behaviors that you're not really expecting when you're thinking about writing a test, when you're thinking about handling a test. So it gets a little fuzzy what a test is. So sometimes you could use it for doing something like I said, where you're testing multiple properties and they're essentially the same property. I'm probably going to keep going back to the well of the time and time zone tests as my examples, because I've just been writing an enormous amount of time zone code recently trying to get this PEP 615 off the ground. So for example, say I'm testing the tzname attribute, or the tzname function, which tells you, at a given datetime, give me EST versus EDT, and I'm also testing the utcoffset function. So if I want to get EST plus the -05:00 offset, I need to call two different functions. But really that's sort of one function, because all you're doing is figuring out which set of rules applies, and then each of those functions under the hood just has to look it up in a table or something. So you could consider that to be one property of the test, does it select the right offsets? Or you could consider it multiple different properties of the object that need testing, because you're testing separate functions. So I sort of do a mix of putting those separate things under subtests or just having them be regular tests. I think I mostly do it based on how likely it is that they're going to independently fail.
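
A simplified sketch of the kind of grouping Paul describes (not his actual PEP 615 tests), checking two functions that reflect the same underlying rule selection:

```python
import unittest
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo


class TestEasternRules(unittest.TestCase):
    def test_standard_time_rules(self):
        # Both checks probe the same rule selection (standard time),
        # but each one is reported on its own if it fails.
        dt = datetime(2020, 1, 15, tzinfo=ZoneInfo("America/New_York"))
        with self.subTest("tzname"):
            self.assertEqual(dt.tzname(), "EST")
        with self.subTest("utcoffset"):
            self.assertEqual(dt.utcoffset(), timedelta(hours=-5))
```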

00:11:53 The normal way you fail a test is by asserting, and assert also has that other behavior that it doesn't continue afterwards. So within a subtest block, you still only get one; the first failing assert will stop your execution. If you want to have multiple checks and you want them all to run, they need to be in different subtest regions, or subtest contexts. However, an interesting thing that you brought up in your article was that sometimes, even with parameterization, you're doing the same setup to get ready to check something, and that might be time consuming. To then be able to go and run multiple sub-things that you're checking, whether they're really aspects of the same test or the same behavior, or even if they're just different things you need to test but with all the exact same setup, I think it's perfectly reasonable to have those be separate subtests. And with the support that we have now for subtests, they kind of show up in the output as separate tests also, so it isn't like they're completely hidden away. You even brought up that you could possibly, let's say if you have a parameterized test and you want to have some initial work that's done and then do all these different parameterizations, try to have some shared work around a parameterization.
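
A sketch of that "do the expensive setup once at the top of the test, then check several things in subtests" shape; load_big_dataset and summarize are hypothetical stand-ins for the expensive setup and the code under test:

```python
def test_summary_properties(subtests):
    data = load_big_dataset()   # hypothetical expensive setup, done once
    summary = summarize(data)   # hypothetical code under test

    with subtests.test("row count preserved"):
        assert summary.rows == len(data)
    with subtests.test("no missing values"):
        assert summary.missing == 0
    with subtests.test("mean within expected range"):
        assert 0 <= summary.mean <= 1
```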

00:13:22 There isn't a fixture, what do they call it, scope, there's not the correct scope that says just around this one test, do it once for all parameterizations. I think it's something that is known, because there'd be a great need for that. But even if we had that in pytest, it still is, like you said, pretty clean to have the setup code just right there at the top of the test and then go into the subtests. So there are lots of aspects of testing where maybe there's some weirdness around subtests, there's like the one assert per test thing or whatever, there are lots of rules of thumb around testing to make clean code. But the real answer is you've got to have maintainable tests, and if using subtests makes for maintainable tests and they help, then awesome. The thing that I am a little leery about is that you kind of have to jump in and become an expert at subtests just to use them. Don't you think it's hard to be a casual subtest user?

00:14:25 Well, I think it depends in some ways on your use case. I guess, for some background, for people who aren't super Brian fans and haven't been following all the little nuances: one of the bigger issues that you would come across with subtests is that the notion of counting the tests is weird. So if you're looking at numbers of tests passing or failing, I go into details about it in the article, and even that just links out to a whole bunch of other bugs and stuff, where it's just not super straightforward to say how many tests failed. Was it just a top level test, or was it any of the subtests? But once you get past that, it's really, in my opinion, a pretty straightforward concept, with the main difference being that it's so rarely done that it's just not super well supported in a lot of tools. So I would say if it was as well supported as parameterized tests, I don't think you'd have to be a super expert in it, right? I mean, if everyone said the way you parametrize tests is that you write a list of test cases in your test function and then you just loop over it, and then you specify which sections go into what test, I think people would get that. I think there are some other use cases that are less straightforward to think of, like the one you mentioned with the setup, because people might be thinking in terms of calling the setUp function or setUpClass or something like that. But I think once you know about those use cases, again, it's fairly easy to use, right?

00:15:54 It is pretty easy to use, right. I guess you can be a fairly novice user and just throw that in, and it doesn't look weird if you ran across it. It wouldn't be surprising. I mean, if somebody new to subtests were reading it, I don't think they'd be confused about what it was.

00:16:12 I mean, someone new to your tests is probably going to be much more confused about how fixtures work than they are about subtests. And yet I'm a big proponent of fixtures. I think they're great and super useful.

00:16:28 The bar is pretty low and subtest clears it pretty easily.

00:16:32 And the counting works better than it did at first, so maybe I just tried it a little too early. One of the things is that I tried it right away, because I was super excited about them. And I also run tests with the junitxml output plugin and pipe that, or however you hook it up, to have Jenkins read those XML files to see which tests pass and fail and how many ran. The issue at first was that it was possible to get a fail count larger than the number of tests. That's since been fixed, because with the JUnit plugin now, the number of tests that you have, as far as I can tell from playing with it, looks like it equals the number of subtests you have plus one for the test itself. Because if every place failed, that's how many failures you'd get. So it looks like that's how the junitxml plugin is counting. So one test with three subtest sections or something would count as four tests in the JUnit output.

00:17:46 And that is good, because when I tried it right away, I would get more failures than tests and it would break, or the Jenkins plugin would reject it, which makes sense; I wouldn't know what to do with that either. The passes and fails should add up: pass, fail, skip, and error should add up to be about what the total is. There are other formats; that's a completely different topic, though. I'm annoyed that I can't map the pytest output directly to Jenkins yet, because xfail and xpass don't map very well.

00:18:22 I don't know about those sorts of exports, but I will say that there's probably still some work to be done, or maybe it's just education of the users about what these are supposed to say, because it was things you've said in the past that prompted me to look into this. And I did notice that it's a little weird when you run unittest, just pure unittest with no pytest. If you have two tests and they each have three subtests, and all of the subtests fail, it will say ran 2 tests, failures=6. And then if you run it under pytest, it counts each test function as a test in terms of passes, for each passing test, but it counts each subtest as a failure or a skip in terms of failures and skips. So the number of passes plus failures plus skips in the little summary at the very bottom does not always add up to be the same number.

00:19:14 That's, I guess, something to be aware of if you're relying on that number.

00:19:18 Yeah, but I think that’s probably for humans, and it’s like if you’re looking at it, you’re not going to be terribly confused.

00:19:25 If you have zero failures, you don't really care. And if you do have failures, you're probably much more concerned with the contents of the failure than with the number of things that failed. Yeah, I guess I can say that. Just to jump back to this idea of the setup and resource thing: it's interesting, now that I'm thinking about it more, because you say I did all this preparation, but really, I just threw out a blog post last night and stayed up a little later than I should have.

00:19:53 Sorry about that.

00:19:55 No, it's really on me. I'm sure you would have been happy if I had just come on and gone off the cuff, but as I've digested it a little more, I'm realizing that the idea of having a little bit of expensive setup right in your test function and then iterating over it, and this other idea, or at least another pattern that I identified as being useful for subtests, which is when you want to probe the state of a system as it evolves, are actually sort of two sides of the same coin.

00:20:26 Right?

00:20:27 So the first idea would be, if you have something that's expensive to acquire, you have to load an entire database and then you want to query five things about it, it seems fine to do that with a single sort of function-scoped action there, as long as the resource that you're acquiring is immutable. Right? If there's a database that you can't write to, then you're just going to read it, have it in memory, do a bunch of stuff to it, and then release it as soon as the function is done. On the other hand, if you have something mutable, like the example I give is this cache. The ZoneInfo cache is very important in PEP 615, and I want to do something that populates the cache, and then I want to try and hit the cache from different angles, right? And each time, there's a potential that I'm going to mutate the state of that cache. Right. So I construct a ZoneInfo object, and then I dump it to a pickle, and then I load it from the pickle. And I expect that the thing that I loaded from the pickle should be the same object that I created from the regular constructor. Right. So I can test that those two things are the same. But then the question is, when I loaded it from the pickle, is it possible that I disturbed the cache, that I created a new object, entered it into the cache, and then returned the old object, and so I've now mutated the cache? So what I want to do is the exact same test, but then immediately after that I want to get another one from the pickle, or another constructor, and then test that that's the same. So in that case I have something that I'm mutating, and I want to know different aspects of how it evolves over time. And again, having subtests allows me to save all that setup, because otherwise I would have to write the whole first part of the test, and then copy-paste that whole first part of the test and add one more assertion, and then copy-paste that whole thing. So I think those two are two sides of the same coin, and they're one of the more useful patterns for using subtests in a way that's not just, hey, you can also do parameterization without the pytest.mark.parametrize decorator.
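
A simplified sketch of that pattern (not the actual CPython test), assuming the ZoneInfo cache behaves as PEP 615 describes, where unpickling by key should hand back the cached instance:

```python
import pickle
import unittest
from zoneinfo import ZoneInfo


class TestZoneInfoPickleCache(unittest.TestCase):
    def test_pickle_round_trip_uses_cache(self):
        key = "America/New_York"
        zi = ZoneInfo(key)  # populates the cache

        with self.subTest("unpickling returns the cached instance"):
            rt = pickle.loads(pickle.dumps(zi))
            self.assertIs(rt, zi)

        # Probe the evolving state again: did the round trip disturb the cache?
        with self.subTest("cache is still intact afterwards"):
            self.assertIs(ZoneInfo(key), zi)
            self.assertIs(pickle.loads(pickle.dumps(zi)), zi)
```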

00:22:27 I definitely agree. And also there's the notion of having a workflow test. We often talk about a good test pattern; I don't really say one assert per test, because I often have tons of asserts, but I try to have them, like you said, focused at the end, where I'm probing different properties of the result that I'm looking for. There are other workflow tests that don't follow a given-when-then or arrange-act-assert pattern, which are just doing a whole bunch of stuff in series, and along the way doing asserts and checking for things. The warning around workflow tests is that it's really painful if you have a lot of them, because when they're breaking, they don't tell you much, but when they're passing, they tell you a lot. They tell you that a whole bunch of stuff is working. Right. I don't recommend people have most of their tests like this, but having a smoke test that does a whole bunch of stuff through a system and checks to make sure everything's right is an incredibly efficient way to do some testing. And subtests allow you to do that. I guess one of the issues is that you'd have to be careful doing workflow tests with subtests, because they're only valid for things that you're okay with continuing on from, because the control flow will go past the subtest block and keep going, even if it fails.

00:24:07 Even if it raises an exception. That’s another important notion, right?

00:24:11 It will swallow exceptions and keep going.

00:24:14 Yeah, that’s right. I forget that that’s not just common knowledge.

00:24:18 An assert or any exception will cause a test to fail, but it'll also get eaten by the subtest then. Is that true for unittest also?

00:24:26 Yeah, it's true for unittest.

00:24:27 Okay.

00:24:28 Because I think the way it works is it just catches every exception, including assertion errors. And then instead of doing anything else with it, it just marks it as a failure and then keeps going.
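
A minimal sketch showing that behavior: the exception raised inside the subTest block is recorded as a failure for that subtest, but the rest of the test still runs:

```python
import unittest


class TestExceptionsAreSwallowed(unittest.TestCase):
    def test_keeps_going_after_exception(self):
        executed = []
        for i in range(3):
            with self.subTest(i=i):
                executed.append(i)
                if i == 0:
                    # Recorded against this subtest, then swallowed.
                    raise ValueError("boom")
        # Control flow continued past the failing subtest.
        self.assertEqual(executed, [0, 1, 2])
```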

00:24:41 You don't learn to write readable, maintainable, idiomatic Python overnight, or even in a course. Just as you need practice to become fluent in a human language, you need practice to become fluent in Python. Now in its fourth year, Weekly Python Exercise is a family of courses that give you such practice. Here's how it works. On Tuesday, you receive a problem with pytest tests. On the following Monday, you receive a detailed solution and explanation, and in between, you compare notes and solutions with other students and get extra help during live monthly office hours. It's run by Reuven Lerner, a full-time Python trainer. New cohorts start every month or two. Learn more, including samples and schedule, at weeklypythonexercise.com, and become a more fluent Python developer today.

00:25:28 A couple of the other issues, and these are big problems for me, that you brought up in your article are that xfail and stop-on-fail both don't work with subtests. That's a bummer.

00:25:43 Yeah. No, it really is.

00:25:45 Okay. So I'll say that they don't work with the pytest-subtests plugin. It sounds like, basically, in my whole article, I think you could replace most of the stuff about subtests with your pytest-check plugin, and almost all the same stuff would apply. The patterns might be slightly different. And it sounds like you got pytest-check working with those features, right?

00:26:08 Yeah.

00:26:09 The pytest-check implementation, I started with a really horrible implementation, and then something that I thought made sense of just keeping a list of all the failures. It quickly became obvious, well, slowly for me, but it was obvious to somebody contributing, that just wrapping everything in a try block, catching any exception, and then using that information to figure out what's wrong is a very clean, elegant way to deal with it. So all that code makes sense. But the code to make it deal with stop-on-fail and xfail correctly, luckily it's only like ten lines of code, but it's the ugliest code in the plugin. So I'm not surprised that that was kind of forgotten in the pytest-subtests plugin. But maybe it's something that could probably be added, because I know I depend on xfail and stop-on-fail all the time.
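
For comparison, a minimal sketch of pytest-check's soft-assert style, assuming the check.equal and check.is_true helpers from the plugin (the result dictionary is a made-up example):

```python
# Requires the pytest-check plugin: pip install pytest-check
import pytest_check as check


def test_several_soft_checks():
    result = {"status": "ok", "count": 3}
    # Failing checks are collected and reported together at the end
    # of the test instead of stopping at the first failure.
    check.equal(result["status"], "ok")
    check.equal(result["count"], 3)
    check.is_true("status" in result)
```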

00:27:06 At the very least, stop-on-fail is something that I'd want to get working.

00:27:10 Yeah, I have a lightning talk that I've given a couple of times on xfail, so I'm a big xfail proponent. But yeah, I can probably do without xfailing subtests; it's not that hard. The stop-on-fail stuff has been very annoying, because in zoneinfo I'm testing datetime edge cases, and there are a lot of them. So I'll write one test and I'll parameterize it maybe with 100 different parameters. And the way stop-on-fail is broken is that it will stop on the first failure, but it considers the first failure to be the first failed test function. So if you hit a failure in something that doesn't have any subtests, or that has subtests but there are only like four of them, you'll just see four failed tests and it won't continue going on. But if you have a failure in something that has 160 subtests in it, you'll see 160 failures, and it'll be really annoying. So the XML thing, I think, is just a straight-up bug that can probably be fixed fairly easily, is my guess; I haven't looked at the code. The stop-on-fail thing, that could be a bug, and I suspect that they'll at least have --maxfail=1 fail on the first failing subtest. But it is a bit of an issue where, if you want to say, all right, I want at most three failures to show up, do you want three subtest failures, or do you want three top level test functions?

00:28:37 This is one of the main problems with subtest support in general, with introducing the concept of subtests after everyone has already established the nomenclature for what a test is and started building UIs around it: now we have this fuzzier thing of how many tests did I run, how many tests do I want to allow to fail. And I think that's probably going to be a UI issue that pytest-subtests is going to have to work out, but I trust that they will. And maybe, worst case scenario, they have to add a max-subtest-fails option and then we're back in business, right?

00:29:12 Yeah.

00:29:13 I was heartened, when I wrote this article, to see that most of the issues that I was finding, the things that feel like they could be blockers, are just bugs, because pytest-subtests has 50 stars and very little adoption so far.

00:29:26 Yeah. Was it Bruno that worked on it? I can't remember, I think so. It was a little bit cringey that we were going to talk about it on the podcast, and I hope he's all right with that, because I wanted to talk about it. I'm thrilled to talk about it, because I like it, it's interesting. There are issues around it that need fixing, but it's still cool, and there's hardly anybody that knows about it. So I was actually thrilled that you knew enough to talk about it. It's going to be cool. My take on it would be that a test with three subtests counts as four in the XML output.

00:30:00 Okay, that seems reasonable. So let's just, I guess, say that that's what it is. That would be my druthers: to just say every test itself counts as a test, and any subtest counts as a test also, for counting purposes. And the reason why it's important isn't just output, it's things like you said, how many failures. If you say stop after three failures, it gets weird, but I guess that's the same with parameterization, right? So if you've got four parameters and you say stop after two failures, it's going to stop after the second parameterized failure.

00:30:40 So it's really just a matter of communicating, or providing a way for users to communicate their needs or their desires, as long as users know what it means to stop after a certain number of tests. And I think most people probably care about stopping after a certain number of subtests, because they're just trying to limit the back scroll. But I guess there could be some people who are like, each one of these tests is very expensive, so I want to stop after the first one, but each of the subtests is very cheap, so I only want to stop after the first top level test. So I can imagine there being use cases for stop-after-X-failures being one or the other. Probably the more common use case is going to be the one where you stop after one subtest failure. But I think it's really just a UI question of how these things get included. I guess the other thing that you have to think about when you're talking about counting, and I think this is something you pointed out that's still pretty weird, is that, at least in pytest's reckoning, if you have one test with three subtests and they all fail, you'll see one dot and the Fs for the failed subtests, right, because it'll say something like 3 failed, 1 passed. Which is weird, because it's just one test and all it has is subtests, and every single one of those failed, but it still considers the top level test as passed. I personally think that the top level test should be considered failed if any of its subtests failed. Right?

00:32:17 I'm on board with counting the top level test as one test, and then all the subtests as separate tests, so that for three subtests you get four total tests. But I'm not on board with "there were no errors that weren't in a subtest, so this test passed," even though all of its subtests failed.

00:32:33 Yeah, and that's a weird one when, for instance, like you said, I'm using user interface or UI stuff on top of it. So if I'm looking at my Jenkins report and I go through and look at all the passing tests, everything that passed, it's going to include tests that had failures in them. And that's weird. I don't like that. I think that should be fixed.

00:32:59 I think it’s intentional, but I think it’s a bug. I don’t think that’s cool.

00:33:03 But at the very least, there should probably be some way for users to get at the information. I mean, the ideal situation would be if pytest could report, even if it's just in a machine-readable way, the exact breakdown of, like, this many subtests failed, this thing failed, this thing did not fail outside of a subtest, but it did fail.

00:33:27 But one of the subtests did fail. And then you could configure your plugins however you want, right?

00:33:32 Yeah. And that's one of the reasons why xfail has a strict mode. So we could have subtests have a strict mode or something as well, that says any failure in a subtest causes the top level test to fail also. That would work.
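
For reference, the strict xfail behavior mentioned here looks like this in pytest; the marked test and its deliberately wrong assertion are made up for illustration:

```python
import pytest


@pytest.mark.xfail(strict=True, reason="demonstrating strict xfail")
def test_known_broken_behavior():
    # With strict=True, an unexpected pass (XPASS) is reported as a
    # failure instead of being silently accepted.
    assert sum([1, 2]) == 4  # deliberately wrong, so this xfails
```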

00:33:48 I wonder if that's why the -x is running the whole test, because it's only considering the top level test.

00:33:57 I don’t know. Possibly. Like I said, it’s only got 50 stars.

00:34:02 It's not that old, and there's not been a lot of attention to it, so there might be just some kinks to work out, and that's fine. So anyway, one of the things I did not do, I have to admit: I read some of the information you sent me, but I did not read your dissertation.

00:34:21 Oh, no. Brian has come unprepared.

00:34:23 Yeah, but you have a dissertation. That's so cool. I know it's a total change of subject, but you wrote a dissertation as part of your education, right?

00:34:34 Yeah, my PhD for grad school.

00:34:36 What is your PhD in?

00:34:38 Physical chemistry. I was a chemist for undergrad, and then I went to UC Berkeley to study physical chemistry, and when I was there, I worked on instrumentation. So I built nuclear magnetic resonance devices. It's actually a niche subfield called zero-field NMR, which does not use superconducting magnets. So, yeah, I would build devices, and there was a surprising amount of programming in it, because I was writing consoles and stuff, but it was more optics and electronics and stuff. And then when I graduated, I did two years in industry, working in oil services, building inside-out MRI machines that go underground. But then, just being good at Python, it turns out that you can get jobs really easily that don't require you to be near 40-ton machines all the time. So there are better work-from-home possibilities. Just, like, slightly better.

00:35:33 Yeah. So are you working from home now?

00:35:35 Well, yeah, everyone’s working from home.

00:35:37 Everybody that can, right?

00:35:40 Yeah. Well, anyway, Google makes me go into the office when there’s not a pandemic on, but there’s still more work from home opportunities. Like if I work from home once a week, it’s a lot easier than when I was a hardware guy and had to go in and solder things or run some sort of mechanical tests on something. Okay, yeah, but yeah, I don’t have a remote job right now, but I live in New York and at some point in the future I think it would be nice. It’s nice that there’s an option for it.

00:36:09 I think that the world is going to change after all of this, because the company I work at was not intentionally remote, but we did have some people that worked remotely because of extenuating circumstances. And now everybody has an extenuating circumstance as to why they're working from home. And luckily, I feel very grateful that I work in software, because it's something where it's possible to do this. I have plenty of friends that are not able to, but I don't think I want to go back to having to commute for an hour and a half every day.

00:36:42 Yeah, well, I mean, the thing is, I have 45 minutes to an hour commute each way as well. All my friends who have apartments in Chelsea and stuff, they’ve got a ten minute walk to work and right now they’re crammed in their little apartments and I’ve got my own full office. I’m like, ha ha. For this one circumstance, I made the right choice. Yeah.

00:37:02 Awesome. So again, I want to thank you for being willing to talk about subtests. There are not a lot of subtest nerds in the world. I may know all of them.

00:37:13 I’m not sure.

00:37:14 Probably at least the Python ones. Right.

00:37:16 But I'm glad that we talked about it. I think it is something that people should consider when structuring their tests. There are a lot of tools available, so why not use this as well? I'll definitely link to your article, because I think it's an amazing introduction, actually not just an easy, simple introduction to subtests, but to some of the quirks about them as well. But I don't think it'll go away. It was listed as beta for a little bit, but I don't think it will go away.

00:37:44 I mean, it's not built into pytest, right? It's just a plugin right now.

00:37:48 Right.

00:37:48 So hopefully someone will be able to continue maintaining it. If we can get the core devs or someone to agree to start testing the standard library with pytest, which I do not think is terribly likely, but stranger things have happened, it would probably be close to a necessity to keep it maintained. As it is for now, I guess worst case scenario, there's probably always going to be the niche of people who like pytest and are writing stuff that is eventually destined to be in the standard library. It's been very helpful for that.

00:38:23 Since you work with the standard library, do you have to jump back and forth between unittest and pytest then?

00:38:28 Yeah, well, also I work at Google, and they use Google Test, or absltest, or whatever it's called, which is essentially unittest. Okay. So I have to jump back and forth. It's not that bad. I always feel like when I'm using unittest, it's a little clunky, but I've been trying to approach it with an open mind, and I have been surprised at how well I can express things in unittest that are more natural for me to express in pytest. I've also been sort of trying to assemble the case for pytest that I can pitch to the Google teams that make decisions about this sort of thing. So every time I go and find something hard to do in unittest, I want to put it in my little list. And I've been really surprised at how well things like setUpClass and setUpModule, and then some other stuff around context managers and cleanup, are really able to take the place of fixtures in a lot of situations that I would normally just write a fixture for and call it a day. I still think that there's a lot of benefit to pytest, and the reporting of pytest is great, but for the most part, writing unittest tests is not a terribly different experience for me.

00:39:41 Okay, yeah, I find them jarringly different, but I make very heavy use of fixtures, and levels of fixtures.

00:39:54 That's where I try to put most of the slow work. And I know you can sort of get away with that with setup and teardown, but you can't cross module boundaries with unittest very easily.

00:40:09 That I know of, anyway; maybe there's some way. So, like, session-level fixtures. I don't know if unittest has a session-level way to initialize a database and have multiple test modules be able to run against that, sort of thing.
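
A sketch of the session-scoped fixture pattern being described here; make_test_database and its drop method are hypothetical helpers:

```python
# conftest.py
import pytest


@pytest.fixture(scope="session")
def database():
    db = make_test_database()   # hypothetical expensive setup, runs once per session
    yield db                    # shared by tests in any test module
    db.drop()                   # hypothetical teardown after the whole session
```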

00:40:22 Can you just put it in a module and then import the module?

00:40:27 I don’t know. It just still seems like the setup and tear down will probably happen.

00:40:33 Oh, the tear down for the module. That’s going to be tougher.

00:40:36 Yeah.

00:40:37 Anyway, I've got the weird, I guess it's not too much of a corner case, but working with hardware and long run times, I really care about setup and teardown times and stuff, so those are great. But actually unittest isn't terrible. It'd be really cool if we started testing core with pytest. So I have one more question, actually I probably have lots. You're a core developer, but you work on the Python code, right? You're not one of the CPython people.

00:41:08 CPython is the name of the reference implementation.

00:41:12 Yeah, I guess. Right. So what I meant was, there are some of the people that write the C code that implements Python.

00:41:18 There's not a super clear distinction between roles. Like, once you're a core developer, you can commit anything in any module. I shouldn't do that, and I tend to defer to the experts on things. So I've mostly specialized in datetime and a couple of other minor things. Packaging is a big thing, because I also maintain setuptools. I'm one of the maintainers of setuptools, and probably the worst maintainer of setuptools, but I am a maintainer of setuptools. So I happen to know a lot about packaging.

00:41:46 But if you're asking if I write a lot of C, I write a decent amount of C, because datetime is written in C, and a lot of the improvements that can be made are around efficiency and things like that. So there's the zoneinfo module that I wrote for PEP 615.

00:42:02 It has a C extension. It's actually a very interesting testing topic that maybe we can address at a different time, but there's this thing called PEP 399, which says that all new modules added to Python core that have an accelerated C extension, like zoneinfo, pretty much all the stuff that is fast and in the standard library, must also have an equivalent pure Python implementation that can be used in place of the C extension if you can't build the C extension. So to write PEP 615 with a C extension, I had to write the entire thing in Python and then write the entire thing in C as well. And they have to do the exact same thing. I mean, the spirit of it is that they should do the exact same thing in all situations. In practice, the way it's worded is just that all the tests that pass for the C implementation must also pass for the pure Python implementation. But if someone reports a bug to me that says, hey, the C implementation of datetime does this and the pure Python one does that, what do we do? We have to harmonize it. And that's useful for, I think, other implementations of Python like PyPy, because they don't get the same advantages from the C extensions, because PyPy is JIT compiled. So I think they mainly use just the pure Python implementation of datetime. And the only use they have for the C extension is that they have some wrappers that support C extensions, so they have to wrap the C API to make calls into the pure Python side.

00:43:37 You can’t make use of Hypothesis either, right?

00:43:42 The Language Summit is coming up next week, and Zac Hatfield-Dodds is pitching the use of Hypothesis for PEP 615. I've been heavily using Hypothesis, and I think we're pitching that we should be testing the standard library with it. I think that came about because, when I was writing something, eventually someone found a segfault bug in one of my datetime additions, the fromisoformat method. And I was like, well, I wrote Hypothesis tests for this before it got merged in, but this was enough of a corner case that it didn't get hit in the first couple of runs. And if I had merged the Hypothesis tests and they were running on PRs, within a week I would have found this and it would have never hit a release. But I had to throw them away. So right now, Zac is maintaining a separate repository with tests just for the standard library. And I think the plan is to pitch that there should be some official project, whether it goes into the CPython repository and we run it on PRs, or we just add some build bots that once a week run the full suite, or it's run continuously in fuzzing mode.

00:44:55 That kind of test, run two implementations and throw random stuff at it, make sure that they both get the same output.

00:45:03 That's an easy Hypothesis test to put together.
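
A sketch of that kind of differential test; c_impl and py_impl are hypothetical handles to the C-accelerated and pure Python versions of the same parsing function, not real module names:

```python
from hypothesis import given
from hypothesis import strategies as st

# c_impl / py_impl: hypothetical imports of the two implementations under test.


@given(st.datetimes())
def test_implementations_agree(dt):
    text = dt.isoformat()
    # Both implementations should parse the same text to the same value.
    assert c_impl.fromisoformat(text) == py_impl.fromisoformat(text)
```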

00:45:08 It already found one bug. I did the exact same thing, and it found a bug when I was trying to merge it.

00:45:12 Oh, nice. Okay, interesting.

00:45:15 Well, hopefully we get Hypothesis in, and that would be a good reason to pull pytest in, too, because Hypothesis plus pytest works really well.

00:45:26 Yeah, it certainly works better than Hypothesis with unittest, but unittest works okay. There are just a couple of weirdnesses, like the setUp function doesn't get run every time on every Hypothesis example. And there are some other things. Subtests with Hypothesis are just completely broken, but with Hypothesis and pytest-subtests, it does work.

00:45:48 Okay. Yeah, I think it would be cool to talk more about your work with C extensions, testing extensions, and testing the Python core stuff. That would be a great topic. Also interesting.

00:46:02 Cool.

00:46:03 Sure. Well, let me know when people who listen to this have had enough time that they won’t be sick of me.

00:46:13 I’ve already got an hour.

00:46:14 You're a better writer than I am; I'm a little jealous of that. You said you wrote this last night, but it's a great article. I do have one little bit of advice. It's just anecdotal and may not work for everybody, but I went through the process of trying to convince my work environment to adopt pytest, so my process was to write a book on pytest and then ask, and it worked really well. Just saying.

00:46:41 Do you think this will work if I write a novel like a science fiction novel?

00:46:44 I would love that. I don't know if it would work for you, but it definitely would benefit me to have you write a science fiction novel that had pytest in it.

00:46:52 Yeah, I think it'll be... I'm already seeing it. Aliens come, all their stuff runs on unittest, they've got patches to core and unittest, and the only way to stop them from taking over the world is to convert Google's code base over to using pytest. So as soon as that wins, like, a Hugo Award and a Pulitzer Prize, I suspect that they'll be like, you know what, we've got to do it now, as a reference to the book.

00:47:19 Yeah, I think so. We’ll get started on that.

00:47:22 All right. I’ll do it first thing Monday morning.

00:47:24 Yeah, and this probably won’t come out for a couple of weeks. If you want to do a landing page to try to get people waiting for the release, we can link to it anyway.

00:47:39 This is a lot of fun. Thank you for coming on and we’ll talk to you next time. Great. Cool.

00:47:44 Thanks.

00:47:48 Thank you, Paul. I really enjoyed geeking out about subtests with you. Thank you, ConfigCat, for sponsoring; configcat.com's feature flag service lets you release features faster with less risk. Thank you, Reuven, for sponsoring; Reuven Lerner's weeklypythonexercise.com will help you become a more fluent Python developer. Thank you to listeners that support through Patreon; join them by going to testandcode.com/support. All those links, and more links, including Paul's awesome article on subtests, are in the show notes at testandcode.com/111. If you do start using subtests, I'd love to hear about your experience with them. That's all for now. Now go out and test something.