Software tests should be order independent. That means you should be able to run them in any order or run them in isolation and get the same result. Adam Johnson created pytest-randomly to ensure test independence.

Transcript for episode 128 of the Test & Code Podcast

This transcript starts as an auto generated transcript.
PRs welcome if you want to help fix any errors.

00:00:00 Software tests should be order independent. That means you should be able to run them in any order or run them in isolation and get the same result. However, system state often gets in the way, and order dependents can creep into your test suite. One way to fight against this is to to randomize the test order, and with pytest, we recommend the plugin pytest randomly to do that for you. The developer that started pipest randomly and continues to support it is Adam Johnson who joins us today to discuss Pytest randomly and another plug in. He also wrote called Bytes Reverse. This episode of Test and Code brought to you by PyCharm. Save time, use PyCharm and by Honey Badger when bad things happen, it’s nice to know that Honey Badger has your back and by listeners like you who support the show through Patreon and by Talk Python Training do you want to get better at Python? Now is an excellent time to take an online course, whether you’re just learning Python or need to go deep into things like APIs. And async our friends at Talkpython Training have a topnotch course for you. Visit Talkpython. Fm test to find your next level that’s Talk Python FM test. Stick around until the end of the show, also for a chance to win a free course.

00:01:23 Welcome to Test and Code today in Test and Code. Am thrilled to have Adam Johnson here, and we’re going to talk about randomness and testing. Welcome, Adam. Thanks for coming on the show.

00:01:43 Thank you for having me.

00:01:45 For people who are unfamiliar with who you are, can you introduce yourself?

00:01:48 Sure. My name is Adam Johnson. My website is adamj EU. I am a Django core contributor and what I call a solo consultant now helping out a number of small companies with Django projects.

00:02:03 I like drinking tea, I’m drinking some now. And drama based music.

00:02:07 Django Core contributor?

00:02:09 Yes.

00:02:09 That’s cool. How long have you been involved with Django?

00:02:12 I’ve been using it since 2012, I think. I made my first contribution in 2014, and then in 2016 I was voted into the Jacko core team, which has now turned into a change of governance. Pretty much anyone who’s contributed is called a core contributor.

00:02:35 My current position is as a technical board member, so I’m one of the five people who has a Yay on a on major features going into Django.

00:02:43 Neat. You also use pytest?

00:02:45 Yes. Testing with Django’s Test framework is always a little bit unit testy and pytest just offers so much more. So I love using Python on my open source projects and helping clients move over to it.

00:03:01 One of the things I did on the show was we did this top by Test plugins and your pipe test randomly showed up on the list. Why did Pyte randomly come about and can you tell me about that?

00:03:14 Sure. I was working at a company called Wide Plan.

00:03:18 Unfortunately no longer exists, but at the time we were rolling out Factory Boy for our tests, which you might have covered before. This is a tool for generating random data in tests, so you might want to say hey, give me a user, but you don’t want to have to specify the user’s first name, last name, email address, et cetera. Every time you do that in your tests.

00:03:42 And also you want your tests to have to declare that if they depend upon it.

00:03:48 So the solution there is use a Factory Boy factory which generates random data from there. We were looking at controlling it in case we had a failure due to the random data. We wanted to be able to reproduce that.

00:04:03 So this Factory Boy already does it have a seed feature already in it?

00:04:08 Yes, it uses a random generator from the random standard library random module in the standard library, so you can receive that to make it generate the same data.

00:04:22 So what Pyte randomly does is it generates a seat at the start of the run, it lets you pass on in, and then it will receive that random generator and several others, including the default one in the random module to control that.

00:04:35 So that’s one of its two responsibilities. The other responsibility is test order.

00:04:44 Let’s face it, your code is going to have errors, even code written by a kickass developer such as yourself. And when bad things happen, it’s nice to know Honey Badger has your back. Honey Badger makes you a DevOps hero by combining error monitoring, uptime monitoring and Crown monitoring into a single easy to use platform, all for way less than you’re probably paying now. Honey Badger monitors and sends error alerts in real time with all the context needed to see what’s causing the error and where it’s hiding in your code so you can quickly fix it and get on with your day. The included uptime and Cron monitoring also lets you know when your external services are having issues or your background jobs go AWOL or silently fail. Go to Honey Badger. Io and discover how Star, Josh and Ben created a bootstrap monitoring solution. Why is this important? Self funding means they only answer to you, the developer and not a venture capital Overlord Test And Code. Listeners get 30% off for six months. Simply mention the test and code podcast when signing up and they’ll apply the discount to your account. No credit card required.

00:05:47 Tell me why test Georgia is important.

00:05:49 Okay, so you may have had non isolated tests where somehow a test depends upon mutations to global state like a database or a Python module from a previous test, and then you do something that causes the rearrangement of those tests or the removal of the depend upon test and the test starts failing and it’s really hard to debug.

00:06:15 So what Pyte randomly does whilst it’s resetting the random seed, it also shuffles the order of your tests every test run based upon that seed.

00:06:25 So no two test runs will have the same ordering. And hopefully you’ll expose any dependency but very early on in the process.

00:06:33 Yeah. When I was looking at the documentation of this, it randomly shuffles the order of the test. But the next statement is this is done first at a level of modules, then at a level of classes if you have them, and then at the order of functions, what does that mean and why is that important?

00:06:52 This is mostly to keep tests grouped so that they can reuse fixtures.

00:07:00 Tests inside the same module or inside the same class tend to use the same set up. They might use set up class and unit tests, for example. So by keeping them together you can reuse that if you shuffle them completely randomly and separate them, then they can offer run way slower because that same module or class level set up needs to be repeated at multiple points during the test suite and toned down.

00:07:25 Okay, if I’ve got like I’m setting up a connection to a database and then running 100 tests or something within a parameterization within a module, we still want to try to get that fixture to only run no more times than it did before.

00:07:38 Exactly. Yeah.

00:07:40 Okay, that’s cool. There’s probably some benefit in completely random order, but you get most of the benefit with what you’ve done. The connection with other libraries is pretty neat. So the random order is the thing that I really knew about. But this being able to seed things like looks like you’ve got Factory Boy and Faker and NumPy.

00:08:02 Yeah.

00:08:04 Okay. It also says it works with pipe test exist. So will it re randomly order distributed tests as well then?

00:08:13 Yes it does. The randomization is done exactly the same in each process before they then pick which test to run because all the workers collect the same set of tests and then they pick their slice of which test to run. Okay, so I think the way I had to work around with extras is make them all do exactly the same random shuffle on each worker so they end up with the same random thing and then say workers zero select the first slice and then worker one selects the second slice.

00:08:49 Okay.

00:08:52 I PyCharm is the Python ID for professional developers. High PyCharm is a huge collection of tools out of the box, includes an integrated debugger and test runner, Python profiler, a built in terminal, integration with version control and built in database tools, remote development capabilities with remote interpreters, an integrated SSH terminal, and integration with Docker and Vagrant. In addition to Python, PyCharm provides first class support for various Python web development frameworks, specific template languages, JavaScript, Coffee script TypeScript, HTML, CSS, Angular. Js, node. Js and more. High Term integrates with a Python notebook and has interactive Python console and supports Anaconda as well as multiple scientific packages. Including Mathie. Try out all of these timesaving features of PyCharm Pro for four months with the link Test And Code. Compai PyCharm. Make sure your editor is working with you to save you time. Use pie chart.

00:09:52 If I just put a Pip install by test randomly, it just starts working. It starts reordering my tests.

00:09:59 Is it already doing the random seed of Factory Boy and Faker and stuff like that? If necessary?

00:10:05 Yeah, it detects them automatically.

00:10:08 Also, in the output of each test run, it will have the random seeds that was used to generate it, so you can rerun that exact same test for it if you need to.

00:10:18 Cool to check that out. That seems like incredibly useful.

00:10:22 I am using it on all my open source projects, and I think it has helped find a few isolation errors.

00:10:28 Then I think this is probably a good one to just put this in your test all the time. I can’t think of a reason to not.

00:10:36 This is interesting. You said it’s also available for Nos.

00:10:39 Oh yes, it started on Nose than five years ago or so. At this point it started as a Nodes plugin. Then at YPlan, we ported Pytech, so I ported the plugin. The nodes was no longer maintained, just like Nose itself.

00:10:57 And then in the documentation of Pipe test randomly, we also see a reference to Pipe test reverse, right? Yeah, what is that?

00:11:07 So there’s actually quite a lot of theory around finding non isolated tests, and one podcast randomly user actually pointed me to a paper that had studied non isolated tests in Apache open source projects, and they tested various strategies for finding them, and randomization was one of the strategies they tested, but a much simpler one was simply running the test suite in reverse. So if you run Test ABCD, then it just runs DCPA, and that caught most of the isolation failures. It’s normal for isolation problems to be from one test to another, not a three way isolation where you need tests A and B to modify the state before C runs.

00:11:58 So Python’s reverse.

00:12:01 Well, it came out of that. Plus I noted that this reverse flag already existed for Django’s test runner, but wasn’t available for Pipest. We’ve been overcoming such isolation problems with Django for a while, and by using the reverse flag on CI, but it wasn’t possible with Pita.

00:12:18 Okay, it seems like it’s algorithmically easier, but is it really that much faster than randomly?

00:12:24 It’s probably not noticeably faster for most projects, but if you have any problems with random order, then reverse order is a cheaper, less likely to fail way of finding those isolation problems.

00:12:38 It’s actually pretty cool. I’ll check out both of these, so thank you for putting these in place. This is a great addition to the ecosystem, and I’m interested in reading this empirically revisiting the test independence assumption. It’s a paper that’s linked to from your reverse plugin page bit of a tongue question. So if people don’t know, why is it bad to have dependent tests?

00:13:01 So if your tests depend on order, that means that the one running later. That depends on something.

00:13:07 It doesn’t state everything that it needs to run. So you might be trying to debug it, and you don’t realize that there are two users in the database, even though the test says it only creates one because a previous test has left a user around, for example.

00:13:25 And then you might also delete the test it depends upon and you have a failure in a test that you didn’t touch.

00:13:33 And you can’t figure out why I run my suite and then what do I do if there’s some failures?

00:13:39 So there’s a couple of things I often do. I’ll often run it with X to find the failure the first failure and stop on it, and then I’ll probably run it with a dash and a Dashl, and then whatever other debugging flags I want to just isolate and Zoom in on that failing test and maybe show the locals or something else. Or run it in the debugger or something.

00:14:13 It just drives you nuts when you can run the suite and it’ll show some failures. And then if you run the last fails of the LS and everything passes, it’s just really infuriating.

00:14:26 Yeah, definitely.

00:14:29 You’re giving me a flashback.

00:14:30 If I randomize it and things are not isolated, I’m getting different results with different randomization, then what? How do you debug that?

00:14:39 Yeah, that’s a tricky one.

00:14:41 I think with many things in software engineering, it’s way cheaper to add some kind of isolation guard, whether it’s randomization reversing or whatever, early on, because they’re just as hard to debug as the situation you described.

00:14:58 You have no idea where to start.

00:15:02 Yeah.

00:15:02 Hopefully you can drill down and select a subset of the tests that caused it to fail.

00:15:09 That’s at least the intention of being able to pass in the seed. Again, I have got one open issue, I think, on randomly, on a better way to randomly sort the test that you could say if you had like five modules of tests and you found a random seed that causes a failure, you could drill down one module at a time or pairs of modules with that same seed to see if the dependency is from, say, module A or module B, and you can drill down to classes and then individual tests.

00:15:37 Oh, interesting.

00:15:40 Currently the shuffling is completely random based upon the set of tests given, but this would use more of a hash based algorithm, so that if you put in a subset of the tests plus the same seed, they come out of the same order.

00:15:54 Yeah.

00:15:56 I think starting early in a project is definitely a good thing, because what I’ve noticed, for instance, Pay test has a predictable order, then a module they run in like top down order as defined and because of that, a new person to writing tests sometimes will notice this and depend on it and start writing tests in order that they know that they’re going to run it and you don’t want that. So you know, you could even just help out new people to testing by throwing this in your system early on.

00:16:35 Yeah, exactly.

00:16:38 I think that’s a classic example of Hyrum’s law which I was talking about Google like anything that your API happens to expose like leak as a detail like running tests from top to bottom, people will depend upon.

00:16:55 Interesting. I’m going to have to look that up. Anyway. Very cool plugin and thanks for introducing us and actually we have plans to have you on another episode so we’ll talk to you soon.

00:17:07 Thank you very much for having me.

00:17:12 Thank you, Adam. Cool plugins and great information. Thank you Pie Charm for sponsoring the show. Try pycharmourself at Test And Code. Compycharmsavetime PyCharm thank you honey Badger for sponsoring check them out at Honeybadger. Io mentioned Test And Code and get 30% off when bad things happen. It’s nice to know honey Badger has your back. Thank you talk Python training for sponsoring check them out at talkpython. Fm test to level up your skills. Online Video Courses for Python Developers Enter to win a free course by joining the show mailing list at subscribe thank you Patreon supporters for continuing to support the show. Join them by going to support all of those links as well as links to the projects mentioned in the show are in the show notes at testingcom 1218 that’s all for now. Now go out and randomly test something.