Mutation testing is a way of testing the quality of your test suite. Anders wrote mutmut for mutation testing in Python, and explains mutation testing, mutmut usage, workflow, and more on this episode.


Transcript for episode 138 of the Test & Code Podcast

This transcript starts as an auto generated transcript.
PRs welcome if you want to help fix any errors.


00:00:00 Your test suite tells you about the quality of your code under Test.

00:00:04 Mutation testing is a way to tell you about the quality of your test suite.

00:00:09 Anders wrote Mut mutt for mutation testing in Python. In this episode, Anders explains mutation and testing and how mutation testing with mutt mute works.

00:00:33 Welcome to Testing Code.

00:00:43 Today on Test and Code. I am thrilled to have Anderson on the show. Welcome to Testing Code.

00:00:48 Thank you.

00:00:49 For people that don’t know who you are, could you tell us who you are?

00:00:52 Yeah. Hi, my name is Andershovmader. Anders is fine.

00:00:58 I’m mostly self taught programmer. I’ve been programming since somewhere in the 90s. The memory is a bit hazy. And professionally since 2001, I have way too many projects to get anything done on. Most of them. And I work at Treoptima, which is a financial company. We work on minimizing systemic risk in the global financial system.

00:01:21 Wow, that sounds fun. Is that fun?

00:01:24 It is quite rewarding.

00:01:30 We’re the good guys in the financial system, in our opinion, and you really feel that you’re contributing something.

00:01:41 That’s nice.

00:01:42 What programming languages do you mostly use?

00:01:47 I’m Python all the way.

00:01:49 Okay.

00:01:50 You have to do JavaScript, right?

00:01:53 Obviously, because it’s a web thing, and then we actually do Elm, which is a bit unusual for the front end.

00:02:02 One of the reasons why I asked you to come on the show is mostly to talk about mutation testing, because I don’t think I’ve covered it much before. I think I’ve covered it in passing. But you know quite a bit about mutation testing. Is that correct?

00:02:16 Yes, I believe so.

00:02:22 I wrote the mutation tester and I think it’s the Premier mutation tester for Python. There are two other mutation testers. I’ll get into them a little bit later, I think.

00:02:36 Okay, so which one is yours?

00:02:38 Matmat. Okay, it’s a StarCraft reference if you play.

00:02:43 Okay. I was curious about if it was a reference to something.

00:02:48 Yeah, StarCraft reference, but it’s also mutation. It’s mute mute, and then the logo is muted twice and in a different sort of font. It’s kind of clever. Maybe a bit too clever.

00:03:07 I’m going to go look.

00:03:09 Okay.

00:03:10 It’s nice, but. Yeah. And if I had that as a sticker on my laptop, I’d definitely get questions about it. What’s that?

00:03:25 That’s a good idea.

00:03:30 We’ve actually heard quite a bit about.

00:03:34 I’ve been pronouncing it mutmut. Is that how you say it?

00:03:37 It’s fine. I pronounce it mutmut because it’s the StarCraft reference to mutalisks.

00:03:48 Maybe it’s mutmud. I don’t know.

00:03:50 Okay.

00:03:51 It’s fine.

00:03:54 Nice.

00:03:55 So mutation testing for the person that never has heard of it. What is mutation test?

00:04:02 Yeah, we should probably discuss that with. So actually, a lot of people actually do a sort of mutation testing manually in practice. So what happens is what you want to do is you write a test, you have a test or a series of tests for some bits of code and you go into your working code and you change it in some way that is broken, and then you run your test again. And if the tests pass with your broken code, that’s bad, right?

00:04:38 Yeah.

00:04:39 At least the tests you’re running aren’t hitting that piece of code.

00:04:42 Yeah. Your tests are incomplete in some way.

00:04:45 Yeah.

00:04:46 So that’s what mutation testing is. And a lot of people actually do that manually all the time. You have a bug, you introduce a bug or something in the code because you think something is fishy. You run the test and you see the test all succeed. That’s not good. And then they work on it. So that’s actually a lot of people use it. They don’t actually know that that’s what it is. But mutation testing normally means, of course, automated mutation testing, where you just give it a code base and it just goes.

00:05:26 PyCharm is the Python ID for professional developers. Pycharm’s huge collection of tools out of the box includes an integrated debugger and test runner, Python profiler, a builtin terminal, integration with version control and builtin database tools, remote development capabilities with remote interpreters, an integrated SSH terminal, and integration with Docker and Vagrant. In addition to Python, PyCharm provides first class support for various Python web development frameworks, specific template languages, JavaScript, Coffee script TypeScript, HTML, CSS, Angular, JS, node. Js, and more. High Term integrates with IPython notebook and has interactive Python console and supports Anaconda as well as multiple scientific packages including Map, Lib and NumPy. Try out all of these timesaving features of PyCharm Pro for four months with the link Test And Code. Compaidcharm. Make sure your editor is working with you to save you time. Use PyCharm.

00:06:28 So what is it doing when it goes?

00:06:31 Yeah, so my mutation test there, and actually nowadays cosmic rate too.

00:06:40 We’re playing catch up with each other a little bit.

00:06:43 So what it does is it reads the code or the file it’s supposed to do something with. So it goes through the code and it changes it and it writes it actually on disk.

00:06:56 So it changes. Like if there’s an if A less than B, it’s going to change it to if A less than or equal to B, then it’s going to run your test suite. And if your test suite passes right, then it didn’t detect that I changed your code. So that’s what we call a surviving mutant, and that’s bad.

00:07:19 So the nomenclature is a bit difficult with mutation testing. So there are mutants. Surviving mutants is bad. Killing a mutant is good, it’s not great, but that’s the wording that we use.

00:07:36 Okay, normally killing is bad, right? But in mutation testing, it’s good.

00:07:42 So what does killing and mutant mean?

00:07:44 It means that you change the code and your test suite failed.

00:07:49 Okay. Yeah.

00:07:50 It means your test suite can detect that your code was changed, which is good. That means you’re testing all of the behavior that’s in your code.

00:08:00 Yes.

00:08:01 Making a change, like changing from less than to less than equal or something like that.

00:08:06 Yeah.

00:08:09 I was curious about what kinds of it’s going through and making changes to your code and then rerunning the tests.

00:08:18 So it’s probably making lots of changes one at a time. Or how much does it do? A bunch at a time.

00:08:26 One at a time.

00:08:28 There are theoretical papers out there talking about doing multiple changes at one time, because then you can run because then that’s just faster. You can change a bunch of times and you run the test with once, and that’s just way faster than one by one by one by one by one. The problem is I’ve never gotten that to work because you just get false positives and it’s just a total chaos.

00:08:53 So I didn’t do that. I gave it a shot, and I’ve talked to a bunch of other people doing mutation testers in other languages.

00:08:59 None of them have gotten that to work in practice. So one by one.

00:09:06 Okay.

00:09:07 Yeah.

00:09:07 My first thought on mutation testing was that it would just flip a character somewhere, just any character to something else. But that’s not really what’s going on. Right.

00:09:19 Well, you can do that, but mostly you’ll end up with syntactically Invalid code and nobody cares. Right.

00:09:24 Right.

00:09:27 So what I do is I modify the AST the abstract syntax tree of your code. So the library I use is Parson, which is a fantastic library. Everybody should check it out, even if you’re not interested in mutation testing or whatever.

00:09:46 It’s a spin off from Jedi, which is something you might have heard of, which is the autocomplete system now use by I. Python, actually.

00:09:56 Fantastic. So it’s an abstract syntax tree sort of library that can round trip your code, which means that you can read your code into the abstract syntax pre. Make a change, write it down, and only your change will appear on disk.

00:10:15 So you can use it to make refactoring tools, for example.

00:10:19 Okay, cool.

00:10:20 Which is sort of what mutation testing is. It’s an evil refactoring, kind of, because you’re changing it.

00:10:33 It’s a great library.

00:10:36 I used to use another library called Baron, but that wasn’t maintained. The parcel is extremely well maintained because it’s used by so many projects, so it needs to be straight on the bleeding edge has to support all the new features of all Pythons and everything. So it’s perfect for this.

00:10:54 And another good thing about using this library is because I can make these mutations and write them down to disk without changing everything else. I don’t touch your comments or anything. It’s very easy to actually take a mutant. That’s right. Apply it on disk. You can see it, you can have it on disk. The mutant it’s very nice to work with.

00:11:17 In the beginning, when I started, my competitors like Cosmic Ray and Modify, they actually modified the abstract syntax tree of the builtin Python abstract syntax tree standard library thing.

00:11:33 The problem with that, it doesn’t maintain your white spaces, your comments and everything, so it just throws them away. So when you got a mutant, your diff is not your code and not your code mutated.

00:11:48 And that’s just so frustrating. Right. It’s really hard to work with. Yeah.

00:11:54 The line numbers are going to not line up and nothing’s going to line up.

00:11:58 Yeah. It’s just total chaos.

00:12:01 So now Cosmic Ray has switched to Parson, which is really nice. So we’re both sort of head to head on this front.

00:12:10 Okay.

00:12:12 So there’s logic changes, like flipping the logic. And I’m looking at the documentation. There’s also things like changing hard coded numbers. So if there’s like a five there, it’ll try six and four and stuff like that, and then changing breaks to continue and vice versa.

00:12:32 Yes.

00:12:36 So these are selective. These aren’t just random things. These are things that people actually get wrong a lot of times.

00:12:41 Yeah.

00:12:43 So these are great.

00:12:47 Are there any others that are interesting that people might not know about? Other things that you flip around and change?

00:12:54 Yeah, I modify dictators, for example. This is a fun one.

00:13:01 So I’ll flip the keys on the Dick literal, but I’ll also do it on the DICT literal with you DICT parenthesis syntax.

00:13:13 Okay.

00:13:14 And there’s a config option to put in your own stuff tab because we do like a lot of people have destruct classes, which is just addict where you can access it with dot syntax. Right. It’s fairly common in Python. So we have these at work.

00:13:29 And I add that as a sort of a synonym for Dick. So it mutates the keyword arguments in those. So, you know, if you’re passing in something and it’s just nobody looks at it on the other side, that’s going to be caught.

00:13:43 That’s a nice one, I think. And then I dropped the first I think it’s the first parameter of function calls.

00:13:57 That’s a fun one.

00:13:59 Yeah, it sounds a bit weird.

00:14:04 This sounds great. Tell me the thinking behind that.

00:14:07 The thinking is we found bugs in production on code that we had run mutation testing on and had no surviving mutants on. And those were real bugs, and that mutation would have caught it.

00:14:22 Okay.

00:14:23 So that’s just it.

00:14:26 And we had one really bad one, which was we have this structure library. We use it all the time and it has C implementation, so it’s really fast, which is nice, which is also why we’re very worried, because then we have two implementation. We have Python implementation and a C implementation. So it’s very nice to have a really good test suite. So mutation testing seems the perfect thing. Right.

00:14:56 If you’re enjoying this podcast, make sure to check out Electronic Specifier Insights. Their Editors dig into the electronics industry, how new tech is shaping our postcoded 19 world reviews from all the top electronics shows and the latest tech that electronics companies are releasing. You can find them by searching for electronic specifier insights in any streaming service or by going to electronicspecifier. Comnewspodcasts.

00:15:28 Hopefully, it’s obvious by now, but mutation testing is not about making sure your code is solid.

00:15:36 It’s a judgment of your test code.

00:15:39 Yes, exactly. It’s an extremely strong judgment on your test suite. Right. You cannot have behavior in your code that is not tested. Yeah, there’s no way to do that.

00:15:57 The other, I guess way to try to think about it is like trying to do path coverage. 100% path coverage. But that’s still not going to catch as much as this is.

00:16:10 No, this covers way more.

00:16:14 For example, I muted string literal.

00:16:18 I used to insert xx and then startup the string.

00:16:24 So that means that if you have an assert, for example, with a message anywhere in your code and you don’t actually have a test that says with Pyte races assert error, and then check the exact error message that you get, the mutation tester won’t accept that.

00:16:48 It’s kind of brutal.

00:16:49 That is brutal.

00:16:50 Yeah.

00:16:51 And it finds a lot of bugs.

00:16:56 Let’s say I kind of don’t care about the error message as long as there’s something there. Is there a way to turn that one off?

00:17:05 Yes. So we have white listing and the other ones. Also, I’m talking with the Cosmic Ray, so we try to make our systems a little bit compatible. So if you invest in one of the tools, you can actually try out the other one and have some of that investment sort of pay off still in both tools. So we do whitelisting. The most trivial one is on the line.

00:17:33 You write a comment Pragma, no mutate.

00:17:38 Right.

00:17:39 Similar to no cover.

00:17:40 Yeah.

00:17:44 Okay.

00:17:44 That’s cool.

00:17:46 Yeah. You need that in some cases. Like I changed break to continue and continue to break. For example, changing a break to continue can introduce an infinite loop.

00:17:57 Right. That’s annoying.

00:18:01 Okay.

00:18:02 Yeah, it’s not fun. So I detect when your test suite is taking way too long and I kill it and say this timed out, something is wrong.

00:18:11 So you have to whitelist that because otherwise the mutation testing is just going to take way too long on that specific mutant.

00:18:18 Okay, so tell me in the places where it thinks I had to put those in there.

00:18:24 Yeah. Okay.

00:18:26 It’s okay.

00:18:28 It’s rather annoying, but it’s okay. You can introduce infinite loops in some other subtle ways, but this one is fairly easy to do.

00:18:38 Yeah.

00:18:39 And then I also have an advanced mutation whitelisting system where you can have a little bit of code, sort of Python module. Actually, I load it and I have callbacks that I call you back during processing.

00:19:00 And then you can look at the line and do what? Like say, skip to the entire thing and do some other fiddle bits. This is very experimental, though, but I use it successfully in one big project.

00:19:18 That’s cool.

00:19:20 So one of the things that everybody is probably thinking about right now is I make a little change.

00:19:29 For example, the first thing you probably wonder, how many Newtons do I have in our code?

00:19:34 And my average is I’ve looked at some averages. Normally you have 0.5 Newtons per line.

00:19:44 It’s fairly normal.

00:19:48 Okay.

00:19:50 The number of mutants you have to test is going to be the number of lines of your code divided by two.

00:19:56 Like that’s a fairly big number. Probably the best way of doing it is I run your entire test suite.

00:20:09 Okay.

00:20:11 With a fail fast flag, obviously, because otherwise that’s just a waste of time.

00:20:16 And that’s just super slow.

00:20:19 Okay.

00:20:21 Because, say your test suite is 8 seconds and it’s 30 lines of code, which is I have a project like that, 3000 mutants. So it’s 6000 lines of code, roughly 3000 mutants times 8 seconds. That’s not fun.

00:20:42 Right?

00:20:43 Right.

00:20:45 8 seconds is quite annoying in a normal test suite, but multiply it by a thousand and it’s not fun anymore.

00:20:54 So what you want to do is you want to severely restrict what mutants you run.

00:21:01 And I have some ways to do that. So right now will actually call you back saying I’m going to run this mutant, and at that point you can modify the command to run the test. So I can add actually run only these tests, like this specific test file, for example.

00:21:28 And then suddenly your mutation testing scales linearly with the code base instead of exponentially.

00:21:35 Yeah.

00:21:36 Which is nice.

00:21:39 And that’s also sort of experimental. And I’ve used it in one big project and it works amazingly.

00:21:46 I guess the mutation testing doesn’t really matter which test framework you’re using, right. Or does it?

00:21:53 Mine does not. I have a command line. I run it, and if the exit code is zero, it passed. Done.

00:22:02 Okay.

00:22:03 The problem with that, of course, again, is that it’s slow. Everything is slow.

00:22:09 I have to start another process. I have to deal with any overhead of the test runner.

00:22:14 I’m not sure how hypothesis gets around it or if hypothesis hypothesis is running within one pipe test process. So hypothesis isn’t calling by test, it’s the other way around.

00:22:26 Yeah, exactly. So they don’t have that overhead problem, which is great.

00:22:32 If I want to use this on a project to look at my test. Is this word for one time thing, an occasional thing, or do I incorporate it into a continuous integration workflow, or what’s your recommendations around that?

00:22:47 I would say intermittently, yeah.

00:22:49 You do it once a blue moon. Right.

00:22:55 So another thing that Matt does is it keeps track of where you are in the test runner. So it knows that this line, you killed all the mutants on this line and it’s not going to try and rerun it.

00:23:09 And it knows that if you didn’t change your test suite, you can’t have killed any mutants.

00:23:18 And it does a lot of stuff like that. So you can actually run it for a while, let it go, collect a few mutants, work on them, make some new tests, rerun, and work forward. And it’s fairly quick, too, so it remembers what it did, so you don’t have to rerun all that work all over and over and over again, which is really nice, because, as I said, it’s super slow, so that helps a lot.

00:23:46 Yeah.

00:23:47 And it’s starting to look like that little database that keeps track of what is done is looking very stable right now.

00:23:58 So I’m thinking maybe we can actually start running it in CI, as long as we keep the little database. It’s like an SQL Lite database that keeps track of this.

00:24:09 And if you just keep it between runs, it knows what it needs to check and what not to check. And it should be okay. I haven’t tried it yet.

00:24:17 Okay, right. Yeah. It’s not complicated to run.

00:24:22 You set up the configuration for it, of how to test it.

00:24:25 And if you’re just using pytest, you can literally do people, install mod one done.

00:24:35 That’s it. There’s nothing else, there’s no config, there’s no nothing, because you don’t need it.

00:24:43 If you can run pytest where you are, of course.

00:24:47 Okay, so if running the test suite just with pytest works, you can just do pipette or run and it works.

00:24:56 Yeah, cool.

00:24:58 And there’s like a nice, fairly nice little command line thing. But I try to explain very thoroughly what everything needs. So I have emojis for the different state. Like the killed section has little sort of a party pooper thing.

00:25:22 This is a good thing and a sad face for surviving mutants and stuff like that, because the nomenclature is fairly difficult.

00:25:33 Killed is good, survived is bad.

00:25:36 So the emojis actually help a lot.

00:25:39 Oh, nice. Yeah, I bet they would.

00:25:42 I care a lot about the UX a lot. I rewrote the front end, like the GUI, totally redesigned it, I don’t know, five times.

00:25:53 I really want it to be really good. So I have a little spinner that always spins, so you can see that it sort of gets stuck. Like it has to feel intuitive.

00:26:04 Sort of a good feeling in your stomach. Right. You should start it and run it. You see that it’s doing something. It’s counting up, it’s running, it’s doing something.

00:26:15 So it’s not like you start it and you see that it takes CPU and there’s no feedback. I hate that, especially if it’s going to take a long time.

00:26:24 I got a few projects I’d like to throw this at. The question helping you answer is which parts of the code you’re really not covering with your tests. Yeah, I assume it tells you the diff of like, this is what I changed. And your test didn’t catch it.

00:26:39 Exactly.

00:26:40 I’m going to assume that it’s taking longer than I want to wait for. So I get it set up on a project and just let it run for a while and then come back and I’m going to assume that I have problems and I have to fix those. Is there a workflow that I can work through and try to add tests and rerun it and then will it be faster for subsequent runs?

00:27:05 It will, yes. It’s a little bit naive. It’s like if your test suite, the entire hash of the entire test suite changes, it’s going to rerun all the surviving mutants. So that’s a bit naive.

00:27:24 But you can say I want to rerun this specific mutant, like with a number and it’s just going to try that one. You can also say rerun only this file which is also very handy. So you do run and then the name of the file. It’s just going to go through that file and not the entire thing, which is also extremely handy.

00:27:48 You’re saying okay, so I can either say go through the mutants. The mutants have numbers then and I can say rerun this mutant or I’m working on the test to cover a particular file. So I can say rerun the mutants for this file.

00:28:03 Exactly. You can also run a show and then the file name and you get the diffs for all the Newtons for that file, which is also very handy.

00:28:14 You probably know this is the part that’s really critical. You want to work with that first. So you just run that file first and you start working on that. What you want to do probably maybe is hit coverage first because if you have a line that’s not covered by definition, all the mutants will survive.

00:28:37 Yeah.

00:28:38 Because your code never reached it. You can do whatever you want with that line. You can write the total solids of Shakespeare in that line. Nobody cares.

00:28:46 Right.

00:28:49 So coverage is very important and you’re going to hit most of you’re going to hit probably faster and easier to find real bugs that way.

00:28:59 I’m really excited to try this on a project.

00:29:02 I really made a lot of effort trying to make it easy to just Pip install and run.

00:29:08 That’s my goal.

00:29:10 Thank you for telling us about telling us about it, but also just writing it in the first place. I think it’s a good thing to have in the Python community. So kudos.

00:29:20 Thank you.

00:29:25 Thank you, Anders, for talking to us about mutation testing. Thank you, PyCharm, for sponsoring the show. Try PyCharm yourself at testandcode.com. Pycharm shout out to electronicspecifierins podcast.

00:29:38 It’s a cool podcast.

00:29:39 All those links are in the show notes at testing co.com one, three, eight. That’s all for now. Go out and test something or maybe mutation test something.