107 - Property Based Testing in Python with Hypothesis - Alexander Hultnér

Hypothesis is the Python tool used for property based testing. Hypothesis claims to combine “human understanding of your problem domain with machine intelligence to improve the quality of your testing process while spending less time writing tests.” Alexander Hultnér introduces us to property based testing in Python with Hypothesis.

Transcript for episode 107 of the Test & Code Podcast

This transcript starts as an auto generated transcript.
PRs welcome if you want to help fix any errors.

00:00:00 Hypothesis is the Python tool used for property based testing. Hypothesis claims to combine human understanding of your problem domain with machine intelligence to improve the quality of your testing process while spending less time writing tests. In this episode, Alexander Holtner introduces us to property based testing in Python with Hypothesis.

00:00:36 Welcome to Test and Code, a podcast about software development, software testing, and Python.

00:00:47 Welcome to Test and Code. Today I’ve got Alexander Holtner or actually, are you from Sweden?

00:00:54 Yeah, I’m in Sweden now, so it’s dark where I’m sitting. It’s pitch black outside.

00:00:59 Oh, really?

00:01:00 Yeah.

00:01:00 What time of the day is it?

00:01:02 It’s eight in the evening, so it’s not that late, but during the winters it gets stuck really early.

00:01:07 Okay, well, I appreciate you joining us. You pronounce your name really cool. Holder is how it looks to me, but what’s the Swedish pronunciation?

00:01:16 Alexander Holmes.

00:01:18 Okay, I’m not even going to try that, but that’s really cool. I like it. I came across your name and you because you’ve talked about property based testing. Who are you? What do you do and how did you get into property based testing?

00:01:30 Yeah, sure. So my name is Alexander. As I said, I run my own company, so I do some consulting in Python and I work with a lot of different companies.

00:01:42 And I would say that I’m a born hacker. I always hacked with electronics ever since I was a little child. So as soon as I got my hands on a computer, I wanted to see what I could do with code. So I’ve been coding for as long as I can remember, far before we had the Internet or Internet existed, but we didn’t have access to it. So I went to the library and borrowed whatever books I could find on programming, usually on assembly or C back then. So I learned programming that way.

00:02:17 Do you remember what computer you were using?

00:02:19 Oh yeah, AMDK 62. I had overclocked it using a pencil to bridge so the front side bus speed could be increased. So it was like 50% overclock or something like that with, I think like 32 Megs of Ram and one gig of hard drive.

00:02:39 Okay. Wow, that’s cool.

00:02:41 Yeah, some time ago now, but yeah. So that’s how I got started. Computers and electronics and all those things always interest me. But when it comes to property based testing, I didn’t find out about it until I was in University. And I had a professor named John Yu who is kind of the father of property based testing. He wrote the Quick Check Library for Erlang and Haskell. So we had him in the Haskell course teaching us about property based testing.

00:03:16 And at that point I had already had another company prior to that and I had worked as a programmer before I started uni. So I was really intrigued when I saw property based testing because I saw it kind of like taking automation of testing to the next level. Automating creation of your test cases. Okay, so I really liked it. And of course there wasn’t a lot of libraries for Python until a few years back. I remember when I first looked, there was one named Python Quick Check, which was a copy of the Quick Check library implementing a subset of the features. But then hypothesis came along, I think in 2016 or something like that. And it blew me away because it’s a really, really good library for property based testing. It even does some things better than Quick Check, especially how it handles what’s called shrinking, which is where it finds the minimal test case that can reproduce the given error. And it does that in a smarter way than Quick Check does, where it will always produce the same error, while Quick Check can actually, if you have two errors, it can reduce into another error.

00:04:36 Okay.

00:04:40 Thank you PyCharm for sponsoring this episode. I’m working on a few new projects and I’m so glad PyCharm is there by my side when I think I am ready to commit and push some code. I love the seamless Get integration, the ability to revert some files, commit some files, add files that aren’t tracked yet, add some files to the Get, ignore, and even do partial commits of files leaving out debug statements and whatnot all within one interface is super cool and super intuitive. I even use it for non Python files. Of course, the integrated testing support is unsurpassed. I really like to put some code in test functions just to try it out instead of using the REPL, even though PyCharm has the built in reply. It’s just so easy to have a bunch of snippets of code and different test functions in a file and just be able to run one at a time. I even do this just to play around with new data structures or APIs. Pycharm saves me time in so many ways. Find out how it will save you time. Go to testandcode.com PyCharm and try out PyCharm for four months before deciding if you want to stick with the pro or use the community.

00:05:44 Let’s pretend we’ve got an extra person in the room and they’ve never even heard of property based testing.

00:05:51 How do you describe what it is?

00:05:53 That’s something I tried to really stress in my talk that it’s not too hard to get started with it. The concept is really simple that you have a function that takes inputs, and based on those inputs it has some rules like this is the way the function is allowed to behave. So just like we normally have asserts, instead you have more general asserts, more defining the behavior. For instance, saying given this kind of input, this function should return a value integrated value, and if it returns something else, we have an error. And then you can have different levels of these properties. But the most simple case is mainly just adding some input arguments to your normal pipe test, and then just adding a decorator saying given integers, for instance. And then it will just generate lots of lots of integers. It will test everything from super small negative numbers to super high positive, and it will price zero and everything in between. And if it finds an error, it will try to minimize the input. So for instance, if there is an error at minus five, but it starts testing -5000 then it will try, okay, something in between, maybe -2000 and there’s still an error, and then it tests minus one and there’s no error and it will search for the point where it breaks, which also makes finding the bug much easier.

00:07:32 Okay, so there’s a lot of things that come to mind. When I first got played around with things like property based testing, it wasn’t around traditional testing of behavior. I used them for things like checking for a range of input. So I’ve got a function that takes certain kinds of input. I just really want to throw everything at it and make sure it doesn’t crash so you can do stuff like that. The other side of trying to figure out the tests are different, though, so it’s easier, at least for me. It’s easier to think about a concrete example to know if I have a certain amount of certain input. I know what my output is going to be. I can calculate on the side. You have to think a little differently with a property based test because we don’t know what the input is going to be, right?

00:08:18 Yeah, exactly. So for instance, maybe if you have something adding up numbers, you could have a property saying that the addition of two numbers should always be larger. Any one of the individual input numbers that could be a property or add.

00:08:37 Yeah, it’s a different kind of a thing. Do you think it’s appropriate to use in conjunction with other testing? Or can property based testing just completely replace all of your tests?

00:08:48 I would say use a mix of property based testing and normal testing.

00:08:54 Actually, earlier today I put a small example together where I just made a fast API application.

00:09:04 I just used the sample they have on their Start page, and then I use the library called Schema Thesis, which is the strategy for Hypothesis, which takes the Swagger Open API specification, and then it generates the tests based on those. So I could run thousands of tests against these free endpoints and basically see if it conforms to the specification.

00:09:32 Those kind of places where you have a specification and you just want to make sure that the specification actually holds true, then it’s very easy to do a property test. Or another example I’ve had is where you have an input validator that you can run separately, and then you can run the entire back end all the way down to the database, and then you could write a property where you run the input validator and see if that generates an exception. If it does, it should always generate the same exception. If you run the full endpoint, and if the full endpoint generates an exception but not the command validator, then you’ve missed something in between there.

00:10:16 Okay, so if you’re running thousands of tests, you want to make sure they’re fast then, right?

00:10:21 Yeah, but the really nice thing about hypothesis is that mainly when I use it, I run maybe 100 or ten tests in development. But then in my CI system I can just change a parameter configuration saying run 1000 tests or 10,000 tests on Blue request or on the nightly run 100,000 or whatever.

00:10:49 Oh, that’s cool. So you can change that for different environments.

00:10:53 Exactly. And one really nice thing is that it caches all your failures so you get automatic regression testing, because if you ever had a bug, it will test that again and again as long as you have the hypothesis cache so it will remember the failures.

00:11:10 So if I’m working on a CI system, I might recreate my system every night. Is there a way to save the cash?

00:11:17 Yeah. Then you would have to store the cash folder separately. Okay, so how I would do it is I upload at the end of the test my cache somewhere and just download it at the start of the test.

00:11:30 Okay, so it’s all saved in one place where I can turn around. Okay.

00:11:34 For local testing, I usually just keep my own local cache, and all other developers do the same thing, basically. Okay, so you don’t really have to do anything. As long as you’re not clearing the folder. You automatically get this behavior for free.

00:11:49 Having a mix of tests. Are there particular parts of the testing process that you think are better suited for property based tests?

00:11:57 Yeah, one example that’s really good is if you want to refactor something, or if you want to optimize something.

00:12:05 If you need better performance or something, then you have actually unknown working copy of your code, and then you can use that as something we call an Oracle, a source of truth. So what you could do is then run your new function, compare it with the results of the old function that’s slow, or if you have a known bad native implementation, but that actually does the right thing, you could use that as your baseline and just compare the new function with that. So those kinds of things are really powerful because you will see that you actually always have the same behavior in your new function.

00:12:43 So is there a way to seed hypothesis to come up with the same input in two different runs?

00:12:49 But you would run both functions in the same run, so you would have two copies of the function you would have, for instance, some function slow and some function Fast, for instance, while you’re working on it, and then you could compare the results of those two.

00:13:06 Okay, so that would be the test that you’re running is calling the two functions and comparing the output.

00:13:12 Yeah, exactly. And another nice use case I’ve had lately is that if you have a Python two implementation of something, you can do a foreign function, call Alt to the Python two function. And when you put it to Python three, actually ensure that you have the same behavior in your new Python implementation. So when you’re porting from Python Two, you have your old function and you compare the test results with the old function and the new function.

00:13:42 You can call another interpreter.

00:13:44 Well, you would have to use call out using foreign function interface. You would have to do an Fi, but it’s possible.

00:13:53 Okay, that would be neat.

00:13:55 You mentioned that you can even use hypothesis with Schema and to be able to use this against Rest API or something.

00:14:03 Yeah, exactly.

00:14:04 That’s pretty cool.

00:14:05 Yeah. There used to be one named Swagger Conformance Testing. It still exists, but that one has lagged behind on support, so it doesn’t support the latest Open API standard. But there’s a new one around called Schema thesesis. I can give you the link afterwards.

00:14:25 Yeah, please.

00:14:26 It’s really nice. It’s building on all the IDs from the old Swagger conformance library, but just improving on it, making it even better. So it’s got built in parallelization, so I can let it run with 16 workers in parallel. And I can tell it how many examples I want per test. And overboard, I want it.

00:14:51 It’s really fun because I just tried it earlier today and I just watched the log from the development server for my Fast API app. And I started this Schema thesis test and I hadn’t written a line of tests, and I could just see how it hit API thousands of times with Chinese characters and foreign languages and weird numbers and all those things you want to try but probably don’t think about in all your tests. So you can really visualize how I tested a lot of things just by running that simple command.

00:15:33 Oh, nice. That’s pretty cool. And I mean, the Swagger system is a pretty neat way to expose an API, too.

00:15:39 Yeah.

00:15:40 In your consulting work, then what kind of projects do you normally take on?

00:15:44 Are you normally working in web based projects or mainly software as a service companies and IoT? It’s my two main areas. Okay.

00:15:57 And I work with everything from big enterprises with 50,000 employees to small companies with ten or 20 employees.

00:16:09 Okay.

00:16:11 Do you have a framework that you normally work with, like Flask or Django or anything in particular, or are you all over?

00:16:18 I prefer Flask and Fast API. Lately, I usually work with these micro frameworks is my go to database when it comes to relational data, but I’ve worked with pretty much everything under the sun. I usually adapt to what the client needs and what they prefer.

00:16:40 When I’m using a hypothesis or property based test in Python and we’re just going to call those do the same. I don’t know if there’s another property based system that’s caught up with hypothesis. I’m pretty impressed with where it’s at now, so I’ve got like say a Django app or something running with a Postgres database back end, maybe.

00:17:01 Am I going to want all of that stuff running with my hypothesis test, or are we going to try to stab out the database or do you have recommendations around that?

00:17:10 I would say think about it like you think about your normal testing. You want to test everything together, because then you actually see that everything works all the way. But sometimes you also want to test isolated parts. But from my preference, I prefer to first try to test the entire system integrated before I go in and mock everything, because otherwise I’ve seen many cases where the mocks kind of fool you into thinking that everything works, but then when you connect all the parts together, sometimes it fails in the seams if you understand what I’m getting at.

00:17:52 But I would say use the strategy you usually use. I mean, have a mix, some integration, some EndToEnd, some units. And of course, if you have like a decode function or encode function, you’re probably easier just try those out without database or anything else. But if you want to focus on what’s most value or bang for their buck, I prefer to test the endpoints and have them connected all the way back to the database and just see if anything froze an error somewhere.

00:18:31 Okay, the database testing and stuff it’s testing with an entire system with a database has gotten good enough that you really can do all of these thousands of hypothesis tests against a semi live system. You probably wouldn’t want to do it against your live system, but you can have a test environment.

00:18:51 Yeah, I’ve even run hundreds of thousands of tests against Microsoft SQL Server on Azure using property based testing on their Azure pipelines. And of course it will take a bit more time. But that’s also why maybe you have a fewer amount of tests in your local instance, and then on your nightly build or weekly or whatever you have, you can have a much higher amount and then also have a higher degree of security in what you have there in your experience.

00:19:26 Are those longer ones finding things?

00:19:28 I found those obscure, really hard to pinpoint bugs those ways.

00:19:35 I mean, usually when you see the bugs, they are quite apparent, but not always. I’ve had some really strange time based bugs that are really hard to understand, and you just have to read up a lot on weird time stuff. But other than that, it’s usually quite easy to understand when you see it. But yeah, sometimes you really need to test a lot of data to find this obscure box, but your most value will always be from the first, maybe 100 or a couple of hundred tests. So starting with that and running that in your development environment will probably be good enough. And it’s probably much more than what you’ve had previously if you have hard coded test cases.

00:20:25 And now for normal test flows, I try to teach where possible to structure a test case with a given when then or a range act assert. Does that apply to a property based test as well? Or are there other rules of thumb that people need to think about?

00:20:44 There are other rules, and there is a really good article that’s actually written for F sharp for FS check, but it applies to all types of property based testing. I can give you the link afterwards as well, but it explains how you should think about your properties.

00:21:04 But I would say just try it out, start to play around with it, and you will get a feeling for how it works. And you can definitely structure.

00:21:18 But you would have the given part in your decorator instead. So you would have given a list of integers and a string and that would be a decorator. And then you have your function and which takes those as arguments.

00:21:35 Then you would have to, of course have your asserts more generic because of course you don’t know the exact input. So you would have to think a little bit more about how it behaves. For instance, it should not throw an exception. Maybe that could be one thing, or it should return a number instead of saying it should return 56, or it should return an object of this type or something along those lines.

00:22:09 One of the things that I’m getting excited about, so I haven’t really put hypothesis to use much, but I’m thinking about trying to use it more. And one of the things I’m excited about is just this thought process is beneficial to both testing a system and to developing a system. If you’re doing this while you’re developing a system because you kind of have to think about splitting up your test cases into behavior cases to where you can kind of say if a particular interaction, like I don’t know, something concrete, like I’m pulling something out of a set. So pulling a person out of a list of people or something, there’s going to be cases where there are errors. So I have to set up a scenario where the person doesn’t exist and I’m trying to pull out information that’s not there. So I have to test that behavior. I’ve got cases where there’s duplicates, I’ve got cases where there’s all the different kinds of behaviors that might exist around an interaction. It really doesn’t depend on what the actual data is. I need to be able to set that up and then check for the behavior. It’s a different way to think about a system, and I’m hoping that that will help people actually test all the think about behaviors instead of thinking about specific individual test flows, which isn’t really that helpful.

00:23:36 Exactly.

00:23:37 And that’s also one point I want to often bring up is that usually when you write your tests, it’s very easy to just model the happy path cases. So you know this will work, because that’s how it’s supposed to work. But it’s very hard to think about all the tricky cases and all the undefined behavior of your application.

00:24:04 And it’s very easy to miss stuff because we’re humans and machines are better at testing lots of things than we are.

00:24:12 Yeah. My first thought of property based testing was just that it was an easier way to do, like, weird case scenarios, like the normal things like you said of sending in different languages, making sure that unit code worked correctly, that if you empty strings and really long strings and things like that. And it’s way more than that. That’s useful. But there’s more going on.

00:24:40 Yeah. I don’t remember who said this, but I think it was Hilar Wayne who said it. He said that we have type with type annotations, and those are really good because they test the structure of our code, but what they don’t test is the behavior of the code. So it doesn’t matter how much types you have, because the type systems aren’t expressive enough to explain all the behaviors of our system. And we already have a very good language for explaining behavior, and that’s our programming language. So by writing property based tests, we actually define the behavior more formally as well.

00:25:23 I want to link to that talk that you gave recently, because I think that was probably one of my favorite recent introductions to hypothesis. I think you did a really good job.

00:25:33 Thank you. Yeah, sure.

00:25:35 Are there any other good resources for people to get started with hypothesis that you think of?

00:25:42 I do link to a lot of resources in the slides of that talk. Okay.

00:25:49 And I have a GitHub repository where I have all the links as well, so that’s a really good starting point if you just want everything in one place. And there is a link to the F Sharp article about designing your properties as well, which I’ve talked about earlier. And there is a link to the talk by Hillary Wayne, and I don’t remember who it was. It was someone else who made a talk about it when it was quite new a couple of years back in Python. Matt Bachmann.

00:26:22 Okay.

00:26:22 I recommend just looking at the links from my slides or from my GitHub repository for that talk. Watch those talks, read articles, link and read the hypothesis docs as well, because those are really good. They have really good quick start guides.

00:26:40 There’s a new thing in Hypothesis that’s really nice to get started with as well. It’s able to infer the types from your type annotations now. So if you have a type annotated test function with arguments, it can automatically infer what data to give it. And I know I even tested it out with data classes referencing other data classes, and it managed to automatically create those classes with the reference classes and everything. So that in peril was really nice as well.

00:27:15 That’s cool.

00:27:16 Yeah, that’s really cool. But the one thing I want to warn people about when using inference is that you don’t restrict your input in any way, which means that you will probably need a lot more test cases before you find the tricky things. Because usually when you write your hypothesis decorators, you start with them in a very general way. But then you can find some things where you know that, okay, this should always be a positive integer and maybe this list should always be around this size and stuff like those things. It really makes it faster to test a lot of cases.

00:28:00 Do you know about fuzzing?

00:28:02 Yeah, a little bit about fuzz testing.

00:28:04 Yeah, exactly. So for those who don’t know, it basically means that you just randomize a lot of input and throw it at the program and see what happens. And if it crashes, you might have a bug or you probably have a bug. And it’s been utilized a lot in security sensitive applications to test, for instance, curl and those kinds of things. That dependent on a lot of important stuff. So what’s really interesting, though, is that passing is very similar to property based testing, but it’s more like a brute force approach where you test everything and see what happens. And with property based testing, you can more fine tune your passing. Kind of say, I want to test a lot of stuff, but I want to do it within this given sorts of rules. It’s kind of like you write the law for how to run the tests, but the computer will do everything it can given the laws of written.

00:29:11 I like that also, because it’s a pragmatic approach to software development as well. Parts of the system just to be able to say this function just doesn’t handle negative numbers, so we have to fix that. Well, you might have to fix it, or you can just say, okay, it doesn’t handle negative numbers. Let’s just make sure we don’t send negative numbers into it. And as long as as a system, you can say this part of the system won’t handle these cases, you don’t have to go through and test it. So it’s very cool. That hypothesis has that ability to say all functions that take numbers don’t have to handle all numbers. They might have to take a certain range, but then on the engineering team and the testers need to know, hopefully they’re the same people. Sometimes they’re different, but they have to be able to understand and be specific about what range of inputs are allowed to different parts of the system. Having to be clear about that is actually a good thing. So I think that’s great. I’m glad you brought it up, because I haven’t thought about fuzz testing for a long time. And it does sound now that we talk about it throwing random stuff, it does sound like that’s very similar to property based testing, or at least similar to hypothesis. And I think that’s where the mistake I had was. I thought hypothesis was a fuzz testing tool, and that’s not the intent, right?

00:30:32 Yeah, exactly. It’s a much more structured approach where you can model your world much more efficiently. So it’s kind of like thinking that passing is the brute force approach and using property based testing is the more well defined algorithm.

00:30:55 There’s some realms in domains where the specific answer is very hard to come by anyway. And so being getting used to these kinds of test techniques, I think are good. I’m thinking about environments where there’s a lot of noise in the system for some reason, where you can’t have even if you send in the same data, you will get slightly different data out. That sounds horrifying to some people to think about that. It sounds like unpredictable behavior, but it’s not with real environment. So if you’re working with RF equipment or electronics or actual, like, sound, you get a different answer. But you can still test for it. You can still test that things are in a certain range. If I send in an input for the right frequency at a certain level, I know there’s loss in the system, and my answer should be in these ranges. It’s a more difficult way to test, but it is definitely possible. And I appreciate these tools being around to be able to help us think about that.

00:31:57 That’s very interesting that you bring that up, because you probably know about Erickson, which is a large Swedish telecommunications company. Yes. So I live in the Ericsson city in Gothenburg, and actually quick check, the grandfather of property based testing, was used heavily at Ericsson to test their telecommunications and radio systems. And they used it to find some really hard bugs in the telephone switching systems and stuff that Ericsson engineers have been tearing their heads apart for a long time, trying to find young youth who wrote the library came there and helped them find it in a very short amount of time by just writing some property based tests for them.

00:32:51 That’s so cool. Nice. I’m trying to think about how I can use this with Indian system with RF systems. I know that it’s going to be possible that the times are a little I think I guess I just need to do a test run. I’m a little worried about the times because there’s one thing to have a fast database, but there’s another thing to have a fast system where you have settling times and test equipment that you have to deal with.

00:33:14 I work a lot with hardware. I used to build industrial greenhouse systems for vertical farming and stuff like that, and we used pipe and air and we had our own radio control equipment as well, and we had lots of sensors and I mean, a modern greenhouse is basically like a factory. It’s very optimistic. It’s a lot of systems. It’s a lot of sensors, a lot of dynamics is probably more dynamic than most factories because you have to account for your level of humidity in the air and you have to account for sunlight. If you have glass panels or if you have an isolated environment, you have to regulate the light of your fixtures and you need to keep track of the stress levels of your plans. There is a lot of things too very much dynamics to keep track of windows systems.

00:34:17 Okay. I want to kind of wrap this up. I really appreciate you talking with us about property based testing. We’ll do a whole bunch of links in the show notes, so we say it out loud. If people want to know more about you, is there a good place for them to go?

00:34:32 Yeah, sure. So if they want to know, contact me or my company. The best place to go is Holkner Se. And if they want to get in touch with me personally, I would recommend Twitter. I’m a hostner at Twitter.

00:34:49 Okay, cool. Definitely. We’ll include those in the show notes as well, but some people read them.

00:34:55 Yeah.

00:34:57 One more Thing. I just remembered I actually started making a course based on Property based testing.

00:35:05 I haven’t announced it yet, but if you want, I could just add a quick Google form or something if people want to sign up, if someone is interested.

00:35:16 Oh, yeah, I’m interested.

00:35:19 Yeah. So then I’ll just create a quick Google form and whoever can sign up who’s interested and I will send an email when the course is ready, or at least when there is enough to start showcasing it.

00:35:34 Wonderful. Thanks. Cool. And thanks for coming on the show.

00:35:37 Yeah, thank you.

00:35:42 Thank you, Alexander. I’m more excited than ever to use hypothesis and propertybased Testing. Thank you so much to Patreon supporters for supporting the show. Join them by going to testincode.com support and thank you, Pie Charm, for sponsoring this Episode. The link to the extended pro trial is at testincode.com PyCharm. That link is also in the show notes at testingcom 107. That’s all for now. Go ahead, go out and test something.