Josh Peak walks us through 4 maxims of developing software tests that help grow your confidence and proficiency at test writing. This is especially helpful for bringing your team up to your level.


Transcript for episode 86 of the Test & Code Podcast

This transcript starts as an auto-generated transcript.
PRs welcome if you want to help fix any errors.


You’ve incorporated software testing into your coding practices and know from experience that it helps you get your stuff done faster with less headache. Awesome. Now your colleagues want in on that superpower and want to learn testing. How do you help them? That’s where Josh Peak is. He’s helping his team add testing to their workflow to boost their productivity. That’s what we’re going to talk about today on Test and Code. I’m your host, Brian Okken. And I’m super excited to welcome a new sponsor for Test and Code, a company called Raygun. I reached out to them because their focus fits so well with what we’re trying to do here at Test and Code. So thank you to Raygun for sponsoring this episode. Raygun builds tools for error, crash, and performance monitoring for web and mobile apps, including, of course, Python based web apps. Find out more at raygun.com. And a little later in the show, we’ll talk about the super cool crash reporting.

Welcome to Test and Code, the podcast about software development, software testing, and Python.

I’m really excited. Today on Test and Code, we have Josh Peak. He’s been writing a lot about pytest and testing, which is great. So, Josh, welcome to the show.

Thanks, Brian.

Before we get started.

Tell everybody who you are.

So I’m Josh Peak. I’m a data scientist and software engineer, and I work in commercial mining at the moment. So whatever I say is my own opinion. So, disclaimer out of the way.

Okay.

But yeah, so I’m working in the smart solutions section, so I do data mining on mining data, which is…

Yeah, I like it. Have you been with this company for a long time?

Almost a year, coming up soon. But prior to that, I worked for a startup called Invoice to Go for about six years. It got some venture capital, and I got to work in San Francisco for a little while. So that was a lot of fun, coming from Australia but then getting to work in Silicon Valley for a bit. So, living the dream. Then I worked for a couple of defense companies for a while, and then another company called CloudSense; they do integrations between Heroku and Salesforce and all that. It was at Invoice to Go and CloudSense that I got my biggest taste of doing testing practices.

Is data science something you’ve been doing the whole time, or is this something new recently?

It’s pretty new. So it’s only this year that I finished that qualification. It was when I was working at Invoice to Go. I was the first employee at that startup, and after about four years, I knew the entire data model. So when we got funding and they were like, oh, we need a data team, Josh knows the most about the data model, so I became a data engineer. But the first thing you do as a data engineer, you’ve got to meet your responsibilities: is the data available? Is it clean? Is it secure? Doing those three things meant an ETL system, for which I was writing a lot of SQL as unit tests. So we were using CTEs, or common table expressions, on Amazon Redshift to be able to write a whole heap of SELECT fragments.

I’d go and select a certain property out of a table and assert: are there no nulls, no missing values, no duplicates? These were common tests, so that when the CEO got the email at 6:00 a.m. and asked, can I trust this data? I already knew whether I could or could not trust the data. So I was writing a lot of those manually. Then we got to the practice of, how do we just automate these SQL statements?

You can do, like, a SELECT INTO, which means I can get those results and it’ll append them to a new table. And that’s how we had our daily CI system for a data warehouse. And that’s where I got that taste of: if I automate these tests, I know overnight, without being woken up at 3:00 a.m., that the CEO is going to get an accurate email in the morning. So that’s when I actually got over that initial hurdle. Oh, my God. Automated testing.
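For illustration, a couple of those data-quality checks could look roughly like this as pytest tests. Josh wrote them directly as SQL on Amazon Redshift; in this sketch an in-memory SQLite table and an invented invoices schema stand in for the warehouse.

```python
# A sketch of the "no nulls, no duplicates" style of check described above,
# written as pytest tests. An in-memory SQLite table stands in for the
# warehouse; the `invoices` table and its columns are invented.
import sqlite3
import pytest

@pytest.fixture
def conn():
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE invoices (invoice_id TEXT, total REAL)")
    connection.executemany(
        "INSERT INTO invoices VALUES (?, ?)",
        [("inv-1", 10.0), ("inv-2", 25.5)],
    )
    yield connection
    connection.close()

def test_invoice_ids_have_no_nulls(conn):
    (nulls,) = conn.execute(
        "SELECT COUNT(*) FROM invoices WHERE invoice_id IS NULL"
    ).fetchone()
    assert nulls == 0

def test_invoice_ids_have_no_duplicates(conn):
    duplicates = conn.execute(
        "SELECT invoice_id FROM invoices GROUP BY invoice_id HAVING COUNT(*) > 1"
    ).fetchall()
    assert duplicates == []
```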

You won’t wake me up in the middle of the night. This is good.

One of the things we were going to talk about today is helping other people get into testing.

Yeah. So that first hurdle for me is, I guess, where a lot of my team are at the moment. They’re like, yes, people tell me that testing is a good thing. I just don’t know how to do it. It seems daunting. How do I get over that first hurdle? A lot of my team are mechanical engineers, and the company was like, all right, anyone with a keyboard can code, right? Then they hired some software engineers, and they’re like, that’s what software engineers do, right? So I’m just trying to put some engineering practices around, I guess, the code as our products, and work out how we have the best practices around making sure they’re quality products at the end. And then we’re building data products, and making sure the data has testing around it, so the product that gets to our technicians on site is reliable for understanding what machine faults are and for detecting anomalies. So, trying to get the team to go from headless chickens to, oh, now I know how to test. I wrote a little blog article which is meant to capture the last few years of my understanding, like the checklist that I go through, or how to get over that hurdle, as just a series of steps. We can dive into those things. It’s four points.

What’s the name of this article?

I just called the article From Zero to Test: Turning Hurdles into Steps.

What do we have first?

Yes. So the first one I’ve got is how every test should be structured. This is just an opinion, but it’s the given-when-then framework. And I know you’ve pushed this in a really good book that you’ve recommended a few times, the pytest book from PragProg.

Yeah, that’s a good book. I like it.

But I’ve read the Refactoring book by Martin Fowler, and Clean Code, and I think they mention it too. It’s just a repeated pattern that a lot of people recommend for how to structure a test. And I like how the when part is the least amount of code; that’s the bit that’s under test. The given part is: how do I set up the test? That’s all the fixtures and all the context and state that I need to set up my test. And when I went and studied the data science course, during the statistics part it was interesting how all the things that I learnt there map to a lot of this: what is a good test, what are my assumptions, what’s my hypothesis? That’s everything that I need to set up. And that’s why the given part is actually quite tricky in a test: how do I set it up in a way that, if it doesn’t work, that’s just an error, because I don’t have correct assumptions about the test?

And if those assumptions are correct and I’ve got my given and my fixtures set up correctly, then I can actually start the when: I run this code, then come all of my assertions, and those are the properties that I’m going to be testing. So in statistics, you do, like, a t-test or ANOVA, and those are properties of the data that you’re testing.
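As a minimal sketch of that given/when/then layout in a pytest test; the apply_discount function and its numbers are invented purely for illustration:

```python
# A minimal given/when/then skeleton. The code under test is invented.
import pytest

def apply_discount(total, rate):
    # the code under test; invented for illustration
    return total * (1 - rate)

@pytest.fixture
def order_total():
    # given: set up the state and assumptions the test relies on
    return 100.0

def test_discount_reduces_total(order_total):
    # when: exercise the smallest possible piece of code under test
    discounted = apply_discount(order_total, rate=0.1)

    # then: assert the properties you care about
    assert discounted == pytest.approx(90.0)
```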

One of the things I see a lot with people new to testing is we get these big workflow tests that are doing a whole bunch of stuff and asserting all over the place, which you can’t really do if you’re following this given-when-then model. So have you had any pushback from anybody?

So I’ve got buy-in from management to allocate some time for training. So actually, this Friday we’re doing the first workshop, getting everyone to actually clone the repo of this library that we’re working through and trying to get test coverage up. And I’m going to walk them through it. There’s an existing test file that we’re going to delete, and I’ll get them to write those tests to get the coverage back up, and walk them through that process. So I’ve had them actually read this article, and so far they’re like, oh yeah, looks good. So we’ll find out how it goes in practice. But at least laying it out in a format like that makes it less daunting.

Yeah.

It gets rid of the blank page problem of like, I don’t even know where to put my code.

Yeah. Where do I start? What do I even write? It’s just that scary blank page. Okay, now I’ve got a bit of a framework to structure my narrative of what a test is.

Yeah. A lot of people have used the given-when-then thing. It’s borrowed from Behavior Driven Development. But we’re not talking about Gherkin tests, we’re talking about plain old code tests, right?

Yeah. I think with the given part, that’s where tests can get very complicated, because that’s where you’ve got the test doubles like mocks and stubs, and that’s where it can get quite hairy very quickly in how you construct a test.
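For example, the given step is often where a test double gets wired in. Here’s a small sketch using unittest.mock; the fetch_latest_price function and the stubbed client are invented for illustration:

```python
# A sketch of a test double living in the "given" step.
# `fetch_latest_price` and the client it takes are invented.
from unittest import mock

def fetch_latest_price(client, symbol):
    return client.get(symbol)["price"]

def test_fetch_latest_price_reads_price_field():
    # given: a stubbed client, so the test never touches a real network
    client = mock.Mock()
    client.get.return_value = {"price": 42.0}

    # when
    price = fetch_latest_price(client, "ABC")

    # then
    assert price == 42.0
    client.get.assert_called_once_with("ABC")
```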

Are you utilizing mocks within your testing?

Yes.

So one of the other articles that I wrote was about advanced Python testing. We had some flaky tests, and that’s because they were just sending a request off to our database, which had a lot of broken data, and it was very intermittent. It’s like, that’s not a good test. So I started using a thing called snapshot testing and a framework called vcrpy, where it will record the request and response to a local YAML file and then replay that. So, (a) testing is faster, but (b) it’s repeatable.
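A minimal vcrpy sketch of what that looks like; the URL, assertion, and cassette path are invented:

```python
# Minimal vcrpy example: the first run records the HTTP exchange to a YAML
# "cassette", and later runs replay it without touching the network.
import requests
import vcr

@vcr.use_cassette("tests/cassettes/example.yaml")
def test_fetch_example_page():
    response = requests.get("https://example.com/")
    assert response.status_code == 200
```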

Okay.

And with that, I was actually able to take one of our tests that worked perfectly on my machine but was failing on CI. Like, why? So I was able to replay the difference between the request that was being composed and what it was expecting. And it turns out we had a time zone issue. Depending on where we ran the code, it composed the request differently based on the system time zone. So I picked up a bug where that should have been UTC every time. So I’m quite grateful that we actually put that in. That’s just the sort of bug where the request worked, but it wasn’t the right response back. Okay, cool.

This episode of Test and Code is sponsored by Raygun. Raygun has awesome tools to help you monitor your application for errors, crashes, bottlenecks, and performance. But let’s zoom in on crash reporting. Whether your app is a side project or something your company and your customers depend on every day, you want it to be a good experience for your users. It takes just a few minutes to set up, with simple code snippets to get you up and running quickly. Drop in the code snippets and push to production. That’s it. Now Raygun has your back. With smart Python error monitoring software, you’ll be alerted to issues affecting your customers the second they happen, and Raygun’s dashboard and tools help you diagnose, debug, and fix those errors fast. Even though Raygun captures every single error, you will not be bombarded with a bunch of duplicate reports. It offers smart alerts to notify you when you need to take action and gives you the controls to filter even more, so you’ll be able to see the problems that are the highest priority and affecting the largest number of users. Take control of your app monitoring with Raygun. Check them out at raygun.com.

What other techniques have you got for people?

So the next one we call happy paths and sad paths, and I actually learnt this from Salesforce. So whenever you’re writing any code for Salesforce, they’ve got their own version of Java called Apex, and you never have access to the compiler. You have to send it to their servers and they’ll compile it, but you also have to send tests to their server, and they recommend that you do exactly this.

What is the happy path for your code? You write a test for that, and then you have some sad paths where you break it. They recommend: give it one data point, give it a thousand data points, give it something that’s intentionally going to break. These are just the obvious things, and that actually gets you a good amount of test coverage, just doing some obvious things. They also recommend, like, a security checklist: you would put locks on your house, you don’t leave the key under the doormat or just in the lock itself. So you put together a basic checklist of, all right, I’m going to test with these things. You get a lot of code coverage just from doing these happy paths and sad paths. Once you’ve got that framework set up, when you have a bug in production, all right, you go and reproduce that. That was an untested sad path. I can just add that scenario to my list of sad paths, and I’m going to catch that every time. So I like the idea of this: it’s just, what’s my checklist to make sure that everything is working well and failing as expected?
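As a rough sketch of that checklist, here are happy-path and sad-path tests around an invented average function: one data point, a thousand data points, and inputs that are meant to break it.

```python
# A sketch of happy-path and sad-path tests around an invented `average`
# function: one data point, many data points, and input that should fail.
import pytest

def average(values):
    if not values:
        raise ValueError("average() requires at least one value")
    return sum(values) / len(values)

def test_average_single_value():               # happy path
    assert average([10]) == 10

def test_average_many_values():                # happy path, larger input
    assert average(list(range(1000))) == 499.5

def test_average_empty_input_is_an_error():    # sad path
    with pytest.raises(ValueError):
        average([])

def test_average_rejects_non_numeric():        # sad path
    with pytest.raises(TypeError):
        average(["a", "b"])
```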

I do like the idea of separating the sad paths, whether they’re actually in a different file or directory, or just named something that people can see right away, so it’s clear this is testing unexpected or bad behavior. I think that’s good.

That’s a great idea. I like that. I can actually use that. So what I like doing is pushing my team to think, oh hey, here’s the happy path. If we haven’t written enough documentation yet, I’m using the happy path tests to show coders, here’s how you use this part of the library. So I like that separation, because I don’t want to show them a sad path as though that’s how you’re meant to use it.

Yeah.

Well, I mean, it doesn’t always work. So sometimes I’ve had a test where there’s lots of different input and I want to just test, for instance, bounds checking. If there’s a range of good values, I want to go just outside those ranges and check those. And in those cases, I think it’s good to keep it right with the other tests. But, for instance, I’m working on a package where, if you’re not connected to the database, most of the other commands should just report an error. So for testing all of the different interactions to make sure they behave correctly if the database isn’t set up, it makes sense to just put that in one place instead of having it spread all over.

Yeah, that makes sense. Cool.

Now, I am very interested in this next topic. I’m curious about this inner and outer loop thing.

I wish I could remember the person that mentioned this. It was a thing that was on Twitter, or it might even have been a guest that you or Michael had on one of the podcasts. But someone mentioned the inner loop and outer loop of testing, where if I just run pytest, that’s going to run everything under the sun, and that’s going to take a very long time, and that’s not a really good cadence. If I make a change and then I’m waiting five to ten minutes for the full test suite to finish, that doesn’t make sense.

So what you do is you have your inner loop, where I’m trying to work very focused on one piece of code, leveraging pytest, which has a lot of filters out of the box to narrow down to just a test file, a class, a test suite, or a specific test. So I’m just running that one test, and that can be seconds. And having a really fast feedback loop on that one test that you’re working on, that’s amazing. But then once you’ve got that working where you need it to be, you pop back out and say, I’m going to run the full test suite in context, and you’ve got those two different layers. It’s like Google Maps: if you’re looking at the suburb, you kind of lose context of what the country looks like. So you need to zoom out a bit and make sure that this suburb actually fits in with the context of the city. So I’ve written up a whole heap of different things that pytest can do to have that really fast inner loop. And then I’ve written a fair bit about how to have your outer loop, but also how to speed up the outer loop. Our test suite at work was taking about 20 minutes to do the full run through, and it was a bit flaky as well, so we had to have the tests running in a very specific order.
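The pytest filters Josh is referring to look roughly like this on the command line; the file, class, and test names here are placeholders.

```bash
# Inner loop: narrow pytest down to just the thing you're working on.
pytest tests/test_orders.py                                # one file
pytest tests/test_orders.py::TestCheckout                  # one class
pytest tests/test_orders.py::TestCheckout::test_discount   # one test
pytest -k "discount and not slow"                          # filter by name
pytest --lf                                                # only last failures

# Outer loop: the whole suite, before you push.
pytest
```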

So we had quite a few tests where I couldn’t use pytest-xdist, which runs things in parallel, because they required a certain order, like this test and then this test. So I had leakage of state and fixtures across tests, and I had to go and fix that up. But once I did, we got it down from 20 minutes to four minutes, which I’m pretty happy with. And then I started putting in that snapshot testing so it wasn’t waiting on network I/O, which can take 20 to 30 seconds per test. By having that snapshot it plays back against locally, that became milliseconds per test. So I’ve got it down to about two minutes from 20 minutes, which is a pretty good improvement. That’s our outer loop, and the inner loop on any test is quite responsive as well. So people see by example, oh hey, this testing stuff is actually worthwhile, and it’s not this flaky thing that we can’t trust; it’s just that we didn’t know how to maintain it. And I think that’s part of the tension with teams, where it’s a good idea, but when it’s not maintained, people are like, oh, that was a waste of time, and they lose sight of the value of it. I think it’s these little things where the time cost really adds up.
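Here’s a sketch of the kind of fixture-state leakage that blocks parallel runs with pytest-xdist, and the usual fix; the cart fixture and tests are invented for illustration.

```python
# A sketch of fixture-state leakage and the usual fix; names are invented.
import pytest

# Leaky version: a session-scoped fixture hands every test the same list,
# so results depend on which tests ran first (and break under
# `pytest -n auto` with pytest-xdist).
#
# @pytest.fixture(scope="session")
# def cart():
#     return []

# Safer version: a fresh object per test, so order and parallelism don't matter.
@pytest.fixture
def cart():
    return []

def test_add_item(cart):
    cart.append("apple")
    assert cart == ["apple"]

def test_cart_starts_empty(cart):
    assert cart == []
```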

And I really like your comment that it’s focused testing versus coffee break testing. When you’re focused on one problem, you’re just focused on the test that’s failing, or one little bit of it that you’re working on. And then when you think it’s pretty good, you can just let the two minutes run and go grab a coffee or whatever and then come back. I know pytest has a lot of built-in stuff, but I heavily utilize editors. Both PyCharm and VS Code have the ability to say, I just want to run this function, and then I want to run all the tests within this file, and then of course just run everything at the top. So it’s a good workflow. I like it.

Yeah. So you’re just zooming out a little bit at a time. I had an issue recently where I was focusing on one test and it worked perfectly. I had updated part of the API layer, but at the top level we had a test that was saying, here’s what I expect the API layer to look like. And when I popped out to the outer loop, that started failing, because I use a plugin called pytest-randomly, so I don’t have the same order every time. When I was doing my outer loop, one time it worked, the second time it didn’t. And that was due to the order in which things were imported at the base level of the library. I’m glad that I was running it randomly, because I would never have picked up that import order issue. So it’s good that I’m getting, like, a random sample every time.
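For example, a hidden ordering dependency like the invented one below passes when tests always run in the same order but fails once pytest-randomly shuffles the run:

```python
# A sketch of the kind of hidden ordering dependency pytest-randomly exposes;
# the module-level registry and tests are invented.
_registry = {}

def configure(name, value):
    _registry[name] = value

def test_configure_sets_mode():
    configure("mode", "fast")
    assert _registry["mode"] == "fast"

def test_registry_starts_empty():
    # Only passes if it happens to run before test_configure_sets_mode;
    # with pytest-randomly installed, the order is shuffled every run,
    # so the failure shows up sooner or later.
    assert _registry == {}
```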

Yeah, pytest-randomly is great.

I don’t think people should feel bad if they’re currently unable to run their test suite with random ordering because of all the failures. It’s a good stretch goal to get to, but you’re not alone if you’re having trouble with that. We’ve got three out of four so far. What’s the last one we’ve got?

So this was an interesting one, where I was trying to get the code coverage to at least 75%, because that’s what Salesforce mandates, which is what I was used to working with, and I think that’s a good baseline to get everyone to go from. I think the existing code coverage was about 40% when I started. That’s all that was there. I’m glad that, one, they had a test suite; there were some tests. But at 40%, for most of the code I have no idea if it works. If I make a change, that’s kind of scary. So if we can get it to at least three out of four lines of code automatically tested every time, amazing.

So I was looking at one of these classes and what lines weren’t covered, and a lot of the lines that weren’t covered were identical; they were just repeatedly copied and pasted from function to function. And I thought, hold on, I could write more tests, or I could refactor this. So the first thing that’s explained in the Refactoring book from Martin Fowler is extract method. And I think PyCharm has a lot of really good refactoring tools built in, where you can highlight a piece of code and go extract method, and it’ll take that and create a new function. It works out which variables in that code block come in and what gets modified after that code block, so it adds those to the parameter list of the function and pulls it out. All right, now I’ve got the name of that function, and then everywhere it was repeated, I just replaced, I think it was, eight lines of code with one. And then I had a lot fewer lines of code that needed to be tested, and I knew that one function, which was critical to all of them, had been tested by some existing tests.

Okay.

So I managed to get code coverage up by deleting a lot of code, and that was quite an insightful thing. Oh my God, I don’t have to write more tests. I can just write less code that needs to be tested, because there are no bugs in zero code.
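A before/after sketch of that extract-method refactoring, with an invented bit of duplicated loading code:

```python
# A before/after sketch of extract method; the duplicated loading code is
# invented for illustration.

# Before: the same validation lines copied into every loader.
def load_orders(path):
    text = open(path).read().strip()
    if not text:
        raise ValueError(f"{path} is empty")
    return text.splitlines()

def load_customers(path):
    text = open(path).read().strip()
    if not text:
        raise ValueError(f"{path} is empty")
    return text.splitlines()

# After: the duplicate block extracted into one helper, so a single test of
# read_non_empty_lines covers the lines every loader shares.
def read_non_empty_lines(path):
    text = open(path).read().strip()
    if not text:
        raise ValueError(f"{path} is empty")
    return text.splitlines()

def load_orders(path):  # redefined here only to show the "after" version
    return read_non_empty_lines(path)

def load_customers(path):  # redefined here only to show the "after" version
    return read_non_empty_lines(path)
```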

Yes. I love that example of you can increase coverage by removing code.

Quite happy with that one. It’s a good way of trying to motivate people: hey, refactor your code a little bit, actually apply one little trick, and improve your test coverage. Yeah.

Now I have some sad news for everybody and myself, and that is that I have to run, and we have only scratched the surface of all of the value and all of the cool testing advice that I’m sure we can get out of Josh Peak. So I am going to twist your arm and say, I really hope that you’re willing to come on the show again. I really enjoyed walking through this article, and I think we can all learn a lot from you. So are you game for coming back on?

Yeah, sure. It’d be great. I can actually probably follow up after I’ve gone through the workshop with my team. We can touch base again on that and how well it went.

I think that’s a great idea to even check in with you as this goes on to see how things are going with it. Thanks a lot and I’ll enjoy having you on again.

Thanks, Brian.

Thanks to Josh for writing these awesome articles about pytest and testing, for spearheading the testing efforts at work, and for talking to us today. I look forward to keeping in touch with you and finding out how it’s going in the future. Thank you also to Patreon supporters for continuing to support the show. Join them by going to testandcode.com/support. And thank you so much to Raygun for sponsoring this episode. Raygun builds tools for error, crash, and performance monitoring for web and mobile apps, including, of course, Python based web apps. Find out more at raygun.com. That link is also in the show notes at testandcode.com/86, along with links to articles from Josh and the plugins he talked about. That’s all for now. Now go out and test something, or maybe try to help the others on your team get up to your level of testing awesomeness.