All test suites start fast. But as you grow your set of tests, each test adds a little bit of time to the suite. What can you do about it to keep test suites fast? Some things, like parallelization, are applicable to many domains. What about, for instance, Django applications? Well, Adam Johnson has thought about it a lot, and is here to tell us how we can speed up our Django test suites.


Transcript for episode 135 of the Test & Code Podcast

This transcript starts as an auto generated transcript.
PRs welcome if you want to help fix any errors.


00:00:00 All test suites start fast, but as you grow your set of tests, each test adds a little bit of time to the suite. What can you do about it? To keep your test suites fast, some things like parallelization, are applicable to many domains. What about, for instance, Django applications? Well, Adam Johnson has thought about it a lot and is here to tell us how we can speed up our Django test suites.

00:00:35 Welcome to Test and Code.

00:00:46 I am thrilled again to welcome back Adam Johnson. Hello.

00:00:50 Thank you for having me back.

00:00:51 So if people for some reason don’t remember who you are, a contributor to Django, heavily involved with the Django community and anything else that we forgot.

00:01:01 I like tea and drama based music.

00:01:03 Yeah. Okay. And tea. What kind of tea do you like?

00:01:06 Right now I’m drinking a Japanese tea called cookie Cha, which is actually made from the twigs of the tea Bush. Really low caffeine evening drink that has some caffeine, like a 10th of the caffeine of normal tea.

00:01:17 Okay. I’m going to have to look that one up. I’ve never heard of that. And I kind of thought I knew everything about tea.

00:01:24 I haven’t heard of it until a few months ago. And now it says, quickly become a favorite.

00:01:28 Okay, I’ll try that out. I got into Oolongs many years ago. Anyway, we were going to talk about speeding up Django tests.

00:01:36 Yes, indeed. That’s the title of my book. Speed Up Your Django Test.

00:01:40 Yes. You wrote a book on it. You care about so much about this that you wrote a book on it.

00:01:44 Maybe I also just know so much about such a narrow problem.

00:01:49 Yeah. Okay. So let’s talk about that. You’ve worked with Django a lot. You do that, you said, remind me again, do you do that professionally? Do you work with Django and consulting with companies?

00:01:58 And so, yeah, I’m like a solo consultant with my company, AWS Adams Web Services. And working.

00:02:06 How clever of you. Aws.

00:02:11 I always write it with dots. They can’t sue me because they’re a dots.

00:02:13 Right, Adam’s web service. So you work with Django a lot and speeding up Django test. In your experience, are there a lot of Django suites that are slower than they need to be then?

00:02:26 I think, yeah, it’s a bit of a boiling frog problem. Right.

00:02:31 As you’re working on a project, you add more features, the test speed slowly increases, maybe beyond linearly.

00:02:40 It might be some fraction times the number of tests you have. And that fraction increases over time as you add more features. Perhaps it’s like an end squared style problem. Even so, companies can very quickly find themselves with a half hour test suite that really drags out any kind of development.

00:02:58 Okay.

00:03:00 I’d be thrilled with a half an hour test suite.

00:03:03 But half an hour.

00:03:05 Yeah. And then it just grows from there. So with Django and software, it seems like more of a pure software thing. So I say that because the test suites I’m working with, the reason why they’re slow is because they’re working with hardware and there’s some parts of that. There are some parts of it that we can speed up, and then there’s some parts of it that there’s just nothing we can do about it because of the time of settling time and stuff like that. But with Django, it’s software.

00:03:31 If you’re working possibly, I guess if you’re working with a browser, there’s some slowdowns with task switching and stuff like that. But for the most part, we think of it as just software. It should be fast. Right. So what are the bits that how do we bite this? What are the bits that make the suite slow and what do we do about it?

00:03:48 Right. That’s obviously the broadest question possible.

00:03:53 Okay, we could be more specific if you want.

00:03:56 I’ve got to try Outline, so I have a whole chapter on parallelizing tests. I find that a fairly easy thing to add, especially the younger your project, is that some projects don’t take advantage of. You can do this outside of Django as well, for example in Python with the pytest extinction. Then there’s the whole idea of moving from disk to memory. Even with the best SSDs, you can only do go ten times faster for raw operations by using Ram. And Django. Projects sometimes take advantage of this by switching their database to SQL Lite. But I’ve got a technique to use with any of the major databases Postgres or MySQL where you can tell Docker to use Ram for the drives for those containers.

00:04:46 Otherwise, mostly to do with how the code is actually structured.

00:04:49 Yeah, actually the parallelization is totally cool. And actually I’m surprised how that seems like tricky with something that works with a database. So are there got you with that? Can you do parallel testing with the database? Do you have multiple databases or something?

00:05:05 Multiple databases is indeed the solution.

00:05:08 Django test framework and the PYT Django plugin both will kind of create the base database as the test start and then copy it into each of those subprocessors running tests so they are isolated in that way.

00:05:22 Okay, that’s actually very cool.

00:05:27 I want to tell you about a really amazing sponsor I’m partnering with. Monday.com is an easy to use, flexible and visual teamwork platform that powers teams to run processes, projects and build custom workflows in one digital workspace. Beautifully designed to manage any team, organization or process online. Monday.com powers over 100,000 teams daily work, and they just launched a contest to build apps that will be included in their Marketplace launch. You can build an app that can improve the way teams work together on Monday.com. Whether it’s an app to help marketing, construction, sales, software developers or anything in between, they are looking for creative impactful, amazing apps to feature in their upcoming apps marketplace. They’re giving away $180,000 in prizes, including three Tesla, Ten MacBooks and more. Have you ever dreamt of building an app that impacts the daily lives of hundreds of thousands of people? Well, now’s your chance.

00:06:22 Check it out at monday.com, testendcode and start building. Now, that’s Monday.com test and code.

00:06:34 So it does sound like a lot of the hang up is around the database interaction with the disk of memory, because Janko itself is not really doing disk. That must be the database, right?

00:06:47 Yeah, there’s a bunch of layers. The database is the main one. People often will throw in a cache server like Redis, and that Redis actually will write to disk if it’s in persistent mode. And then you’ve also got Django’s file uploads, which. Yeah, they’re not used in many views, but if you’re using them, you can go a lot faster by pushing them to memory. Instead of writing out to discovery time file uploads.

00:07:11 You can save those to memory. That’s cool. Wow. Yeah. All sorts of tricks in here. So I could see the benefit.

00:07:19 I wasn’t intending this to be a complete ad for your book, but I don’t mind it either. But it sounds like it’s a no brainer for people to try to speed up their tests with these techniques, any sort of metrics that you’ve seen. Like if I had a typical Django test suite, and I don’t know if there is a typical test suite, but let’s say I had one that was running a half an hour or something. How much time do you think we could squeeze out of it?

00:07:44 Yeah. So that’s actually been my experience. The first test feed I work to optimize was my job at Y Plan, and it was a team effort. We were hitting 30 minutes, and this is too slow, especially since we don’t have a very good CI setup. So developers were regularly running the tests on their own machines, so they go for lunch or something.

00:08:06 So after a few months of effort, we pushed it down to three minutes. So that’s like a ten times speed up and massive boost.

00:08:14 Yeah, that’s great. One technique that I use, I talked about instruments. Developers seldom just run everything. We leave that up to our CI system. And so developers themselves run. We have the tests we split up into sections for roughly the different types of things that you’re wanting to test or the different parts of the system. So if I know I’m working in a particular part of the system, that part of the system is with the test suite I’m going to run and even down into specific test and code specifics. But to have just like a few minutes for a test suite for the thing that I’m running, I assume that we can do that with Django as well, right?

00:09:00 Yeah. It kind of naturally has that for you already because you can split up a Django app into multiple apps. Django apps so that will be like a selfcontained portion of URLs and database models and the tests for them, and it can still get slow if you’re testing. I think I have them.

00:09:19 Okay, well, we don’t want to give away the book, but can we give away a few tips? Any good quick? Sure.

00:09:25 I’ve been posting a few up on my blog.

00:09:27 Paralysis is really the first step. I have a chat to run it, but you can get started straight away by passing the parallel flag to the test runner. You might have some isolation problems when you do this, because you’d be splitting your test suite across multiple processes. If you had any non isolated tests, or you could track them down by using randomization or reversal, as we discussed in the last episode. Yeah.

00:09:52 And from then on, paralyzing should work pretty well.

00:09:57 Okay, yeah. So paralyzing some of that, and then how many do you usually do? Do you have to specify how parallel or does it do that automatically?

00:10:06 Yeah, both with Jagga’s Test framework and pytest exists. If you don’t provide a number of processes that will use as many as you have CPU cores, whether they’re virtual through hyperthreading or actual physical cores depends on your actual machine, but that tends to be a good number to use.

00:10:22 How about database size? Do we need to care about being do I normally test with a copy of the live database or just a smaller database with the recommendations around that?

00:10:35 Yeah, that’s a good question. There are a lot of different practices here. I think the easiest is that you run your migrations, which should leave an empty database around, and then you populate the data as needed per test. But some shops will be taking a copy of the production database may be anonymized and cut down, and others will be using a lot of Django fixture files. The fixture file system. It exists in Django’s Test framework, but it’s kind of recommended against has been for many years because it’s basically storing your database in JSON becomes pretty hard to maintain over time.

00:11:07 Okay, so tell me more about that.

00:11:10 I’ve heard of Django fixtures. They’re different than bytes fixtures, right?

00:11:14 Right.

00:11:14 And they’re just a JSON file that has a you don’t write the JSON file, though, right. Or does Django do it for you?

00:11:22 Well, you can offer them if you want to match the format, or you can load data and dump data from your database, whether it’s your production or development database into these files, they do support YAML instead of JSON, which makes them maybe a little bit easier to edit. The problem is really like ongoing maintenance.

00:11:41 You might dump your user model at the start of the project, and then each time you add a field that’s required, you’re going to have to go edit all the fixture files that refer to it to add that field in for every instance.

00:11:55 Okay, so let’s say, I did like a copy of our system database, our live database, and did whatever randomization or anonymisation that I needed to to make sure that there was no security problems or no breach of user names and things like that.

00:12:16 I guess it depends on your database size. Is that even a workable thing to do? It seems like that would slow down things dramatically. Are there ways to do that and have it not be slow?

00:12:28 I haven’t seen a way that doesn’t become slow over time. Even if you’re Loading 50 megabytes of data, that’s a significant amount of IO before you start running tests. And then there’s a maintainability problem. Because you’d write tests that depend on some data that exists in your production database, you can’t see what they depend on directly. By reading the tests, you have to figure out all the different queries it runs. If Production needs to change that data for some reason, then your test break. It’s not a fun position.

00:12:57 So what do you recommend then? Do you recommend using fake data then, instead of live data?

00:13:03 Yeah, absolutely.

00:13:04 You can start your test by writing out all the pieces that you need. Like you say, I might need a user in this group with these permissions and then move towards factory functions, which I guess are somewhat what Python fixtures can solve for you that will you say give me a user and it automatically fills in the necessary fields for you that you didn’t specify? You can write these factory functions by hand, or you can move on to something like Factory Boy, which I’m a fan of, which provides declarative base for filling in any kind of data model with prefilled data, sometimes randomized.

00:13:44 Thank you, Datadog, for sponsoring this episode. Are you having trouble visualizing bottlenecks and latency in your apps and not sure where the issue is coming from or how to solve it with Data dogs? Endtoend monitoring platform. You can use their customizable built in dashboard to collect metrics and visualize app performance in real time. Datadog automatically correlates logs and traces at the level of individual requests, allowing you to quickly troubleshoot your Python applications. Plus, their service map automatically plots the flow of requests across your app architecture so you can understand dependencies and proactively monitor the performance of your apps. Start tracking the performance of your apps, sign up for free and install the agent and Datadog will send you a free Tshirt. To get started, visit Test And Code. Comdateddog.

00:14:35 Does the use of Pipe test or unit test matter in terms of performance?

00:14:40 No. I think pytest might have like a very small amount of extra overhead to get started, but it can also help you write faster tests because it’s slightly cleaner style, using fixtures and making them shared. But overall you can have a fast test suite using either.

00:14:56 Okay, I’m used to things where if I’ve got, I guess with the database, is there a way to share a database scenario with multiple tests? Or is Django like building up and tearing down the database for every test?

00:15:15 Django does some smart things for isolation that can be copied in other projects, so it’s default test case class uses a transaction per test and then rolls that back after the test is done. So any modifications you make to your database will automatically be undone and they cannot affect another test. So you get isolation there at the test level and then it’s gone one level beyond that per test class to get a transaction as well. So you can set up some piece of data that’s needed for every test function in the setup class, or there’s a Django specific set up test data function and then that can be shared. This is like a pytest module level fixture automatically undone in the database.

00:15:59 Wow. And can I utilize that sort of a feature in both Unit Test and Pi test? That sort of thing available?

00:16:06 Yeah. So when it comes to running Django test suites under pytest, I actually recommend still using the Django Unit test based classes because you can write the Plane asserts. But the helpers like this per test and Per class transaction, they’re just not available with Plane function tests.

00:16:27 Okay, that’s cool. That’s an interesting way around it.

00:16:31 Neat.

00:16:32 This is all very interesting. Anything that we missed that we want to cover?

00:16:35 Mox.

00:16:36 Oh, MOX. Scary topic.

00:16:38 Shocking topic.

00:16:39 Yeah. Tell me, tell me what your recommendation is around mocks.

00:16:43 What would you think it would be?

00:16:45 I don’t know. Always use them everywhere.

00:16:47 Oh, no, definitely not that.

00:16:50 I have a chapter called Targeted Mocking which goes over some of the theory of what mocking is.

00:16:56 And then there’s like the two things that Unit Test Mock is doing. One is allowing you to swap out attributes of arbitrary objects, and the other is providing this casual mock class that kind of does everything. And the first part that’s perfectly fine. We do need a way in Python to swap attributes during tests. It’s a great way of changing how things work for a test scenario. But the mock class itself, that’s what can lead you into loads of problems. It’s very easy to write tests that they pass, but they’re not actually testing AC because everything’s been swapped from mock objects. So I covered a few different more Targeted Mocking libraries that can provide you with the speed boost benefits and the isolation benefits without any of this problematic behavior of the mock class. So for example, there’s Requests Mock, which is a great library for mocking out the Requests library and faking what responses you would get if you were actually making real Http requests. There’s my library Time Machine and also Freeze Gun, which are great for marking out what the current time is, which is quite a common need for testing. What else is in there? Vcr Pi, which is another way of marking out requests. Okay.

00:18:12 And a couple more.

00:18:13 These are marking requests. Are there times where it’s reasonable to mock the database or do we do that? Not a thing.

00:18:22 Mocking the database. I don’t think it’s easily done with Django’s Orm the Orm is really focused on like writing SQL and then getting the database to figure out the answer. So anytime you mock what the answer would be, you’re not really testing the SQL generation side, so you might as well just not have a database during those tests, maybe refactor your code so it doesn’t need the database.

00:18:50 Separate the business logic from the actual saving to the database.

00:18:55 Okay, but we already talked about the memory stuff, so if we wanted mocks for speed up reasons, we could just move to an in memory database.

00:19:04 Right? You can probably get 30 or 40% speed boost using an in memory database. In reality, like the actual disk operations speed up, but not much of the other stuff that happens in a database like networking and SQL parsing.

00:19:18 Yeah, actually I was surprised at how, for instance, I was surprised at how fast I had a test suite that I was testing a plugin by test plug in and it was just an in house thing, but I was just writing little tiny files to the disk to be able to run them. Every little test was very small and very fast, so I didn’t really think about it. It’s one of those boiled frog things, by the way. I’m thinking of a Pragmatic programmer when it has the boiled frog story in it. I’m not sure where you got the analogy from, but.

00:19:56 I think it looks around many different places. But I do remember reading the Pragmatic program.

00:20:01 Yeah, I was just curious. It wasn’t like the entire test was pretty fast. It was a few seconds and I was like, but I am touching the file system. I wonder how much this would speed up. And it was actually an unreasonable amount of time for me to probably if I would have thought about it more, I could have done it quicker. But switching all the rights to a kind of a fake in memory file system and then running there, it turned out to dramatically improve the speed. The entire test suite was instantaneous almost, but it was like from second to instantaneous. So for my time it wasn’t worth the effort. But it is interesting that it does make quite a bit of difference. I think that’s a lot for people to get started with and even just thinking about doing a few of these things to speed up tests and even just think about test speed as something that you almost never care about right away. And then you almost always care about later in the project. So thinking about it early on, maybe instead of getting to the point where your test suite is problematic and if you’re starting a project that you think might have a chance of lasting a long time. Definitely thinking about speed early on is a good thing and if you’re working with Django, I highly recommend they speed up your Django test book. I’ve been reading it, actually. I’ve been reading it to learn as a tutorial for Django as well.

00:21:27 I know that was probably not the greatest Django tutorial but we would give you a test oriented intro.

00:21:34 Okay, so I’ve got a meta question for you. There’s a lot of people writing Django. Most of the people writing Django are also writing tests but you seem to care about this aspect of working with Django make getting the test right more than your average Django developer. Is there any reasons for that or you just think it’s a genetic thing with you or something?

00:21:56 Maybe I’ve got a few battles cars at this point from code that didn’t work in production and late nights spent fixing it. Okay. And yeah, I think testing also was something I wasn’t really taught at University and when I got into industry everyone was doing it and I was like oh, teach me more about it.

00:22:15 I just kept going.

00:22:16 Yeah. Interesting. I had the same experience of not learning it very much in College. I know it’s still a problem of people not really talking about that too much. I’m not sure why. So much for test first. I think maybe we should start programming classes with teaching kids how to test before they learn how to code. I don’t know if that makes sense. Well, thanks so much. Thanks for your effort. Thanks for coming on the show and also your efforts with Django and with trying to make testing more pleasant for all of the Django developers out there.

00:22:47 Thank you very much for having me. Again.

00:22:52 Thank you Adam Johnson for talking with us about speeding up Django test suites. Thank you Datadog for sponsoring check them out at testingco.com Datadog. Thank you Monday.com for sponsoring join their contest at monday.com Test And Code and thank you all the listeners that support the show through Patreon join them by going to testandcode.com support all of those links as well as a link to Adam’s books beating up your Django tests are in the show. Notes@testandcode.com 135 that’s all for now. Now go out and test something.