Anthony Sottile is a pytest core contributor, as well as a maintainer and contributor to many other projects. In this episode, Anthony shares some of the super cool features of pytest that have been added since he started using it.


Transcript for episode 82 of the Test & Code Podcast

This transcript starts as an auto generated transcript.
PRs welcome if you want to help fix any errors.


Anthony Sottile is a pytest core contributor, as well as a maintainer and contributor to many other projects. In this episode, Anthony shares some of the super cool features of pytest that have been added since he started using it. Thank you to Azure Pipelines for sponsoring this episode. Many organizations and open source projects are using Azure Pipelines already; get started for free at azure.com/pipelines. Welcome to Test and Code, the podcast about software development, software testing, and Python.

Welcome to Test and Code. Today we have Anthony Sottile.

Now you just told me and I forgot already.

Sottile. Anthony Sottile.

Okay, but the Italian version. Tell me that again.

I like that.

Yes. So my name is Anthony. I work on a lot of open source software, in particular pytest, flake8, pre-commit, tox, and a smattering of other things.

I currently work at Lyft as an infrastructure developer. I work on the Developer Experience team where I build tools and software to make developers productive, or at least we try.

Okay, that’s for internal developers.

Yeah. So we target basically any service or feature developer at Lyft will use the tools that we build.

Okay.

So we also build like the development environments that users will use as well.

Oh, cool.

Do you use pytest at work then also?

Yeah, we actually use pytest in almost all of our 600-ish Python services. Maybe it’s 500-ish, somewhere around there. We use pytest in almost all of our services, except for the really old ones, which are still using nose.

But we’ve been trying to migrate all of them to pytest and we also use it in our Python libraries. We have a number of internal libraries that help our services move along smoothly.

Cool. Nice.

Now, one of the things I want to talk about, what brought you on, is some of the cool new features of pytest. But pytest has been around for a long time.

I got started way back in the 2.x days, and we’re up to like 5.x now.

How long have you been using pytest?

Yeah. So I actually got started working with pytest as a user in early 2014, which I think around then was the 2.6 release.

Before that, I was using a different testing framework from the company that I worked at. I used to work at Yelp, and they wrote their own test framework called Testify. And around that time, Yelp started sort of feeling out other test frameworks to see whether Testify was actually a good idea or not, and largely landed on pytest as the way forward. And so I was doing some experimental personal projects at the time and decided to check it out before I needed to learn it for work, and found that it was way better than what I was using before.

Yeah.

It’s changed quite a bit since 2.6, but I think it’s actually pretty impressive how much the core has remained the same. I was actually just looking back at some of the older code that I had written in that era of pytest.

And most of the tests look about the same as I would write today.

When you listed all the stuff you contribute to, it’s like my entire toolchain.

So I’m glad that we’re getting a chance to talk, so I know who to ask questions about.

I’ll be on your speed dial there.

But there’s a big difference between using a tool, like I do, and contributing to something, pretty much any of them. How did you go from being a user to a developer?

Yeah. So other than pre-commit, which I created, the story for most of the other tools is about the same, where I was either a longtime user or I had used it enough to find enough of the rough edges, or just got beyond using it to where I’m trying to integrate things in interesting ways. And so I inevitably either made bug reports or found little things to improve or features to add. And that’s kind of how I got started with any of those sorts of projects, in particular with pytest.

I was a kind of longtime user of pytest, and I originally got started contributing around 2017, and my initial commits were adding features that I actually needed for work at the time. My first feature that I added to pytest was capsysbinary and capfdbinary.

The idea was to make the capturing fixtures work for binary streams as well as text streams, and I basically needed to test some obscure integration I was writing. I think it was tar output to standard out, and that’s not text output, so I needed binary output. But I basically got started with a few small contributions, and then did some issue triage because I was waiting for someone to review my code and noticed some other interesting issues and added a few comments here and there. From there it was basically just that I got more and more involved with the project and was eventually given the commit bit, and I haven’t really looked back since. I’ve been contributing more and more as time goes on as well, and some of that is because I learn more and more of the code base, so it’s easier for me to work on different parts of pytest, but I also just have more free time to work on it nowadays.
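For reference, here’s a minimal sketch of what those binary capture fixtures look like in use; the byte values are just illustrative:

    import sys

    def test_writes_binary_to_stdout(capsysbinary):
        # write raw bytes to standard out, the way a tar-to-stdout command would
        sys.stdout.buffer.write(b"\x1f\x8b\x08\x00")
        captured = capsysbinary.readouterr()
        # captured.out is bytes, not text
        assert captured.out == b"\x1f\x8b\x08\x00"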

Yeah, there are corners of it that are more difficult to understand than others.

Absolutely. Yeah. There’s some parts that I still don’t even understand.

I think I jumped into the wrong end. My foray into looking at the source code was how does the assert rewriting work?

Yeah, I’ve actually been touching that code quite a lot. Recently we added a feature that ended up getting rolled back, but it touched a lot of how assertion rewriting needs to understand the expressions that are being asserted on, and kind of do some fancy rewriting in order to give us those nice assertion messages that pytest gives us.

The last thing that I touched there was a new hook that got added so that, I think, you can hook into when an assertion passes, such that you could write metrics on how many assertions are in your code base.
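As a rough sketch of counting passing assertions with that hook: it has to be turned on with the enable_assertion_pass_hook ini option, and the terminal summary here is just one way you might surface the count:

    # conftest.py (requires enable_assertion_pass_hook = true in the ini file)
    ASSERTION_PASSES = 0

    def pytest_assertion_pass(item, lineno, orig, expl):
        # called for each assertion that passes in rewritten test code
        global ASSERTION_PASSES
        ASSERTION_PASSES += 1

    def pytest_terminal_summary(terminalreporter):
        terminalreporter.write_line(f"passing assertions seen: {ASSERTION_PASSES}")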

That’s actually really cool, because there was a test framework I used, one of those custom-rolled in-house things, that not only counted test cases, basically test functions, but also the number of checks within the test case. And in this case, it would be how many asserts we’re hitting.

Is that just in the test function? Or I guess within pytest, it doesn’t really matter. Any assert in code that’s running during a test causes a fail, right?

Yeah. So I think the way the hook works is it works for anything that gets rewritten by the pytest machinery. That’s usually tests, but also plugins, and you can manually mark other files to get rewritten as well.

Oh, yeah. Right. Okay. But like the source code that you’re testing, that won’t be rewritten unless you explicitly say so. Right, right.

Yeah.

Or unless your source code happens to be either a pytest plugin or it’s named like a test file, which I don’t know why you would do that, but it could happen.
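For reference, a minimal sketch of opting another module into assertion rewriting; "myproject.testing_helpers" is a hypothetical module name:

    # conftest.py
    import pytest

    # must be registered before the module is imported anywhere;
    # after this, plain asserts inside it get the nice failure messages too
    pytest.register_assert_rewrite("myproject.testing_helpers")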

Yeah. Okay. You have a list of some of the features that have been new since you started that you like.

So let’s talk about those.

Sure. Yeah. I’m going to go version by version and tell you about the features that I want to highlight.

Yeah. So the first kind of new version for me was way back in pytest 3.0. This was the first version where pytest supported yield fixtures in the normal fixture decorator, and therefore deprecated yield_fixture.

I think this was one of the first times, this is when I worked at Yelp, where we were able to internally write some shared testing libraries that had some of these nice fixtures that we could reuse without depending on experimental features.

Yeah. And this is an incredible feature that was added in, so I’m sure everybody listening is familiar with it. But after yield fixtures went in, you have one function for your fixture: the top half of it is the setup that runs before your test, then a yield statement, and then everything under the yield statement happens after your test.

And with scoping, you can control when this happens, of course. But before that, your setup and teardown were in different functions, right? Yes. So this bundling of a resource with setup and teardown within one function is huge. It makes pytest so much easier to learn.

Yeah. And it also made it really hard to do things like use mock.patch, for instance, because it’s really easy to use with a yield fixture, since you can just use the context manager for them. Before, you had to manually start and stop stuff, and sometimes your teardowns wouldn’t necessarily run in the order that you wanted. It was just a whole mess, and yield just made that way, way better.

You can set up a context manager and yield within the context.
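A minimal sketch of that pattern, wrapping mock.patch in a yield fixture; the patched target and return value are just illustrative:

    import time
    from unittest import mock

    import pytest

    @pytest.fixture
    def frozen_time():
        # setup: everything before the yield runs before the test
        with mock.patch("time.time", return_value=1234.0) as patched:
            yield patched
        # teardown: the context manager is exited after the test finishes

    def test_uses_frozen_time(frozen_time):
        assert time.time() == 1234.0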

Yes, that’s actually the biggest reason that we used it at Yelp. Yelp’s Testify framework had a similar feature called setup_teardown, which was essentially yield fixtures before pytest had it. But once it became a real feature, we were able to really easily migrate our tests to pytest.

Cool. Okay, well, sorry to get off on that tangent, but I’m just a huge fan of yield.

So am I.

Okay, so one of the next things that landed that I thought was pretty important to call out was some of the plugins that got integrated from the community into pytest core.

The first of those is the warnings plugin, which landed in 3.1 and made warning output that happened during tests much more visible. At the same time, you could write tests that used the recwarn fixture to capture warnings and perform assertions on them. This also made warnings a lot more visible in tests, because now you could see when you’re using deprecated features or when the code that you’re running might not work in the future. Surfacing that at test time, I think, is really important, so that you’re seeing those before your users see them.

Yeah. And the behavior of code issuing a warning in certain situations is a behavior, and now you can test for it. So that’s super cool.
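A small sketch of testing warning behavior, using either pytest.warns or the recwarn fixture; old_api here is a made-up function:

    import warnings

    import pytest

    def old_api():
        warnings.warn("old_api is deprecated, use new_api", DeprecationWarning)
        return 42

    def test_old_api_warns():
        with pytest.warns(DeprecationWarning, match="use new_api"):
            assert old_api() == 42

    def test_old_api_warns_recwarn(recwarn):
        old_api()
        assert len(recwarn) == 1
        assert recwarn.pop(DeprecationWarning)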

Yeah. And another thing that got released in 3.1 that I actually use a lot, it’s not really new anymore, but pytest.param got a bunch of new functionality that makes it much easier to add markers or change the test name of particular permutations in a parametrize decorator. So when you have a bunch of parametrized tests, but sometimes you want to control what those tests are named, pytest.param makes it really easy to pass an id for your different permutations.

Yeah, I love that. It’s sometimes jolting to have some of the parameters not using pytest.param and then a few with pytest.param, but it’s still very useful. I like it.

Absolutely. Yeah. I actually didn’t know about this feature until probably a couple of months ago.

Someone showed it to me in a code review. I was like, this changes everything. I don’t have to have tests that are called like test_rewrite[0-1-True-False]. They can be like test_rewrite[this-scenario] instead.
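A sketch of what that looks like; the ids and the xfail mark here are illustrative:

    import pytest

    @pytest.mark.parametrize(
        "value, expected",
        [
            pytest.param(1, 2, id="small-number"),
            pytest.param(10**6, 10**6 + 1, id="large-number"),
            pytest.param(2, 5, id="known-bug", marks=pytest.mark.xfail(reason="illustrative failure")),
        ],
    )
    def test_increment(value, expected):
        assert value + 1 == expected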

Yeah. That’s one of my main uses for it: having the intent of the test case in the name instead of just whatever random thing is there.

This episode of Test and Code is sponsored by Azure Pipelines. Azure Pipelines is a continuous integration, continuous deployment service that supports Python and any other language on Windows, Mac, and Linux and lets you run automatic builds and tests of your code.

It’s fully integrated with GitHub and lets you define your continuous integration and delivery pipelines with a YAML document.

Azure Pipelines is free for individuals and small teams. If you are maintaining an open source project, you get unlimited build minutes and ten concurrent pipelines.

Many organizations and open source projects, including Python itself, are using Azure Pipelines already. Get started for free at azure.com/pipelines.

For 3.3, you have the binary capture fixtures that you added.

Yes, we talked about that one already. But yeah, 3.3 was when I first started actually contributing to pytest. So everything from there onwards I either helped work on or was able to review and push out; that’s when I joined the project.

Okay, what’s next?

The next thing, we already kind of talked about the warnings plugin, so we’ll skip that. It did get better in 3.8. The next biggest thing is pytest added a bunch of temporary directory fixtures that are based on pathlib instead of the py.path library.

This just brings all of those fixtures much, much closer to standard Python tooling instead of using the kind of deprecated py library, which pytest originally grew out of.

I love this part.

Both of them are pretty easy to use, but the pathlib ones are supported better by the rest of Python. Also, with tmpdir, the object that is passed back is something that, if I was trying to describe it to people, I’d have to point to some hidden documentation somewhere.

Yeah, like a py._path.local LocalPath object or something ridiculous like that. It’s like, well, now I can at least just say it’s a pathlib.Path object, and that’s way simpler to tell people.

On the other end of it, the usage hasn’t changed that much, so it doesn’t take that much time to convert if you want to; it’s mostly the same from tmpdir to tmp_path.

There’s a couple of things that aren’t supported, like there’s no good context manager for changing directory in pathlib, but the monkeypatch fixture from pytest has a built-in way to change the directory as well.

You can actually combine tmp_path and monkeypatch to do it in one go.
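A minimal sketch of combining the two; the file name is arbitrary:

    def test_writes_file_in_cwd(tmp_path, monkeypatch):
        # tmp_path is a pathlib.Path; monkeypatch.chdir restores the old cwd afterwards
        monkeypatch.chdir(tmp_path)
        (tmp_path / "settings.ini").write_text("[main]\nenabled = true\n")
        assert (tmp_path / "settings.ini").read_text().startswith("[main]")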

Yeah, that’s good. Okay, next.

Cool. Next is in 3.10.

This is just like a very, very minor change, but I think it helps make the output a lot more readable.

The little dots in the pytest output now have colors by default.

This was actually the inclusion of a feature from a third party, but it ended up only being like a couple-of-line change to make it happen. And it was their first contribution to pytest, and it was a pretty cool feature for not that much code.

What color are the dots?

So, like, passed tests are green, failed tests are red, and I think skipped and xfail are both yellow, but I’d have to double-check that.

Okay, so it’s not just the dots that get color. All the other little things get colored, too. Yes, that’s cool. Yeah.

And actually around that time we tried to make a lot of coloring a little bit more consistent.

That was partially my effort, because I wanted to make the docs look better. So I wrote a Pygments plugin that highlights plain-text pytest output so that you can have colors in, like, Sphinx documentation.

So I did a bunch of stuff to clean up where we color stuff and make it a little bit more consistent.

Okay, cool, nice.

Cool. The next two features are pretty tightly related, but in two different versions. And so one of them is a removal, and then one of them is a reintroduction of basically the same feature, but a slightly different implementation.

So there was this testing type called yield tests, which I didn’t use all that much, but it was sometimes useful as an alternative to parametrize. Basically, the way it would work is you would write a test case that was a generator, and you would loop over some construct inside of your test and yield an assertion, and each of those assertions would show up as a separate test.
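Roughly, the removed style looked something like this; each yielded callable plus its arguments showed up as its own test:

    # old nose/pytest yield-test style (no longer supported)
    def check_square(value, expected):
        assert value * value == expected

    def test_squares():
        for value, expected in [(2, 4), (3, 9), (4, 16)]:
            yield check_square, value, expected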

It was an idea that was, I think, borrowed from Nose, but I’m not really sure where it came from.

But in pytest 4.0, that feature was removed because we changed how the internal discovery system worked, and yield tests were just utterly incompatible with it. In pytest 4.4, we changed some parts of core so that a plugin could kind of reintroduce the idea of yield tests, but in a much more supported way.

And from that grew the pytest-subtests plugin.

I’m still not 100% sure how it works, because I haven’t used it yet. But the basic idea is to support basically the same idea as yield tests.

Where you can fine-tune a parameterization and create tests at runtime.

Subtests were one of the features of unittest that didn’t work right in pytest, because you can use pytest to run unittest tests.

But if your unittest tests had subtests in them, it didn’t work right. But with this pytest-subtests plugin, it does work, right?
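A small sketch of the plugin in use, assuming pytest-subtests is installed and provides the subtests fixture:

    def test_small_numbers_are_even(subtests):
        for i in range(5):
            # each with-block is reported as its own subtest result
            with subtests.test(msg="evenness", i=i):
                assert i % 2 == 0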

Yeah. I was about to say, I was pretty sure subtests came from unittest, but I couldn’t remember the exact place. But yeah, that makes a lot of sense.

My recommendation is to not use them, because when I tried to figure out how they work completely, I was utterly disappointed.

And then I thought it was maybe the pytest implementation, and I went and tried it in unittest, and no, they’re just very hard to use. Yeah, if you’re using them and you’re like, I don’t know what this guy’s talking about, they work fine, well, then good for you. My issue is I have to report stuff to something like JUnit XML or some other reporting mechanism. And the issue is you get more results than tests: with subtests, if you have a bunch of failing subtests, you can get a whole bunch of failing results, and your total number of failures can exceed the number of tests. And that’s weird.

All right, so you can have, like, 110% completed.

I remember I filed a bug on that at some point. The highest was like, you are now 400% complete.

Okay, sorry for the tangent, but anyway, it’s fine. What’s next?

So the last two changes are pretty related, but mostly around the release structure of pytest moving forward.

So in pytest 4.6, we announced our last version that will support Python 2, as well as Python 3.4, but I think that’s less earth-shattering for most people, and announced that the 5.0 and later releases will be the first releases that only support Python 3.

Now, this is not saying that we won’t continue to support Python 2.

The 4.6 branch continues to receive bug-fix patches and will continue to do so until the end of the year.

After that, we will still accept bug fixes, but we won’t be maintaining it ourselves. It’ll mostly just be community contributed fixes.

And moving forward, we’ll be able to use Python 3 features and do a lot of internal cleanup, and fix a lot of longstanding bugs that were basically there because we needed to support Python 2 at the same time.

And some of the stuff that we did around that was we upgraded all of the syntax in pytest to be Python 3 only. This allowed us to use some newer features, like removing explicit inheritance from object, simpler super() calls, getting rid of Unicode literals, and a bunch of stuff like that.

We also were able to redo the import system of pytest. It was using a deprecated module called imp in the standard library, and imp has a bunch of bugs and is slightly different from how normal imports work. And so by switching to the importlib-based system, we were able to fix some bugs around case-insensitive file systems, namespace packages, and a bunch of other stuff that was slightly wrong. Also, we got rid of some deprecation warnings that would always show up in our test suite.

Yeah, that’s cool. It’s a big sigh of relief to get to that point. Now new contributors don’t need to worry about trying to deal with two-and-three support. That’s not fun.

Yeah, that was a big deal. People would come by and add a feature, and all the tests would pass on Python 3, and then they’d be like, well, what is this UnicodeDecodeError thing? What do I do with that? It’s good to not have to deal with that so much anymore.

How does that work? If somebody’s testing something, do they need to test on all of the different versions? Or is there a tool like a continuous integration tool chain that tests on multiple versions?

So the way pytest’s tests work is we use some free providers to run our tests. We use both Azure Pipelines and Travis CI to run our tests, and we configure those to install a bunch of different interpreters and test them on the platforms that we support. So we currently test, I want to say, Windows, macOS, and Linux, and we test everything from, well, depending on the branch. So for the 4.6 branch, we test Python 2.7 and then 3.4 and up.

And in the 5.x branch, we just test Python 3.5 and up.

Okay.

The CI tools give us a nice way to build, like, a matrix of different operating system and Python version tests. And then we use tox to provision the virtual environments and actually run the tests.
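As a rough sketch of that setup, a tox.ini along these lines lets each CI job pick an interpreter; the environment names, deps, and test directory here are illustrative:

    # tox.ini (illustrative)
    [tox]
    envlist = py27, py35, py36, py37

    [testenv]
    deps = pytest
    commands = pytest {posargs:tests}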

So if somebody pushes a pull request, the pull request will get tested before it notifies people of it.

Yes. So you’ll basically be able to see your PR status either as red or green, and then have the ability to drill into each of those failures if you have failures.

Okay. I love that workflow because a contributor can test it on one version and then let the CI tool test it on everything else.

Yeah, because it’s sometimes hard to install all those different interpreters. And I didn’t even have a Mac machine to test on for the longest time. And so I would largely rely on CI to validate that platform. And I don’t expect our contributors to install every version of Python under the sun.

I was going to ask you how people can get involved, but one of the things I want to cover first to make sure people know about it is there are funding models now if people want to contribute, not time, but some cash.

Yeah. There are two models that pytest has now opened up for people to contribute. One of those is through a service called Tidelift. The idea behind Tidelift is you basically sign up for a subscription model. And I think the way it works is you say what things you use most often, and it redistributes the funds that you pay for your subscription to those tools that you want to support.

pytest also supports one-off contributions through Open Collective. And I assume you have a description where we can put links, but you can also find it from pytest’s GitHub and click on the sponsor button at the top.

Yeah, we’ll put links to these.

The other really cool thing about these funding models is they’re very open about how the contributions get used because all of that information ends up being public.

You can see when we put in a line item and you can see what that is going towards.

Yeah, it’s nice. Okay. Now if I do want to contribute time, I assume that there’s room for more developers to get involved, to help out.

Absolutely. There’s always going to be more room for developers. We can never have enough contributions. I think the biggest thing that pytest really needs right now to kind of grow as a community is to burn down our issue backlog. We’ve gotten pretty good at burning down the open pull request count, but the number of issues is just a lot. A lot of the stuff that needs to happen there is really just simple issue triage, like closing duplicates, or figuring out whether things are bugs or not, writing simple test cases for them, or linking together other issues that have similar problems.

But it’s a lot to burn down. And sometimes I’ll spend an hour of my free time and just go through a page of issues and either close them out or find duplicates for them, or, if they’re simple bug fixes, I’ll make a small patch that gets those resolved and out of the way. But a lot of that needs to happen in order to drill down that backlog.

But beyond that, I think the other way to get involved with the pytest project, or at least the easiest way to get started, is to go to the issue tracker.

We have a label for easy tasks. We try to mark some issues that are something that’s going to get you introduced to the code base, or that are like a one- or two-line patch for some minor bug or minor situation. That doesn’t happen too often. And those are often useful for understanding how our development flow works, what sort of tools we use to develop on pytest, how we self-test pytest, which I think is almost 3,000 tests of pytest for pytest, which is quite a few.

But it kind of gets you into that workflow that the current contributors use. And from there you can kind of expand out further, find other bugs to fix, write new features if you need them.

Also, contributing high-quality bug reports is super valued. I love getting a bug report where the user has gone the extra mile. It includes a reproduction that has basically full source, the expected output, what they’re seeing, crash logs, version information, all that other stuff.

And even adding that to an existing issue is super valuable.

Yeah.

One of the things I love is when I see somebody takes the time to not say, I’ve got this huge test scenario and I get this failure, but they boil it down to a simple toy example that fails.

Absolutely. Yeah. My two least favorite bug reports are "here’s this 100,000-line code base, it’s broken" and "this isn’t working," with no explanation at all. But I think people are pretty good about bug reports, at least on the pytest project. We have an issue template that tries to guide you through building out a high-quality reproduction or issue, although some people just delete the template and we don’t get a good bug report.

How many people are involved with pytest, do you know?

It’s a good question. So from the core developer perspective, right now it hovers between like four to six that contribute most of the patches, but there are probably around 30-ish people that contribute either frequently or semi-frequently.

Okay. But that’s still pretty small. That’s a small group of people, and we have a whole bunch of people depending on pytest now.

Hundreds of thousands leaning on the backs of essentially a bus load of people.

Yeah, that’s incredible.

Any cool things coming up that I can get excited about?

So I kind of have an inkling about two larger ideas that we eventually want to solve. One of the interesting things about pytest is a lot of the features come along organically, and we don’t necessarily plan them super far ahead. But there are two that I’ve been thinking a lot about, and hopefully we’ll be able to get them in at some point.

The first one of those is smarter fixture ordering.

Right now, pytest will just greedily instantiate fixtures and lazily deallocate fixtures. And so sometimes fixtures will have life cycles that are longer than you would expect, or they get instantiated many more times than you would expect.

But it’s kind of a hard problem. We would essentially need to build a satisfiability solver and apply that to our fixture setup and teardown, but it has the potential to improve a lot of memory overhead or CPU time overhead for repeated operations that don’t necessarily need to happen all that often.

That sounds like a great problem for a grad student.

Yeah. It’s one of those super mathy problems. I don’t think I know enough about SAT solvers to solve it, but it would be really great for somebody to jump in who knows more about how you would do it.

It essentially comes down to a graph problem where you would plot all of the fixtures and tests as a graph and then optimize the usage there.

Yeah, be cool.

Okay.

Anyway, one other thing that’s kind of on the horizon that we’re looking into is better interleaving of discovery and execution.

pytest right now has a model where the first stage is collecting all of your tests, and that basically involves importing every test module and permuting out every parametrize decorator. And then only after that is completed does it start running the tests.

But this can lead to a huge amount of memory overhead, especially for test suites that have hundreds of thousands or millions of tests. For instance, the cryptography test suite hits this all the time. You basically need to have like six gigs of RAM in order to run their test suite, and a lot of that is because pytest has a bunch of overhead for the collected objects.

But there’s some thoughts on how we could kind of like iteratively discover, but at the same time start running tests as soon as they get discovered.

But it has a bunch of implications. Like, well, what if discovery fails partway through running tests? How do we adjust plugins and such to handle those changes? There are a lot of problems that we would need to figure out, and so it’s really just at the birth of the idea.

Other than that, I don’t really know too much that’s in the pipeline in the future.

Okay. Yeah, both those sounded like interesting problems.

I didn’t think about the dependence on plugins and stuff.

Yeah, there are some other hooks, like one that allows you to reorder items and change how they execute, or add markers and fixtures at collection time, and we would have to completely change how that works.
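One such hook is pytest_collection_modifyitems, which only makes sense once the full item list exists; a small sketch of reordering with it, assuming a "slow" marker:

    # conftest.py
    def pytest_collection_modifyitems(session, config, items):
        # push tests marked "slow" to the end; this relies on having all items up front
        items.sort(key=lambda item: 1 if item.get_closest_marker("slow") else 0)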

I don’t even know if it would be compatible. It’s possible this idea will never get off the ground.

So how do people get around it now? One of the things people could do if you run into that, a brute-force method, would be to split your test suite into two directories and run them separately.

Yeah, that’s what most people do. We have a couple of services at Lyft that hit this problem as well. And so we have test unit one and test unit two and test unit three, and we split those test suites up so that they don’t consume so much memory when discovering tests. And we can also run them in parallel, too, without needing something like xdist.

Yeah. Okay, cool. So many exciting things going on. And yet there aren’t that many commonly used things that have been deprecated.

Even though there are all these new features, tests written three, four, five, six years ago, most of them still just run fine.

Yeah, a lot of them.

I think the biggest thing that I’ve hit when upgrading the version of pytest is just making sure you’ve upgraded all of your associated plugins, and for the most part, things just work.

Well, this has been a total fun episode, I think, for pytest nerds. Thanks a lot. And we will talk to you later.

Yes, have a good one.

Thanks again to Azure Pipelines for sponsoring this episode. Get started for free at azure.com/pipelines. That link is also in the show notes at testandcode.com/82. A great way to get started with pytest, of course, is with the book Python Testing with pytest, and I’ll drop a link in the show notes for that, too. That’s all for now. Now go out and test something.