Andy Knight joins Brian Okken in discussing Feature Tests as opposed to Class Tests


Transcript for episode 51 of the Test & Code Podcast

This transcript starts as an auto generated transcript.
PRs welcome if you want to help fix any errors.


Welcome to Test and Code, a podcast about software development and software testing.

On today’s episode, Andy Knight joins me in discussing the concept of feature testing. A feature test is a test verifying a service or library as the customer would use it, but within a single process. That was a quote from an article that appeared on the Twitter Engineering blog. The article describes a shift away from class tests towards feature tests, the benefits of the shift, and some reactions to it. Feature tests are similar to something I used to call functional subcutaneous integration tests, but it’s a way better name and I plan to use it more often. The idea fits well into my testing philosophy. Andy Knight is someone who is still holding on to the test pyramid. Poor guy. So I thought it would be fun to ask him to discuss feature testing with me. I think it’s a balanced discussion. I hope you enjoy it and learn something. Thank you to PyCharm for sponsoring this episode.

Now let’s hear about feature testing.

Good morning. How’s it going?

Going well. How are you?

I am great.

Welcome back to the show.

Thank you. I’m glad to be here.

So we actually did have you on. It wasn’t that long ago. It was on episode 47. We talked.

Correct.

And we talked about all sorts of stuff. So anything changed since then? You’ve given any cool talks or anything?

Well, I did go to PyGotham.

All right. Yeah. How was that?

It was a lot of fun. I’ve been in New York City before, so getting around was all right.

It was a lot bigger than PyOhio. I thought they were going to be about the same size, but I’m not surprised that it was bigger. But the conference was really good.

Well, great. Did you talk there?

Yes, I gave the same talk as I did at PyOhio.

Just giving an overview of how to proceed with test automation if you’ve never done it before or if you’ve tried and failed miserably.

Okay.

So covering things like, hey, this is pytest, and these are different types of tests you can write and different libraries you can use. It was well received, kind of.

What we wanted to talk about today is a concept called feature testing. I got this from an article that was on the Twitter Engineering blog, and it was published in October 2017. And two authors, one of them backtracked over and one of them, we don’t know his real name. It’s Dean’s Old Tweets, but Dean’s Old Tweets is defunct. So Dean somebody. It’s an article which talks about kind of the level of testing.

So they start off with describing different types of testing that they’ve used. The first is single class tests, which is testing a single class and verifying the methods of that class and mocking out the dependencies around that. So that’s what a lot of people would think of as a unit test. And a feature test is a test verifying a service or library as a customer would use it, but within a single process, generally going through the API and generally not mocking anything out unless you really have to, like for speed or a service. They also reference using mocks to inject failure cases. And then a unit test is a test verifying a unit of functionality. And what qualifies as a unit test is ambiguous, since the size of a unit can vary, or it could even mean any test defined in JUnit. So they just don’t even use that term. I’m kind of on that case. There are so many definitions of unit that I don’t even know if it’s worthwhile.

So what I normally think of as a unit test, they call a single class test, and what I’m mostly familiar with, they call a feature test, but I was referring to this as a functional subcutaneous test. Feature test is easier to say, so I like that.

And then they talk about how generally they’re trying to move more towards feature tests. And it isn’t just adding to their test suite, it’s replacing stuff. They’re removing single class tests and replacing them with feature tests. And there are some advantages, and I kind of agree with all of them. The safety net for refactoring: since you’re testing at an API level, you’re not going to change that as much, and the refactoring is usually refactoring internal code. Testing from a customer point of view: I love this because, since you’re testing from the API level, you pretend to be a customer while you’re testing. You’re testing the end-to-end behavior, which has to be done anyway, so we may as well do that. And then in the end you write fewer tests. And then I don’t get this one. One of the advantages is service as pluggable library.

They say if set up correctly, feature tests lead towards a service design in which the service module itself is embedded in another application.

I don’t even know what that means, but I guess if you have it tested well from the API level, you can reuse it in another something. I’m not going to go through all of the concerns, but they did talk about how some people have had concerns with this approach, and then they address each of them. But I wanted to know what you thought, since you kind of do different types of testing than I do.

Sure.
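As a rough illustration of the two test styles described above, here is a minimal sketch in Python. The `PriceRepository`, `Cart`, and `checkout` names are invented for this example and do not come from the article; the point is only the contrast between a single class test that mocks a dependency and a feature test that goes through the public API in one process with nothing mocked.

```python
from unittest.mock import Mock


# Hypothetical domain code, invented purely for this illustration.
class PriceRepository:
    """Internal collaborator: looks up unit prices."""

    def __init__(self, prices):
        self._prices = dict(prices)

    def price_of(self, item):
        return self._prices[item]


class Cart:
    """Internal class: totals up items using a repository."""

    def __init__(self, repository):
        self._repository = repository
        self._items = []

    def add(self, item, quantity):
        self._items.append((item, quantity))

    def total(self):
        return sum(self._repository.price_of(i) * q for i, q in self._items)


def checkout(prices, order):
    """Public API of the hypothetical library: what a customer would call."""
    cart = Cart(PriceRepository(prices))
    for item, quantity in order.items():
        cart.add(item, quantity)
    return cart.total()


def test_cart_total_single_class():
    # Single class test: Cart in isolation, with its dependency mocked out.
    repository = Mock()
    repository.price_of.return_value = 3
    cart = Cart(repository)
    cart.add("apple", 2)
    assert cart.total() == 6
    repository.price_of.assert_called_once_with("apple")


def test_checkout_feature():
    # Feature test: through the public API, in one process, nothing mocked.
    assert checkout({"apple": 3, "pear": 5}, {"apple": 2, "pear": 1}) == 11
```

If `Cart` and `PriceRepository` were later merged or renamed, the single class test would need rewriting, while the feature test would keep working as long as `checkout` behaves the same.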

Yeah. I’m really glad you shared this article. When I first heard it, I found it fascinating.

I’ve used the term feature tests as well for things like integration and end-to-end tests, because I’ve struggled with the word, right? You said you call them functional something-something tests. Yeah.

So I think it’s interesting that we’ve both had the same problem in trying to define what those kinds of tests are. I’ve called them above unit tests or above unit black box tests. Integration and end to end tests. But I have in the past also settled on feature tests, a term kind of borrowed from Behavior Driven Development a bit.

What makes this article’s approach extremely interesting to me is that it’s an extremely pragmatic approach to a risk-based strategy for testing.

I don’t think anybody would disagree that feature tests are useful and good and arguably necessary when delivering product. But the proposition here is to say, well, let’s do only feature testing. Let’s throw away what would classically be called unit tests, what they termed single class tests.

That makes it really interesting, because I totally sympathize with many of the advantages they cite, specifically the safety net for refactoring. Many times when you change code, even if it’s a small change, you run the tests, I should say you run the unit tests or the single class tests, and you find out the tests are failing. And the tests are not failing because you broke something. The tests are failing because you forgot to update the tests. Right? And so then there is that extra work you have to do to go update those tests to put them back in sync. And it does add a lot of work.

What these guys are saying is, well, let’s just forget all that. Let’s take an acceptable risk to not even worry with this type of single class testing, and we’ll go straight to the feature tests.

I can’t say I would do that for myself, but I can see why they would want to take such a risk based strategy to move fast and to really try to get value out of what their tests are delivering to them.

One of the hopes for your automated tests is that you can change one or the other, but not both. You can add new tests or you can change code, but if you’re doing both at the same time and not changing behavior, then your safety net’s gone, because you don’t really know: did you just change the tests appropriately, or did you change them just so they would pass?

So you said you wouldn’t do it yourself.

Why is that?

I would have hesitations.

I still very much believe in the test automation pyramid, where you have unit tests at the bottom, integration in the middle, and end-to-end at the top.

I don’t necessarily think you need to have hard proportions on that. It’s the idea of a layered approach to your test automation strategy.

Personally, I still think there can be value in unit tests or single class tests.

And I think specifically of when I’ve done Django web app development in Python. There are many pieces to web app development, and Django does a great job of separating concerns and putting things here and there.

And so there are certain things, like custom template tags in Django, or if you have like a utilities library to do very low level things.

A lot of times I like to just write unit tests for myself, even if I know I’m going to hit that refactoring issue, just because I can catch things right away. If there’s a problem, I can check myself before I wreck myself.

And I mean, I’ll be honest, there have been times where I’ve written, let’s say, four or five unit test variations of one particular function call. And in my test, I put my expectations in at that low level, and the test failed because I implemented something wrong. So I caught it early rather than seeing it, or not seeing it, even missing it, puke on the screen later when I’m trying to do something bigger. Right. So I still would want to write some of those single class tests or unit tests.

I don’t necessarily think it’s always right to have 100% code coverage of those.

And so I find myself gravitating these days much more towards that pragmatic approach of making sure, regardless of what level the test is or the type of test that it is, delivering value for the return on investment you expect out of it.

Yeah. One of the things that I always struggle with with lower level tests is what the requirements of the code under test are when I’m really low level, because I don’t know how to tie that to a customer requirement. For things like utility libraries, I think it’s completely reasonable to test the library in isolation from the rest of the application.

I still think this falls under feature testing, because I’m thinking of it as a utility library and testing it at the API level. Okay. At work I’ve got utilities that we use for testing, of course, and a couple of different projects that have completely different repositories. We kind of want the same tools around for our little utility libraries for testing, and so I pulled the utilities into their own package and wrote tests for those. As I put them in a different package that I was going to share between projects, I definitely thought I need to put tests with these. It’s the safety net part, because I want one code base to be able to add to it and change it and not break the other one.

I think I remember that you weren’t somebody that uses a lot of mocks. Or do you?

I use them sparingly.

Okay.

I will use mocks in what I would call my unit tests, because I don’t want them to have any dependencies on anything else. But I also recommend the pragmatic approach: if you are mocking too much, maybe you should not do that, because you’re putting a lot of effort into mocking versus testing.

Going back to this article a little bit, one of the complaints they had was, we depend on too many remote services that require mocking, and their response is to start with feature testing specifically to avoid this. One of the ways they’ve done it is to compose multiple service calls into one. So they essentially create a new interface around their external service and a well-defined subset of that external service, and then just mock the new interface when they need to, and it reduces the entry points into the external interface or the external service. I thought that was a really interesting idea.

Yes. Mock some things and leave other things to be real, or rather, use the real things and then mock only what you need to in order to be successful. Right. So partial mock approach.

Yeah. And also just to limit your interface into a different service, like, let’s say I’ve got an API to Twitter or an API to Twilio or something. If I’ve got code to access those external services all over my code, it’s going to be hard to mock because it’s all over the place.

But if I funnel all of my external calls through a shim layer between my application and the external service, then I can just mock there.

And it simplifies that. I find myself actually using mocks more for failure injection, to see how my system behaves during failures, rather than because of some sort of trying to speed up my tests with mocks.
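The shim-layer idea can be sketched roughly as follows. This is only an illustration under assumptions: `MessagingShim`, its `send_alert` method, the vendor client’s `send` call, and `notify_on_failure` are all hypothetical names, not the API of Twitter, Twilio, or any real SDK. The application talks to one narrow class, so the tests have exactly one place to mock.

```python
from unittest.mock import Mock


class MessagingShim:
    """Hypothetical narrow interface in front of an external messaging service.

    The application only ever calls send_alert(); the vendor client stays
    hidden behind this one class.
    """

    def __init__(self, client):
        self._client = client  # stand-in for a real vendor SDK client

    def send_alert(self, recipient, text):
        # Compose whatever vendor calls are needed into a single operation.
        return self._client.send(to=recipient, body=text)


def notify_on_failure(shim, build_ok):
    """Application code: depends on the shim, never on the vendor client."""
    if build_ok:
        return False
    shim.send_alert("oncall", "build failed")
    return True


def test_notify_on_failure_sends_one_alert():
    shim = Mock(spec=MessagingShim)  # mock only the shim layer
    assert notify_on_failure(shim, build_ok=False) is True
    shim.send_alert.assert_called_once_with("oncall", "build failed")


def test_notify_skips_alert_when_build_is_ok():
    shim = Mock(spec=MessagingShim)
    assert notify_on_failure(shim, build_ok=True) is False
    shim.send_alert.assert_not_called()
```

Using `Mock(spec=MessagingShim)` means a test that calls a method the shim does not have fails loudly, which keeps the mocked surface as small as the real one.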

A lot of times services are just really darn good, especially like a database or something. How do you get it to fail?

And one of the ways is to just pretend it failed and mock a failure out.
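A minimal sketch of that kind of failure injection, using `unittest.mock` from the standard library. The `DatabaseUnavailable` exception, the `fetch_profile` method, and `load_profile` are assumptions made up for the example; the technique shown is setting `side_effect` on a mock so the call raises instead of returning.

```python
from unittest.mock import Mock


class DatabaseUnavailable(Exception):
    """Hypothetical error raised when the database cannot be reached."""


def load_profile(db, user_id):
    """Application code: fall back to a default profile if the lookup fails."""
    try:
        return db.fetch_profile(user_id)
    except DatabaseUnavailable:
        return {"id": user_id, "name": "unknown"}


def test_load_profile_survives_database_failure():
    db = Mock()
    # Inject the failure: the mocked call raises instead of returning.
    db.fetch_profile.side_effect = DatabaseUnavailable("connection refused")
    assert load_profile(db, 42) == {"id": 42, "name": "unknown"}
    db.fetch_profile.assert_called_once_with(42)
```

The real database may almost never fail on demand, so the mock is there to exercise the error path, not to make the test faster.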

Indeed. Also, I want to comment. Their approach reminds me very much of how you do testing with Angular web development, the Angular JavaScript framework. Okay, I’m not sure how familiar you are with Angular, but I’ve done a little bit of it, and I think it came from Google. Right. AngularJS then turned into Angular, and now it’s just going crazy awesome. But they bill themselves as being a testing obsessed framework or something, and they have very good test beds set up for this type of thing. Because with Angular, because it’s a front end framework, you’re constantly calling out to get different things and then bringing it back to render in your page. And so a lot of the extra support they add to the Jasmine test framework is for things like mocking different components and services and all the buzzwords I can’t remember.

And so, yeah, it’s like this. You can choose what you want to make the real thing, and then you can also choose what you want to mock and kind of blend them together. And it works quite beautifully.

That’s cool.

I’ve not done Angular before.

This episode of Test and Code is brought to you by PyCharm. I started using PyCharm because of the amazing automated test support, especially for pytest, but the more I use it, the more I realize how much time it saves me in the rest of my day. I can click commit and walk through all of my changes and make sure everything is really something I want to keep. If I see a print statement that I didn’t mean to leave in the code, I just uncheck that part of the file and it isn’t committed.

Want to try out a snippet of Python? There are tabs at the bottom for quick access to the Python console, as well as to the command line console, version control, and even a TODO list that’s populated by TODO comments in the project.

I find myself opening other tools less and less. That time saving of context switching adds up. If you value your time like I do, try PyCharm. Head to testandcode.com/pycharm and try the Pro version for free for four months.

One of the things that I found as I’m doing more testing that looks like this feature testing: one of the complaints they saw that they wanted to address was, I’m going to quote for this, I have no idea what this test is testing or doing. We should do single class testing instead. I don’t know what the API really is. And the response really is: to me it seems like this is an issue of not understanding your customers and their use case. Feature tests naturally provide great documentation. So step through the use case, reverse engineer, and retrain the team on lost knowledge.

This is definitely something that rings true to me, because I have also been one of these developers, and I’ve worked with developers like this before, who know their piece of code that they’re supposed to write and don’t really know how it fits into the rest of the system or how the system works. And there are some times where you’re like, why should I have to care about the entire system? I just want to code my piece. But trying to keep in mind the entire system and think about it from the API level, from the customer point of view, I’ve never regretted that. The knowledge gained of the entire system and how it’s being used eventually, I think, will make you a faster developer even at the low level, because you’re able to keep in mind the pieces of your code that are really important and the pieces that maybe aren’t so much. Specifically, I’ve seen people waste some time in the corner cases to make sure some obscure input doesn’t fail when it’s probably never going to get hit. Maybe we can even just limit the scope of the library or something.

When I read this particular point, I have no idea what this test is and what it’s doing. I was thinking I find that to be the case as much with the single class unit testing as with feature level testing. Right? I agree. The issue here is not the approach. The issue here is the person.

Either they don’t really understand what the business is or they really don’t understand the code.

Right?

I mean, people need to write good tests at whatever level they write the tests, and tests need to be descriptive, they need to be documented well, they need to test the right things, as you mentioned.

So, yeah, this cuts both ways.

It does cut both ways. And maintenance is definitely a real thing. And the benefit of being able to test your system with fewer tests makes it so it’s a smaller time investment to have more people read them regularly and make sure they understand what the testing is doing. I’m not saying that that’s in opposition to unit tests. I think unit tests should be the same way, or single class tests. More people should be reading them and understanding them.

I still think it’s bizarre to have people talking about how awesome it is because of the thousands of tests, as if the larger the number of tests, the better. If they’re useful, great. But are you really reading them all on a regular basis? It also depends on what you count as a test. I have quite a few tests now that are using pytest parametrization. And so a single test should be understandable, but for its inputs I might iterate through lots, like a huge matrix of parameters that go through it, in which case it’s hitting different paths through the system, maybe 100 different test cases, but it’s still just one test that people have to be able to read.

Sure.

Actually, I have a question for you on that. What’s the typical table size you find when you write tests using the pytest parametrize feature?

I don’t think I have one.

Okay, no worries.

The test needs to be readable, but the test table needs to be readable also. Right. So I usually don’t actually try to go too deep in, like, a matrix thing. Where we’re really talking about one call to pytest.mark.parametrize, each row will be one test case. However, you can stack the parametrization calls. In those cases, it’s going to loop through them, so it’s essentially multiplying all your different values. If you have, like, two variables and they both have five rows in their parametrization, then you’re going to have 25 different test cases run.
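Here is a small sketch of stacked parametrization with pytest, matching the five-by-five example just described. The `area` function and the value lists are made up for illustration; the only pytest feature used is `pytest.mark.parametrize`, applied twice to the same test.

```python
import itertools

import pytest

WIDTHS = [1, 2, 3, 4, 5]
HEIGHTS = [10, 20, 30, 40, 50]


def area(width, height):
    """Hypothetical function under test."""
    return width * height


@pytest.mark.parametrize("width", WIDTHS)
@pytest.mark.parametrize("height", HEIGHTS)
def test_area_is_symmetric(width, height):
    # pytest runs this once per (width, height) pair: 5 x 5 = 25 test cases.
    assert area(width, height) == area(height, width)


# The same cross product, spelled out, to show where the 25 comes from.
ALL_COMBINATIONS = list(itertools.product(WIDTHS, HEIGHTS))
```

Running this file with `pytest -v` should list one test ID per combination, which makes the multiplication easy to see.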

Oh, wow. I didn’t realize pytest could do that. That’s pretty cool.

Yeah. But I like to think about that more, because it does add test time, and you might be happy with yourself that you’re being more robust, but is it necessary? And so often, instead of doing that, I will pick like one or two different values. Like, if I’ve got two variables that are going in, it might be sufficient to, say, fix one of them and vary the other, and then do another test call with fixing the other one and varying the first one or something.

Sure.

Then you end up with like ten test cases instead. And it might be sufficient. But definitely there’s times where a matrix makes sense.
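The "fix one, vary the other" alternative might look something like this sketch. The `area` function, the value lists, and the fixed representative values are assumptions chosen for the example. Two separately parametrized tests give 5 + 5 = 10 cases instead of the 25 a full matrix would produce.

```python
import pytest

WIDTHS = [1, 2, 3, 4, 5]
HEIGHTS = [10, 20, 30, 40, 50]
FIXED_WIDTH = 3    # arbitrary representative values, chosen for illustration
FIXED_HEIGHT = 30


def area(width, height):
    """Hypothetical function under test."""
    return width * height


@pytest.mark.parametrize("width", WIDTHS)
def test_area_varies_with_width(width):
    # Height is pinned; only width varies (5 cases).
    assert area(width, FIXED_HEIGHT) // FIXED_HEIGHT == width


@pytest.mark.parametrize("height", HEIGHTS)
def test_area_varies_with_height(height):
    # Width is pinned; only height varies (5 cases).
    assert area(FIXED_WIDTH, height) // FIXED_WIDTH == height
```

Whether ten cases are enough depends on whether the two inputs interact; if they do, some or all of the matrix may still be worth keeping.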

Yeah. I’m so glad you bring that up, though, because one thing I’ve fought with time and time again is an abundance of redundant test variations.

Where I’ve seen people just make these gigantic tables, these gigantic matrices. And as part of review I always want to ask, are you sure that each one of these things is truly unique and needs to be tested? Because it adds maintenance time, it hurts the readability, and it adds execution time to your test.

If it’s white box unit level, it may not be that big. But once you start getting to integration and especially end-to-end level, and especially with web testing, right, test runtime is expensive. And I mean, just to give a sense of magnitude, I’ve seen certain tables that might have like ten columns and 100 rows. Right. And you just scroll and it’s nothing but a wall of text and you really lose meaning in that. So I like what you said: kind of hammer down certain things to be one value and then vary only the things that need to be varied. And that can slim down your testing burden, so to speak.

During development, if you want to go crazy and brute force hit everything, if the domain isn’t too large, if multiplying it out you might have like 1000 test cases, but it runs fast enough for you during development, you go for it. But I wouldn’t commit that to the main branch. I would probably prune that down to something reasonable, thinking about splitting up your input vector into reasonable regions that represent kind of the full scale. And the other thing is, if you really are going to have the mode where you’re testing a huge set of input just because you want to be exhaustive, you’re probably not as exhaustive as you think you are.

And using something like Hypothesis might make more sense.

I was just thinking about that. Yeah. Hypothesis or other property based test tools can really get in there and generate those variations for you.

The property-based things, again, I think used sparingly, they’re very helpful. And one of the ways they’re helpful is to make you think about what the properties are. Because if you’ve got something randomly generating input, you have to figure out how that input is supposed to affect your system. So you’re not really having a full complete random input. But it’s a range of input that should affect your output in a predictable way. Correct.

Where you divide by equivalence class.

Yeah, equivalence classes. Both equivalence classes on input and also equivalence classes on output.
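A small property-based sketch with the Hypothesis library, under the assumption of a made-up `clamp` function with three input equivalence classes (below the range, inside it, above it). Rather than listing rows by hand, each test states a property that should hold for every generated input.

```python
from hypothesis import given, strategies as st

LOW, HIGH = 0, 100


def clamp(value):
    """Hypothetical function under test: force value into [LOW, HIGH]."""
    return max(LOW, min(HIGH, value))


@given(st.integers())
def test_clamp_stays_in_range(value):
    # Property on the output: whatever the input, the result is in range.
    assert LOW <= clamp(value) <= HIGH


@given(st.integers(min_value=LOW, max_value=HIGH))
def test_clamp_leaves_in_range_values_alone(value):
    # Property for one input equivalence class: in-range values pass through.
    assert clamp(value) == value
```

Writing the second test forces a decision about which inputs belong to the "already in range" class, which is the kind of thinking about properties mentioned above.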

Exactly.

And behavior. Anyway, this is an interesting thing to think about. I’m glad that you’re talking with me about this. You’re a reasonably sound person, but you still like the testing pyramid. And I’m kind of somebody that’s already thrown the testing pyramid away.

I guess with the testing pyramid. If we can touch on that real quick.

I’m assuming most people know what it is, but I’ll just describe it very quickly.

Imagine you have a triangle with three layers. At the bottom layer you have unit tests. In the middle layer you have integration tests. In the top layer you have end-to-end tests. For web apps, these last two might mean service tests and web UI tests, respectively. The pyramid states that your proportion of test cases should follow that pyramid structure, where you have the most unit tests, followed by somewhat fewer integration tests and very few end-to-end tests.

I know a lot of people love it. I know a lot of people hate it.

What I hate when people do the pyramid is when they set these hard ratios for each level.

And this happens, and probably happens at big companies more than small companies. And I’ve actually been at companies that tried to do this, where they’ll say 70% of tests should be unit tests and 15% of tests should be integration tests and the remaining... Actually, no, that probably wouldn’t be a pyramid balance. But you get what I’m saying, like setting percentages for target ranges. And I look at that and I think that’s going to drive really bad practice. Right. Because people are going to write an abundance of useless tests to fill a quota, or they’re going to skip tests that are critical. And when I see people doing the testing pyramid that way, I hate it. I see the testing pyramid as more of a general guideline to say, here’s a decent approach, right. If you are pushing down the pyramid, that means you’re testing closer to the code. They don’t necessarily have to be single class or single function unit tests. They could be unit tests or white box tests that test a whole bunch of things together.

But I like the idea that the closer you test to the code, the more likely you are to find issues earlier. And that’s why I still remain a pyramid man.

Yeah, okay, I get it. I just don’t agree. I think it’s perfectly fine to talk about the pyramid in terms of effort. So like, let’s say to cover a similar amount of code and a similar amount of use cases, you need about ten times more unit tests than what you would need with integration tests, which would be about ten times more than at a higher level, like a feature test. So I would say let’s write that 1% of the feature tests that cover just as much code. And it’s a lot easier when you’re done than having to write everything else. I like to keep my options open. So I like being able to refactor and allow people to refactor, as long as you’re testing at a level where you’re reasonably certain that you can leave a big chunk of tests alone and have them be a safety net while you completely revamp some internal chunk of code, even moving functions from one class to another, moving them around, deleting files, adding files.

As long as you can do all that without having to rewrite your test code, I’m cool with it, because I want to remain agile. I want to keep the ability to say: after learning how to implement this once, I’m really good at it now. I want to rewrite it as a second draft that I’m proud of, and I can use the tests as a way to make sure that I got everything right.

Absolutely.

And I think that’s important to allow the freedom of people doing rewrites.

That’s where I’m at. As long as people keep being able to be creative and able to rewrite stuff, it’s all good.

I don’t think we’re that far off. We’re just calling things different.

Yes, I agree with everything you say.

I really appreciate you coming on and talking about this with me. It’s a lot more fun than just talking with myself about it.

Sure. Anytime. I’m glad. I know there’s plenty more that we can discuss.

Yeah. So, got anything that you want to be doing lately?

I’ll be at PyCon Canada in November again talking about test automation. So that will be cool. It’s in Toronto.

Shoot, I forget the days. The 10th, 11th, and I think they’re doing sprints on the 12th and 13th. So if you’re in Toronto, definitely check it out. I’m hoping it’ll be pretty cool.

The talk I’ll be giving is called Behavior Driven Python with pytest-bdd. So when I was back in May at PyCon 2018 in Cleveland, I gave a talk on Behavior Driven Python with the behave framework, and now for this one, for PyCon Canada, I’ll be talking about the pytest-bdd framework instead.

Nice. Cool. Have you used the integration with PyCharm for pytest-bdd?

I have not yet. I’ve used it for behave, but I would love to give it a spin for pytest-bdd. I know they added it this past year, which is really exciting.

Yeah.

The integration with an IDE is something that is making me want to try some of these, pytest-bdd and Cucumber, a little bit more, because having it help you out getting all the cruft in place is nice, because that’s one of my reasons for resisting trying this in the first place. But anyway. Awesome. Are you going to go to PyCascades?

I sure hope so. That’s one of the top conferences I want to go to next year. I’m picking up all the Python conferences in a row: PyCon, PyOhio, PyGotham, PyCon Canada. I submitted proposals to PyCascades, and I would love to be able to speak there.

Yeah, cool. I just submitted last minute. I submitted a bunch of proposals just this last night.

Awesome. Good luck.

Yeah, thanks. You too.

Thanks again, and we’ll talk to you later.

Great. Thank you. Bye bye.

Thanks again to PyCharm for sponsoring the episode. The offer they set up for this show’s listeners is only good until December 1, so to try PyCharm free for four months, go to testandcode.com/pycharm. That link is also in the show notes at testandcode.com/51. Thanks again to Andy for his time in researching and recording this episode, and thank you for listening, for sharing the show with friends and colleagues, for supporting the show through Patreon, and for using the link in the show notes to try out PyCharm. That’s all for now. Now go test something. Maybe write a feature test.