105 - TAP - Test Anything Protocol - Matt Layman

The Test Anything Protocol, or TAP, is a way to record test results in a language-agnostic way, predates XML by about 10 years, and is still alive and kicking. Matt is the maintainer of tap.py and pytest-tap, two tools that bring the Test Anything Protocol to Python. In this episode, Matt and I discuss TAP, its history, his involvement, and some cool use cases for it.

Transcript for episode 105 of the Test & Code Podcast

This transcript started as an auto-generated transcript.
PRs welcome if you want to help fix any errors.

00:00:00 The Test Anything Protocol, or TAP, is a way to record test results in a language-agnostic way. It predates XML by about ten years and is still alive and kicking. Matt Layman has contributed to Python in many ways, including his educational newsletter and his Django podcast. We forgot to talk about his podcast on this episode, but I’ll link to it in the show notes, so be sure to check that out. Matt is also the maintainer of Tappy and pytest-tap, two tools for bringing the Test Anything Protocol to Python. In this episode, Matt and I discuss TAP, its history, his involvement with it, and some cool use cases for it. I’m your host, Brian Okken, and I couldn’t put this show on without the amazing support of Patreon supporters and our sponsors. So thank you, Patreon supporters, and thank you to PyCharm for sponsoring this episode.

00:01:06 Welcome to Test and Code, a podcast about software development, software testing, and Python. Today on Test and Code, I am excited to have Matt Layman on. I usually massacre everybody’s name, but yours seems easy. Did I get it right?

00:01:26 You did.

00:01:27 Okay.

00:01:27 It’s a pretty easy one.

00:01:28 I’ve known Matt for a long time. I don’t know if we just, I guess, followed each other on Twitter. We also met at a couple of PyCons and hung out. So, Matt, for people that haven’t hung out with you at PyCon, who is Matt Layman?

00:01:41 So I am a software engineer. I’ve been doing software for many years at this point. I started my career out at Lockheed Martin doing satellite software, which will be pertinent to the story that we have today, I think, at least in part, and did that for about eight years as communications satellite software and then moved on to do web development in Django at an education startup. And for the last couple of years, I’ve been at Doctor on Demand, which is a telemedicine company, and I am the software architect there. So I’m responsible for making sure that the system is going in the right direction and helping our engineers unlock their potential by getting them the right tools and all that sort of stuff.

00:02:22 That actually sounds fun. Do you enjoy it?

00:02:24 I do. It’s something very different from the day to day of doing a lot of coding. And I still do some software development on occasion. But getting the opportunity to step away from the code a little bit and think about it at a higher level and think about how the big pieces fit together, because we’ve got Django, we’ve got JavaScript, we’ve got all sorts of languages for different iOS and Android clients and that sort of thing, it’s a fun opportunity to get your head wrapped around something that big.

00:02:53 Okay. And do you get to work with lots of people then, helping other teams?

00:02:56 I do. The engineering team is not gigantic at Doctor on Demand, but we have probably in the range of 50 or so engineers, varying from Python developers to JavaScript to iOS and so on.

00:03:09 Sounds fun. I probably could get into some of that quite a bit on maybe a separate episode.

00:03:14 Absolutely.

00:03:15 One of the things I wanted to talk to you about, and you volunteered, whether you knew you volunteered or not, I can’t remember, was TAP, or the Test Anything Protocol. I don’t even know. I’ve tried to understand this a little bit more, mostly because of the testing podcast. It seems like I should know about this, but I have a hard time getting my head around it. So how do we start this?

00:03:37 It’s a great question. I think I would start by thinking of TAP as, well, it’s a protocol. It stands for Test Anything Protocol, and it might be most analogous to, say, USB for test output. There’s a well-defined format, which is generally just called TAP.

00:03:58 The things that produce it are considered producers. And on the flip side, there are consumers that know how to consume TAP. And so it’s this basic interchange format that a test runner can produce, and then some other component on the other side can read it and process it and display it or aggregate it or do whatever it needs to do based on test results.

00:04:18 Okay, how did you get involved in this, or why do you care about this?

00:04:23 Well, I got involved with this because my first language, the language that I really grew to love, was Perl. So I started out at my university and learned C++ in college, and I just absolutely hated it. No offense against people who love C++. It just was the first language I was taught, and it did not click with me at all at the time. So I really felt not great as a developer, frankly. When I got my first job, I didn’t have a solid foundation to work from. And when I was at my first gig, I worked on the GPS ground control software. We had to do a lot of testing, and so there was an entire tools organization, because this is a project that has millions of lines of C and C++ code in it. And an important project like GPS, as you might imagine, has a lot of rigorous testing around it. And in that space, I ended up working on the tools team doing a lot of Perl development. And Perl was the glue that was holding everything together. This was the early 2000s time frame, so it was still a very popular language, and I understand it’s a popular language today, but maybe less so. And we needed to write some tests for these tools. We were writing layers of tools here, and they were such an important component of how all this was plugged together that I got into learning how to write tests in Perl. And that’s important because that’s where TAP actually originates from. It came out of the Perl project. In fact, if you go back and look at the history, testanything.org is the website for this, if anybody is curious, and there’s a history page. And the first commit of TAP was all the way back in 1988 by Larry Wall, the originator of Perl. So TAP has been around for an extremely long time.

00:06:11 Okay. Larry Wall is the originator of Perl, right?

00:06:14 He is. He’s the Guido van Rossum of the Perl community.

00:06:18 Okay. But then it’s relevant to Python as well. Python talks TAP as well.

00:06:24 Yeah. I think this is the really interesting part about TAP: it’s a specification or a protocol that you can understand in roughly ten minutes. The spec is on testanything.org, and it’s seriously something you can read during a lunchtime and get 99% of what it does. I’ll describe it on audio. I know that’s a dangerous thing sometimes, but I think it’s understandable enough that this is expressible even in audio format. So the TAP format works by producing an ok or not ok for a test result. It sort of boils it down to that boolean state of the world and will print out a line. It’s a line-based protocol, and it will print out a single line for each, what they call, test point. And so that could be an individual test in the resulting test runner, or it could be a set of subtests, or however you want to group. It sort of depends on the language. But there’s a point assigned for each test in a test suite, and all this stuff gets dumped out as individual lines. And so what makes that so appealing is that you could write a TAP producer in a sitting of less than an hour, going through the format and adding in ok and not ok, which are the two things that are critical to this, and just printing that out. So it’s a pretty quick and convenient way. And that’s exactly what Larry Wall did. In fact, in the first version, it did nothing more than print out oks and not oks for the test suite. That was how they built the test runner. Okay.
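A minimal sketch of the kind of producer Matt describes, using only the Python standard library. The `run_checks` helper and the two sample checks are hypothetical names made up for illustration. The `1..N` plan line is part of the TAP format but is not discussed in the episode.

```python
import io
import sys


def run_checks(checks, out=sys.stdout):
    """Run (description, callable) pairs and print one TAP line per test point."""
    # The plan line "1..N" tells a consumer how many test points to expect.
    print(f"1..{len(checks)}", file=out)
    for number, (description, check) in enumerate(checks, start=1):
        try:
            passed = bool(check())
        except Exception:
            passed = False  # an exception counts as a failed test point
        status = "ok" if passed else "not ok"
        print(f"{status} {number} - {description}", file=out)


# Hypothetical checks, purely for illustration.
checks = [
    ("addition works", lambda: 1 + 1 == 2),
    ("string is upper case", lambda: "tap".isupper()),
]

buffer = io.StringIO()
run_checks(checks, out=buffer)
tap_output = buffer.getvalue()
print(tap_output, end="")
```

Writing to a file-like object rather than always to standard out keeps the same function usable for a file, a pipe, or a serial port wrapper.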

00:07:55 I’m thinking in Perl or in Python, either one. I don’t think I’ve ever written any tests in Perl, but in both of those languages, it’s fairly common for people just to be printing stuff to standard out.

00:08:09 True.

00:08:10 So is this stream going to standard out also, or is it going to a separate stream?

00:08:15 Well, it can do either, but there are parts of the spec that say if you’ve got extra stuff in your output stream, TAP will just ignore that. So it’s only looking for lines that start with ok and not ok. And there’s a couple of other things in there as well. You can include diagnostic information. So let’s say you have a failure line that starts with a not ok and some sort of description, and you want to dump out whatever your test suite was producing at the time to show the context of what the failure was. You can start a series of lines with a hash mark, octothorpe, or whatever you want to call it, and dump all that information. It will be associated with that test point. It’s there as a way to give you the extra diagnostics you need that go along with the test format. But anything outside of those basic descriptions and a couple of other things is just ignored by the specification. So that’s what makes it super flexible as well. It follows Postel’s law in that way, of being liberal in what you accept and conservative in what you do.
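The tolerant reading described here can be sketched as a small parser. This is an illustrative sketch, not Tappy's parser: the `parse_tap` function, the regular expression, and the sample text are all assumptions made for the example. It keeps test lines, attaches `#` diagnostics to the most recent test point, and silently skips everything else.

```python
import re

# "ok" or "not ok", an optional test number, and an optional description.
TEST_LINE = re.compile(r"^(not ok|ok)\b\s*(\d+)?\s*-?\s*(.*)$")


def parse_tap(text):
    """Return a list of test points; unknown lines are skipped, not errors."""
    results = []
    for line in text.splitlines():
        match = TEST_LINE.match(line)
        if match:
            status, number, description = match.groups()
            results.append({
                "ok": status == "ok",
                "number": int(number) if number else len(results) + 1,
                "description": description.strip(),
                "diagnostics": [],
            })
        elif line.startswith("#") and results:
            # Diagnostic line: attach it to the most recent test point.
            results[-1]["diagnostics"].append(line[1:].strip())
        # Anything else (plan line, stray prints, ...) is ignored here.
    return results


sample = """\
some unrelated log output
1..2
ok 1 - addition
not ok 2 - subtraction
# expected 1, got 3
"""
points = parse_tap(sample)
failed = [p["description"] for p in points if not p["ok"]]
print(failed)
```

A fuller consumer would also check the plan line against the number of test points seen; this sketch leaves that out to stay short.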

00:09:16 Wow. Okay, so I really could just take this protocol and implement it myself. If I had some sort of even hand-built test runner or something, I could output this stuff and then have something else read it.

00:09:29 Yeah, that’s exactly right. And in fact, that’s exactly what I did. I mentioned that my history in satellite software was a piece of this. What I did in a particular project, it was for the Iridium satellite phone project, the application software that runs that: we had a need to run stuff on a test board in a testing environment, and we were very limited in what we could use there. And so we made a simplistic TAP producer.

00:10:01 I can’t recall now if it dumped to a connected serial port or to a local file that we could read off the test board, but it was able to dump out the ok and not ok lines as we did high-level integration tests on a variety of subsystems that were all brought together to make this stuff work.

00:10:16 That’s pretty cool.

00:10:20 Thank you, PyCharm, for sponsoring this episode. I’m working on a few new projects and I’m so glad PyCharm is there by my side when I think I am ready to commit and push some code. I love the seamless Git integration. The ability to revert some files, commit some files, add files that aren’t tracked yet, add some files to the .gitignore, and even do partial commits of files, leaving out debug statements and whatnot, all within one interface, is super cool and super intuitive. I even use it for non-Python files. Of course, the integrated testing support is unsurpassed. I really like to put some code in test functions just to try it out instead of using the REPL, even though PyCharm has a built-in REPL. It’s just so easy to have a bunch of snippets of code in different test functions in a file and just be able to run one at a time. I even do this just to play around with new data structures or APIs. PyCharm saves me time in so many ways. Find out how it will save you time. Go to testandcode.com/pycharm and try out PyCharm for four months before deciding if you want to stick with Pro or use Community.

00:11:24 Okay, I wanted to come back before we move ahead. You said a couple of things. You made reference to somebody’s law.

00:11:29 Yes, Postel’s law. Postel, and I don’t remember the individual’s first name, but it is a law that was connected to,

00:11:40 oh, gosh, testing my knowledge, TCP.

00:11:44 I’m not positive, but the idea behind it is to be liberal in what you accept from others and conservative in what you do. It’s a design methodology that allows certain protocols to operate pretty well with others. They have a very narrow scope of, here’s what I’m capable of doing, but it’s saying, I will accept all the garbage you try and throw at me. You can look at things like HTML. One reason why HTML is probably so successful is that you can write pretty garbage HTML as your first attempt if you’re learning it, and a browser will bend over backwards to try and render whatever it is you produced. And so that’s an example of Postel’s law in action, where it is doing everything that it can to accept your input and give you something meaningful back, and is conservative in what it’s actually trying to do.

00:12:38 Okay, that’s cool. I appreciate the little lesson. I’m chuckling because I’m remembering my early days of learning HTML in college with Perl CGI scripts on the back end, and I made liberal use of how liberal the web browsers were in allowing you to just write stuff, because a lot of my Perl-generated HTML was really garbage HTML. Things like just intentionally leaving off the closing tag because I didn’t have to put it there.

00:13:11 Yeah, it’s a fantastic method for learning when you have that opportunity to grow into what you need to know.

00:13:17 Who cares if you got something a little bit off, if something is able to correct it for you and perhaps tell you, hey, silly, you need to go adjust this and probably do it better if you want it to be perfect, but it gives us all the chance to grow into something.

00:13:32 And also, thanks for the reminder that the pound sign or hash mark is also called an octothorpe. It is so much more fun to say, so I think I’m going to try it.

00:13:40 It’s a great word.

00:13:41 It’s a good word. Okay, so I’m outputting this stuff. Why? What’s the benefit? I have a different protocol. I already have protocols for collecting test results data. So I’m assuming that the consumer side is where the benefits happen.

00:13:57 Partly. And also it has to come down to some history. I was looking at this before we got on the call, trying to think of, well, why does this exist today? And I think it might be mostly because it’s old, it’s just been around. It’s something that’s understandable in a couple of minutes and does its job. But there are other techniques for collecting test results today. If you look at xUnit frameworks, you’ll often see a JUnit XML file which will dump out test results. This is something that is commonly done if you want to work with coverage in Python, and it’s true of a lot of other languages. But you have to remember that the XML 1.0 spec didn’t even come out until 1998. So this protocol predates that by ten years. And so it was a way at the time to get test results when there wasn’t a conventional structure to report on test results.

00:14:51 Okay. And it also is kind of cool. Well, yeah, we use JUnit, I guess not all of us. JUnit XML is kind of fun in that you can have multiple languages produce it, but there are limitations from it being from Java that rear their heads every once in a while.

00:15:10 True. And there are other aspects to it too: not everybody has an XML parser handy.

00:15:16 In my satellite system example that I gave you, we didn’t have XML available to us on the VxWorks real-time board that was available. We had the ability to dump to a standard out or print-to-file kind of location, and that was about it. And we probably could have cobbled together a very crude XML format if we needed to. So it’s not to say it’s not possible, but the tooling wasn’t there to make it easy to do.

00:15:42 But I did notice, I was looking at the website. What was the name of the website again?

00:15:47 It’s testanything.org.

00:15:49 Okay. There are, like, plugins for some of the continuous integration systems that read this output as well. So you could have, for instance, a project that has JavaScript and Python where you didn’t want to deal with JUnit XML, you wanted to deal with something else. Various parts of your system could be tested in different ways and all produce this shared output. And then you could have a report, like in Jenkins or something, that reported on all of the test results no matter what language they were tested in.

00:16:21 That’s the beautiful part about the consumers.

00:16:23 By being a protocol, it’s totally agnostic to whatever the language is. It’s just consuming whatever you give it. And as long as something can output that format, you could have a thousand languages in your project and a TAP consumer wouldn’t care.
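To illustrate the language-agnostic aggregation idea, here is a small sketch that summarizes TAP captured from several producers in one report. The `summarize` helper and the two sample streams are hypothetical; in practice the text would come from whatever test runners a project uses.

```python
def summarize(tap_text):
    """Count passing and failing test points in one TAP stream."""
    passed = failed = 0
    for line in tap_text.splitlines():
        if line == "not ok" or line.startswith("not ok "):
            failed += 1
        elif line == "ok" or line.startswith("ok "):
            passed += 1
    return passed, failed


# Hypothetical output captured from two different test runners.
streams = {
    "python": "1..2\nok 1 - parse\nok 2 - render\n",
    "javascript": "1..2\nok 1 - click\nnot ok 2 - submit\n",
}
report = {name: summarize(text) for name, text in streams.items()}
total_failed = sum(failed for _, failed in report.values())
for name, (passed, failed) in report.items():
    print(f"{name}: {passed} passed, {failed} failed")
```

The consumer never needs to know which language produced each stream, only that the lines follow the same format.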

00:16:36 Yeah. Okay. There were a couple of projects that it linked to in the Python world. There’s pytest-tap, I think, and then Tappy. Can you tell me what these are?

00:16:49 Yeah. In full disclosure, I am the author of both of those packages.

00:16:53 Well, good. I got the right person on the line.

00:16:55 You did.

00:16:57 So Tappy and pytest-tap are what I saw was missing in Python’s ecosystem. So Perl has this by default. The default test runner produces TAP out of the box, but I thought it would be nice to have a bridge for Python to do the same thing. So if you’re using unittest or using pytest, or, there’s a nose version of this plugin as well, and you want to output your stuff to TAP, I wanted to make it as trivial as possible to add a package to your system so that you could produce that if you so desired.

00:17:29 Okay, so pytest-tap, that’s like the producer part. It produces this output.

00:17:36 Yes, that’s true.

00:17:37 And does it produce it just in the standard output, or does it send it somewhere else?

00:17:42 There are a few options on it so that you can dump it out in a variety of ways. The default is to use the stream version where it will dump to standard out, but you could write to a specific file for dumping all the results to a single place if you chose. Or you can also break it up to dump it out into individual files based on, say, test case. It’s a little less useful for pytest, since it tends to be so function based, but for a unittest suite that by default has to use a test class, you could dump it out into individual files if you chose.

00:18:17 Okay, I’ve definitely abused the JUnit format before. What did I do? I think I’ve been abusing it in that we are not using classes in our tests, and the JUnit XML has a class field, so we’ve been filling that in with the module name instead.

00:18:35 Okay, sounds like a great alternative.

00:18:38 Yeah, I’m not opposed to organizing tests within classes. Sometimes it’s a fairly handy thing to do.

00:18:44 It’s actually my preferred way to use pytest. Most folks that I’ve seen tend to prefer the function level and the module scope, which is great. I have nothing against that. I actually like putting my stuff into classes so that, if I’m going to test one class that’s in my production code, I have a single test class to go with it. This brings a nice parity that I prefer, but that’s the nice part about pytest: it’s so flexible. You can do that. You don’t have to do that, and you can choose your own adventure.

00:19:14 Yeah, and for instance, if you’re using a text editor that automatically throws the self variable in for you within methods, it really doesn’t take any longer to write methods than it does to write just functions. And then you can also utilize that grouping to be able to just run, like, if you’ve got two or three tests together in a class, you can just run them together. I’ve never done this, but it would even make sense, if you’ve got a big test file, to be able to just run a few of them. You could even develop them in a class and then take them out of the class if you wanted to.

00:19:47 Right. And you can add class-level fixtures, which are a scope that’s available in pytest. That’s probably one of the infrequently used scopes, so that if you have something that only applies to a certain test case, you can limit it to just that test case.

00:20:01 Yeah.

00:20:02 For a few related functions without the class level, you can still do a module-scoped fixture that’s only used by three functions, say. It will kind of do the same thing as a class, but you have to jump through some hoops. But classes are neat sometimes. How about Tappy? What is that?

00:20:20 So Tappy is sort of the parent project. It’s where I did all of the work of collecting the different TAP results and giving one place to track all the statistics and counting that goes along with the TAP protocol. And then pytest-tap and nose-tap, those are the children that are basically just importing the tap module from Tappy, which is doing most of the work, and then pytest-tap has the actual pytest plugin. When I started the project, I originally bucketed everything together into a single package called Tappy, but the feedback that I got was, hey, I’m using pytest, but you’re also making me now include nose in my packaging, and that’s not great. So ultimately I satisfied that and split the things out into their constituent pieces.

00:21:10 Okay. And is there a consumer part of this within Python?

00:21:14 I have written a consumer as well within Tappy. When you install the Tappy package, and I didn’t get a great name on PyPI, so it’s actually pip install tap.py, because the tap name on PyPI was the Title analysis protocol project or something like that. But what can you do? I was late to the party on that front. But when you install Tappy as something in your project, then you get access to a tap script that will be able to read in TAP as a consumer, and in its default behavior, it will spit out results in the standard Python format. So you can conceivably have a Python test runner that will dump out, as a TAP producer, all of the TAP. And you could pipe that into the tap executable or tap script and still get the same standard Python unittest-style execution out the other side. So it can do end to end, if you will.

00:22:13 Okay. Interesting. The other thing I was curious about: is it all a post-process sort of thing, or, let’s say I have a producer that’s producing to a stream or something, can I have a consumer reading in real time? Is that something that’s supported?

00:22:33 It’s not in my project. You’ve identified one of the three open bugs in Tappy. I wrote it to just read after the fact. So it will sort of stall on you if you’re trying to do something in real time. But it’s certainly possible. It involves me going in there and writing some generators to pull off from a stream, and I never prioritized it, so I took the simple approach of just, let me get it all at once and then dump it out. But there’s nothing stopping a TAP consumer from reading a stream as it’s coming off of whatever producer is producing it.
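The generator approach mentioned here could look roughly like the sketch below. It is only an illustration of the idea, not a patch for Tappy, and `iter_test_points` is a hypothetical name. Because it pulls one line at a time from any iterable of lines, each result is available as soon as its line has been read.

```python
import io


def iter_test_points(lines):
    """Lazily yield (passed, text) for each test line as it is read."""
    for raw in lines:
        line = raw.rstrip("\n")
        if line == "ok" or line.startswith("ok "):
            yield True, line[2:].strip()
        elif line == "not ok" or line.startswith("not ok "):
            yield False, line[6:].strip()
        # Plan, diagnostic, and unknown lines are skipped in this sketch.


# Any iterable of lines works: sys.stdin, a pipe, a socket file, ...
stream = io.StringIO("1..2\nok 1 - boot\nnot ok 2 - telemetry\n")
seen = []
for passed, text in iter_test_points(stream):
    seen.append((passed, text))
    print("PASS" if passed else "FAIL", text)
```

When reading from a real pipe, how promptly lines arrive also depends on the producer flushing its output.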

00:23:05 Hey, we’re on a platform here. We can call this out and say, hey, anybody that wants to jump on an open source project and help out, this would be a great thing to jump on and do a pull request and help you out, right?

00:23:15 Absolutely. Yeah, I’d be totally open to a pull request.

00:23:17 Yeah, definitely.

00:23:20 It seems like a fun project and a neat project. You said you’re the maintainer of these projects. How long have you been working on them?

00:23:30 Oh, goodness. I think I made my first commit in 2014. Okay, so they’ve been around for a while at this point. Okay. It was mostly, frankly, a hobby of mine just to see if I could do it, because I identified that there were some packages that existed that already did some TAP stuff, but they did it using non-standard APIs, not the ones you would expect from unittest. And I couldn’t find a tool that plugged in TAP as a producer with the built-in Python unittest classes, and that’s what I wanted to fill in and make available to the community.

00:24:11 Okay, this might be kind of fun as a jumping-off point. Let’s say, and this is a common thing, a project has its own internal test runner and its own special format. Since this format is fairly easy to get your head around, it shouldn’t be too bad for somebody to implement the TAP protocol within their own system. And why you would want to do that, possibly, is, let’s say you wanted to start using pytest instead of your own one, or start using something else, unittest, I don’t know. If you’re going to change, you may as well change to pytest. Then you could still run your old stuff and combine the output of the two, maybe, or use some of these as a jumping-off point. It’s probably easier to write the TAP protocol into an old system than it is to make an old system generate JUnit XML files, for instance.

00:25:08 Yeah, it sounds really reasonable.

00:25:09 So I could even start producing TAP from an old system and then take that output and convert it to JUnit XML if I wanted to, or something.

00:25:20 Sure.
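A rough sketch of the TAP to JUnit XML conversion Brian floats, using only `xml.etree.ElementTree`. JUnit XML has no single official schema, so the element and attribute names below follow the commonly used `testsuite` / `testcase` / `failure` layout as an assumption. The `tap_to_junit` function and the `legacy` suite name are hypothetical.

```python
import xml.etree.ElementTree as ET


def tap_to_junit(tap_text, suite_name="tap"):
    """Build a JUnit-style XML string from TAP test lines."""
    cases = []
    for line in tap_text.splitlines():
        if line == "not ok" or line.startswith("not ok "):
            cases.append((False, line[len("not ok"):].strip()))
        elif line == "ok" or line.startswith("ok "):
            cases.append((True, line[len("ok"):].strip()))
    failures = sum(1 for passed, _ in cases if not passed)
    suite = ET.Element(
        "testsuite",
        name=suite_name,
        tests=str(len(cases)),
        failures=str(failures),
    )
    for passed, name in cases:
        # With no test classes available, the suite name fills classname.
        case = ET.SubElement(suite, "testcase", classname=suite_name, name=name)
        if not passed:
            ET.SubElement(case, "failure", message="not ok")
    return ET.tostring(suite, encoding="unicode")


xml_text = tap_to_junit("1..2\nok 1 - boot\nnot ok 2 - telemetry\n", suite_name="legacy")
root = ET.fromstring(xml_text)
print(xml_text)
```

Diagnostic lines are dropped here; a fuller converter could place them inside the `failure` element text.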

00:25:20 We can also think of an example that might go the other way. So I’ve been reading a book called Crafting Interpreters. It’s by a gentleman named Bob Nystrom. And it’s about building an interpreter for your own language. And it’s a fascinating book. It’s still a work in progress, but he does a great job of illustrating how to build your own language. And at some point he has a sidebar talking about tests and how to write tests. And the statement that he makes in the book is that you should try and write your tests in the language that you’re building. And the value of doing that is that if, for example, you’re going to make multiple versions of your interpreter, and he actually does two in the book, he does one implementation using AST walking in Java, and then he does a bytecode implementation in C, then by writing a test suite that was using the language itself, he was able to share that between the two implementations. I could foresee a world where, if you’re going to do something like writing a domain-specific language or writing a new language, if that’s your speed, although that’s not what most people would be interested in doing, you could use TAP as your output and pass it to a well-written consumer that already exists. So you could be testing your own brand new language and get the results aggregated in a programmatic way to see whether everything was passing or failing.

00:26:48 Oh, yeah, that’s cool. I like that. Where do we take this? Did we cover TAP enough to get people started? If they want to know more, they’ve got the Test Anything website. They’ve got pytest-tap that they can install. So I am guessing that I don’t have to install Tappy. I just install pytest-tap, and then I get both of them, right?

00:27:09 It’s got an install_requires in its packaging, so it will pull in the right version of Tappy.

00:27:14 Okay, cool. And then if people want to get a hold of you, you’ve got your own website, right?

00:27:19 I do. I have a site at mattlayman.com, and I write a lot about Python and Django. I’m sort of devoting all of this year, mostly, to Django topics, to kind of niche down for a little while and produce a bunch of content giving people a good overview of Django. But in the past, I’ve done a variety of Python articles, like making games with my son, and all sorts of other random stuff that has been interesting to me at the time.

00:27:47 And you’ve started a newsletter. I think I’m getting your newsletter.

00:27:50 Yes. That was something new that I figured I’d do. I try and do a lot of things for the Python community, and it’s hard to get your information out there sometimes. So I’ve got a monthly newsletter where I aggregate together all the things that I’ve done for the month. I do Twitch streaming. I write articles. I try and make an impact where I can in a small way. And so it was just a good way to broadcast, here’s what I’m working on, in case you’re interested in learning more about Django or learning more about Python.

00:28:15 Well, that’s one of the things I appreciate because Django is one of the things that I want to do a little bit more of this year.

00:28:22 Cool.

00:28:23 It’s a good framework.

00:28:24 Yeah.

00:28:25 Well, thanks so much and good luck with growing the newsletter. Hopefully everybody will subscribe. So Tappy, it’s not really an active project. I mean, it’s not dead, but you’re not working on it a lot.

00:28:38 It’s one of those projects where I wish I could mark it as active but stable. As I said, the protocol is dozens of years old, so there’s not a lot of churn in what happens. So I could spend a bit of time making it stream consumer input, but the producers and the actual output side of it is a pretty stable project at this point. So if some group needed to depend on it, and some do, I get those pull requests periodically of, hey, we needed to support this Python version, or it doesn’t quite do this edge case this way, I still do actively work on it. It just doesn’t need a lot of attention.

00:29:14 Okay. Yeah, actually, I kind of love projects like that, so thanks a lot. Thanks for coming on and teaching us about it.

00:29:21 No problem. Happy to be here.

00:29:26 Thank you, Matt. I learned a lot from this episode. And thank you, PyCharm, for sponsoring this episode. The link for the extended trial is at testandcode.com/pycharm. That link is also in our show notes at testandcode.com/105. While you’re there, check out the Slack channel at testandcode.com/slack, and maybe thank all of the awesome folks hanging out there for answering testing-related questions. Cool group of people. Also head over to testandcode.com/support and become one of the many Patreon supporters that help pay for the show. Thank you so much to our Patreon supporters. That’s all for now. Now go out and test something.