55 - When 100% test coverage just isn't enough - Mahmoud Hashemi

What happens when 100% test code coverage just isn’t enough. In this episode, we talk with Mahmoud Hashemi about glom, a very cool project in itself, but a project that needs more coverage than 100%. This problem affects lots of projects that use higher level programming constructs, like domain specific languages (DSLs), sub languages mini languages, compilers, and db query languages.

Transcript for episode 55 of the Test & Code Podcast

This transcript starts as an auto generated transcript.
PRs welcome if you want to help fix any errors.

Welcome to Test in Code, a podcast about software development and software testing.

What happens when 100% code coverage. You just know that’s not enough, you know you’re not testing.

We’re talking with Mamu Dashemi this week about one such problem like this. He’s working on a project called Glam, which is very cool in itself. We also cover awesome applications and versioning. This is a real fun conversation and I really like talking with Mama. I hope you enjoy it. This episode is sponsored by Digital Ocean. Listen to their segment during the show and also check out Testingco.com DigitalOcean and you can get $100 credit towards your first project.

Mahmud, thank you for coming on Testing Code.

Yeah, my pleasure. Good to be back.

I don’t remember what I should have looked this up. You were on before and we talked about actually it was quite a while ago, wasn’t it?

Yeah, it’s been over a year, I’m pretty sure.

Too long. Too long. But yeah, we talked, I think, about sort of the testing pyramid and what that meant and like whether the kind of default prioritization is actually a good one, I think a common topic, but it’s good to get a lot of data points there.

Yeah, definitely. Well, it’s good to have you back. You’ve changed what you’re doing since the last time we talked.

So much has changed, Brian.

It changes. But in this case, what you call it pretty much across the board for the better. I’m just back from my honeymoon and Congratulations. Oh, yeah. Basically multiple weeks uninterrupted with one person. That’s about as intense of a stress test that you can give a relationship, especially while traveling.

Did you make it through? Oh, yeah. All tests pass completely green across the board.

Absolutely fantastic time. If you ever get a chance to go to South Lebanon, I Holy endorse it. We’ve never been before, but very nice place.

Nice. Cool. You’re not working at the same place you were before.

I guess my trajectory is sort of like keeping in line with my Silicon Valley dream over here is basically to work at smaller and smaller companies. Of course. Companies, what do you call it? Like they generally grow and so forth.

I just wanted to be at a smaller but growing company. And so I’ve been at Simple Legal for about it’s been over a year now, helped grow the team here. It’s been a really great experience. Looking forward to continuing it, too.

Nice. Are you doing Python there?

Absolutely. Yeah. One of the nicer things about going to smaller teams and so forth is you end up with a lot less sort of politicking around technologies, generally speaking. At least here we’re very much all on board with Python and Postgres Pi tests. All the good stuff.

Nice. I like to hear the pytest bit as well.

Absolutely.

Of course, you had a few things you wanted to chat about. You had sent me a list I think, but I lost it.

And I think that was before my honeymoon. So what do you call it? Like, I’ve cashed it back in a little bit this morning, but yeah, I’ve got even more stuff beyond that, too.

Great. So what do you want to talk about first?

Sure. So let’s see. I guess the main thing that comes to mind, the biggest thing that I’ve had since our last discussion, biggest technical thing is I wrote sort of a data transformation library with my frequent collaborator Kurt Rose, and it was met with some pretty good success. It’s sort of a new programming paradigm for transforming, especially like nested data is called Glam. You know, like it sort of derives from, like the word conglomerate or conglomerate. Conglomerate is a type of rock. Anyway. So I like rock solid software. But anyway, basically allows you to transform nested data. And what comes along with this new paradigm is what I really like about it is that you sort of give it a spec and it’s kind of declarative. Right. And what’s nice about this is that you don’t have to write a lot of brittle code with a lot of branches that are hard to reach in coverage. And that’s honestly one of my favorite things about writing with Glam, honestly, like, you know, having put it in practice here is simple legal and some side projects and so forth. I think it actually goes a little bit too far. It’s a weird thing to do, maybe to criticize your own paradigm shift.

It might be too early to call it that, but yeah. So the problem that arises.

You think it went too far, but I’m still kind of lost. So how is it different than just non Glam programming?

Absolutely. Yeah. I’ve been steeped in it for what do you call a couple of months now, even thinking about on my honeymoon, I should introduce it a little bit more. So the way that Glam works is that you have some data, probably from like an API or some sort of document store. It’s like lists of dictionaries of other dictionaries of other dictionaries. It’s nested, right. It’s complicated. If you just go do a request to Twitter’s API or GitHub’s API, you’ll see the kind of data that I’m talking about.

Well, basically you want to go over that data and transform it to fit some other shape, maybe flatten it out, maybe filter out some things, maybe make them more consistent. And if, you know the shape of the data coming in and, you know the shape that you want going out, then Glam is what you use to get it there. It’s like a templating library. If you’ve ever done Django or Flask or this sort of thing, you know, that you can sort of like take text and shape it to be like HTML. And the way I introduced when I gave a brief talk on Python was that like, if you’ve ever seen someone use like a Django template to render JSON? Right. It’s kind of strange to use a text processing thing to render something that’s actually structured. What Glam does is it’s like a text free templating library. It’s basically data to data instead of data to text.

Okay.

So, yeah, it’s really handy, especially when you’re making web services that need to output JSON, which is a lot of what I think a lot of us do day to day. But it’s also really handy when you’re just processing JSON lines, log files, that sort of thing.

Yeah, I’m going to use this right away. I’m kind of excited about it because I’ve got a problem where I’ve got it’s probably a normal problem. The output is JSON, but I don’t need all of the junk in it. I only need some of the information in there and I need it in a different format. So this might be great.

No, that’s exactly what it’s there for. And it’s very lightweight. It does have a small command line utility that is similar to the command line utility JQ, which is basically a JSON query and command line thing, but that one is written in C. This one is written in Python. Cool thing about this is that when you use the command line utility there, it’s very straightforward to turn your command line thing into a Python thing. So when you want to build up the rest of the program around that JSON query.

You can do so that’s a nice feature.

Yeah. And so, I mean, so many things fall out of like having a even though it’s basically just a single function call, so many things fall out of using it. So for instance, when you do a deep dictionary access in raw Python, you only get a key error about the very last key. And if you were, for instance, using variables in those key accesses, it’s going to be kind of unclear why that deep access failed in the Glam world. It will actually give you back sort of a path that was followed to get to that point and you get back a very full featured error message that’s a lot easier to debug.

So that’s another sort of like example of what you get with Guam.

Okay.

These days we’ve refactored the internals a lot without changing the external too much. And these days it even supports things like deep setting. So for instance, if you have like an element tree like you’re doing XML stuff, let’s face it, it has to be done sometimes. Well, you don’t really want to make a copy of the full structure, but you may have some mutation operation that could fail due to a key not being there or something. So we added the ability to do deep setting as well as deep getting.

So, yeah, it’s very active development, but I think one of the greatest things is that you don’t have to write a bunch of rote, brittle code that doesn’t have that Pythonicness that we want, even though it is technically Python.

Okay, thank you to Digital Ocean for sponsoring this episode. Digital Ocean is the preferred cloud platform of hundreds of thousands of innovative companies.

Digital Ocean makes it easy to deploy, manage and scale applications with an intuitive control panel and API designed for developers.

Get started with a free $100 credit towards your first project on Digital Ocean and experience everything the platform has to offer such as cloud firewalls, real time monitoring and alerts, global data centers, object storage and the best support anywhere. Join over 150,000 businesses already creating amazing things on Digital Ocean. Claim your credit today at Testinco. Comdigitalocion.

There’s a couple of things you said earlier. You said you were worried that maybe it went too far into the paradigm, right?

Exactly. So you basically end up with 100% coverage over this declarative structure because you basically just specify a dictionary or something like that. And as a constant, it doesn’t need to be covered. It’s just set once. It’s really good for getting to 100% coverage quickly. But in reality, you’re not really hitting 100% because basically there’s some stuff that is happening inside of Glam, which happens a lot more reliably and less typo prone lead than if you had done it yourself. But it’s kind of like a Reg X. Right. Like when you have a regular expression, that line gets covered, but that doesn’t tell you whether it’s going to do what you want.

What you mean?

If I were to use Glam in my program, the Glam parts of it are going to be any sort of test through. It is going to be 100% covered because there’s not that many lines of code.

Yeah. All the code that you would have written is sort of folded into the Glam spec specification for what you want the data to be shaped like. And because it’s sort of like a higher level language or DSL, very similar to how SQL and regular expressions are embedded sub languages that we use inside of Python.

The problem then is that we have to think more about the example input and output and not just rely on the coverage then.

Exactly.

And so rather than basking in 100% coverage, we’ve been thinking a lot lately about as we write more and more complicated Glam specs, how do we test these sub languages underneath and make sure that we’re hitting all the corners that we think we’re hitting in the test phase? I was doing some research. I couldn’t find a heck of a lot of stuff out there in Python, at least. Right. About getting coverage on SQL statements or regular expressions. And so I was kind of curious if you had any thoughts on the topic.

No, actually, that’s a really interesting thing. I’m glad we’re asking. Maybe somebody else out there can give us pointers.

Of course, there’s the tried and true fallback of just like writing more tests yourself.

Like there’s the property based testing where you can try to come up with different possibilities of input and categorize them and your expected output based on some property that it’s supposed to keep or something.

Yeah.

And then throw it through hypothesis or some other property based testing tool or something.

Or just like I actually usually just in something like that, come up with handwritten oracles and putting them through as parameters into a test or something.

Yeah, exactly.

Or for instance, like a lot of one example is like a lot of things that convert from one format to another. Like for instance, the markdown kind of the model of markdown converters have the expected input or the input and expected output in a couple of directories with some naming convention and they just run the tool through and make sure that it keeps coming up with the same output all the time.

Yeah.

It’s not pretty, but it’s pretty simple to write. And since this is essentially like a data converter thing, something like that might work, but you have to have some way to come up with the Oracle then and know that you have all of the test cases that you could. But something like this, it isn’t possible to come up with all of the possible test cases.

Yeah.

You could use property based testing to do some combinatoric kind of stuff and hope the fuzzer hits it, but you don’t get that sort of sureness that you get from seeing a line turn from red to green or yellow to green in coverage Pi like HTML reports.

And that does make you happy when you see those lines covered for something like I said before.

Right. If I want rock solid code. Right. And certain infrastructural things do need to be that solid, then yeah, I will invest the time to go to 100% coverage. I think right now Glam is like sitting in the high 90s. I think that I might have some version specific stuff that isn’t caught by my tests. But yeah, I mean, I have a lot of code out there that is kind of low on coverage because coverage only captures like unit test and doesn’t capture like what happens with Selenium and so forth. But yeah, infrastructural stuff I think does benefit from being fully covered.

Well, high coverage, since I’m not going to say that you can get full coverage.

Of course, you may have some commented no cover lines or something like that.

So one of the things that surprised me is like if else, for instance, the ternary operator, there’s a handful of things that can take different paths through a line of code, but it’s all on one line of code.

Sure. Partially covered lines. Yeah.

That’s not something that coverage can tell you about.

If one of those branches is hit, the line will be covered.

So actually, coverage shopping has gotten better over time on this. I still think it’s not perfect. I mean, they’re still coming out with releases that have fixes for this stuff, but it does do partial coverage, like only one branch coverage here, that sort of thing. It’s gotten better at it.

It’s a state of the art. Definitely.

You’re right. It’s not something to what you call it, like just that color change isn’t something to chase obsessively.

And also, if you get 100%, that doesn’t mean your testing is done exactly.

No, that is very true. And that’s kind of what started all of this. Right.

Glom got us to a 100% on what would have been like kind of a tricky area to code. And in many ways it is actually covered because global will produce very reliable error messages about as good as we would have manually written. Because a lot of these cases were erroneous cases like, hey, this key isn’t set, but we were just having to do try accept ourselves and raise more semantic errors than Python’s built in dictionary errors are.

Yeah. And so Guam produces something that is pretty close to that. And so we were happy with what Glam output, but in other cases, hey, we actually wanted to recover from them or have some more complicated behavior. And the Glam co was telling us we were fully covered, but we weren’t really.

Yeah. Interesting.

Anyhow, I think the coverage on sublanguages is a frontier that we need to work on.

Yeah. Well, it’s similar to the problem of how do you test the compiler?

Because you can’t possibly have all possible languages, all possible programs that you could write with that language, for sure. Yeah. How do you know if it breaks corner cases?

That’s why we have people that really smart people working in compilers.

And that’s actually something I’ve been sort of growing. My interest in these DSLs and more specialized, like programming languages like Python is general purpose and it’s great. But within Python, one of the things that makes it great is that it has a great runtime and you can build other things on top of it within it.

Yeah. Like many languages.

Yeah. And Glam, like if you look inside it has an interpreter loop.

The AST is what you are writing when you write a Glom spec and it has a little debugger hook and all this sort of stuff that you would have in a small language. But it’s actually within Python.

Kurt and I have been experimenting with more stuff like this, and so hopefully we’ll have some more stuff to share soon. In the meantime, you can check Glam out just like if you go glom readthedocs IO. I also blogged about it at sedimental.org. If you just search glam, Python, it should come up.

Yeah, I found it really easy.

Yeah. Gl Om, nice and short.

Not to be completely harsh on a tangent, but sure, let’s do a tangent. Anything else we want to talk about?

Yeah, definitely. So the other thing that I’ve been, what do you call it working on lately? Basically non stops since I got back from my honeymoon, is another project kind of akin to the zero verse thing that I launched recently. So zero verse is like when big packages have version zero, despite being used by everyone. Like, Bitcoin is one. Like, Bitcoin is not used by everyone, but it’s very broadly used. You see, it covered in mainstream news sources at this point, and it’s technically supposedly worth $100 billion market cap.

And despite this, it’s still version zero.

I love zero.

I was actually at last year at Python, I was sitting next to somebody from the Flask team, and they had just recently switched from zero versions to I don’t know if I can.

That was this year. That was 2018.

Yeah, definitely. It was in 2018. And he said they didn’t know about zero Vert. It was in the works for a long time. It just happens to be that before they got a chance to switch out, you called them out on it, right?

It’s a real phenomenon. It’s a real phenomenon. I have some stuff in production that is Regrettably still zero version. I just need to do two more things on it until it’s ready. But no, I mean, so it was like an April Fool’s thing, and it got kind of like popular and people are still submitting stuff to it. So it’s pretty cool. It’s zerover.org. You can use the number zero or the word zero, but that was sort of tongue in cheek. That was kind of like April Fools thing, like I said. But I’m working on something like that. It’s like one of these kind of awesome lists. I guess maybe I should make it an awesome list. So, you know, there’s like something out there. It’s called awesome Python. It might be one of the most popular repositories on GitHub or something like that. It’s just a list of all these great Python software projects, requests up there and all the big names. Pandas. Lots of good stuff, but kind of dovetailing off of the packaging stuff that I was working on for the last year or so, like just sort of documenting all the different ways that you can package Python projects. There was this really big split that I was happy to see talked about in the last episode with Brett Cannon. Like, libraries are not applications.

Most of the code that we write is structured as modules and meant to be called by other stuff. So we’re kind of in this habit of thinking in terms of, like, modules, packages, libraries.

Yes.

But a lot of the greatest software out there is actually something much bigger that we sort of forget there. So it’s the application.

Whether you’re checking the gram on Instagram or listening to Spotify, even using G Edit or something like that, you’re using something that is a Python application.

Yeah.

So despite being awesome. And despite being Python, it’s not listed in awesome Python.

Right. Because it’s not Pip installable.

It’s not Pip installable. That’s maybe one of the biggest splits here, right. Is that libraries are by and large, Pip installable. Applications by and large aren’t.

So one of my favorite applications is deluge. Dluge, and it’s like a torrenting application. And it is surprisingly robust. I see it discussed in forums and other places as one of the top choices if you want to just use BitTorrent. And there’s no discussion of like, hey, it’s written in Python, it’s just solid on its own. And so I was very surprised when I went and I found its repo just a few hundred stars, I’m pretty sure. Maybe even less. I don’t know. And it was being maintained with surprisingly little code. It was full fledged.

You could apt install it on the bunch.

It was actually really out there and deployed, living amongst others, like it’s C Plus Plus peers. And so I was like, well, this is a real literal application.

I would read the code, study the code, learn its patterns. Right.

Yeah. So because of Piper Pip and mostly because of that, we’ve got a lot of examples that we can look at for libraries. But you’re right. Examples for applications. I don’t know how to find those.

Right. And so it’s kind of funny. I’ve just been thinking of every class of application that I use and typing into Google like text editor written in Python, web browser written in Python, dictionary app written in Python, and instant messenger written in Python. And it’s turned up a decent sized list. Not huge. Right. But so far I’ve got probably 50 or 60. And so just this morning I was taxonomizing them.

I think that’s a great idea.

Yeah. And when I come to my job, I’m writing an application.

And when I’m thinking about a startup to start one day. Right. I’m thinking about an application. I have all these great examples of libraries to learn from for my glams and so forth. But what patterns can I learn that are actually practical in applications that scale to be mainstream? And so I’ve found some really cool stuff. It’s pretty niche in some cases, but I think that this stuff is going to be a treasure trove of real resources that people can use. And what’s funny is that most of the repos only have like 200 stars on GitHub, so it’s never going to be surfaced in your Discover tabs or your Explorer or your newsletters or stuff like this as being Python software. And I would say at this point, about half of them are kind of focused on developers, even though they aren’t really libraries.

So one example would be there’s, this one called Meld, and it’s written by, I think, the Gnome project. So it’s only got like 300 stars. But I know a couple of people who swear that it’s the best dipping application, and they don’t even think about the fact that it’s written in Python and Pi GTK.

I’ve never even heard of it.

Right. It’s just under GitHub. No Meld, it’s there on GitHub, and you can see how they do their tests, you can see how they do their docs. You can see sort of what design patterns they’re using, what works for them. And one thing that I mean, forgive me for sort of expounding, but I would like to think this project could lend itself too, is that if we, as Python programmers begin to use the software and then begin to sort of learn from the software we use, we probably constitute some of the best users of the software that the software could have because we would actually be able to contribute back if we were to find a bug.

Right now the list has been growing exponentially, but it’s actually starting to taper off a little bit, and I’m looking to get it out there probably by the time this episode there. I don’t know what your turnaround time is, but hopefully it’s going to be out there. And I think I’ll just go with that. I’ll probably call it awesome Python Applications and it’s featuring like, yeah, I mentioned Instagram, Spotify, YouTube. These are all Python applications, but I’m going to stick to the open source ones for this list.

Okay. Is this going to be just from your GitHub page? Yeah.

So my GitHub is just GitHub.com Maud and I’ll pin it on the front page so you don’t have to type the whole thing out. But I’m finding so many cool things that even I myself want to use. I found several sort of like video editing and audio editing things that are written in Python, and I’m just curious, how usable are these things?

Yeah, it would be great to hear both the good and the bad hearing if people try stuff out and say, well, yeah, there’s some caveats. If you’re using it for the use model that I’m using it for, it doesn’t quite work well or something like that. Absolutely good information.

Most of the ones I’m looking at here, I am trying to keep a standard. It does have awesome in the name. I’m trying to pick things that are maintained and appear to have users and so on and so forth. That would be actual good examples to learn from and not just use.

Right. But I was trying to imply that sometimes people think something sucks, not because it actually does, but because it just doesn’t fit how they’re trying to use it.

True.

I will maintain as much neutrality as possible and open mindedness when it comes to the good old Pull request.

Yeah, definitely. Well, cool. That’s exciting.

Yeah, I’m excited. I’m super excited.

Okay. Anything else on your mind that you want to talk about?

It’s lately just been those two things, like on the technical side, working on Glam and trying to get it to the point that like, you know, I have really extensive docs written for it, and I would like there to be a testing recommendations Doc. Like, I fully support that it is safer to use Glam than handwriting the stuff yourself.

But that said, I don’t think it’s perfect testing recommendation for people using Glam and how to test their Glam based application.

Exactly.

That’s sort of like on my mind, on the technical side and then on sort of the more content side and sort of surveying the landscape side, it’s basically been this awesome Python applications list that I’ve been accumulating over the years. And now I’m kind of focusing.

On a slight tangent. We did bring up zero ver.

Sure.

Are you maintaining Calvert as well?

Yeah, sure. I don’t know if we’ve talked about it before, but back in, like, I think 2016, I put up a site for a phenomenon I observed. I like to observe and survey the landscape called Calver calver.org. And it’s Calendar versioning. You basically set your projects release version number based on the date in some way. And so it’s kind of a suite of patterns and you can sort of pick and choose as suits your needs. But tons of projects do it like Ubuntu has 16 four because it was released in 2016, April. Right. And then 18 four, right. April. And what’s really cool about that is it provides a nice semantic for canonical where they say, look, if it’s an LTS thing, we’re going to support it for five years. So you look at your version, it’s 16, you add five to that.

Right. And so you have 2021 April. Your operating system is going to stop being supported.

Yeah, well, like, for instance, most Python people are probably going to be familiar with it because Pip changed from.

Yeah, definitely.

Version.

We jumped from version ten to version 18.

Exactly.

And a lot of people are like, well, what happened to the other seven versions?

It’s a good question. I can see the confusion. Semantic versioning has been out there for a very long time and Calendar versioning is sort of like one of I think it’s the biggest alternative to Semantic versioning, other than zero ver.

Well, I guess you could be zerover and Calver if the last time you released your software was in 2000.

Yeah, that’s true. I guess that’s true. I have not found that out there. I thought for a second you’re going to say year zero, and depending on your calendar, there is no year zero anyway. But, yeah. So with Calendar versioning with Calvert.org, I just sort of made a list of big projects using Ubuntu being among them.

But also like, sort of my big question with it is with Semantic versioning, we try to make it so that one of the promises that some people make with Semantic versioning is we’re not going to break backwards compatibility, except for possibly at major version numbers.

Right.

So is there some consistency with Calver? Like we’re only going to break your API once a year?

This is sort of like deep versioning discussion.

I don’t know how many people we’re talking to right now, but I have definitely thoughts on this in brief. What I would point out is that most mature software projects are adopted by enough people that it makes sense to sync with the calendar anyways. Right. Because those people out there have software release schedules. The rubber has to meet the road somewhere. I think most of the vocal developers online, including you and me, we think a lot about those early phases in the project where we want to have that utmost freedom to break stuff at any time. And the thing is that most software projects sort of mature past that phase, and that’s when versioning starts to matter most, because that’s when you have the most users.

Yeah.

So with my application that I work on at work, simple, legal, we actually do weekly releases and they are calendar based. And we basically have an SLA for our customers and contracts and that sort of thing. So we don’t like to, what do you call it? Just phase something out really, really fast. So we give them a certain period of time and there’s like a version number that helps guide us to make sure we’re not breaking things too quickly.

That actually makes a lot of sense because I was thinking about semantic versioning. And sometimes if a team wants to change something and remove something, well, then they just bump the semantic version. But if it happens too fast, actually users of it will complain and go, I just changed my code to fit this and then you remove it.

Yeah. I think that the semantic that matters the most is how long something is going to be supported for. Most of us probably believe that software is not math, it’s not forever. And so with Twisted, I think is a great example of this. They have an open stated sort of deprecation policy that something is going to be deprecated for at least six months or something like that. And you can infer how long your feature is going to be. Okay for based on the version of Twisted that you’re using. And that’s a really powerful mechanism, a lot more so than trying to remember whether something gets dropped in 8.0 or 9.0 or 14.0, especially if you’ve looked at WebKit and the browsers, they move so fast that the numbers stopped meaning anything. And you just wish that you could anchor it to something else. Like if I were to tell you version 181, when was that released? You have no idea. Right.

Okay.

Interesting. Well, okay. Well, I’m glad I asked. So that was a fun chat, and hopefully I’m sure we’ll have an excuse to bring you on and chat again.

Yeah, hopefully awesome Python applications comes out and some test and code like contributors help me find some of these hidden gems. Yeah, I think definitely we can come back on we can come back on and maybe talk about like, a survey of testing strategies from real life open source applications.

That’d be great. Yeah, cool.

Awesome.

Looking forward to it.

Well, thanks a lot and we’ll talk to you later.

Yeah, thanks very much.

Thanks again to Mamu for talking with me today. Also thanks to Digital Ocean for sponsoring this episode. Again, check out what they have to offer at testincode.com DigitalOcean they’ll give you $100 credit towards your first project. That link is also on the show notes page at testandcode.com 55 and thank you thank you for listening for sharing the show with friends and colleagues, for supporting show through Patreon and for using the link in the show notes to try out digital ocean.

That’s all for now. Now go out and test something.