140 - Testing in Scientific Research and Academia - Martin Héroux

A discussion about open software and software testing in scientific research and academia.

Transcript for episode 140 of the Test & Code Podcast

This transcript starts as an auto generated transcript.
PRs welcome if you want to help fix any errors.

00:00:00 Scientists learn to program as they need it. Some of them learn it in college, but even if they do, it’s not their focus. It’s not surprising that the sharing of the software used for scientific research and papers is spotty at best. And what about testing? We would hope the software behind scientific research is well tested, but why would we expect that? We’re lucky if CS students get a class or two that even mentions automated tests, so why would we expect other scientists to know how to test their code? But I think we really should. It’s a disservice by universities to not cover this for all fields, not just CS and not just science. But that’s my flag to wave, my drum to beat. However, Martin Héroux is at least open to the idea that we could probably do better with open software and tested software in scientific research. I’m super excited that Martin reached out to me to come on the show. I hope you enjoy this episode, because I really did.

00:01:11 Welcome to Test and Code.

00:01:21 Hey, Martin, how’s it going?

00:01:22 It’s going good.

00:01:23 So my last name, Héroux, is French Canadian. But one time when I was at the Seattle airport, they had to check the passports. And the American lady obviously looked at my passport and was on the speaker, kind of calling out all these names. She said, Martin... and she paused.

00:01:41 And she said, Heroics. And I’m like, yes, but it’s not heroics. But anyway, that was a highlight for me. That’s great.

00:01:50 Martin Héroux is fine.

00:01:51 Hero. Hero. Yeah, but how did you pronounce it ahead of time?

00:01:55 You don’t say the H. And the E has the little accent on it. And then the X is silent; you don’t even pronounce it. So it’s just the middle letters.

00:02:04 Ah, French Canadian. But you’re in Australia.

00:02:09 Yeah. Job-wise, I came out here to do research and didn’t really know how long I was going to stay. And here I am, I don’t know, eight or nine years later.

00:02:17 Okay, cool. So, Martin, what are we talking about? You sent me a list.

00:02:23 A big list.

00:02:25 Yeah.

00:02:26 I’ve just finished this thing with Bob and Julian, and that kind of pushed me to the next level. I made the commitment to put a few things up on PyPI, which as a process is a bit intimidating if you’ve never done it. And also all the... I’ve heard you mention wheels and all those things a bunch of times on the podcast, and I’m like, what are those things? So I had to face up to all those things and understand them. So I did all that, and that’s just something to discuss. But really, what got me into it is that one of my goals with Bob and Julian was to learn to test, because I had your book. I’ve read about it as a scientist. You had a guest on not that long ago who was highlighting that he was collaborating with researchers and he pretty much forced them to write tests, because it’s just not something they did. And I see that all the time. I’ve been coding for maybe 15 years, and it’s only since I’ve been listening to your podcast that I actually really hear about testing code. And I just find it a bit appalling that as scientists, we don’t do it. But then on the other hand, you’ve made a point a few times on your show, and I agree with it, that most of the time we write our code and then we’re done. Here’s a study: I write the script that goes through all the subjects, gets my data, does my stats, and oftentimes I never look at it again.

00:03:36 Oh, really? Okay.

00:03:37 So should I write tests then? I mean, there are packages that I would reuse, and that’s what I put up on PyPI. Yeah, cool, that should be tested. But it seems that everything we put out in our publications, the numbers that the world should trust, is left up to this black box behind closed doors at a university. And we just have to trust that the person is doing it well. And the incentive system in universities and academia is kind of screwed up right now: it’s all about getting it out there. It’s all about high impact. It’s all about making it cool and flashy. And some people are cutting corners. Some people are doing things that maybe they shouldn’t, intentionally or not. So I’m kind of a proponent of the open science and good science movement that’s going on, and I think testing is a part of that. But nobody’s really talking about it just yet. And I’d be interested in having a bit of a conversation about how that fits in. Testing is a lot of work. But on the other hand, scientists, like everybody else, just want to get their work done. They just want to get it done. If the answer fits their hypothesis, they’re not even going to ask a question. They’re just going to go with it. And then down the line, nobody can ever check it. So there’s this whole computer science approach: put it up on GitHub, you have a conversation about it, you go back and forth, people find a bug, all cool, we’re not perfect. There might be a problem in the code, and you fix it and you improve it. Scientists, we kind of just write it as a one-off, write our paper, and that’s it. And then everybody else has to trust that we did it well. I’ve found a lot of Excel-type errors in collaborators’ data before. When I just ask to see it, people get their hackles up. When you even ask, saying, before we publish this, would you mind if I just had a look, they get kind of caught off guard that people want to look at that. But we’re human; we all make mistakes.
But it’s not everyone. There are obviously some people who are really prolific coders who work with professional developers.

00:05:24 In a sense, that’s the epitome of what science should be. But it’s kind of a pyramid. I see it as there are these really top-level people, and then there’s the rest of us. I might be somewhere in the middle, but the majority of scientists aren’t trained. As you’ve mentioned a lot, testing isn’t really taught in CS degrees all that much. So just imagine scientists who are learning it on the fly because they have to, and they need to produce quickly. Testing is just not on the radar.

00:05:50 Well, let’s explore that. But before we jump in, let’s find out who you are. Who are you, and how do you fit into this?

00:05:58 Yeah. So my name’s Martin Héroux, and I’m currently a neuroscientist, but in a previous life I was a physiotherapist by training.

00:06:04 Okay.

00:06:05 Meaning I have no background in coding. I’m self-taught, and I pretty much had a few people along the way who were nice enough to get me going with coding, but they themselves were self-taught.

00:06:14 Oh, self-taught in coding. I thought you meant self-taught in neuroscience. That would be a trick.

00:06:19 That would be impressive, wouldn’t it? Self-taught neurosurgery. But no. I guess you would call it formal training in neuroscience, but the coding side of things was very much picked up in grad school. I just had my supervisor and another colleague who helped me out and got me going. Primarily MATLAB at first. I’ve been coding ever since, and got into Python maybe four or five years ago. I was looking for an alternative to MATLAB. I looked at Python maybe in 2010 or 2011, and it was still a bit intimidating. The whole SciPy side of things and the alternatives weren’t clear to me, or how I would get going. The environment and all that wasn’t there just yet. So I looked and I went away. But a few years later, I came back, and it had matured a bit, and I felt much more comfortable and made the transition. And I’ve been coding in Python ever since, for all my work.

00:07:10 Okay. And are you out of academia now or are you still somehow tied to it?

00:07:16 I’m conjoint, which just means I’m associated with a university here in Sydney, the University of New South Wales, and I work at a publicly funded, or I guess national, research institute for neuroscience.

00:07:27 Okay.

00:07:27 And so I’m a full-time researcher, which I feel very fortunate to be able to do. I do lecture a little bit here and there to contribute, but really, I’m a full-time researcher. Okay.

00:07:37 So scientific research, and the publications around it, is something you’re still heavily involved in?

00:07:48 Very much. That’s kind of what we do. And that’s how we rate ourselves as scientists: a lot of it is bean counting, how many papers we put out. So I do research, and the output is research papers. That’s pretty much what we drive ourselves to do. And that’s what I do on a daily basis: guide the research, move it forward, write up these papers. A part of that process, obviously, is collecting, analyzing, and doing all the stuff with the data that we collect. And personally, whenever I can, I try to use Python to do that.

00:08:20 Thank you, PyCharm, for sponsoring this episode. I first tried PyCharm when they started supporting pytest many years ago. Their support for pytest is now amazing. I was a longtime Vim user, so next I needed to test the IdeaVim plugin, so all of my finger muscle memory still worked while editing. Check. It works great. There are lots of reasons to love PyCharm, but for me it’s because they have the absolute best user interface for test automation. Then I learned many more ways PyCharm can save me time, like really great support for editing Markdown, HTML, CSS, and JavaScript, remote connections to databases, and amazing version control support. Really, it’s the best Git diff tool I’ve ever used. And now version 2020.3 is out, and Shift Shift, the find-anything key sequence, even lets you search Git commit messages. What? Even that? That is so awesome. Tons of other cool features have been added in 2020.3. Check it out, and I hope you enjoy it: testandcode.com/pycharm.

Okay, so let’s go back a little bit. I’m not in academia or research, never was. But I had the impression, and maybe it’s just wrong, that so much more research was going open, with published Jupyter notebooks and things like that.

00:09:41 I think there are definitely some fields... Even in computer science, I think, but also in academia, certain fields and certain disciplines lead the way. For example, in psychology, there’s a gentleman who created the Open Science Framework. It’s this whole thing that allows you to register your study beforehand, so your hypothesis is stated before you start, which sounds as if it should have already been there. But some people changed their hypothesis after the fact to fit the data, which is kind of a no-no. But now that you’ve registered it, you’ve put it out there. That makes it a bit more efficient. You can put up your resources, your data, those types of things. And that’s more of a psychology push, because they did this huge replication study and found that a lot of the big studies that were, in a sense, agreed upon weren’t replicable; they just couldn’t reproduce those results. So there are people who are doing, as you say, the notebooks and that style. But I would say it’s discipline specific, and some fields are way ahead. Where I sit, more in the biomedical sciences, rehabilitation, those types of things, it’s on nobody’s radar, really. I’ve read a paper in the Journal of Neurophysiology highlighting notebooks, but the uptake has been quite slow, so it’s going to take a while for most people to know about it. And then the next thing is, are they going to do it if it’s not part of the normal workflow? I think it’s that added bit on top of everything: the open science side of things, having to register your study, having to make things accessible. It’s just that extra step that takes time, and people see it as a hindrance. They’d rather just publish and get more things out, because in the end, that’s what they’re marked against.

00:11:21 Well, let’s take the smaller hurdle that you brought up a little bit already: the hurdle of letting somebody else look at your code, or even your spreadsheet or something.

00:11:32 So that’s still something that some people are a little touchy about, then?

00:11:36 Yeah, I find that a bit interesting. Where it became most obvious that there’s something a bit wrong here is when I started to understand a little bit more about how computer scientists, and people who code, work. When you look at GitHub, for example, the whole workflow is about, not necessarily finding bugs, but having a way to work with them: to either fix problems or add improvements. And that’s great. It’s almost like you’re thankful that somebody might actually take the time to look at your stuff and either improve it or fix it. In science, there are obviously a lot of people who are top level and work in big teams and can do it at that level. But there’s a whole lot of other people like myself who are just fumbling our way into computer science because we kind of have to. The code side of things is very much an afterthought. We just do it because we have to. We never release it. We never make it public. A lot of the time, the only thing we make public is the written, summarized results in the paper.

00:12:38 And so in a sense, you as the consumer, the public have to trust that I did everything correctly and there’s zero mistakes in that.

00:12:45 Well, I’m a fairly positive person, but I have a hard time trusting anybody’s code.

00:12:50 And I can’t see it.

00:12:53 Yeah. And I mean, one side is true: a lot of people don’t even code, and that’s not belittling them. Some research obviously doesn’t require code, but nowadays, even if you’re not doing anything computer intensive, most of the time you at least need to do your statistics. Well, you can make a script out of that, and somebody could see the steps and how you did it. Even the most simple experiments we do, which might be like perceptual illusions, I do it all in Python, and there’s code to go with the data. And so I try now. But to be honest, even as one of those advocates for it, I still find it hard. After the fact, I’ve finished my study, I’ve written it up, I’ve submitted it, I get the reviews back, and at the end of it, if I haven’t prepared everything as I went, it’s hard to make my code presentable to the public. Because obviously sometimes you just kind of hack it together, and sometimes I just can’t follow through and I move on to my next project. It’s a bit embarrassing, but I’m realizing that it’s because you can’t wait till the end, which is also what I’ve been discovering with testing: if you wait till the end, you’re not going to do it well, and you may actually just run out of steam. So it might be something we need to think about as we’re writing the code: knowing that there are going to be other eyes on this, write it and structure it in a more professional way. And don’t do so much of the comment-this-out-if-you’re-running-it-on-this-subject, but uncomment-that-for-that-subject. That kind of stuff still exists a lot in people’s code. So I don’t think we’re there yet. Some fields are, and that’s great, but I think for the majority, it’s filled with trust that the code is correct.
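One common way out of the comment-this-out-for-this-subject habit Martin describes is to record the per-subject differences as data and run the same code over every subject. This is a minimal sketch, not anything shown in the episode: `SUBJECT_SETTINGS`, `clean_trials`, and `analyse_all` are hypothetical names, and the trial numbers are made up.

```python
# Hypothetical per-subject settings: the differences between subjects live
# here as data, instead of as lines commented in and out of the script.
SUBJECT_SETTINGS = {
    "S01": {"exclude_trials": []},
    "S02": {"exclude_trials": [3, 7]},  # illustrative trial numbers only
}


def clean_trials(subject_id, trials):
    """Drop the excluded trials (numbered from 1) for one subject."""
    excluded = set(SUBJECT_SETTINGS[subject_id]["exclude_trials"])
    return [t for i, t in enumerate(trials, start=1) if i not in excluded]


def analyse_all(data):
    """Apply the same cleaning step to every subject, with no manual editing."""
    return {sid: clean_trials(sid, trials) for sid, trials in data.items()}


if __name__ == "__main__":
    raw = {"S01": [0.9, 1.1, 1.0], "S02": [1.2, 0.8, 5.0, 1.1, 0.9, 1.0, 4.2]}
    print(analyse_all(raw))
```

A side benefit of this layout is that `clean_trials` becomes a small function that can be tested on its own, which ties into the testing discussion later in the episode.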

00:14:29 Well, I think you bring up a really good point, and I want to pause here for a second. I want to believe that most people in academia or in research are really trying to do the right thing. They’re not trying to hide anything. They really believe in their research and in what they found. And it’s true that they haven’t been trained. I mean, like you, you had to pick up coding on your own.

00:14:54 It really should be part of all science training right now, I think, because almost every science, like you said, even psychology, is going to involve some software work. So the software is almost like a crib sheet. It’s almost like lab notes. Those were never meant to be published. They’re notes for the author themselves.

00:15:15 And if you’re writing software just to be read by the computer and yourself, that’s different software than if you expect other people to read it. I don’t relate to that so much anymore. I can kind of remember it, but I’ve been writing commercial code that’s read by other people for decades, so it’s hard to get back into that mindset. But I do understand it, especially since you brought up the GitHub workflow, which is intended to go through iterations. The first posting is never the final code. We always make iterations. And if you do it well, you treat software almost like prose, like a manuscript for a book or a short story or something. Ironically, the tools for this iterative process haven’t caught up for actual prose writing; software has totally surpassed it, and we’ve got the systems in place to make that happen. But I think a big chunk of it isn’t that people are trying to hide something. It’s that, like you said, you didn’t think about writing the code for other people to read while you were writing it. A lot of it gets thrown away too, though. I mean, besides the code that you end up with at the end of your research, is there a whole bunch of other stuff that you tried that didn’t work?

00:16:34 So you just throw it away, possibly early on. Now, with a bit more experience, I can kind of dive in, and I have a better feel for where I’m going. But it’s true that when I was learning, a lot of it did get thrown out. When I look back at my PhD code, it was just the longest script I’ve ever seen. It was all cut and paste, and so some of it gets thrown out. You don’t understand it. You hear about reusing code and how that’s important, but it’s, well, how would you ever do that? It’s like that every step of the way until you find that resource, whether it be a book or a presentation from PyCon or something online or a podcast. You hear these words. And it’s similar for me with testing. I hear these things, but I’m like, how would I ever implement that? And you’re on your own. That’s where the struggle is, I think. Obviously, if you’re in a lab where a big part of the work is coding, they’ve attracted developers to help build the code base and really push the code side of things to the boundaries of that discipline, and the scientists themselves, the ones collecting the data, work together with them. Then I think that code is very alive. It’s very much, as you’re saying, being revised, and it’s like prose. But on the other hand, there are people like myself and a lot of people I know who are in very small teams. I can say that up until now, I’ve never had a code review in my life, or done pair coding. I think I might have done it with a student who I was teaching, and maybe with a recent colleague, where sometimes when we get stuck, we help each other a little bit, but it’s all just self-learning. And a lot of the supervisors don’t come from a coding background; that generation didn’t need to code.

00:18:24 They don’t value code like I do. It’s just a generational thing. I did a survey at my workplace, and they don’t value it very much, but they do see that for the future generation, it’s essential.

00:18:36 And so it’s hard. I don’t work in software, but I guess it’s that struggle between the people who are trying to get the product out and the developers who actually want to develop it. Sometimes there might be a little bit of discord there about how important various aspects are. You know, is testing important? Well, it’ll slow things down, but it will make it more reliable, and we know what it’s doing. But somebody else might say, I don’t care, let’s just get that out there. And so for some supervisors that I’ve noticed, coding is a bit of a black box; to them it’s a bit of magic. Whereas Excel, at least, is visual, and they can kind of see what’s going on, even though it’s fraught with possibilities for making errors. It’s a visual thing they can see, versus if I just show them code, a lot of them just cringe like it’s a blinding light and say, oh, I can’t read this. Which is why I think notebooks are positive. I personally don’t use them; that workflow just doesn’t work for me. But I can see the value of interweaving instructions or descriptions: this is what I’m doing now. Then you write the piece of code and maybe have some figures pop up. And then, the next step I did is this. It tells a story that you can follow even if you don’t necessarily understand the code. So I do think it has its value.

00:19:49 I’m going to pick on you a little bit. We have a lot of stuff we could cover. What’s your view of testing now, and where does it fit in your work and your workflow?

00:19:59 It was interesting. I just did this Developer Mindset program with Bob and Julian from PyBites. One of my goals during it was to learn to apply testing, because my gut feeling is that it’s important. I’d heard about it in a paper on the ten best practices for scientific coding. One of those practices is testing, but the paper never describes how to do it. What is testing? It was just kind of like, you probably should do this. So then I went looking, and I came across your book, and I was like, okay, cool, here’s a framework.

00:20:31 But it’s kind of hard to learn the framework at the same time as learning the actual guts of testing: how do you test? What do you test? So I’ve been trying to figure out what that actually means and to find the best resources. I have a better idea now. Very superficial at this point, but still. And I’ve been working my way through Kent Beck’s book on test-driven development, because I could see how that would be a very useful exercise to structure the way I work. I do sometimes get carried away and just code, code, code and forget to check whether it even runs. And then once it’s broken, I’m like, oh, what do I do now? So the thought process of TDD made sense to me. Now I’m at the point where I’m trying to practice it and see how it might work.

00:21:15 Okay, I’ve got to stop and say, I’m completely flattered that you ran across my book before Kent’s book. That’s so cool.

00:21:24 Yeah. And you’re passionate about testing on the podcast. I’m like, this is, in a sense, exactly what I think science needs. But also, just personally, I think I would trust what I do a bit more, because I don’t have the over-the-shoulder person or the people on my repository checking things and doing pull requests. Testing provides a little bit of that. That’s how I feel about tests. So I wrote two packages during this program and pushed them up to PyPI, and because I wanted them to be professional, I put tests on them, and I was so proud of myself. I said, look, I used pytest. I did fixtures. I even learned how to write tests for figures. That was pretty cool: the pytest-mpl plugin for Matplotlib. You generate a figure, and then you can actually compare the two figures and see what the difference is. I didn’t think that would be possible. But I’m like, cool, I can now test figures to see whether the code actually generates the right thing. But once it all died down a little bit and I looked back on it, my testing, I will say, is a bit of a mess. I think part of it is because I always did it after I’d finished a module or a part of it. Then I’d finally stop the momentum of my coding and say, oh, I guess I have to do those test things.
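For listeners who haven’t seen figure testing: with the pytest-mpl plugin, a test marked with `@pytest.mark.mpl_image_compare` returns a Matplotlib figure, and the plugin compares it against a stored baseline image. A minimal sketch, assuming pytest, Matplotlib, and pytest-mpl are installed; `plot_paired_data` and the `paired_data` fixture are hypothetical stand-ins, not code from Martin’s packages.

```python
import matplotlib

matplotlib.use("Agg")  # draw off-screen so the tests run without a display

import matplotlib.pyplot as plt
import pytest


def plot_paired_data(before, after):
    """Hypothetical plotting helper: one grey line per subject, before vs after."""
    fig, ax = plt.subplots()
    for b, a in zip(before, after):
        ax.plot([0, 1], [b, a], color="grey", marker="o")
    ax.set_xticks([0, 1])
    ax.set_xticklabels(["before", "after"])
    return fig


@pytest.fixture
def paired_data():
    # Fixed, made-up values so the rendered image is identical on every run.
    return [1.0, 2.0, 3.0], [1.5, 2.5, 2.0]


@pytest.mark.mpl_image_compare
def test_plot_paired_data(paired_data):
    before, after = paired_data
    # pytest-mpl compares the returned figure with the stored baseline image.
    return plot_paired_data(before, after)
```

The baseline images are generated once with `pytest --mpl-generate-path=baseline`, and later runs with `pytest --mpl` compare against them. Without `--mpl`, the test only checks that the figure can be created without errors.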

00:22:37 And it was always an afterthought. First, I didn’t have as much energy, and I just didn’t know how deep to dive into the code. I was being pretty good about maintainable code and small bits of software, so little functions, little classes: things that, in a sense, should be testable. But it just felt a bit tedious. And what I realize now is that I went way too deep. I was testing the implementation, not the behavior, which is something Kent talks about in his book a lot. I realized that, just by nature of doing it after the fact, I have the perfect tests for my current package. But if I change any of the implementation, if I even change a variable name somewhere, my tests are going to fail. That’s for sure.

00:23:21 Okay.

00:23:21 And it’s because I just didn’t realize what level I should be testing at. There’s always a trade-off of how much and how little. I think that’s what I’ve learned, and now I have to go back and redo it. That’s what I’m trying to understand from people like yourself who have that experience. I think I wouldn’t run into that issue if I did it from the beginning, with test-driven development, doing it properly. But also, I mentioned to some of the people I work with that TDD seems a bit controversial as well. There was that famous blog post out there, and some people are now saying that it’s dead. Other people are saying that it’s just misunderstood.

00:23:58 People are just not doing it right, but it actually still has heaps of value. So as a new person coming in, I was a bit confused. I’m like, should I be learning this TDD thing or not? Is it dead? So in a sense, I’m interested to hear your views on that. Am I wasting my time? I think it’s going to have some value, but it seems that...

00:24:16 Oh, I think it definitely has value, but there are definitely some issues around it. Thank you for letting me know where you’re at with all of this. I was just taking a look at some of your code, not the tests, just some of the code, and I’ve got a few questions for you. You talked about writing these packages. One of them is Flippy, right? Did you intend to open source this from the beginning?

00:24:40 I did.

00:24:41 Well, this is great. I’m having my first code review live on a recording, no stress whatsoever. So these packages were for me; they helped me do my science. I was making these figures. A lot of people want to make these types of figures, but very few of the statistical packages make them. So I made my own hacked version, and I’ve been using it for a few years. The other package is the same thing. I had built something that I’d been using, but it was growing kind of stale. There were no tests. Every time I had to change anything, it broke and took me forever to fix. So what I decided to do was put the bar quite high and say I’m going to publish this as a PyPI package. Then I can rethink the whole structure of it. I can do some cool things like type hinting, which I think would be useful when I have students or colleagues use it; hopefully it will be a bit more informative. And I could rethink the whole thing and add tests, which is something I really wanted to do.

00:25:39 Did the notion that you were going to publish it, that hopefully it would be useful to somebody else, and that other people were at the very least going to look at the code, change your approach while you were developing the software?

00:25:54 Very much so, yes.

00:25:56 And was it for the better or for the worse?

00:25:58 For the better, definitely for the better. I was just more careful, more thoughtful. Although I wasn’t writing my tests as I was going, I was, at the same time, reading a bit of Martin Fowler’s book. I was taking the time to go back: I wrote the code, and then I revisited it, which is how I write my science. When I write science papers, I write, and then the next day I don’t start the next part. I always go back and revise what I wrote the previous day. And then as I go further, I always come back a certain amount to make sure everything fits. So in a sense, I’m refactoring my scientific article as I go. And although I didn’t do the test-driven development red-green-refactor, I was at least doing the green and refactor side of things. It was working, and I’d look at it and say, oh, that ended up being a pretty long function that’s actually got a few things going on. Maybe I could split that up, because I will have to write some tests for it.
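For comparison with the green-and-refactor habit described here, a full red-green-refactor cycle might look like the sketch below, with the steps written as comments. `mean_of_valid` is a hypothetical function invented for illustration, and missing measurements are assumed to be stored as `None`.

```python
import math


# Red: write the test first. Running pytest now fails, because
# mean_of_valid() does not exist yet.
def test_mean_ignores_missing_values():
    assert mean_of_valid([1.0, None, 3.0]) == 2.0


# Red again, next cycle: pin down what happens when nothing is valid.
def test_mean_of_all_missing_is_nan():
    assert math.isnan(mean_of_valid([None, None]))


# Green: the simplest code that makes both tests pass.
# Refactor: with the tests green, tidy names or structure, re-running
# pytest after each change to confirm the behavior has not moved.
def mean_of_valid(values):
    """Mean of the non-missing (non-None) values, or NaN if none remain."""
    valid = [v for v in values if v is not None]
    if not valid:
        return math.nan
    return sum(valid) / len(valid)
```

Because the tests only describe behavior (the returned mean), the body of `mean_of_valid` can later be rewritten, for example with the `statistics` module, without touching either test.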

00:26:55 I was doing that partly because I knew that other people would be looking at it. When I look at GitHub and some of the projects there, I get quite overwhelmed, I must admit: the big packages that have heaps of files, and I open them up and I just don’t know where they’re going. And I could picture junior scientists like myself who are trying to get into open source, or who want to understand what my code is doing. Those are the people I had in mind. I didn’t have a Brian Okken, a professional developer, in mind; I wasn’t expecting a code review by you. But I was hoping that somebody who’s okay with Python, just the basics of it, could at least follow what I’m doing. So I could see why good function names and variable names, and keeping things small, mean they can follow my story. It’s the same when I write my science. I don’t assume the reader is an expert in my field. A lot of times, to be honest, I assume the reader is my dad. He’s interested in what I do, but he’s not a scientist. So I write it for that level: I add extra definitions, I make sure concepts are clear, and I provide examples. So when I was writing the code this time around, I wasn’t just doing it to get it done.

00:28:05 I always had this other person over my shoulder. And, as you say, I really think it made me write it differently, for the better.

00:28:13 Well, I love this idea of doing some work for a day, and then the next day, before you push the ball forward, going back to make sure that what you did fits with the rest of the work and that you can read it. So having yourself be the first reader of the code is good. You don’t just write it and go, well, it works, I’m going to move on. You go back and read it and say, does this flow? Does it seem like it tells a story?

00:28:41 And that’s good. I can’t wait until we get to the point where you believe, you know in your heart, that writing code you think somebody else is going to read, and writing code that’s going to be tested, is going to make better software, and also that it’s going to be faster to write. That’s one of the parts I just cringe at every time I hear people say, I know testing is the right thing to do, and it does take a little bit longer, but you’ll appreciate it later. Oh man, don’t tell people that. It shouldn’t take you longer. Writing code so that it can be read doesn’t mean you’re going to get it right the first time. But you pick a function name that you think roughly describes what you want it to do, and then, if you know it’s going to be a confusing, big thing, you write a docstring or a comment at the top saying in English what this function is doing. Then you write it, and then you come back later with fresh eyes, not just later in the day, but another day or next month or something, and look at it and say, does this make sense? And then maybe the name doesn’t really fit anymore, or maybe you’ve split things up and really refactored, and it used to do four things and now it only does two, so you can change the name. The same goes for testing. Hopefully you get into the mindset where you just do it. At first you’re not going to pick the right tests, and you’re not going to pick the right granularity, but you just do it, and it helps you because you can go back. One of the ways it will hopefully help you is if you’re doing it while you’re coding. I personally don’t care whether somebody writes a test before, after, or during, but writing tests for the functionality you have in place right now helps you do that refactoring bit.
So you can say, hey, everything works. Now I’m going to go back and reread my code and go, oh gosh, this thing should be broken up, so I’m going to break it up. I didn’t change the API at all. I just rewrote the guts, and then one of my tests breaks, and I go back and look and go, oh shoot, that test was actually testing implementation. It wasn’t testing behavior. Ideally, if we had written the tests just for the behavior in the first place, we wouldn’t have to rewrite the tests when we rewrite the code. But it’s okay to learn about testing at the same time we’re learning about software development. So there’s going to be some juggling back and forth, going, oh, well, the test isn’t quite right.

00:31:18 Let’s go back and change it to test the behavior. Ideally, we would then run the revised behavior test against both the old code and the new code to make sure it passes with each. But let’s be real: most people just run it against the new code and check things as they go. Now, the cool thing about you being the first reviewer of your own code, because you go back the next day and review it, ties into the other thing you said: you don’t really have a team of people who can review your code for you. Lots of people are in that same situation, even people on big teams. They may be the specialist in some little area, and there’s really nobody else. Somebody else can look at their code and go, oh, you missed a comment there, or that variable name is weird, or something minor like that. But to really review the code and make sure it works right, there’s nobody around, because they’re the only one who understands that complexity. That’s just how teams are put together a lot of the time. So your past self says, okay, I knew what this API was supposed to do, and I knew what this behavior was supposed to do. Then you can kind of forget about some of that and focus on the little details later, and your past self becomes the code reviewer, because your past self wrote the tests. So I really like your direction of learning a framework right away. Actually, I think pytest is so much easier than all the others. I think that’s cool. And I’m glad you jumped in with Kent’s work, because there’s a whole bunch of other TDD material out there that is frightening to me.

00:32:52 Yeah, it’s kind of like in science. A lot of times people reference something, and if you’re a good scientist, you go back to the source, because you don’t assume they’re citing or referencing it properly. Or maybe they’re citing it a bit out of context. So when I was looking into this testing business, and especially TDD, I thought, why not go back to the original? There’s a good video out there by Ian Cooper, an interesting gentleman, and his view is that a lot of people have taken TDD and morphed it. Where did you learn TDD? Oftentimes from a colleague at work, and where did they learn it from? So I guess it gets taken out of context and might be misapplied. And I have a question, or a bit of a story, here. You were talking about how people just say it makes you slower.

00:33:40 I think there need to be more people telling stories like mine. I’m a small company of one. I wrote my little project for me, and I wanted to be productive. And it was great. I did this code thing, but six months or a year later, I wanted to add functionality to it. It took me forever because I didn’t have tests in my code. Everything I did broke my program, and it was just painful. I got it done, for the next little bit, but if you think about productivity, it slowed down. And then the third time I had to add functionality, I actually cringed, because I just couldn’t get it to work. So, as I was hearing from some of the books and podcasts I listen to, you build a new system in parallel. That’s what I did. I took my testing framework and said, well, I need to learn classes anyway; I don’t understand those things. That was many years back. So I rewrote the whole thing with classes, and now, oh great, this is polished, it all works. But again, slowly, as I added more and more functionality, it started to rot, because there were no tests checking it as I kept adding to it. So in that sense, I was much slower. Maybe writing it was about on par, or maybe a bit faster, obviously, since I didn’t have to write the tests. But for everything else, which is actually the lifespan of that code, I wasted so much time getting back into my code. Any little change, every little bit, I had to check by hand. I didn’t have that green test telling me, you’re good, it’s still working. And you sometimes talk with Michael Kennedy about new coders, the idea that you want it to be simple, you want it to be interactive, and you want it to be fun. And I totally agree with that.

00:35:24 But you start to get that adrenaline hit of, I’ve done a program, I’ve built another one, I’m doing all these fun things, and I’m progressing. If that’s how you’ve learned from the beginning, maybe even like kids nowadays who start coding at six or seven, and then by the time you’re 18 or 19, maybe early 20s, somebody starts to say, actually, you should be doing this testing, and in some cases you should even be doing it before you write the code, that’s such a different mindset from what they’ve been doing for years and years. From what I understand from reading Kent’s book, it really is a practice you need to devote yourself to and experience in order to get the benefits. I think it will probably take me a good amount of time to really grasp the subtleties of it. But if I just say I’m going to try it on my next project, very quickly I get frustrated, because I’m trying to do some coding, learn a new framework, and all that. It slows me down, I get frustrated, and there’s none of that adrenaline, those little hits, which I personally did experience with tests. I loved it when my tests passed. I loved it when I did a bit more coding or changed a few things and could just hit run and get that mental sanity that came from having the tests. I loved it. But it took me a whole 15 years of coding without them, and having a few projects rot to the point where I had to redo them, to be able to appreciate what tests bring. So I’m wondering: with a young developer who’s always pushing stuff out, if you now tell them, by the way, you need to do this coding, but we also want tests, do you think part of the problem is that they see it as just so different and not as satisfying?

00:37:06 I think we need to teach people how satisfying it can be. And some people have started; there are a lot of people who have started. You brought up the PyBites guys.

00:37:15 Oh, Julian and Bob.

00:37:16 Yeah, Bob and Julian. One of the things they do is run this code challenge system, and they’re not the only ones. There are lots of other Python instructors I know who have programs set up to teach people, either in person or online, and they’re now including tests. So, for instance, the PyBites code challenges.

00:37:42 They’re little tiny code challenges: you try to figure out some puzzle, and you write the code to solve it. How do you tell if it works? Well, they’ve got tests in the background that validate that your code actually solves the goal of the challenge. And I find it an adrenaline hit to have, like, three tests in there and think, I think it works, and then I run the code, run the tests, and two out of three pass. So I know I’m on the right track. I need to figure out why I’m not passing that third test and get there. That incremental achievement of new functionality, I think, is great. Reuven Lerner is another person who does a lot of these same sorts of challenges. And I think there’s a program called Python Morsels, which is an email system. That’s kind of a cool thing too: there’s a challenge to begin with, with tests, and then there are additional, extra ones. So now you’ve solved this; let’s make the problem a little bit harder. And then let’s maybe make the problem completely different and test against that, and those sorts of things.
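The "two out of three" feeling can be sketched with a hypothetical challenge. This is not an actual PyBites or Python Morsels exercise; the task (`count_vowels`), the three cases, and the tiny `run_challenge` runner are invented for illustration. A first attempt that forgets uppercase letters passes two of the three tests, and the failing one points straight at what to fix next.

```python
# Hypothetical code challenge: count the vowels in a string.


def count_vowels_first_try(text):
    # First attempt: only recognizes lowercase vowels.
    return sum(1 for ch in text if ch in "aeiou")


def count_vowels(text):
    # Fixed version: normalize case before counting.
    return sum(1 for ch in text.lower() if ch in "aeiou")


# The challenge's hidden tests, written as (input, expected) pairs.
CASES = [
    ("banana", 3),
    ("rhythm", 0),
    ("Apple", 2),  # only passes if uppercase vowels are counted
]


def run_challenge(solution):
    """Return how many of the challenge cases the given solution passes."""
    return sum(1 for text, expected in CASES if solution(text) == expected)


if __name__ == "__main__":
    print(run_challenge(count_vowels_first_try), "of", len(CASES))  # 2 of 3
    print(run_challenge(count_vowels), "of", len(CASES))  # 3 of 3
```

A real challenge site would typically run pytest tests rather than a hand-rolled loop; the loop just keeps this sketch dependency-free.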

00:38:58 I think we would benefit from teaching even grade school kids about tests. I was watching this video, actually a pretty clever one, and supposedly a whole bunch of CS programs and other programs use this method. Basically, the teacher sits at the desk at the front of the room with peanut butter, jelly, butter, and two slices of bread, and just follows directions, with different people in the class telling the teacher what to do to make a peanut butter and jelly sandwich. It’s hilarious to watch, because the teacher intentionally does something that follows the directions but is wrong. So it teaches people to be very exact.

00:39:41 Now I think we can start even at that level and say, okay, before we start this exercise, what is our goal, and how do we know we’ve achieved it? And it’s like, well, in the end we want two slices of bread with peanut butter and jelly in there, and they should be layered, and maybe we want butter. Does it have to go out to the edges? Let’s talk about the specification and the sorts of things we want to test for. I think we can do that in software from the very beginning, and then it’s not weird. We teach electronics that way. Even in high school, I think electronics classes probably still have you using an oscilloscope or a power meter or something like that. That’s testing. But for some reason, with software, we think, let’s wait until they’re in college, and only for CS majors. Now we need the electrical engineers to know software testing, and we need everybody to. So I think we can change it. And I think you agree, because you’re here saying we need more testing in academia and in science.

00:40:46 I think so. And I think we almost owe it to open science. I don’t know how long it’s going to take, 10, 15, 20 years. There are journals now that require you to submit the data when you publish your paper. Interestingly, a lot of people cringe at that, for the same reason as with code: the data isn’t maintained properly. It’s a mess. People have looked at this, and I don’t know the exact percentage, but it’s not great in terms of data that’s organized and structured well enough to actually be useful. And I can see the code side of things being the same way. If you actually force me, maybe I’ll give you some code. But I don’t ever expect anybody to be able to run it, partly because it’s got hard-coded paths that point to specific folders on my machine. So the idea of writing code so that others can just take it, with my data, and run it is such a different mindset. Some fields are clearly already there and have been for a while. But I think a lot of science just isn’t. That’s why I sometimes get intimidated when I listen to these podcasts or read an article: I have to remind myself that I’m listening to the top of the pyramid, to the people who are already where we’re all headed. A lot of us are still at the bottom of the pyramid, just fumbling our way. And the hard part is always that it’s extra work right now and nobody officially requires it. So am I going to sacrifice having one or two extra papers published this year? Maybe I’ll have a better publication because the code is clean, the data is cleaned up, and I’ve registered my trial. But all that extra work is currently costly and may actually hinder my career.
So I can see why people aren’t jumping on the bandwagon. And the senior people who have never had to do it, I can see why they don’t think it’s important, or if they do, they’re not going to do it themselves. It falls on the grad students and everybody else, so it slows us down. I think we’re in this middle transition ground, slowly moving, and the early adopters often lose out in these situations. Everybody’s looking at each other like, who’s going to go first? Some journals are pushing for data. Some are asking for code. A lot of them just say "we encourage authors to." I’ve written about this a little in a few letters to journals: recommendations and encouragement just don’t work. I can encourage you to use the best statistics possible and to report properly, but if you don’t mandate it, it won’t happen. Even if you mandate it, some people will obviously still get by doing it the wrong way. But people aren’t willing to put their foot down and say, look, you have to do it this way, for quality’s sake and to move the field forward. At work, we have a committee on research quality, and I’m always about the lowest-hanging fruit. Let’s pick the easy things, because the people at the top are already doing so well with those practices; that’s where we want to be. So let’s just raise the bar from the bottom. Having code is already better than having no code, and this whole testing thing is probably a step above that. But the TDD thing has me interested, and I’m going to pursue it, because I think for people who are learning to code, it brings a certain amount of certainty and it makes you think in code.
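Hard-coded, machine-specific paths are one of the cheaper things to fix before sharing code. A minimal sketch, assuming a hypothetical analysis script that averages one column of a CSV file: the data folder is passed in as an argument, with a fallback to a `DATA_DIR` environment variable and then a local `data` folder. The variable name `DATA_DIR`, the file name `trials.csv`, and the column name `force` are all illustrative assumptions, not anything from the episode.

```python
import csv
import os
import tempfile
from pathlib import Path


def resolve_data_dir(data_dir=None):
    """Pick the data folder: explicit argument, then $DATA_DIR, then ./data.

    DATA_DIR is a hypothetical variable name chosen for this sketch.
    """
    if data_dir is not None:
        return Path(data_dir)
    return Path(os.environ.get("DATA_DIR", "data"))


def mean_force(data_dir=None, filename="trials.csv"):
    """Average the (hypothetical) 'force' column of a CSV in the data folder."""
    path = resolve_data_dir(data_dir) / filename
    with path.open(newline="") as f:
        values = [float(row["force"]) for row in csv.DictReader(f)]
    return sum(values) / len(values)


if __name__ == "__main__":
    # Demo: point the analysis at a throwaway folder instead of one machine's disk.
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "trials.csv").write_text("force\n10\n20\n")
        print(mean_force(tmp))  # 15.0
```

Because the folder is a parameter, the same function can be exercised by a test against a small temporary file, and a reader can run it against their own copy of the data without editing the code.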

00:44:02 Yeah.

00:44:04 I’m really enjoying this conversation. And I actually don’t want to cut it short, but we kind of have to.

00:44:11 And I want to explore the TDD thing and more of the scientific research side. So I’m just going to ask right here: can we get you on again so we can continue this conversation?

00:44:22 Yeah, sure. I’ve enjoyed this conversation, and I’ve enjoyed listening to your podcasts over the past few years. I have a passion for something I never knew I’d have.

00:44:31 Awesome.

00:44:32 And you’re a great person to talk to about it. So, yeah, I’d love to be back on.

00:44:35 Yeah. So let’s schedule something. Maybe we can focus on TDD.

00:44:38 Yeah, that’d be great for the next time.

00:44:40 And I guess I’m going to close with this: I like the idea that if you don’t require it, people won’t do it, because it costs more. So I think it would be cool if, when you want to keep your code private, that’s fine, but the publication could require you to publish your tests.

00:44:58 Even just the tests and not the code. Maybe that’s enough, because you can figure out what the code is doing from them. Yeah. Interesting. Good concept.

00:45:06 I’d like us to get to that point. But I also don’t want to slow down research; I get that. Like you said, a lot of the code is written only once: it’s run against an experiment or some set of data, people write a paper, and they move on to something else. Unfortunately, for the next person who comes along and tries to reproduce the research, it would be helpful if the code were still around, and helpful if it still ran, things like that. But if we can get to the point where it’s just a tiny bit more work and actually saves you time over the whole course of the research, that would be cool. And then maybe my work is done.

00:45:47 So let’s table this conversation. I’m really excited about it, because it’s a really cool concept, so let’s get back to it next time. Perfect.

00:45:53 Thanks, Brian.

00:45:54 And let’s do a shout-out before we close it off. If people want to know more about you, where can they find information about you?

00:46:01 I’m starting to be a bit more of a Twitter person, so @MartinHeroux, that’s H-E-R-O-U-X. And it’s a bit on COVID hold right now, but I do have a blog with a colleague, Joanna Diong, at scientificallysound.org, where we write about science but also about coding, and just how we do science in this digital age. So those are the two main places.

00:46:23 Okay. And you also sent me a link to Google Scholar. I don’t know what that is. So what is Google Scholar?

00:46:30 With Google Scholar, you can search for articles and obtain them that way, rather than through PubMed, for example. But if you choose to, you can also have a public resume of your publications, in a sense. And although we’re not supposed to care about all these indexes, because apparently they’re not useful, everybody looks at them. It gives you your little h-index score, which is your innocent popularity contest in science: how many people have cited your work. So it’s just a nice quick place to see what I’ve done and how many times it’s been cited, that kind of stuff. It’s almost like a Facebook for science.

00:47:01 Okay.

00:47:02 ResearchGate is the other place, but this one is just nice and simple if people are interested in what I’ve done scientifically.

00:47:07 Cool. You can go look. Nice. All right, well, thanks a lot, and we’ll talk to you next time. Perfect.

00:47:12 Thanks, Brian.

00:47:14 Thank you, Martin. I look forward to having you on again to talk about test-driven development. Thank you, PyCharm, for sponsoring the show. Try out PyCharm yourself by going to testandcode.com/pycharm. Thank you, Patreon supporters, for your amazing and enduring support of the show. Truly incredible. Join them by going to testandcode.com/support. The show notes for this episode are at testandcode.com/140. That’s all for now. Now go out and test something, and if you know a scientific researcher, talk to them about testing and tell them to send their questions my way. If I don’t know the answer, I might know who to ask. Take care, stay safe, and happy holidays.