160 - DRY, WET, DAMP, AHA, and removing duplication from production code and test code

Should your code be DRY or DAMP or something completely different? How about your test code? Do different rules apply? Wait, what do all of these acronyms mean? We’ll get to all of these definitions, and then talk about how it applies to both production code and test code in this episode.

Transcript for episode 160 of the Test & Code Podcast

This transcript starts as an auto generated transcript.
PRs welcome if you want to help fix any errors.

00:00:00 Should your code be dry or damp or something completely different? How about your test code? Do different rules apply? Wait, what do all these acronyms mean? We’ll get to all of these definitions and then talk about how it applies to both protection code and Test and code this episode.

00:00:31 Welcome to Test and Code Code.

00:00:42 This discussion about dry and damp and various other clever acronyms and mnemonics should start with some definitions and a little history. Let’s start with Dry or don’t repeat yourself. I first ran across dry during my first reading of the Pragmatic Programmer long ago. Don’t repeat yourself. The Dry principle is every piece of knowledge must have a single unambiguous, authoritative representation within a system.

00:01:09 That sounds neat. A good summary from Wikipedia says, when the drive principles applied successfully, a modification of any single element of a system does not require a change to other logically unrelated elements. Additionally, elements that are logically related all change predictably and uniformly and are thus kept in sync. Besides using methods and subroutines in their code, Dave Thomas and Andy Hunt, the authors of Pragmatic Programmer, rely on code generators, automatic build systems, and scripting languages to observe the drive principle across layers. Well, that sounds great, but it’s not always easy to implement and it can go haywire if applied inappropriately. Let’s look at a little bit more. In the same Wikipedia article, it also includes some funny description of Wet W-E-T. Which could stand for right everything twice right every time we enjoy typing and waste everyone’s time, I really enjoy the we enjoy typing one. That’s awesome. The article also talks about AHA from Can’t See Dodge and being a big fan of Take Me On. As a kid, I definitely wanted to know more. Aha is not part of the band. It stands for Avoid Hasty Abstractions, described by Dods as Optimizing for Change first and Avoiding Premature optimization, and was influenced by Sandy Metz’s preferred duplication over wrong abstractions.

00:02:46 Okay, now we’re getting maybe some people that don’t think Dry is awesome all the time. We’re also getting some of the misapplication of Dry. Aha programming assumes that both wet and dry solutions inevitably create software that is rigid and difficult to maintain. It’s not good. Instead of starting with an abstraction or abstracting at a specific number of duplications, software can be more flexible or robust if an abstraction is done when it is needed or when the duplication itself has become the barrier and it is known how the abstraction needs to function.

00:03:27 Not sure I followed all of that, but we definitely need to pull on this thread a little bit more to find out more. But first, this episode of Test and Code is brought to you by Datadog, the monitoring and security platform for cloud scale applications. Datadog APM lets you automatically track the impact of every code deploy on your application’s performance in real time. Leverage out of the box comparison graphs that visualize requests, errors, and latency down to specific endpoints to decide whether you should ship a hot fix or roll back your code. Datadog also ties your code deployments to the underlying infrastructure and code profiles. Without clicking a button, start monitoring your Canary blue, green and shadow deploys with a free trial at testandcode.com. Datadog and Datadog will send you a free T shirt.

00:04:29 Okay, so we’ve defined Dry, wet, and AHA. And it’s not the take on me. Aha. It’s to avoid hasty abstractions. And we have some hints at where the trouble that there might be trouble in Dry Paradise. I’d like to bring up some examples I’ve seen from experience.

00:04:48 A lot of people have heard of drive, but not really understood it deeply. They do remember they don’t repeat yourself thing, so weird things happen. An example is an order. Let’s say we have two functions, each with say, 35 and 40 lines of code, and they share some of that code with each other. Let’s say they share the first 30 lines of code and differ only in the last five and to ten lines of code.

00:05:13 Wouldn’t you be following the dry principle by grabbing those 1st 30 lines and sticking them in a separate function and calling that function from each of the original functions? Well, it’s not that cut and dry get it dry anyway. It avoids duplication.

00:05:31 But dry is more than just about avoiding duplication. It’s also about abstractions and about how the code behaves when a change is needed. Yes, exactly. You might say if a change is needed now I only have to change the code in one place and not two.

00:05:46 Right? But what if the change is only appropriate for one of the client functions? If you don’t know that, then you have a bug in the other function. If you do know that, then maybe you’ll copy that shared function into a second function and change one of them. And now you have two functions, each calling their own separate helper function.

00:06:07 You now have a deeper call tree and you actually haven’t saved anything. However, if the first 30 lines really do represent an identical abstraction, then you’d probably be correct in creating the helper function. Probably, though maybe it’s software. Future requirements are hard to predict. And also what if it’s not 30 lines? If they share one line of code, obviously it’d be silly to create one line helper function and replace it with one line. Calling the helper function. You haven’t really saved anything, and you’ve hidden away implementation. I’ve seen this though, and I still have bad dreams about it. Okay, 30 lines.

00:06:45 Maybe create a helper function with that, but one line don’t. But Where’s the cut off? Is it two lines? Is it five lines? Is it ten lines? I don’t know. That’s where abstractions and making sure your code is subjectively. Easy to read is important.

00:06:59 By the way, you are a bad judge of whether or not your code is easy to read. So am I. I’m a bad judge of my own code. That’s why code reviews are good. Let’s look at a couple more mnemonics. One is about the duplication thing.

00:07:12 It’s called the rule of three. I’ve also heard it as three strikes and you’re out. And it’s roughly two instances of similar code do not require refactoring, but when similar code is used three times, it should be extracted into a new procedure. Martin Fowler likes this rule and or at least did and brought it up in his refactoring book, and I’m on the fence about it. Actually, it’s relatively easy to apply when it’s your code and you just wrote the other two duplications recently. It’s harder to apply when it’s a shared code base and the duplications have been put in place at various times, and you might not even know they exist.

00:07:54 But it’s important. It is important as part of refactoring because this Cut paste modify type of coding is very common in software. And just so you know, cut paste modify is not considered doing architecture.

00:08:11 So if you see a lot of duplication, it’s time to start a refactoring effort, I think. But all maintainers need to be on board and the code and be involved in the code reviews so they can understand the abstractions that you’re putting in place, and the abstractions need to make sense.

00:08:27 Okay, one more thing. Damp. Damp is descriptive and meaningful phrases and is about promoting readability.

00:08:36 Okay, now we’re getting somewhere.

00:08:38 And I learned about damp from a stack overflow discussion about applying dry and damp to testing. I’ll read just a section from Chris Edwards because it’s great so far. To maintain code, you first need to understand the code. To understand it, you have to read it. Consider for a moment how much time you spend reading code. It’s a lot. Damp increases maintainability by reducing the time necessary to read and understand the code. Sounds good. Removing duplication ensures that every concept in a system has a single authoritative representation in the code. A change to a business concept results in a single change to the code. Dry attempts to maintain attempts to maintainability by isolating change to only those parts of the system that must change. Also sounds good. So why is duplication more acceptable in tests?

00:09:35 This is still part of the quote.

00:09:38 Tests often contain inherent duplication because they are testing the same thing over and over again, only with slightly different input values or set up code.

00:09:48 I’m not following now.

00:09:50 However, unlike duplication code, unlike production code, this duplication is usually isolated only to the scenarios within a single test picture file.

00:10:00 Because of this, the duplication is minimal and obvious, which means it poses less risk to the project than other types of duplication.

00:10:09 Okay, I’m going to continue reading and I’ll come back. Furthermore, removing this kind of duplication reduces the readability of the tests.

00:10:18 The details that are previously duplicated in each test are now hidden away in some new method or class. Yeah, that’s bad. To get the full picture of the test, you now have to mentally put all these pieces back together. Therefore, since test code duplication often carries less risk and promotes readability, it’s easy to see how it is considered acceptable. That’s a principle favorite dry and production code favorite damp and test code. While both are equally important, with a little wisdom, you can tip the balance in your favor and quote. And yeah, I also had some interjections in there.

00:10:56 Okay, so now I’m a little more on the fence. He had me for a while, but then it kind of fell over a bit. I remember hearing Kent Beck once say that a test should tell a story. I like that, and I think it should kind of be at the same abstraction level. But shouldn’t your code and the functions in it tell stories also?

00:11:20 I guess I’ll just say that how I deal with this. I don’t really have good advice that I can tell you, but I can tell you how I deal with it. I do know that a long time ago I started to try to make my tests as readable as possible.

00:11:34 That’s my goal. If there is duplication, parameterization fixtures, and helper functions are great to clean that duplication up, but only if you can still read the test quickly and understand it. I also want to make sure that anything that can fail generates an easy to understand failure cause that’s kind of an aside thing from what we’re talking about now, but that’s harder than it sounds.

00:11:58 I also know that when I started to do this for my tests, make sure my test told the story at a consistent level of abstraction. I started wanting my production code to be like that, too. So now I don’t really have two rules for production code and test code. I want it all to be readable. Another thing is that I’m not that great at predicting what kind of changes are needed in the future.

00:12:25 So my defense against all of this is a few things. Keep in mind a dry.

00:12:32 It’s something to keep in mind, even if I don’t follow it all the time. I make sure my functions tell a story at a consistent level of abstraction without hiding important detail. I avoid duplication when it makes sense and doesn’t detract from readability.

00:12:49 I test everything. Everything is tested, and don’t be a zealot about dry or really anything. I work with experienced engineers from all sorts of backgrounds and learning paths, and some people have read about all this stuff and some people have not.

00:13:06 I try to not be on my soapbox and I learn from my fellow employees and try to have them learn from me.

00:13:16 Don’t prematurely clump duplication into helper functions. Sometimes I need to write the same thing, like five times before I really understand what the abstraction is and understand which of those duplicated things represent an abstraction. Sometimes it’s two.

00:13:34 So sometimes I have to write it several times before I’m ready to refactor after I really understand it. And also don’t be afraid to refactor other people’s code.

00:13:45 Your tests will help you to make sure that you don’t muck things up and your version control system allows you to roll things back if you accidentally make things worse or possibly make it less readable, you’ll know that from code reviews anyway also have fun. This is software.

00:14:02 Don’t get too serious about it.

00:14:11 Thank you Patreon supporters join them at testingco.com support thank you Datadog for sponsoring get started with a free trial at testingcom Datadog dog and Dandel dog will send you a free Tshirt. Those links are in the show notes at testandcode.com one 60 160 that’s all for now. I’ll go ahead and test something.