Error monitoring, crash reporting, and performance monitoring are tools that help you create a better user experience, and they are fast becoming crucial for web development and site reliability. But what are they, really? And when do you need them?


Transcript for episode 88 of the Test & Code Podcast

This transcript starts as an auto generated transcript.
PRs welcome if you want to help fix any errors.


You’ve built a cool web app or service and you want to make sure your customers have a great experience.

You know I advocate for utilizing automated tests so you can find bugs before your customers do. However, fast development life cycles and quickly reacting to your customers’ needs are good things, and we all know that complete testing is not possible. That’s why I firmly believe that site monitoring tools like logging, crash reporting, performance monitoring, etc., are awesome for maintaining and improving user experience. I had a problem, however. I’m not completely savvy in all the ways of the web, as I spent most of my career in embedded development, and I know many of you are working with web applications. So how do I close the gap? First, I’m learning. Second, I reached out to Raygun as sponsors for the show because I like what they’re doing and I know their products fit many of the problems that you have. Third, JD Trask, the CEO of Raygun, agreed to come on the show and let me ask all of my questions about this whole field. So cool. That’s what this episode is about. Yes, Raygun is the sponsor of this episode and a whole bunch more, but I also think this is great information about the field as a whole. I learned a lot and I hope you do too. I normally stick an ad in the middle of the podcast, but for this episode I just think that would be weird. So I’ll just tell you now: if you like what you hear in this episode, check them out at raygun.com.

Welcome to Test and Code, the podcast about software development, software testing, and Python.

On today’s episode of Test and Code, I’ve got JD Trask from Raygun. So if people don’t know who you are, could you introduce yourself?

Sure. I’m the co-founder and CEO of Raygun. I’m a Kiwi, hence the accent, down here in New Zealand. And I absolutely love coding, love business, love everything about software in general.

Okay, so what is a Kiwi? Is a Kiwi just somebody from New Zealand, or is there more meaning than that?

Yeah, it’s the colloquial term we use for New Zealanders. We call ourselves Kiwis, and so it’s not an offensive term. When I did travel overseas, I met one guy who thought it was an insult down here, which felt really odd. But no, we tend to call ourselves Kiwis. It’s been a bit muddled in recent years by the kiwifruit, which seems to have been shortened to just being called kiwis around the world, but we typically mean it in terms of the birds that can’t fly.

Okay, this is totally not where I was going to go with this, but does the fruit have a longer name?

No, it’s just called kiwifruit. Okay.

I worked in produce when I was in college, and I’m like, I’m pretty sure they’re just kiwis.

Okay.

Anyway, I saw was the bird, though.

Yeah. And they’re not related. Right. Like Kiwi birds don’t eat Kiwi fruits.

I’m so tempted to tell you that they like Kiwi fruit right now.

No, that’s totally disconnected. Both can fly about as well as each other, though.

Okay, well, what I wanted to talk about was Raygun instead of Kiwis. Raygun is your company, right?

Yeah. So we set it up to do software crash reporting, so picking up on the faults that people don’t necessarily plan for. We also have a real user monitoring product to understand the performance that our end users are experiencing. And more recently, we added a full APM product, which stands for application performance monitoring, where it tracks basically what my code is doing on the server. Why might this thing be slow? What’s going on here? So it kind of gives you that full pane of glass across user experience, what’s blowing up, and where the performance sinkholes might be across your code, whether it’s front end, back end, that sort of stuff. So it sits slightly after the testing phase, maybe. The typical sweet spot is folks wanting to track stuff in production and see, okay, what are all the things we didn’t plan for?

This is all kind of exciting to me.

Most of the web development I’ve done has been internal stuff: small tools internal to a company, with a handful of people using them, and people just tell me, hey, the wiki is down or something like that. Then I go reboot the server. So this is on a different level, obviously. I know that there are a lot of people listening to Test and Code who are heavily into all of this, either testing web applications or in the DevOps space of keeping them up. So Raygun sort of fits into what we think of as part of the DevOps job, is that right?

Yeah, absolutely. So around the time DevOps as a term started popping up was about when we launched Raygun Crash Reporting, which was the first product in the suite, in about 2013. And it’s always seemed a little interesting to me. Everybody has a different view on what DevOps is, and I’m personally of the view that it’s more about the behavioral style of developers and taking ownership through production.

But when it comes to tooling, I’ve noticed that generally everybody only seems to think of continuous integration and continuous deployment as being the DevOps tooling pipeline. Now there’ll be people out there who are thinking, this guy doesn’t know what he’s talking about, it’s much more. And there’ll be people out there who go, it’s nothing to do with tooling at all. And that’s kind of part of the issue with the label.

But I’ve always found it interesting that it stops there. In my mind, with what we’re doing and what a lot of people do, you need that feedback loop from production, because it’s all well and good that you can get from code into prod really fast, but that’s not a complete circuit. Once you start getting the intel back from production to the developer, you’ve got this virtuous feedback cycle that can run super fast. It’s great that you’ve put the effort into getting into prod really fast, but now we can deploy to prod, notice an issue, and fail forward: fix that with a small improvement, because we got told about it automatically, and really quickly push that out. It really helps to accelerate software delivery in general; that has been the experience we’ve found and want to promote.

The term crash reporting is a little scary.

I don’t want any crashes. Reporting might be great, but aren’t all crashes bad?

Generally, yes, they are bad.

The line gets a little bit blurry once you get to front-end crash reporting or error reporting. Some people think of a crash as being a hard, unrecoverable situation, and typically, if something goes wrong on the back end, it is in that sort of camp. If it’s something on the front end, so thinking more like JavaScript, well, I don’t think it’ll surprise anybody to know that the web browser is pretty much a dumpster fire.

Right.

Like, you can’t trust it; you don’t know what extensions and stuff your user is running. And so it’s quite easy to have issues that occur on the front end with JavaScript code that don’t actually prevent the user from doing anything. So it can get a little bit blurry at that end. But certainly on the back end, crashes are bad.

Usually when I’m describing what crash reporting is to folks who might not be as technical as this audience, I just think of it as like a black box flight recorder for an aircraft. Things are going to go wrong. Firstly, I want to know that they went wrong. Secondly, I need enough information to know what the heck I should be doing to try and fix these things.

Okay. Error monitoring and crash reporting are kind of the same thing then.

Yeah, effectively the same thing, just different labels. What we found was that in the mobile world the term crash reporting was far more prevalent. Outside of mobile, it was often called error reporting, but they are effectively the same sort of products.

Okay, let’s say if I have some smarter code that can sort of try to do something and then have a reasonable fallback if something fails, are those cases something that’s going to be reported as well? Or in my code, if I’ve handled it, is it not reported?

So by default, a handled error would not be reported, because you’re doing something with it. It might be completely fine that there is an error state or error case in there and you are handling it. What we’ve found across our customer base is that, for most people, the unhandled stuff is the highest value, because it’s literally the things they didn’t think would happen. It’s surprising to them. And then we have some folks who have really great processes around how they build their software, and they are actually using try/catch type constructs all over the place and generally doing a pretty good job. What they find is that there are usually certain errors in there that, even though they’re handling them, they may want to report on, just so that they understand how frequently these are happening and can maybe optimize or change the flow. That requires a little bit more instrumentation. We’re talking maybe a line or two of code, but it’s generally pretty set-and-forget on the unhandled exceptions.
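As a rough illustration of the handled-versus-unhandled distinction described above, here is a minimal Python sketch. It is not Raygun’s client API: `report_error`, `install_hook`, `parse_quantity`, and the in-memory `reports` list are all hypothetical names standing in for whatever a real crash reporting client provides. The only standard-library mechanism relied on is `sys.excepthook`, which Python calls for uncaught exceptions.

```python
import sys
import traceback

# Hypothetical in-memory "reporter"; a real crash reporting client would
# send these records to its service instead of appending to a list.
reports = []


def report_error(exc, handled):
    """Record an exception along with whether the application handled it."""
    reports.append({
        "type": type(exc).__name__,
        "message": str(exc),
        "handled": handled,
        "stack": traceback.format_exception(type(exc), exc, exc.__traceback__),
    })


def unhandled_hook(exc_type, exc, tb):
    """Set-and-forget path: Python calls this for any uncaught exception."""
    report_error(exc, handled=False)
    sys.__excepthook__(exc_type, exc, tb)  # keep the default traceback output


def install_hook():
    """Call once at startup so uncaught exceptions are reported automatically."""
    sys.excepthook = unhandled_hook


def parse_quantity(text):
    """Handled path: fall back to a default, but opt in to reporting."""
    try:
        return int(text)
    except ValueError as exc:
        report_error(exc, handled=True)  # the extra "line or two of code"
        return 1
```

Calling `install_hook()` once covers the unhandled case; the explicit `report_error(..., handled=True)` call is the opt-in for errors the code already recovers from.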

And I just want to cut back to a comment you made earlier as well, about how people just tell you something is wrong. That’s great, but what we found when we first built this product... We actually built, I think, eleven products prior to launching the Raygun one. And so we started instrumenting our older products with the crash reporting piece, partly just to dogfood it before launch. And that’s where we came up with this metric that only about 1% of the customers of our older products would ever tell us that something actually went wrong, like go to the effort of posting in our forum. And we’ve seen this even when we are working on, let’s say, a deal to sell to a relatively large company. They’ll go to our pricing page and say, well, the cheapest tier might be like 25,000 errors a month. There’s no way I have 25,000 errors a month. And then they put it in and they go, holy shit, there’s like 100,000 an hour or something like that. The numbers get kind of crazy for what can go on in there.

And of course, then you start thinking about business impact and go, well, even if only a few percent of these are stopping our users from having a great experience or maybe buying something, what’s the cost to the business of having no visibility into this?

And so that’s what we kind of see. We’ve had one customer as an example where they were like, I don’t know if we need this. And then they lost a quarter of a million dollars in a day to a software bug. And it was like maybe we do need this, that sort of thing. It’s quite common for us to see in our process.

Well, when I was looking through some of the stuff you offer, if you turn this on, maybe somebody’s going to get just like piles and piles of error reports. Don’t you have some filtering in place so that people don’t get overwhelmed with the results?

Yeah. There are two types of things we do to help folks. The first is that we fingerprint all of the errors. The genesis of why we built this was that my co-founder and I, years and years before we ever launched the product, worked in the same IT company building bespoke software, and we were relatively well known for our ability to deliver pretty high-quality outcomes for customers versus some of our peers. One of the reasons was that we would always instrument the code to send us an email about everything that went wrong.

The problem with that approach was that there was no smart grouping. Right. So you’d very quickly train yourself. You have to be really careful about the sheer amount of information overload. So if I generated 10,000 errors and I got 10,000 emails, that kind of sucked for me.

Not to mention that this was in the days of 20 megabyte inboxes.

So when we were building Raygun, it was sort of like, how do we productionize this and make it into a product for people? We realized we really had to do some sort of smart grouping. And so I probably know more than a person should about the identity of an error. Like, how do you group two errors together, knowing that they’re the same error? Folks might think, well, you could look at the type of the error and the message. Well, maybe, but the problem is that messages typically have some sort of unique identifier in them, which breaks things. Or you might look at the stack trace, where in the code base it comes from, those sorts of things. So we do all this analysis automatically and put a lot of effort into that. As the errors flow through, we do that grouping.

We have one customer, a very large global pizza company, and they generate hundreds of thousands of errors every day. But the grouping means that they only have, I think, a few hundred actual root-cause bugs that they need to go and resolve. We typically work with the customer, and our advice is normally to pick up the top two bugs and fold them into the current sprint. Just fix two at a time. It doesn’t sound like much, but it’s normally pretty easy to go in and fix one or two bugs, and you just continually, relentlessly improve that software.

The other end is that we do have filtering that people can apply as well. Maybe I don’t want any errors generated by known bots, since various indexers can be bad citizens on the web. Or I want to ignore anything coming from this IP address, because perhaps we’re running a pen test against that box today and we know it’s going to generate a whole lot of spurious errors. All those sorts of things are in the box there as well.
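To make the grouping idea concrete, here is a small Python sketch of one way an error fingerprint could be computed from the ingredients mentioned above: the error type, the message with unique identifiers masked out, and the code locations in the stack trace. This is an assumption-laden illustration, not Raygun’s actual algorithm; `normalize_message`, `fingerprint`, `load_order`, and `capture` are hypothetical names.

```python
import hashlib
import re
import traceback


def normalize_message(message):
    """Mask hex values and numbers so unique identifiers don't split groups."""
    message = re.sub(r"0x[0-9a-fA-F]+", "<hex>", message)
    return re.sub(r"\d+", "<n>", message)


def fingerprint(exc):
    """Hash the error type, normalized message, and code locations."""
    frames = traceback.extract_tb(exc.__traceback__)
    # Use file and function names only; line numbers shift between releases.
    locations = [f"{frame.filename}:{frame.name}" for frame in frames]
    key = "|".join([type(exc).__name__, normalize_message(str(exc)), *locations])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]


def load_order(order_id):
    """Hypothetical failing function whose message embeds a unique id."""
    raise LookupError(f"order {order_id} not found")


def capture(order_id):
    """Trigger the error and return the fingerprint it would be grouped under."""
    try:
        load_order(order_id)
    except LookupError as exc:
        return fingerprint(exc)
```

With this sketch, `capture(1234)` and `capture(98765)` produce the same fingerprint even though their messages differ, so both occurrences would land in one group rather than generating two separate reports.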

Interesting.

You have to make it manageable. One of the challenges, I guess, from the product design side is that we have some customers who will say things like, okay, we’re a new startup, we want this thing to scream at us about every single instance of an error. And then we get those massive Fortune 500 companies that are like, dear Lord, please make this manageable at our scale. Give us the tools to sift through this. So balancing that can sometimes be tricky.

So have tools like this changed the way people develop software? If I were to write shrink-wrap type software, which I don’t even know if anybody does anymore, I’d have to try to design it, implement it, test it as thoroughly as I can, and send it off, and I’d be relying only on users contacting me with problems because I can’t get that information any other way. I know that people are trying to deploy faster. Have these tools allowed people to develop faster? And does it make people more sloppy?

I don’t know if I’ve seen people become more sloppy, but I definitely have heard a bunch of folks tell us that they will frequently be coding and have the Raygun crash reporting screen open on another monitor while they’re coding, just watching what’s going on in the live view. They’re not typically the people that are pushing to production several times a day.

The thing that I personally and this was sort of serendipitous.

We were one of the first couple of folks to integrate with Slack when Slack first launched. I remember visiting their office at the time in San Francisco. It was really small; there was only a handful of people there. And that really changed the game for a lot of these sorts of products, because we noticed that folks started to not want to send themselves emails, but to send the notifications about the issues into Slack channels.

And this is how Raygun, the company, actually does development: those things stream in there, and of course that enables what the cool kids would call ChatOps. So something comes in and the team can have a bit of a conversation in Slack around it. Maybe they start a thread off the notification in there, and they can collaborate and understand what’s going on with a particular issue. It’s also useful for people spotting something, who can go, hey, look, that looks like some sort of new issue, or something that’s just gone out, and we’ve just done this deployment. So it’s more helped the collaboration within teams pushing things out.

The stories that I think about, of the folks that are leaving it open on another monitor, are typically more the smaller teams or individual developers who want to own things from end to end. Anything that does go wrong, they’re going to have to deal with it regardless. The chat style, meanwhile, is certainly something that has absolutely taken off within the larger organizations and with folks with remote teams.

Okay.

Yeah.

To be realistic, it’s overkill to try to thoroughly test everything to the Nth degree and just provably impossible.

Yeah. I mean, that’s my personal view as well. You ask any of the software engineers at Raygun and they’ll be able to tell you that I’m often beating the drum of, like, more unit tests, more integration tests. We need more stuff in here.

We sell the product. And I’m not even an advocate for the idea that we only rely on a crash reporting product.

These things have to work together.

And it’s kind of amusing to me because I see people that often reply to, like, our Twitter ads when we talk about this, and you’ll see things like if you just did test driven development, then you wouldn’t even need this product. And I’m like, I don’t know, that smells like a lot of bullshit to me.

I’ve never seen somebody go, here is a completely error-free piece of code, only because I did test-driven development. And if that were true, it would be something of absolutely insignificant size.

There are always things that people don’t count on. And one of the things I find super interesting about the space of error reporting or crash reporting is that in some ways it scares me because you realize that these are all situations that the engineer didn’t think would happen.

By definition, that is almost exactly the same as a security issue. Nobody really sets out to build a security issue, unless maybe they work at the NSA. It’s not something you plan for. And to say, well, maybe we don’t even need to think about security if we did test-driven development, because we’ve thought of all the edge cases, is just baloney.

You need all aspects here. Boeing is not going to suddenly go, we don’t need black box flight recorders because we did a little bit of extra testing.

It’s just not helpful to anybody to think that there’s one silver-bullet answer to everything.

There’s a gamut of different types of companies as well. There are quite a few companies that I know of that are just run by even one person or a handful of people.

And they’re serving a lot of people. And sometimes the projects that people rely on aren’t even supported by one full-time person; it’s one person in their spare time. That person can’t be doing everything, dotting every i and double-checking absolutely everything before deploying it. And they can’t watch it all the time. So tools like this, I think, are really great for allowing small businesses and even side projects to be as responsive as a large company without really having to do much.

Yeah, well, the other end of it, and I mentioned in my sort of bio that I’m very passionate about both software and business, is the business side of the table. You’ve got to think, okay, there are metrics like what’s the cost to acquire a customer? What’s our lifetime value of a customer? What’s our churn rate? All these sorts of things. And I’ve often felt quite strongly that tools like this provide value far beyond just, hey, let’s help the engineers know the extent of the problems and how they might go about fixing them. You actually think, well, if one in every ten users has an issue actually trying to set up my software project, even if it’s a side project, now I have to get like 10% more people to even try the damn thing to make up for that issue.
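The back-of-the-envelope figure above can be checked with a quick calculation. If a fraction of new users fail during setup, the number of people who have to try the product rises by 1 / (1 - failure rate) - 1, which for one in ten comes out slightly above the "like 10%" quoted. The function name below is hypothetical, and the model assumes that every user who hits the issue is lost.

```python
def extra_signups_needed(failure_rate):
    """Fractional increase in trial users needed to offset a setup failure rate."""
    if not 0 <= failure_rate < 1:
        raise ValueError("failure_rate must be in the range [0, 1)")
    return 1 / (1 - failure_rate) - 1


# One in ten users hitting a setup issue means roughly 11% more trials.
print(f"{extra_signups_needed(0.10):.1%}")  # 11.1%
```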

And so there is a direct correlation between software quality and software performance and that ability to drive adoption. Right. Nobody likes something that’s slow. Nobody wants to use something that’s buggy.

We’ve got a history littered with cases where this stuff hasn’t worked. I’ve recently been watching some reviews of the old, I think it was SimCity, when that came out in 2013 or something.

And that thing just didn’t work. It was buggy and it was horribly slow. And now no one talks about SimCity. Right.

These things can kill or help businesses thrive.

They’re not just for the engineer.

I honestly think sometimes folks should wander over to marketing and say, right, we’re going to take some of your budget, because wouldn’t you love it if 5% more of the people you can send to our app actually could pay for it?

That might be the biggest impact that they could have.

So, yeah, there’s a lot of things that I think about beyond just the engineering team here as well, which does have value.

It’s very easy to get my head around why I care about error reporting and stuff. It’s a customer experience thing. On the performance side, if your spending on servers goes up unnecessarily and you can tune that performance, you can spend less or just make things work faster. And that makes sense. So I want to ask you about user monitoring, because when you say user monitoring, my first thought is, aren’t we not supposed to do that anymore? Isn’t that, like, anti-privacy? So help me understand user monitoring and the value there.

Yeah, sure. Well, first, let’s start off by saying I absolutely am in agreement with you that we shouldn’t be tracking people in a nefarious way. And that’s exactly why people should look at products like Raygun rather than maybe their favorite free analytics tool from a giant ad tech company that’s doing crazy stuff with the data.

So first and foremost, our position on all of this is that the data we receive from our customers is our customers’ data. We don’t do anything scary with it. We adopted full support for things like GDPR out of the European Union, which I think is absolutely fantastic for the consumer: their right to control their data, their right to be forgotten, their right to say, tell me about the data you’ve got on me.

All of that stuff is really important, and I absolutely support that. And the plus side of it is that it was very easy for us to support, because we don’t do anything with the data. It sits in our data stores and it’s available to you, our customer, to do something with, and that’s it.

So back to the specific point of RUM, though.

It’s kind of unfortunate, I guess, in today’s world that the words user and monitoring are in there together, because RUM is a specific product category that is about measuring the performance experienced by actual users. The reason real user is in there is to contrast it with things like synthetic testing: ping the site, tell me the response time.

Synthetic testing is really good for things like SLA checks. Is the site up, that sort of thing. Noticing if there’s a baseline change, like, did we do a deployment where the performance changed? However, synthetic testing doesn’t actually give you any real-world insight into the performance of your software. RUM is really focused on the performance part of the equation. It’s less about telling me what Brian is doing on the Internet, maybe across sites and stuff like that. That’s not what it’s about. It’s about saying, here’s the distribution of load times that you’re seeing.

So a good example of how we’ve seen customers using us: we had a large ecommerce customer who was rebuilding their front end in React, and they wanted to A/B test the load time for the users. It was all well and good that they were moving to React, but they knew that performance was so important that if they lost any time, they would convert less and make less money. And so until their React version was noticeably quicker than the old one, they weren’t going to flip over to that. And that, as an aside, is one of the major patterns I’m seeing in software across the world: in the past, everybody focused on what’s my server performance, what’s my slow database query, what’s going to take a while for the server to give me a response?

With nearly every customer that I work with, and this is usually sitting down and going through their data with them and having some conversations, the servers these days are usually responding pretty quickly. It’s those fancy blooming JavaScript frameworks that are adding three, four, six, seven, eight, ten seconds to the load time for the user, who is just waiting for the browser to actually grind through all that code and composite the page. But if you’re just doing server monitoring, you go, my server is returning this page in 300 milliseconds, pat yourself on the back, and don’t realize that it’s giving customers a ten-second load time.

That’s a huge problem. And of course developers are often the least aware of these performance issues because, firstly, they normally get given (apologies to those that don’t) pretty high-powered computers, because software development can be quite intensive. They’re also working on the code and are typically loading everything off localhost when they’re in development, which only affects the network latency, obviously, not the render time. But then they also have all sorts of caching and stuff, because they’re reloading the stuff all the time. If it’s the first time that a user hits your site and they’re pulling down like nine megs of JavaScript, with your fancy webpack and React, and they’re trying to grind through that stuff, that’s a shitty experience.

Nobody wants to use that software. So RUM is about understanding how long the user is waiting to do stuff, at what point they are dropping off, and what the outliers in that time are.

So it’s all well and good to say that the average load time, for example, is 3 seconds. But what about the last 10% of users? What are they seeing? Is the P90 like 8 seconds? Is it acceptable for 10% of our customers to get eight-second load times? What’s contributing to that? So quite a long answer, but that’s really what it’s about.
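The gap between an average and a P90 is easy to see with a small sample. The Python sketch below uses made-up load times (the numbers are illustrative, not from any real site) and the standard library’s `statistics.quantiles` to compute the 90th percentile alongside the mean.

```python
import statistics

# Hypothetical page load times in seconds for 20 page views.
load_times = [1.2, 1.4, 1.5, 1.6, 1.8, 1.9, 2.0, 2.1, 2.2, 2.3,
              2.4, 2.5, 2.6, 2.8, 3.0, 3.2, 3.5, 7.5, 8.0, 9.5]

mean = statistics.mean(load_times)

# quantiles(n=10) returns the nine decile cut points; the last one is P90.
p90 = statistics.quantiles(load_times, n=10, method="inclusive")[-1]

# Share of page views that took longer than the mean suggests.
slower_than_mean = sum(t > mean for t in load_times) / len(load_times)

print(f"mean={mean:.2f}s p90={p90:.2f}s slower_than_mean={slower_than_mean:.0%}")
```

In this sample the mean sits a little above 3 seconds, while the P90 is above 7 seconds, which is the kind of tail that an average on its own hides.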

It’s not about snooping on your behavior and doing anything creepy like that.

It does sound interesting. Like you said, if even 10% are experiencing something that stops them accomplishing what they want to accomplish, you kind of want to know about that.

Yeah. So I kind of think RUM, to continue my labored analogy to aircraft, is kind of like your airline ratings website. People want to go with the airline that’s going to get them from A to B the fastest.

They’re not going to want the one that’s got eight layovers. So it’s about trying to understand how we actually perform for our users. Like I said, it’s just a shame that user monitoring is the label for the category, because it does sound a little bit more nefarious than you’d want it to be.

Okay, when people are looking at their reports for this, is it more of a generalized thing, or are you seeing individual usernames?

So by default, we don’t do any usernames. We do have the ability for folks, if they want, to tag, let’s say, users logged into a system; they can expose that. As an example, we use that feature for tracking Raygun itself. And the great thing about that is, let’s say you are a Raygun user, Brian, and you contact us and say, I’m having this problem. Our team doesn’t even need you to tell us what the problem is. They can just go, well, here’s Brian’s login in our system, and here are all the errors he’s had, and here is the performance profile he’s been experiencing, and so they can pick that up. But obviously that’s a feature that appeals to some and can’t be used by others, based on their own internal privacy policies. Again, we don’t do anything with that data, so it’s not going anywhere. But I totally understand that folks don’t necessarily want to turn that on.

I was just thinking it totally makes sense depending on the different type of site you’re running.

I know everybody always cares about privacy, but I was just thinking about a friend of mine who has a software training course site. People are on there just to take courses and learn different things. It would seem reasonable to have usernames on there, because if somebody contacts you and says, hey, I paid for this course and it’s not working on my machine, you can just look at that and say, oh yeah, apparently I don’t support Safari or whatever, and try to fix it without having to ask.

Nobody really likes all those questions of like, can you tell me absolutely everything about your computer before you tell me what the problem is?

Yeah, absolutely.

We do see a bunch of our customers using us in that sort of customer support role of, like, let’s not make the user try to understand what the computer is doing if we can avoid it.

One thing we have seen folks do, and this is fully supported and something we do ourselves, covers the mid-range where you get the customer who says, well, I’d like to be able to understand who the user is, but we have a privacy concern with sharing that data, which, like I said, despite all of the protections, I totally understand. A lot of them, rather than, say, putting in an email address as an identifier, put in the primary key for that user out of their own database. So we might see a 1234. Well, that means nothing to us, right? But if they absolutely need to find out something about that user, they can punch 1234 into the app and find everything about the errors and performance story for them. And so that seems to be a pretty happy middle ground: it empowers our customer to support their users even better, while not necessarily having to share anything that might be questionable or particularly identifiable to us.
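The opaque-identifier approach described above can be sketched in a few lines of Python. The `User` class, the `monitoring_identity` function, and the payload shape are hypothetical and not taken from any particular monitoring client; the point is only that the vendor sees a database key rather than an email address.

```python
from dataclasses import dataclass


@dataclass
class User:
    id: int      # primary key in the application's own database
    email: str   # personal data the team may not want to share


def monitoring_identity(user, share_personal_data=False):
    """Build the user payload attached to error and performance reports."""
    if share_personal_data:
        return {"identifier": user.email}
    # Opaque key: meaningless to the vendor, resolvable by the customer.
    return {"identifier": str(user.id)}


brian = User(id=1234, email="brian@example.com")
print(monitoring_identity(brian))  # {'identifier': '1234'}
```

Support staff can still search the monitoring tool for `1234` after looking the user up in their own database, which keeps the mapping from key to person on the customer’s side.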

So let’s say I’ve got this small one-person shop or something and I start getting some traction on a web application. Where in the process of growth do you think somebody should start thinking about adding some of these tools to a project?

So I’m obviously pretty biased, but I have the view that I would start out with crash reporting from day one.

The reason for that is that the behavior that I’ve kind of seen is that most people put in a product like this, and they will tend to stop increasing the number of errors that they introduce, but they may not necessarily get the bandwidth or time to sort of go back and fix all of the issues. That’s the ideal.

But the fact of the matter is, especially in maybe that one-man-band situation, you just might not have the time to fix everything. So you might fix the key things that you know are having an impact and leave the rest there for another day, kind of like the equivalent of the ever-expanding backlog in Jira. But the errors don’t tend to wildly increase from whenever you put it in, because you might do a new release and you’ll notice, okay, this is a new issue, I’ll go fix that right now because I’ve still got all the context in my head. It’s super simple and that’s great. Real user monitoring, however, I probably would not put into a system until I actually had some form of scale or understanding of product-market fit, unless I had my customers already complaining about poor performance.

I would leave that. We also do see that as well in terms of customer adoption. So crash reporting, we absolutely sell to individuals through to Fortune 50 businesses. Real user monitoring typically appeals far more to companies at scale. They’re usually the ones that will know that one extra second of load time might cost them $5 million a month. And so performance is quite paramount because they’re at scale.

And then lastly, APM, which is a newer product for us, kind of has the same behavioral style as crash reporting, in that most folks know they kind of need something, and we see that selling from small through larger organizations as well. I’d probably only worry about APM, though, once you have something actually live in production. So you might not put that in just when you were doing the first development work. So that’s kind of how I think about it.

So you can do the crash reporting stuff even on a test server or something then?

Yeah, absolutely. So you can put it in there and send it in, flag it as being, say, in your test environment. So the way we do it at Raygun is we have different environments. Obviously there’s the developer machine. We then have an office environment, which, to be blunt, is a little bit of a relic of the past, when we were in a particular office that didn’t have a particularly great Internet connection. So we replicated some of the infrastructure internally to test on. Then we have our beta environment that is a mimic in the cloud, and then you have production. Right. And so each of these actually sends all of its crash data, RUM data, and APM data into different buckets in the app so that folks can track that. So going back to that ChatOps example that I used earlier, we do have a non-production errors channel in our Slack. And so folks can see in there, as they’re doing these tests and things, as they’re getting towards prod, if things start cropping up before a customer would have experienced them. So that’s proved useful.
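The setup described above, with one bucket per environment and a separate chat channel for non-production errors, could be sketched roughly as follows. This is a hedged illustration only: the environment names, channel names, `ErrorStore` class, and `channel_for` function are assumptions made up for this example, not Raygun’s actual configuration or API.

```python
from collections import defaultdict

# Environment names are assumptions based on the description above.
ENVIRONMENTS = ("dev", "office", "beta", "production")


def channel_for(environment: str) -> str:
    """Pick a chat channel: production goes one place, the rest another."""
    if environment == "production":
        return "#errors"
    return "#non-production-errors"


class ErrorStore:
    """Keeps each environment's reports in its own bucket."""

    def __init__(self) -> None:
        self.buckets: dict[str, list[dict]] = defaultdict(list)

    def record(self, environment: str, message: str) -> str:
        """Store the report under its environment; return the channel to notify."""
        if environment not in ENVIRONMENTS:
            raise ValueError(f"unknown environment: {environment}")
        self.buckets[environment].append({"message": message})
        return channel_for(environment)


store = ErrorStore()
print(store.record("beta", "TimeoutError in notification worker"))
print(store.record("production", "KeyError: 'cart'"))
```

Keeping the buckets separate means a noisy test run never pollutes production numbers, while the shared non-production channel still lets the team spot a problem on its way to prod.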

Did you say that it runs even on people’s development environment?

Yeah, they’ll typically have it integrated in dev as well. We’ll send that off. The thing is, it might be slightly unique to us, given the scale we now operate at. To give the audience, I guess, some idea, we process between about 100 million and 500 million API calls an hour through the Raygun platform today.

And so in order to sort of manage and facilitate that scale, there’s a lot of moving parts.

Usually when you’d be building something on your machine, for example, if an error occurs, it pops up with the error message and you can deal with it right there and then. But if you’ve got, say, five or six processes all working together, it can get a little bit murky.

That seems like a lot of data.

Well, we do have some pretty big customers.

One of our customers, as an example, uses RUM to track the end user experience. And their peak number of users that we’ve tracked across their apps at one time was 88 million concurrent users. Wow. And so, yeah, the platform certainly can operate at terrific scale. Some of the largest services and brands that are out there use this in a pretty big way. There’s a few logos on the site. Some of those bigger ones we’re obviously under NDA not to talk about. But, yeah, there’s a terrific amount of data that goes through to help these companies improve their offerings.

Now, how many people work at Raygun? How big of a company are you?

There are about 40 of us at the moment.

Yeah. We’re working towards probably being around 100 at the end of next year.

Wow. When did you guys start?

So we launched Raygun as a product in 2013. So coming up to, I guess that’s about five years now. Six years.

Yeah. And when we launched it, our company ran under a different name. And like I said, we built these other products for developers and they sold reasonably well. But we were only about five or six people at the time. What we’d done previously, and forgive me, this is a little bit outside of the testing realm, is that my business partner and I stepped out in 2007 to build our company, and we decided we wanted to build software products, but we also wanted to bootstrap the business. And so our first contract in the first week was a quarter-of-a-million-dollar deal with Microsoft, where we built some demoware for them on how to build modern, scalable web applications. And that was really our seed money. And then through the years, up until we built Raygun, we kind of kept working on these different products and putting them out there and having mixed levels of success.

But we also teamed up and helped build other businesses where we would take an equity stake in their company and kind of be the engineering team, if you will. So we built, like, New Zealand’s largest philanthropic website that’s done something like $100 million in donations in New Zealand, which is not bad for a country of only four and a half million people.

And we’re part of that. And that was sold. We helped build a business valuation company. We built an email mining service, kind of like the ones that help you better understand what’s in your own inbox, and various things like that. When we built Raygun, though, that was kind of a turning point for us in saying, okay, rather than doing these split deals with other orgs, we’re going to go all in on our own stuff. And then the Raygun product actually got so popular so quickly for us that people started questioning why they were even seeing the old company name, which was Mindscape, on the credit card bills, because they had no idea who Mindscape was. And so we changed the company name to Raygun.

Following on from that was when we added the real user monitoring and APM tools to sort of help complete that visibility into everything about how your users are experiencing your software.

So that’s kind of the history of how we got from a couple of nerds going and starting a company to where we are now.

Did you expect this? Are you surprised by the growth of the company and the projects?

On the one hand, I’m pleased about it. On the other hand, I don’t think I will ever be satisfied with the growth. I always want to have more customers, more of everything. The interesting thing for me is people always kind of assume when you’re a business owner that you must be motivated by the money, right? Like, you’ve got to be money-obsessed to be building a business. And it’s like, you know what, the money is one thing. For sure, you wouldn’t necessarily take the risk without feeling like there’s the potential for an upside. But the reality is, day to day, what really gets me excited is seeing a new customer come on, seeing how our systems handle those data volumes, watching that, seeing them improve their software. The cool thing is I know exactly which apps, for example, on my phone and my desktop will continually get better because they’re using products like ours. And so I like that side. And then lastly, the other thing I really enjoy, as we add more people to the team, is just kind of going, you know what? It’s really neat to have built something that is helping pay these people and that pays for their families and mortgages and all of that sort of stuff. All of those softer things are far more exciting to me and fulfilling on a day-to-day basis than thinking about, hey, what’s the actual enterprise value that we’ve got here?

Yeah, that’s cool. I don’t know if you mentioned it on the podcast yet, but you do like to code still?

Yeah.

Do you get to?

Well, I love to code. So I started coding when I was nine years old using QBasic on a 486 SX-25 with 8 MB of RAM.

And I always say that to people as well. They go, you must have been a smart kid. I’m like, no, that is literally how easy it is to start coding. If a nine-year-old can teach themselves, then it’s not that hard. You don’t need to think of software developers as being some sort of Mensa-member geniuses.

But I was really into Lego as a kid, and especially Lego Technic. And so to me, discovering coding was kind of like discovering a box that had an unlimited number of Lego pieces. I could stop hounding my parents, I could build whatever I wanted, and if I couldn’t build it, it was because I was too dumb and I had to think harder about a problem to try and solve it. And that sort of psychology has stuck with me till now. So I do not code day to day on the Raygun platform these days. I do occasionally do a little bit of coding in the weekend. An example, just this weekend, was that I pulled down the code from our notification worker, which handles the dispatching of notifications to various endpoints like Slack and email and SQS and all that stuff. And I just kind of went through and looked at the Raygun data on a couple of errors that we’d seen out of that thing that looked kind of questionable. I fixed a couple of those up, I made a couple of code tidy-ups in there and got that out. And then outside of the more work-related coding, I’ve been trying to learn a lot more about machine learning and algorithms. I feel like that is the first, call it, major change that’s come to the software development world since I stopped being a day-to-day coder. And I don’t want to be that guy that goes, well, everything after I stopped doing it day to day, I just didn’t learn. I want to be able to engage with the team who are working on our ML code and things like that and be able to have a high-caliber, high-bandwidth conversation with them. So I do invest a bit of time in learning stuff that way.

Yeah, that’s cool. Well, it’s just so fun.

Absolutely.

Even in the machine learning and data science spaces, there’s places for people to expand what they’re doing and learn new things, even if they only have like 20 minutes in the evening to work on it. You do still enjoy coding. Is there anything in particular about your role at work that is the most fun still?

Well, I certainly enjoy my job.

I would also say, though, that the role of CEO is not one for everybody.

It is challenging.

I think one of the things is that you certainly have to make trade-offs in coding, but often the trade-offs that you’re making in coding don’t involve maybe hurting someone’s feelings or making somebody feel bad.

Certainly with a lot of decisions that you’re making in the CEO role, you’ve really got to think about the consequences, because they can potentially ruin somebody’s day, week or whatever. So there are elements in there that I don’t particularly enjoy from time to time. But what I do really appreciate is that the team know that I’m a huge nerd, and so I do, call it walking the floor, if you will, management by walking around, a couple of times a day. And I just love seeing the things that folks are building.

In my mind, software is about amplifying human ability. You can sit there and write some code and achieve so much more because of the ability to leverage the computer. Right. And business is actually the same. How do we bring people together to achieve more than one person can do? And all of us have that sort of Iron Man fantasy that one man, or woman, can just do it all. But the fact of the matter is you can’t. You’ve got to bring these things together. And so that side of things is what gets me out of bed: seeing the cool stuff that the team can do at scale, some of the amazing things that are in there.

A good example was, I mentioned that customer with all of those concurrent users. Well, it wasn’t like the old version of our code just magically handled that. We had to do a whole lot of work to figure out how we were going to scale to handle that from a single customer. And I love seeing the innovation and invention that goes into solving those sorts of problems. So I kind of get to experience the wins without necessarily always writing the code, which is slightly less satisfying, but it’s still pretty satisfying. The bit that kind of concerns me these days is we bring some people on and they might be there for a few weeks and they’ll maybe sheepishly ask me, so, how technical are you? They’re not used to a CEO who actually knows how to code and can have high-fidelity conversations about how to do some of these things.

That’s cool, though. I mean, it’s pretty neat.

I think of it as a superpower. One of the things is, we’re building out a business intelligence team at Raygun at the moment, to just help ensure that we make better decisions based on the business data that we have. And one of their tasks is to go through our admin site and slowly get rid of all the reports that were coded up by JD on, like, a Saturday night with a glass of whiskey, that tell me the data I needed, but where I didn’t bother with axis labels or weird stuff like that.

So we managed to get quite far on that sort of style of things.

But yeah, those days are now behind us.

I’ve been a manager for a handful of years now, and my boss told me once that he knew that I was improving when, during our reviews, I stopped telling him all the cool things I was doing and started telling him all the cool things my team was doing. And he said, I think you got it now.

I would agree with that.

Absolutely.

If you’re ever in Portland I’m going to have to have you go whiskey bar hopping with me.

Well, you know, there is a non-zero chance that I will actually be in Portland later this year.

As I think we mentioned before we started recording, we have an office in Seattle. I’ve been pretty tied to New Zealand this year because my wife and I had our first child, and I wanted to be supportive by not running away for weeks on end to the other office.

But we’re actually running a couple of events for tech leaders both in Seattle and Portland and I may be there for that as well. So if I am, I’ll certainly reach out.

That’d be great. Cool.

Well I have had a lot of fun talking to you about all this stuff and so thanks a lot for coming on the show.

I’ve had an absolute blast, Brian. I really appreciate it. Thank you.

Thank you, JD, for helping me understand the power and usefulness of tools like crash reporting, performance monitoring and real user monitoring. Seriously, I learned a lot in this episode. Thank you to Patreon supporters for continuing to support the show. Join them by going to testandcode.com/support. And yes, this episode of Test and Code is sponsored by Raygun. It takes just a few minutes to get started. They provide a small code snippet for you to drop in your code, and from then on Raygun has your back. Take control of your app monitoring with Raygun. Check them out at raygun.com. That’s raygun.com. That link is also in our show notes at testandcode.com/88. That’s all for now. Now go ahead and test something, or maybe make sure your users are getting the best experience they can by adding some crash reports.
