Pipelines are used a lot in software projects to automated much of the work around build, test, deployment and more. Thomas Eckert talks with me about pipelines, specifically Azure Pipelines. Some of the history, and how we can use pipelines for modern Python projects.


Transcript for episode 96 of the Test & Code Podcast

This transcript starts as an auto generated transcript.
PRs welcome if you want to help fix any errors.


00:00:00 Pipelines are used a lot in software projects to automate much of the work around build, test, deployment and more. Thomas Eckert talks with me about Pipelines, specifically Azure Pipelines, some of the history and how we can use pipelines in modern in Python projects. Thank you Patreon supporters for your continued support of the show. And thank you PyCharm for sponsoring this episode. Listen to their spot later in the show.

00:00:37 Welcome to Test and Code, the podcast about software development, software testing, and Python.

00:00:45 I am excited today on Test and Code to have Thomas Eckert on and we’re going to talk about Azure Pipelines. Thomas, did we meet before last PyCon or do we meet at PyCon?

00:00:56 We met at PyCon. I remember I came up to you at lunch because when I went to PyCon, I went with Microsoft and I was working the Expo booth. And then one of my breaks from the Expo booth, I caught you at lunch and just sat down and we talked about podcasting and pipelines.

00:01:18 It was just around when Azure Pipelines announced a bunch of free pipeline offerings so that open source projects could use Azure Pipelines to deploy.

00:01:30 Okay, I’ve tried it out since and then forgot most about it. But Azure Pipelines are becoming more and more important in the world. But before we get into that too much, all we know about you so far is your name and that you were at the Microsoft booth. But tell us who you are.

00:01:45 My name is Thomas Eckert. I am a software engineer in the site Reliability Engineering.org at Microsoft working on Azure. So my job centers around making Azure more reliable. And that can mean a lot of different things. In particular, my role is centered around building tools for monitoring so that when something goes down, we know exactly what has gone down and who’s affected.

00:02:16 So around all of Azure.

00:02:18 It is a horizontal role. So we build tools to help all the services on Azure. I think there’s 1400 services offered.

00:02:28 Yeah, there’s a lot.

00:02:29 Yeah, exactly. In particular, my work is helping teams to set objectives for what their reliability should be and helping to monitor whether or not they’re in that objective, whether or not they’re meeting that objective. Okay, I can talk about my background too. So I came into tech from physics. I studied nuclear physics and I worked on Monte Carlo simulations of nuclear interactions.

00:02:54 Wow. And was this fun?

00:02:56 It was very interesting. It raised a lot of interesting problems that I got to solve. And so in particular, my work was around predicting the likelihood of nuclear interactions. And that was all in C plus plus very memory intensive or CPU intensive work producing these gigantic data sets. And I enjoyed problem solving and C, but it was actually learning Python to extract the data and present that data where I really got excited about programming itself.

00:03:33 Yeah. So much more fun in Python. Well, it’s a good thing you did most of your work in CPlus plus, or else you would have had your entire research paper done in like a month in Python.

00:03:42 That’s right. I know.

00:03:43 How do you go from a Masters in physics to working at Microsoft?

00:03:47 I finished up my Masters in December of 2017 and I felt like I wanted to go and really dig into programming that that was the thing that I enjoyed most out of my physics work, and in general, I just enjoy solving problems. So physics is super satisfying too. But I got the opportunity to move to Seattle and I started looking for work here and a lot of the work in Seattle is around programming. And I started off by going to the coffee shop across from Tableau, which is a data visualization company based in Seattle and asking people who are wearing Tableau swag what their work was like. And as I made connections through that, I eventually got an interview at Microsoft and nice site Reliability Engineering organization just a little over a year ago.

00:04:45 Brilliant career advice. If you want a job, hang out with people that have jobs, right? Yeah, it’s good idea.

00:04:54 I like it.

00:04:56 A key to PyCharm for sponsoring this episode. The pipe test support alone is enough to try Pie Charm. It’s the best in the industry, and the testing support is in both the Pyramid community and PyCharm Pro editions. If that’s the case, why am I running PyCharm Pro? Well, it’s because there’s so much more that saves me time. Included. Wicked awesome database support. I use both Postgres and Bongo databases. High Term has supported Postgres for as long as I’ve been using It and many other databases too. And as of December release with 2019 three, they also support MongoDB. That’s awesome. I can explore Postgres tables and experiment with queries right in my editor, and now I can explore MongoDB collections. Also, database support is one of my favorite Pro features, even during months that I only use it a couple of times. It’s so worth it.

00:05:45 Version.

00:05:46 2019 three also added Type DICT and literal type annotations. Two type hint features added with Python three, eight. So if you want to try those out, PyCharm now helps you with that too. Very cool. Go to testandcode.com/pycharm and use the discount code there to try PyCharm free for four months.

00:06:07 So Pipelines, you’re working on site reliability for all of Azure, but you know about Pipelines, right?

00:06:13 Yeah.

00:06:14 So Pipelines are an interesting area of growth in the industry. Do we want to talk about the history of where pipelines have come from and how code has been deployed historically, or do you want to focus specifically on Azure Pipelines?

00:06:28 We can talk about Pipelines as a whole because in the Python world we’re kind of abusing Pipelines, I think, to some extent. So let’s talk about Pipelines as a whole. Why do people use them stuff like that.

00:06:40 If you look at the history of how does code get from the computer of the developer to the end user? Traditionally, roles were split between developer and operations specialists, and some places they still are. This set up a kind of slight antagonism between the two because their goals are not aligned. So this is okay if you are shipping something like Microsoft Word once a year or you’re doing patches every few months. But if you go far enough back that your code was shipped on a CD, what would happen is developers would write code and they would get to a gold master and they would write out instructions for operations specialists to get that compiled code to a CD or compile it correctly. There’s also a testing layer in there, but all of these roles were very manual.

00:07:43 Deploying to servers being a task that you’d write down. This is my code. I want it to exist on this server in this building, and here are the instructions for how you will need to compile and deploy it. As we’ve moved into an economy of code, of software being delivered continuously, of services being constantly updated and improved, the slight antagonism between the operations and the developers grows. And that antagonism is based around the developers. Their incentive is to create new code to add features, to change things and operations. Their goal is to make sure that everything is running in a stable state. And the worst thing you can do to mess up a stable state is to add code to change things.

00:08:45 Okay, so this all sounds like big company sort of stuff though, right?

00:08:48 Yeah, you’re right.

00:08:49 I mean, that’s fine. But there’s a lot of people that even small teams that I was working for a large company and we were all like, I mean, we were compiling, burning disks and mailing them to people, things like that. This is for small customer bases. You can’t just mail all of your customers for stuff, but the continuous thing is more towards web based delivery, right?

00:09:15 Yeah, exactly.

00:09:16 Now, even for non web based deliveries, people that are delivering, even if you do have to ship something, but probably not. There’s often installers, even if it’s a quarterly basis. We’re using pipelines for all sorts of stuff now because this whole man just getting rid of the manual process is what’s really important, right?

00:09:37 Yeah. And pipelines are all about automating the code getting from your developer to the end user and every step that is in between those two people. So that includes compiling. If you are in a language that requires compilation, you alluded to Python cheating the pipelines. Because Python doesn’t really get compiled. It gets compiled to bytecode.

00:10:08 Yeah, but we don’t have to do it before shipping.

00:10:10 You have to do it before shipping. So pipelines include testing. It includes just moving the bits from either a GitHub repository or, sorry, a Git repository to your server, your cloud, your service that is running that code.

00:10:32 Yeah. The testing part is the one I like.

00:10:33 Exactly. And one of the things I got to work on in the Sprints at Python was helping the Click project within Pallets build out an Azure pipeline for testing the Click library itself using Azure pipelines and across several different versions of Python.

00:10:57 Yeah. Let’s talk about Azure pipelines itself, but also backing up a little bit. We were talking about how traditional software was built, but two years ago, if we were talking about this for Python, the answer would be just use Travis. And that’s changed a bit because Travis is still there. But there are other continuous integration stuff. So our pipelines and continuous integration the same thing.

00:11:21 There might be some debate about that. I think that pipelines are a part of continuous integration. Continuous integration, as I understand it, is a broader concept that involves even the way that you think about developing your code, that you are always keeping your, let’s say your master branches always in a state that it could be shipped. And that goes back to agile development policies or agile development practices.

00:11:52 Okay. But the tools I use are get some code from somewhere. I do some stuff. If that works, I continue on and do some more stuff. If that works, I continue on and do some more stuff.

00:12:03 Yeah, it’s very iterative.

00:12:04 Yeah. But also iterative, but also you can have stages. So a pipeline process is going to be not all Travis, CIS or any CI tools. A lot of people don’t use it as like a staged thing. That’s something where pipelines are really. That’s why we call them pipelines is because we’re taking some output from one stage and processing it more in the next stage. Right.

00:12:31 Yeah.

00:12:32 I’ve got some projects on Azure, I’ve got some projects on Travis still. And then I’ve got I don’t know what else. I’ve tried a whole bunch of things.

00:12:40 Github, actions.

00:12:42 I haven’t tried Actions yet. One of the things I’m doing is I’m just grabbing my code out of if I do a commit, the pipeline software or the CI software notices that I did a commit, grabs my code, creates a wheel out of it or something, and then runs some tests against that, and then does the same thing on like a whole bunch of versions of Python. Maybe. Or maybe I have it matrixed out. So I’m also testing against multiple versions of some library to making sure I’m compatible with another library or something. If I do that just on my desktop, I’m going to use Talks.

00:13:21 Right.

00:13:21 Can Azure pipelines read Talks files? Or they can.

00:13:25 That is actually how I set up Testing Click in different versions of Python. And I was able to run these Bash commands and say, hey, go into the repository and find the Talks file and execute that Talks file. You can just run Unix commands on do all the testing.

00:13:47 If it’s working on my desktop, I’m hoping that it’ll work on the environment, too. But one of the things the cloud environment gives me is that I can run on different operating systems as well.

00:13:59 Right.

00:13:59 I have to look at exactly how because David Lord took the Click pipeline work that I originally did and he implemented it in full. But at least when I originally wrote it, it was testing across Windows and Linux.

00:14:14 And you can use those hosts and make sure that your code doesn’t run differently on these different hosts if you are dealing with path in some way. That was tricky. Those kind of issues which showed up in Python less and less now with the newer Path module in Python three, those kind of issues used to show up more often.

00:14:43 Yeah. Plus Windows has gotten better too, of like dealing with path directories in the same whatever direction you put the slash in, it works fine.

00:14:50 Exactly.

00:14:50 But also you can. So I was looking at the Click Azure set up, the pipeline set up, and it even tests on Mac. So as your pipeline tests on Mac also.

00:14:59 Yeah, there are macOS hosts that you can use.

00:15:03 So how does that work?

00:15:05 Are there really like a bunch of machines sitting around that are like that, or are they virtual machines?

00:15:12 I don’t know the specifics, actually. I would imagine that there are Macintosh computers sitting in a data center somewhere.

00:15:22 Okay, well, that’s pretty cool.

00:15:23 Yeah.

00:15:24 Azure pipelines are more than you could do. More than you can do in like a traditional like Travis or something, right?

00:15:31 I think, yeah, it depends on what you’re talking about. That’s my understanding. But I only use Travis for one or two projects and they don’t have a huge background in Travis. But I really came to start using pipelines with Azure Pipelines because they were free and really easy.

00:15:45 Yeah. The only thing that I have trouble with with Azure Pipelines is finding it again, it’s in the middle of all the DevOps.

00:15:53 I think it’s in DevOps or something.

00:15:55 Dev Azure.com.

00:15:57 My feedback for the team would be like for us Python people, having a simpler way to get into it than trying to navigate all of the corporate stuff would be cool. I use it for testing. But tell me more.

00:16:09 You’Re talking about using Azure pipelines for testing and that you can test across different hosts and different versions of Python. You can matrix across those. The thing I think that makes Azure Pipelines really powerful and pipelines in general is that you can apply conditionals of hey, if these tests don’t pass, don’t deploy this code to say Pip.

00:16:34 And if they do pass, if all of these tests pass, then go ahead and do the deployment and you can set that to happen. When you merge your code into master that will kick off the pipeline. It will run tests on every version of Python that you support across all your hosts, and then you can connect it to Pipi and the code. The module can have a wheel built and placed right up in the repository right in Pip.

00:17:08 Okay, so what happens if it fails? Does the merge fail or can I set it up to make the merge fail?

00:17:13 Typically there are two different types of pipelines that really are doing the same thing, but are called in different circumstances. So there are pipelines that are called during pull requests and pipelines that are called after a merge has completed. So what you can do if you maintain a Python repository or a module that you can have somebody who makes a Pull request. When that occurs, it will kick off an Azure pipeline just running the test and code will get feedback in the Pull request itself saying whether or not the tests passed.

00:17:58 And my understanding is that you can put blockers to say that the Pull request will not be accepted unless the tests all pass that you put the onus on the person making the Pull request. So then you have the pipeline kick off when the merge is completed. So by this point, if you have your continuous integration, continuous development Get system set up after the Poll request which has been validated, when all these tests ran has completed and the code is merged into a development or a master branch, the pipeline is kicked off again, but with a flag that says this is the master branch. When all the tests pass, push that code to Pip.

00:18:51 I want to do that.

00:18:52 Yeah. So if you look at the latest project that I’m working on at work, when we are developing, we each make our own branch off of the development branch. And when we make a pull request into the Dev branch, a pipeline is kicked off that just runs all the test and code it’s TypeScript transpiling into JavaScript. It’ll first do a build to make sure that that all works correctly. Then once that pull request is completed and the code comes from the feature branch into Dev, the pipeline is kicked off and it does actually the same exact work that it did during the pool request where it built and ran all the tests except when it completes, it will push the code to our development environment and we will go and that’s our preproduction environment. Then when we feel satisfied with that work, we’ll make a Pull request into Master and that same pipeline will get kicked off with the release being directly into production. Okay, we aren’t touching that or clicking any manual release. It’s just the pull request is successful, the code is merged and it is on its way to production. The thing that we do keep is there’s always the last version before this deployment that is still kept live in another slot. So if something goes wrong we can hot swap back to the previous deployment.

00:20:32 Why would you do that?

00:20:33 Let’s say that you didn’t write enough tests or you didn’t test the right thing. And there is a bug that is found in production and say it’s found in version one, two. Okay, well, luckily we keep version one, one live in another slot and we don’t have to go through the whole deployment pipeline again. We can just say, oh, one dot, two had a bug. Let’s flip to one one.

00:21:07 We’re just going to go back a version and we will make the fixes we need to in order to release one, three to replace it.

00:21:17 Okay. And you should write more tests.

00:21:19 And you should write more tests.

00:21:20 Yeah. Once your users are finding bugs, you always need to be writing more tests. Okay, yeah, I had a question.

00:21:28 But I think I have the answer for it already.

00:21:30 Having this all trigger off of something is good, but if I’m just working, I’m not actually changing code when I’m just trying to develop a pipeline and stuff.

00:21:40 Do I have to keep committing stupid things just in comments and stuff to kick off the build? Or can I manually go in and just say no, just run it again?

00:21:50 Oh, you can manually go in and trigger. Yeah.

00:21:52 If people are interested in Azure pipelines, how do they get started? Where do they go?

00:21:57 Do you know if they have their code up on GitHub, they can go to their repository and go to the marketplace?

00:22:11 Let me think here. Search for apps and actions and then Azure Pipelines, they can add that Azure Pipelines directly from GitHub. Directly from GitHub. That’s right.

00:22:22 That’s pretty cool.

00:22:23 I think it takes you over to Azure DevOps where you can edit your YAML so you weren’t just having to write YAML blind, which can be frustrating. It gives you snippets that you can add into the YAML documents so that it will make the whole process easier.

00:22:42 Okay, so I don’t have to write this whole thing by scratch.

00:22:44 If you look around, there’s lots of repositories of samples. Or if you can find another Python package that is using Azure pipelines, you can go and find their YAML and use that as a starting point. And iterate from that right.

00:22:58 Like the pallets click. For instance, if I go through the tool thing and create a YAML document for my repo, I just don’t like it. Can I just edit it directly?

00:23:11 Yeah, once it’s in your code, it’s always reading from the YAML document in your repository. So you can go and pull it in and open it up and vim locally and then push it to your if I’m looking at another repo.

00:23:26 The file is going to be Azure pipelines YML.

00:23:31 That’s cool. And then how about you? If I want to find out what you’re up to, how do I find out more about you?

00:23:37 You can find me on Twitter at Thomas Eckert with an underscore at the end. I couldn’t get Thomas Eckert as a handle so I just added an underscore at the end and you can also look for me on my website, which is Thomas Eckert Dev.

00:23:53 Okay.

00:23:54 I think that’ll take you to anywhere you might want to find me.

00:23:58 Okay. We will harass the other Eckert to ditch.

00:24:03 Do you know him?

00:24:04 I don’t know who he is. No.

00:24:05 Okay. Yeah, I didn’t know there were any other Brian Hawkins in the world until the internet came along and found out that there are a whole bunch.

00:24:13 I learned an interesting story yesterday about how visual studio code Got the handle on Twitter for at code, which is kind of a hard one to get just the word code, but I guess it was used. It was already occupied, but the person using it hadn’t tweeted it in years so they just had to go and there’s some process by which if somebody is inactive on Twitter, you can lay claim to their handle.

00:24:41 Okay, well, it looks like the original Thomas Eckert is suspended So you might have a chance of getting it.

00:24:47 Send Twitter and email I will send an update for your podcast.

00:24:52 Well, thanks a lot for coming on. This was a lot of fun.

00:24:54 Thank you.

00:24:55 Are you going to be at pikon again this year?

00:24:57 Do you know that’s my hope that I’m planning on it. I definitely plan on being at Pi Cascades in Portland, which is in March.

00:25:06 Yeah, I’m going to be there too.

00:25:08 Yeah, I heard you are going to be there because I’m speaking. That’s right.

00:25:11 Yeah. So that’ll be fine.

00:25:12 I’m really looking forward to it. So I’ll see you there.

00:25:15 Sweet. All right, well, and if anybody else is thinking of coming to Pike Cascades, you need to get tickets because they sell out every year. They sell out.

00:25:22 Oh, yeah.

00:25:22 So thanks a lot for coming on the show and we’ll talk to you later.

00:25:26 Thank you.

00:25:29 Thank you, Thomas. I really enjoyed the conversation.

00:25:31 Thank you.

00:25:31 To Patreon supporters. You keep me motivated. If you want to become a Patreon supporter or sponsor an episode, Head to testandcode.com support. Thank you, PyCharm for sponsoring this episode. Check out the database support by going to testing and trying out pro for four months. That link is also in our show. Notes at testandcode.com 96 that’s all for now. Now go out and test something.