Dima Spivak, Director of Engineering at StreamSets, explains how they use pytest and Docker containers to test their application through a REST API and some customizations.

Transcript for episode 58 of the Test & Code Podcast

This transcript starts as an auto generated transcript.
PRs welcome if you want to help fix any errors.

Welcome to Test and Code, a podcast about software development and software testing.


Let’s say you’ve got a web application that you need to test. It has a REST API that you want to use for testing. So that’s awesome. Can you use Python for this testing, even if the application is written in some other language? Well, of course. Can you use pytest?


What else would you use? What if you wanted to test that in an environment that is inside of a Docker instance, and you want the testing apparatus to get your Docker instances spun up and running and then torn down afterwards, and do all that for you? How would you use pytest to do that? Well, exactly how, I’m not sure, but I know somebody who does know how to do that, and that’s Dima Spivak. Dima is the Director of Engineering at StreamSets, and he and his team are doing things like this. It’s pretty awesome what they’re doing there. So let’s hear how they’re doing it in this episode. Thank you to DigitalOcean for sponsoring this episode. Check them out at testandcode.com/digitalocean.

All right. I should have asked you this ahead of time, but today on Test and Code we’ve got Dima Spivak, and I probably mispronounced your name. Is that right? Yeah, that’s right, Spivak.


That’s cool.

I like that.

I’m kind of excited about everything we’re going to talk about today because I think I’ll learn a lot. Some people might not know who you are, so can you introduce yourself?

Sure. I’m Dima Spivak. I’m a director of engineering at a company called StreamSets, which is based out of San Francisco. Our one-liner is that we basically work on what we call DataOps, which is kind of bringing DevOps best practices and principles to the data integration space. So we build the platform based on that.

Okay. And what do you do there?

So I’m a director of engineering, so I kind of oversee two teams. I oversee our technical writing team, which is one of the most important teams that any company can ever have. And then I also oversee a team that’s called engineering productivity, because at StreamSets, we don’t have a traditional kind of QA organization. Our developers are the ones that are actually in charge of writing all the testing. But what our engineering productivity team does is they’re responsible for creating the frameworks and the automation and kind of the glue that lets developers quickly iterate and write tests. So that’s kind of how we came into using pytest: our team was tasked with creating a framework that developers could quickly use to write their own testing.

I love that notion of naming a team Engineering Productivity.

Yeah. One of the things we discovered with it was that ultimately the goal wasn’t just testing because we also found that there were other things we could do to enable engineers, like we could build dashboards and things that are not traditional testing things. So we kind of broadened the scope pretty early on.

When you came into it, was that already called Engineering Productivity?

No. Originally I was one of the first employees that was on this team, and we originally were just called the Quality Team because the idea was, okay, you guys are tasked with making sure quality is good. And then as we kind of started doing stuff, we realized, well, one, it doesn’t really scale. If you’ve got 15 developers and then two or three people in charge of quality, you’ll have an imbalance there. So then pretty soon we realized, okay, well, what we’re doing is we’re kind of enabling engineers and the developers on that side to be productive. So the name was volunteered, actually, by one of the people on our team. So it wasn’t even me. Someone else said, hey, how about Engineering Productivity? I go, that’s a great name. We’re going to use that.

I love it. And I like the focus on technical writing, too, because I know that it’s probably different people, but that’s also really important.


So you’re doing some interesting things with Python and testing.

Yeah, sure. So basically one of the things we do is our product is actually it’s kind of weird that we got into testing with Python because our product is actually written in Java. It’s Java. And then our front end for the product is Angular. So we have kind of like a web application that works. It’s kind of weird that we started using Python because we don’t use Python for much of anything else at the company.


But it’s easy to pick up. Or have you found that? So the people on your team have had to learn Python while they’re learning this?

Absolutely. Yeah. So the people directly on the Engineering Productivity Team were both kind of back-end Java developers when they joined, and then pretty quickly they became kind of the resident Pythonistas at StreamSets.

Okay. But that’s fun. Are they having fun with Python now?

They really like it. We find that when we have to use other languages, we have a couple of projects where we actually use TypeScript, and when they have to get away from Python, we always kind of find ourselves missing it again.

Yeah. So your product is Java, but you’re testing with Python. What’s the interface you’re using to talk to your product?

So as I said, because it’s kind of a web application, it has a really mature REST API. And so essentially what we do with pytest and with Python is the kind of workflows that a user would normally do through a UI. Everything they do through a UI obviously kicks off certain things in the back end and the REST API. And so my team eventually created kind of an SDK for Python that interacts with our product, which we actually then ended up selling. So it started off as an internal project of like, hey, how can we wrap our REST calls? But then it turns out that customers actually wanted something like that for their own use to kind of programmatically interact with our products. And so we took that which we originally created for our test framework. We released it publicly, and then from there, we basically created this interface to let developers pretty much quickly write tests that simulate workflows that they would do in the UI.
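The idea of wrapping REST calls in a small Python layer can be sketched roughly as follows. This is only an illustration, not the StreamSets SDK: the class name `DataCollectorClient`, the endpoint paths, and the injectable `send` hook are all hypothetical and exist just to show the shape such a wrapper might take.

```python
import json
import urllib.request


def http_send(method, url, payload=None):
    """Send one JSON request and return the decoded JSON response body.

    Assumes the server always answers with a JSON document.
    """
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    request = urllib.request.Request(
        url,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


class DataCollectorClient:
    """Hypothetical thin wrapper that turns UI workflows into method calls."""

    def __init__(self, server_url, send=http_send):
        self.server_url = server_url.rstrip("/")
        self._send = send  # injectable so tests can avoid real network calls

    def start_pipeline(self, pipeline_id):
        # Hypothetical endpoint path, for illustration only.
        return self._send("POST", f"{self.server_url}/pipelines/{pipeline_id}/start")

    def pipeline_status(self, pipeline_id):
        # Hypothetical endpoint path and response shape, for illustration only.
        body = self._send("GET", f"{self.server_url}/pipelines/{pipeline_id}/status")
        return body["status"]
```

A test would then read as a sequence of calls such as `client.start_pipeline("demo")` followed by an assertion on `client.pipeline_status("demo")`, which is close to what a user does by clicking through the UI.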

Okay, so you’re testing entirely through the API then, the REST API.

Yeah. For the most part, most of it’s the REST API. And then for the assertions, like when we talk to external systems, we use existing Python libraries. So we’re heavy users of, like, SQLAlchemy and the Boto libraries from Amazon and things like that.
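Checking results by querying the external system directly, rather than asking the product whether it succeeded, might look something like the sketch below. It uses the standard-library `sqlite3` module as a stand-in for a real destination database, and `run_pipeline` is a hypothetical placeholder for whatever actually moves the data.

```python
import sqlite3


def run_pipeline(records, destination):
    """Hypothetical stand-in for the product: copy records into the destination."""
    destination.executemany("INSERT INTO users (id, name) VALUES (?, ?)", records)
    destination.commit()


def test_records_arrive_in_destination():
    # An in-memory database plays the role of the external destination system.
    destination = sqlite3.connect(":memory:")
    destination.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    expected = [(1, "ada"), (2, "grace")]

    run_pipeline(expected, destination)

    # The assertion queries the destination itself, not the product under test.
    rows = destination.execute("SELECT id, name FROM users ORDER BY id").fetchall()
    assert rows == expected
    destination.close()
```

With a real database, the same pattern would apply with a client library such as SQLAlchemy in place of `sqlite3`.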


Are you testing your live site or having things put up on a test server or how does that work?

Yes. So we make heavy use of Docker. So that’s kind of one of the other things that we were really excited about when we started building the framework, because our main application, we have a number of applications at StreamSets, but the one that was really the most prominent when I started a couple of years ago at StreamSets was something we call StreamSets Data Collector, which is basically an application that lets you move data from pretty much any origin to any destination you can come up with. And it has a nice kind of UI, so you don’t have to write a bunch of code to do this. And that application itself, even before I started at StreamSets, one of the ways we distributed it was as a Docker container. So one of the main things we did with our test framework was we put in a bunch of hooks and infrastructure so that with pytest, when you say pytest and then you want to pass in a test, we actually have kind of command line options where you can say pytest, and then you can say SDC version, and whatever is passed in would actually, as part of our fixtures to start everything up, spin up an instance of our Data Collector, which then the tests would specifically point to when they’re running from there on. So it’s kind of like a self-contained solution, which is kind of nice.

That’s cool. Yeah. You spin up your test server just from the fixtures.

Yeah. And on top of that, in pytest, when you have a fixture that yields something, after it’s done you can actually clean up after it. So we even have a flag that can kind of stop that from happening. But in general, what our framework lets developers do is, on their Mac, if they’re running Docker for Mac, they could basically say pytest and then pass in the name of a test they want to run, and they can say SDC version 3.6.0, and then if they do nothing else, the test would start up an instance of our product. It would run the test and report results, and then at the end it would actually tear down the Docker containers so that they don’t have all these kind of runaway containers on their laptops.
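A rough sketch of how a command line option and a yield fixture can combine to start and remove a container is shown below, as it might appear in a `conftest.py`. The option spelling `--sdc-version`, the image repository name, and the helper names are assumptions for illustration; the actual framework’s internals are not described in the episode.

```python
import subprocess

import pytest

IMAGE_REPO = "streamsets/datacollector"  # assumed image name, for illustration


def pytest_addoption(parser):
    # Registers the option so `pytest --sdc-version 3.6.0 ...` is accepted.
    parser.addoption("--sdc-version", action="store", default=None,
                     help="product version to start in Docker for this run")


def start_container(image):
    """Start a detached container and return the ID that `docker run` prints."""
    result = subprocess.run(["docker", "run", "--detach", image],
                            check=True, capture_output=True, text=True)
    return result.stdout.strip()


def stop_container(container_id):
    """Force-remove the container so nothing is left running afterwards."""
    subprocess.run(["docker", "rm", "--force", container_id], check=True)


def container_lifecycle(image, start=start_container, stop=stop_container):
    """Generator holding the setup/teardown logic, kept separate for testing."""
    container_id = start(image)
    try:
        yield container_id
    finally:
        stop(container_id)  # runs after the tests using the fixture finish


@pytest.fixture(scope="session")
def sdc_container(request):
    version = request.config.getoption("--sdc-version")
    if version is None:
        pytest.skip("pass --sdc-version to start a container for these tests")
    # Code before the yield is setup; code after it is teardown.
    yield from container_lifecycle(f"{IMAGE_REPO}:{version}")
```

Any test that lists `sdc_container` as an argument would get a running container ID, and the container would be removed once the session ends.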


Okay, I think it’s fun that you brought up that notion of passing in a flag to not tear it down. Is this so that you can debug things?

Exactly, yeah. So we’ve had a few instances, and usually we try to learn from them and then change the framework to actually pick up the cases. But we’ve had a few times where a test is failing, but the output isn’t telling us much of anything because it turned out that we were logging some kind of failure, but we didn’t realize that it was logging a failure. And so when that happens, you can pass in something we call keep SDC instances. And then what will happen is when the test is done running, it’ll just skip that last step at the end of the fixture teardown so that a developer can then log in and go, okay, I clicked around the logs and I discovered that this weird exception was being thrown that I hadn’t thought of.
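A keep-alive flag of this kind could be wired into a fixture along the following lines. This is a minimal sketch: the option spelling `--keep-sdc-instances` is an assumption based on the name mentioned above, and `start_instance` and `stop_instance` are hypothetical placeholders for whatever really starts and stops the product.

```python
import pytest


def pytest_addoption(parser):
    parser.addoption("--keep-sdc-instances", action="store_true", default=False,
                     help="leave instances running after tests, for debugging")


def start_instance():
    """Placeholder: a real framework would start a container here."""
    return {"name": "sdc-under-test", "running": True}


def stop_instance(instance):
    """Placeholder: a real framework would remove the container here."""
    instance["running"] = False


def managed_instance(start, stop, keep):
    """Yield a started instance; tear it down afterwards unless `keep` is set."""
    instance = start()
    try:
        yield instance
    finally:
        if not keep:
            stop(instance)


@pytest.fixture
def sdc_instance(request):
    keep = request.config.getoption("--keep-sdc-instances")
    yield from managed_instance(start_instance, stop_instance, keep)
```

Running with the flag leaves the instance up after a failure so its logs can be inspected; running without it keeps the normal cleanup.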

Yeah, cool.

I think I should add this keep thing. So I’m seeing a lot of corollaries. At work, I’m not testing REST APIs. I’m testing physical hardware test instruments. But we do a similar thing. I’m not spinning up the hardware from the fixture; however, we have fixtures that will create a connection to the instrument and put it into a preset state, and then all the tests can use that connection. But on the teardown, there’s a lot of tendency for people to want to clean up what they’re doing, like if they’ve turned on a power level or something or connected to some device, to tear that connection down in a fixture. But even if we have, like, -x, stop on first failure, that will run the fixture code and tear things down, and then it’s not in a debuggable state. So I think having an extra flag to say, hey, go ahead and keep some of these debug states around, that’s a cool idea. I like that.

Yeah, we added it early after discovering that I am very fallible and I didn’t think about all the ways developers would find clever methods to break their products. Okay.

Thank you to DigitalOcean for sponsoring this episode. DigitalOcean is the preferred cloud platform of hundreds of thousands of innovative companies. DigitalOcean makes it easy to deploy, manage and scale applications with an intuitive control panel and API designed for developers. Get started with a free $100 credit towards your first project on DigitalOcean and experience everything the platform has to offer, such as cloud firewalls, real-time monitoring and alerts, global data centers, object storage, and the best support anywhere. Join over 1500 businesses already creating amazing things on DigitalOcean. Claim your credit today at testandcode.com/digitalocean.

Neat. pytest is spinning up Docker stuff too. Do you have these Docker images preset or set up ahead of time?

We run Jenkins at StreamSets, and we basically have automation where every night, whenever we do kind of a full master build of our product, it actually triggers a separate job that my team manages that pretty much creates Docker images for our core product, and then all the kind of libraries that plug into our product. Because our product talks to all these different sorts of systems and environments, we actually separate all those out from the core of the product, because if we didn’t do that, we would have this, like, five gigabyte product that we had to distribute. So instead we have what are called stage libraries, where stage libraries could correspond to specific versions of the systems they talk to. For example, if I want to talk to an Apache Kafka cluster, it’s a specific version of the Apache Kafka cluster library. And so every single night we actually build images of, again, the core product and all the stage libraries as separate Docker images. And then as part of our test framework, because these images are built and then pushed to Docker Hub, the test can figure out when it’s running, like, okay, I need the core version of 3.6.0 nightly, and then I also need these three stage libraries, and it pulls all those Docker images down and then kind of composes them into a single coherent product with all this stuff kind of set up for itself.
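Working out which images a test run needs, given a core version and a list of stage libraries, could be sketched like this. The repository names and the tag format below are invented for illustration; the episode does not say how the real images are named or tagged.

```python
def required_images(core_version, stage_libraries,
                    core_repo="example/datacollector",
                    libs_repo="example/datacollector-libs"):
    """Return the core image plus one image per stage library.

    Repository names and the `<library>-<version>` tag format are hypothetical.
    """
    images = [f"{core_repo}:{core_version}"]
    images.extend(f"{libs_repo}:{library}-{core_version}"
                  for library in stage_libraries)
    return images


def pull_commands(images):
    """Build the `docker pull` command line for each required image."""
    return [["docker", "pull", image] for image in images]
```

Each command list could then be handed to `subprocess.run` before the containers are composed into a single running product.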

Okay, so when you said you’re building test framework on top of pytest, are you meaning fixtures or are you meaning something else?

Part of it is fixtures, but part of it is even just kind of the way we package things together. Because we call our test framework, it’s really not an original name, the StreamSets Test Framework. We use pytest as our test runner, but then we also handle, as I said, all this stuff about orchestrating Docker containers to spin up our product, and also the fact that we package a bunch of libraries so that as part of our assertions, we’re not depending on our product to tell us that the data got in correctly. What we’re doing is, after we run a data pipeline that tries to move some data around, we then use these Python libraries that would then directly query our destination. And what we discovered was that if we kind of package all that into a single new project, so we have pytest and all those libraries and we put it all under this Docker container kind of infrastructure, it makes it really easy to run because then literally any developer can run one command on their laptop, which is like freshly installed, and all of this is kind of ready for them. So when we say test framework, we mean the pytest part, which is kind of the runner, and then all this other infrastructure with fixtures and the libraries and the packaging and all that kind of stuff.

Yeah. Okay. I think it’s cool that as a side project of this, or actually just kind of a bonus, you ended up building an SDK for StreamSets, or at least for the Data Collector part.

Yes. It was kind of a funny story because the way this all came about was a couple of months after I started, we kind of wanted to prove our worth to the whole company. So we had this kind of all-engineering meeting where we had people from the field and actually one of our earliest employees. When we were showing off like, oh, here’s our test framework, and here’s how we can programmatically build pipelines, he actually stopped the presentation and goes, customers would pay for that. And we go, really? He goes, yeah, we have customers all the time asking, like, how can they do what you just did through code? And we go, oh, well, okay, maybe we should rethink this. And so that part we actually broke out from the test framework, and now it is a standalone kind of Python project that’s up on PyPI.

That’s really cool. I like that. But it makes your job easier also. I mean, it makes the testing job easier because you need it to hook all your tests together.

Exactly. So the SDK has all these ways that, from code, you can interact with our products. And the SDK itself now became a dependency of the test framework, so we broke out the functionality that lets you talk to our product in Python into a separate project. And so a developer who adds some new feature, if they want the ability to test that feature, they first have to expand the SDK to support that feature because otherwise they can’t write tests for it. So we kind of had this clever way of saying, because we want the Python SDK to be really fully featured, if you want to have testing for your new feature, you first have to improve the SDK for Python.

Okay, got it. Yeah, that makes sense. I had it backwards in my head. Customers or users of the SDK don’t necessarily have to grab pytest and all that.

That’s on top; the dependencies go the other direction, right?

Exactly, yes, got it.


Do you have any cool, fun stories or complete failures or anything with all of this process?

In terms of failures, the kind of stuff we’ve seen, which has been kind of the craziness of it, is that when we were starting out, one of the things we were kind of hitting a lot was that, as I said, the way our product is kind of packaged is we have this core part of the product, and then we have all these stage libraries that are kind of plugins to let the product interact with different environments. And I would say one of the challenges we had to overcome was that we discovered pretty soon that sometimes developers like to start multiple builds of their product at once. And so we had one instance where, the way our infrastructure works, we basically build all these Docker images for the core and then the stage libraries. But it turned out that developers had accidentally done like one major build of master, and then they checked in a bunch of new code and then they did a second major build. And because of this kind of lag between building the images and running them, we discovered the next day that we had some tests that were failing in a really weird way. And I kind of looked into it and realized that there was this race condition that happened where, because we’re kind of composing all of this into one master framework with all these containers, we had some containers that had been updated with the newest stuff and some that just hadn’t been updated yet. We were like, hey, we can’t reproduce this error you’re seeing locally. And we realized nobody can ever reproduce this. It’s only if you happen to be running like a split repository with some things updated and some things not. So if you’re doing what we’re doing, which is wanting to compose your product through Docker, you have to really be mindful of the way you tag your images because otherwise you can run into those kinds of race conditions.

How do you get around that? Do you have Docker composing based on, like, a label or timestamp or something?

Exactly, yeah. So we basically put the commit IDs for the master build into that. And so if it discovers that, hey, my stage library is out of sync with my core, depending on what flags you pass when you run our test framework, you could either have it fail fast and say you’re doing something wrong, or you could say, okay, well, I’ll run, but here’s a huge warning that this might give you weird behavior.
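The fail-fast-or-warn check described here can be sketched as a small function that compares commit IDs. How the commit ID is attached to each image and read back is not specified in the episode, so this sketch simply takes the IDs as input; the function and exception names are hypothetical.

```python
import warnings


class ImageMismatchError(RuntimeError):
    """Raised when stage library images were built from a different commit."""


def check_images_in_sync(core_commit, library_commits, fail_fast=True):
    """Compare each stage library's commit ID with the core image's commit ID.

    Returns the sorted names of out-of-sync libraries (empty if all match).
    With fail_fast=True a mismatch raises instead of returning.
    """
    stale = sorted(name for name, commit in library_commits.items()
                   if commit != core_commit)
    if not stale:
        return []
    message = (f"stage libraries built from a different commit than core "
               f"({core_commit}): {', '.join(stale)}")
    if fail_fast:
        raise ImageMismatchError(message)
    # Otherwise carry on, but make the risk of odd behavior very visible.
    warnings.warn(message, RuntimeWarning)
    return stale
```

A fixture could call this before composing the containers, with `fail_fast` driven by a command line flag.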

Okay, if somebody’s listening to this and going, hey, this is kind of cool. You’ve been doing this for a while. How long has this project been going on?

So this has been going on for about two years now.

Okay, so you can’t jump from zero to where you’re at right away, but what are some of the first steps? I’m guessing having some sort of Python API to whatever you want to test. Or I guess you could just use the Rest API, but wrapping it in a layer is a nice idea. What else do people have to do?

The way we sort of started was there’s this notion of kind of test-driven development where you write tests first and then you kind of write the things that are needed to support the tests. And I’m a pretty big believer in it, even for kind of end-to-end testing. So in our case, when we started, we basically said, okay, what do we want a test to look like? So even before we had any framework in place, we sort of said, okay, what would I want it to look like if I were a developer writing a test? And so in our case, we said, okay, well, I want to be able to programmatically define a pipeline. So here’s what the code would look like. And we wrote four or five lines of code that was kind of fake code that might look like it. And then we said, okay, well, how would we start the product? And we decided, oh, well, the fixture can do that, so we don’t have to start it in our code. And so I think in general, if someone wants to start from scratch and build something like this, I think it works best when you think in terms of the tests first and you kind of optimize for the quality of the tests and the readability of the tests, and then you can always go back and build your abstractions around that. But if you go the other way around, you’ll sometimes find that you design something that no developer finds intuitive. Like a developer looks at it and goes, what the heck is going on here? This is clearly not written with a developer workflow in mind. And ultimately, even though the SDK is customer-facing and we do have some big customers that really like it, first and foremost our customer for the engineering productivity team is the developer who works at StreamSets. And so we sort of have always had that mindset of wanting to make it as intuitive for a developer as possible.
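As an illustration of this test-first approach, the sketch below writes the desired test before anything that supports it, then adds the smallest stand-in that makes it run. `PipelineBuilder`, `Pipeline`, and the stage names are hypothetical and are not the StreamSets SDK’s actual API.

```python
from dataclasses import dataclass


def test_pipeline_connects_origin_to_destination():
    # Step 1: written first, as the test a developer would want to read.
    builder = PipelineBuilder()
    origin = builder.add_stage("data_generator")
    destination = builder.add_stage("trash")
    builder.connect(origin, destination)
    pipeline = builder.build()

    assert pipeline.stage_names == ["data_generator", "trash"]
    assert pipeline.links == [("data_generator", "trash")]


# Step 2: written afterwards, the minimal abstractions the test asked for.
@dataclass
class Pipeline:
    stage_names: list
    links: list


class PipelineBuilder:
    def __init__(self):
        self._stages = []
        self._links = []

    def add_stage(self, name):
        self._stages.append(name)
        return name

    def connect(self, source, target):
        self._links.append((source, target))

    def build(self):
        return Pipeline(list(self._stages), list(self._links))
```

Starting the product does not appear in the test body at all, which matches the decision to leave that to a fixture.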

I like the idea of starting with something like, I want to test this thing. What do I want the test to look like? And then how do I make the rest of the infrastructure around it happen? You could probably have this be a gradual thing too. You could have it be something you could build up over time while you’re doing other types of testing until you can test more things. Like if you got a new feature put in to an existing project or product, you could maybe test just even just that new feature with this sort of a model.

Yeah, absolutely. And in fact, when we started, one of the first things we actually did was we would have our product, like this Data Collector, running on its own, on its own machine. And so we didn’t have to build all the Docker stuff initially. Since it’s a REST API, we just kind of have to pass in the server URL. And so that’s exactly what we did when we first started. We didn’t build any of the Docker stuff. We started with like, okay, how can I interact with that preexisting instance of our product? And then as time went on, we realized, hey, we could really simplify developers’ lives, instead of requiring that they start up something first and then run the test, if the tests themselves can have a flag to do that for them.

So now at this point, are all of the developers writing all the testing, then?

Yes. So pretty much across the board, with the exception of if we ever have to make a change to the SDK, we’ll sometimes write like a really basic smoke test to make sure that we wrapped it correctly. But other than that, when developers add new stages to our product, or if they expand existing features or add configurations, they’re the ones responsible for either adding new tests or updating the old tests so that they don’t break with their new features in place.

That’s cool.

Yeah, I like that.

And how about learning? Like if you bring somebody else on, is your engineering productivity team helping to train developers on how to use the tools and how to write pytest code?

Yeah, absolutely. They do a few things. On some occasions, they’ll do kind of like presentations. So if we have a new developer who’s really into stuff, who really wants to kind of have a deep insight into it, they’ll actually do a presentation where they walk them through and sort of show them, okay, here is a simple test that talks to a database, and here’s how we wrote it, and here’s how it works. We also have a kind of a Slack channel for our team, specifically on our Slack organization in our company. So people can go in there at all times and ask all kinds of questions.

And then the last part about it is that we are very big on code review. So we often will encourage developers that, hey, if you’re new, go ahead and write a test for some trivial thing that maybe we just don’t have coverage for, but you want to try it out. And then within the code review, we’ll go ahead and give comments to say, oh, we adhere to PEP 8, or we actually prefer that you don’t do this, but that you start this using a fixture instead of having it coded as a function within the body of the module or something like that.


So we have a few of these kind of avenues to onboard people.

How much do you utilize code reviews of test code?

Very heavily. So we actually have kind of this internal standard at StreamSets where when a developer adds a new test, it has to go through code review. We use Gerrit for code review, and we actually have this kind of informal policy, which maybe we’re going to formalize, but basically from engineering productivity we can only kind of give a plus one, so we can say, okay, this looks fine as a test, but we actually demand that another developer who’s more familiar with the functionality give it the final plus two before it gets pushed in. So that we kind of have one person reviewing more on style and on the framework usage, and then a separate developer focusing more on, okay, as a test, does this make sense in terms of the coverage it’s adding, or is this not a very useful test?

I like the idea of having somebody that knows the test framework and somebody that knows the code under test being part of the review. That’s cool.

It saved us on a few occasions where it turns out that a test looked really good, and then the developer chimed in and goes, this isn’t actually testing anything.

Yeah, I like it. It’s a nice idea. Now, as far as Slack goes, I have to give you a shout out and thank you. You are also somebody that jumps into the Slack channel for this podcast and helps out there quite a bit.

It’s an awesome channel. You’ve got a really vibrant community of people that are really testing nerds. So you’ve done a good service for the Python community.

It’s one of those incredible things: people answer questions that I have no idea what the answer is, but there are some bright people in there, so that’s good. As a shout-out to your product under test, the StreamSets stuff, let’s give a plug for it. What kind of a person would use this?

So users of StreamSets are basically anybody that wants to manage data or move data or get insights into data as it’s going through their systems. So we’ve got some customers that are huge enterprises, and we have some customers that are really small startups that just don’t want to have to hire a Java developer to hard-code a bunch of solutions to move data around. And so all of those are kind of users of StreamSets.

Okay, it’s a Java-based thing, but I’m guessing that it doesn’t have to be a Java application to use StreamSets.

Yeah. So in fact, one of the main reasons StreamSets exists is because you don’t really have to write code at all to use it. So it has a really functional and useful front end and a web UI that’s been beautifully designed so that if you know that you want to move data from, let’s say, an Amazon S3 bucket into like an Oracle database, you don’t have to write any code to do that. You can actually pretty much drag and drop stages onto a canvas and then connect them together. And in between, you can do all kinds of processing steps in the middle to sanitize your data or make sure that you’re not accidentally having Social Security numbers getting pushed into a database in plain text or anything like that. But basically you don’t actually have to write any code to use it. And as I said, you can run it in Docker. We also have kind of a cloud-based solution so that if a user really just wants to go ahead and get started and not have to become an expert on data operations, they can pretty much create an account with us and log in and start creating pipelines within five minutes.

That’s cool. With your testing and stuff, you’ve written a lot of custom fixtures and custom packages and whatever. Are there any, like, off-the-shelf plugins that you’re using for pytest?

That’s a really good question. So we’re definitely using a few. I’m going to pull it up, actually, as I answer this, because I’m trying to remember the names of a couple of the other ones. So one of the ones we actually do use is for internal purposes. We do some performance testing where we create some of these pipelines and then run them and kind of get a sense of the throughput for certain things. And so one of the things we’ve definitely used a lot of is pytest-benchmark, which is really cool because it adds some nice fixtures to let you check the runtime of your code and things like that. I think we’ve also had some instances of using some of the functionality that makes the output from pytest easier to digest with Jenkins. So we use the JUnit XML functionality from pytest, because then when we run our tests, we can have Jenkins parse those results and actually show them in the normal kind of UI without us having to scrape from the pytest output. But then most of the rest of it, I think, is pretty much our fixtures internally. I think there might be one or two more, but I don’t have it here in front of me.

Okay. I always forget about the benchmark. It’s so often that you want to try to time some code or time a section of test, and I always forget about that. It’d be the perfect use case for that.


Anyway, well, this has been a lot of fun having you on. Is there anything that we forgot to cover that you wanted to make sure to get in?

No, not really. If anyone’s interested in kind of knowing more about sort of the design decisions that went into our test framework, we do have a blog post up, so if you go to streamsets.com/blog, one of the posts there from a few weeks ago is called Introducing the StreamSets Test Framework. And so if someone listens to this and goes, hey, I want to know how to install that, or I want to see what weird things they did with Docker, you can see that blog post and it kind of walks you through example tests and points to our GitHub that has all of our open source tests that are out there.

You have this stuff up on open source.

Oh yeah, all of our testing for Data Collector is open source because Data Collector itself is an Apache-licensed open source product as well. So if you want kind of examples of how testing can live out in the open, you can check out github.com/streamsets and go to the Data Collector tests repository.

Yeah, check out the show notes for this episode because I’ll definitely link to all this stuff. That’s very exciting. Thanks a lot. This is fun. Thank you, and maybe I’ll grab you again some other time.

Sounds great.

Thanks to DigitalOcean for sponsoring. Grab your $100 credit at testandcode.com/digitalocean. That link is also in the show notes at testandcode.com/58. Thanks to Dima for all that great info and for taking the time to talk with us. Thanks to Patreon supporters for your continued support of the show. And thank you for listening, for sharing the show with friends and colleagues, for pitching in with the cost of the show through Patreon, and for giving the show a rating on iTunes. It really helps other people find it. And also thank you for checking out the DigitalOcean offering through the link in the show notes. And last but not least, thank you to Marco for the audio help. That’s all for now. Now go out and test something.