MongoDB is possibly the most recognizable NoSQL document database. Mark Smith, a developer advocate for MongoDB, answers my many questions about MongoDB. We cover some basics, but also discuss some advanced features that I never knew about before this conversation.

Transcript for episode 142 of the Test & Code Podcast

This transcript starts as an auto generated transcript.
PRs welcome if you want to help fix any errors.

00:00:00 Mongodb is possibly the most recognizable no sequel document database. Mark Smith, a developer advocate for MongoDB, answers my many questions about it. We cover some basics, but also discuss some of advanced features that I never knew about before this conversation.

00:00:30 Welcome to Testing Code.

00:00:40 I assume you go by Mark.

00:00:42 I do.

00:00:43 Okay.

00:00:44 Answer to anything. As you know, I have that weird username online, Mark Smith.

00:00:50 At least you have an easy last name.

00:00:55 Do you want to describe why you have Judy two K is your Twitter handle?

00:00:59 I never tell anybody that story.

00:01:03 Okay.

00:01:03 It’s mainly because it’s boring, right? It’s not actually a very interesting story. And so people get really wound up about it. And then eventually I go, okay, I’ll tell you the story, then I tell it to them and they go, yeah, okay.

00:01:16 Much better to leave it as a mystery.

00:01:18 That’s fine. You’re Mark Smith. If people don’t know who you are, what more should they know?

00:01:24 I’m a developer advocate for MongoDB.

00:01:28 I’ve been there about a year, and before that, I’d say I’m kind of part of the Python community in general, given talks at quite a few Python conferences over the years. So if you kind of see me around, that’s probably what I’m doing.

00:01:42 Yeah. So that’s how we met. We ran into each other at a Python.

00:01:48 Yeah.

00:01:48 I can’t remember if you brought your horns to PyCon or not. Do you usually bring your horns?

00:01:53 I did. I do. Now, Pikon US is the only conference that I take the horns to.

00:01:58 They don’t pack down very small.

00:02:00 Right. But people that only know you from Twitter could recognize you right away with the horns on.

00:02:06 Yeah.

00:02:07 Is that why kind of indirectly. I used to have an illustration of a Viking helmet. I was kind of anonymous on Twitter. And then when I started doing developer advocacy, my boss, who was kind of big in the developer advocacy community, was like, you have to have a photo of yourself on Twitter. People have to be able to recognize your face at conferences. It’s a service to the people that you’re talking to to give them some way to link up your Twitter profile.

00:02:31 And I resisted it for ages, kept this illustration on my Twitter profile, and eventually I just thought, you know, I’m going to put a photo up, but I’m going to keep the Viking helmet. So I bought a cheap plastic Viking helmet off Amazon, did the business post looking over the shoulder with a Viking helmet, and then set up as my profile. And I got a smirk emoji as a response from my boss at that point.

00:02:56 It’s good. It’s a good hat, though. Your large banner pick.

00:03:00 It looks like you’re wearing a kilt on stage.

00:03:03 Yes, indeed. I’m based in Edinburgh.

00:03:07 I was brought up in the south of England, hence my extremely bland English accent. But yeah, I moved to Scotland about 15 years ago. You’d struggle to understand. My daughter, she speaks with a broad Scottish accent.

00:03:19 Okay. I used to work with a team in Edinburgh long time ago when I was working with HP.

00:03:26 All right. Yeah, we had a big HP office just outside Edinburgh.

00:03:30 Yeah. Cool. You’re a developer advocate for Mongo. Am I offending you if I drop off the DB when I say Mongo?

00:03:37 No, internally people tend to sort of switch between the two, so I tend to stick with MongoDB generally, but that’s not. I don’t know why, because I actually find it quite difficult to say.

00:03:47 Yeah, you can’t really pronounce it. It’s like Mongodba. I mean, did you use MongoDB before you became a developer advocate?

00:03:56 No, not really. Yeah, I’ve used a bunch of different databases, both SQL and no SQL, over the years.

00:04:03 I actually took a job once just so I could use a whole suite of no SQL software out there because I thought it was a big thing at the time, but there was no.

00:04:13 I mean, this was about twelve years ago, I think, and there was no guarantee that any of it was actually going to survive. And I just thought this is really exciting to get to play with some of these new technologies. So I took a job I’d actually been contracting for a while and planned to take a three month sabbatical. And then one day after quitting my current contract, I then got a call for this job that I then couldn’t turn down and ended up working for a company called Skyscanner, which does flight meta search. They’re much bigger in Europe and Asia than in America, but yeah, they were the first unicorn that I worked for, so they got pretty big over the time that I worked for them.

00:04:48 Let’s just start with the basics. What is MongoDB?

00:04:53 So MongoDB is a database.

00:04:57 It’s no SQL database. I think it’s about twelve years old.

00:05:02 It’s big premise is that it’s a document database. So instead of storing data in tables and rows and not being a key value store, it stores these kind of hierarchical documents. So if you’re used to storing kind of JSON files, they’re sort of analogous to that.

00:05:20 I don’t know whether we’ll cover that in any more detail, but I suspect it will come up at some point that it’s not actually a JSON database. We even refer to it internally quite often as a JSON database. But yeah, you’ve got this idea that values are not just scalar values like strings and integers and boolean values. You can also store arrays and more documents within your kind of top level documents. So it’s quite a rich structure.

00:05:44 I started as a C Plus Plus developer. I do both C Plus Plus and Python now. I never do databases with C Plus Plus. I always use any of my database applications are all Python based and by document we don’t mean like Word docs or something no.

00:05:58 In Python it’s addict.

00:05:59 Right? So that’s what I think of. I don’t think of it as a JSON database. I think of it as a dictionary database. Like I can just store dictionaries there.

00:06:07 I think this was one of the big revelations that I found when I started working with MongoDB is that it actually, although it’s not huge in the Python world, it’s much bigger in the JavaScript world for the same reasons as it’s big in the JavaScript community.

00:06:20 It’s a perfect fit for Python because all those kind of native types that you have in Python that you’re using all the time from Ditch and lists and all the other types that you’re using that come with core Python. They’re all supported out of the box in MongoDB, so it’s actually just a really fluid way to work with data.

00:06:43 Thank you PyCharm for sponsoring this episode. I first tried PyCharm when they started supporting pytest many years ago. Their support for pytest is now amazing. I was a longtime Vim user, so next I needed to test the idea Vim plug in, so all of my finger muscle memory still worked while editing check. It works great. There’s lots of reasons to love by PyCharm, but for me it is because they have the absolute best user interface for test automation. Then I learned many more ways by Charm can save me time. Like really great support for editing Markdown HTML, CSS, JavaScript remote connections to database, and amazing version control support. Really. It’s the best Get diff tool I’ve ever used. And now version 2023 is out and the Shift shift, the Find anything key sequence even lets you search get commit messages. What even that is so awesome. Tons of other cool features have been added in 2023. Check it out and I hope you enjoy it at testing pie Charm.

00:07:47 Well, before we get too much further, you said something that tripped me up. So I also think of a dictionary as a key value look up. So you said that MongoDB is not a key value store, but I can store dictionary. So am I thinking about key value store? Maybe. I don’t know what a key value database is.

00:08:10 Certainly the way I think the easiest comparison is a key value store is one DICT. It’s like when you’re looking up a value, you have you use the key to look up the value that has that key. Okay.

00:08:21 Whereas in MongoDB you’ve got this idea of a collection which it’s just a bucket of documents. So if you were linking up the file system, it would be a directory of files.

00:08:37 Really? It’s lots of dictators stored it’s stored in a namespace.

00:08:42 Okay, so a bucket of dictionaries.

00:08:44 Yeah, exactly. And so when you’re looking for something, it’s probably a good idea that your collection has documents with the same structure in it, but there’s nothing enforcing that by default. So you can store movies and music albums in the same collection if you want to, but that’s probably a bad idea.

00:09:11 The way we think about it is that you should store data together that you query together. So if you’re going to show movies and music in the same list, then you should probably query it from a single collection.

00:09:23 But make sure that there’s a kind of shared subset for those documents that are in there. So maybe they should have a title, a runtime. I don’t know whatever it is that those two things have in common. So that’s your kind of subset of data that you’re going to query on when you list.

00:09:36 One of the things that drew me to document databases is that I don’t have to worry about sequel.

00:09:42 That seems obvious, but it’s also just there’s like this weird translation layer. I use document databases just directly. I’ve not been somebody to use like an ODM or an Orm in the middle, I think. Odms. Is that what they call them?

00:09:58 Well, that’s what we call them in the MongoDB world, because it’s a document Mapper rather than a relational map. Yeah.

00:10:03 Is that the normal, though? Do you have a sense for what normal is? Do people usually use ODM or are they using MongoDB directly?

00:10:12 In my experience, newer developers developers are newer to MongoDB or possibly coming from a different ecosystems. They may be coming from Django, and they’re just used to having this stuff done for them. They will immediately reach for an ODM, and there’s a couple of big ones out there.

00:10:28 I’ve just forgotten the main one that everybody uses, but we developed one internally called Pymom.

00:10:38 You define your class with all the attributes you expect to have on it, and then you can just create those and just kind of dump them into a database, but they’re not actually adding a huge amount of value, and they kind of obscure some of the operations that you want to do. So again, I sometimes forget to mention this MongoDB because it’s just so inherent to the way that you use it is you don’t have to update a whole document. It’s not like you just replaced a document with another document, although obviously you can do that. You can update parts of a document independently. So if you just want to increment a number within a document, there’s a command for that. And you just kind of say, look up this document and increment this value by this, increment that value by this, you can bundle up a whole bunch of operations that you want to conduct on a document to update it. And that’s all then conducted atomically.

00:11:30 Okay.

00:11:32 If you’ve got an ODM, unless it’s also essentially collating those operations that you’ve done locally, and then when you’re applying them to the database that actually applies those updates, then you’re not really getting the benefit. If it’s just going to replace the values in the database, you’re losing that advantage.

00:11:49 Okay, so I guess people are probably maybe using an ODM when they really just don’t want to think about the database at all.

00:11:57 Yeah, the way I would use it if I wanted that ability. And I do prefer working with objects than with raw dicts. I think that’s one of the mistakes that lab developers use is kind of working with Dicks more often than they should in Python.

00:12:11 If I wanted to work with classes, I would use something like Marshmallow to map between the documents coming back from the database and the objects that I want to work with. Because then you can apply obviously you can add the domain, the business operations to that class so that once the data is mapped into it, you can then conduct operations on that. That makes sense.

00:12:32 Okay, so Marshmallow is a good way to be mapping between dictionaries and objects then.

00:12:37 Yeah.

00:12:38 Okay, cool. I think I’ve looked at that before to check it out.

00:12:43 I wrote a competitor once years ago called Kylie for mapping JSON to objects, but it wasn’t as powerful as Marshmallow. I only found out about Marshmallow as soon as I told people. I built this amazing library called Kylie, which was good with JSON, which is an Australian soap joke that only people of a certain age will get. But yeah, everybody said, oh, the name is great, but there’s already a better tool for this out there. So, yeah, that was it.

00:13:14 It was the only bit of production code that I’ve ever written that had a meta class in it. I was so proud of myself. And then, yeah, it’s been used by one person, as far as I’m aware, other than myself.

00:13:26 That’s the way things go.

00:13:28 Okay, so there’s a simplicity side, but is there a size thing? One of the things I was concerned about is I might have just a little tiny, small application.

00:13:36 Would I use MongoDB for small application or is MongoDB mostly for large applications?

00:13:44 People do. So the reason it’s called MongoDB is it’s a contraction of Humongous. So it’s meant to be a Humongous database. It’s meant to deal with web scale data. That was the big thing back a decade ago, which we’re kind of mocked for these days.

00:14:00 But no, it scales down really well.

00:14:03 I have it running on the Raspberry Pi next to me. I have a whole bunch of little small home APIs that I run on top of Mogadi B just at home.

00:14:14 And there are definitely some disadvantages to running a single node. It is designed to be run in a clustered environment, and you get things like automatic failover. You can upgrade each of your nodes separately, you kind of take them out the cluster upgrade and put them back in the cluster so you can have zero downtime upgrades in a cluster.

00:14:35 And yeah, you obviously get the resiliency of just having your data on more than one machine.

00:14:40 But no, it runs fine on a small single node.

00:14:45 Okay, well, if I’m, like at work or something through it on a server footprint wise, similar as, like, say, MySQL or something like that, or any ideas?

00:14:57 I haven’t checked, but that’s what I have in my head.

00:15:00 It’s about that.

00:15:02 It’s not like you suddenly lose all your Ram when you run it. It’s a low resource service, unless, I guess, you’re storing huge amounts of data in. But again, that’s all tunable if you don’t want to store all of your indexes in Ram, if you can deal with the performance hit you get by storing your indexes on disk and Loading them up when you’re running queries and things. Then again, you can run it as a small background service if you want to.

00:15:26 Okay, now, indexes is something I never even think of. Do I need to think about indices or indexes?

00:15:34 Oh, yeah, yes. You’re another person who says indices. I sometimes say indices as well, and I think, oh, no, everybody says indexes these days, but yes, absolutely.

00:15:45 We’ve got various different types of indexes. If you’re doing lookups on a certain field, you’re going to want to index that field in a collection. So that’s the other thing you get with a collection is a sense of this applied.

00:15:58 These indexes go with a collection, basically.

00:16:01 Okay, but is an index something I build and then forget about? Or is it something that I use to do the lookup.

00:16:07 Oh, no, you can build it and then forget about it. It’s in the same way as you would with a relational database.

00:16:14 Well, I don’t know how to use those either.

00:16:15 Oh, fair enough. So, yeah, it’s just something you add to a collection. At some point when you realize that it’s running really slowly, you realize it’s doing a complete collection scan, and you go, okay, I’m looking stuff up by name and I’m sorting it by date. So I’m going to need an index on those two columns, and then after that you never need to refer to them again, automatically use the indexes that are relevant.

00:16:39 Okay.

00:16:41 One of the things you brought up is the downfall of running a single node.

00:16:45 Is it more of a downfall than running a single instance of my sequel or something?

00:16:55 No, as long as you’re taking backups. No, not really.

00:17:02 It’s a file backed up system also, right?

00:17:05 Yeah, absolutely.

00:17:06 It’s fully acid compliant. It stores stuff on disk.

00:17:11 In fact, the main thing, one of the main things that’s happened in the last decade is that the storage engine was rewritten.

00:17:21 Mongodb acquired a company called Wide Tiger with some computer scientists with some serious background behind them who had written this nonlocking concurrent database engine.

00:17:35 And so that was incorporated into MongoDB about five years ago, I think, and has meant that it works better on multi core systems than it did.

00:17:47 The early versions of MongoDB were very different product to the product that’s out today.

00:17:52 It’s been almost completely rewritten sort of over the last decade, which I think is something that people don’t necessarily think they know about. Mongodb is wrong because they’re just outdated. It’s not necessarily wrong. It’s just a different product.

00:18:09 Okay. I don’t have those misconceptions. I just think it’s neat. But I’ve only been using for the last few years.

00:18:15 So you’re actually a MongoDB user, are you?

00:18:17 Well, in a very single note, small sense.

00:18:23 I’ve actually been using the two.

00:18:27 I don’t know if there are other small dates for a small local application.

00:18:35 I’ve used MongoDB and also tiny DB.

00:18:38 That doesn’t support indexes, does it? I think that was the main limitation with it was just a bit. Yeah. It’s just dumping data on file in files, basically, isn’t it? Or is it one big file?

00:18:48 I think it’s one big file or small file, but like, if you really need a small database, the other option would have been SQL Lite. Yeah, that’s it.

00:18:57 Sql Lite is great for small databases, obviously. That’s why it’s everywhere.

00:19:02 Well, except for I still have to think about SQL.

00:19:04 Yes, indeed.

00:19:05 Not that that’s a bad thing. It’s just if I just want to store my stuff, sometimes I don’t want to think about it.

00:19:11 Yeah, totally.

00:19:12 I don’t love SQL. I have to admit, the relational model has some real advantages in terms of structuring data and doing queries with joints.

00:19:25 But I don’t think SQL is a great way of doing that. And it just amazes me that it’s still around kind of 50 years after it was invented or whatever, because I forget what it was called. But there was a competing system for competing technology built by the Ingress database that was much closer to what’s actually happening behind the scenes in terms of the way, kind of the relational algebra. I’m not sure I’ve got the right phrase for that, but that lost out to Oracle, who really backed Sequel in a big way.

00:19:58 And so we’ve been stuck with this language that looks like it’s human readable and writable, but it isn’t really. It just happens to use more words than you really want to use when you’re writing the queries.

00:20:08 You brought up relationships. You can have relationships between different collections within MongoDB. Is that correct?

00:20:15 Yes, absolutely.

00:20:21 Mongodb has this second query language. I don’t know whether you’ve come across it, but aggregation pipelines, have you no use those?

00:20:28 It’s a killer feature of MongoDB, and it’s big. It’s a big complex feature, and everybody misses it. Everybody learns the basic Crud operations.

00:20:38 But the aggregation pipelines is like a parallel query language.

00:20:42 The way that you build them up is you’ve got this kind of list of operations that you want to conduct on a collection.

00:20:48 So you might start with a filter, which is a match, actually, which filters the documents to a sort on it. And then you can add some fields based on calculations, so you can run calculations across multiple fields and sort of add that the value as a new field to each document running through this pipeline, and then whatever is left at the end of the pipeline gets returned to you as your resulting documents. And one of the many operations you can conduct as part of that pipeline is called a lookup, which does adjoin against another collection or possibly your own collection. If you’ve got kind of a barrier where you’ve got relationship to yourself in some way, and it will use indexes wherever possible. I think this is another misconception. I’ve seen this written kind of up and Reddit comments and things like that. It doesn’t use any of the indexes, but it does as much as possible. It will use all of the kind of power available to it to optimize the operations you’ve asked it to conduct, including behind the scenes. In some cases, it will reorder the operations to make them more efficient if they’ll end up with the same result at the end of it. Okay, there’s very clever people behind that feature as well.

00:22:07 Check that out.

00:22:10 I love them. Now, when I do a beginning talk about MongoDB, I show how Find works, and then I’m like, yeah, but this is kind of boring. Every database can look stuff up. Let’s move to aggregation pipelines because this shows where the real power is in MongoDB. So, yeah, you have to go and check that out.

00:22:24 Okay, I guess I have a few more questions.

00:22:28 Let’s see. One of the things that came up during the time where I was paying attention to MongoDB was the introduction of transactions, but I think you brought up that transactions really aren’t that necessary most of the time.

00:22:45 Can you expand on that? Why would they be necessary in the rest of the world, but not in MongoDB?

00:22:54 The fundamental reason for transactions is consistency.

00:22:57 I guess that’s the C in Acid, but when you’re updating multiple tables in a relational model, you need to do that within a transaction, because otherwise there’s a potential for, let’s say you’re adding tracks to an album.

00:23:13 To use the kind of classic relational example, you need to put a lock on the album, because otherwise there’s a chance that by the time you’ve added the ten tracks that belong to it, somebody’s just deleted the album because maybe it had a misspelled title or the wrong ID or something like that. And so you need transactions for essentially doing any update in your database that has any joins in it. If you care about consistency, because MongoDB can store that album as because it stores hierarchical data within a single document, and because updates to a single document are atomic, providing you can make those updates within a single document, you don’t need a transaction. Boy, it’s kind of the transactions implicit, I guess.

00:23:58 We added explicit transactions in 4.0, so actually two, maybe three years ago.

00:24:07 So if you do want to do updates across multiple collections, you can explicitly say, I’m starting a transaction now and I’m committing my transaction.

00:24:17 But it’s kind of about it’s a sign that you haven’t really modeled your data the right way.

00:24:23 The MongoDB way is to store data that goes together together within a single document, ideally.

00:24:31 Okay. Sometimes that’s just hard to think about. Absolutely.

00:24:35 And it’s just a heuristic. I’m not saying you’re a bad person if you haven’t done it well.

00:24:40 I mean, I instantly go to something like an inventory. It’s a situation where you got like all the elements that are in the back room and the front room or something like that.

00:24:51 If I want to subtract five from here and move them over there, they probably aren’t going to be in the same table or something or the same.

00:25:00 The same document, right? Totally.

00:25:05 So what you need then is a pattern called event sourcing, which I’ve never actually implemented myself, but I’m just aware that this is the solution to the problem. And fundamentally, this is the way banks move money around, right. Banks don’t actually do everything in transactions. What they do is they or in database transactions. What they do is model the actions themselves, the actual transactions. So you end up with a stream of operations. It’s like you’ll have an operation which is transfer this money from this account to this account, and then whatever application server they’re using will pick up that operation and say, okay, now I’m going to subtract the money from here. Now I’m going to put it in here, and you can kind of track the history by matching the events that were executed on it.

00:25:55 If you Google for event sourcing MongoDB, there’s a two paragraph page on it on the MongoDB website, which, if I’m being honest, it’s not hugely useful, except for as a pointer, that event sourcing is maybe something you should know about. But there’s a really good blog post that somebody’s written.

00:26:11 I think his name is Depal about how to implement this in MongoDB. And so he implements the sort of event collection, which is your stream of events, and then he implements a view on top of it. So the aggregation pipelines, you can create the sequence of operations. You can also say create me a view where essentially and apply this aggregation pipeline to this collection as a view. And then you can just do read only operations on that aggregated collection. And so he’s got this neat aggregation that actually creates your kind of view of each account, and it embeds the history in each one as a bunch of operations that’s been conducted on the event stream. It’s really smart.

00:26:54 That sounds great. I have to check that out, too. So many things to look at.

00:26:58 I just need to go and implement it myself now, so I can probably get my head around that to do this.

00:27:03 The last thing I guess I wanted to touch on was when would I reach for MongoDB? If I’m doing a little tiny command line application that just needs to store a K of data or something like that? Maybe not.

00:27:16 The official line is MongoDB is a general purpose database, and you can use it for any data storage needs that you have.

00:27:24 But no, I think obviously for a tiny application it’s not really an embedded database. So if you want to store data like use SQL, I would genuinely love an embedded subset of MongoDB for storing stuff on disk on Linux. In fact, now I think about it, we do have a product that does that. We have a product called Realm.

00:27:44 It’s a slightly different model to MongoDB itself, but you can store the data locally.

00:27:50 It runs on React native, and there’s a node version of it as well. So it would be difficult to use with Python, which is why it doesn’t immediately jump in my head. But it’s got a neat sync feature where if you want to sync the data to a central MongoDB server, it will handle all of that and conflict resolution that isn’t syncing to a centralized database or something like that, then MongoDB is probably the wrong choice. So I’d go with a plain file or some sort of key value database or SQL Light, but everything up from that. I think as soon as you’re sharing data with other users, I think it fits a huge number of use cases.

00:28:35 So even like a local thing. So if I’ve got a local web app or something like that, where I’ve got just a team of people accessing the same data, it might be a good thing for that.

00:28:46 Yeah, I think so.

00:28:50 One of the places where I’d really like to see it used more is data science. So you’ve already mentioned it just really easy to get your data in there, and then with aggregation pipelines, you can conduct a whole load of operations on it, and you’ve got that agility of being able to just kind of modify the data in place. You don’t have to design a complex relational schema upfront and then force your data into it. It’s just like get the data in, and then you can kind of clean it up later if you want to, or you can change the structure later or stick a few on top of it to sort of impose some order on it.

00:29:22 I think it fits that really well.

00:29:24 And as my application grows, do I have to care about migrations or something like that?

00:29:30 Yes, it’s a short answer, but you don’t have to migrate in one go because you can adapt to the data, because you can store different structured data in a collection if you want to, you can write programs that can deal with those different structures of data.

00:29:48 So you can choose to leave the old data in its old format. And never change it if you don’t want to. And as long as you’ve got some facade in your code that can accept the old schema and kind of manipulate it so your application can still deal with it, then you never need to change it. You can do kind of update on Read. So every time you read the data you just go, oh, it’s in the old format, I’ll write it back in a new format. I guess that would be an update on Write, really?

00:30:14 Or you can choose to do a big bang migration and just say, look, I just want my data to all be in the new format like ASAP and run that as either a background operation or just as an upfront operation in your app server. But I think the flexibility there is quite nice that you don’t have to do the end of the world migration because those are the things DBAs are terrified of. When I’ve worked in companies with big databases, it’s been like we need to change the schema, so we’re going to be prepared for some potential downtime. Hopefully this won’t run too long. We’ve run it on the staging servers and it only takes 2 hours.

00:30:52 We’re going to do this and to run it on different servers and then move everybody over.

00:30:57 It’s just data is hard. I think as somebody who sometimes has had the chance to write pure software where there’s no real data being stored at any scale, it’s like when you get back to data, it’s like, oh yeah, this is really hard. You can’t just stick this in a container and not worry, make it ephemeral.

00:31:14 We have to make sure that this never goes away and it’s always in the format that we expect and that there’s no dud values in there.

00:31:22 Well, so when you just start using MongoDB, you don’t have to have a schema.

00:31:28 Is there a notion of a schema in Manga that’s another underused feature of MongoDB is you can apply a JSON schema to a collection and use that to validate all data going in. So it doesn’t validate the data that’s already in the collection, but it does it again on right. Essentially it makes sure that it validates your data as you’re changing it.

00:31:48 Okay. And do those have like version numbers that you can say, I’m using version X of the schema or of the data?

00:31:55 Yes, I think so.

00:31:57 Jason schema is quite powerful.

00:32:00 I believe you can tell it that if a value is set to one, then you’re following this version. And if I could be wrong about that, that may be a limitation of it, but one of the things I really like about it is that you don’t have to validate the whole schema. So in the example I gave you before where you’re storing multiple data types, but they all have a name and a runtime and I guess an ID and a date that they were added, you can just validate that subset and say they must have this or they’re not a valid document and that’s quite nice. And then the rest can be as flexible as you need for your data model.

00:32:36 Okay. And then if you were going to go and migrate all of the data, is that migration something that you have to write yourself or just like read all the data and then write it back or something like that?

00:32:50 Yeah. Do it one record at a time or document at a time.

00:32:55 It’s tedious, but it’s no different to any other database.

00:33:00 I suppose there are tools out there that will do migrations for you. As far as I’m aware, we don’t have one of those.

00:33:05 Yeah, well, like, for instance, I think I thought that Django had their own migration scheme.

00:33:11 Yeah, Django’s migrations are amazing.

00:33:15 Andrew. Well, he wrote it twice, but Andrew did a really good job.

00:33:17 Well, cool.

00:33:18 Well, this was a lot of fun. Did I forget to ask anything that you really want to say about Mongo before we wrap it up?

00:33:27 So we talked about a lot about running MongoDB on your own device or server or whatever, and I didn’t get a chance to pitch the fact that we have this big MongoDB as a service offering called Atlas.

00:33:43 Full cluster of MongoDB just by pressing a button. Free tier is really good, lasts forever and it scales up and down automatically for you as you require so it can save you quite a bit of money Rather than sort of buying the biggest machines that you think you need in the future.

00:34:02 Yeah, so I just wanted to get that in there. It’s a really good way. If you just want to get started with MongoDB, it’s much easier than installing MongoDB on your own device. Cool.

00:34:10 Well, thanks so much for your time. And if people want to reach out to you, is there a place that they can reach you?

00:34:16 Twitter is usually the best place just on GD two K. I can’t say gdtk either. I’ve got a job for a company I can’t pronounce and a Twitter handle I can’t pronounce.

00:34:25 Well, we’ll put links in the show notes, of course, and get this out so everybody can go out and try MongoDB.

00:34:32 It’s been great chatting.

00:34:33 Thanks a lot.

00:34:38 Thank you, Mark. Great info about MongoDB. Thank you, PyCharm for sponsoring the show. Try pycharmyourself at PyCharm. Thank you, Patreon supporters join Them at support show notes for this episode 142. That’s all for now. Going out and test something.