Using classes in Python is a lot easier today thanks to attrs and dataclasses. Hynek joins the show today to discuss some history of attrs and dataclasses, and some differences.


Transcript for episode 189 of the Test & Code Podcast

This transcript starts as an auto generated transcript.
PRs welcome if you want to help fix any errors.


In Python before we had Data classes, we had attrs before attrs. Well, it wasn’t pretty. The story of attrs and Data classes is actually intertwined. They’ve built onto each other and in the middle of all of it. Hennock joins the show today to discuss some of the history of attrs and Data classes. If you’ve ever needed to create a custom class in Python, you should listen to this episode.

This episode is sponsored by Rollbar. Rollbar is the leading platform that enables developers to proactively, discover and resolve issues in their code, allowing them to work on continuous code improvements throughout the software development lifecycle. Rollbar has plans for all situations, from free to large enterprise. With Rollbar, developers deploy better software faster and can quickly recover from critical errors as they happen. Learn More at Rollbar.com welcome to Testing Code.

Welcome, Hynek, to Testing Code. I’m super happy to have you on the show.

Yeah, well, thank you for having me.

I wanted to talk about attrs and things have been crazy and you’ve been involved with attrs for a long time, but you started it, right?

Yes, I did.

It was like a second generation project after characteristic which nobody remembers anymore, which is okay.

I was just born out of my frustration how complicated it was to write properly behaving classes in Python because I needed them from another project of mine called Service Identity where I needed polymorphism to verify identities between certificates and host names and these things. And I find it really frustrating, especially because I was working with other languages at that time. It was much easier to have a properly working, let’s say Struck and Go or even in Haskell for that matter. So I tried to automate the boilerplate away and that’s how characteristic happened, which had a terrible API, which I am ashamed now for.

It’s also very long name, which is why the original address APIs were, let’s say, on a Thursday.

Yeah, I would say that weird, too cool. But it was shocking to me. But before we get into that, I want to mention that I came from the C world and I was surprised from starting using classes in Python how different it was. I mean, I’m used to the C compiler filling in a whole bunch of stuff automatically for you and a lot of that just doesn’t happen with the Python classes. Or maybe I just don’t understand.

What do you get for free with Python?

Yeah, basically nothing. That’s the thing, right. And the problem was not a secret because people were using this weird hacks using name tuples where they just create a name tuple and then subclass it to add behavior to it.

There’s this example in the Python documentation.

So people were always pining for some kind of nicer way to write classes.

There was just nothing. You just had to implement all those vendor methods or you forgot about them and then you got a weirdly Behaving class.

Yeah, I never thought about subclassing name tipples.

I would use them, and then they almost did everything I wanted. But I’d wanted like default values. So there’s like, you could overwrite the default values.

Just technique to do that. It’s weird, though. Yeah.

Just add a message to our name. Pepper, you have to subclass it.

Okay.

When you want behavior.

You don’t have.

A chance unless you.

Yeah, it gets weird very fast and it seems such a foundational thing. So this is why I’ve started this project.

Okay.

Like elevator pitch level. attrs does that stuff for you, right? So if you use attrs, you can create a class. So what do you do? Do you like do a class anyway and decorate it? How do you use attrs?

Yes, it’s a class decorator, which nowadays is a much more common site in Python than it was back in 2015 when I started it.

And it will depending on the way it works, because the current APIs are closer to what data classes do. But it used to be just you just defined fields in the body of the class by assigning them a certain value and others would collect them and then write the code for you.

Okay, this was all before type hits were a thing, too, right?

That was very much before type into a thing.

Sometimes people like, compared this 2015 APIs to something new.

It’s like, not fair, right? Because those new APIs are not possible.

They were not possible. And once they were possible, we did add them to Address, too. So, like, Address actually shipped those type annotations before data classes were out.

Okay, so let’s go back to some early attrs.

Was it picked up by a lot of people? Were people excited about it.

So there was a certain following that was from day one, people who are more into having proper class design and were turned off by David Omnipresent God objects or hacks around named apples.

And I think it was like in 2016 when Glyph wrote his article called The One Python Library Everybody Needs or something like that.

That’s a nice title.

Yeah. And after that it went pretty big. Yeah. So I should credit Glyph too, because it’s also his fault with APIs.

We both use characteristics. We both were frustrated by it, and we had like this unhinged chat which was very early in the morning and very late at night for him. We just tried to come up with shorter night APIs, and one of those ideas was to make the module name callable, which didn’t make it into end product. But if you think that editor s is crazy, you should have seen what we were experimenting with.

Yeah. So when I started trying out attrs, you decorated with ATTR s for attrs, and then you define your variables inside class. Variables are attributes, so they are ATB it’s clever.

And it’s kind of fun to type the first few times and then you’re like, that’s kind of weird. It’s like a little different than a lot of other stuff.

The assumption was always and it was a correct assumption, by the way, that you are going to type it a lot of times.

So you want a name that is somewhat short but still meaningful.

The nice thing with the original address was really that you just imported one name, that editor. Right. And then everything you did was bound to that package name. Like had always explicit, fully qualified symbol names to everything you called written address. So it was in a way it was self documenting. So that was a nice property, actually, if you could get behind that, there were always like these serious business aliases, how we called them, which were just like written out names so you could import them from editor.

What was funnier though is that some people, I think they just didn’t get the pumps and they liked.

Yeah, well, a lot of people didn’t get the pump. Some got really angry about that. That’s another thing. But some didn’t get it. But they really like those short names. So they imported the short names from error. So they did like from error, import s and IB. And they did like this add, s, class, whatever, and then define the attribute by calling IB, which I don’t know, man.

That’s terrible documentation.

If your company is full of that and if it’s your company style, why not? Right? Like it’s like a central thing so you get used to it. But it is something to behold.

Well, it’s even less typing. Right. So IB is less do we still have the same interface? Does that still exist?

The interface still exists because we almost never break backwards compatibility.

Wow. Okay.

Because to come back to your question, address currently has 105,000,000 donors per month.

105,000,000?

Yeah. So it is like solid in the top 30 of all download modules on Pi.

Too bad you’re not charging for it.

Yeah.

One cent per download, right?

Yeah.

I’m not really vain about this because I’m fully aware that a lot of that comes through third party dependencies. Like, address was a good way to have good classes in your own package in a way that is independent from your Python version. For example, Python is using errors internally.

Oh, really?

Oh, yeah. One of the biggest users, actually.

So like 40 million of those 105,000,000 are from Pi test alone.

Okay, so both attrs and pytest are an example of packages that you can use, that you can use decorators and just forget that you’re using decorators. You don’t have to know how decorators work, they just help you in these cases. I think you’re right that people are more used to decorators now than they.

Especially meant class decorators because there’s a lot less used than functional decorators.

I guess that’s true, but I don’t write very many classes, partly because of the initial shock of trying to write classes when I first started Python and having it not be as obvious, but also Pythons fairly modular. Anyway, a lot of the design patterns and stuff that I was used to that are necessary. And C because it’s just C or C because it’s just one giant name. Space are not the same with Python. With Python module is its own namespace. So there’s less need for it, I guess.

But for actual data stuff, you’d still need classes or Structs or something.

Yeah.

This is something I also really believe, and that is kind of controversial, is that I use classes only for data. Not that I don’t use methods. I do, absolutely do. But I’m not a fan of like using classes for namespacing. What you just basically say that you are not doing either.

Some people like it to just put things into classes to give them another namespace. And I don’t see the point really.

Right. But it is almost necessary in some other languages.

Yes. I think that’s why people kept doing it. Right. A lot of things in Python where you just wonder, why are people doing that? All right. They came from other languages. Like the time where Python is the first language of almost everybody is actually a quite recent development. Right.

Very true. We were talking about the S and IB syntax, but things have changed. Right. You can do things a different way now. Do you have a different name for the new interface, the new API?

Well, that’s more or less like a two step thing. Right. So after I helped with getting data classes happen.

Oh, yeah. I jumped too far forward. So you were involved with data classes, right?

Yes, I was Gito emailed me after I announced a release, and I met with him and Eric Smith to hammer down the rough design of data classes.

So that work was going on for quite a while then.

Well, we’ve met at Python. I want to say 2017.

Okay.

Yes, Python 2017. And I believe Python 3.7 came out at the end of the year or something. So basically half a year, I want.

To say was data classes in 37 or 38. I can’t remember.

Three, seven.

Okay.

Yeah. So basically, after the design was decided on, people just came to attrs and implemented the same design for Editors.

Except that we did one thing which I kind of regret is that for Editors to look at class by the annotations, which is the data class away. Right. You write your class and you write X colon and then, for example, Int for integer. Yes. So you don’t have to call any functions in a class body. And we didn’t want to collect them by default because we were afraid to break someone’s code who was adding type annotations to the class buddies for different reasons. So you always had to pass like an argument to the editor s called auto address equals true.

Okay, as an Easter egg, we’ve also added a function alias for this concrete call, which was editor data class, which was never documented, but people immediately found it and started using it, which was kind of funny to us.

So this is the first step because to come back to your question now, everything just looks weird because these names don’t make really sense anymore. You have to pass this argument into the editor. S. So everything got ugly again and we try to get pretty again.

And instead of breaking backwards compatibility or doing some weird stuff to the S, we decided to create new APIs with different names with better names and better defaults. And that’s the new add, define and field function calls define class and field if you need to do customization to your fields in the body, just like you would do with data classes.

Okay, so I do from attrs, import, define and import field if I want field.

Also, you just jump one more step ahead. So first it was still an edge because we wanted to get these APIs right. We wanted to see how people are going to use. We wanted to see if we need to make them any better. So in the beginning they were provisional and then we waited or I procrastinated for a while and then I’ve introduced the new Editors namespace because the editor without an S namespace made no sense anymore because we didn’t do the cute things with editor. Sr. Ib and we also had a lot more symbols within the namespace. When we started out, we didn’t know. We are going to add so many more things to errors like converters, validators, you name it. So there’s many more features and it just looked weird to us.

So last December I finally pulled the plug and added a secondary well, you could say new primary module name to the address package and now you can import from errors. So you can write from Editors. Import, define from import field.

Okay, so one of the things to give it the type. I don’t have to use it’s, just type hints, right. For the type, you only need a.

Field if you want to specify, like exclude the field from in it or something like that.

You only need it rarely.

How about for default values? Is that the same as set of classes?

Yeah, you just assign them.

Nice. It’s a pretty cool history, actually. It isn’t really address versus data classes. You were involved with both and both kind of feed off of each other, right?

Yes, definitely. Most of the animosity between the projects just came from the outside and it was very annoying to live through that.

But it’s not like I was fighting with Eric Smith or something, right? Eric is one of the most wonderful people I know. He always credited us. He credited me specifically.

My name is in the Pep address is in the Pep and yet people just.

Yeah.

So are there things that didn’t go in data classes that you kind of wish had, or are you pretty happy with how that turned out?

That classes are a strict subset of what address does it’s missing some of these features, some of which I like, but there’s no reason for me to be unhappy about because there’s others. Right? Yeah.

If I want my special features, I have always other Studios.

I have used data classes before, especially in my other public projects, to save myself a dependency. For example, in Argonne to CFI, I’m using data classes for data.

There’s a few things that I missed there, but yeah, whatever. It’s just a light version of errors.

One of the things that I found I used well, actually, both of them have the feature that I love is being able to exclude elements from equality testing.

Right.

One of the things that’s not in David classes that you have in attrs is validation. Can you tell me what validation is and how it works?

In attrs, you can just specify a validator as an argument to the address field function call, so that would be a reason to use those.

And you just pass a callable that gets called, among others, the new value that is supposed to be passed into this field and you can raise an error or you can do whatever you want in a validator. So Editors comes with a bunch of them. For example, like is instance thing. So you can just make sure that only objects of a certain type are passed into this class, and if they’re wrong, it’s just going to explode.

I have to be honest that I don’t use this as much.

Okay.

What I use a lot more, and what’s also not in other classes are converters, which allows you to just specify, again, a callable, just a function, then it’s going to get called with a new value.

So for example, if you want to make sure that a field is an integer, but you want to let the user pass anything they want that can be cast to an integer, you just specify a converter Int. There you go.

Then whenever it’s assigned, it’s going to get called with Int on the value. If it’s an Int, it stays the intring, it’s going to be converted to an end.

That can be really useful for small.

Stuff when we set up the type anyways, isn’t it automatically do that? Because I’d say that like X is an Int. For instance, if somebody no, neither data.

Classes nor others do anything about those annotations by default.

Okay.

And there’s always someone coming up and saying, yeah, why don’t you use those type annotations to do type validation?

The problem here is that this only works for very simple cases.

And the moment, for example, if you have a list of something or something like that, it gets very complicated and you have to develop. Like, I don’t know what to call it. Almost like a metal language like Pythonic has, which is not the pure type annotations, but you need more expressiveness to express those kinds of things.

Okay.

Neither data classes nor address do that.

And converting is something else either, right? So just because it has a type, it doesn’t mean that you automatically convert it to that type.

So within your converter. Also, you could check what the type is that you’re getting called with. Yeah, you could do something different based on different types.

Yes. One of the longest running bugs on Address Bug Tracker is the unification of Validators and Converters, because they are really the same thing, except that one thing returns a value and the other one doesn’t. But yeah, we still haven’t.

You’d have to come up with a new name then if you combine them.

There’S so many ideas, there’s still nothing we are really happy with. So we just leave it and wait until something better comes up.

Well, so are there cases where you’d want to have the same function? Do both the converter and the Validator?

Yeah. You can also do things like pipelines, right? Like you call multiple functions on top of the new value. For example, for Validators, you can already pass a list of validators that are executed in turn. So there’s ways to compose those things, but there’s always been some edge case that made a certain solution unappealing.

And also I don’t have unlimited time.

Is attrs done or is there stuff going to be developed in the future for attrs?

So the thing is that I use errors in all of my projects, like professional projects, so errors is not going to be done as long as I’m using it and as long as I have needs for it.

I think the one thing I’m proud of is that Iris now is like seven years old and has reinvented itself again and again just along with the Python community. It just kept evolving and kept adopting new features, new ideas.

I don’t know how long I can keep up with that. Right. Maybe I’ll just have to start a new job at some point and I just won’t use it anymore. I won’t have as much time anymore. That can always happen. But as long as I use it, I think it’s saved it. I will keep running it and I will try to make it as nice as possible for myself.

That’s wonderful. Well, let’s hope that you keep having to use it then.

I’m using it more and more, actually, because of the new API. I love it. I love the new API.

I’m very happy to hear it.

It’s very clean and like you said, for the most part it makes classes simple. I do the from attrs import to find to find the class and then that’s it. I don’t have to do much else. And if I needed something I haven’t actually looked into the converters. I’ll have to play with those.

And the thing that I usually pull in Field for is just to exclude something from the equality test or something like that. But there’s a whole bunch of stuff in field, so people should check out the API for Field has quite a few things that you need.

Oh, yeah, it does.

So there’s one thing, though.

I like converters for simple stuff, but there’s always for as long as errors exist, being a certain pressure to make it more powerful when it comes to serialization deserialization and decomposition of data, this is something that we emphatically do not want to do. We want to give you a great way to write powerful classes.

Okay. And to do this advanced stuff, because we don’t think that this serialization and these topics do not belong to the core models of your code.

We believe that this is a separate issue that should be also treated separately.

Currently, the Python Address project has only two projects in it, or the Python Address organization has two projects in it. One is attrs, obviously, and the second one is Caters.

And Caters is specifically meant for this kind of work where you just can construct and deconstruct complex data into classes and out.

Again, I’ll check that out. Python caters.

Caters is for you to describe data as you expected and parse it for you and vice versa. So basically, to give you on a popular example, attrs Plus headers is somehow equivalent to Pythonic.

Okay.

But in our case, we keep that separate those two concerns, because we don’t think that they should be meddled into one thing.

Oh, I like that, because there’s a lot of times where I don’t need all that other stuff for simple cases. I’ve got a simple class, so I could use either data classes or I could use attrs.

I thought there was something different about slots.

So this is a very good example, because slots have traditionally not been part of data classes, but have been added recently. I think it was in three point ten.

So that means that if you are maintaining a package like an open source package, you have to deal with different feature sets between those between Python versions. Right.

So if your package is supposed to work with Python 3.9 and Python three point ten, you have one Python version that has slots and one that doesn’t.

Okay.

Right. So either you don’t use slots or you add some complexity to deal with this. With errors, you don’t have this problem, like if you have a certain version of errors and this version has the same features wherever it’s possible.

Does attrs use slots then?

Yes.

Is it something you have to configure?

Yes. The new APIs do have slots on by default, because we think that’s a good thing.

Yeah, that was it. And then Data classes are off by default.

Yes. And they also just added it’s super new.

Okay.

Yeah.

Slots confused me at first. Correct me if I’m wrong, I use slots because I don’t want somebody to add extra fields later.

Is that a bad use for it or am I getting it completely wrong?

I don’t believe in paternalizing my users. So for me that someone is me. I don’t want to assign some stuff to my classes by accident. I want to be very explicit what fields exist, and then if I mistype an attribute name, I want to get an error and not just accidentally just assign to a different name. And then something is numb and then I get an attribute error or something and I have to debug what’s going on. So that’s one thing.

It has other websites, too, because slotted classes are useless memory and they’re also a bit faster.

But in general, it’s a good idea, in my opinion, to make your classes slotted unless you need to do something with them that you have some specific needs. But we think that’s a good default.

Yeah. So my first interaction with slots, I thought, okay, so these are a cool thing to have around for advanced cases, and now I’m totally on board with they’re not an advanced feature at all. They should always be used unless you have a reason not to.

Yeah, you could say that not having slots is actually a food gun.

You could say that’s the advanced thing. Right, because you remove a safety feature from your class.

Any other cool things that we should know about attrs before we wrap things up? There’s probably tons of cool stuff about attrs.

Sometimes I just forget what it does by seeing what others don’t do. I think one interesting thing is what’s happening right now is Pep 6810, which is about it’s called like the data Class Transform Pep, and they are trying to standardize a way to define data class like toolkits such that they can be uniformly used by Ides. So this is coming from Microsoft, from the Vs code team, mostly from the pilots people.

And it’s an interesting path because it’s mostly written around data classes, errors, and Pythonic.

So these are the main use cases currently.

I’m not happy with the things they left out from errors, but it’s kind of nice to have this kind of support in mainstream.

Okay. Data class transforms. That’s a standard track for 311. Cool. I’ll keep watching for that. But the final thing I want to say is that I think everybody that is happy with data classes needs to thank you also. And basically working with classes is better now in Python than it was when I started using Python.

So thank you. It’s great.

Thank you. I appreciate that. I do consider it a big achievement of mine, even if nobody knows. But it’s really like I’ve left an imprint.

Let’s say that, yes, you’ve made the world a better place.

Thank you.

Thank you, Hynek. I really like the new API and your positive that of influence on Python. Thanks for talking with me. Thank you Rollbar with Rollbar. Developers deploy better software faster learn more at robust.com and thank you. Thank you for listening. Now go out and remember to have some fun while coding.