Raymond Hettinger Holiday Party Talk 2017
Raymond says: I will tweet out a link to this presentation later. It is in sphinx.
Notes are in this folder, /outlines/talks
, as HettingerHolidaySF2017.pdf
.
- Mission: Train thousands of Python Developers
- Mutable Minds, Inc.
- raymond.hettinger@gmail.com
- Training Videos: Free to users of Safari Online: "Modern Python: Big Ideas, Little Code"
- Raymond says about a $100 on-ramp to using Safari Online
- @raymondh on twitter
I use sphinx to prepare my talks / slide decks. Never intend to go back to Powerpoint.
I am a manufacturer of Python Programmers. I take someone good at something and make them good at something plus Python programming.
Questions from the end of the talk
I didn't really record them all.
- (Mentions offhand he has an OCaml background)
- Q: Why can't I use a value
use_slots = True
instead of manually adding slots- Raymond: I expect some discussion this week on python-dev about that. Solution is probably that the class decorator will return a new class with the slots in it. It's a little late to turn it on and off with true/false. The present design doesn't preclude that feature. We need to avoid making it impossible. Also because no one who reviewed the pep and no one thought of it, or eric wanted to do it that way.
Python 3.7
- How to hold these 3 numbers: 33, 1.00, 0.5 (HSL Colors)
Class Generators: dataclasses and namedtuples
Problems to be Solved
I want to pass around a bunch of instances of colors: HSL system.
Dark Orange: 33 degrees on the color wheel at 100% saturation and 50% lightness.
from bunch import Bunch
from types import SimpleNamespace
? import record # for recordtype
? import dataclass
# tuple
dark_orange = (33, 1.00, 0.5)
print(dark_orange)
# dict
dark_orange = {'hue': 33, 'saturation': 1.0, 'lightness': 0.5}
print(dark_orange)
# simple class
class Color:
''' there are 6 basic dunder methods, __init__, __hash__, something else, __lt__ etc,
'''
def __init__(self, hue, saturation, lightness=0.5):
self.hue = hue
self.saturation = saturation
self.lightness = lightness
dark_orange = Color(33, 1.00, 0.5)
print(dark_orange)
# bunch recipe ########### Bunch can be written in three lines (not in std library)
dark_orange = Bunch(hue=33, saturation=1.00, lightness=0.5)
print(dark_orange)
# simple namespace ######### Bunch, but in the standard library, written in c
# good, because it's in the standard library... ha but it's just bunch
# this way is pretty good, repr but can't compare it, can't sort, can't hash...
# takes a bunch of space... has an object with a reference to a dictionary, 300 bytes
dark_orange = SimpleNamespace(hue=33, saturation=1.00, lightness=0.5)
# named tuples ######### i stole it, i'm an aggregator... i found it reinvented dozens of times
# i brought them all together in a massive chart, took the best ideas from each, aggregated
# published the recipe, got lots of people to use it, took it to python dev, we put it in
# internally same name tuple as before, but now written in a class format... before what?
class Color(NamedTuple):
'''
benefits from before
one from each line
see the datatype, use default
'''
hue: int
saturation: floor
lightness: float = 0.5
# calling the NamedTuple... 72 bytes, 4 times smaller than SimpleNamespace
# make one namedtuple then you can make another one based on it, just change an attribute and clone
# named tuples are hashable so they are useable as dictionary keys
# immutable! - whaa - you just have to use the replace() method to make another... too hard
dark_orange = Color(hue=33, saturation=1.00)
print(dark_orange)
''' Python 2.7 so it's commented
# record by george sakkis ####
# python 2.7 recipe and not changed for Python3
# we had mutable nametuples for a very long time, 8 years, no typing, needs typing
# but it didn't have a cool name#127.0.0.1 dev.hackersinresidence.c so no one knew!
Color = record.recordtype('Color', ['hue', 'saturation', 'lightness'])
# mutability is fun!
# 64 bytes, smaller than a tuple... tuples store the length, instances don't unless you add the field
# so this can't tell you its length... but it's smaller
dark_orange = Color(hue=, saturation=1.00, lightness=0.5)
'''
# dataclass ##################
# half the people see it as a data container, half the people see it as a class
# rorscach test
# you can start using it in Python3.6 with raymonds code, or wait for python3.7
# in python2, you can import recordtype as dataclass - ha!
# biggest way of all at 296 bytes, bigger than a dict
@dataclass
class Color:
''' it writes classes for you and holds data
'''
hue: int
saturation: float
lightness: float = 0.5
dark_orange = Color(hue=33, saturation=1.00)
print(dark_orange)
# dataclass with slots but no defaults ########
@dataclass
class Color:
''' 64 bytes but you can't have any default values, or use george's recipe
'''
slots = ['hue', 'saturation', 'lightness']
hue: int
saturation: float
lightness: float
dark_orange = Color(hue=33, saturation=1.00, lightness=0.5)
print(dark_orange)
# you learned the history of how to store 3 numbers
# this gives some output
Details about each data container option
Have to see the notes for this... it's really fast
Bunch and SimpleNamespace
Recipe
class Bunch(object):
def __init__(self, **kwds):
self.__dict__.update(kwds)
Optimization
class Bunch(object):
''' much faster, replace the dict with kwds
dict is guaranteed to be ordered
'''
def __init__(self, **kwds):
self.__dict__ = kwds
C version built into Python
from types import SimpleNamespace
Named Tuples - same size as regular tuple
- Replaces tons of stuff with NamedTuples when doing code review
- raymond loves fstrings - brought by eric smith
- who did dataclass - eric smith
- are you going to remember his name?
You can let the code generator build it then modify it yourself... how? what? I think it's written in python so you need to paste it in or something... no idea
Data Classes - PEP 557 has landed
- Validates that NamedTuple was doing the right thing but we need mutable. And a cool name.
- Immutable if you want (possible to freeze the data)
- PEP says - mutable namedtuples with defaults
- main design goal is to support static type checkers
- highly opinionated about typing, no way to use it without typing
- static typing was optional, not here... you can put a type of
None
orany
and it will be ignored. - have to explicitly opt out of typing with
None
orany
- raymond is working on a workaround
- boilerplate for class, attribute based dataholder
- is it a code generator? is it a data holder? two different ways of looking at it
- generates code for you, generates class docstring, init, repr, eq with strong typechecking, rich comparisons, hashing, read-only enforced by setattr, delattr
from dataclasses import make_dataclass
- this thing is huge unless there are slots... but with slots you lose defaults
- the language has a built in constraint against slots and defaults at the same time
Do you always know if something is going to be integer float? No. 12.5 timezones off in india!
When you import types it actually executes code... so there is debate about whether they will stringify types in make_dataclass
-
if you read python-dev you will get all of the drama
-
emulate a namedtuple (see slides) with
from operator import itemgetter
anddataclasses
Summary
I have been studying dataclasses since the first draft of PEP 557.
I really like the new dataclasses!
If you need mutable named tuples today, try George Sakkis's Record class or use the attached dataclass code that runs fine on Python3.6 but not prior. (code in raymond's slides)
See the 4 tuple code generator table: Namedtuple, Record, Dataclasses, Dataclasses w/ slots
Gains and losses
(see the slides)
Some Nice Improvements
(see the slides)
Some things that aren't as nice
- higher complexity and learning curve, 700 ish lines of code, couple hundred lines of tests, lots of specs.
- dataclasses is its own miniature ORM for python stuff, declare, fields() function, hashable, etc.
(see the slides)