Don’t Get in a Pickle with a Python namedtuple

In this blog I will show you what happens when you want to pickle an object that contains a Python namedtuple.

Python’s namedtuple is high-performance data type that lets us define a custom type which behaves like a tuple. For example, the following piece of code defines a new type Viewer, creates an instance of it and initialises its attributes:

    from collections import namedtuple

    Viewer = namedtuple('Viewer', 'gender age points')
    viewer = Viewer('X', 25, 356)

In the above, line 3 defines a new type Viewer, and line 4 defines and initialises a new variable viewer of type Viewer. viewer behaves like a tuple in a sense that it has built-in methods count() and index() and allows access to attributes via indexing or named arguments. For example:

    print(viewer[2])         # prints 356
    print(viewer.age)        # prints 25
    print(viewer.count('X')) # prints 1

Note that unlike with a list or a dict, to work with namedtuples we need to perform two operations: (1) define the new type, (2) create a new instance of it. Also note that the same two steps are followed when we work with classes. And a namedtuple is just a dynamically named class type. But how exactly does this dynamic part works? It works because when we define a new type (line 3 in the first code snippet), we are actually calling a factory function namedtuple that does the dynamic ‘stuff’ for us (i.e. returns a sub-class of a tuple that is named as what we specify in the function call).

Let’s see what happens when we create a class with a namedtuple member.

import pickle
from collections import namedtuple
import datetime as dt


class ViewerClass(object):

    # class-level type definition
    vt = namedtuple(
        'vt', 'start_date mon_views mon_streams name dob'
    )

    def __init__(
        self, start_date, mon_views, mon_streams, name, dob
    ):
        self._my_vt = ViewerClass.vt(
            start_date, mon_views, mon_streams, name, dob
        )

    def get_start_date(self):
        return self._my_vt.start_date

    def get_monthly_views(self):
        return self._my_vt.mon_views

    def get_monthly_streams(self):
        return self._my_vt.mon_streams

    def get_registration_details(self):
        return (
            'Name:'
            + self._my_vt.name
            + ' DOB:'
            + str(self._my_vt.dob)
        )

    def update_monthly_stream(self, new_mon_streams):
        self._my_vt.mon_streams = new_mon_streams

    def update_monthly_views(self, new_mon_views):
        self._my_vt.mon_views = new_mon_views


if __name__ == '__main__':

    viewer1 = ViewerClass(
        dt.date(2019, 1, 1),
        5,
        6234.80,
        'John',
        dt.date(1989, 12, 3),
    )
    print(
        "Viewer {} has streamed for {} seconds this month.".format(
            viewer1.get_registration_details(),
            viewer1.get_monthly_streams(),
        )
    )

    viewer2 = ViewerClass(
        dt.date(2019, 2, 1),
        5,
        5234.80,
        'Mary',
        dt.date(1989, 11, 11),
    )
    print(
        "Viewer {} has streamed for {} seconds this month.".format(
            viewer2.get_registration_details(),
            viewer2.get_monthly_streams(),
        )
    )

    print(type(viewer1))
    print(type(viewer1._my_vt))

The output of the print statements points to a potential problem that can occur if we try to pickle the viewer objects:

It turns out that the protected variable is of type ‘__main__.vt’ but not ‘__main__.ViewerClass.vt’. And if we try to pickle viewer1 we are going to get this error:

_pickle.PicklingError: Can’t pickle <class ‘__main__.vt’>: attribute lookup vt on __main__ failed

This error should make sense because vt is not defined within __main__, but is defined within __main__.ViewerClass, and thus is not visible to pickle as a subclass of a class.

There are several ways to fix this.

First, we can move the definition of vt outside of ViewerClass to the __main__. This will let pickle find vt at the level it is looking for it:

# module-level type definition
vt = namedtuple(
    'vt', 'start_date mon_views mon_streams name dob'
)


class ViewerClass(object):
    def __init__(
        self, start_date, mon_views, mon_streams, name, dob
    ):
        self._my_vt = vt(
            start_date, mon_views, mon_streams, name, dob
        )

    ...

Second solution involves changing a built-in private variable __qual_name__ to that of the class name:

import pickle
from collections import namedtuple
import datetime as dt


class ViewerClass(object):

    # class-level definition
    vt = namedtuple(
        'vt', 'start_date mon_views mon_streams name dob'
    )
    vt.__qualname__ = 'ViewerClass.vt'

    def __init__(
        self, start_date, mon_views, mon_streams, name, dob
    ):
        self._my_vt = ViewerClass.vt(
            start_date, mon_views, mon_streams, name, dob
        )

    ...

This fixes the issue and makes viewer1._my_vt of type ‘__main__.ViewerClass.vt’, under which pickle can look it up.

I must say that I prefer the first solution, since sub-classing from the ViewerClass may prove to be problematic, and we should avoid modifying private variables.