The Perils of Default Arguments in Python

All is well

If Python was your first programming language, this is probably pretty close to one of the first functions you ever wrote:

def greet(name: str) -> None:
    print(f"Hello, {name}!")

If you save it to a file called greet.py and run python -i greet.py, you can call the function from the REPL:

❯ python -i greet.py
>>> greet("Irving")
Hello, Irving!

If you try to run it without any arguments, Python is upset.

>>> greet()
Traceback (most recent call last):
  File "<python-input-1>", line 1, in <module>
    greet()
    ~~~~~^^
TypeError: greet() missing 1 required positional argument: 'name'

Python (as well as most other sane languages)1 requires you to pass an argument for every parameter listed in the function signature whenever you call it. If you want a parameter to be optional, you can give it a default argument value:

def greet(name: str = "there") -> None:
    print(f"Hello, {name}!")

If we load the Python REPL with our new and shiny function, we’re able to call greet without any arguments:

❯ python -i greet.py
>>> greet()
Hello, there!

Be aware that None still counts as a value, so the default argument won’t have any effect in this case:

>>> greet(None)
Hello, None!

We also see that the types we dutifully specified don’t have any runtime effects. As noted in the official documentation on typing:

The Python runtime does not enforce function and variable type annotations. They can be used by third party tools such as type checkers, IDEs, linters, etc.

Other than that, all is well.
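If you want to see where the type hints end up, here is a minimal sketch using the greet function from above. The hints are stored on the function object as metadata, and nothing consults them at call time.

```python
def greet(name: str = "there") -> None:
    print(f"Hello, {name}!")


# The hints are kept on the function object as plain metadata...
print(greet.__annotations__)

# ...but nothing checks them when the function is called.
greet(42)  # prints "Hello, 42!" even though 42 is not a str
```

A type checker would flag the last call, but that happens before the program runs, not during.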

Sometimes, all is not well

Oh.

The default value is evaluated only once, at the point of definition in the defining scope. That is a very roundabout way of saying that if you do this

default_url = "eaj.no"


def some_function():
    # Create an inner scope
    default_url = "example.com"
    return default_url


def get_post_url(post_id: str, base_url: str = default_url) -> str:
    return f"{base_url}/posts/{post_id}"

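the default argument value for base_url will be "eaj.no", and not "example.com". The assignment inside some_function only creates a local variable, and the default is read from the module-level default_url at the moment the def statement runs. Notably, we can also use a variable as the default argument value.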
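The "only once" part is easier to believe when you see it survive a reassignment. This is a small sketch that reuses the names from above and rebinds default_url after the function has been defined:

```python
default_url = "eaj.no"


def get_post_url(post_id: str, base_url: str = default_url) -> str:
    return f"{base_url}/posts/{post_id}"


# Rebinding the name after the function has been defined...
default_url = "example.com"

# ...does not touch the default, which was captured at definition time.
print(get_post_url("42"))  # eaj.no/posts/42
```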

The __defaults__ attribute of a function holds a tuple containing all its default arguments.2 Taking a look at the get_post_url function, we confirm that the default argument value is "eaj.no":

>>> get_post_url.__defaults__
('eaj.no',)

We could have called it, but we’re not here to take the easy way out.3
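If poking at dunder attributes feels a bit too intimate, the inspect module from the standard library offers a more polite route to the same information. A minimal sketch, with the default written as a literal to keep the example self-contained:

```python
import inspect


def get_post_url(post_id: str, base_url: str = "eaj.no") -> str:
    return f"{base_url}/posts/{post_id}"


signature = inspect.signature(get_post_url)
for name, parameter in signature.parameters.items():
    # Parameters without a default carry the special marker Parameter.empty.
    if parameter.default is inspect.Parameter.empty:
        print(f"{name}: required")
    else:
        print(f"{name}: default {parameter.default!r}")
```

Unlike __defaults__, this also tells you which parameter each default belongs to.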

For immutable values, like strings or integers, this works perfectly. Mutable values, however, mean trouble.

Here is a function that adds a lowercase version of a string to a set, and then returns the set.

def add_lower(value: str, container: set[str]) -> set[str]:
    container.add(value.lower())
    return container

If we decide that having to provide a set for the container parameter is a chore, we might try to do this instead:

def add_lower(
    value: str,
    container: set[str] = set(),  # Default value added here 👀
) -> set[str]:
    container.add(value.lower())
    return container

We have saved ourselves a bit of typing, and traded it for a world of pain and suffering. As we learned above, the default set is created once, when the function is defined, and from that point on it is shared by every call that doesn’t pass its own container.

>>> some_set = add_lower("ABC")
>>> definitely_another_set = add_lower("oh NO")
>>> print(definitely_another_set)
{'abc', 'oh no'}

The __defaults__ attribute makes it clear:

>>> add_lower.__defaults__
({'oh no', 'abc'},)

Playing around in the REPL, the mistake is immediate and obvious. But if the function were just moderately more complex, or the call sites were scattered around different modules in a larger project, debugging the issues that would undoubtedly appear would most likely be a frustrating endeavor.
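To get a feel for how this plays out with scattered call sites, here is a rough sketch. The two collect_* functions are hypothetical stand-ins for code living in unrelated modules:

```python
def add_lower(value: str, container: set[str] = set()) -> set[str]:
    container.add(value.lower())
    return container


# Two hypothetical call sites that know nothing about each other.
def collect_tags() -> set[str]:
    return add_lower("Python")


def collect_usernames() -> set[str]:
    return add_lower("Irving")


tags = collect_tags()
usernames = collect_usernames()

# Both names point at the one set stored in add_lower.__defaults__.
print(tags is usernames)  # True
print(tags is add_lower.__defaults__[0])  # True
```

Neither call site does anything wrong on its own, which is what makes the bug so hard to spot in a code review.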

This is one of those bugs where testing doesn’t really help. That’s not to say this behavior is untestable—this test would clearly fail:

def test_that_add_lower_doesnt_affect_other_sets() -> None:
    first_set = add_lower("abc")
    second_set = add_lower("def")

    assert first_set == {"abc"}
    assert second_set == {"def"}

But if you’re unfamiliar with the issues with mutable default arguments, how would you even know to write this test?
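One possible answer is to let a machine know it for you. Linters such as flake8-bugbear flag mutable argument defaults (rule B006). The idea can also be sketched by hand. The helper below, find_unhashable_defaults, is a hypothetical name and a rough heuristic, not a replacement for a real linter: it treats any unhashable default as suspicious.

```python
import inspect
from collections.abc import Callable


def find_unhashable_defaults(func: Callable[..., object]) -> list[str]:
    """Return the names of parameters whose default value is unhashable."""
    suspicious = []
    for name, parameter in inspect.signature(func).parameters.items():
        default = parameter.default
        if default is inspect.Parameter.empty:
            continue  # no default at all, nothing to check
        # Built-in mutable containers set __hash__ to None on their type.
        if type(default).__hash__ is None:
            suspicious.append(name)
    return suspicious


def add_lower(value: str, container: set[str] = set()) -> set[str]:
    container.add(value.lower())
    return container


print(find_unhashable_defaults(add_lower))  # ['container']
```

Mutable objects that happen to be hashable would slip straight past this check, so treat it as an illustration only.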

There’s more

Even immutable default values can cause trouble. All available types in the datetime module are immutable, but this code does not behave the way one might expect:

import datetime as dt


def is_future_date(
    candidate_date: dt.date, today: dt.date = dt.date.today()
) -> bool:
    return candidate_date > today

It works until the date changes. dt.date.today() is called once, when the function is defined, so every later call compares against the day the module was loaded. This is not a problem if the program never runs for very long, but a long-lived web server is a different story.

In short, default arguments should never be used for values that are required to change between function calls.
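Waiting until midnight to watch is_future_date break is not much fun, so here is a sketch that compresses the timeline. The tick function is a hypothetical stand-in for a clock: it returns a new number on every call.

```python
import itertools

# Hypothetical stand-in for a clock: every call to tick() returns the next number.
_ticks = itertools.count(1)


def tick() -> int:
    return next(_ticks)


def current_tick(now: int = tick()) -> int:  # tick() runs once, right here
    return now


print(current_tick())  # 1
print(current_tick())  # 1
print(tick())  # 2
```

The clock keeps moving, but current_tick is stuck at the moment it was defined.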

Here’s another piece of broken code:

from dataclasses import dataclass
from uuid import UUID, uuid4


@dataclass
class Product:
    name: str
    product_id: UUID = uuid4()

We are using the dataclass decorator from the Python standard library to create a class representing a product. The decorator automatically creates the __init__, __repr__ and __eq__ methods, saving us a bit of boilerplate.

You have probably seen the problem: All products will get the same ID.

>>> Product(name="Banana")
Product(name='Banana', product_id=UUID('e5da25ee-4b96-45e8-8cd4-80d8de116dd4'))
>>> Product(name="Apple")
Product(name='Apple', product_id=UUID('e5da25ee-4b96-45e8-8cd4-80d8de116dd4'))

On the positive side, remembering every UUID in our database will be much easier this way.
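The mechanism is the same as for functions: uuid4() runs once, while the class body is being executed. A small sketch that inspects where the value ends up, using dataclasses.fields:

```python
from dataclasses import dataclass, fields
from uuid import UUID, uuid4


@dataclass
class Product:
    name: str
    product_id: UUID = uuid4()  # evaluated once, while the class body runs


banana = Product(name="Banana")
apple = Product(name="Apple")

# The default is recorded in the field metadata and kept as a class attribute...
id_field = next(f for f in fields(Product) if f.name == "product_id")
print(id_field.default == Product.product_id)  # True

# ...and every instance created without an explicit ID receives that same object.
print(banana.product_id is apple.product_id)  # True
```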

So what do I do, Mr. Smartypants

Use None as a sentinel value and conditionally create the mutable default value in the function body instead. This is the fixed version of add_lower:

def add_lower(
    value: str, container: set[str] | None = None
) -> set[str]:
    if container is None:
        container = set()
    container.add(value.lower())
    return container

The same technique works for is_future_date:

import datetime as dt


def is_future_date(
    candidate_date: dt.date, today: dt.date | None = None
) -> bool:
    if today is None:
        today = dt.date.today()
    return candidate_date > today

This way, the mutable or time-sensitive value is created anew on every call where it is not provided at the call site.

For the dataclass example, Python already provides some guardrails. If we try to provide a mutable default value for any of the fields, we’ll get a ValueError. This is an updated (and even more broken) version of our previous product class:

from dataclasses import dataclass
from uuid import UUID, uuid4


@dataclass
class Product:
    name: str
    variants: list[str] = [] # Added mutable default value here 👀
    product_id: UUID = uuid4()

Running this file gives us

❯ python bad_dataclass.py
Traceback (most recent call last):
  File "/private/tmp/bad_dataclass.py", line 5, in <module>
    @dataclass
     ^^^^^^^^^

# stack trace cut ✂️

ValueError: mutable default <class 'list'> for field variants is not allowed: use default_factory

When a dataclass is defined, the dataclass decorator checks whether each of the given default values is hashable. If it encounters an unhashable value, it assumes that the value is also mutable and raises an error. (Before Python 3.11, the check only looked for list, dict, and set.) Helpfully, the error message tells us what to do.
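The hashability check is a heuristic, and it is worth knowing where it stops. The sketch below first shows what "unhashable" looks like for the built-in containers, and then defines Tags, a hypothetical class that is mutable but keeps the default identity-based hash, so it is accepted as a default and shared anyway.

```python
from dataclasses import dataclass

# Built-in mutable containers opt out of hashing by setting __hash__ to None.
print(list.__hash__ is None)  # True
print(set.__hash__ is None)  # True
print(tuple.__hash__ is None)  # False


class Tags:
    """Hypothetical mutable class that keeps the default identity-based hash."""

    def __init__(self) -> None:
        self.items: list[str] = []


@dataclass
class Product:
    name: str
    tags: Tags = Tags()  # hashable, so accepted, but still shared


banana = Product(name="Banana")
apple = Product(name="Apple")
apple.tags.items.append("fruit")
print(banana.tags.items)  # ['fruit']
```

So the guardrail catches the common cases, but it cannot catch every mutable default.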

from dataclasses import dataclass, field  # 👈 Updated import
from uuid import UUID, uuid4


@dataclass
class Product:
    name: str
    variants: list[str] = field(default_factory=list)  # ✅ default factory
    product_id: UUID = field(default_factory=uuid4)  # ✅ default factory

With this, each product gets its own variant list and a unique ID:

>>> Product(name="Banana")
Product(name='Banana', variants=[], product_id=UUID('89ebfb63-0d33-474c-a38a-d65c3b08eda1'))
>>> Product(name="Apple")
Product(name='Apple', variants=[], product_id=UUID('a1c88200-530c-42ab-bf82-6a6de7fd6e85'))
>>> banana = Product(name="Banana")
>>> apple = Product(name="Apple")
>>> apple.variants.append("Aroma")
>>> apple.variants
['Aroma']
>>> banana.variants
[]
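The factory can be any callable that takes no arguments, which also covers the time-sensitive case from earlier. A sketch, assuming we want a non-empty starting list and a creation date. The created field and the "Standard" variant are made up for illustration:

```python
import datetime as dt
from dataclasses import dataclass, field


@dataclass
class Product:
    name: str
    # A lambda works when the default needs some starting contents.
    variants: list[str] = field(default_factory=lambda: ["Standard"])
    # Any zero-argument callable is fine, including dt.date.today.
    created: dt.date = field(default_factory=dt.date.today)


banana = Product(name="Banana")
apple = Product(name="Apple")
apple.variants.append("Aroma")

print(apple.variants)  # ['Standard', 'Aroma']
print(banana.variants)  # ['Standard']
```

The factory is called once per instance, inside __init__, so nothing is shared and nothing is frozen at class definition time.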

At least, this will have to suffice until you can rewrite everything in Rust.4


Footnotes

  1. You know who I’m talking about.

  2. Python functions are objects.

  3. I must also regret to inform you that it is possible to assign an arbitrary tuple to the __defaults__ attribute. However, if I ever come across a code base where this is abused and git blame tells me it is your fault, I will personally come to your house and make you write 20 000 words about why that was a terrible decision.

  4. Or are we doing Zig now?