The Perils of Default Arguments in Python
All is well
If Python was your first programming language, this is probably pretty close to one of the first functions you ever wrote:
def greet(name: str) -> None:
    print(f"Hello, {name}!")
If you save it to a file called greet.py and run python -i greet.py, you can call the function from the REPL:
❯ python -i greet.py
>>> greet("Irving")
Hello, Irving!
If you try to run it without any arguments, Python is upset.
>>> greet()
Traceback (most recent call last):
  File "<python-input-1>", line 1, in <module>
    greet()
    ~~~~~^^
TypeError: greet() missing 1 required positional argument: 'name'
Python (as well as most other sane languages)[1] requires you to pass an argument for every parameter listed in the function signature whenever you call it. If you want a parameter to be optional, you can give it a default argument value:
def greet(name: str = "there") -> None:
    print(f"Hello, {name}!")
If we load the Python REPL with our new and shiny function, we’re able to call greet without any arguments:
❯ python -i greet.py
>>> greet()
Hello, there!
Be aware that None still counts as a value, so the default argument won’t have any effect in this case:
>>> greet(None)
Hello, None!
We also see that the types we dutifully specified don’t have any runtime effects. As noted in the official documentation on typing:
The Python runtime does not enforce function and variable type annotations. They can be used by third party tools such as type checkers, IDEs, linters, etc.
Other than that, all is well.
Sometimes, all is not well
Oh.
The default value is evaluated only once, at the point of definition in the defining scope. That is a very roundabout way of saying that if you do this
default_url = "eaj.no"


def some_function():
    # Create an inner scope
    default_url = "example.com"
    return default_url


def get_post_url(post_id: str, base_url: str = default_url) -> str:
    return f"{base_url}/posts/{post_id}"
the default argument value for base_url will be "eaj.no", and not "example.com". Notably, the default argument value can also be a variable, as it is here.
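The __defaults__ attribute of a function holds a tuple containing the default values of its positional parameters (defaults for keyword-only parameters are stored in __kwdefaults__ instead).[2] Taking a look at the get_post_url function, we confirm that the default argument value is "eaj.no":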
>>> get_post_url.__defaults__
('eaj.no',)
We could have called it, but we’re not here to take the easy way out.[3]
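To check that the value really was captured when the def statement ran, this sketch rebinds default_url after get_post_url has been defined. The post ID "42" is an arbitrary example. The stored default does not follow the new binding:

```python
default_url = "eaj.no"


def get_post_url(post_id: str, base_url: str = default_url) -> str:
    return f"{base_url}/posts/{post_id}"


# Rebinding the module-level name after the def statement has run does not
# change the value that was captured in __defaults__ at definition time.
default_url = "example.com"

print(get_post_url.__defaults__)  # ('eaj.no',)
print(get_post_url("42"))  # eaj.no/posts/42
```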
For immutable values, like strings or integers, this works perfectly. Mutable values, however, mean trouble.
Here is a function that adds a lowercase version of a string to a set, and then returns the set.
def add_lower(value: str, container: set[str]) -> set[str]:
    container.add(value.lower())
    return container
If we decide that having to provide a set for container on every call is a chore, we might try to do this instead:
def add_lower(
    value: str,
    container: set[str] = set(),  # Default value added here 👀
) -> set[str]:
    container.add(value.lower())
    return container
We have saved ourselves a bit of typing and traded it for a world of pain and suffering. As we learned above, the default set is created once, when the function is defined, and from that point on it is shared by all call sites.
>>> some_set = add_lower("ABC")
>>> definitely_another_set = add_lower("oh NO")
>>> print(definitely_another_set)
{'abc', 'oh no'}
The __defaults__ attribute makes it clear:
>>> add_lower.__defaults__
({'oh no', 'abc'},)
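Another way to see it is to compare object identities. This sketch reuses the broken add_lower from above. Both return values and the stored default turn out to be one and the same set:

```python
def add_lower(
    value: str,
    container: set[str] = set(),  # the single shared default
) -> set[str]:
    container.add(value.lower())
    return container


some_set = add_lower("ABC")
definitely_another_set = add_lower("oh NO")

# Both calls returned the very same object that lives in __defaults__.
print(some_set is definitely_another_set)  # True
print(some_set is add_lower.__defaults__[0])  # True
```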
When playing around in the REPL, the mistake is immediate and obvious. But if the function were just moderately more complex, or the call sites were scattered around different modules in a larger project, debugging the issues that would undoubtedly appear would most likely be a frustrating endeavor.
This is one of those bugs where testing doesn’t really help. That’s not to say this behavior is untestable. This test would clearly fail:
def test_that_add_lower_doesnt_affect_other_sets() -> None:
    first_set = add_lower("abc")
    second_set = add_lower("def")
    assert first_set == {"abc"}
    assert second_set == {"def"}
But if you’re unfamiliar with the pitfalls of mutable default arguments, how would you even know to write this test?
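One partial answer is a generic check that does not require knowing about this specific bug. The sketch below uses a hypothetical helper, defaults_mutated_by. It snapshots a function's positional defaults with copy.deepcopy, calls the function once, and reports whether the defaults changed.

It only catches mutations that actually change the stored value. Adding an element that is already in the set would go unnoticed, so treat it as a rough smoke test rather than a guarantee:

```python
import copy
from typing import Any, Callable


def add_lower(value: str, container: set[str] = set()) -> set[str]:
    container.add(value.lower())
    return container


def defaults_mutated_by(func: Callable[..., Any], *args: Any, **kwargs: Any) -> bool:
    """Hypothetical helper: call func once and report whether any of its
    positional default values changed as a side effect of that call."""
    before = copy.deepcopy(func.__defaults__)  # independent snapshot
    func(*args, **kwargs)
    return before != func.__defaults__


print(defaults_mutated_by(add_lower, "abc"))  # True: the shared set was modified
```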
There’s more
Even immutable default values can cause trouble. All available types in the datetime module are immutable, but this code does not behave the way one might expect:
import datetime as dt


def is_future_date(
    candidate_date: dt.date, today: dt.date = dt.date.today()
) -> bool:
    return candidate_date > today
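The default value for today is computed once, when the function is defined (typically when the module is imported), so it is frozen to that date. The function works fine until the date changes, but after that it starts giving incorrect results. This is not a problem if the program never runs for very long, but a long-lived web server would be a different story.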
In short, default arguments should never be used for values that are required to change between function calls.
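The once-only evaluation is easy to observe with a hypothetical helper, current_date, that counts how often it runs. Even though is_future_date is called twice below, the default expression is evaluated exactly once, when the def statement executes:

```python
import datetime as dt

calls = 0


def current_date() -> dt.date:
    """Hypothetical helper: returns today's date and counts its invocations."""
    global calls
    calls += 1
    return dt.date.today()


def is_future_date(
    candidate_date: dt.date, today: dt.date = current_date()
) -> bool:
    return candidate_date > today


is_future_date(dt.date(2000, 1, 1))
is_future_date(dt.date(2000, 1, 1))
print(calls)  # 1: the default was computed at definition time, not per call
```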
Here’s another piece of broken code:
from dataclasses import dataclass
from uuid import UUID, uuid4


@dataclass
class Product:
    name: str
    product_id: UUID = uuid4()
We are using the dataclass decorator from the Python standard library to create a class representing a product. The decorator automatically creates the __init__, __repr__ and __eq__ methods, saving us a bit of boilerplate.
You have probably spotted the problem: uuid4() is called only once, when the class body is executed, so all products get the same ID.
>>> Product(name="Banana")
Product(name='Banana', product_id=UUID('e5da25ee-4b96-45e8-8cd4-80d8de116dd4'))
>>> Product(name="Apple")
Product(name='Apple', product_id=UUID('e5da25ee-4b96-45e8-8cd4-80d8de116dd4'))
On the positive side, remembering every UUID in our database will be much easier this way.
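Where does that single UUID live? For dataclasses, the default is stored on the field definition, which dataclasses.fields exposes. This sketch uses the same broken Product as above and shows that every instance simply receives that one stored value:

```python
from dataclasses import dataclass, fields
from uuid import UUID, uuid4


@dataclass
class Product:
    name: str
    product_id: UUID = uuid4()  # evaluated once, when the class body runs


banana = Product(name="Banana")
apple = Product(name="Apple")

# Look up the default recorded on the product_id field.
stored_default = next(f.default for f in fields(Product) if f.name == "product_id")
print(banana.product_id == apple.product_id == stored_default)  # True
```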
So what do I do, Mr. Smartypants?
Use None as a sentinel value and conditionally create the mutable default value in the function body instead. This is the fixed version of add_lower:
def add_lower(
    value: str, container: set[str] | None = None
) -> set[str]:
    if container is None:
        container = set()
    container.add(value.lower())
    return container
The same technique works for is_future_date:
import datetime as dt


def is_future_date(
    candidate_date: dt.date, today: dt.date | None = None
) -> bool:
    if today is None:
        today = dt.date.today()
    return candidate_date > today
This way, the mutable or time-sensitive value is created anew on every call where the caller does not provide one.
For the dataclass example, Python already provides some guardrails. If we try to provide a mutable default value for any of the fields, we’ll get a ValueError. This is an updated (and even more broken) version of our previous product class:
from dataclasses import dataclass
from uuid import UUID, uuid4


@dataclass
class Product:
    name: str
    variants: list[str] = []  # Added mutable default value here 👀
    product_id: UUID = uuid4()
Running this file gives us:
❯ python bad_dataclass.py
Traceback (most recent call last):
  File "/private/tmp/bad_dataclass.py", line 5, in <module>
    @dataclass
     ^^^^^^^^^
# stack trace cut ✂️
ValueError: mutable default <class 'list'> for field variants is not allowed: use default_factory
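When defining a dataclass, the Python runtime checks whether each of the given default values is hashable. If it encounters an unhashable value, it assumes that the value is also mutable and raises an error. This hashability check was introduced in Python 3.11; earlier versions only rejected list, dict and set defaults. Note that the product_id default passes the check either way, because UUID objects are hashable. Helpfully, the error message tells us what to do.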
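To see which defaults the check catches, here is a sketch with a hypothetical helper, dataclass_rejects. It builds a throwaway one-field dataclass around a given default and reports whether @dataclass raised a ValueError. Lists and dicts are refused. A UUID is hashable, so it is accepted, even though a fixed UUID default is exactly the bug from earlier:

```python
from dataclasses import dataclass
from uuid import uuid4


def dataclass_rejects(default: object) -> bool:
    """Hypothetical helper: report whether @dataclass refuses `default`
    as the default value of a field."""
    try:
        @dataclass
        class Probe:
            value: object = default
    except ValueError:
        return True
    return False


print(dataclass_rejects([]))  # True: lists are unhashable
print(dataclass_rejects({}))  # True: dicts are unhashable
print(dataclass_rejects(uuid4()))  # False: UUIDs are hashable, so they pass
```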
from dataclasses import dataclass, field  # 👈 Updated import
from uuid import UUID, uuid4


@dataclass
class Product:
    name: str
    variants: list[str] = field(default_factory=list)  # ✅ default factory
    product_id: UUID = field(default_factory=uuid4)  # ✅ default factory
With this, each product gets its own variant list and a unique ID:
>>> Product(name="Banana")
Product(name='Banana', variants=[], product_id=UUID('89ebfb63-0d33-474c-a38a-d65c3b08eda1'))
>>> Product(name="Apple")
Product(name='Apple', variants=[], product_id=UUID('a1c88200-530c-42ab-bf82-6a6de7fd6e85'))
>>> banana = Product(name="Banana")
>>> apple = Product(name="Apple")
>>> apple.variants.append("Aroma")
>>> apple.variants
['Aroma']
>>> banana.variants
[]
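default_factory accepts any zero-argument callable, which also covers the is_future_date problem for dataclasses. In this sketch, created_on is a hypothetical extra field. Because the factory runs for each new instance, each product records the date on which it was actually created:

```python
import datetime as dt
from dataclasses import dataclass, field
from uuid import UUID, uuid4


@dataclass
class Product:
    name: str
    variants: list[str] = field(default_factory=list)
    product_id: UUID = field(default_factory=uuid4)
    # Hypothetical field: dt.date.today is called for every new instance.
    created_on: dt.date = field(default_factory=dt.date.today)


banana = Product(name="Banana")
apple = Product(name="Apple")
print(banana.product_id != apple.product_id)  # True
print(banana.variants is not apple.variants)  # True
```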
At least, this will have to suffice until you can rewrite everything in Rust.[4]
Footnotes
1. You know who I’m talking about.
2. Python functions are objects.
3. I also regret to inform you that it is possible to assign an arbitrary tuple to the __defaults__ attribute. However, if I ever come across a code base where this is abused and git blame tells me it is your fault, I will personally come to your house and make you write 20 000 words about why that was a terrible decision.
4. Or are we doing Zig now?