Fooled (again) by Python

I am writing a task node for a workflow at work. Tasks are written in Python, and follow a pattern common in the industry: You write your task as a Python class that inherits from a specific base class, and implements an execute function. The workflow calls the execute function when the task is executed in a workflow step.

The workflow can pass parameters to the task and these materialize in the Python task as instances of a TaskParameter object.

The workflow framework makes us declare these parameters as class variables using factories based on the type of the parameter.

e.g.

class MyTask(BaseWorkflowTask):

  catch_fire_and_explode = boolean_param(name="Fire")

In the class code we can write things like

def execute(self):
  if catch_fire_and_explode:
    log.Warn("We really shouldn't do this.")

You get the idea.

Now, as any decent framework would do, the workflow framework allows you to set default values for a parameter, like

  catch_fire_and_explode = boolean_param(name="Fire", default=False)

In my task I need the workflow to pass a list of strings as a parameter. So I have

my_things = string_list_param(name="Things", default=[])

I want an empty list as a default since I have use cases where I don’t want to fill my_things in manually.

In such cases I have some logic that fills it in, something like

def foo(things: list[str]):
  if not things:
    for l in other_things:
      things.append(bar(l))

  moo(things, other_things)

Where things is passed in like

foo(my_things)

This worked fine. I was building up my task code over a series of commits. I was adding in unit tests as I went. Other people were reviewing the code. I was submitting it. Best practices, what?

At one stage I made an addition to the code far away from foo and added a test case to check it. I ran the test suite and the test case failed.

Naturally I suspected my new code and my understanding of it. I went through the logic, the code and the test case and couldn’t find a fault.

I ran the test case in the debugger. The test passed.

What?

I ran the test suite again from the command line. The test failed.

I ran just the failing test from the command line. The test passed.

OK, now I’m fully awake. This is going to be entertaining at the very least.

The debugger doesn’t give me the failure, so print statements it is. I print and I print and I print and I finally see something weird.

When the test runs by itself when foo(my_things) is called my_things is an empty list, as expected, because I’m expecting it to get the default value, which is an empty list.

When the test runs as part of the test suite somebody is filling out my_things with values: my_things = ["a banana", "a pear", "a partridge"].

It’s SPOOKY!

It’s especially spooky because ["a banana", "a pear", "a partridge"] is actually a valid list of things that one or more of the other tests see at some point during execution.

It seems this code is reflecting the contemporary sociopolitical gestalt: this is not normal. Python doesn’t behave like … like C, sliding in random values into variables when a pointer goes AWOL.

Hmm…

WAITAMINUTE!

This bug doesn’t happen when the test is run in isolation.
This bug is happening because a list is being filled out with values that might exist in another test.

I looked at how the framework’s string_list_param factory was implemented. All the factory functions return a TaskParameter object. This object has a _get method to resolve the value of the parameter when asked.

Inside _get is logic to either return whatever value the workflow framework injected during execution or the default value.

Or the default value.

Anyone who’s spend more than ten minutes with Python will be hopping around clutching their toe in sympathetic pain at this point.

Yes folks, values vs references strikes again.

Because our parameter is a list, we are getting a reference to it from the _get function. When _get returns the default, it returns a reference to the default empty list.

The way I wrote foo mutates the value passed to it. So, whatever test that runs first and gets the default value, mutates this list, which mutates the default value. The next test that runs and needs the default value, gets this strange mutated value.

Solution: Rewrite foo to not mutate the input parameter.

Question: From the point of view of the framework, should one return a deep copy of the defaults, because of cowboys like me? Probably.