Thursday, April 9, 2020

TDD: Roman Bowling Discussion

I discovered the Roman Bowling concept because I was looking for a familiar example to draw upon when discussing classifications of tests.

Let's start with a trivial test - we want to demonstrate that our code produces the correct output when when input is a representation of a perfect game. Many TDD exercises like to start with a trivial example, but my preference is to skip ahead to something with a bit of meat on it. Certain kinds of mistakes are easier to detect when working with real data.

def test_perfect_game(self):
    from roman import bowling

    pinfalls = ["X", "X", "X", "X", "X", "X", "X", "X", "X", "X",
                "X", "X"]
    score = bowling.score(pinfalls)
    self.assertEqual("CCC", score)

There's nothing here that's particularly beautiful, or enduring. We have a straight forward representation of the behavior that we want, written at a scaffolding level of quality. It's enough to get started, and certainly for this demonstration; the machine understands exactly the check we want to make, but a human being coming into this code cold might want a more informative prose style.

Is this a "unit test"?

The motivation for this exercise is to give some extra thought to what that question means, and which parts of that meaning are actually important.

To get to green from red, we can lean on the simplest thing - simply hard coding the desired output.

def score(pinfalls):
    return "CCC"

This arrangement has a lot of the properties we want of a test to be run after each refactoring: the test shares no mutable state with any other code that might be executing, it has no I/O or network dependencies. The test is eye blink quick to run, and the behavior is stable - if the test reports a problem, we really have introduced a mistake in the code somewhere.

Let's try a quick bit of refactoring, we're going to change this implementation here so that is justly slightly better aligned with our domain model. It's going to be a bit more convenient to work internally with integer values, and treat the romanization of that value as a presentation problem. So what I want to do here is introduce a 300, somehow, and then use that 300 to craft the desired representation.

def score(pinfalls):
    return "".join(["C" for _ in range(300 // 100)])
I find the syntax for joins in python a bit clumsy (I'm not fluent, and certainly don't think in the language), but I don't believe that the code written here would startle a maintainer with working knowledge of the language. Executing the test shows that the machine still understands.

Is it a "unit test"?

Now I want to perform a simple "extract method" refactoring. It may be obvious what's coming, but for this demonstration let's take exaggeratedly small steps.

def score(pinfalls):
    def to_roman(score):
        return "".join(["C" for _ in range(score // 100)])

    return to_roman(300)

Is it a "unit test"?

The implementation of to_roman hasn't become noticeably better; there's a lot more cleanup to be done. But having created the function, I can now start exploring the concept of distance.

def to_roman(score):
    return "".join(["C" for _ in range(score // 100)])

def score(pinfalls):
    return to_roman(300)

Is it a "unit test"?

from previous.exercise import to_roman

def score(pinfalls):
    return to_roman(300)

Is it a "unit test"?

In the first refactoring, the to_roman method definition was nested. Does it matter to us if we nest the import?

def score(pinfalls):
    from previous.exercise import to_roman

    return to_roman(300)

Is it a "unit test"?

Is there any point in this sequence of refactorings where you find yourself thinking "gee, I'm not really comfortable with this; I need to introduce a test double?"

My answers, today? I don't care if it is a unit test or not. I care whether it is still suitable for running between edits, and it still is. The test is still fast and isolated. Sure, roman.bowling.score depends on previous.exercise.to_roman, but to_roman has stable requirements and lots of tests; it's not a significantly riskier dependency than the python library itself.

Introducing a test double to "mitigate the risks" of an external dependency is a lousy trade when the risk of the external dependency is already low.

If we are concerned about having a suite of tests that help us to locate bugs, an interesting alternative to using a test double is write a test that uses a more complicated description of the behavior. In this case, for instance, we might write

def test_perfect_game(self):
    from roman import bowling

    pinfalls = ["X", "X", "X", "X", "X", "X", "X", "X", "X", "X",
                "X", "X"]
    score = bowling.score(pinfalls)
    from previous.exercise import to_roman
    self.assertEqual(to_roman(300), score)

Is it a "unit test"?

I think of this pattern as "behaves like". We aren't saying that roman.bowling.score calls previous.exercise.to_roman with a specific input; that's coupling to an implementation detail that we might reasonably want to change later. Instead, we are adopting a looser constraint that the code should produce the same behavior. The test and the code might both depend on the same external dependency, or not.

Taken to extremes, you end up with something like an oracle - a complete reference implementation in the test, and an assertion that the test subject behaves the same way for all outputs. But be careful, somewhere in the mix you want at least one test to ensure that you haven't inadvertently introduced the same bug into both implementations.

No comments:

Post a Comment