Cascade Faliure: TDD and Immutable Tests

I was working through the bowling game kata again recently, and discovered that there are three distinct moves that we make when playing. In any given move, we should see either a compile error or a test failure, but never both.

Legal TDD Moves

Refactoring

The most commonly discussed move is refactoring: we change the implementation of the production code, rerun the tests to ensure that the change hasn't broken any of them, and then commit or revert the change based on the outcome.

The important property here is that refactoring has a precondition that all of the tests are passing, and a postcondition that all of the tests are passing. In other words, the bar is always GREEN when you enter or leave the protocol.

It's during the refactoring move that the production code tends to evolve toward the more generic.
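A minimal sketch of the refactoring move, assuming a hypothetical `score` function that ignores spares and strikes (the name, signature, and checks are illustrative only). Rebinding `score` stands in for editing the production file; the checks themselves never change.

```python
# Hypothetical production code: total pins for a game without bonuses.
def score(rolls):
    total = 0
    for pins in rolls:
        total += pins
    return total


def test_gutter_game():
    assert score([0] * 20) == 0


def test_all_ones():
    assert score([1] * 20) == 20


def run_tests():
    test_gutter_game()
    test_all_ones()


run_tests()  # precondition: GREEN before the change


# The refactoring: same behavior, expressed more generally.
def score(rolls):
    return sum(rolls)


run_tests()  # postcondition: GREEN after the change, so commit
```

If the second `run_tests()` had raised, the move would end with a revert rather than a commit.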

Specifying

New behaviors are introduced by specifying. What we are doing in this phase is documenting additional constraints on the outcome (by creating a new test), and then hacking the production code to pass this new test in addition to all of the others.

Make the test work quickly, committing whatever sins are necessary in the process.

In this protocol, we start from a GREEN state, and proceed from there to RED and then to GREEN. This has the same cadence as the usual "[RED, GREEN], REFACTOR" mantra, but there's an additional restriction -- the new test we introduce is restricted to using the existing public API of the system under test.

Extending

New interfaces are introduced by extending. During this phase, no new constraints (no asserts) on behavior are introduced, only new affordances for working with the production code. The key idea in extending is this: you get from the RED bar to the GREEN bar by auto-generating code (by hand, if necessary).

Because of these restricted rules, we see only compile errors in this move, no runtime errors; there can't be any runtime errors, because (a) the new interface is not yet constrained by specifications and (b) the code required to reach the new interface already satisfies its own specification.

Extension is almost purely a discovery exercise -- just sit down and write code to the API you wish that you had, then pass the test with automatically generated code.
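A rough sketch of an extension, with some loud assumptions: the `Game` class and its `roll` and `score` methods are hypothetical names, and because Python has no separate compile step, a `NameError` at call time stands in for the compile error.

```python
def test_game_api():
    # Extension check: written against the API we wish we had.
    # It contains no asserts, so it constrains no behavior.
    game = Game()
    for _ in range(20):
        game.roll(0)
    game.score()


try:
    test_game_api()
except NameError as error:
    # RED: the interface does not exist yet.
    print("RED:", error)


class Game:
    # "Generated" stub: just enough surface for the check to run.
    def roll(self, pins):
        pass

    def score(self):
        return None


test_game_api()  # GREEN: nothing can fail at runtime, nothing is asserted
print("GREEN")
```

The stub was typed by hand here, but it is the kind of code an IDE quick-fix would produce.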

Immutable Tests

My discovery of extending came about because I was trying to find a protocol that would support the notion of an immutable test. I've never been particularly comfortable with "refactoring tests", and Greg Young eventually convinced me to just go with it.

The high-level argument goes something like this: the tests we write are messages to future developers about the intended behavior of the production code. As such, the messages should have semantics that can be used in a consistent way over time. So the versioning guidelines apply.

So "refactoring" a test really looks like adding a new test that specifies the same behavior as the old test, checking that both tests are passing, committing, and then in a separate step removing the unwanted, redundant check.
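As a sketch of that two-step sequence, assuming a hypothetical `score` function and made-up test names:

```python
def score(rolls):
    # Hypothetical production code, unchanged throughout.
    return sum(rolls)


def test_all_ones():
    # The old test: never edited in place.
    assert score([1] * 20) == 20


def test_twenty_rolls_of_one_pin_score_twenty():
    # The new test: specifies the same behavior with a clearer name.
    rolls = [1] * 20
    assert score(rolls) == 20


# Step 1: both tests exist and both pass, so commit.
test_all_ones()
test_twenty_rolls_of_one_pin_score_twenty()

# Step 2, in a separate commit: delete test_all_ones, which is now redundant.
```

At no point is an existing test modified; tests are only added or removed whole.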

Note that removing the accumulated cruft is optional; duplicate legacy tests are only a problem in two circumstances, analogous to the two phases that created them above. If you've learned that a specification is wrong, then you simply elide the tests that enforce that specification, and replace them with new tests that specify the correct behaviors. Alternatively, you've decided that the API itself is wrong -- which is a Great Big Deal in terms of backwards compatibility.

Experiences and Observations

Bootstrapping a new module begins with Extension. Uncle Bob's preference seems to be discovering the API in small steps, implementing enough of the API to pass each time a compile error appears. And that's fine, but in an immutable test world it produces a lot of cruft.

There are two reasonable answers. One is to follow the same habits, creating a _new_ test for each change to the API, and then removing the duplication in the test code later. It feels kind of dumb, to be honest, but not much more dumb than stopping after each compiler error. The other is the one I'm personally OK with: writing a single long test that demonstrates the API, then adding a bunch of checks afterwards to specify the behavior.

I'm less happy about trying to guess the API in advance, however. Especially if we haven't done a spike first -- we're just guessing at the names of things, and the wrong guess means a bunch of test redaction later.

At this point, I'm a big fan of hiding design decisions, so I would rather be conservative about the API. This means I tend to think about proceeding from supporting specific use cases to supporting a general API.

More precisely, unit tests are dominated by a single shape:

Given this function
When I provide this input
Then I expect the output to satisfy these constraints.

So to my mind, the seed that is in the middle of the first extension is a function. Because we are operating at the boundary, the input and the constraints will typically be represented using primitives. I don't want those primitives leaking into my API until I've made a specific choice to lift them there. So rather than extending a specific type into the test, I'll drop in a function that describes the capability we are promising. For the bowling game, that first test would look something like
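A minimal sketch of what such a first test might look like, assuming a hypothetical function name `score_bowling_game` and a game represented as a plain list of integers (both are illustrative choices, not taken from the kata):

```python
def score_bowling_game(rolls):
    # Hypothetical module function: a generated stub with a hard-coded result.
    return 0


def test_score_can_be_computed_from_rolls():
    # Extension only: primitives in, and no constraint on what comes out.
    rolls = [0] * 20
    score_bowling_game(rolls)


test_score_can_be_computed_from_rolls()
```

No type from inside the module appears in the test, so nothing about the implementation has been lifted into the API yet.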

Simplest thing that could possibly work. If this test passes, I've got code I can ship, that does exactly what it says on the tin. Then I add a second test that checks the result. Then I can engage the design dynamo, and perhaps discover other abstractions that I want to lift into the API.

The same trick works when I have a requirement for a new behavior, but no clear path to get there with the existing API; just create a new function, hard code in the answer, and then start experimenting with ideas within the module boundary until the required interface becomes clear. Behind the module, you can duplicate existing implementations and hack in the necessary changes under the green bar, and then run the dynamo from there.

And if the deadline comes crashing down on you before the refactoring is done? SHIP IT. All of the hacks are behind the module, behind the API, so you can clean up the code later without breaking the API.

Naming gets harder with immutable tests, because the extension needs a name separate from the specification, and you need two test names to refactor, and so on.

In the long run, extensions are just demonstrations -- here's an example of how the API can be invoked. It's something that you can cut and paste into production code. They are perhaps more of a thought construct than an actual artifact that should appear in the code base.

When using functions, the extension check becomes redundant as soon as you introduce the first constraint on its output.

There are some interesting similarities with TDD as if you meant it. In my mind, green bar means you can _ship_, so putting implementation code in the test method doesn't appeal at all. But if you apply those refactoring disciplines behind the module, being deliberately stingy about what you lift into the API, I think you've got something very interesting.

It shouldn't ever be necessary to lift an implementation into the API, other than the module itself. Interfaces only.

Eventually, the interesting parts of the module get surfaced as an abstraction itself, so that you can apply the same checks to a family of implementations.
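One way this could look, sketched under assumptions: `Scorer`, `LoopScorer`, and `SumScorer` are hypothetical names, and both implementations ignore spares and strikes to keep the example short.

```python
from typing import Protocol, Sequence


class Scorer(Protocol):
    # The abstraction lifted into the API: an interface, no implementation.
    def score(self, rolls: Sequence[int]) -> int: ...


class LoopScorer:
    def score(self, rolls: Sequence[int]) -> int:
        total = 0
        for pins in rolls:
            total += pins
        return total


class SumScorer:
    def score(self, rolls: Sequence[int]) -> int:
        return sum(rolls)


def check_scorer(scorer: Scorer) -> None:
    # The checks are written once, against the interface.
    assert scorer.score([0] * 20) == 0
    assert scorer.score([1] * 20) == 20


# The same checks applied to a family of implementations.
for implementation in (LoopScorer(), SumScorer()):
    check_scorer(implementation)
```

Adding a third implementation then costs one more entry in the tuple, not a new set of tests.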


Friday, July 7, 2017
