Wednesday, January 4, 2017

TDD: A Tale of Two Katas

Over the holiday, I decided to re-examine Peter Seibel's Fischer Random Chess Kata.

I have, after all, a lot more laps under my belt than when I was first introduced to it, and I thought some of my recent studies would allow me to flesh things out more thoroughly.

Instead, I got a really educational train wreck out of it.  What I see now, having done the kata twice, is that the exercise (once you get the insight to separate the non-deterministic generator from the board production rules) is about applying specifications to a value type, rather than evaluating the side effects of behaviors.

You could incrementally develop a query object that verifies that some representation of a fair chess row satisfies the constraints -- the rules would come about as you add new tests to verify that some input satisfies/does-not-satisfy the specification, until the system under test correctly understands the rules.  Another way of saying the same thing: you could write a factory that accepts as input an unconstrained representation of a row, and produces a validated value type for only those inputs that satisfy the constraints.
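
A minimal sketch of that second framing, in Java -- FairRow, parse, and isFair are names I'm making up for illustration, and the row is taken as an eight character string of piece letters, files a through h:

    import java.util.Arrays;

    // A validated value type: the factory only yields a FairRow for inputs
    // that satisfy the Fischer Random constraints.
    final class FairRow {
        private final String pieces;   // e.g. "RNBQKBNR", the back row from file a to h

        private FairRow(String pieces) {
            this.pieces = pieces;
        }

        static FairRow parse(String pieces) {
            if (!isFair(pieces)) {
                throw new IllegalArgumentException("not a fair starting row: " + pieces);
            }
            return new FairRow(pieces);
        }

        // The specification; each clause would arrive as a new test demanded it.
        static boolean isFair(String pieces) {
            if (pieces.length() != 8) return false;
            // the row must be a permutation of the standard set of pieces
            char[] sorted = pieces.toCharArray();
            Arrays.sort(sorted);
            if (!new String(sorted).equals("BBKNNQRR")) return false;
            // the bishops must stand on opposite colored squares
            if (pieces.indexOf('B') % 2 == pieces.lastIndexOf('B') % 2) return false;
            // the king must stand between the two rooks
            int king = pieces.indexOf('K');
            return pieces.indexOf('R') < king && king < pieces.lastIndexOf('R');
        }
    }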

But, as RFC 1149.5 taught us, you can't push a stateless query out of a local minimum.

Realizing this -- that the battle I thought I was going to write about was doomed before I even reached the first green bar -- I decided to turn my attention to the bowling game kata.

Amusingly enough, I started from a first passing test, and then never moved off of the original green bar.

Part of the motivation for the exercise was my recent review of Uncle Bob's Dijkstra's Algorithm kata.  I wanted to play more with the boundaries in the test, and get a better feel for how they arise as a response to the iteration through the tests.

So I copied (less than perfectly) Uncle Bob's first green bar, and then started channeling my inner Kent Beck:

Do you have some refactoring to do first?
With that idea in mind, I decided to put my attention on "maximizes clarity".  There's some tension in here -- the pattern that emerges is so obviously generic that one is inclined to suggest I was the victim of big design up front, and that I wasn't waiting for the duplication in tests to realize that pattern for me.  So on some level, one might argue that I've violated YAGNI.  On the other hand, if you can put a name on something, then it has already been realized -- you are simply choosing to acknowledge that realization, or not.

In doing that, I was surprised -- there are more boundaries in play than I had previously recognized.

There's a boundary between the programmer and the specification designer.  We can't think at the IDE and have it do the right thing -- we actually need to type something; furthermore, that thing we type needs to satisfy a generic grammar (the programming language).

The specification designer is the code that is essentially responsible for "this is what the human being really meant."  It's the little DSL we write that makes introducing new specifications easy.

There's a boundary between specification design and the test harness -- we can certainly generate specifications for a test in more than one way, or re-use a specification for more than one test.  Broadly, the specification is a value type (describing input state and output state), whereas the test is behavior -- organized interactions with the system under test.
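
For the bowling game, the value-type half of that split might look like this sketch (ScoringSpec and gutterGame are invented names); the behavior half -- feeding spec.rolls to the game and comparing its score against spec.expectedScore -- stays with the test:

    // The specification: a value type describing input state and expected output state.
    final class ScoringSpec {
        final int[] rolls;          // input: the sequence of rolls
        final int expectedScore;    // output: the score those rolls should produce

        ScoringSpec(int[] rolls, int expectedScore) {
            this.rolls = rolls;
            this.expectedScore = expectedScore;
        }

        // The little DSL: introducing a new specification is a single line of intent.
        static ScoringSpec gutterGame() {
            return new ScoringSpec(new int[20], 0);   // twenty rolls, all zeroes, score to zero
        }
    }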

The interface between the test and the system under test is another boundary.  The specification describes state, but it is the responsibility of the test to choose when and how to share that state with the system under test.

Finally, there is the boundary within the system under test -- between the actual production code we are testing, and the adapter logic that aligns the interface exposed by our production code with that of the test harness.

This bothered me for a while -- I knew, logically, that this separation was necessary if the production code was to have the freedom to evolve.  But I couldn't shake the intuition that I could name that separation now, in which case it should be made explicit.

The following example is clearly pointless code.
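
Something in this spirit, as a JUnit flavored sketch:

    import static org.junit.Assert.assertEquals;

    import org.junit.Test;

    public class PointlessTest {
        @Test
        public void gutterGame() {
            // both arguments are literals; nothing here consults the system under test
            assertEquals(0, 0);
        }
    }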

And yet this is the code we write all the time.
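
For the bowling game, the first green bar usually amounts to a sketch like this -- a gutter game test, plus just enough hard-coded production code to stay on the bar:

    import static org.junit.Assert.assertEquals;

    import org.junit.Test;

    public class BowlingGameTest {
        @Test
        public void gutterGame() {
            Game game = new Game();
            for (int i = 0; i < 20; i++) {
                game.roll(0);
            }
            assertEquals(0, game.score());
        }
    }

    // just enough production code to reach the green bar
    class Game {
        void roll(int pins) {
            // not needed yet
        }

        int score() {
            return 0;   // hard coded
        }
    }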

And that's not a bad idea -- we are checking that two outcomes, produced in different ways, match.
But the spellings are wrong: the names weren't telling me the whole story. In particular, I'm constantly having problems remembering the convention of which argument comes first.

It finally sank in: the boundary that I am trying to name is time. A better spelling of the above is:
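
A sketch, where the names then and now carry the idea (the particular assertion library doesn't matter):

    // the gutter game test again, with the time boundary named explicitly
    @Test
    public void gutterGame() {
        int then = 0;                  // the outcome committed to when the test was written
        Game game = new Game();
        for (int i = 0; i < 20; i++) {
            game.roll(0);
        }
        int now = game.score();        // the outcome the current implementation produces

        assertEquals(then, now);
    }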

We write a failing test, then update an implementation to satisfy a check written in the past, and then we refactor the implementation, continuing to measure that the check is still satisfied after each change. If we've done it right, then we should be able to use our earliest checks until we get an actual change in the required behavior.

I also prefer a spelling like this, because it helps to break the symmetry that gives me trouble -- I don't need to worry any longer about whether or not I'm respecting screen direction; I just need to distinguish then from now.

The adapter lives in the same space; it's binding the past interface with the present interface.  The simplest thing that could possibly work has those two things exactly aligned.  But there's a trap there -- it's going to be a lot easier to make this seam explicit now, when there is only a single test, than later, when you have many tests using the wrong surface to communicate with your production code.

There's another interpretation of this sequence.  In many cases, the implementation we are writing is an internal element of a larger application.  So when we write tests specifically for that internal element, we are (implicitly) creating a miniature application that communicates more directly with that internal element.  The test we have written is communicating with the adapter application, not with the internal element.

This happens organically when you work from the outside in -- the tests are always interfacing with the outer surface, while the rich behaviors are developed within.

The notion of the adapter as an application is a deliberate one -- the dependency arrow points from the adapter to the test harness.  The adapter is a service provider, implementing an interface defined by the test harness itself.  The adapter is also interfacing with the production code; so if you were breaking these out into separate modules, the adapter would end up in the composition root.
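
Sketching that arrangement (ScoringService and BowlingScoringAdapter are invented names here; the interface lives with the test harness's module):

    // Defined by, and owned by, the test harness.
    interface ScoringService {
        int scoreOf(int[] rolls);
    }

    // The adapter application: it implements the harness's interface and is the
    // first consumer of the production interface, so it belongs with the composition root.
    final class BowlingScoringAdapter implements ScoringService {
        @Override
        public int scoreOf(int[] rolls) {
            Game game = new Game();        // production code
            for (int roll : rolls) {
                game.roll(roll);
            }
            return game.score();
        }
    }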

A key benefit of these separations: when you want to take a new interface out for a "test drive", you don't need to touch the tests in any way -- the adapter application serves as the first consumer of the new production interface.


Note that the checks were defined in the past, which is the heuristic that reminds you that checking is the responsibility of the test harness, not the adapter application.  The only check that the adapter can reliably perform is "is the model in an internally consistent state", which is nice, but the safety to refactor comes from having an independent confirmation that the application outputs are unchanged.

Another benefit to this exercise: it has given me a better understanding of primitive obsession.

Boundaries are about representations, and primitives are a natural language for describing representations.  Ergo, it makes sense that we describe our specifications with primitives, and we use primitives to communicate across the test boundary to the (implicit) adapter application, and from there to our proposed implementation.  If we aren't aware of the intermediate boundaries, or are deferring them, there's bound to be a lot of coupling between our specification design and the production implementation.
