Cascade Faliure: Test Driven Diamonds

I've been reviewing, and ruminating upon, a bunch of experiences folks have reported with the diamond kata. A few ideas finally kicked free.

Overview

Seb noted this progression in the tests

The usual approach is to start with a test for the simple case where the diamond consists of just a single ‘A’
The next test is usually for a proper diamond consisting of ‘A’ and ‘B’; It’s easy enough to get this to pass by hardcoding the result.
We move on to the letter ‘C’: The code is now screaming for us to refactor it, but to keep all the tests passing most people try to solve the entire problem at once.

Alistair expressed the same conclusion more precisely

all the work that they did so far is of no use to them whatsoever when they hit ‘C’, because that’s where all the complexity is. In essence, they have to solve the entire problem, and throw away masses of code, when they hit ‘C’

George's answer is "back up and work in smaller steps", but I don't find that answer particularly satisfactory. I believe that there's something deeper going on.

Background

Long ago, I was trying to demonstrate simplest thing that could possibly work on the testdrivendevelopment discussion group. I had offered this demonstration:

And Kent Beck in turn replied

Do you have some refactoring to do first?

Sadly, I wasn't ready for that particular insight that day. But I come back to it from time to time, and am starting to feel like I'm beginning to appreciate the shape of it.

On Unit Tests

Unit tests are a check that the our latest change didn't break anything -- the behavior specifications that were satisfied before the change are still satisfied after the change.

Big changes can be decomposed into many small changes. So if we are working in small increments, we are going to be running the tests a lot. Complex behaviors can be expressed as many simple behaviors; so we expect to have a lot of tests that we run a lot.

This is going to suck unless our test suite has properties that make it tolerable; we want the tests to be reliable, we want the interruption of running the tests to be as small as possible -- we're running the automated checks to support our real work.

One way to ensure that the interruption is brief is to run the tests in parallel. Achieving reliability with concurrently executing tests introduces some constraints - the behaviors of the tests need to be isolated from one another. Any assets that are shared by concurrently executing tests have to be immutable (otherwise you are vulnerable to races).

This constraint effectively means that the automated checks are pure functions; given-when-then describes the inputs: a deterministic representation of the initial configuration of the system under test, a message, a representation of the specification to be satisfied, and the output is a simple boolean.

This test function is in turn composed of some test boiler plate and a call to some function provided by the system under test.

Implications

... since every piece of matter in the Universe is in some way affected by every other piece of matter in the Universe, it is in theory possible to extrapolate the whole of creation — every sun, every planet, their orbits, their composition and their economic and social history from, say, one small piece of fairy cake.

If we are, in fact, testing a function, then all of the variation in outputs must be encoded into the inputs. That's the duplication Beck was describing.

Beck's ability to discover duplication has been remarked on elsewhere, and I suspect that this principle is a big part of it. Much of the duplication is implicit -- you need to dig to make the implicit explicit.

Example

To show what I mean, let's take a look at the production code for the Moose team after the first passing test.

We've got a passing test, so before we proceed: where is the duplication? One bit of duplication is that the "A" that appears in the output is derived from the `A` in the input. Also, the number of lines in the output is derived from the input, as is the relative position of the A in that output.

There's nothing magic about the third test, or even the second, that removes the shackles - you can attack the duplication as soon as you have a green bar. All of the essential complexity of the space is included in the first problem, so you can immediately engage the design dynamo and solve the entire problem.

So when do we need more than one test? I think of the bowling game, where the nature of the problem space allows us to introduce the rules in a gradual way. The sequencing of the tests in the bowling game allows us to introduce the complexity of the domain in a controlled fashion - marks, and the complexities of extra throws in the final frame are deferred.

But it doesn't have to be that way - the mars rover kata gives you a huge slice of navigation complexity in the first example; once you have eliminated the duplication between the input and the output, you are done with navigation, and can introduce some other complexity (wrapping, or collisions).

Of course, if the chunk of complexity is large, you can easily get stuck. Uncle Bob touched on this in his write up of the word wrap kata; passing the next test can be really difficult if you introduce too many new ideas at once. Using a test order that introduces new ideas one at a time is an effective way to build up a solution. Being willing to suspend complex work to find a more manageable slice is an important skill.

And, in truth, I did the same thing in the rover; essentially pushing as far as I could with the acceptance test alone, then suspending that work in favor of a new test isolating the next behavior I needed, refactoring my solution into my acceptance work, and continuing on until I next got stuck.

In an exercise dedicated to demonstrating how TDD works, that's an important property for a problem to have: a rule set with multiple elements that can be introduced into the solution separately.

Of course, that's not the only desirable property; you also need an exercise that can be explained in a single page, one in which you can make satisfactory progress with useful lessons in a single session. That introduces a pressure to keep things simple.

But there's a trap: if you strip down the problem to a single rule (like the diamond, or Fibonacci), then you lose the motivation for the increments - the number of test cases you introduce before actually solving the problem becomes arbitrary.

On the other hand, if simply returning a hard coded value is good enough to pass all of our acceptance tests, then why are we investing design capital in refactoring the method? The details clearly aren't all that important to the success of the project yet.

My mental model right now is something along the ideas of a budget; each requirement for a new variation in outputs gives us more budget to actually solve the problem, rather than just half assing it. For a problem like Fibonacci, teasing out the duplication is easy, so we don't need many iterations of capital before we just solve the damn thing. Teasing the duplication out of diamond is harder, so we need a commensurately larger budget to justify the investment.

Increments vs Iterations

I found the approaches that used property based testing to be absolutely fascinating. For quite a while I was convinced that they were wrong; but my current classification of them is that they are alien.

In the incremental approach, we target a specific example which is complete, but not general. Because the example is complete, it necessarily satisfies all invariants that are common to all possible outputs. Any acceptance test that only needs the invariant satisfied will be able to get buy with this single solution indefinitely.

The property based approach seems to explore the invariant generally, rather than targeting a specific example. For diamond

The output is a square
With sides of odd length
With the correct length
With top-bottom symmetry
With left-right symmetry
With the correct positions occupied by symbols
With the correct positions occupied by the correct symbols

This encourages a different way of investing in the solution - which properties do we need right now? For instance, if we are working on layout, then getting the dimensions right is urgent (so that we can properly arrange to fit things together), but we may be able to use a placeholder for the contents.

For the mars rover kata, such an approach might look like paying attention first to the shape of the output; that each row of output describes a position and an orientation, perhaps a rule that there are no collisions in the grid, that all of the positions are located within the input grid, that there is an output row for each input row, and so on.

I don't think property based tests really help in this kata. But I think they become very interesting if the requirements change. To offer a naive example, suppose we complete the implementation, and the acceptance tests are passing, and then during the playback the customer recognizes that the requirements were created incorrectly. The correct representation of the diamonds should be rotated 90 degrees

With a property driven test suite, you get to keep a lot of the tests; you retire the ones that conflict with the corrected requirements, and resume iterating until the new requirements are satisfied? With the example based? approach, the entire test suite gets invalidated in one go -- except for the one trivial example where the changed requirements didn't change the outcome.

Actually, that's kind of interesting on its own. The trivial example allows you to fully express, by means of aggressively eliminating the duplication, a complete solution to the diamond problem in one go. But what it doesn't give you is an artifact that documents for your successor that the original orientation of the diamond, rather than its successor, is the "correct" one.

The more complicated example makes it easier to distinguish which orientation is the correct one, but is more vulnerable to changing requirements.

I don't have an answer, just the realization that I have a question.

Cascade Faliure

Sunday, October 22, 2017

Test Driven Diamonds