Sunday, June 30, 2019

Usage Kata

The usage kata is intended as an experiment in applying Test Driven Development at the program boundary.

Create a command line application with the following behavior:

The command supports a single option: --help

When the command is invoked with the help option, a usage message is written to STDOUT, and the program exits successfully.

When the command is invoked without the help option, a diagnostic is written to STDERR, and the program exits unsuccessfully.

Example:
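A minimal sketch of such a command in Python, purely as an illustration of the spec above (the usage text and exit code are assumptions, not part of the kata):

```python
import sys

USAGE = "usage: example [--help]"  # message text is an assumption

def main(argv):
    # --help: usage message to STDOUT, exit successfully
    if "--help" in argv:
        print(USAGE)
        return 0
    # otherwise: diagnostic to STDERR, exit unsuccessfully
    print(USAGE, file=sys.stderr)
    return 64  # EX_USAGE; any nonzero code satisfies the kata

if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```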

Sunday, June 23, 2019

Variations on the Simplest Thing That Could Possibly Work

The Simplest Thing That Could Possibly Work is an unblocking technique.

Once we get something on the screen, we can look at it. If it needs to be more we can make it more. Our problem is we've got nothing.

Hunt The Wumpus begins with a simple prompt, which asks the human operator whether she would like to review the game instructions before play begins.
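Something like this, sketched in Python (the UPPERCASE spelling follows the classic listing as best I recall it; the function name is mine):

```python
def prompt_for_instructions(write=print):
    # the legacy game shouts its prompt in UPPERCASE
    write("INSTRUCTIONS (Y-N)")
```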


That's a decent approximation of the legacy behavior.

But other behaviors may also be of interest, either as replacements for the legacy behavior, or supported alternatives.

For instance, we might discover that the prompt should behave like a diagnostic message, rather than like data output, in which case we'd be interested in something like
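For example, a sketch of the diagnostic variant (names and spelling assumed):

```python
import sys

def prompt_as_diagnostic():
    # route the prompt to STDERR, keeping STDOUT clean for data
    print("INSTRUCTIONS (Y-N)", file=sys.stderr)
```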

Or we might decide that UPPERCASE lacks readability
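Sketched, that variant might look like (spelling assumed):

```python
def prompt_mixed_case(write=print):
    # same decision graph, friendlier spelling
    write("Instructions (y-n)")
```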

Or that the input hints should appear in a different order
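For instance (ordering assumed):

```python
def prompt_hints_reordered(write=print):
    # the "no" hint leads; purely a presentation decision
    write("INSTRUCTIONS (N-Y)")
```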


And so on.

The point here is that even this simple line of code spans many different decisions.  If those decisions aren't stable, then neither is the behavior of the whole.

When we have many tests evaluating the behavior of an unstable decision, the result is a brittle test suite.

A challenging aspect of this: within the scope of a single design session, behaviors tend to be stable.  "This is the required behavior, today."  If we are disposing of the tests at the end of our design session, then there's no great problem to solve here.


On the other hand, if the tests are expected to be viable for many design sessions, then protecting the tests from the unstable decision graph constrains our design still further.

One way to achieve a stable decision graph is to enforce a constraint that new behaviors are added by extension: the new behavior is delivered beside the old, and clients choose which behavior they prefer.  There's some additional overhead compared with making the change in "one" place.
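A sketch of that extension approach, with names of my own invention: the legacy behavior stays untouched, the new behavior arrives beside it, and the caller opts in.

```python
def legacy_prompt():
    return "INSTRUCTIONS (Y-N)"

def mixed_case_prompt():
    # added by extension: the legacy behavior is untouched
    return "Instructions (y-n)"

def render_prompt(style="legacy"):
    # clients choose which behavior they prefer
    prompts = {"legacy": legacy_prompt, "mixed": mixed_case_prompt}
    return prompts[style]()
```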

Another approach is to create bulkheads within the design, so that only single elements are tightly coupled to a specific decision, and the behavior of compositions is evaluated in comparison to their simpler elements.  James Shore describes this approach in more detail within his Testing Without Mocks pattern language.

What I haven't seen yet: a good discussion of when.  Do we apply YAGNI, and defend against the brittleness on demand?  Do we speculate in advance, and invest more in the design to insure against an uncertain future?  Is there a checklist that we can work, to reduce the risk that we foul our process for reducing the risk?

Thursday, June 20, 2019

Design Decisions After the First Unit Test

Recently, I turned my attention back to one of my early "unit" tests in Hunt The Wumpus.

This is an "outside-in" test, because I'm still curious to learn different ways that the tests can drive the designs in our underlying implementations.

Getting this first test to pass is about Hello World difficulty level -- just write out the expected string.

At that point, we have this odd code smell - the same string literal is appearing in two different places. What does that mean, and what do we need to do about it?

One answer, of course, is to ignore it, and move on to some more interesting behavior.

Another point of attack is to go after the duplication directly. If you can see it, then you can tease it apart and start naming it. JBrains has described the design dynamo that we can use to weave a better design out of the available parts and a bit of imagination.

To be honest, I find this duplication a bit difficult to attack that way. I need another tool for bootstrapping.

It's unlikely to be a surprise that my tool of choice is Parnas. Can we identify the decision, or chain of decisions, that contribute to this particular string being the right answer?

In this case, the UPPERCASE spelling of the prompt helps me to discover some decisions; what if, instead of shouting, we were to use mixed case?

This hints that, perhaps, somewhere in the design is a string table, that defines what "the" correct representation of this prompt might be.

Given such a table, we can then take this test and divide it cleanly into two parts - a sociable test that verifies that the interactive shell behaves like the string table, and a solitary test that verifies the string table itself is correct.
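A sketch of that division, with hypothetical names: the solitary test pins the string table, and the sociable test checks only that the shell agrees with the table, not what the table says.

```python
PROMPTS = {"instructions": "INSTRUCTIONS (Y-N)"}  # the string table

class Shell:
    def __init__(self, prompts, write):
        self.prompts = prompts
        self.write = write

    def start(self):
        # the shell defers to the table for the spelling
        self.write(self.prompts["instructions"])

# solitary test: the string table is correct
assert PROMPTS["instructions"] == "INSTRUCTIONS (Y-N)"

# sociable test: the shell behaves like the string table
lines = []
Shell(PROMPTS, lines.append).start()
assert lines == [PROMPTS["instructions"]]
```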

If we squint a bit, we might see that the prompt itself is composed of a number of decisions -- the new line terminator for the prompt, the format for displaying the hints about acceptable responses, the ordering of those responses, the ordering of the prompt elements (which we might want to change if the displayed language were right to left instead of left to right).

The prompt itself looks like a single opaque string, but there is duplication between the spelling of the hints and the checks that the shell will perform against the user input.
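One way to remove that duplication, sketched with assumed names: derive the displayed hint from the same data the shell uses to validate the user's input, so the acceptable responses are spelled exactly once.

```python
ACCEPTED = ("Y", "N")  # single source of truth

def hint(accepted=ACCEPTED):
    # the displayed hint is derived, not spelled a second time
    return "({})".format("-".join(accepted))

def is_valid(response, accepted=ACCEPTED):
    # the shell checks input against the same tuple
    return response.upper() in accepted
```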

Only a single line of output, and already there is a lot of candidate variation that we will want to have under control.

Do we need to capture all of these variations in our tests? I believe that depends on the stability of the behavior. If what we are creating is a clone of the original behavior -- well, that behavior has been stable for forty years. It is pretty well understood, and the risk that we will need to change it after all this time is pretty low. On the other hand, if we are intending an internationalized implementation, then the English spellings are only the first increment, and we will want a test design that doesn't require a massive rewrite after each change.