Monday, September 24, 2018

TDD: Lighting Talk

What follows is a rough sketch of the lightning talk that I gave at the Boston Software Coders meetup.

Tim Ottinger's image, hosted by Uncle Bob.

The Red Green Refactor mantra has been around for a long time, and the cycle shown above is a familiar one.  But if you look carefully, it's not really an accurate representation of what you are doing.

Let's borrow Tony Hoare's notion of preconditions and postcondions, and apply them to the various stages of the Red Green Refactor cycle.

The precondition for entering the RED stage is that we have an empty set of failing tests, and a set P (which may or may not be empty) of passing tests. The post condition is that we have a single element set of failing tests (t), and the same set P of passing tests. (If introducing a new test causes other tests to fail, then we have some undesirable coupling that needs to be addressed).

The precondition for entering the GREEN stage is that we have a single element set of failing tests (t), and a set P of passing tests. The postcondition is that the set of failing tests in empty, and the set of passing tests is the union of (t) and P. Which is to say, we've moved test (t) from the failing column to the passing column, without breaking anything else.

The precondition for entering the REFACTORING stage is that we have an empty set of failing tests (), and a set P of passing tests. The postcondition is that we have an empty set of failing tests (), and the same set P of passing tests.

What we really have here are two different cycles that happen to share starting and terminal states. The upper cycle is used to add more tests, which is to say to add constraints on the behavior of the system. The lower cycle is used to improve the implementation.

On the happy path, the upper cycle has two distinct movements; first, we introduce a new constraint by extending only the test code. Then we satisfy the constraint by modifying only the production code. This is our test calibration, we've proven that the test is actually measuring the production behavior.

But there are unhappy paths - for instance, we introduce a new test, but the test passes when first run. That puts us in an unwelcome state of green, where we haven't yet demonstrated that the code is measuring production. So first we have to drive ourselves to a red state by changing the production code, before we revert back to a true green.

If you are practicing test driven development, you are already doing these things. Take some time to observe yourself doing them, and I promise that you'll make some interesting discoveries.

Sunday, September 23, 2018

TDD: What do tests describe?

The second thing I want to highlight is that refactoring does not change the observable behavior of the software. The software still carries out the same function that it did before. Any user, whether an end user or another programmer, cannot tell that things have changed. -- Martin Fowler.
Most TDD kata feature functions as the system under test. Sure, the underlying implementation might be classes, or monads, or whatever the flavor of the month happens to be. But in the Bowling Game, or Fizz Buzz, or Mars Rover, the inputs completely determine the output.

"1", "2", "Fizz"... each question has one and only one right answer. There is precisely one observable behavior that satisfies the requirements.

But that's not generally true - there are a lot of ways that the requirements may not completely constrain the system. For instance, in the fractions kata, unless you introduce a constraint that further restricts the behavior in some way, adding two fractions can produce any of a number of distinguishable denominators.

Systems with "random" behaviors, or hidden information, face this complexity. My approach to the Fischer Chess kata usually involves isolating the random number generator from a function -- but there are still 960 different layouts that could count as row(0).

So - what's really going on?

Friday, September 14, 2018

On aggregates: values, references, and transactions.

Gods help me, I'm thinking about aggregates again.

I think aggregates, and the literature around aggregates, deserve a poor reputation.

Part of the problem is that the early descriptions of the concepts have implicit in them a lot of the assumptions of enterprise solutions written in Java circa 2003.  And these assumptions haven't aged very well - we are writing a lot of software today that has different design considerations.

Sunday, September 9, 2018

TDD: Fractions

Come back tomorrow night. We're gonna do fractions -- Tom Lehrer

Yesterday, I happened upon J. B. Rainsberger's World's Best Intro to TDD.  Series 1 features a demonstration of a fractions kata - what does it look like to produce an implementation of Fraction::add, "one test at a time"?

J. B. follows the usual program of TDD:
  1. Think
  2. Write down examples that may prove interesting
  3. While more interesting examples remain unchecked:
    1. Choose the example that can be implemented with the least work
    2. Make that example pass
But a couple things caught my attention in the first parts of his demonstration; he kept talking about scaffolds, and whether or not it was time to implement equals.  It's off, in a way that I didn't fully grasp yet, so I decided to dig in.

TDD ain't what I thought it was

This weekend, I've been puzzling against some of the tensions in test driven development; I suspect that the idea that just unlocked is going to be useful for a long time.

The high level fog looks like this: TDD is several separable ideas in a matching trench coats.

The primary idea is that testing should be a first class concern in our design.  We demonstrate that our design satisfies the testability requirement by... writing tests.  Or, alternatively, by asserting that our design is so simple that there are obviously no deficiencies.

After that, we start opening paint cans with screw drivers.

A separate idea that gets added in is the idea that we should use tests to specify the behavior of the system.  After all, the system is already easy to test, and executable documentation is cheap to verify.

Furthermore, this practice serves as real feedback on the design -- if we try to specify a behavior, and discover that we can not, in fact, check it, then we can reject the hypothesis that our selected design is in fact testable, and take appropriate action.

An unrelated idea: that we should determine our implementation by introducing the constraints of our specification one at a time, making small changes to the code each time to ensure that all of the introduced constraints are satisfied.

This last idea is further refined by the idea that the small changes fall into a particular pattern: take the implementation used to calibrate the test, remove the duplication, then apply refactoring as necessary to clean up the implementation.

Saturday, September 8, 2018

TDD: Triangulation

Triangulation feels funny to me.  I use it only when I am completely unsure of how to refactor.  If I can see how to eliminate duplication between code and tests and create the general solution, then I just do it.  Why would I need to write another test to give me permission to write what I probably could have written in the first place.

There's a weird symptom in the TDD space - novices don't know when to write their production code.
When do you write the “real” code in TDD?
Seb Rose 
The code is now screaming for us to refactor it, but to keep all the tests passing most people try to solve the entire problem at once.
Nat Pryce
When test-driving development with example-based tests, an essential skill to be learned is how to pick each example to be most helpful in driving development forwards in small steps. You want to avoid picking examples that force you to take too big a step (A.K.A. “now draw the rest of the owl”).
I started thinking again about why that is.

Back in the day when I was first learning TDD, I -- and others around me -- seemed to have this weird blind spot: the idea was that introducing tests could move you from one local minima to another in your implementation.

One of the riddles was "how do you get to recursion?"  What kind of test could you introduce that would force you to abandon your preferred form of "simplest thing that could possibly work" in favor of code that was actually correct.

And the answer is almost "you can't".  If we are only constraining behaviors -- the messages into the system, and the messages that we come back out, then of course none of these tests are going to force any specific refactoring -- after all, disciplined refactoring doesn't change the observable behavior.

I say "almost", because property based testing allows you to introduce so many different constraints at the same time, it basically forces you to solve the constraint for an entire range of inputs, rather than a tiny subset.

Back in the day, Kent Beck wrote to me
Do you have some refactoring to do first?
And what I have come to see since then is that I hadn't recognized, and therefore hadn't addressed, the duplication sitting in front of me.

Two Axioms of Unit Testing:
  1. If you have written an isolated test, then it can be expressed as a function, mapping an input message to an output message.
  2. If you have written a function, then the output duplicates the input.
In discussing recursion,  Kent demonstrated that the problem falls apart really quickly if you can refactor the output until the solution is obvious.

factorial(n == 4)
 -> 24
 -> 4 * 6
 -> 4 * factorial(3)
 -> n * factorial(n-1)

JBrains writes of the design dynamo, but I don't think that's what we are doing here.

We started from a failing test.  We hard code "the answer" into the production code, which proves that the test is properly calibrated.  Now we can wire up the outputs to the inputs -- if that changes the observable behavior, the tests will catch an error.  When the trivial duplication is removed, we can then assess what our next move should be.

The bowling game kata could start out this way - Uncle Bob began by specifying a gutter game; twenty rolls into the gutter should produce a score of zero.

And if you look carefully at what the score is, it's just the sum of the inputs.  So you make that replacement, and ta-da, the test passes, the duplication is gone, and you can think about whether or not there are abstractions worth discovering in the design.

Uncle Bob, of course, didn't do this -- he skips directly to testAllOnes without addressing this duplication, and then applies this change later.

And I wonder if that set the tone -- that there was some minimum number of pathological solutions that needed to be attempted before you can actually start doing something real.

Kent Beck
I get paid for code that works, not for tests, so my philosophy is to test as little as possible to reach a given level of confidence
An implementation that adds up bowling pins obviously has no deficiencies; all the lines of code are covered, we tried the API that we have defined and can evaluate its suitability in our context.

We don't need MOAR TESTS to achieve these ends.

In other words, the introduction of other tests are satisfying some other need, and we should be thinking about that need, and where it fits into our development process.

Now, I think it's pretty clear that a single example is not an honest representation of the complete specification.  But if that's the goal, deliberately introducing a bunch of broken implementations to ensure that the specification is satisfied is an expensive way to achieve that result.