Friday, February 8, 2019

TDD: Hello World

As an experiment, I recently tried developing HelloWorld using a "test driven" approach.

You can review the commit history on GitHub.

In Java, HelloWorld is a one-liner -- except that you are trapped in the Kingdom of Nouns, so there is boilerplate to manage.
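For reference, the one-liner and its obligatory nouns:

```java
public class HelloWorld {
    public static void main(String[] args) {
        System.out.println("Hello World!");
    }
}
```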

Now you can implement HelloWorld in a perfectly natural way, and test it -- System.setOut allows you to replace the stream, so the write happens to a buffer that is under the control of the test.
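A sketch of such a test, assuming JUnit 5:

```java
import static org.junit.jupiter.api.Assertions.assertEquals;

import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import org.junit.jupiter.api.Test;

class HelloWorldTest {
    @Test
    void helloWorldWritesTheExpectedBytes() {
        PrintStream original = System.out;
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        System.setOut(new PrintStream(buffer));
        try {
            HelloWorld.main(new String[0]);
        } finally {
            System.setOut(original); // put the shared stream back
        }
        assertEquals("Hello World!" + System.lineSeparator(), buffer.toString());
    }
}
```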

It's not entirely clear to me what happens, however, if you have multiple tests concurrently writing to that stream.  The synchronization primitives ensure that each write is atomic, but there is a lot of time for the stream to be corrupted with other writes by the time the test harness gets to inspect the result.

This is why we normally design our tests so that they are isolated from shared mutable state; we want predictable results.  So in HelloWorld, this means we need to be able to ensure that the write happens to an isolated, rather than a shared stream.

So instead of testing HelloWorld::main, we end up testing HelloWorld::writeTo, or some improved spelling of the same idea.

Another pressure that shows up quickly is duplication - the byte sequence we want to test needs to be written into both the test and the implementation.  Again, we've learned patterns for dealing with that -- the data should move toward the test, so we have a function that accepts a message/prompt as an argument (in addition to passing along the target stream).  As an added bonus, we get a more general implementation for free.
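Something like this (the exact spelling is mine):

```java
import java.io.PrintStream;

public class HelloWorld {
    public static void main(String[] args) {
        writeTo(System.out, "Hello World!");
    }

    // the message moves toward the test; the implementation gets more general
    static void writeTo(PrintStream out, String message) {
        out.println(message);
    }
}
```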

Did we really need a more general implementation of HelloWorld?

Another practice that I associate with TDD is using the test as an example of how the subject may be used -- if the test is clumsy, then that's a hint that maybe the API needs some work.  The test needs a mutable buffer, and a PrintStream around it, and then needs to either unpack the contents of the buffer or express the specification as a byte array, when the natural primitive to use is a String literal.
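Concretely, the ceremony runs roughly like this:

```java
@Test
void messageIsWritten() {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    PrintStream out = new PrintStream(buffer);

    HelloWorld.writeTo(out, "Hello World!");

    // unpack the buffer back into the primitive we actually wanted to use
    assertEquals("Hello World!" + System.lineSeparator(), buffer.toString());
}
```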

You can, indeed, simplify the API, replacing the buffer with a useful object that serves a similar role.  At which point you either have two parallel code paths in your app (duplication of idea), or you introduce a bunch of additional composition so that the main logic always sees the same interface.

Our "testable" code turns more and more into spaghetti.

Now, it's possible that I simply lack imagination, and that once all of these tests are in place, you'll be able to refactor your way to an elegant implementation.  But to me, it looks like a trash fire.

There's a lesson here, and I think it is: left-pad.

Which is to say, not only is HelloWorld "so simple that there are obviously no deficiencies", but also that it is too simple to share; which is to say, the integration cost required to share the element exceeds the costs of writing it from scratch each time you need it.

Expressed a different way: there is virtually no chance that the duplication is going to burn you, because once written the implementation will not require any kind of coordinated future change (short of a massive incompatibility being introduced in the language runtime itself, in which case you are going to have bigger fires to fight).

Tuesday, February 5, 2019

The Influence of Tests

Some years ago, I became disenchanted with the notion that TDD uses tests to "drive" design in any meaningful way.

I came to notice two things: first, that the tests were just as happy to pass whatever cut-and-paste hack served as "the simplest thing that could possibly work"; and second, that all of the refactoring patterns are reversible.

So what is being test-infected buying me?

One interesting constraint on tests is that we want them to be reliable.  If the test subject hasn't changed, then we should get the same collection of observations if we move the test bed in time and space.  This in turn means we need to restrict the test's interaction with unstable elements -- I/O, the clock, the network, random entropy.  Our test subjects often expect to interact with these elements, so within the test environment we need to be able to provide a substitute.

So one of the design patterns driven by testing is "dependency injection".  Somewhere recently I came across the spelling "configurable dependency", which I think is better.  It helps to sharpen my attention on the fact that we are describing something that we change when we transition from a production environment to a test environment, which in turn suggests certain approaches.

But we're really talking about something more specific: configurable effects or perhaps configurable non-determinism.
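The clock makes a tidy example.  A sketch, with a hypothetical Greeter standing in for the test subject:

```java
import java.time.Clock;
import java.time.ZoneOffset;

class Greeter {
    private final Clock clock; // the configurable non-determinism

    Greeter(Clock clock) {
        this.clock = clock;
    }

    String greeting() {
        int hour = clock.instant().atZone(ZoneOffset.UTC).getHour();
        return hour < 12 ? "Good morning" : "Good afternoon";
    }
}

// production: new Greeter(Clock.systemUTC())
// test:       new Greeter(Clock.fixed(Instant.parse("2019-02-05T09:00:00Z"), ZoneOffset.UTC))
```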

The test itself doesn't care much about how much buffer surrounds the effect; but if we allow test coverage to influence us here, then we want the substituted code to be as small as we can manage.  To lean on Gary Bernhardt's terminology, we want the test to be able to control a thin imperative shell.

But then what?  We can keep pouring inputs through the shell without introducing any new pressures on the design.
Our designs must consist of many highly cohesive, loosely coupled components, just to make testing easy. -- Kent Beck, Test Driven Development by Example
I came across this recently, and it helps.

A key problem with the outside-in approach is that the "costs" of setting up a test are disproportionate to the constraint we are trying to establish.  Composition of the test subject requires us to draw the rest of the owl when all we need is a couple of circles.

To borrow an idea from Dan North, testing all the way from the boundary makes for really lousy examples, because the noise gets in the way of the idea.

The grain of the test should match the grain of the constraint it describes - if the constraint is small, then we should expect that the composition will have low complexity.

What we have then, I think, is a version of testing: the human author applying a number of heuristics when designing an automated check to ensure that the subject(s) will exhibit the appropriate properties.  In other words, we're getting a lot of mileage out of aligning the test/subject boundaries before we even get to green.

The kinds of design improvements that we make while refactoring?
There is definitely a family of refactorings that are motivated by the idea of taking some implementation detail and "lifting" it into the testable space. I think that you can fairly say that the (future) test is influencing the design that emerges during the refactoring.

I'm not convinced that we can credit tests for the results that emerge from the Design Dynamo.  My current thinking is that they are playing only a supporting role - repeatedly evaluating compliance with the constraints after each change, but not encouraging the selection of a particular change.

Further Reading

Mark Seemann: The TDD Apostate.

Michael Feathers: Making Too Much of TDD.

Sunday, January 27, 2019

Refactoring in the Wild

The past few days, I've been reviewing some code that looks like it could use some love.

It's part of an adapter that connects the core logic running in our process with some shared mutable state managed by Postgres.

It's code that works, rather than clean code that works.  The abstractions aren't very good, separable concerns are coupled to no great advantage, division of work is arbitrary.  And yet...

In the course of our recent work, one of the developers noticed that we could eliminate a number of calls to the remote database by making a smarter version of the main query.  We're using JDBC, and in this case the change required modifying the select clause of the query and making the matching change in the handling of the result set.

Both bits of code were in the same function and fit on the same screen.  There's a duplicated pattern for working with queries, connections, statements, recordsets -- but nobody had come along and tried to "eliminate" that duplication, so the change was easy.
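That pattern is the familiar JDBC dance -- roughly this shape, with the query invented for illustration:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import javax.sql.DataSource;

static List<String> findTasks(DataSource dataSource, String owner) throws SQLException {
    List<String> results = new ArrayList<>();
    String sql = "SELECT id, status FROM tasks WHERE owner = ?"; // hypothetical query
    try (Connection conn = dataSource.getConnection();
         PreparedStatement stmt = conn.prepareStatement(sql)) {
        stmt.setString(1, owner);
        try (ResultSet rs = stmt.executeQuery()) {
            while (rs.next()) {
                // the result-set handling sits on the same screen
                // as the select clause it has to match
                results.add(rs.getString("id") + ":" + rs.getString("status"));
            }
        }
    }
    return results;
}
```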

Also, because of changes we've made to the hosting of the database, we needed to change the strategy we use for managing the connection to the database.  That change ended up touching a lot of different methods that were accessing the connection data member directly - so we needed to do an Encapsulate Variable refactoring to avoid duplicating the details of the strategy everywhere.
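A sketch of the move, with the names and the connection strategy invented for illustration:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

class TaskStore { // hypothetical name for the adapter
    private final String url, user, password;
    private Connection connection;

    TaskStore(String url, String user, String password) {
        this.url = url;
        this.user = user;
        this.password = password;
    }

    // Encapsulate Variable: every former `this.connection` access now
    // goes through this accessor, so the strategy has exactly one home.
    private Connection getConnection() throws SQLException {
        if (connection == null || connection.isClosed()) {
            connection = DriverManager.getConnection(url, user, password);
        }
        return connection;
    }
}
```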

Applying that refactoring today was no more difficult than introducing it six years ago would have been.

YAGNI... yet.

Saturday, January 19, 2019

QotD: Kata Kata Kata!

If you don't force a kata to yield insights about working on real problems, you are wasting opportunities.

A Lesson of a Small Refactoring

We demand rigidly defined areas of doubt and uncertainty. -- Douglas Adams
Working through a Fibonacci number kata last night, I discovered that I was frequently using a particular pattern in my refactoring.
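Roughly this shape, using Fibonacci as the subject:

```java
int fibonacci(int n) {
    if (n > 1) {
        // not yet covered by tests: leave this branch strictly alone
        return fibonacci(n - 1) + fibonacci(n - 2);
    }
    // covered by tests: safe to refactor aggressively here
    return n;
}
```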

What this gives me in the code is clear separation between the use cases that are covered by tests and the ones that are not. I can then be more aggressive in the section of code that is covered by tests, because I've already mitigated the risk of introducing an inadvertent change.

The early exit variation I didn't discover until later, but I think I like it better.  In particular, when you get to a state where the bottom is done, the branch with the early exit gets excised and everything just falls through to the "real" implementation.

This same trick shows up again when it is time to make the next change easy:
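Roughly, in the early exit spelling:

```java
int fibonacci(int n) {
    // scaffolding: inputs not yet covered exit early, keeping the old behavior
    if (n > 2) {
        return fibonacci(n - 1) + fibonacci(n - 2);
    }
    // once the tests cover everything, the early exit above gets excised
    // and all inputs fall through to the "real" implementation
    int a = 0, b = 1;
    for (int i = 0; i < n; i++) {
        int next = a + b;
        a = b;
        b = next;
    }
    return a;
}
```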

It's a bit odd, in that we go from an implementation with full line coverage to one with additional logic that reduces the line coverage. I'm OK with this; it is a form of scaffolding that isn't expected to survive until the implementation is published.

What I find intriguing here is the handling of the code path that doesn't yet have tests, and the inferences one might draw for working on "legacy" code....

Thursday, January 10, 2019

Why I Practice TDD

TL;DR: I practice because I make mistakes; each mistake I recognize is an opportunity to learn, and thereby not make that mistake where it would be more expensive to do so.

Last night, I picked up the Fibonacci kata again.

I believe it was the Fibonacci kata where I finally got my first understanding of what Kent Beck meant by "duplication", many years ago.  So it is something of an old friend.  On the other hand, there's not a lot of meat to it - you go through the motions of TDD, but it is difficult to mine for insights.

Nonetheless, I made three significant errors, absolute howlers that have had me laughing at myself all day.

Error Messages

Ken Thompson has an automobile which he helped design. Unlike most vehicles, it has neither a speedometer, nor gas gauge, nor any of the other numerous idiot lights which plague the modern driver. Rather, if the driver makes a mistake, a giant "?" lights up in the center of the dashboard. "The experienced driver," says Thompson, "will usually know what's wrong."

In the early going of the exercise, I stopped to review my error messages, writing up notes about the motivations for doing them well.

Mid-exercise, I had a long stare at the messages.  I was in the middle of a test calibration, and I happened to notice a happy accident in the way that I had made the test(s) fail.

But it wasn't until endgame that I finally discovered that I had transposed the expected and actual arguments in my calls to the Assertions library.

The contributing factors -- I've abandoned TestNG in favor of JUnit5, but my old JUnit habits haven't kicked back in yet.  While all the pieces are still fresh in my mind, I don't really see the data.  I just see pass and fail, and a surprise there means revert to the previous checkpoint.
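The trap, concretely: TestNG's Assert puts the actual value first, while JUnit 5's Assertions puts the expected value first.

```java
// TestNG: actual first
org.testng.Assert.assertEquals(fibonacci(10), 55);

// JUnit 5: expected first
org.junit.jupiter.api.Assertions.assertEquals(55, fibonacci(10));

// Transpose them and every passing test still passes; the failure
// messages just lie about which value was expected.
```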

Fence Posts

The one really interesting point in Fibonacci is the relationship between a recursive implementation and an iterative one.  A recursive implementation falls out pretty naturally when removing the duplication (as I learned from Beck long ago), but as Steve McConnell points out in Code Complete: recursion is a really powerful technique and Fibonacci is just an awful application of it.

What recursion does give you is a lovely oracle that you can use to check your production implementation.  It's an intermediate step along the way.
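A sketch, with the production entry point hypothetically spelled Fibonacci.fibonacci:

```java
import static org.junit.jupiter.api.Assertions.assertEquals;
import org.junit.jupiter.api.Test;

class FibonacciTest {
    // the naive recursion: an awful implementation, a lovely oracle
    static long slowFibonacci(int n) {
        return n <= 1 ? n : slowFibonacci(n - 1) + slowFibonacci(n - 2);
    }

    @Test
    void productionImplementationAgreesWithTheOracle() {
        for (int n = 0; n <= 20; n++) {
            assertEquals(slowFibonacci(n), Fibonacci.fibonacci(n));
        }
    }
}
```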

In "refactoring" away from the recursive implementation, I managed to introduce an off-by-one error in my implementation.  As a result, my implementation was returning a Fibonacci number, just not the one it was supposed to.

Well, that happens, right?  You run the tests, they go red, and you discover your mistake.

Not. So Much.

I managed to hit a perfect storm of errors.  At the time, I had failed to recall the idea of using an oracle, so I didn't have that security blanket.  I went after the recursion as soon as it appeared, so I was working in the neighborhood of fibonacci(2), which is of course equal to fibonacci(1).

My other tests didn't catch the problem, because I had tests that measured the internal consistency of the implementation, rather than independently verifying the result.  That's one of the hazards that comes about from testing an implementation "in the calculus of itself."  Using property tests was "fine", but I needed one more unambiguous success before relying on them.

Independent Verification

One problem I don't think I've ever seen addressed in a Fibonacci kata is integer overflow.  The Fibonacci series grows roughly geometrically.  Java integers have a limited number of bits, so ending up with a number that is too big is inevitable.

But my tests kept passing - well past the point where my estimates told me I should be seeing symptoms.

The answer?  In Java, integer operators do not indicate overflow or underflow in any way (JLS 4.2.2).

And that's just as true in the test code as it is in the production code.  In precisely the same way that I was only checking the internal consistency of my own algorithm, I also fell into the trap of checking the internal consistency of the plus operator.

There are countermeasures one can take -- asserting a property that the returned value must be non-negative, for instance.
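A sketch of that property, plus the stricter alternative:

```java
import static org.junit.jupiter.api.Assertions.assertTrue;
import org.junit.jupiter.api.Test;

class OverflowTest {
    // property: every Fibonacci number is non-negative, so a sum that
    // wrapped past Integer.MAX_VALUE (first at fibonacci(47)) gets noticed
    @Test
    void valuesAreNonNegative() {
        for (int n = 0; n < 50; n++) {
            int result = Fibonacci.fibonacci(n); // hypothetical production entry point
            assertTrue(result >= 0, "overflow detected at n=" + n);
        }
    }
}

// or, in the implementation itself, refuse to wrap at all:
// Math.addExact throws ArithmeticException instead of silently wrapping.
//     int next = Math.addExact(a, b);
```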

Conclusions

One thing to keep in mind is that production code makes a really lousy test of the test code, especially in an environment where the tests are "driving" the implementation.  The standard for test code should be that it obviously has no deficiencies, which doesn't leave much room for creativity.

Monday, January 7, 2019

A study in ports and adapters

I've got some utilities that I use in the interactive shell which write representations of documents to standard out.  I've been trying to "go fast to go fast" with them, but that's been a bit rough, so I've been considering a switch to a "go slow to go smooth" approach.

My preference is to work from the outside in; if the discipline of test driven development is going to be helpful as a design tool, then we should really be allowing it to guide where the module boundaries belong.

So let's consider an outside-in approach to these shell applications.  My first goal is to establish how I can disconnect the logic that I want to test from the context that it runs in.  Having that separation will allow me to shift that logic into a test harness, where I can measure its behavior.

In the case of these little shell tools, there are three "values" I need to think about.  The command line arguments are one, the environment that the app is executing in is a second, and the output effect is the third.  So in Java, I can be thinking about a world that looks like this:
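Something like the following, with the exact signatures guessed:

```java
import java.io.PrintStream;
import java.util.Map;

public class TheApp {
    public static void main(String[] args) {
        // adapter: connect the Java runtime to the port
        f(args, System.getenv(), System.out);
    }

    // port: arguments, environment, and the output effect, all explicit
    static void f(String[] args, Map<String, String> env, PrintStream out) {
        // the logic we actually want to test lives here
    }
}
```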

In the ports and adapters lingo, TheApp::main is taking on the role of an adapter, connecting the Java runtime to the port: TheApp::f.

For testing, I don't want effects, I want values.  More specifically, I want values that I can evaluate for correctness independently from the test subject.  And so I'm aiming for an adapter that looks like a function.

So I'll achieve that by providing a PrintStream that I can later query for information.
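In sketch form, with a hypothetical runToBytes wrapped around TheApp::f:

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.util.Map;

// a test-side adapter: same port, but the effect is captured
// as a value the test can inspect
static byte[] runToBytes(String[] args, Map<String, String> env)
        throws UnsupportedEncodingException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (PrintStream out = new PrintStream(buffer, true, "UTF-8")) {
        TheApp.f(args, env, out);
    }
    return buffer.toByteArray();
}
```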

In my tests, I can then verify, independently, that the array of bytes returned by the adapter has the correct properties.

Approaching the problem from the outside like this, I'm no longer required to guess about my modules -- they naturally arise as I identify interesting decisions in my implementation through the processes of removing duplication and introducing expressive names.

Instead, my guesses are about the boundary itself: what do I need from the outside world? How can I describe that element in such a way that it can be implemented completely in process?

Standard input, standard output, and standard error are all pretty straightforward in Java, which somewhat obscures the general case.  Time requires a little bit of thought.  Connecting to remote processes may require more still -- we probably don't want to be working down at the level of a database connection pool, for example.

We'll often need to invent an abstraction to express each of the roles that are being provided by the boundary.  A rough cut that allows the decoupling of the outer world gives us a starting point from which to discover these more refined boundaries.