Wednesday, February 20, 2019

Bowling Driven Design

The bowling game kata is odd.

I've never seen Uncle Bob demonstrate the kata himself, so I can't speak to its presentation or its effectiveness beyond the written description.  It has certainly inspired some number of readers, including myself at various times.


I'm becoming more comfortable with the idea that the practice is just a ritual... a meditation... bottle shaking....

In slide two of the PowerPoint deck, the first requirement is presented: Game must implement a specific API.

What I want to call attention to here: this API is not motivated by the tests. The requirements are described on slide #3, followed by a modeling session on slides #4-#9 describing some of the underlying classes that may appear. As of slide #52 at the end of the deck, none of the classes from the modeling session have appeared in the solution.

Furthermore, over the course of writing the tests, some helper methods are discovered; but those helper methods remain within the boundary of the test -- the API remains fixed.

All of the complexity in the problem is within the scoring logic itself, which is a pure function -- given a complete legal sequence of pin falls, compute the game score. Yet the API of the test subject isn't a function, but a state machine.  The scoring logic never manages to escape from the boundaries of the Game object -- boundaries that were set before we had any tests at all.

If you want to learn to think the way I think, to design the way I design, then you must learn to react to minutia the way I react. -- Uncle Bob

What the hell are we driving here?  Some logic, I suppose.
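
For contrast, here is a minimal sketch of the scoring logic written as the pure function described above. The names BowlingScore and score are assumptions of mine and do not come from the deck, and the sketch assumes its input is a complete, legal game; it does no validation.

```java
// Hypothetical names; not taken from the kata's slides.
final class BowlingScore {
    private BowlingScore() {}

    /** Scores a complete, legal game from its pin falls, in order. */
    static int score(int[] rolls) {
        int total = 0;
        int i = 0; // index of the first roll of the current frame
        for (int frame = 0; frame < 10; frame++) {
            if (rolls[i] == 10) {
                // Strike: ten plus the next two rolls.
                total += 10 + rolls[i + 1] + rolls[i + 2];
                i += 1;
            } else if (rolls[i] + rolls[i + 1] == 10) {
                // Spare: ten plus the next roll.
                total += 10 + rolls[i + 2];
                i += 2;
            } else {
                // Open frame: just the pins knocked down.
                total += rolls[i] + rolls[i + 1];
                i += 2;
            }
        }
        return total;
    }
}
```

Nothing here holds state between calls, so a check can hand it a whole game at once instead of feeding rolls into an object one at a time.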

Monday, February 18, 2019

Aggregates: Separation of concerns

A question on Stack Overflow led me to On Aggregates and Domain Service interaction, written by Marco Pivetta in January of 2017.  Of particular interest to me were the comments by Mathias Verraes.

What I recognized is that the description of aggregates given by Mathias is very similar to the description of protocols given by Cory Benfield.  So I wanted to try to write that out, long hand.

Aggregates are information (state), and also logic that describes how to integrate new information with the existing information.  In accordance with the usual guidelines of object oriented development, we package the data structure responsible for tracking the information with the rules for mutating the data structure and the transformations that use the data structure to answer queries.

Because the responsibility of the object is this data structure and its operations, and because this data structure is a local, in memory artifact, there's no room (in the responsibility sense) for effects.

How do we read, or write, information that isn't local to the aggregate?

The short answer is that responsibility goes out into the application layer (which in turn may delegate the responsibility to the infrastructure layer; those details aren't important here).

The aggregate incorporates information and decides what needs to be done; the application layer does it, and reports the results back to the aggregate as new information.

Spelling the same idea a different way - the aggregate is a state machine, and it supports two important queries.  One is "what representation can I use to recover your current state?", so that we can persist the work rather than needing to keep the aggregate live in memory for its entire lifetime.  The other is "what work can the application layer do for you?".

Put another way, we handle the aggregate's demands for remote data asynchronously.  The processing of the command ends when the model discovers that it needs data which isn't available.  The application queries the model, discovers the need for more data, and can then fetch the data.  If the data is available now, it is passed to the aggregate, which can integrate that information into its state.

If the information isn't available now, then we can simply persist the existing work, and resume it later when the information does become available.  This might look like a scheduled callback, for example.
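
The exchange described above might be sketched as follows. Everything here is hypothetical and chosen only for illustration: the Quote aggregate, its comma-separated snapshot format, and a Map standing in for whatever remote lookup the application layer would really perform.

```java
import java.util.List;
import java.util.Map;

// Hypothetical aggregate: a quote that cannot be totalled until it learns a
// unit price, which lives somewhere outside the aggregate.
final class Quote {
    private final String sku;
    private final int quantity;
    private Integer unitPrice; // null until the application reports it

    Quote(String sku, int quantity, Integer unitPrice) {
        this.sku = sku;
        this.quantity = quantity;
        this.unitPrice = unitPrice;
    }

    // Query 1: a representation from which the current state can be recovered.
    String snapshot() {
        return sku + "," + quantity + "," + (unitPrice == null ? "" : unitPrice.toString());
    }

    static Quote restore(String snapshot) {
        String[] parts = snapshot.split(",", -1); // -1 keeps the trailing empty field
        Integer price = parts[2].isEmpty() ? null : Integer.valueOf(parts[2]);
        return new Quote(parts[0], Integer.parseInt(parts[1]), price);
    }

    // Query 2: the work the application layer can do on the aggregate's behalf.
    List<String> pricesNeeded() {
        return unitPrice == null ? List.of(sku) : List.of();
    }

    // New information arriving from outside; no I/O happens in here.
    void priceReported(String forSku, int price) {
        if (sku.equals(forSku)) {
            unitPrice = price;
        }
    }

    int total() {
        if (unitPrice == null) {
            throw new IllegalStateException("price not yet known");
        }
        return quantity * unitPrice;
    }
}

// Application layer: asks what is needed, fetches it, and reports back.
final class QuoteApplication {
    // availablePrices stands in for a remote lookup (an assumption of this sketch).
    static String advance(String snapshot, Map<String, Integer> availablePrices) {
        Quote quote = Quote.restore(snapshot);
        for (String sku : quote.pricesNeeded()) {
            Integer price = availablePrices.get(sku);
            if (price != null) {
                quote.priceReported(sku, price);
            }
            // Otherwise there is nothing to do yet; the caller persists the
            // returned snapshot and calls advance again later.
        }
        return quote.snapshot();
    }
}
```

Calling advance with a price list that lacks the SKU returns the snapshot unchanged, which is the "persist the existing work, and resume it later" case; calling it again once the price exists completes the quote.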

If your model already understands "time", then it can report its own timing requirements to the application, so that those can also be included in the scheduling.


Refactoring: paint by numbers

I've been working on a rewrite of an API, and I've been trying to ensure that my implementation of the new API has the same behavior as the existing implementation.  This has meant building up a suite of regression checks, and an adapter that allows me to use the new implementation to support a legacy client.

In this particular case, there is a one-to-one relationship between the methods in the old API and the new -- the new variant just uses different spellings for the arguments and introduces some extra seams.

The process has been "test first" (although I would not call the design "test driven").  It begins with a check, using the legacy implementation.  This stage of the exercise is to make sure that I understand how the existing behavior works.

We call a factory method in the production code, which acts as a composition root to create an instance of the legacy implementation. We pass a reference to the interface to a check, which exercises the API through a use case, validating various checks along the way.
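
A minimal sketch of that arrangement, using stand-in names: BookingV1, LegacyBooking, Production.legacy and BookingCheck are all hypothetical, and the two-method interface is far smaller than the real API.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical legacy API, reduced to two methods for illustration.
interface BookingV1 {
    String book(String origin, String destination);
    boolean isBooked(String trackingId);
}

// Stand-in for the existing implementation.
final class LegacyBooking implements BookingV1 {
    private final Set<String> booked = new HashSet<>();

    @Override
    public String book(String origin, String destination) {
        String trackingId = origin + "-" + destination + "-" + (booked.size() + 1);
        booked.add(trackingId);
        return trackingId;
    }

    @Override
    public boolean isBooked(String trackingId) {
        return booked.contains(trackingId);
    }
}

final class Production {
    // Composition root: the only place that names the concrete class.
    static BookingV1 legacy() {
        return new LegacyBooking();
    }
}

final class BookingCheck {
    // The check sees only the interface, so any implementation can be passed in.
    static void run(BookingV1 api) {
        String trackingId = api.book("HKG", "HEL");
        require(api.isBooked(trackingId), "a booked cargo should be found");
        require(!api.isBooked("no-such-id"), "an unknown id should not be found");
    }

    private static void require(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }
}
```

The test itself is then a single line, BookingCheck.run(Production.legacy()), and the same check can later be pointed at a different factory method.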

Having done this, we then introduce a new test; this one calls a factory method that produces an instance of an adapter, which acts as a bridge between legacy clients and the new API.

The signature of the factory method here is a consequence of the pattern that follows, where I work in three distinct cycles:
  • RED: begin a test calibration, verifying that the test fails
  • GREEN: complete the test calibration, verifying that the test passes
  • REPLACE: introduce the adapter into the mix, and verify that the test continues to pass.

To begin, I create an implementation of the API that is suitable for calibrating a test, by ensuring that a broken implementation fails. This is straightforward; I just need to throw UnsupportedOperationExceptions.

Then I create an abstract decorator, implementing the legacy API by simply dispatching each method to another implementation of the same interface.

And then I define my adapter, which extends the wrapper of the legacy API, and also accepts an instance of the new API.

Finally, with all the plumbing in place, I return a new instance of this class from the factory method.
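
The plumbing described in the last four paragraphs could look roughly like this. Only the name TEST_CALIBRATION_FACADE comes from this post; the interfaces, class names and the factory signature are assumptions made for the sketch.

```java
// Hypothetical legacy and replacement APIs.
interface BookingV1 {
    String book(String origin, String destination);
    boolean isBooked(String trackingId);
}

interface BookingV2 {
    String register(String route);
    boolean exists(String trackingId);
}

// Calibration target: every call fails, so an unconverted method cannot pass.
final class CalibrationFacade implements BookingV1 {
    @Override
    public String book(String origin, String destination) {
        throw new UnsupportedOperationException("book");
    }

    @Override
    public boolean isBooked(String trackingId) {
        throw new UnsupportedOperationException("isBooked");
    }
}

// Abstract decorator: dispatches each call to another BookingV1.
abstract class BookingV1Wrapper implements BookingV1 {
    private final BookingV1 target;

    protected BookingV1Wrapper(BookingV1 target) {
        this.target = target;
    }

    @Override
    public String book(String origin, String destination) {
        return target.book(origin, destination);
    }

    @Override
    public boolean isBooked(String trackingId) {
        return target.isBooked(trackingId);
    }
}

// Adapter: no overrides yet, so every call falls through to the wrapped target.
class V1ToV2Adapter extends BookingV1Wrapper {
    protected final BookingV2 v2; // unused until methods are moved across

    V1ToV2Adapter(BookingV1 fallback, BookingV2 v2) {
        super(fallback);
        this.v2 = v2;
    }
}

final class AdapterFactory {
    static final BookingV1 TEST_CALIBRATION_FACADE = new CalibrationFacade();

    static BookingV1 adapter(BookingV2 v2) {
        return new V1ToV2Adapter(TEST_CALIBRATION_FACADE, v2);
    }
}
```

At this stage the adapter is deliberately useless: any call made through it throws, which is exactly what the RED step needs.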

My implementation protocol then looks like this: first, I run the test using the adapter as is. With no overrides in place, each call in the API gets directed to TEST_CALIBRATION_FACADE, which throws an UnsupportedOperationException, and the check fails.

To complete the test calibration, I override the implementation of the method(s) I need locally, directing them to a local instance of the legacy implementation, like so:
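
A minimal sketch of such an override, assuming the adapter is written as an anonymous subclass inside the factory method. The types are hypothetical stand-ins repeated here so that the sketch compiles on its own, and the new API is left out for brevity.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical legacy API and implementation.
interface BookingV1 {
    String book(String origin, String destination);
    boolean isBooked(String trackingId);
}

final class LegacyBooking implements BookingV1 {
    private final Set<String> booked = new HashSet<>();

    @Override
    public String book(String origin, String destination) {
        String trackingId = origin + "-" + destination + "-" + (booked.size() + 1);
        booked.add(trackingId);
        return trackingId;
    }

    @Override
    public boolean isBooked(String trackingId) {
        return booked.contains(trackingId);
    }
}

// Abstract decorator over the legacy interface.
abstract class BookingV1Wrapper implements BookingV1 {
    private final BookingV1 target;

    protected BookingV1Wrapper(BookingV1 target) {
        this.target = target;
    }

    @Override
    public String book(String origin, String destination) {
        return target.book(origin, destination);
    }

    @Override
    public boolean isBooked(String trackingId) {
        return target.isBooked(trackingId);
    }
}

final class AdapterFactory {
    static final BookingV1 TEST_CALIBRATION_FACADE = new BookingV1() {
        @Override
        public String book(String origin, String destination) {
            throw new UnsupportedOperationException("book");
        }

        @Override
        public boolean isBooked(String trackingId) {
            throw new UnsupportedOperationException("isBooked");
        }
    };

    static BookingV1 adapter() {
        // Local legacy instance, used only to finish calibrating the test.
        final BookingV1 legacy = new LegacyBooking();
        return new BookingV1Wrapper(TEST_CALIBRATION_FACADE) {
            @Override
            public String book(String origin, String destination) {
                return legacy.book(origin, destination);
            }
            // isBooked is not overridden, so it still reaches the facade.
        };
    }
}
```

Only the overridden method works; a check that touches any other method keeps failing until that method is dealt with in its own cycle.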

The test passes, of course, because we're using the same implementation that we used to set up the use case originally.

In the replace phase, the legacy implementation gets inlined here in the factory method, so that I can see precisely what's going on, and I can start moving the implementation details to the new API.

Once I've reached the point that all of the methods have been implemented, I can ditch this scaffolding, and provide an implementation of the legacy interface that delegates all of the work directly to v2; no abstract wrapper required.
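
The end state might look like the sketch below: a plain implementation of the legacy interface with a one-line delegation per method. As before, the interfaces are hypothetical, and InMemoryBookingV2 exists only so the example can run.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical legacy and replacement APIs.
interface BookingV1 {
    String book(String origin, String destination);
    boolean isBooked(String trackingId);
}

interface BookingV2 {
    String register(String route);
    boolean exists(String trackingId);
}

// Stand-in v2 implementation so the sketch is runnable.
final class InMemoryBookingV2 implements BookingV2 {
    private final Set<String> ids = new HashSet<>();

    @Override
    public String register(String route) {
        String id = "T" + (ids.size() + 1);
        ids.add(id);
        return id;
    }

    @Override
    public boolean exists(String trackingId) {
        return ids.contains(trackingId);
    }
}

// Final adapter: translates argument spellings and delegates; no wrapper needed.
final class V1OverV2 implements BookingV1 {
    private final BookingV2 v2;

    V1OverV2(BookingV2 v2) {
        this.v2 = v2;
    }

    @Override
    public String book(String origin, String destination) {
        return v2.register(origin + "->" + destination);
    }

    @Override
    public boolean isBooked(String trackingId) {
        return v2.exists(trackingId);
    }
}
```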

There's an interesting mirroring here; the application to model interface is v1 to v2, and then I have a bunch of coordination in the new idiom, but at the bottom of the pile, the v2 implementation is just plugging back into v1 persistence. You can see an example of that here - Booking looks fairly normal, just an orchestration with the repository element. WrappedCargo looks a little bit odd, perhaps -- it's not an "aggregate root" in the usual sense, it's not a domain entity lifted into a higher role. Instead, it's a plumbing element wrapped around the legacy root object (with some additional plumbing to deal with the translations).

Longer term, I'll create a mapping from the legacy storage schema to an entity that understands the V2 API, and eventually swap out the O/RM altogether by migrating the state from the RDBMS to a document store.

Friday, February 8, 2019

TDD: Hello World

As an experiment, I recently tried developing HelloWorld using a "test driven" approach.

You can review the commit history on GitHub.

In Java, HelloWorld is a one-liner -- except that you are trapped in the Kingdom of Nouns, so there is boilerplate to manage.

Now you can implement HelloWorld in a perfectly natural way, and test it -- System.setOut allows you to replace the stream, so the write happens to a buffer that is under the control of the test.
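
A minimal sketch of that approach, assuming the program prints "Hello, World!" followed by a newline (the exact message is my assumption) and a Java 10 or later runtime for the Charset overloads. HelloWorldCheck and captureMain are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

final class HelloWorld {
    public static void main(String[] args) {
        System.out.println("Hello, World!");
    }
}

final class HelloWorldCheck {
    // Runs main with System.out swapped for a buffer, then restores it.
    static String captureMain() {
        PrintStream original = System.out;
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (PrintStream capture = new PrintStream(buffer, true, StandardCharsets.UTF_8)) {
            System.setOut(capture);
            HelloWorld.main(new String[0]);
        } finally {
            // System.out is process-wide state, so always put it back.
            System.setOut(original);
        }
        return buffer.toString(StandardCharsets.UTF_8);
    }
}
```

The check then compares captureMain() with "Hello, World!" + System.lineSeparator(). Note that the swap affects every thread in the JVM for as long as it is in place.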

It's not entirely clear to me what happens, however, if you have multiple tests concurrently writing to that stream.  The synchronization primitives ensure that each write is atomic, but there is a lot of time for the stream to be corrupted with other writes by the time the test harness gets to inspect the result.

This is why we normally design our tests so that they are isolated from shared mutable state; we want predictable results.  So in HelloWorld, this means we need to be able to ensure that the write happens to an isolated, rather than a shared stream.

So instead of testing HelloWorld::main, we end up testing HelloWorld.writeTo, or some improved spelling of the same idea.

Another pressure that shows up quickly is duplication - the byte sequence we want to test needs to be written into both the test and the implementation.  Again, we've learned patterns for dealing with that -- the data should move toward the test, so we have a function that accepts a message/prompt as an argument (in addition to passing along the target stream).  As an added bonus, we get a more general implementation for free.
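
Where those two pressures lead might be sketched like this. The method name writeTo comes from the text above; its parameter list, the message, and the HelloWorldCheck helper are assumptions, and the Charset overloads need Java 10 or later.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

final class HelloWorld {
    // The "testable" seam: both the stream and the message are arguments.
    static void writeTo(PrintStream out, String message) {
        out.println(message);
    }

    public static void main(String[] args) {
        writeTo(System.out, "Hello, World!");
    }
}

final class HelloWorldCheck {
    // Each call gets its own buffer, so concurrent checks cannot interfere.
    static String written(String message) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (PrintStream out = new PrintStream(buffer, true, StandardCharsets.UTF_8)) {
            HelloWorld.writeTo(out, message);
        }
        return buffer.toString(StandardCharsets.UTF_8);
    }
}
```

The message now lives in the caller, so the check can use any string it likes; main is left as an untested one-line composition.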

Did we really need a more general implementation of HelloWorld?

Another practice that I associate with TDD is using the test as an example of how the subject may be used -- if the test is clumsy, then that's a hint that maybe the API needs some work.  The test needs a mutable buffer, and a PrintStream around it, and then needs to either unpack the contents of the buffer or express the specification as a byte array, when the natural primitive to use is a String literal.

You can, indeed, simplify the API, replacing the buffer with a useful object that serves a similar role.  At which point you either have two parallel code paths in your app (duplication of idea), or you introduce a bunch of additional composition so that the main logic always sees the same interface.

Our "testable" code turns more and more into spaghetti.

Now, it's possible that I simply lack imagination, and that once all of these tests are in place, you'll be able to refactor your way to an elegant implementation.  But to me, it looks like a trash fire.

There's a lesson here, and I think it is: left-pad.

Which is to say, not only is HelloWorld "so simple that there are obviously no deficiencies", but it is also too simple to share: the integration cost required to share the element exceeds the cost of writing it from scratch each time you need it.

Expressed a different way: there is virtually no chance that the duplication is going to burn you, because once written the implementation will not require any kind of coordinated future change (short of a massive incompatibility being introduced in the language runtime itself, in which case you are going to have bigger fires to fight).

Tuesday, February 5, 2019

The Influence of Tests

Some years ago, I became disenchanted with the notion that TDD uses tests to "drive" design in any meaningful way.

I came to notice two things: first, that the tests were just as happy to pass whatever cut and paste hack served as "the simplest thing that could possibly work"; second, that all of the refactoring patterns are reversible.

So what is being test infected buying me?

One interesting constraint on tests is that we want them to be reliable.  If the test subject hasn't changed, then we should get the same collection of observations if we move the test bed in time and space.  This in turn means we need to restrict the tests' interaction with unstable elements -- I/O, the clock, the network, random entropy.  Our test subjects often expect to interact with these elements, so within the test environment we need to be able to provide a substitute.

So one of the design patterns driven by testing is "dependency injection".  Somewhere recently I came across the spelling "configurable dependency", which I think is better.  It helps to sharpen my attention on the fact that we are describing something that we change when we transition from a production environment to a test environment, which in turn suggests certain approaches.

But we're really talking about something more specific: configurable effects or perhaps configurable non-determinism.

The test itself doesn't care much about how much buffer surrounds the effect; but if we allow test coverage to influence us here, then we want the substituted code to be as small as we can manage.  To lean on Gary Bernhardt's terminology, we want the test to be able to control a thin imperative shell.
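
One way to picture a configurable effect behind a thin shell, using the clock as the unstable element. The Greeter class and its methods are hypothetical; java.time.Clock and Clock.fixed are part of the standard library.

```java
import java.time.Clock;
import java.time.LocalTime;

final class Greeter {
    // Core logic: a pure function of its argument, checkable with no substitutes.
    static String greetingAt(LocalTime time) {
        return time.isBefore(LocalTime.NOON) ? "Good morning" : "Good afternoon";
    }

    // Thin imperative shell: the only line that reads the clock.
    static String greetingNow(Clock clock) {
        return greetingAt(LocalTime.now(clock));
    }
}
```

Production code would pass Clock.systemDefaultZone(); a check can pass Clock.fixed(instant, zone) and get the same observation wherever and whenever it runs. The substituted part is a single method call, and everything interesting sits in greetingAt.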

But then what?  We can keep pouring inputs through the shell without introducing any new pressures on the design.

Our designs must consist of many highly cohesive, loosely coupled components, just to make testing easy. -- Kent Beck, Test Driven Development by Example

I came across this recently, and it helps.

A key problem with the outside-in approach is that the "costs" of setting up a test are disproportionate to the constraint we are trying to establish.  Composition of the test subject requires us to draw the rest of the owl when all we need is a couple of circles.

To borrow an idea from Dan North, testing all the way from the boundary makes for really lousy examples, because the noise gets in the way of the idea.

The grain of the test should match the grain of the constraint it describes - if the constraint is small, then we should expect that the composition will have low complexity.

What we have then, I think, is a version of testing in which the human author applies a number of heuristics when designing an automated check, to ensure that the subject(s) will exhibit the appropriate properties.  In other words, we're getting a lot of mileage out of aligning the test/subject boundaries before we even get to green.

The kinds of design improvements that we make while refactoring?
There is definitely a family of refactorings that are motivated by the idea of taking some implementation detail and "lifting" it into the testable space. I think that you can fairly say that the (future) test is influencing the design that emerges during the refactoring.

I'm not convinced that we can credit tests for the results that emerge from the Design Dynamo.  My current thinking is that they are playing only a supporting role - repeatedly evaluating compliance with the constraints after each change, but not encouraging the selection of a particular change.

Further Reading

Mark Seemann: The TDD Apostate.
Michael Feathers: Making Too Much of TDD.