Saturday, April 20, 2019

Sketching Evil Hangman, featuring GeePaw

GeePaw Hill christened his Twitch channel this week with a presentation of his approach to TDD, featuring an implementation of Evil Hangman.

Evil Hangman is a mimic of the traditional word-guessing game, with a twist -- Evil Hangman doesn't commit to a solution immediately.  It's a good mimic: the observable behavior of the game is entirely consistent with a fair implementation that has committed to some word in the corpus.  But it will beat you unfairly if it can.
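
Here is a minimal sketch of the trick, in Java; the naming is mine, not GeePaw's, and a real game would also need to track misses and the rest of the bookkeeping.

    import java.util.Comparator;
    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    public class EvilMove {

        // The "evil" move: instead of committing to a word, group the remaining
        // candidates by the clue each would produce for this guess, and keep
        // whichever group is largest.  Every clue shown to the player is still
        // consistent with every word that survives.
        public static Set<String> respondTo(char guess, Set<String> candidates) {
            Map<String, Set<String>> byClue = new HashMap<>();
            for (String word : candidates) {
                byClue.computeIfAbsent(clueFor(word, guess), k -> new HashSet<>()).add(word);
            }
            return byClue.values().stream()
                    .max(Comparator.comparingInt(Set::size))
                    .orElse(candidates);
        }

        // The clue a fair game would show if this word were the answer.
        private static String clueFor(String word, char guess) {
            StringBuilder clue = new StringBuilder();
            for (char c : word.toCharArray()) {
                clue.append(c == guess ? c : '-');
            }
            return clue.toString();
        }
    }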

So, as a greenfield problem, how do you get started?

From what I can see, there are three approaches that you might take:
  • You can start with a walking skeleton, making some arbitrary choices about I/O, and work your way inward.
  • You can start with the functional core, and work your way inward.
  • You can start with an element from a candidate design, and work your way outward.
GeePaw, it seems, is a lot more comfortable when he has pieces that he can microtest.  I got badly spooked on that approach years ago when I looked into the sudoku exercise.  So my preference is to choose an observable behavior that I understand, fake it, remove the duplication, and then add additional tests to the suite that force me to extend the solution's design.

Ultimately, if you examine the complexity of the test subjects, you might decide that I'm writing "integrated" or "acceptance" tests.  From my perspective, I don't care - the tests are fast, decoupled from the environment, and embarrassingly parallel.  Furthermore, the purpose of the tests is to facilitate my design work, not to prove correctness.

What this produces, if I do it right, is tests that are resilient to changes in the design, but which may be brittle to changes in requirements.

My early tests, especially in greenfield work, tend to be primitive obsessed.  All I have in the beginning are domain agnostic constructs to work with, so how could they be anything but?  I don't view this as a failing, because I'm working outside in -- which is to say that my tests are describing the boundary of my system, where things aren't object oriented.  Primitives are the simplest thing that could possibly work, and allow me to move past my writer's block into having arguments with the code.

As a practice exercise, I would normally choose to start from the boundary of the functional core -- we aren't being asked to integrate with anything in particular, and my experiences so far haven't suggested that there is a lot of novelty there.
One should not ride in the buggy all the time. One has the fun of it and then gets out.
So, where to begin?

I'm looking for a function - something that will accept some domain agnostic arguments and return a domain agnostic value that I can measure.

Here, we know that the basic currency is that the player will be making guesses, and the game will be responding with clues.  So we could think in terms of a list of string inputs and a list of string outputs.  But the game also has hidden state, and I know from hard lessons that making that state an input to the test function will make certain kinds of verification easier.
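
Concretely, the shape I'm fishing for is something like the sketch below; the names, and the empty HiddenState placeholder, are mine, not a committed design.

    import java.util.List;

    public class Hangman {

        // A placeholder for whatever the game keeps to itself; I expect its
        // contents to be discovered as the behaviors force them into the open.
        public static class HiddenState {}

        // Strings in, strings out: the player's guesses go in, the clues the
        // game showed come back out.  The hidden state is an explicit argument
        // so that the tests can control it.
        public static List<String> clues(HiddenState hidden, List<String> guesses) {
            throw new UnsupportedOperationException("not implemented yet");
        }
    }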

The tricky bit, of course, is that I won't always know what that hidden state is until I get into the details.  I may end up discovering that my initial behaviors depend on some hidden variable that I hadn't considered as part of the API, and I'll need to remediate that later.

In this case, one of the bits of state is the corpus - the list of words that the game has to choose from.  Using a restricted word list makes it easier to specify the behavior of the implementation.  For instance, if all of the words in the corpus are the same length, then we know exactly how many dashes are going to be displayed in the initial hint.  If there is only a single word in the corpus, then we know exactly how the app will respond to any guess.

Making the corpus an input is the affordance that we need to specify degenerate cases.

Another place where degeneracy might be helpful is allowing the test to control the player's mistake allotment.  Giving the human player no tolerance for errors allows us to explore endgame behavior much more easily.
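
As a sketch of what those affordances buy us, suppose the HiddenState placeholder above has grown a corpus and a mistake allotment.  The clue strings below are stand-ins for whatever representation the game eventually settles on; the point is that the degenerate cases can be written down exactly.

    import static org.junit.jupiter.api.Assertions.assertEquals;

    import java.util.List;
    import org.junit.jupiter.api.Test;

    public class DegenerateCaseTest {

        // With a one-word corpus there is no room for evil: every response is
        // fully determined, so the expected clues can be spelled out in advance.
        @Test
        public void oneWordCorpusFullyDeterminesTheClues() {
            Hangman.HiddenState state = new Hangman.HiddenState(List.of("java"), 1);

            List<String> clues = Hangman.clues(state, List.of("a", "v", "j"));

            assertEquals(List.of("-a-a", "-ava", "java"), clues);
        }

        // With no tolerance for mistakes, the first miss is the whole endgame.
        @Test
        public void zeroMistakeAllotmentMakesTheFirstMissFatal() {
            Hangman.HiddenState state = new Hangman.HiddenState(List.of("java"), 0);

            List<String> clues = Hangman.clues(state, List.of("z"));

            assertEquals(List.of("game over"), clues);
        }
    }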

And if we don't think of these affordances in our initial guess?  That's fine - we introduce a new entry point with an "extract method" refactoring, eliminate the duplication by having the original test subject delegate its behavior to our improved API, deprecate the original, and eliminate it when it is no longer required.
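
In code, the migration might look something like the sketch below: suppose the original entry point had kept the hidden state to itself, baking in the full dictionary and a conventional six-mistake allotment (both of those defaults are hypothetical).

    import java.util.List;

    public class Hangman {

        // The hidden state, as we now understand it.
        public static class HiddenState {
            final List<String> corpus;
            final int allowedMistakes;

            public HiddenState(List<String> corpus, int allowedMistakes) {
                this.corpus = corpus;
                this.allowedMistakes = allowedMistakes;
            }
        }

        private static final List<String> FULL_DICTIONARY = List.of(/* the real word list */);

        // The original entry point, which kept the hidden state to itself.  It
        // survives only as a thin delegator, deprecated, and gets deleted once
        // the last caller has moved over.
        @Deprecated
        public static List<String> clues(List<String> guesses) {
            return clues(new HiddenState(FULL_DICTIONARY, 6), guesses);
        }

        // The improved entry point: the hidden state is now an explicit input.
        public static List<String> clues(HiddenState hidden, List<String> guesses) {
            throw new UnsupportedOperationException("not implemented yet");
        }
    }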

"The simplest thing that could possibly work" is a proposal, not a promise.

For the most part, that's what my sketches tend to look like: some exploration of the problem space, a guess at the boundary, and very little early speculation about the internals.

Friday, April 19, 2019

TDD and incremental delivery

I spend a lot of time thinking about breaking tests, and what that means about TDD as a development ritual.

I recently found a 2012 essay by Steven Thomas reviewing Jeff Patton's 2007 Mona Lisa analogy.  This in turn got me thinking about iteration specifically.

A lot of the early reports of Extreme Programming came out of Chrysler Comprehensive Compensation, and there's a very interesting remark in the post-mortem:
Subsequent launches of additional pay populations were wanted by top management within a year.
To me, that sounds like shorthand for the idea that the same (very broad) use case was to be extended to cover a larger and more interesting range of inputs with only minor changes to the behaviors already delivered.

The tests that we leave behind serve to describe the constraints necessary to harvest the low-hanging fruit, as best we understood them at the time, so that we can identify regressions when the next layer of complexity is introduced to the mix.

We're writing more living documentation because we are expecting to come back to this code next year, or next month, or next sprint.

I envision something like a ten-case switch statement -- we'll implement the first two cases now, to cover perhaps a third of the traffic, and then defer the rest of the work until "later", defined as far enough away that the context has been evicted from our short-term memory.
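
Something like this toy, say; nothing here comes from a real system, and the traffic fractions are invented for the sake of the illustration.

    public class Dispatcher {

        // A toy illustration, not production code: ten distinguishable kinds of
        // request, of which only the first two are implemented in this increment.
        public static String handle(int kindOfRequest) {
            switch (kindOfRequest) {
                case 1:  // roughly a fifth of the traffic
                    return "handled a request of the first kind";
                case 2:  // roughly another sixth
                    return "handled a request of the second kind";
                default:
                    // Cases 3 through 10 are deferred until "later" -- far enough
                    // away that this context will have been evicted from our
                    // short-term memory.
                    throw new UnsupportedOperationException("not yet: " + kindOfRequest);
            }
        }
    }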

If the requirements for the behaviors that you implemented in the first increment are not stable, then there is a non-trivial risk that you'll need several iterations to get those requirements right.  Decisions change, and the implications of those changes are going to ripple to the nearest bulkhead, in which case we may need a finer-grained testing strategy than we would if the requirements were stable.

At the other extreme, I'm working on an old legacy code base; this code base has a lot of corners that are "done" -- modules that haven't changed in many years.  Are we still profiting by running those tests?

This is something we should keep in mind as we kata.  If we want to be preparing for increments with significant time intervals between them, then we need bigger input spaces with stable requirements.

A number of the kata are decent on the stable requirements bit -- Roman numerals haven't changed in a long time -- but tend to be too small to justify not solving the whole thing in one go.  Having done that, you can thank your tests and let them go.

The best of the kata I'm familiar with for this approach would be the Gilded Rose - we have a potentially unlimited catalog of pricing rules for items, so we'll incrementally adapt the pricing logic until the entire product catalog is covered.

But - to do that without breaking tests, we need stable pricing rules, and we need to know in advance which products follow which rules.  If we were to naively assume, for example, that Sulfuras is a normal item, and we used it as part of our early test suite, then the updated behavior would break those tests.  (Not an expensive break, in this case -- we'd likely be able to replace Sulfuras with some other normal item, and get on with it).
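
Concretely, assuming the usual Java starting point for the kata, the naive early test might read like this; it encodes the wrong assumption that Sulfuras degrades like everything else.

    import static org.junit.jupiter.api.Assertions.assertEquals;

    import org.junit.jupiter.api.Test;

    public class NaiveSulfurasTest {

        // Written before we learned the legendary-item rule: it assumes Sulfuras
        // degrades like a normal item.  The moment the real rule lands, this test
        // breaks -- cheaply repaired by swapping in a genuinely normal item, but
        // broken nonetheless.
        @Test
        public void sulfurasDegradesLikeANormalItem() {
            Item[] items = { new Item("Sulfuras, Hand of Ragnaros", 10, 20) };
            GildedRose app = new GildedRose(items);

            app.updateQuality();

            assertEquals(9, items[0].sellIn);
            assertEquals(19, items[0].quality);
        }
    }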

Expressing the same idea somewhat differently: in an iterative approach, we might assume that Sulfuras is priced normally, and then make adjustments to the tests until they finally describe the right pricing constraints; in an incremental approach, Sulfuras would be out of scope until we were ready to address it.

I think the scheduling of refactoring gets interesting in an incremental approach - how much refactoring do you do now, when it is still uncertain which increment of work you will address next?  Is Shameless Green the right conclusion for a design session?

The Sudoku problem is one that I struggle to classify.  On the one hand, the I/O is simple, and the requirements are stable, so it ought to hit the sweet spot for TDD.  You can readily imagine partitioning the space of Sudoku problems into trivial, easy, medium, hard, and diabolical sets, working on one grouping at a time, and delivering each increment in turn.

On the other hand, Dr Norvig showed that, if you understand the problem, you can simply draw the rest of the fucking owl. The boundaries between the categories are not inherent to the problem space.

Wednesday, April 10, 2019

Read Models vs Write Models

At Stack Exchange, I answered a question about DDD vs SQL, which resulted in a question about CQRS that I think requires more detail than is appropriate for that setting.
The "read model" is not the domain model (or part of it)? I am not an expert on CQRS, but I always thought the command model is quite different from the classic domain model, but not the read model. So maybe you can give an example for this?
So let's lay some groundwork:

A domain model is not a particular diagram; it is the idea that the diagram is intended to convey.  It is not just the knowledge in the domain expert's head; it is a rigorously organized and selective abstraction of that knowledge.  -- Eric Evans, 2003.
A Domain Model creates a web of interconnected objects, where each object represents some meaningful individual, whether as large as a corporation or as small as a single line on an order form.  -- Martin Fowler, 2003.
I think that Fowler's definition is a bit tight; there's no reason that we should need to use a different term when modeling with values and functions, rather than objects.

I think it is important to be sensitive to the fact that in some contexts we are talking about the abstraction of expert knowledge, and in others we are talking about an implementation that approximates that abstraction.

Discussions of "read model" and "write model" almost always refer to the implemented approximations.  We take a single abstraction of domain knowledge, and divide our approximation of it into two parts - one that handles our read use cases, and another that handles our write use cases.

When we are handling a write, there are usually constraints to ensure the integrity of the information that we are modeling.  That might be as simple as a constraint that we not overwrite information that was previously written, or it might mean that we need to ensure that new writes are consistent with the information already written.

So to handle a write, we will often take information from our durable store, load it into volatile memory, then create from that information a structure in memory into which the new information will be integrated.  The "domain logic" calculates new information, which is written back to the durable store.
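
In outline, that write-side flow might look like the following sketch; every name here is a generic stand-in, not any particular framework or library.

    import java.util.List;

    // A sketch of the write side: load what we already know, rebuild the
    // in-memory structure that understands the rules, let the domain logic
    // calculate the new information, and write that back to the durable store.
    public class WriteSide {

        interface EventStore {
            List<Object> load(String id);                  // durable store -> volatile memory
            void append(String id, List<Object> changes);  // new information -> durable store
        }

        interface Order {                                   // the "write model"
            List<Object> place(String sku, int quantity);   // domain logic; may reject the change
        }

        interface OrderFactory {
            Order rehydrate(List<Object> history);          // rebuild the in-memory structure
        }

        private final EventStore store;
        private final OrderFactory orders;

        public WriteSide(EventStore store, OrderFactory orders) {
            this.store = store;
            this.orders = orders;
        }

        public void placeOrder(String orderId, String sku, int quantity) {
            List<Object> history = store.load(orderId);
            Order order = orders.rehydrate(history);
            List<Object> changes = order.place(sku, quantity);
            store.append(orderId, changes);
        }
    }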

On the other hand, reads are safe; "asking the question shouldn't change the answer".  In that case, we don't need the domain logic, because we aren't going to integrate new information. We can take the on disk representation of the information, and transform it directly into our query response, without passing through the intermediate representations we would use when writing.

We'll still want input sanitization, and message semantics that reflect our understanding of the domain expert's abstraction, but we aren't going to need "aggregate roots", or "locks", or the other patterns that prevent the introduction of errors when changing information.  We still need the data, and the semantics that approximate our abstraction, but we don't need the rules.
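
A sketch of what that can look like in practice -- stand-in names, a made-up order_summaries table, and plain JDBC assumed purely for illustration:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.util.ArrayList;
    import java.util.List;

    // The read side: no aggregate, no locks, no domain rules -- just the stored
    // data, sanitized input, and a response message whose vocabulary matches
    // the domain expert's abstraction.
    public class OrderSummaryQuery {

        public static final class OrderSummary {
            public final String orderId;
            public final String status;
            public final int itemCount;

            public OrderSummary(String orderId, String status, int itemCount) {
                this.orderId = orderId;
                this.status = status;
                this.itemCount = itemCount;
            }
        }

        private final Connection connection;

        public OrderSummaryQuery(Connection connection) {
            this.connection = connection;
        }

        public List<OrderSummary> forCustomer(String customerId) throws SQLException {
            String sql = "SELECT order_id, status, item_count"
                    + " FROM order_summaries WHERE customer_id = ?";
            try (PreparedStatement statement = connection.prepareStatement(sql)) {
                statement.setString(1, customerId);  // sanitized via parameter binding
                try (ResultSet rows = statement.executeQuery()) {
                    List<OrderSummary> summaries = new ArrayList<>();
                    while (rows.next()) {
                        // Straight from the on-disk representation to the response,
                        // without passing through the write model's structures.
                        summaries.add(new OrderSummary(
                                rows.getString("order_id"),
                                rows.getString("status"),
                                rows.getInt("item_count")));
                    }
                    return summaries;
                }
            }
        }
    }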

We don't need the parts of our implementation that manage change.

When I answer "the query itself is unlikely to pass through the domain model", that's shorthand for the idea that we don't need to build domain specific data structures as we translate the information we retrieved from our durable store into our response message.


Monday, April 1, 2019

TDD from Edge to Edge

In my recent TDD practice, I've been continuing to explore the implications of edge to edge tests.

The core idea is this: if we design the constraints on our system at the edge, then we maximize the degrees of freedom we have in our design.

Within a single design session, this works just fine. My demonstration of the Mars Rover kata limits its assumptions to the edge of the system, and the initial test is made to pass by simply removing the duplication.
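
For instance, an edge-to-edge check for that kata can be nothing more than text in, text out.  The Rover.run entry point below is a hypothetical stand-in rather than the API from my demonstration; the transcript is the classic statement of the problem.

    import static org.junit.jupiter.api.Assertions.assertEquals;

    import org.junit.jupiter.api.Test;

    public class MarsRoverEdgeToEdgeTest {

        // Text in, text out: the only assumption this test makes about the design
        // is that some entry point accepts the whole instruction text and returns
        // the whole report.
        @Test
        public void reportsTheFinalPositionOfEachRover() {
            String instructions =
                    "5 5\n"
                    + "1 2 N\n"
                    + "LMLMLMLMM\n"
                    + "3 3 E\n"
                    + "MMRMMRMRRM\n";

            assertEquals("1 3 N\n5 1 E\n", Rover.run(instructions));
        }
    }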

The advantage of such a test is that it is resilient to changes in the design. You can change the arrangement of the internals of the test subject, and the test itself remains relevant.

The disadvantage of such a test is that it is not resilient to changes in the requirements.

It's common in TDD demonstrations to work with a fixed set of constraints throughout the design session. Yes, we tend to introduce the constraints in increments, but taken as a set they tend to be consistent.

The Golden Master approach works just fine under those conditions; we can extend our transcript with descriptions of extensions, and then amend the test subject to match.

But a change in behavior? And suddenly an opaque comparison to the Golden Master fails, and we have to discard all of the bath water in addition to the baby.

We might describe the problem this way: the edge to edge test spans many different behaviors, and a change to a single behavior in the system may butterfly all the way out to the observable behavior. In other words, the same property that makes the test useful when refactoring acts against us when we introduce a modification to the behavior.

One way to sidestep this is to take as given that a new behavior means a new test subject. We'll flesh out an element from scratch, using the refactoring task in the TDD cycle to absorb our previous work into the new solution. I haven't found that this is particularly convenient for consumers. "Please update your dependency to the latest version of my library and also change the name you use to call it" isn't a message I expect to be well received by maintainers that haven't already introduced design affordances for this sort of change.

So what else? How do we arrange our tests so that we don't need to start from scratch each time we get a request for a breaking change?

Recently, I happened to be thinking about this check in one of my tests.

When this check fails, we "know" that there is a bug in the test subject.  But why do we know that?

If you squint a bit, you might realize that we aren't really testing the subject in isolation, but rather whether or not the behavior of the subject is consistent with these other elements that we have high confidence in.  "When you hear hoof beats, expect horses, not zebras."

Kent Beck describes similar ideas in his discussion of unit tests.

There is a sort of transitive assertion that we can make: if the behavior of the subject is consistent with some other behavior, and we are confident that the other behavior is correct, then we can assume the behavior of the test subject is correct.

What this affords is that we can take the edge to edge test and express the desired behavior as a composition of other smaller behaviors that we are confident in.  The Golden Master can be dynamically generated from the behavior of the smaller elements.
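
Returning to the rover example, the idea might look like this; SingleRover is a hypothetical smaller element, and the names are mine.

    import static org.junit.jupiter.api.Assertions.assertEquals;

    import org.junit.jupiter.api.Test;

    public class ConsistencyTest {

        // The expected report is generated from the smaller behavior rather than
        // hard-coded: if Rover.run is consistent with SingleRover, and we trust
        // SingleRover, then we can trust Rover.run.
        @Test
        public void wholeReportIsConsistentWithSingleRoverBehavior() {
            String expected =
                    SingleRover.finalPosition("5 5", "1 2 N", "LMLMLMLMM") + "\n"
                    + SingleRover.finalPosition("5 5", "3 3 E", "MMRMMRMRRM") + "\n";

            String actual = Rover.run(
                    "5 5\n"
                    + "1 2 N\n"
                    + "LMLMLMLMM\n"
                    + "3 3 E\n"
                    + "MMRMMRMRRM\n");

            assertEquals(expected, actual);
        }
    }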

Of course, the confidence in those smaller elements comes from having tests of their own, verifying that those behaviors are consistent with simpler, more trusted elements.  It's turtles all the way down.

In this sort of design, the smaller components in the system act as bulkheads for change.

I feel that I should call out the fact that some care is required in the description of the checks we create in this style.  We should not be trying to verify that the larger component is implemented using some smaller component, but only that its behavior is consistent with that of the smaller component.