Sunday, April 25, 2021

TDD: When the API changes

If I discover my API is bad after writing 60 tests, I have to change a lot! -- shared by James Grenning

My experience with the TDD literature is that it tends to concentrate on situations that are fixed.  Here's a problem, implement a solution in bite sized steps, done.

And to be fair, in a lot of cases -- where the implications of "bite sized steps" are alien -- that's enough.  There's a lot going on, and eliminating distractions via simplification is, I believe, a good focusing technique.  Red-Green-Refactor is the "new" idea, and we want to keep that centered until the student can exercise the loop without concentration.

But -- if we intend that TDD be executed successfully outside of lessons -- we're going to need to re-introduce complicating factors so that students can experience what they are going to face in the wild.

Wednesday, April 21, 2021

TDD: Show Your Work Designs

I was reading Brian Marick's write up of his experience with Hillel Wayne's budget modeling experiment.  

Toward the end of the exercise, Marick writes:

I decided to write an affordability function that would call the same functions as can_afford? but return rich results instead of funneling all the possibilities into true or false....  Then I wrote can_afford? in terms of affordability.
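A minimal sketch of that shape (my own names, and Java rather than the language of Marick's exercise): the rich affordability report shows its work, and the boolean answer is distilled from it.

    class Budget {
        // the rich result: every intermediate the decision depends on
        record Affordability(int balance, int price, int shortfall) {
            boolean withinBudget() {
                return shortfall <= 0;
            }
        }

        static Affordability affordability(int balance, int price) {
            return new Affordability(balance, price, price - balance);
        }

        // can_afford? becomes a thin distillation of the richer report
        static boolean canAfford(int balance, int price) {
            return affordability(balance, price).withinBudget();
        }
    }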

I've seen this pattern a number of times recently.  It came up during a transport tycoon exercise,  where I started exploring the idea of generating a complete report of a simulation, and then extracting from that report the answer to the simple question.

Before that, it appeared when I started using kitchen sink logging in my AWS Lambda functions; the event handler produces a record of all of the information it has collected along the way, one field of which is "the answer".  The log entry for the request gets a detailed document of all of the (not secret) information, and we can later carve that information into smaller pieces.

What the pattern reminds me of is STEM exams in high school; writing out the derivation of the answer long form, with the final calculation circled at the bottom.  The motivation is, I think, much the same; the view of the intermediate calculations is an important diagnostic tool when the final answer indicates a fault.

GeePaw Hill has recently been discussing the Made application and the Making application.  The Made application provides narrow, targeted affordances, designed to delight the end user who pays the bills.  But the purpose of the Making application is delightful (cost effective) making, where the human in the loop has different concerns.

Injecting the Making into the Made gives us, perhaps, the best of both worlds.

I also see similarity between this idea and Ward Cunningham's early description of technical debt; the underlying report is going to be more closely aligned with how we think about the domain than the simple distillation of the answer, and with the long form design in place, we have code that is aligned with the business, and should be easy to change when the business expects the code to be easy to change.

 

Wednesday, March 10, 2021

TDD: Unreachable states

This week, I tried a Mastermind coding exercise, two different ways.

Mastermind was a code breaking game from my childhood; each incorrect guess returns a hint, describing how close your guess was to the goal.

At the Boston Software Crafters meetup, our breakout room first attacked the problem of identifying all 5000+ codes (ten letters, but no repeats).  Once we got that sorted, we then started working on implementing the filters we would need to eliminate candidates.  And progress, though steady from that point, was slow - we had to think a lot about what the next guess might be.
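The enumeration step might look something like this sketch, assuming the usual four-position codes drawn from ten symbols with no repeated symbol (10 * 9 * 8 * 7 = 5040 candidates):

    import java.util.ArrayList;
    import java.util.List;

    class Candidates {
        static List<String> all() {
            String symbols = "ABCDEFGHIJ";
            List<String> codes = new ArrayList<>();
            for (char a : symbols.toCharArray())
                for (char b : symbols.toCharArray())
                    for (char c : symbols.toCharArray())
                        for (char d : symbols.toCharArray()) {
                            String code = "" + a + b + c + d;
                            if (code.chars().distinct().count() == 4) {  // no repeats
                                codes.add(code);
                            }
                        }
            return codes;
        }
    }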

Working the problem on my own the next day, I made two changes to my approach - I deliberately introduced an (untested) adapter between the game client and my more easily tested design, and then with that more easily tested design I started working with unreachable candidate lists.

By unreachable, what I mean is that there is no sequence of guesses that would eliminate all of the other possibilities and leave just the two samples that I had selected.

Although the samples were not reachable, they were easy to reason about.  I could concentrate my attention on how the new logic should interact with these two data points, ignoring all of the other considerations as "out of scope".
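One of the tiny assertions I have in mind looks something like this (eliminate and Hint are stand-in names, not the original code): no real game ever narrows the field to exactly these two codes, but the pair makes the filtering rule easy to state -- guessing ABCD and scoring two exact matches plus two near misses must eliminate ABCD itself.

    List<String> candidates = List.of("ABCD", "ABDC");
    assertEquals(List.of("ABDC"), eliminate(candidates, "ABCD", new Hint(2, 2)));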

In the end, my test suite included five assertions, never more than two lines of code per assertion.  And yet, when I hooked it up to the "real" data, the system worked, right out of the gate.

@ScottWlaschin argues that it can be useful to choose designs that make illegal states unrepresentable; I don't disagree - but I think that some care is required in choosing an appropriate definition of "illegal". Some of the states that you won't encounter in a healthy system are still useful when trying to explore the properties of that system.

Tuesday, March 2, 2021

Dependency Inversion Review

Earlier this week, I decided to dig out a copy of Robert Martin's 1996 article on the Dependency Inversion Principle.

I don't find his example particularly satisfactory; in particular, the way that he works the example confuses, I believe, a number of different concerns.  So I thought to try the exercise with a "purer" approach, as I would do it today.

To begin, let's consider the original starting implementation of Copy()
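From Martin's article, that looks (near enough) like this:

    #include <stdio.h>   /* for EOF */

    int ReadKeyboard(void);     /* reads a character from the keyboard */
    void WritePrinter(int c);   /* writes a character to the printer */

    void Copy(void)
    {
        int c;
        while ((c = ReadKeyboard()) != EOF)
            WritePrinter(c);
    }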

Now, my first priority is that I not break any existing clients. So my intention is to refactor this code without changing the behavior or the signature.

That means I'm going to work my way up to an "extract function" refactoring, where the new function has the re-usable design that we are looking for.  

To begin, we need to think about replacing our dependencies on the I/O functions with abstractions.  Martin dives quickly into "objects" to address this in his examples, but that seems an imprecise hammer to use, given that functions already permit a perfectly satisfactory abstraction - the function pointer.

With a couple of variables to capture the functions we are invoking in our default implementation, it's now trivial to extract our improved method.
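A sketch of where that lands -- the original signature is untouched, and the extracted CopyV2 takes its collaborators as function pointers:

    /* the re-usable function: its dependencies arrive as arguments */
    void CopyV2(int (*read)(void), void (*write)(int))
    {
        int c;
        while ((c = read()) != EOF)
            write(c);
    }

    /* the original signature survives, so existing clients are unaffected */
    void Copy(void)
    {
        int (*read)(void) = ReadKeyboard;   /* capture the default implementations */
        void (*write)(int) = WritePrinter;

        CopyV2(read, write);
    }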

And done.

There's no particular magic to the fact that I use function pointers here.  In the kingdom of nouns, these would be abstract class instances or interfaces.  In a language like Python, it would be a "callable".

Note that we have changed the code by adding a third leaf dependency to the copy function.  Copy() is otherwise a lot simpler, but mostly because we've stashed the complexity under another card.  If CopyV2 is part of the published interface, then we have introduced a new capability that allows consumers to provide substitutes for ReadKeyboard and WritePrinter; CopyV2 is likely easier to test than its predecessor.

On the other hand, we're introducing the liability of more code now, in the hopes of accruing some benefit later.  And this isn't a particularly difficult refactoring to introduce "later".

Is the ease with which this refactoring can be introduced representative of the code that we encounter in the wild?  I believe so - but spaghetti code is certainly a thing, and this example doesn't obviously demonstrate that handling entanglement is trivial.

 

Tuesday, December 8, 2020

Jason Gorman: Outside In

Jason Gorman recently published a short video where he discusses outside in TDD. It's good, you should watch it.

That said, I do not like everything about it.

So I want to start by focusing on a few things that I did like:

First, it is a very clean presentation.  He had a clear vision of the story he wanted to tell about his movement through the design, and his choice to mirror that movement helps to reinforce the lesson he is sharing.

Second, I really like his emphasis on the consistency within an abstraction; that everything within a module is at the same abstraction level.

Third, I really liked the additional emphasis he placed on the composition root.  It's an honest presentation of some of the trade offs that we face.  I myself tend to eschew "unit tests" in part because I don't like the number of additional edges they add to the graph - here, he doesn't sweep those edges under the rug; they are right there where you can see them.


I learned TDD during an early part of the Look ma, no hands! era, which has left me somewhat prone to question whether the ritual is providing the value, or if instead it is the prior experience of the operator that should get the credit for the designs we see on the page. 

Let us imagine, for a moment, that we were to implement the first test that he shows, and furthermore we were going to write that code "assert first".  Our initial attempt might look like the implementation below
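(What follows is my reconstruction with guessed names, not Gorman's actual code.)  Written assert first, the verification is the whole test, and nothing it refers to exists yet.

    @Test
    public void sendsAnAlertWhenStockFallsToTheReorderLevel() {
        // assert first: neither alert nor product has been declared anywhere
        verify(alert).send(product);
    }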

Of course, this code does not compile.  Let's try "just enough code to compile".  So we need a Product with a constructor, and an Alert interface with a send method, and then to create a product instance and an alert mock.  In the interest of a clean presentation, I'll take advantage of the fact that Java allows me to have more than one implementation within the same source file.
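Continuing the reconstruction -- and adding a hypothetical StockMonitor with an "act" line, since the verification needs something to exercise -- a compiling skeleton might look like:

    import static org.mockito.Mockito.mock;
    import static org.mockito.Mockito.verify;

    import org.junit.Test;

    public class ReorderAlertTest {
        Alert alert = mock(Alert.class);
        Product product = new Product("widget", 10);

        @Test
        public void sendsAnAlertWhenStockFallsToTheReorderLevel() {
            new StockMonitor(alert).stockChanged(product, 10);

            verify(alert).send(product);
        }
    }

    interface Alert {
        void send(Product product);
    }

    class Product {
        private final String name;
        private final int reorderLevel;

        Product(String name, int reorderLevel) {
            this.name = name;
            this.reorderLevel = reorderLevel;
        }

        int reorderLevel() {
            return reorderLevel;
        }
    }

    class StockMonitor {
        StockMonitor(Alert alert) {
        }

        void stockChanged(Product product, int unitsInStock) {
            // just enough to compile; the verification still fails
        }
    }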

What more do we need to make this test pass? Not very much
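Continuing my sketch, one conditional in the hypothetical StockMonitor is enough to go green:

    class StockMonitor {
        private final Alert alert;

        StockMonitor(Alert alert) {
            this.alert = alert;
        }

        void stockChanged(Product product, int unitsInStock) {
            // the smallest behavior that satisfies the verification
            if (unitsInStock <= product.reorderLevel()) {
                alert.send(product);
            }
        }
    }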

This is not, by any means, completed code - there is certainly refactoring to be done, and duplication to remove, and all the good stuff.

The text of Jason Gorman's test, and his presentation of it, lead me to believe that the test was not supporting refactoring.  What I believe happened instead is that Gorman had a modular design in his mind, and he typed it in.  The behavior of ReorderLevel, in particular, has no motivation at all in the statement of the problem - it is an element introduced in the test because we've already decided that there should be a boundary there.

This isn't a criticism of the design, but rather of the notion that TDD can lead to modular code.  I'm not seeing leading here, but something more akin to the fact that TDD happened to be nearby while outside-in modular design was happening.

The assignment of credit is completely suspect.

The second thing that caught my eye is expressed within the assertion itself.  The presentation at the end of the exercise shows us tests, and code coverage, and contract tests, all great stuff... but after all that is done, we still ended up with a "data class" that doesn't implement its own version of `equals`.  The Mockito verification shown in the first test is an identity check - the test asks: did this specific instance move through the code from where we put it in to where we took it out?

Product, here, is a data structure that holds information that was copied over the web.  The notion that we need a specific instance of it is suspect.

Is it an oversight?  I've been thinking on that, and my current position is that it is indistinguishable from an oversight.  We're looking at a snapshot of a toy project, so it's very hard to know from the outside whether or not `Product` should override the behavior of `Object.equals` -- but there's no obvious point in the demonstration that you can point to and say "if that were a mistake, we would catch it here".

In other words: what kinds of design mistakes are the mistake detectors detecting?  Do we actually make those mistakes?  How long must we wait until the savings from detecting the mistakes offsets the cost of implementing and maintaining the mistake detectors?

Myself, I like designing from the outside in.  I have a lot of respect for endotesting. But at the end of the day, I find that the demonstrated technique doesn't lead to a modular design, but rather locks in a specific modular design.  And that makes me question whether the benefits of locking in offset the costs of maintaining the tests.

Thursday, November 26, 2020

TDD: Controlled Data

 I have been paying more attention to parsing of late, and that in turn led me to some new understandings of TDD.

Back in the day when I was first learning about RED/GREEN/REFACTOR, the common approach to attacking a problem was to think about what the design might look like at the end, choose a candidate class from that design, and see what could be done to get it working.

Frequently, people would begin from a leaf class, expecting to build their way up, and that seemed wrong to me.  The practical flaw was the amount of work they would need to do before they could finally point to something that delivered value.  The theoretical flaw was this: I already knew how to guess what the design should be.  What I wanted, and what I thought I was being offered, was emergent design -- or failing that, experiments that would honestly test the proposition that a sound design could emerge from tests.

The approach I preferred would start from the boundary; let's create a sample client, and using that requirement begin enumerating the different behaviors we expect, and discover all of the richness underneath by rearranging the implementation until it is well aligned with our design sense.

Of course, we're still starting with a guess, but it's a small guess -- we know that if our code is going to be useful there must be a way to talk to it.  Bootstrapping can be a challenge -- what does the interface to our code look like?

And in group exercises, I've had a fair amount of success with this simple heuristic: choose something that's easy to type.

Let's take a look at the bowling game, in the original Klingon.  These days, I most commonly see the bowling game exercise presented as a series of drills (practice these tests and refactorings until it becomes second nature); but in an early incarnation it was a re-enactment of a pair programming episode.

Reviewing their description, two things stand out.  First, that they had initially guessed at an interface with extra data types, but rejected it when they realized the code that they "wanted to type" would not compile.  And second, that they defer the question of what the implementation should do with an input that doesn't belong to the domain:

tests in the rest of the system will catch an invalid argument.

I want to take a moment to frame that idea using a different language.  Bertrand Meyer, in Object-oriented Software Construction, begins with a discussion of Correctness and Robustness.  What I see here is that Koss and Martin chose to defer Robustness concerns in order to concentrate on Correctness.  For the implementation that they are currently writing, properly sanitized inputs are a pre-condition imposed on the client code.

But when you are out near an unsanitized boundary, you aren't guaranteed that all inputs can be mapped to sanitized inputs.  If your solution is going to be robust, then you are going to need graceful affordances for handling abnormal inputs.

For an implementation that can assume sanitized inputs, you can measure the code under test with a relatively straightforward client.
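Something like this sketch, assuming the roll/score shape the bowling kata usually takes:

    import static org.junit.Assert.assertEquals;
    import org.junit.Test;

    public class BowlingGameTest {
        @Test
        public void gutterGame() {
            Game game = new Game();
            for (int i = 0; i < 20; i++) {
                game.roll(0);   // twenty pre-sanitized throws
            }
            assertEquals(0, game.score());
        }
    }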


But near the boundary?
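A sketch of what that might look like, assuming the invalid throw surfaces as an unchecked exception:

    @Test
    public void throwThatDoesNotBelongToTheDomain() {
        Game game = new Game();
        assertThrows(IllegalArgumentException.class, () -> game.roll(11));
    }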


I don't find that this produces a satisfactory usage example.  Even if we were to accept that throwing an unchecked exception is a satisfactory response to an invalid input, this example doesn't demonstrate graceful handling of the thrown exception.

Let's look at this first example again.  What is the example really showing us?

I would argue that what this first example is showing us is that we can create a bowling score from a sanitized sequence of inputs.  It's a recipe, that requires a single external ingredient.

Can we do the same with our unsanitized inputs? Yes, if we allow ourselves to be a little bit flexible in the face of ambiguity.

A parser is just a function that consumes less-structured input and produces more-structured output. -- Alexis King

I want to offer a definition that is consistent in spirit, but with a slightly different spelling: a parser is just a function that consumes less-structured input and produces ambiguous more-structured output.

When we parse our unsanitized sequence of throws, what gets produced is either a sanitized sequence of throws or an abnormal sequence of throws, but without extra constraints on the input sequence we don't necessarily know which.

In effect, we have a container with ambiguous, but sanitized, contents.

That ambiguity is part of the essential complexity of the problem when we have to include sanitizing our inputs.  And therefore it follows that we should be expressing that idea explicitly in our design.

We don't have to guess all of the complexity at once, because we can start out limiting our focus to those controlled inputs that should always produce a bowling score; which means that all of the other cases that we haven't considered yet can be lumped into an "unknown" category -- that's going to be safe, because a correct implementation must not use the unknown code path when provided with pre-sanitized inputs.

When we later replace the two-alternative parser with one that produces more alternatives -- that's just more dead branches of code that can again be expressed as unknown.

In the simple version of the bowling game exercise, we need three ingredients

  • our unsanitized input
  • a transform to use when we terminate the example with a valid score
  • a transform to use when we terminate the example with abnormal inputs.

So we can immediately reach for whatever our favorite pattern for transforming different results might be.

Following these ideas, you can spin up something like this in your first pull:
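A sketch, with names of my own choosing rather than the original code sample:

    import java.util.function.Function;

    interface BowlingScore {
        <T> T evaluate(
            int[] unsanitizedThrows,              // our unsanitized input
            Function<Integer, T> onValidScore,    // transform for a valid score
            Function<int[], T> onAbnormalInput);  // transform for abnormal inputs
    }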


And now that everything is compiling, you can dig into the RED/GREEN/REFACTOR cycle and start exploring possible designs.

Now, I've palmed a card here, and I want to show it because it is all part of the same lesson.  The interface provided here embraces the complexity of unsanitized data, but it drops the ball on timing -- I build into this interface an assumption that the unsanitized data all arrives at the same time.  If we are designing for a context where unsanitized throws arrive one at a time, then our examples may need to show how we explicitly handle memory; and we may have to make good guesses about whether we need to announce the abnormal input at once, or if we can let it wait until the score method is invoked.

The good news: often we have oversimplified because our attention was on the most common case; so in the case that we discover our errors late, there can be a lot of re-use in our implementation.

The bad news: if our clients are over simplified, then the implementations that are using our clients are going to need rewrites.

If we're aware, we can still tackle the problem in slices - first releasing a client that delivers business value in the simplest possible context, and then expanding our market share with new clients that deliver that value to other contexts.

Friday, May 8, 2020

HTML5 forms with base.href

I'm working on a REST API, and to keep things simple for myself I'm using text/html for my primary media type.  The big advantage is that I can use a general purpose web browser to test my API by hand.

In this particular case, the client communicates with the API via a series of web form submissions.  That allows my server to control which cached representations will be invalidated when the form is submitted.

As an experiment, instead of copying the target-uri into the form actions, I decided to try using a BASE element, with no action at all, expecting that the form would be submitted to the base href.

But what the HTML 5 specification says is:
If action is the empty string, let action be the URL of the form document.
So that doesn't work.  Via Stack Overflow, I discovered a candidate workaround - the action of the form can be a fragment; the fragment delimiter itself means the form action is not empty, and therefore relative lookup via the base element is enabled.
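As a sketch of the workaround (the href here is made up):

    <base href="https://example.com/orders/12345">

    <!-- the fragment makes the action non-empty, so it resolves against
         the base href rather than the URL of the form document -->
    <form method="post" action="#">
      <input type="submit" value="Approve">
    </form>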

Somewhat worrisome: I haven't managed to find clear evidence that form behavior with fragments is well defined.  It occurred to me that perhaps  the standard was playing clever with URL vs URI, but I abandoned that idea after clicking a few more links to review the examples of URLs, and discovered fragments there.

I suspect that I need to let go of this idea - it's not that much extra work to copy the base URI into the forms that want it, and I've got a much clearer picture of what happens in that case.