Wednesday, December 26, 2018

Study: SimpleDateFormat

I got an interesting lesson in code duplication today.

In reviewing some of our legacy code, I happened across this simple looking piece of code:
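The snippet itself isn't reproduced here, but the shape of it (with an assumed format string, purely for illustration) was something like:

    import java.text.SimpleDateFormat;
    import java.util.Date;

    class Report {
        String render(Date timestamp) {
            // construct a fresh formatter, inline, at the point of use
            return new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss").format(timestamp);
        }
    }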

and it occurred to me to wonder... how many places are we calling that constructor?

And the answer is: a lot.

A quick scan of the various repositories I had cached locally showed that three different format strings were each being used to initialize the SimpleDateFormat at more than 50 different places in our code.

“Don’t Repeat Yourself” was never about code. It’s about knowledge. -- Mathias Verraes

Could it really be the case that this code was, in fact, DRY? Had we written the code the most effective way we could?

The strongest piece of evidence I have? Timestamps on the changes. The lines of code in question are old. They haven't needed to change in the years since they were written.

Why not? A large chunk of the credit belongs to the fact that SimpleDateFormat is part of the standard library, and the Java team has kept that functionality sufficiently stable for our use.

A secondary possibility is that the date format takes an in-memory representation of a date time and converts it into a human-readable representation to be included in a message, or vice versa. The adapters are stable because the message formats have been stable.

Because the message formats are stable, the code in question has very little reason to change. As a consequence, we don't get burned when we take what could be a call into a common method and instead choose to inline it.

What I think this suggests: when/if we decide to update the date formatting and parsing logic, we should be expecting to keep the actual format strings inline. Encapsulating the implementation details is, I think, easier to justify when you are in the middle of an exercise to change those details.

Tuesday, November 6, 2018

Ayende: Fear of an Empty Source File

And yet, probably the hardest thing for me is to start writing from scratch. If there is no code already there, it is all too easy to get lost in the details and not actually be able to get anywhere.  -- Oren Eini
This paralysis is the same thing I try to attack with "the simplest thing that could possibly work."

Dynamic friction encourages more progress than static friction.

Saturday, October 27, 2018

REST, Command Messages and URI Design

I posted the examples above into a discussion about how to send command messages to a domain model.

The domain model is the inner core of the onion. In particular, it shouldn't be directly coupled to HTTP.

The client sends one HTTP request to a single socket, and the listener on that socket translates the HTTP request into a message that the domain model understands. It makes no difference, from the point of view of the domain model, whether some of the information is provided in the message body and some in the headers, when it is all being transformed before it reaches the core domain logic.
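A minimal sketch of that translation, with all of the names assumed for illustration:

    import java.util.Map;

    // the message the domain model understands; it knows nothing about HTTP
    record PlaceOrder(String customerId, String sku, int quantity) {}

    interface OrderService { void placeOrder(PlaceOrder command); }

    class OrderEndpoint {
        private final OrderService domain;

        OrderEndpoint(OrderService domain) { this.domain = domain; }

        // the listener assembles the command; whether a field arrived in the
        // body, a header, or the URI is invisible to the domain model
        void handle(Map<String, String> formFields, Map<String, String> headers) {
            PlaceOrder command = new PlaceOrder(
                    headers.get("X-Customer-Id"),
                    formFields.get("sku"),
                    Integer.parseInt(formFields.get("quantity")));
            domain.placeOrder(command);
        }
    }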

Consider, for example, a form in a web page. It is a perfectly straightforward exercise to ensure that the application/x-www-form-urlencoded representation of the message is complete by adding a few hidden fields that the consumer doesn't need to worry about. When this is done, the particular URL used in the action really doesn't matter.

This is generally true: the resource identifier doesn't have to support the domain, because we have the message body for that.

The target resource identified by the request line becomes a free variable.

In HTTP, GET doesn't have a message body. When we request a resource from the server, any parameters that the server requires to discriminate between resources need to be included in the identifier itself. Because HTTP is designed for large-grain hypermedia transfer, there are provisions for the server to provide to the client the appropriate caching semantics.

The real significance of the URI for POST is this -- it gives us the means to invalidate the representations of a specific resource from the client's cache.

Saturday, October 20, 2018

Wumpus RNG and the functional core

I've been experimenting with refactoring the wumpus, and observed an interesting interaction with the functional core.

For illustration, let's consider the shooting of an arrow.  After fighting a bit with the console, the player submits a firing solution.  This we can pass along to the game module, but there are two additional twists to consider.

First: if the firing solution requires a tunnel between rooms that doesn't exist, the arrow instead travels randomly -- and might end up bouncing several more times. 

Second: if the arrow should miss, the wumpus may move to another room.  Again, randomly.

In my refactorings, the basic shape of the method is a function of two arguments: the firing solution, and some source of "random" numbers.  At this abstraction level, shooting the arrow looks to be atomic, but in some scenarios the random numbers are consumed.

I took some time today to sketch the state machine, and a slightly different picture emerges.  The game is Hunting, and invoking the shootArrow method takes the game to some other state; an incorrectly specified firing solution takes us to a BouncingArrow state, and invoking the bounceArrow method.  When the arrow has finished bouncing, we may end up in the ArrowMissed state, in which case we wakeWumpus, and discover whether or not the hunter has met his end.
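A rough sketch of those states and transitions, with the names assumed rather than taken from the repository:

    // the functional core computes transitions; the shell supplies the randomness
    enum GameState { HUNTING, BOUNCING_ARROW, ARROW_MISSED, WUMPUS_AWAKE, GAME_OVER }

    interface WumpusCore {
        GameState shootArrow(GameState hunting, int[] firingSolution);
        GameState bounceArrow(GameState bouncing, int randomTunnel);  // consumes a "random" number
        GameState wakeWumpus(GameState missed, int randomRoom);       // the wumpus may move, randomly
    }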



The internal core knows nothing about sources of data - no command line, no random numbers, just computing transitions from one state to the next. This means that it is very easy to test, everything just sits in memory. The orchestration logic gets to worry about where the data comes from, but is completely decoupled from the core logic of the game.

Wednesday, October 17, 2018

Thought on Mocks

Based on a discussion with @rglover.
Any veterans have a clear reason why not to use mocks?
"Mocks" are fun, because without a shared context you can't always be certain that the other side of the debate understands your arguments the way that you intend them.  Is the conversation specifically about behavior verification ? about all flavors of test doubles ? Are we talking about the test scaffolding that developers use to drive their designs, or the tests we run at choke points to prevent mistakes from propagating to the next stage of the pipeline?

Both Katrina Owen and Sandi Metz share a model of what should be asserted by tests; incoming commands and queries are checked by asserting the result returned by a query of the system under test, but checking outgoing commands is trickier.

So let's assume for the moment that last problem is the one we are trying to solve.


When I was learning poker, John Finn and Frank Wallace introduced me to the concept of investment odds: the key idea being that our play should be based on a calculus derived from
  • the payoff
  • the probability of success
  • the cost to play
and a comparison of these values against the alternative plays available.


Now, let's look to the Three Laws of TDD, as described by Robert Martin.
You are not allowed to write any production code unless it is to make a failing unit test pass.
If we were to accept this discipline, then we would absolutely need to be able to satisfy a failing check.  That would mean that either we would need to include in the composition of the system under test the real, production-grade collaborator, or we need to fake it.  Or we can tell Uncle Bob to get stuffed, and write the code we need.

Does the mock give us the best investment odds?

That's the first question, and the answer depends on circumstance.  Simply writing the code requires no additional investment, but does nothing to mitigate risk.

Using the real collaborator gives the most accurate results.  But that's not without cost - you'll pay for latency and instability in the collaborator.  For collaborators within the functional core, that may be acceptable - but the odds of it becoming unacceptable increase if the collaborator is coupled to the imperative shell.  Latency at an automated choke point may be acceptable, but paying that tax during the development cycle is not ideal.  Unstable tests incur investigation costs if the results are taken seriously, and are a distraction if no one can take the initiative to delete them.

There can also be additional taxes if the setup of the collaborator is complicated enough that it distracts from the information encoded into the test itself.  And of course the test becomes sensitive to changes in the setup of the collaborator.

One can reasonably posit circumstances in which the mock is a better play than either of all or nothing.

But there is another candidate: what about a null object?

A null object alone doesn't give you much beyond not testing at all.  It does suggest a line of inquiry; what are you getting from the mock that you aren't getting from the null object... and is the added benefit of the mock something you should be getting from the production code instead?

In other words, are the tests trying to direct your attention to an instrumentation requirement that should be part of your production code?

Especially in the case where you are mocking a command that would normally cross a process boundary, having some counters in place to track how many calls were made, or something about the distribution of the arguments passed, could easily have value to the operators of the system.
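A minimal sketch (names assumed) of what that might look like once the instrumentation is promoted into production code, a null object that also counts:

    import java.util.concurrent.atomic.AtomicLong;

    interface Notifier { void send(String message); }

    class CountingNullNotifier implements Notifier {
        private final AtomicLong calls = new AtomicLong();

        @Override
        public void send(String message) {
            calls.incrementAndGet();  // the counter that operators (and tests) can read
        }

        long callCount() { return calls.get(); }
    }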

And, having introduced that idea into the system, do the investment odds still favor the mock?

Even away from the boundary, I think the question is an interesting one; suppose the system under test is collaborating with another module within the functional core.  Is the element of the test that is prejudicing us toward using a mock actually trying to signal a different design concern?  Is the test trying to tell us that the composition is too much work, or that the code is slow, and we need smarter logic at the boundary of slow?

I don't think the answer is "never mock"; but rather that there is a decision tree, and the exercise of questioning the choice at each fork is good practice.



Red Blue Refactor

As far as I can tell, the mantra "Red Green Refactor" is derived from the colors of the UI controls in the early GuiTestRunners, which were influenced by the conventions of traffic control systems, which perhaps were influenced by railroad control systems.

(Image from JSquaredZ).

As labels, `Red` and  `Green` are fine - they are after all just hints at a richer semantic meaning. 

As colors, they suck as a discriminator for the percentage of the population with Deuteranopia.

Looking for another color pair to use, I'm currently thinking that past and future are reasonable metaphors for code improvement, so I'm leaning toward a red/blue pairing.

(Image from Wikipedia)

Of course, in English, "past" and "passed" are homophones, so this perhaps only trades one problem for another. 


Friday, October 12, 2018

Event Sourcing: lessons in failure, part 2

I've written a couple of solo projects using "event sourcing", failing miserably at them because I didn't properly understand how to apply that pattern to the problem I was attempting to solve.

Part 2: Fantasy League Scoring

Our fantasy league used a bespoke scoring system, so I decided to try my hand at creating a report for myself to track how every player in baseball was doing.  This gave me extra insights about how I might improve my team by replacing players in the middle of the season.

And to a large extent, it worked - I was able to pick up a number of useful pieces that would otherwise have slipped under the radar, turning over my team much more aggressively than I would have otherwise.

It was still pretty much a crap shoot -- past performance does not promise future results.  But it did have the benefit of keeping me more engaged.

Failure #1: Where are the events?

Again, I had a book of record issue - the events were things happening in the real world, and I didn't have direct access to them.  What I had was a sort of proxy - after a game ended, a log of the game would become available.  So I could take that log, transform it into events, and then proceed happily.

Well, to be honest, that approach is pretty drunk.

The problem is that the game log isn't a stream of events, it is a projection.  Taking what is effectively a snapshot and decomposing it reverses cause and effect.  There were two particular ways that this problem would be exposed in practice.

First, it would sometimes happen that the logs themselves would go away.  Not permanently, usually, but at least for a time.  Alternatively, they might be later than expected.  And so it would happen that data would get lost - because a stale copy of a projection was delivered instead of a live one.  Again, the projections aren't my data, they are cached copies of somebody else's data, that I might use in my own computations.

Second, when these projections appear, they aren't immutable.  That's a reflection of both problems with data entry upstream (a typo needs to be fixed), and also the fact that within the domain, the interpretation of the facts can change over time -- the human official scorers will sometimes reverse an earlier decision.

In other words, for what I was doing, events didn't gain me anything over just copying the data into a RDBMS, or for that matter writing the data updates into a git repository on disk.

Failure #2: Caching

The entire data processing engine depended on representations of data that changed on a slow cadence (once a day, typically) and I wasn't tracking any meta data about how fresh the data was, how stable it ought to be, whether the new data in question was a regression from what had been seen earlier, and so on.

In an autonomous system, this is effectively a sort of  background task - managing local static copies of remote data.

To make this even more embarrassing: I was of course downloading this data from the web, and the HTTP specification has a lot to say about caching that I didn't even consider.
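For instance, a conditional GET with a stored validator is the kind of thing the specification hands you for free; a minimal sketch (URL handling and names assumed):

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    class GameLogFetcher {
        private final HttpClient client = HttpClient.newHttpClient();
        private String etag;        // validator from the previous response
        private String cachedBody;  // local static copy of the remote data

        String fetch(String url) throws Exception {
            HttpRequest.Builder request = HttpRequest.newBuilder(URI.create(url));
            if (etag != null) request.header("If-None-Match", etag);
            HttpResponse<String> response =
                    client.send(request.build(), HttpResponse.BodyHandlers.ofString());
            if (response.statusCode() == 304) return cachedBody;  // not modified; reuse the copy
            etag = response.headers().firstValue("ETag").orElse(null);
            cachedBody = response.body();
            return cachedBody;
        }
    }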

(I also failed to consider the advantages I might get from using a headless browser, rather than just an html parser.  This bit me hard, and made a big contribution toward the abandoning of the project.)

Failure #3: What's missing?

The process I was working from only considered logs that were available; there was no monitoring of logs that might be missing, or that might have been removed.  This introduced small errors in data comparisons.

I needed to be able to distinguish "here are Bob Shortstop's 6 scores from last week" from "here are Bob Shortstop's 5 scores from last week, and there is a game unaccounted for".

Again, I was thinking of events as things that happen, rather than as events as a way of representing state over time.

Failure #4: Process telemetry

What I wanted, when the wheels were done turning, was the collection of reports at the end.  And that meant that I wasn't paying enough attention to the processes I was running.  Deciding when to check for updated data, on what grain, and orchestrating the consequences of changes to the fetched representations was the real work, and instead I was thinking of that as just "update files on disk".  I didn't have any reports I could look at to see if things were working.

Again, everything was flat, everything was now, absolutely no indications of when data had appeared, or which representation was the source of a particular bit of data.

Solving the Wrong Problem

In effect, what happened is that I would throw away all of my "events" every morning, then regenerate them all from the updated copies of the data in the cache.  If all of your events are disposable, then something is going badly wrong.

The interesting things to keep track of were all related to the process, and discovering that I wanted to refresh the caches slowly, but in a particular priority.

What I should have been looking toward was Rinat's model of a process manager: how would I support a dashboard showing a list of decisions to be made?  Could I then capture the priorities of the "domain expert" and automate the work?  Could I capture time as a first class concern driving the elements of the system forward?

Some of the reason that I missed this is that I had too much time -- I was deliberately hobbling the fetch of the data, which meant that the cost of redoing all of the work was lost in the noise.  On the other hand, that doubly emphasizes the point that all of the value add was in the bookkeeping, which I never addressed.

Key Question:

Did I need temporal queries?  No.






Wednesday, October 10, 2018

Event Sourcing: Lessons on failure, part one.

I've written a couple of solo projects using "event sourcing", failing miserably at them because I didn't properly understand how to apply that pattern to the problem I was attempting to solve.

Part 1: Fantasy Draft Automation 

I came into event sourcing somewhat sideways - I had first discovered the LMAX disruptor around March of 2013.  That gave me my entry into the idea that state could be message driven.  I decided, after some reading and experimenting, that a message driven approach could be used to implement a tool I needed for my fantasy baseball draft.

My preparation for the draft was relatively straightforward - I would attempt to build many ranked lists of players that I was potentially interested in claiming for my team, and during the draft I would look at these lists, filtering out the players that had already been selected.

So what I needed was a simple way to track lists of all of the players that had already been drafted, so that they could be excluded from my lists.  Easy.

Failure #1: Scope creep

My real ambition for this system was that it would support all of the owners, including helping them to track what was going on in the draft while they were away.  So web pages, and twitter, and atom feeds, and REST, and so on.

Getting all of this support right requires being able to accurately report on all of the players who were drafted.  Which in turn means managing a database of players, and keeping it up to date when somebody chooses to draft a player that I hadn't been tracking, and dealing with the variations in spellings, and the fact that players change names and so on.

But for MVP, I didn't need this grief.  I had already uniquely identified all of the players that I was specifically interested in.  I just needed to keep track of those players; so long as I had all of the people I was considering in the player registry, and could track which of those had been taken, I had what I needed (no need to worry about order, and I was tracking my own choices separately anyway).

Failure #2: Where is the book of record?

A second place where I failed was in understanding that my system wasn't the book of record for the actions of the draft.  I should have noticed that we had been drafting for years without this database.  And over the years we've worked out protocols for unwinding duplicated picks, and resolving ambiguity.

What I was really doing was caching outcomes from the real world process into my system.  In other words, I should have been thinking of my inputs as a stream of events, not commands, and arranging for the system to detect and warn about conflicts, rather than rejecting messages that would introduce a conflict.

There was no particular urgency about matching picks with identifiers of players in the registry, or in registering players who were not part of the registry.  All of that math could be delayed a hundred milliseconds without anybody noticing.

Failure #3: Temporal queries

The constraints that the system was trying to enforce were that only players in the player registry could be selected, and that each player in the registry could only be selected once.  Beyond the fact that this wasn't the responsibility of the system, it was complicated by the fact that the player registry wasn't static.

Because I was trying to track the draft faithfully (not realizing until later that doing so wasn't strictly necessary for my use case), I would stop the program when my registry had a data error.  The registry itself was just dumb bytes on disk; any query I ran against the database was a query against "now".  So changing the entries in the registry would change the behavior of my program during "replay".

Failure #4: Compatibility

Somewhat related to the above - I wasn't always being careful to ensure that the domain logic was backwards compatible with the app that wrote the messages, nor did my message journal have any explicit markers in it to track when message traffic should switch to the new handlers.

So old messages would break, or do something new, screwing up the replay until I went into the "immutable" journal to fix the input errors by hand.

Failure #5: Messages

My message schemas, such as they were, were just single lines of text - really just a transcript of what I was typing at the interactive shell.  And my typing sucks, so I was deliberately making choices to minimize typing.  Which again made it harder to support change.


Refactoring toward stateless systems?

The other day, I skimmed the videos of J. B. Rainsberger's Point of Sale Exercise.

In the videos, he talks about the design dynamo, and made a point in passing that removing duplication pushes the data toward the test.

For testing pure functions, that's clearly true - once you map the outputs to the inputs, then the function is fully general, and the specific examples to be tried, along with the answer crib, lives in the test code.  Fine.

When testing a stateful system, it's a similar idea.  The system under test is composed in its ground state, and then the test pushes additional data in to drive the system to the target state, and then we ask the big question.  But looked at from the high level, we're still dealing with a function.

But there are a number of cases where it feels natural to keep internal state within the object; especially if that state is immutable, or deliberately excluded from the API.  Wumpus has examples of each of these.  99 Bottles has duplicated immutable state, in the form of the verse templates.  Horses for courses, I suppose.  Plus the ratchet that teaches us that test data should not flow toward the production code.

But it kicked an idea loose...

If we are moving the state out of the production code, then we are largely moving it toward a database.  The composition root is responsible for wiring up the system to the correct database; in our tests, the test itself takes on this responsibility.

That in turn got me thinking about objects versus "APIs".  When we ported our systems to the web, sessions became a lot shorter - the REST architectural constraint calls for sessions that are but a single request long.

So testing such a system, where our domain starts in its default state and then we "arrange" the preconditions of our test, is analogous to a sequence of sessions, rather than one single session that handles multiple messages.

If you were to reflect that honestly in your test, you would have a lot of code in the test reading state out of the "objects" and copying it to the database, then pulling it out again for the next session, and so on.
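A minimal sketch of that shape, with every name here assumed for illustration:

    import java.util.HashMap;
    import java.util.Map;

    class FakeDatabase {
        private final Map<String, String> rows = new HashMap<>();
        void save(String key, String state) { rows.put(key, state); }
        String load(String key) { return rows.getOrDefault(key, ""); }
    }

    class CartSessionsTest {
        public static void main(String[] args) {
            FakeDatabase db = new FakeDatabase();

            // session 1: arrange, by running one request against a freshly hydrated domain
            String cart = db.load("cart:1");
            db.save("cart:1", cart.isEmpty() ? "apple" : cart + ",apple");

            // session 2: act and assert, against a domain rehydrated from the database
            String rehydrated = db.load("cart:1");
            if (!rehydrated.equals("apple")) throw new AssertionError(rehydrated);
        }
    }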

I wonder if that would break us out of the object framing?

Kata: Refactor the Wumpus

I've shared a Java port of Hunt the Wumpus on Github.

https://github.com/DanilSuits/refactor-the-wumpus

The port is deliberately dreadful -- I tried to simulate a legacy code base by adhering as closely as I could manage, both in structure and in style, to the original.

Java doesn't have a useful goto, and I needed line number hints to keep track of where I was in the source code, so I've introduced a few awful names as well.

But there is a battery of tests available: record-and-playback of my implementation through a number of potentially interesting transcripts.
The key thing is that correct behavior is defined by what the set of classes did yesterday -- Michael Feathers
Producing stable, repeatable behaviors took a bit of thinking, but I was able to work out (eventually) that feature flags were the right approach.  By accident, I got part of the way there early; I had elected to make the record-and-playback tests an optional part of the build via maven profiles.

The argument for a feature flag goes something like this: what I'm really doing is introducing a change in behavior that should not be visible at run time - a dark deploy, so to speak.  Therefore, all of the techniques for doing that are in bounds.

It took a bit of trial and error before I hit on the right mechanism for implementing the changed behavior.  The code in the repository is only a sketch (the object of this exercise is _wumpus_, not feature flags), but if you squint you may be able to see Mark Seemann's ideas taking form.
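One way such a flag might be shaped (the names here are assumptions, not the code in the repository):

    import java.util.Random;

    interface FeatureFlags { boolean useSeededRandom(); }

    class RandomSource {
        // the new, repeatable behavior is dark-deployed behind the toggle;
        // until the flag is flipped, run time behavior is unchanged
        static Random create(FeatureFlags flags, long seed) {
            return flags.useSeededRandom()
                    ? new Random(seed)   // repeatable, for record-and-playback transcripts
                    : new Random();      // original behavior
        }
    }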

With the tests in place to manage expectations, you can then engage the Simple Design Dynamo and get to work. I think in practice this particular exercise is biased more toward "improve names" than it is toward "remove duplication", because of the style of the original.
Make the change easy, then make the easy change.  -- Kent Beck
My guess is that rather than trying to attack the code base as a whole, that it may be more effective to work toward particular goals.  Parnas taught us to limit the visibility of design decisions, so that we might more easily change them.  So look through the code for decisions that we might want to change.
  • The existing code implements its own interactive shell; how would we change the code to replace it with a library?
  • The interface for making a move or shooting an arrow is a bit clumsy, can it be replaced?
  • What changes do we need to support a web version, where each input from the player occurs in its own session?
  • Knowing the layout of the tunnel system gives the hunter a significant advantage when shooting arrows.  We could disguise the hunter's location by using randomized names.
  • Can we change the system to support mazes of different size? Mazes with more tunnels?  Mazes where the rooms are not symmetric, with "missing" tunnels? Mazes with more/fewer hazards?
  • Does the language need to be English? 

Monday, September 24, 2018

TDD: Lightning Talk

What follows is a rough sketch of the lightning talk that I gave at the Boston Software Coders meetup.

Tim Ottinger's image, hosted by Uncle Bob.

The Red Green Refactor mantra has been around for a long time, and the cycle shown above is a familiar one.  But if you look carefully, it's not really an accurate representation of what you are doing.

Let's borrow Tony Hoare's notion of preconditions and postconditions, and apply them to the various stages of the Red Green Refactor cycle.

The precondition for entering the RED stage is that we have an empty set of failing tests, and a set P (which may or may not be empty) of passing tests. The postcondition is that we have a single element set of failing tests (t), and the same set P of passing tests. (If introducing a new test causes other tests to fail, then we have some undesirable coupling that needs to be addressed.)

The precondition for entering the GREEN stage is that we have a single element set of failing tests (t), and a set P of passing tests. The postcondition is that the set of failing tests is empty, and the set of passing tests is the union of (t) and P. Which is to say, we've moved test (t) from the failing column to the passing column, without breaking anything else.

The precondition for entering the REFACTORING stage is that we have an empty set of failing tests (), and a set P of passing tests. The postcondition is that we have an empty set of failing tests (), and the same set P of passing tests.

What we really have here are two different cycles that happen to share starting and terminal states. The upper cycle is used to add more tests, which is to say to add constraints on the behavior of the system. The lower cycle is used to improve the implementation.

On the happy path, the upper cycle has two distinct movements; first, we introduce a new constraint by extending only the test code. Then we satisfy the constraint by modifying only the production code. This is our test calibration: we've proven that the test is actually measuring the production behavior.

But there are unhappy paths - for instance, we introduce a new test, but the test passes when first run. That puts us in an unwelcome state of green, where we haven't yet demonstrated that the test is actually measuring the production code. So first we have to drive ourselves to a red state by changing the production code, before we revert back to a true green.

If you are practicing test driven development, you are already doing these things. Take some time to observe yourself doing them, and I promise that you'll make some interesting discoveries.

Sunday, September 23, 2018

TDD: What do tests describe?

The second thing I want to highlight is that refactoring does not change the observable behavior of the software. The software still carries out the same function that it did before. Any user, whether an end user or another programmer, cannot tell that things have changed. -- Martin Fowler.
Most TDD kata feature functions as the system under test. Sure, the underlying implementation might be classes, or monads, or whatever the flavor of the month happens to be. But in the Bowling Game, or Fizz Buzz, or Mars Rover, the inputs completely determine the output.

"1", "2", "Fizz"... each question has one and only one right answer. There is precisely one observable behavior that satisfies the requirements.

But that's not generally true - there are a lot of ways that the requirements may not completely constrain the system. For instance, in the fractions kata, unless you introduce a constraint that further restricts the behavior in some way, adding two fractions can produce any of a number of distinguishable denominators: 1/4 + 1/4 might legitimately evaluate to 2/8, or to 1/2.

Systems with "random" behaviors, or hidden information, face this complexity. My approach to the Fischer Chess kata usually involves isolating the random number generator from a function -- but there are still 960 different layouts that could count as row(0).

So - what's really going on?

Friday, September 14, 2018

On aggregates: values, references, and transactions.

Gods help me, I'm thinking about aggregates again.

I think aggregates, and the literature around aggregates, deserve a poor reputation.

Part of the problem is that the early descriptions of the concepts have implicit in them a lot of the assumptions of enterprise solutions written in Java circa 2003.  And these assumptions haven't aged very well - we are writing a lot of software today that has different design considerations.


Sunday, September 9, 2018

TDD: Fractions

Come back tomorrow night. We're gonna do fractions -- Tom Lehrer

Yesterday, I happened upon J. B. Rainsberger's World's Best Intro to TDD.  Series 1 features a demonstration of a fractions kata - what does it look like to produce an implementation of Fraction::add, "one test at a time"?

J. B. follows the usual program of TDD:
  1. Think
  2. Write down examples that may prove interesting
  3. While more interesting examples remain unchecked:
    1. Choose the example that can be implemented with the least work
    2. Make that example pass
But a couple of things caught my attention in the first parts of his demonstration; he kept talking about scaffolds, and whether or not it was time to implement equals.  It felt off, in a way that I couldn't fully grasp yet, so I decided to dig in.

TDD ain't what I thought it was

This weekend, I've been puzzling against some of the tensions in test driven development; I suspect that the idea that just unlocked is going to be useful for a long time.

The high level fog looks like this: TDD is several separable ideas in matching trench coats.

The primary idea is that testing should be a first class concern in our design.  We demonstrate that our design satisfies the testability requirement by... writing tests.  Or, alternatively, by asserting that our design is so simple that there are obviously no deficiencies.

After that, we start opening paint cans with screwdrivers.

A separate idea that gets added in is the idea that we should use tests to specify the behavior of the system.  After all, the system is already easy to test, and executable documentation is cheap to verify.

Furthermore, this practice serves as real feedback on the design -- if we try to specify a behavior, and discover that we can not, in fact, check it, then we can reject the hypothesis that our selected design is in fact testable, and take appropriate action.

An unrelated idea: that we should determine our implementation by introducing the constraints of our specification one at a time, making small changes to the code each time to ensure that all of the introduced constraints are satisfied.

This last idea is further refined by the idea that the small changes fall into a particular pattern: take the implementation used to calibrate the test, remove the duplication, then apply refactoring as necessary to clean up the implementation.

Saturday, September 8, 2018

TDD: Triangulation

Triangulation feels funny to me.  I use it only when I am completely unsure of how to refactor.  If I can see how to eliminate duplication between code and tests and create the general solution, then I just do it.  Why would I need to write another test to give me permission to write what I probably could have written in the first place.

There's a weird symptom in the TDD space - novices don't know when to write their production code.


https://softwareengineering.stackexchange.com/users/8802/johnny
When do you write the “real” code in TDD?
Seb Rose 
The code is now screaming for us to refactor it, but to keep all the tests passing most people try to solve the entire problem at once.
Nat Pryce
When test-driving development with example-based tests, an essential skill to be learned is how to pick each example to be most helpful in driving development forwards in small steps. You want to avoid picking examples that force you to take too big a step (A.K.A. “now draw the rest of the owl”).
I started thinking again about why that is.

Back in the day when I was first learning TDD, I -- and others around me -- seemed to have this weird blind spot: the idea was that introducing tests could move you from one local minimum to another in your implementation.

One of the riddles was "how do you get to recursion?"  What kind of test could you introduce that would force you to abandon your preferred form of "simplest thing that could possibly work" in favor of code that was actually correct?

And the answer is almost "you can't".  If we are only constraining behaviors -- the messages into the system, and the messages that come back out -- then of course none of these tests are going to force any specific refactoring; after all, disciplined refactoring doesn't change the observable behavior.

I say "almost", because property based testing allows you to introduce so many different constraints at the same time, it basically forces you to solve the constraint for an entire range of inputs, rather than a tiny subset.

Back in the day, Kent Beck wrote to me
Do you have some refactoring to do first?
And what I have come to see since then is that I hadn't recognized, and therefore hadn't addressed, the duplication sitting in front of me.

Two Axioms of Unit Testing:
  1. If you have written an isolated test, then it can be expressed as a function, mapping an input message to an output message.
  2. If you have written a function, then the output duplicates the input.
In discussing recursion,  Kent demonstrated that the problem falls apart really quickly if you can refactor the output until the solution is obvious.

factorial(n == 4)
 -> 24
 -> 4 * 6
 -> 4 * factorial(3)
 -> n * factorial(n-1)

JBrains writes of the design dynamo, but I don't think that's what we are doing here.

We started from a failing test.  We hard code "the answer" into the production code, which proves that the test is properly calibrated.  Now we can wire up the outputs to the inputs -- if that changes the observable behavior, the tests will catch an error.  When the trivial duplication is removed, we can then assess what our next move should be.

The bowling game kata could start out this way - Uncle Bob began by specifying a gutter game; twenty rolls into the gutter should produce a score of zero.

And if you look carefully at what the score is, it's just the sum of the inputs.  So you make that replacement, and ta-da, the test passes, the duplication is gone, and you can think about whether or not there are abstractions worth discovering in the design.
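A sketch of where that leaves the implementation (using names borrowed from the usual presentation of the kata):

    class Game {
        private final int[] rolls = new int[21];
        private int current = 0;

        void roll(int pins) { rolls[current++] = pins; }

        int score() {
            int total = 0;
            for (int pins : rolls) total += pins;  // the sum of the inputs; spares and strikes come later
            return total;
        }
    }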

Uncle Bob, of course, didn't do this -- he skips directly to testAllOnes without addressing this duplication, and then applies this change later.

And I wonder if that set the tone -- that there was some minimum number of pathological solutions that needed to be attempted before you can actually start doing something real.

Kent Beck
I get paid for code that works, not for tests, so my philosophy is to test as little as possible to reach a given level of confidence
An implementation that adds up bowling pins obviously has no deficiencies; all the lines of code are covered, we tried the API that we have defined and can evaluate its suitability in our context.

We don't need MOAR TESTS to achieve these ends.

In other words, the introduction of other tests is satisfying some other need, and we should be thinking about that need, and where it fits into our development process.


Now, I think it's pretty clear that a single example is not an honest representation of the complete specification.  But if that's the goal, deliberately introducing a bunch of broken implementations to ensure that the specification is satisfied is an expensive way to achieve that result.



Monday, August 27, 2018

TDD: Writing Bad Code

David Tanzer writes of his students asking why they are supposed to write bad code.

He attributes the phenomenon to the discipline of taking small steps.  Perhaps that is true, but I think there is a more exact explanation for the symptoms that he describes.

After creating a failing test, our next action is to modify the test subject such that the new test passes.  This is, of course, our red/green transition.  Because green tests give us a measure of security against various common classes of mistakes, we want to transition out of the deliberately red state as quickly as possible. 

Quickly, here, is measured in wall clock time, but I think one could reasonably argue that what we really mean is smallest number of code edits.

But I think we get lost in the motivation for the action -- although the edits that take us from red to green are in the implementation of the test subject, the motivation for the action is still the test.  What we are doing is test calibration: demonstrating to ourselves that (in this moment) the test actually measures the production code.

At the point when our test goes green, we don't have a "correct" implementation of our solution.  What we have is a test that constrains the behavior of the solution.

Now we have the option of changing the production implementation to something more satisfactory, confident that if a mistake changes the behavior of our test subject, that the test will notify us when it is next run.  If the tests are fast, then we can afford to run them even as frequently as after each edit, reducing the interval between introducing a mistake and discovering it.

To some degree, I think the real code comes after test calibration.  After the test is calibrated, we can start worrying about our code quality heuristics - transforming the implementation of the test subject from one that meets the minimum standard to one that would survive a code review.

Sunday, August 19, 2018

REST: Fielding 6.5.2

Reviewing Fielding's thesis yet again, I noticed a sentence that I think has not gotten as much exposure as it should

What makes HTTP significantly different from RPC is that the requests are directed to resources using a generic interface with standard semantics that can be interpreted by intermediaries almost as well as by the machines that originate services. -- Fielding
Emphasis added.

If I'm doing it right, I can put a commodity cache in front of my origin server, and it will be able to do useful work because it understands the semantics of the meta data in the HTTP traffic.  The cache doesn't need any special knowledge about my domain, or the payloads being exchanged.

Wednesday, August 15, 2018

Ian Cooper on Aggregates

I'm pretty impressed by the way that Ian Cooper describes "aggregates", and decided to capture his description in the hopes that I can keep it at the forefront of my own thinking

Aggregates in DDD are a way of doing a coarse-grained lock.

Some assertions:
An entity is a row in a relational table i.e. has a unique id
A value type is one or more columns in a row in one or more relational tables.

In order to update an entity I lock its row to ensure ACID properties.
This can scale badly if I need to lock a lot of entities as we get page and table lock escalation.

If the problem is parent-child e.g. an order and order lines, I could lock the parent row, and not the children to avoid table lock escalation. To make this work, my code has to enforce the rule that no child rows can be accessed, without first taking a lock on the parent entity row.
So my repository needs to enforce a strategy similar to 'lock parent for update' if we succeed, then allow modification of parent and children.
At scale, you may want to turn off table lock escalation on the children at this point. (DANGER WILL ROBINSON, DANGER, DANGER). Because you don't want lock escalation when you lock the object graph.

Aggregates pre-date event sourcing and NoSQL, so its easiest to understand the problem in relational DBs that they were intended to solve.
This is the reason why you don't allow pointers to children, all access has to go through the parent, which must be locked
Usually I don't store anything apart from the ID on the other entity for the root, because I want you to load via the repo, which does the lock for update, give you an object if required
You can also use a pessimistic lock, if you want to report the cause of collisions to a user
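A minimal sketch of the "lock parent for update" strategy described above, with the schema and names assumed:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    class OrderRepository {
        private final Connection connection;

        OrderRepository(Connection connection) { this.connection = connection; }

        // lock the aggregate root's row first; only then touch the order lines
        ResultSet lockForUpdate(long orderId) throws SQLException {
            PreparedStatement statement = connection.prepareStatement(
                    "SELECT id, status FROM orders WHERE id = ? FOR UPDATE");
            statement.setLong(1, orderId);
            return statement.executeQuery();  // row lock held until the transaction ends
        }
    }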

Rice and Foemmel, in Patterns of Enterprise Application Architecture, write
Eric Evans and David Siegel define an aggregate as a cluster of associated objects that we treat as a unit for data changes. Each aggregate has a root that provides the only access point to members of the set and a boundary that defines what's included in the set. The aggregate's characteristics call for a Coarse-Grained Lock, since working with any of its members requires locking all of them. Locking an aggregate yields an alternative to a shared lock that I call a root lock. By definition locking the root locks all members of the aggregate. The root lock gives a single point of contention.
To my mind, there are really two distinct ideas in the Evans formulation of the aggregate
  1. we have a lock around a set of data that describes one or more entities
  2. the expression of that lock is implicit in one of the domain entities (the "aggregate root") within that set
To be completely honest, I'm still not entirely convinced that mapping rows to "entities" is the right idea -- rows on disk look to me more like values (state) than entities (behavior).

Finally - I still feel that there isn't enough literature describing change: what are the degrees of freedom supported by this family of designs, do they make common changes easy? do they make rarer changes possible?

Monday, July 16, 2018

REST, Resources, and Flow

It occurred to me today that there are two important ideas in designing a REST API.

The more familiar of the two is the hypermedia constraint.  The client uses a bookmark to load a starting point, and from there navigates the application protocol by examining links provided by the server, and choosing which link to follow.

That gives the server a lot of control, as it can choose which possible state transitions to share with the client.  Links can be revealed or hidden as required; the client is programmed to choose from among the available links.  Because the server controls the links, the server can choose the appropriate request target.  The client can just "follow its nose"; which is to say, the client follows a link by using a request target selected by the server.

The second idea is that the request targets proposed by the server are not arbitrary.  HTTP defines cache invalidation semantics in RFC 7234.   Successful unsafe requests invalidate the clients locally cached representation of the request target.

A classic example that takes advantage of this is collections: if we POST a new item to a collection resource, then the cached representation of the collection is automatically invalidated if the server's response indicates that the change was successful.

Alas, there is an asymmetry: when we delete a resource, we don't have a good way to invalidate the collections that it may have belonged to.

Thursday, June 14, 2018

CQRS Meetup

Yesterday's meetup of the Boston DDD/CQRS/ES group was at Localytics, and featured a 101 introduction talk by James Geall, and a live coding exercise by Chris Condon.
CQRS is there to allow you to optimize the models for writing and reading separately.  NOTE: unless you have a good reason to pay the overhead, you should avoid the pattern.
 James also noted that good reasons to pay the overhead are common.  I would have liked to hear "temporal queries" here - what did the system look like as-at?

As an illustration, he described possibilities for tracking stock levels as a append only table of changes and a roll-up/view of a cached result.  I'm not so happy with that example in this context, because it implies a coupling of CQRS to "event sourcing".  If I ran the zoo, I'd probably use a more innocuous example: OLTP vs OLAP, or a document store paired with a graph database.

The absolute simplest example I've been able to come up with is an event history; the write model is optimized for adding new information to the end of the data structure as it arrives. In other words, the "event stream" is typically in message order; but if we want to show a time series history of those events, we need to _sort_ them first.  We might also change the underlying data structure (from a linked list to a vector) to optimize for other search patterns than "tail".
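A minimal sketch of that example (names and data structures assumed):

    import java.time.Instant;
    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.List;
    import java.util.stream.Collectors;

    record Event(Instant occurredAt, String payload) {}

    class EventHistory {
        private final List<Event> stream = new ArrayList<>();  // write model: append only, message order

        void append(Event e) { stream.add(e); }

        List<Event> timeSeries() {                              // read model: sorted for presentation
            return stream.stream()
                    .sorted(Comparator.comparing(Event::occurredAt))
                    .collect(Collectors.toList());
        }
    }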
Your highly optimized model for "things your user wants to do" is unlikely to be optimized for "things your user wants to look at".
This was taken from a section of James's presentation explaining why the DDD/CQRS/ES tuple appear together so frequently.  He came back to this idea subsequently in the talk, when responding to some confusion about the read and write models

You will be doing roll ups in the write model for different reasons than those which motivate the roll ups in the read model.
A lot of people don't seem to realize that, in certain styles, the write model has its own roll ups.  A lot of experts don't seem to realize that there is more than one style -- I tried to give a quick calca on an alternative style at the pub afterwards, but I'm not sure how well I was able to communicate the ideas over the background noise.

The paper based contingency system that protects the business from the software screwing up is probably a good place to look for requirements.
DDD in a nut shell, right there.

That observation brings me back to a question I haven't found a good answer to just yet: why are we rolling our own business process systems, rather than looking to the existing tooling for process management (Camunda, Activiti, and the players in the iBPMS Magic Quadrant)?  Are we getting that much competitive advantage from rolling our own?

Event sourcing gives you a way to store the ubiquitous language - you get release from the impedance mismatch for free.  A domain expert can look at a sequence of events and understand what is going on.
A different spelling of the same idea - the domain expert can look at a given set of events, and tell you that the information displayed on the roll up screen is wrong.  You could have a field day digging into that observation: for example, what does that say about UUID appearing in the event data?

James raised the usual warning about not leaking the "internal" event representations into the public API.  I think as a community we've been explaining this poorly - "event" as a unit of information that we use to reconstitute state gets easily confused with "event" as a unit of information broadcast by the model to the world at large.

A common theme in the questions during the session was "validation"; the audience gets tangled up in questions about write model vs read model, latency, what the actual requirements of the business are, and so on.

My thinking is that we need a good vocabulary of examples of different strategies for dealing with input conflicts.  A distributed network of ATM machines, both in terms of the pattern of a cash disbursement, and also reconciling the disbursements from multiple machines when updating the accounts.  A seat map on an airline, where multiple travelers are competing for a single seat on the plane.

Chris fired up an open source instance of Event Store, gave a quick tour of the portal, and then started a simple live coding exercise: a REPL for debits and credits, writing changes to the stream, and then reading it back.  In the finale, there were three processes sharing data - two copies of the REPL, and the event store itself.

The implementation of the logic was based on the Reactive-Domain toolkit; which reveals its lineage, as it is an evolution of ideas acquired from working with Jonathan Oliver's Common-Domain and with Yves Reynhout, who maintains AggregateSource.

It's really no longer obvious to me what the advantage of that pattern is; it always looks to me as though the patterns and the type system are getting in the way.  I asked James about this later, and he remarked that no, he doesn't feel much friction there... but he writes in a slightly different style.  Alas, we didn't have time to explore further what that meant.

Sunday, June 10, 2018

Extensible message schema

I had an insight about messages earlier this week, one which perhaps ought to have been obvious.  But since I have been missing it, I figured that I should share.

When people talk about adding optional elements to a message, default values for those optional elements are not defined by the consumer -- they are defined by the message schema.

In other words, each consumer doesn't get to choose their own preferred default value.  The consumer inherits the default value defined by the schema they are choosing to implement.

For instance, if we are adding a new optional "die roll" element to our message, then consumers need to be able to make some assumption about the value of that field when it is missing.

But simply rolling a die for themselves is the "wrong" answer, in the sense that it isn't repeatable, and different consumers will end up interpreting the evidence in the message different ways.  In other words, the message isn't immutable under these rules.

Instead, we define the default value in the schema - documenting that the field is xkcd 221 compliant; just as every consumer that understands the schema agrees on the semantics of the new value, they also agree on the semantic meaning of the new value's absence.
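A minimal sketch of the idea, with the field names assumed:

    final class DieRollMessage {
        // the schema, not each consumer, says what "absent" means;
        // xkcd 221 compliant: the default roll is 4
        static final int DEFAULT_DIE_ROLL = 4;

        private final Integer dieRoll;  // null when the optional element was absent

        DieRollMessage(Integer dieRoll) { this.dieRoll = dieRoll; }

        int dieRoll() {
            return dieRoll != null ? dieRoll : DEFAULT_DIE_ROLL;
        }
    }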


If two consumers "need" to have different default values, that's a big hint that you may have two subtly different message elements to tease apart.

These same messaging rules hold when your "message" is really a collection of parameters in a function call.  Adding a new argument is fine, but if you aren't changing all of the clients at the same time then you really should continue to support calls using the old parameter list.

In an ideal world, the default value of the new parameter won't surprise the old clients, by radically changing the outcome of the call.

To choose an example, suppose we've decided that some strategy used by an object should be configurable by the client.  So we are going to add to the interface a parameter that allows the client to specify the implementation of the strategy they want.

The default value, in this case, really should be the original behavior, or its semantic equivalent.
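A minimal sketch, with the strategy and names invented for illustration: the old signature stays, and delegates with the original behavior as the default:

    import java.util.Arrays;

    interface SortStrategy { void sort(int[] data); }

    class Widget {
        // new, configurable entry point
        void process(int[] data, SortStrategy strategy) {
            strategy.sort(data);
            // ... the rest of the processing
        }

        // old signature preserved; the default strategy is the original behavior
        void process(int[] data) {
            process(data, Arrays::sort);
        }
    }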


Wednesday, June 6, 2018

Maven Dependency Management and TeamCity

A colleague got bricked when an engineer checked in a pom file where one of the dependencyManagement entries was missing a version element.

From what I see in the schema, the version element is minOccurs=0, so the pom was still valid.

Running the build locally, the build succeeded.  A review of the dependency:tree output was consistent with an unmanaged dependency -- two of the submodules showed different resolved versions (via transitive dependencies).

Running the build locally, providing the version element, we could see the dependencies correctly managed to the same version in each of the submodules.

But in TeamCity?  Well, the version of the project with the corrected pom built just fine.  Gravity still works.  But the bricked pom produced this stack trace.


[18:28:22]W:  [Step 2/5] org.apache.maven.artifact.InvalidArtifactRTException: For artifact {org.slf4j:slf4j-log4j12:null:jar}: The version cannot be empty.
 at org.apache.maven.artifact.DefaultArtifact.validateIdentity(DefaultArtifact.java:148)
 at org.apache.maven.artifact.DefaultArtifact.<init>(DefaultArtifact.java:123)
 at org.apache.maven.bridge.MavenRepositorySystem.XcreateArtifact(MavenRepositorySystem.java:695)
 at org.apache.maven.bridge.MavenRepositorySystem.XcreateDependencyArtifact(MavenRepositorySystem.java:613)
 at org.apache.maven.bridge.MavenRepositorySystem.createDependencyArtifact(MavenRepositorySystem.java:120)
 at org.apache.maven.project.DefaultProjectBuilder.initProject(DefaultProjectBuilder.java:808)
 at org.apache.maven.project.DefaultProjectBuilder.build(DefaultProjectBuilder.java:617)
 at org.apache.maven.project.DefaultProjectBuilder.build(DefaultProjectBuilder.java:405)
 at org.apache.maven.DefaultMaven.collectProjects(DefaultMaven.java:663)
 at org.apache.maven.DefaultMaven.getProjectsForMavenReactor(DefaultMaven.java:654)
 at sun.reflect.GeneratedMethodAccessor1975.invoke(Unknown Source)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at org.jetbrains.maven.embedder.MavenEmbedderImpl$4.run(MavenEmbedderImpl.java:447)
 at org.jetbrains.maven.embedder.MavenEmbedderImpl.executeWithMavenSession(MavenEmbedderImpl.java:249)
 at org.jetbrains.maven.embedder.MavenEmbedderImpl.readProject(MavenEmbedderImpl.java:430)
 at org.jetbrains.maven.embedder.MavenEmbedderImpl.readProjectWithModules(MavenEmbedderImpl.java:336)
 at jetbrains.maven.MavenBuildService.readMavenProject(MavenBuildService.java:732)
 at jetbrains.maven.MavenBuildService.sessionStarted(MavenBuildService.java:206)
 at jetbrains.buildServer.agent.runner2.GenericCommandLineBuildProcess.start(GenericCommandLineBuildProcess.java:55)


It looks to me like some sort of pre-flight check by the MavenBuildService before it surrenders control to maven.

If I'm reading the history correctly, the key is https://issues.apache.org/jira/browse/MNG-5727

That change went into 3.2.5; TeamCity's mavenPlugin (we're still running 9.0.5, Build 32523) appears to be using 3.2.3.

A part of what was really weird? Running the build on the command line worked; In my development environment, I had been running 3.3.9. So I had this "fix"; and everything was groovy. When I sshed into the build machine, I was running... 3.0.4. Maybe that's too early for the bug to appear? Who knows - I hit the end of my time box.

If you needed to read this... good luck.

Thursday, May 31, 2018

Bowling Kata v2

Inspired by

Part One

Produce a solution to the bowling game kata using your favorite methods.

Part Two

Changes in bowling alley technology are entering the market; the new hardware offers cheaper, efficient, environmentally friendly bowling.

But there's a catch... the sensors on the new bowling alleys don't detect the number of pins knocked down by each roll, but instead the number that remain upright.

Without making any changes to the existing tests: create an implementation that correctly scores the games reported by the new bowling alleys.

Examples

A perfect game still consists of twelve throws, but when the ball knocks down all of the pins, the sensors report 0 rather than 10.

Similarly, a "gutter game" has twenty throws, and the sensors report 10 for each throw.

A spare where 9 pins are first knocked down, and then 1, is reported as (1,0).
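
One way into Part Two (a sketch of my own, not a prescribed solution) is to adapt the sensor reports back into the rolls that the existing scorer already understands:

import java.util.ArrayList;
import java.util.List;

class StandingPinsAdapter {
    // converts "pins left standing after each roll" into "pins knocked down by each roll"
    static List<Integer> toKnockedDown(List<Integer> standingAfterEachRoll) {
        List<Integer> rolls = new ArrayList<>();
        int standingBefore = 10;
        boolean firstRollOfFrame = true;
        for (int standingAfter : standingAfterEachRoll) {
            rolls.add(standingBefore - standingAfter);
            // the deck resets after a strike or after the second roll of a frame
            boolean frameOver = !firstRollOfFrame || standingAfter == 0;
            standingBefore = frameOver ? 10 : standingAfter;
            firstRollOfFrame = frameOver;
        }
        return rolls;
    }
}

With that adapter, the spare example above, where the sensors report (1,0), comes back out as the familiar rolls of 9 and 1.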

Part Three

The advanced bonus round

Repeat the exercise described in Part Two, starting from an implementation of the bowling game kata that is _unfamiliar_ to you.

Update

I found Tim Riemer's github repository, and forked a copy to use as the seed for part three.  The results are available on github: https://github.com/DanilSuits/BowlingGameKataJUnit5

Thursday, May 24, 2018

enforce failed: Could not match item

The Symptom

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-enforcer-plugin:3.0.0-M1:enforce (enforce-pedantic-rules) on project devjoy-security-kata: Execution enforce-pedantic-rules of goal org.apache.maven.plugins:maven-enforcer-plugin:3.0.0-M1:enforce failed: Could not match item :maven-deploy-plugin:2.8.2 with superset -> [Help 1]

The Stack Trace

org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.apache.maven.plugins:maven-enforcer-plugin:3.0.0-M1:enforce (enforce-pedantic-rules) on project devjoy-security-kata: Execution enforce-pedantic-rules of goal org.apache.maven.plugins:maven-enforcer-plugin:3.0.0-M1:enforce failed: Could not match item :maven-deploy-plugin:2.8.2 with superset
 at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:212)
 at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
 at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
 at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116)
 at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:80)
 at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51)
 at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
 at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:307)
 at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:193)
 at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:106)
 at org.apache.maven.cli.MavenCli.execute(MavenCli.java:863)
 at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:288)
 at org.apache.maven.cli.MavenCli.main(MavenCli.java:199)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
 at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
 at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
 at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
Caused by: org.apache.maven.plugin.PluginExecutionException: Execution enforce-pedantic-rules of goal org.apache.maven.plugins:maven-enforcer-plugin:3.0.0-M1:enforce failed: Could not match item :maven-deploy-plugin:2.8.2 with superset
 at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:145)
 at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:207)
 ... 20 more
Caused by: java.lang.IllegalArgumentException: Could not match item :maven-deploy-plugin:2.8.2 with superset
 at com.github.ferstl.maven.pomenforcers.model.functions.AbstractOneToOneMatcher.handleUnmatchedItem(AbstractOneToOneMatcher.java:63)
 at com.github.ferstl.maven.pomenforcers.model.functions.AbstractOneToOneMatcher.match(AbstractOneToOneMatcher.java:55)
 at com.github.ferstl.maven.pomenforcers.PedanticPluginManagementOrderEnforcer.matchPlugins(PedanticPluginManagementOrderEnforcer.java:142)
 at com.github.ferstl.maven.pomenforcers.PedanticPluginManagementOrderEnforcer.doEnforce(PedanticPluginManagementOrderEnforcer.java:129)
 at com.github.ferstl.maven.pomenforcers.CompoundPedanticEnforcer.doEnforce(CompoundPedanticEnforcer.java:345)
 at com.github.ferstl.maven.pomenforcers.AbstractPedanticEnforcer.execute(AbstractPedanticEnforcer.java:45)
 at org.apache.maven.plugins.enforcer.EnforceMojo.execute(EnforceMojo.java:202)
 at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:134)
 ... 21 more

The Cause

<plugin>
    <artifactId>maven-deploy-plugin</artifactId>
    <version>2.8.2</version>
</plugin>

The Cure

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-deploy-plugin</artifactId>
    <version>2.8.2</version>
</plugin>

The Circumstances

I had created a new Maven project in IntelliJ IDEA 2018.1. That produced a build/pluginManagement/plugins section where the plugin entries did not include a groupId element. Maven (3.3.9), as best I could tell, didn't have any trouble with it; but the PedanticPluginManagementOrderEnforcer did.

My best guess at this point is that there is some confusion about the plugin search order.

But I'm afraid I'm at the end of my time box. Good luck, DenverCoder9.

Friday, May 11, 2018

Tests, Adapters, and the lifecycle of an API Contract

The problem that I faced today was preparing for a change of an API; the goal is to introduce new interfaces with new spellings that produce the same observable behaviors as the existing code.

Superficially, it looks a bit like paint by numbers.  I'm preparing for a world where I have two different implementations to exercise with the same behavior, ensuring that the same side effects are measured at the conclusion of the automated check.

But the pattern of the setup phase is just a little bit different.  Rather than wiring the automated check directly to the production code, we're going to wire the check to an adapter.


The basic principles at work are those of a dependency injection friendly framework, as described by Mark Seemann.
  • The client owns the interface
  • The framework is the client
In this case, the role of the framework is played by the scenario, which sets up all of the mocks, and verifies the results at the conclusion of the test.  The interface is a factory, which takes a description of the test environment and returns an instance of the system under test.

That returned instance is then evaluated for correctness, as described by the specification.

Of course, if the client owns the interface, then the production code doesn't implement it -- the dependency arrow points in the wrong direction.

We beat this with an adapter; the automated check serves as a bridge between the scenario specification and a version of the production code.  In other words, the check stands as a demonstration that the production code can be shaped into something that satisfies the specification.
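
A rough sketch of the shape, with names of my own invention rather than anything from the actual codebase:

interface TestEnvironment { }                        // the scenario's description of the test setup

interface SystemUnderTestFactory<SUT> {              // the client (the scenario) owns this interface
    SUT create(TestEnvironment environment);
}

final class ProductionService {                      // stand-in for the existing production code
    String process(String request) { return "handled:" + request; }
}

// The adapter shapes the production code into something that satisfies the specification;
// a second adapter for the new API would implement the same interface, so the same checks
// can exercise both implementations.
final class ProductionServiceAdapter implements SystemUnderTestFactory<ProductionService> {
    @Override
    public ProductionService create(TestEnvironment environment) {
        return new ProductionService();
    }
}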

This pattern gives us a reasonably straightforward way to push two different implementations through the same scenario, allowing us to ensure that the implementation of the new API provides equivalent capabilities to its predecessor.

But I didn't discover this pattern trying to solve that problem...

The problem that I faced was that I had two similar scenarios, where the observable outcome was different -- the observable behavior of the system was a consequence of some configuration settings. Most of my clients were blindly accepting the default hints, and producing the normal result. But in a few edge cases, a deviation from the default hints produced a different result.

The existing test suite was generally soft on this scenario. My desired outcome was two fold -- I wanted tests in place to capture the behavior specification now, and I wanted artifacts that would demonstrate that the edge case behavior needed to be covered in new designs.

We wouldn't normally group these two cases together like this. We're more likely to have a suite of tests ensuring that the default configuration satisfies its cases, and that the edge case configuration satisfies a different suite of results.

We can probably get closer to the desired outcome by separating the scenario and its understanding of ExpectedResult from the specifications.

And likewise for the edge case.

In short, parallel suites with different expected results in shared scenarios, with factory implementations that are bound to a specific implementation.

The promise (actually, more of a hope) is that as we start moving the API contracts through their life cycles -- from stable to legacy/deprecated to retired -- we will along the way catch that there are these edge cases that will need resolution in the new contracts.  Choose to support them, or not, but that choice should be deliberate, and not a surprise to the participants.

Tuesday, May 1, 2018

Ruminations on State

Over the weekend, I took another swing at trying to understand how boundaries should work in our domain models.

Let's start with some assumptions.

First, we capture information in our system because we think it is going to have value to us in the future.  We think there is profit available from the information, and therefore we capture it.  Write-only databases aren't very interesting; we expect that we will want to read the data later.

Second, that for any project successful enough to justify additional investment, we are going to know more later than we do today.

Software architecture is those decisions which are both important and hard to change.
We would like to defer hard-to-change decisions as late as possible in the game.

One example of such a hard decision would be carving up information into different storage locations.  So long as our state is ultimately guarded by a single lock, we can experiment freely with different logical arrangements of that data and the boundaries within the model.  But separating two bounded sets of information into separate storage areas with different locks, and then discovering that the logical boundaries are faulty, makes a big mess.

Vertical scaling allows us to concentrate on the complexity of the model first, with the aim of carving out the isolated, autonomous bits only after we've accumulated several nines of evidence that it is going to work out as a long term solution.

Put another way, we shouldn't be voluntarily entering a condition where change is expensive until we are driven there by the catastrophic success of our earlier efforts.

With that in mind, let's think about services.  State without change is fundamentally just a cache.   "Change" without state is fundamentally just a function.  The interesting work begins when we start combining new information with information that we have previously captured.

I find that Rich Hickey's language helps me to keep the various pieces separate.  In easy cases, state can be thought of as a mutable reference to a value S.  The evolution of state over time looks like a sequence of updates to the mutable reference, where some pure function calculates new values from their predecessors and the new information we have obtained, like so
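
(The original snippet isn't reproduced here, so this is a sketch of my own, with invented types.)

import java.util.concurrent.atomic.AtomicReference;

record S(long version) { }                    // the state value
record NewInformation(String payload) { }     // the message carrying new information

class StateReference {
    private final AtomicReference<S> state = new AtomicReference<>(new S(0));

    // op(): a pure function that calculates the successor value from its predecessor
    // and the newly arrived information
    static S op(S current, NewInformation info) {
        return new S(current.version() + 1);
    }

    void onMessage(NewInformation info) {
        // the only mutation is swapping the reference to point at the new value
        state.updateAndGet(current -> op(current, info));
    }
}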


Now, this is logically correct, but it is very complicated to work with. op(), as shown here, is made needlessly complicated by the fact that it is managing all of the state S. Complete generality is more power than we usually need. It's more likely that we can achieve correct results by limiting the size of the working set. Generalizing that idea could look something like
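
(Again a sketch of mine, reusing the invented S and NewInformation from above: decompose() extracts the working set that op() actually needs, and compose() folds the result back into the whole.)

record WorkingSet(long counter) { }

class FocusedUpdate {
    static WorkingSet decompose(S whole) {
        return new WorkingSet(whole.version());
    }

    // the pure calculation now operates on a much smaller value
    static WorkingSet op(WorkingSet workingSet, NewInformation info) {
        return new WorkingSet(workingSet.counter() + 1);
    }

    static S compose(S whole, WorkingSet updated) {
        return new S(updated.counter());
    }

    static S next(S current, NewInformation info) {
        return compose(current, op(decompose(current), info));
    }
}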


The function decompose and its inverse compose allow us to focus our attention exclusively on those parts of current state that are significant for this operation.

However, I find it more enlightening to consider that it will be convenient for maintenance if we can re-use elements of a decomposition for different kinds of messages. In other words, we might instead have a definition like
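
(Continuing the same invented types: one decomposition, the "aggregate", shared by several kinds of message, each with its own calculation.)

class MessageHandlers {
    static S handle(S current, NewInformation info) {
        WorkingSet aggregate = FocusedUpdate.decompose(current);   // reused for every message kind
        WorkingSet updated = switch (info.payload()) {
            case "increment" -> new WorkingSet(aggregate.counter() + 1);
            case "reset"     -> new WorkingSet(0);
            default          -> aggregate;
        };
        return FocusedUpdate.compose(current, updated);
    }
}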


In the language of Domain Driven Design, we've identified re-usable "aggregates" within the current state that will be needed to correctly calculate the next change, while masking away the irrelevant details. New values are calculated for these aggregates, and from them a new value for the state is calculated.

In an object oriented domain model, we normally see at least one more level of indirection - wrapping the state into objects that manage the isolated elements while the calculation is in progress.
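
Something like the following sketch, again with the invented types from above:

class MutableAggregate {
    private long counter;

    static MutableAggregate from(WorkingSet workingSet) {
        MutableAggregate aggregate = new MutableAggregate();
        aggregate.counter = workingSet.counter();
        return aggregate;
    }

    void apply(NewInformation info) {
        counter = counter + 1;        // mutation, hidden inside the calculation
    }

    WorkingSet snapshot() {
        return new WorkingSet(counter);
    }
}

class ObjectOrientedHandler {
    static S handle(S current, NewInformation info) {
        MutableAggregate aggregate = MutableAggregate.from(FocusedUpdate.decompose(current));
        aggregate.apply(info);                                        // mutable while the calculation runs...
        return FocusedUpdate.compose(current, aggregate.snapshot());  // ...but never visible outside this function
    }
}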


In this spelling, the objects are mutable (we lose referential transparency), but so long as their visibility is limited to the current function, the risks are manageable.

Sunday, April 22, 2018

Spotbugs and the DDD Sample application.

As an experiment, I decided to run spotbugs against my fork of the DDDSample application.

For the short term, rather than fixing anything in the first pass, I decided to simply suppress the warnings to address them later.

   9 @SuppressFBWarnings("EI_EXPOSE_REP")
   8 @SuppressFBWarnings("EI_EXPOSE_REP2")
These refer to the fact that java.util.Date isn't an immutable type; exchanging a reference to a Date means that the state of the domain model could be changed from the outside.  Primitive values and immutable types like java.lang.String don't share this problem.
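
The usual remedy (sketched here with a hypothetical value type, not the actual DDDSample code) is to exchange copies, or to switch to an immutable type such as java.time.Instant, rather than sharing the Date itself:

import java.util.Date;

public class DeliveryWindow {
    private final Date deadline;

    public DeliveryWindow(Date deadline) {
        this.deadline = new Date(deadline.getTime());   // defensive copy on the way in
    }

    public Date deadline() {
        return new Date(deadline.getTime());            // and another on the way out
    }
}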

6 @SuppressFBWarnings("SE_BAD_FIELD")
This warning caught my attention, because it flags a pattern that I was already unhappy with in the implementation: value types that hold a reference to an entity.  Here, the ValueObject abstraction is tagged as Serializable, but the entities are not, and so it gets flagged.

That the serialization gets flagged is just a lucky accident.  The bigger question to my mind is whether or not nesting an entity within a value is an anti-pattern.  In many cases, you can do as well by capturing the identifier of the entity, rather than the entity itself.

Those two issues alone cover about 75% of the issues flagged by spotbugs.

Friday, March 30, 2018

TDD: Tales of the Fischer King

Chess960 is a variant of chess introduced by Bobby Fischer in the mid-1990s. The key idea is that each player's pieces, rather than being assigned to fixed initial positions on the home rank, are instead randomized, subject to the following constraints
  • The bishops shall be placed on squares of opposite colors
  • The king shall be positioned between the two rooks.
There are 960 different positions that satisfy these constraints.
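
Written as a predicate (my own representation: the home rank as a string of piece initials, e.g. "BBQNNRKR"), the two constraints look something like this:

class Chess960 {
    static boolean isLegalHomeRank(String rank) {
        int firstBishop = rank.indexOf('B');
        int secondBishop = rank.indexOf('B', firstBishop + 1);
        boolean bishopsOnOppositeColors = (firstBishop % 2) != (secondBishop % 2);

        int firstRook = rank.indexOf('R');
        int secondRook = rank.indexOf('R', firstRook + 1);
        int king = rank.indexOf('K');
        boolean kingBetweenRooks = firstRook < king && king < secondRook;

        return bishopsOnOppositeColors && kingBetweenRooks;
    }
}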

In December of 2002, Peter Seibel proposed:
Here's an interesting little problem that I tried to tackle with TDD as an exercise....  Write a function (method, procedure, whatever) that returns a randomly generated legal arrangement of the eight white pieces.
Looking back at the discussion, what surprises me is that we were really close to a number of ideas that would become popular later:
  • We discuss property based tests (without really discovering them)
  • We discuss mocking the random number generator (without really discovering that)
  • We get really very close to the idea that random numbers, like time, are inputs
I decided to retry the exercise again this year, to explore some of the ideas that I have been playing with of late.

Back in the day, Red Green Refactor was understood as a cycle; but these days, I tend to think of it as a protocol.  Extending an API is not the same flavor of Red as putting a new constraint on the system under test, the Green in the test calibration phase is not the same as the green after refactoring, and so on.

I tend to lean more toward modules and functions than I did back then; our automated check configures a specification that the system under test needs to satisfy.  We tend to separate out the different parts so that they can be run in parallel, but fundamentally it is one function that accepts a module as an argument and returns a TestResult.

An important idea is that the system under test can be passed to the check as an argument.  I don't feel like I have the whole shape of that yet, but the rough idea is that once you are inside the check, you are only talking to an API.  In other words, the checks should be reusable to test multiple implementations of the contract.  Of course, you have to avoid a naming collision between the check and the test.
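
In Java-flavored terms, the shape I'm gesturing at is roughly this (names invented for the sketch, reusing the predicate above):

import java.util.Random;

interface HomeRankArranger {                  // the contract any implementation must satisfy
    String arrange(Random random);            // random numbers, like time, are inputs
}

enum TestResult { PASS, FAIL }

class Checks {
    // one function: module in, TestResult out; reusable against any implementation of the contract
    static TestResult check(HomeRankArranger module) {
        String rank = module.arrange(new Random(42));
        return Chess960.isLegalHomeRank(rank) ? TestResult.PASS : TestResult.FAIL;
    }
}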

One consequence of that approach is that the test suite can serve the role of an acceptance test if you decide to throw away the existing implementation and start from scratch.  You'll need a new set of scaffolding tests for the new implementation to drive the dynamo.  Deleting tests that overfit a particular implementation is a good idea anyway.

One of the traps I fell into in this particular iteration of the experiment: using the ordering bishops-queens-knights is much easier to work with than attacking the rook-king-rook problem.  I decided to push through it this time, but I didn't feel like it progressed nearly as smoothly as it did the first time through.

There's hidden duplication in the final iteration of the problem; the strategies that you use for placing the pieces are tightly coupled to the home row.  In this exercise, I didn't even go so far as encapsulating the strategies.

Where's the domain model?  One issue with writing test first is that you are typically crossing the boundary between the tests and the implementation; primitives are the simplest thing that could possibly work, as far as the interface is concerned.

Rediscovering that "the simplest thing" was originally a remedy for writer's block was a big help in this exercise.  If you look closely at the commits, you'll see a big gap in the dates as I looked for a natural way to implement the code that I already had in mind.  A more honest exercise might have just wandered off the rails at that point.

I deliberately put down my pencil before trying to address the duplication in either the test or the implementation.  J. B. Rainsberger talks about the power of the TDD Dynamo; but Sandi Metz rightly points out that keeping things simple leaves you extremely well placed to make the next change easy.

During the exercise, I discovered that writing the check is a test, in the sense described by James Bach and Michael Bolton.
Testing is the process of evaluating a product by learning about it through exploration and experimentation
This is precisely what we are doing during the "Red" phase; we are exploring our candidate API, trying to evaluate how nice it is to work with.  The implementation of the checks can later stand in as an example of how to consume the API.

The code artifacts from this exercise, and the running diary, are available on Github.



Sunday, March 11, 2018

TDD in a sound bite

Test Driven Development is a discipline that asserts that you should not implement functionality until you have demonstrated that you can document the requirements in code.

Monday, February 5, 2018

Events in Evolution

During a discussion today on the DDD/CQRS slack channel, I remembered Rinat Abdullin's discussion of evolving process managers, and it led to the following thinking on events and boundaries.

Let us begin with our simplified matching service; the company profits by matching buyers and sellers.  To begin, the scenario we will use is that Alice, Bob, and Charlie place sell orders, and David places a buy order.  When our domain expert Madhav receives this information, he uses his knowledge of the domain and recognizes that David's order should be matched with Bob's.

Let's rephrase this, using the ideas provided by Rinat, and focusing on the design of Madhav's interface to the matching process.  Alice, Bob, Charlie, and David have placed their orders.  Madhav's interface at this point shows the four unmatched orders; he reviews them, decides that Bob and David's orders match, and sends a message describing this decision to the system.  When that decision reaches the projection, it updates the representation, and now shows two unmatched orders from Alice and Charlie, and the matched orders from Bob and David.

Repeating the exercise: if we consider the inputs of the system, then we see that Madhav's decision comes after four inputs.

So the system uses that information to build Madhav's view of the system. When Madhav reports his decision to the system, it rebuilds his view from five inputs:
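
Spelled out (purely for illustration), the five inputs are something like:

    1. Alice's sell order
    2. Bob's sell order
    3. Charlie's sell order
    4. David's buy order
    5. Madhav's decision to match Bob's order with David's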

With all five inputs represented in the view, Madhav can see his earlier decision to match the orders has been captured and persisted, so he doesn't need to repeat that work.

At this point in the demonstration, we don't have any intelligence built into the model. It's just capturing data; Udi Dahan might say that all we have here is a database. The database collects inputs; Madhav's interface is built by a simple function - given this collection of inputs, show that on the screen.

Now we start a new exercise, following the program suggested by Rinat; we learn from Madhav that within the business of matching, there are a lot of easy cases where we might be able to automate the decision. We're not trying to solve the whole problem yet; in this stage our ambitions are very small: we just want the interface to provide recommendations to Madhav for review. Ignore all the hard problems, don't do anything real, just highlight a possibility. We aren't changing the process at all - the inputs are the same that we saw before.

We might iterate a number of times on this, getting feedback from Madhav on the quality of the recommendations, until he announces that the recommendations are sufficiently reliable that they could be used to reduce his workload.

Now we start a new exercise, where we introduce time as an element. If there is a recommended match available, and Madhav does not override the recommendation within 10 seconds, then the system should automatically match the order.

We're introducing time into the model, and we want to do that with some care. In 1998, John Carmack told us
If you don't consider time an input value, think about it until you do -- it is an important concept


This teaches us that we should be thinking about introducing a new input into our process flow.

Let's review the example, with this in mind. The orders arrive as before, and there are no significant changes to what we have seen before.

But treating time as an input introduces an extra row before the match is made.

While that seems simple enough, something arbitrary has crept in. For example, why would time be an input in only 10 second bursts?

Or perhaps it's a better separation of concerns to use a scheduler.

And we start to notice that things are getting complicated; what's worse, they are getting complicated in a way that Madhav didn't care about when he was doing the work by hand.

What's happening here is that we've confused "inputs" with "things that we should remember". We need to remember orders, we need to remember matches -- we saw that when Madhav was doing the work. But we don't need to remember time, or scheduling; those are just plumbing constructs we introduced to allow Madhav to intercede before the automation made an error.

Inputs and things we should remember were the same when our system was just a database. No, that's not quite right; they weren't the same, they were different things that looked the same. They were different things that happened to have the same representation because all of the complicated stuff was outside of the system. They diverged when we started making the system more intelligent.

With this revised idea, we have two different ways of thinking about the after situation; we can consider its inputs

Or instead we can think about its things to remember

"Inputs" and "Things to Remember" are really really lousy names, and those spellings don't really lend themselves well to the ubiquitous language of domain modeling. To remain in consistent alignment with the literate, we should use the more common terminology: commands and events.

In the design described so far, we happen to have an alignment of commands and events that is one to one. To illustrate, we can think of the work thus far as an enumeration of database transactions that look something like:
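
Something like this (names invented for illustration), with one event per command:

    tx 1: PlaceSellOrder(Alice)   => SellOrderPlaced(Alice)
    tx 2: PlaceSellOrder(Bob)     => SellOrderPlaced(Bob)
    tx 3: PlaceSellOrder(Charlie) => SellOrderPlaced(Charlie)
    tx 4: PlaceBuyOrder(David)    => BuyOrderPlaced(David)
    tx 5: MatchOrders(Bob, David) => OrdersMatched(Bob, David)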

In the next exercise, consider what would happen if we tried to introduce, as an invariant, that there should be no easily matched items (following the rules we were previously taught by Madhav) left unmatched. In other words, when David's order arrives, it should be immediately matched with Bob's, rather than waiting 10 seconds. That means that one command (the input of David's order) produces two different events; one to capture the observation of David's order, the other to capture the decision made by the automation as a proxy for Madhav.
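
With that invariant in place, the fourth transaction from the illustration above would instead look something like:

    tx 4: PlaceBuyOrder(David) => BuyOrderPlaced(David), OrdersMatched(Bob, David)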

The motivation for treating these as two separate events is this: it most closely aligns with the events we were generating when Madhav was making all of the decisions himself. Whether we keep Madhav in the loop making the decisions, have him simply review scheduled decisions, or leave all of the decisions to the automation, the captured list of events is the same. That in turn means that these variations in implementation do not impact the other implementations at all. We're limiting the impact of the change by ensuring that the observable results are consistent in all three cases.

Tuesday, January 16, 2018

Events are messages that describe state, not behavior

I have felt, for some time now, that the literature explains event sourcing poorly.

The basic plot, that current state is just a left fold over previous behaviors, is fine, so far as it goes.  But it rather assumes that the audience is clear on what "previous behaviors" means.

And I certainly haven't been.

Many, perhaps even most, domain models can be thought of as state machines:

Cargo begins its life cycle when it is booked, and our delivery document changes state when the cargo is received, when an itinerary is selected, when it gets loaded in port, when the itinerary changes; each of these messages arrives and changes the state of the delivery document.

So we can think of the entire process as a state machine; each message from the outside world causes the model to follow a transition edge out of one state node and into another.

If we choose to save the state of the delivery document, so that we can resume work on it later, there are three approaches we might take
  • We could simply save the entire delivery document as is
  • We could save the sequence of messages that we received
  • We could save a sequence of patch documents that describe the differences between states
Event sourcing is the third one.
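
To make the distinction concrete, here is a sketch using invented types rather than the actual cargo model:

// 1. save the current state of the delivery document as-is
record DeliveryDocument(String cargoId, String transportStatus, String lastKnownLocation) { }

// 2. save the messages we received from the outside world
record HandlingReportSubmitted(String cargoId, String eventType, String location) { }

// 3. save patches describing how the state changed; this is event sourcing
record TransportStatusChanged(String cargoId, String newTransportStatus) { }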

We call the patches "events", and they have domain specific semantics; but they are fundamentally dumb documents that decouple the representation of state from the model that generated it.

This decoupling is important, because it allows us to change the model without changing the semantics of the persisted representation.

To demonstrate this, let's imagine a simple trade matching application.  Buy and Sell orders come in from different customers, and the model is responsible for pairing them up.  There might be elaborate rules in place for deciding how matches work, but to save the headache of working them out we'll instead focus our attention on a batch of buy and sell orders that can be paired arbitrarily -- the actual selections are going to be determined by the model's internal tie-breaker strategy.

So we'll imagine that a new burst of messages appears, at some new price -- we don't need to worry about any earlier orders.  The burst begins...


After things have settled down, we restart the service. That means that the in-memory state is lost, and has to be recovered from what was written to the persistent store. We now get an additional burst of messages.


Using a first in, first out tiebreaker, we would expect to see pairs (A,1), (B,2), (C,3), and (D,4). If we were using a last in, first out tiebreaker, we would presumably see (D,1), (E,2), (C,3), (B,4).

But what happens if, during the shutdown, we switch from FIFO to LIFO? During the first phase, we see (A,1) matched, as before. After that, we should see (D,2), (E,3), (C,4).

In order to achieve that outcome, the model in the second phase needs knowledge of the (A,1) match before the shutdown. But it can only know about that match if there is evidence of the match written to the persistent store. Without that knowledge, the LIFO strategy would expect that (D,1) were already matched, and would in turn produce (C,2), (E, 3), and (A,4). The last of these conflicts with the original (A,1) match. In other words, we're in a corrupted state.

Writing the entire document to the event store works just fine: we read a representation that suggests that A and 1 are unavailable, and the domain model can proceed accordingly. Writing the sequence of patches works, because when we fold the patches together we get the state document. It's only the middle case, where we wrote out representations that implied a particular model, that gets us into trouble.

The middle approach is not wrong, by any means. The LMAX architecture worked this way; they would write the input messages to a journal, and in the event of a failure they would recover by starting a copy of the same model. The replacement of the model behavior happened off hours.

Similarly, if you have the list of inputs and the old behavior, you can recover current state in memory, and then write out a representation of that state that will allow a new model to pick up from where the old left off.

Not wrong, but different. One approach or the other might be more suitable for your unique collection of operational constraints.