Tuesday, November 6, 2018

Ayende: Fear of an Empty Source File

And yet, probably the hardest thing for me is to start writing from scratch. If there is no code already there, it is all too easy to get lost in the details and not actually be able to get anywhere.  -- Oren Eini
This paralysis is the same thing I try to attack with simplest thing that could possibly work.

Dynamic friction encourages more progress than static friction.

Saturday, October 27, 2018

REST, Command Messages and URI Design

I posted the examples above into a discussion about how to send command messages to a domain model.

The domain model is the inner core of the onion. In particular, it shouldn't be directly coupled to HTTP.

The client sends one HTTP request to a single socket, and that the listener on that socked translates the HTTP Request into a message that the domain model understands. There's no particular advantage, from the point of view of the domain model, that some of the information is provided in the message body, and some in the headers, when it is all being transformed before it reaches the core domain logic.

Consider, for example, a form in a web page. It is a perfectly straight forward exercise to ensure that the application/x-www-form-urlencoded representation of the message is complete by adding a few hidden fields that the consumer doesn't need to worry about. When this is done, the particular URL used in the action really doesn't matter.

This is generally true: the resource identifier doesn't have to support the domain, because we have the message body for that.

The target resource identified by the request line becomes a free variable.

In HTTP, GET doesn't have a message body. When we request a resource from the server, any parameters that the server requires to discriminate between resources needs to be included in the identifier itself. Because HTTP is designed for large-grain hypermedia transfer, there are provisions for the server to provide to the client the appropriate caching semantics.

The real significance of the URI for POST is this -- it gives us the means to invalidate the representations of a specific resource from the client's cache.

Saturday, October 20, 2018

Wumpus RNG and the functional core

I've been experimenting with refactoring the wumpus, and observed an interesting interaction with the functional core.

For illustration, let's consider the shooting of an arrow.  After fighting a bit with the console, the player submits a firing solution.  This, can pass along to the game module, but there are two additional twists to consider. 

First: if the firing solution requires a tunnel between rooms that doesn't exist, the arrow instead travels randomly -- and might end up bouncing several more times. 

Second: if the arrow should miss, the wumpus may move to another room.  Again, randomly.

In my refactorings, the basic shape of the method is a function of three arguments: the firing solution, and some source of "random" numbers.  At this abstraction level, shooting the arrow looks to be atomic, but in some scenarios the random numbers are consumed.

I took some time today to sketch the state machine, and a slightly different picture emerges.  The game is Hunting, and invoking the shootArrow method takes the game to some other state; an incorrectly specified firing solution takes us to a BouncingArrow state, and invoking the bounceArrow method.  When the arrow has finished bouncing, we may end up in the ArrowMissed state, in which case we wakeWumpus, and discover whether or not the hunter has met his end.

The internal core knows nothing about sources of data - no command line, no random numbers, just computing transitions from one state to the next. This means that it is very easy to test, everything just sits in memory. The orchestration logic gets to worry about where the data comes from, but is completely decoupled from the core logic of the game.

Wednesday, October 17, 2018

Thought on Mocks

Based on a discussion with @rglover.
Any veterans have a clear reason why not to use mocks?
"Mocks" are fun, because without a shared context you can't always be certain that the other side of the debate understands your arguments the way that you intend them.  Is the conversation specifically about behavior verification ? about all flavors of test doubles ? Are we talking about the test scaffolding that developers use to drive their designs, or the tests we run at choke points to prevent mistakes from propagating to the next stage of the pipeline?

Both Katrina Owen and Sandi Metz  share a model of what should be asserted by tests; incoming commands and queries are checked by asserting the result returned by a query of the system under test; but checking outgoing commands is tricker.

So let's assume for the moment that last problem is the one we are trying to solve.

When I was learning poker, John Finn and Frank Wallace introduced me to the concept of investment odds: the key idea being that our play should be based on a calculus derived from
  • the payoff
  • the probability of success
  • the cost to play
and a comparison of these values against the alternative plays available.

Now, let's look to the Three Laws of TDD, as described by Robert Martin.
You are not allowed to write any production code unless it is to make a failing unit test pass.
If we were to accept this discipline, then we would absolutely need to be able to satisfy a failing check.  That would mean that either we would need to include in the composition of the system under tests the real, production-grade collaborator, or we need to fake it.  Or we can tell Uncle Bob to get stuffed, and write the code we need.

Does the mock give us the best investment odds?

That's the first question, and the answer depends on circumstance.  Simply writing the code requires no additional investment, but does nothing to mitigate risk.

Using the real collaborator gives the most accurate results.  But that's not without cost - you'll pay for latency and instability in the collaborator.  For collaborators within the functional core, that may be acceptable - but the odds of it becoming unacceptable increase if the collaborator is coupled to the imperative shell.  Latency at an automated choke point may be acceptable, but paying that tax during the development cycle is not ideal.  Unstable tests incur investigation costs if the results are taken seriously, and are a distraction if no one can take the initiative to delete them.

There can also be additional taxes if the setup of the collaborator is complicated enough that it distracts from the information encoded into the test itself.  And of course the test becomes sensitive to changes in the setup of the collaborator.

One can reasonably posit circumstances in which the mock is a better play than either of all or nothing.

But there is another candidate: what about a null object?

A null object alone doesn't give you much beyond not testing at all.  It does suggest a line of inquiry; what are you getting from the mock that you aren't getting from the null object... and is the added benefit of the mock something you should be getting from the production code instead?

In other words, are the tests trying to direct your attention to a instrumentation requirement that should be part of your production code?

Especially in the case where you are making off a command that would normally cross a process boundary, having some counters in place to track how many calls were made, or something about the distribution of the arguments passed, could easily have value to the operators of the system.

And, having introduced that idea into the system, do the investment odds still favor the mock?

Even away from the boundary, I think the question is an interesting one; suppose the system under test is collaborating with another module within the functional core.  Is the element of the test that is prejudicing us toward using a mock actually trying to signal a different design concern?  Is the test trying to tell us that the composition is too much work, or that the code is slow, and we we need smarter logic at the boundary of slow?

I don't think the answer is "never mock"; but rather that there is a decision tree, and the exercise of questioning the choice at each fork is good practice.

Red Blue Refactor

As far as I can tell, the mantra "Red Green Refactor" is derived from the colors of the UI controls in the early GuiTestRunners, which were influenced by the conventions of traffic control system, which perhaps were influenced by railroad control systems.

(Image from JSquaredZ).

As labels, `Red` and  `Green` are fine - they are after all just hints at a richer semantic meaning. 

As colors, they suck as a discriminator for the percentage of the population with Deuteranopia.

Looking for another color pair to use, I'm currently thinking that past and future are reasonable metaphors for code improvement, so I'm leaning toward a red/blue pairing.

(Image from Wikipedia)

Of course, in English, "past" and "passed" are homophones, so this perhaps only trades one problem for another. 

Friday, October 12, 2018

Event Sourcing: lessons in failure, part 2

I've written a couple of solo projects using "event sourcing", failing miserably at them because I failed to properly understand how to properly apply that pattern to the problem I was attempting to solve.

Part 2: Fantasy League Scoring

Our fantasy league used a bespoke scoring system, so I decided to try my hand at creating a report for myself to track how every player in baseball was doing.  This gave me extra insights about how I might improve my team by replacing players in the middle of the season.

And to a large extent, it worked - I was able to pick up a number of useful pieces that would otherwise have slipped under the radar, turning over my team much more aggressively than I would have otherwise.

It was still pretty much a crap shoot -- past performance does not promise future results.  But it did have the benefit of keeping me more engaged.

Failure #1: Where are the events?

Again, I had a book of record issue - the events were things happening in the real world, and I didn't have direct access to them.  What I had was a sort of proxy - after a game ended, a log of the game would become available.  So I could take that log, transform it into events, and the proceed happily.

Well, to be honest, that approach is pretty drunk.

The problem is that the game log isn't a stream of events, it is a projection.  Taking what is effectively a snapshot and decomposing it reverses cause and effect.  There were two particular ways that this problem would be exposed in practice.

First, it would sometimes happen that the logs themselves would go away.  Not permanently, usually, but at least for a time.  Alternatively, they might be later than expected.  And so it would happen that data would get lost - because a stale copy of a projection was delivered instead of a live one.  Again, the projections aren't my data, they are cached copies of somebody else's data, that I might use in my own computations.

Second, when these projections appear, they aren't immutable.  That's a reflection of both problems with data entry upstream (a typo needs to be fixed), and also the fact that within the domain, the interpretation of the facts can change over time -- the human official scorers will sometimes reverse an earlier decision.

In other words, for what I was doing, events didn't gain me anything over just copying the data into a RDBMS, or for that matter writing the data updates into a git repository on disk.

Failure #2: Caching

The entire data processing engine depended on representations of data that changed on a slow cadence (once a day, typically) and I wasn't tracking any meta data about how fresh the data was, how stable it ought to be, whether the new data in question was a regression from what had been seen earlier, and so on.

In an autonomous system, this is effectively a sort of  background task - managing local static copies of remote data.

To make this even more embarrassing; I was of course downloading this data from the web, and the HTTP specification has a lot to say about caching, that I didn't even consider.

(I also failed to consider the advantages I might get from using a headless browser, rather than just an html parser.  This bit me hard, and made a big contribution toward the abandoning of the project.)

Failure #3: What's missing?

The process I was working from only considered logs that were available; there was no monitoring of logs that might be missing, or that might have been removed.  This introduced small errors in data comparisons.

I needed to be able to distinguish "here are Bob Shortstop's 6 scores from last week" from "here are Bob Shortstop's 5 scores from last week, and there is a game unaccounted for".

Again, I was thinking of events as things that happen, rather than as events as a way of representing state over time.

Failure #4: Process telemetry

What I wanted, when the wheels were done turning, was the collection of reports at the end.  And that meant that I wasn't paying enough attention to the processes I was running.  Deciding when to check for updated data, on what grain, and orchestrating the consequences of changes to the fetched representations was the real work, and instead I was thinking of that as just "update files on disk".  I didn't have any reports I could look at to see if things were working.

Again, everything was flat, everything was now, absolutely no indications of when data had appeared, or which representation was the source of a particular bit of data.

Solving the Wrong Problem

In effect, what happened is that I would throw away all of my "events" every morning, then regenerate them all from the updated copies of the data in the cache.  If all of your events are disposable, then something is going badly wrong.

The interesting things to keep track of were all related to the process, and discovering that I wanted to refresh the caches slowly, but in a particular priority.

What I should have been looking toward was Rinat's model of a process manager; how would I support a dashboard showing a list of decisions to be made, could I then capture the priorities of the "domain expert" and automate the work.  Could I capture time as a first class concern driving the elements of the system forward?

Some of the reason that I missed this is that I had too much time -- I was deliberately hobbling the fetch of the data, which meant that the cost of redoing all of the work was lost in the noise.  On the other hand, that doubly emphasizes the point that all of the value add was in the bookkeeping, which I never addressed.

Key Question:

Did I need temporal queries?  No.

Wednesday, October 10, 2018

Event Sourcing: Lessons on failure, part one.

I've written a couple of solo projects using "event sourcing", failing miserably at them because I failed to properly understand how to properly apply that pattern to the problem I was attempting to solve.

Part 1: Fantasy Draft Automation 

I came into event sourcing somewhat sideways - I had first discovered the LMAX disruptor around March of 2013.  That gave me my entry into the idea that state could be message driven.  I decided, after some reading and experimenting, that a message driven approach could be used to implement a tool I needed for my fantasy baseball draft.

My preparation for the draft was relatively straight forward - I would attempt to build many ranked lists of players that I was potentially interested in claiming for my team, and during the draft I would look at these lists, filtering out the players that had already been selected.

So what I needed was a simple way to track lists of all of the players that had already been drafted, so that they could be excluded from my lists.  Easy.

Failure #1: Scope creep

My real ambition for this system was that it would support all of the owners, including helping them to track what was going on in the draft while they were away.  So web pages, and twitter, and atom feeds, and REST, and so on.

Getting all of this support right requires being able to accurately report on all of the players who were drafted.  Which in turn means managing a database of players, and keeping it up to date when somebody chooses to draft a player that I hadn't been tracking, and dealing with the variations in spellings, and the fact that players change names and so on.

But for MVP, I didn't need this grief.  I had already uniquely identified all of the players that I was specifically interested in.  I just needed to keep track of those players; so long as I had all of the people I was considering in the player registry, and could track which of those had been taken (no need to worry about order, and I was tracking my own choices separately anyway).

Failure #2: Where is the book of record?

A second place where I failed was in understanding that my system wasn't the book of record for the actions of the draft.  I should have noticed that we had been drafting for years without this database.  And over the years we've worked out protocols for unwinding duplicated picks, and resolving ambiguity.

What I was really doing was caching outcomes from the real world process into my system.  In other words, I should have been thinking of my inputs as a stream of events, not commands, and arranging for the system to detect and warn about conflicts, rather than rejecting messages that would introduce a conflict.

There was no particular urgency about matching picks with identifiers of players in the registry, or in registering players who were not part of the registry.  All of that math could be delayed a hundred milliseconds without anybody noticing.

Failure #3: Temporal queries

The constraints that the system with were trying to enforce the rules that only players in the player registry could be selected, and that each player in the registry could only be selected once.  In addition to the fact that wasn't the responsibility of the system, it was complicated by the fact that the player registry wasn't static.

Because I was trying to track the draft faithfully (not realizing until later that doing so wasn't strictly necessary for my use case), I would stop the program when my registry had a data error.  The registry itself was just dumb bytes on disk; any query I ran against the database was a query against "now".  So changing the entries in the registry would change the behavior of my program during "replay".

Failure #4: Compatibility

Somewhat related to the above - I wasn't always being careful to ensure that the domain logic was backwards compatible with the app that wrote the messages, nor did my message journal have any explicit markers in it to track when message traffic should switch to the new handlers.

So old messages would break, or do something new, screwing up the replay until I went into the "immutable" journal to fix the input errors by hand.

Failure #5: Messages

My message schemas, such as they were, were just single lines of text - really just a transcript of what I was typing at the interactive shell.  And my typing sucks, so I was deliberately making choices to minimize typing.  Which again made it harder to support change.