Friday, October 12, 2018

Event Sourcing: lessons in failure, part 2

I've written a couple of solo projects using "event sourcing", failing miserably at them because I did not properly understand how to apply that pattern to the problem I was attempting to solve.

Part 2: Fantasy League Scoring

Our fantasy league used a bespoke scoring system, so I decided to try my hand at creating a report for myself to track how every player in baseball was doing.  This gave me extra insight into how I might improve my team by replacing players in the middle of the season.

And to a large extent, it worked - I was able to pick up a number of useful pieces that would otherwise have slipped under the radar, and I turned over my team much more aggressively than I would have without the report.

It was still pretty much a crap shoot -- past performance does not promise future results.  But it did have the benefit of keeping me more engaged.

Failure #1: Where are the events?

Again, I had a book of record issue - the events were things happening in the real world, and I didn't have direct access to them.  What I had was a sort of proxy - after a game ended, a log of the game would become available.  So I could take that log, transform it into events, and then proceed happily.

Well, to be honest, that approach is pretty drunk.

The problem is that the game log isn't a stream of events, it is a projection.  Taking what is effectively a snapshot and decomposing it reverses cause and effect.  There were two particular ways that this problem would be exposed in practice.

First, it would sometimes happen that the logs themselves would go away.  Not permanently, usually, but at least for a time.  Alternatively, they might arrive later than expected.  And so it would happen that data would get lost - because a stale copy of a projection was delivered instead of a live one.  Again, the projections aren't my data; they are cached copies of somebody else's data that I might use in my own computations.

Second, when these projections appear, they aren't immutable.  That's a reflection of both problems with data entry upstream (a typo needs to be fixed), and also the fact that within the domain, the interpretation of the facts can change over time -- the human official scorers will sometimes reverse an earlier decision.

In other words, for what I was doing, events didn't gain me anything over just copying the data into an RDBMS, or for that matter writing the data updates into a git repository on disk.

Failure #2: Caching

The entire data processing engine depended on representations of data that changed on a slow cadence (once a day, typically), and I wasn't tracking any metadata about how fresh the data was, how stable it ought to be, whether the new data in question was a regression from what had been seen earlier, and so on.
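
To make that concrete, here is a minimal sketch (the names are mine, invented for this post, not code from the project) of the kind of metadata I could have been tracking alongside each cached representation:

    import java.time.Duration;
    import java.time.Instant;

    // Hypothetical bookkeeping for one cached representation of remote data.
    record CachedRepresentation(
            String sourceUri,     // where the representation came from
            Instant fetchedAt,    // when this copy was fetched
            String contentHash,   // lets a new fetch be compared against the previous one
            byte[] body) {

        // "Fresh enough" is a policy decision; maxAge encodes the slow cadence.
        boolean isStale(Instant now, Duration maxAge) {
            return fetchedAt.plus(maxAge).isBefore(now);
        }

        // Change detection: did the source hand back something different
        // from what we had already seen?
        boolean differsFrom(CachedRepresentation previous) {
            return !contentHash.equals(previous.contentHash());
        }
    }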

In an autonomous system, this is effectively a sort of background task - managing local static copies of remote data.

To make this even more embarrassing: I was of course downloading this data from the web, and the HTTP specification has a lot to say about caching that I didn't even consider.
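
For instance, honoring just the validation part of the spec - conditional GETs - would have been cheap.  A sketch using Java 11's java.net.http (gameLogUri and savedEtag are hypothetical names for values held in my cache):

    import java.io.IOException;
    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.util.Optional;

    class LogFetcher {
        private final HttpClient client = HttpClient.newHttpClient();

        // Conditional GET: replay the validator saved from the previous fetch.
        Optional<byte[]> fetchIfChanged(String gameLogUri, String savedEtag)
                throws IOException, InterruptedException {
            HttpRequest request = HttpRequest.newBuilder(URI.create(gameLogUri))
                    .header("If-None-Match", savedEtag)   // or If-Modified-Since
                    .GET()
                    .build();

            HttpResponse<byte[]> response =
                    client.send(request, HttpResponse.BodyHandlers.ofByteArray());

            if (response.statusCode() == 304) {
                return Optional.empty();        // cached copy is still good; no work to redo
            }
            return Optional.of(response.body()); // fresh representation; update the cache
        }
    }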

(I also failed to consider the advantages I might get from using a headless browser, rather than just an HTML parser.  This bit me hard, and contributed significantly to my abandoning the project.)

Failure #3: What's missing?

The process I was working from only considered logs that were available; there was no monitoring of logs that might be missing, or that might have been removed.  This introduced small errors in data comparisons.

I needed to be able to distinguish "here are Bob Shortstop's 6 scores from last week" from "here are Bob Shortstop's 5 scores from last week, and there is a game unaccounted for".
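
A small sketch of the shape that distinction could take (hypothetical names, not the actual report code):

    import java.util.List;

    // One week of scoring for one player, with the gap made explicit
    // instead of silently folded into "here are the scores".
    record WeeklyScores(String playerId, List<Integer> scores, int gamesUnaccountedFor) {

        boolean complete() {
            return gamesUnaccountedFor == 0;
        }
    }

    // "5 scores from last week, and there is a game unaccounted for":
    //   new WeeklyScores("bob-shortstop", List.of(2, 0, 7, 1, 4), 1)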

Again, I was thinking of events as things that happen, rather than as a way of representing state over time.

Failure #4: Process telemetry

What I wanted, when the wheels were done turning, was the collection of reports at the end.  And that meant that I wasn't paying enough attention to the processes I was running.  Deciding when to check for updated data, at what granularity, and orchestrating the consequences of changes to the fetched representations was the real work, and instead I was thinking of that as just "update files on disk".  I didn't have any reports I could look at to see if things were working.

Again, everything was flat, everything was now; there were absolutely no indications of when data had appeared, or of which representation was the source of a particular bit of data.
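
Even something as small as stamping each derived value with its provenance would have answered both questions - a sketch, with invented names:

    import java.net.URI;
    import java.time.Instant;

    // Hypothetical wrapper: a computed value, plus when it appeared and
    // which cached representation it was computed from.
    record Sourced<T>(T value, URI sourceRepresentation, Instant appearedAt) { }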

Solving the Wrong Problem

In effect, what happened is that I would throw away all of my "events" every morning, then regenerate them all from the updated copies of the data in the cache.  If all of your events are disposable, then something is going badly wrong.

The interesting things to keep track of were all related to the process - for instance, discovering that I wanted to refresh the caches slowly, but in a particular priority order.

What I should have been looking toward was Rinat's model of a process manager: how would I support a dashboard showing a list of decisions to be made?  Could I then capture the priorities of the "domain expert" and automate the work?  Could I capture time as a first-class concern driving the elements of the system forward?

Some of the reason that I missed this is that I had too much time -- I was deliberately hobbling the fetch of the data, which meant that the cost of redoing all of the work was lost in the noise.  On the other hand, that doubly emphasizes the point that all of the value add was in the bookkeeping, which I never addressed.

Key Question:

Did I need temporal queries?  No.

Wednesday, October 10, 2018

Event Sourcing: Lessons on failure, part one.

I've written a couple of solo projects using "event sourcing", failing miserably at them because I did not properly understand how to apply that pattern to the problem I was attempting to solve.

Part 1: Fantasy Draft Automation 

I came into event sourcing somewhat sideways - I had first discovered the LMAX disruptor around March of 2013.  That gave me my entry into the idea that state could be message driven.  I decided, after some reading and experimenting, that a message driven approach could be used to implement a tool I needed for my fantasy baseball draft.

My preparation for the draft was relatively straightforward - I would attempt to build many ranked lists of players that I was potentially interested in claiming for my team, and during the draft I would look at these lists, filtering out the players that had already been selected.

So what I needed was a simple way to track all of the players that had already been drafted, so that they could be excluded from my lists.  Easy.
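
Something on the order of this sketch (invented names) would have covered it:

    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    class DraftBoard {
        private final Set<String> drafted = new HashSet<>();

        void markDrafted(String playerId) {
            drafted.add(playerId);
        }

        // One of my ranked lists, minus anybody who has already been taken.
        List<String> stillAvailable(List<String> rankedList) {
            return rankedList.stream()
                    .filter(playerId -> !drafted.contains(playerId))
                    .toList();
        }
    }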

Failure #1: Scope creep

My real ambition for this system was that it would support all of the owners, including helping them to track what was going on in the draft while they were away.  So web pages, and twitter, and atom feeds, and REST, and so on.

Getting all of this support right requires being able to accurately report on all of the players who were drafted.  Which in turn means managing a database of players, and keeping it up to date when somebody chooses to draft a player that I hadn't been tracking, and dealing with the variations in spellings, and the fact that players change names and so on.

But for MVP, I didn't need this grief.  I had already uniquely identified all of the players that I was specifically interested in.  I just needed to keep track of those players; so long as I had all of the people I was considering in the player registry, and could track which of those had been taken, I was fine (no need to worry about order, and I was tracking my own choices separately anyway).

Failure #2: Where is the book of record?

A second place where I failed was in understanding that my system wasn't the book of record for the actions of the draft.  I should have noticed that we had been drafting for years without this database.  And over the years we've worked out protocols for unwinding duplicated picks, and resolving ambiguity.

What I was really doing was caching outcomes from the real world process into my system.  In other words, I should have been thinking of my inputs as a stream of events, not commands, and arranging for the system to detect and warn about conflicts, rather than rejecting messages that would introduce a conflict.
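
In sketch form (hypothetical names), the difference looks something like this - record the event, surface the conflict, don't reject it:

    import java.util.HashMap;
    import java.util.Map;

    class DraftLog {
        // player -> pick number, as reported from the real-world draft
        private final Map<String, Integer> picks = new HashMap<>();

        // Command thinking: refuse the message.
        //   if (picks.containsKey(playerId)) throw new IllegalStateException("already drafted");

        // Event thinking: the pick already happened out in the world;
        // record it, and surface the conflict for a human to resolve.
        void playerWasDrafted(String playerId, int pickNumber) {
            Integer earlier = picks.putIfAbsent(playerId, pickNumber);
            if (earlier != null && earlier != pickNumber) {
                System.err.println("Conflict: " + playerId
                        + " reported at picks " + earlier + " and " + pickNumber);
            }
        }
    }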

There was no particular urgency about matching picks with identifiers of players in the registry, or in registering players who were not part of the registry.  All of that math could be delayed a hundred milliseconds without anybody noticing.

Failure #3: Temporal queries

The constraints that the system was trying to enforce were the rules that only players in the player registry could be selected, and that each player in the registry could only be selected once.  In addition to the fact that this wasn't the responsibility of the system, it was complicated by the fact that the player registry wasn't static.

Because I was trying to track the draft faithfully (not realizing until later that doing so wasn't strictly necessary for my use case), I would stop the program when my registry had a data error.  The registry itself was just dumb bytes on disk; any query I ran against the database was a query against "now".  So changing the entries in the registry would change the behavior of my program during "replay".
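
One way to have made replay deterministic (a sketch of the idea, not something I built) is to version the registry entries, so that a replayed message queries the registry "as of" the moment the message was originally recorded:

    import java.time.Instant;
    import java.util.Comparator;
    import java.util.List;
    import java.util.Optional;

    // Hypothetical: each registry entry remembers when we started believing it.
    record RegistryEntry(String playerId, String name, Instant recordedAt) { }

    class PlayerRegistry {
        private final List<RegistryEntry> entries;

        PlayerRegistry(List<RegistryEntry> entries) {
            this.entries = entries;
        }

        // A query against "then", not "now": later corrections to the registry
        // no longer change the behavior of a replay.
        Optional<RegistryEntry> lookupAsOf(String playerId, Instant asOf) {
            return entries.stream()
                    .filter(e -> e.playerId().equals(playerId))
                    .filter(e -> !e.recordedAt().isAfter(asOf))
                    .max(Comparator.comparing(RegistryEntry::recordedAt));
        }
    }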

Failure #4: Compatibility

Somewhat related to the above - I wasn't always being careful to ensure that the domain logic was backwards compatible with the app that wrote the messages, nor did my message journal have any explicit markers in it to track when message traffic should switch to the new handlers.

So old messages would break, or do something new, screwing up the replay until I went into the "immutable" journal to fix the input errors by hand.
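
The missing marker could have been as simple as a version stamp on every journal entry, with replay dispatching on it - a sketch, with invented names:

    import java.time.Instant;

    // Hypothetical journal entry: the payload plus the schema version
    // that was in force when the entry was written.
    record JournalEntry(int schemaVersion, Instant recordedAt, String payload) { }

    class Replayer {
        void replay(JournalEntry entry) {
            switch (entry.schemaVersion()) {
                case 1 -> handleV1(entry.payload());   // old traffic keeps its old handler
                case 2 -> handleV2(entry.payload());   // new traffic gets the new behavior
                default -> throw new IllegalStateException(
                        "Unknown schema version: " + entry.schemaVersion());
            }
        }

        private void handleV1(String payload) { /* legacy interpretation */ }
        private void handleV2(String payload) { /* current interpretation */ }
    }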

Failure #5: Messages

My message schemas, such as they were, were just single lines of text - really just a transcript of what I was typing at the interactive shell.  And my typing sucks, so I was deliberately making choices to minimize typing.  Which again made it harder to support change.
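
The cheap improvement would have been to keep the terse typing at the shell, but translate it into an explicit, named message before it reached the journal - roughly (the terse line and the field names here are made up for illustration):

    // What I might type at the shell:  "p 12 bobshortstop"
    // What the journal should store: a message with named, versionable fields.
    record PickRecorded(int pickNumber, String playerId) {

        static PickRecorded parse(String terseLine) {
            String[] tokens = terseLine.trim().split("\\s+");
            return new PickRecorded(Integer.parseInt(tokens[1]), tokens[2]);
        }
    }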


Refactoring toward stateless systems?

The other day, I skimmed the videos of J.B. Rainsberger's Point of Sale Exercise.

In the videos, he talks about the design dynamo, and made a point in passing that removing duplication pushes the data toward the test.

For testing pure functions, that's clearly true - once you map the outputs to the inputs, then the function is fully general, and the specific examples to be tried, along with the answer crib, live in the test code.  Fine.
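
The stock illustration, using JUnit 5 and Fizz Buzz as the pure function (a sketch; the class and method names are invented) - the examples and the expected answers live in the test, while the production function stays fully general:

    import static org.junit.jupiter.api.Assertions.assertEquals;

    import org.junit.jupiter.params.ParameterizedTest;
    import org.junit.jupiter.params.provider.CsvSource;

    class FizzBuzzTest {
        // The answer crib lives here, in the test.
        @ParameterizedTest
        @CsvSource({"1, 1", "2, 2", "3, Fizz", "5, Buzz", "15, FizzBuzz"})
        void producesTheExpectedLabel(int input, String expected) {
            assertEquals(expected, FizzBuzz.label(input));
        }
    }

    class FizzBuzz {
        // Fully general; no test data has leaked in here.
        static String label(int n) {
            if (n % 15 == 0) return "FizzBuzz";
            if (n % 3 == 0) return "Fizz";
            if (n % 5 == 0) return "Buzz";
            return Integer.toString(n);
        }
    }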

When testing a stateful system, it's a similar idea.  The system under test is composed in its ground state, and then the test pushes additional data in to drive the system to the target state, and then we ask the big question.  But looked at from the high level, we're still dealing with a function.

But there are a number of cases where it feels natural to keep internal state within the object; especially if that state is immutable, or deliberately excluded from the API.  Wumpus has examples of each of these.  99 Bottles has duplicated immutable state, in the form of the verse templates.  Horses for courses, I suppose.  Plus the ratchet that teaches us that test data should not flow toward the production code.

But it kicked an idea loose...

If we are moving the state out of the production code, then we are largely moving it toward a database.  The composition root is responsible for wiring up the system to the correct database; in our tests, the test itself takes on this responsibility.

That in turn got me thinking about objects versus "APIs".  When we ported our systems to the web, sessions became a lot shorter - the REST architectural constraint calls for sessions that are but a single request long.

So testing such a system - where our domain starts in its default state, and then we "arrange" the preconditions of our test - is analogous to a sequence of sessions, rather than one single session that handles multiple messages.

If you were to reflect that honestly in your test, you would have a lot of code in the test reading state out of the "objects" and copying it to the database, then pulling it out again for the next session, and so on.
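
A sketch of what taking that seriously might look like (JUnit 5, with a Map standing in for the database; all names invented) - each "session" starts from a brand new object, and only what was explicitly persisted survives:

    import static org.junit.jupiter.api.Assertions.assertEquals;

    import java.util.HashMap;
    import java.util.Map;
    import org.junit.jupiter.api.Test;

    class TwoSessionTest {
        // Stand-in "database": only what is written here survives a session.
        private final Map<String, Integer> database = new HashMap<>();

        @Test
        void counterSurvivesAcrossSessions() {
            // Session one: arrange the precondition, persist, discard the object.
            Counter first = Counter.from(database.getOrDefault("counter", 0));
            first.increment();
            database.put("counter", first.value());

            // Session two: a brand new object, rehydrated from the database.
            Counter second = Counter.from(database.get("counter"));
            second.increment();
            database.put("counter", second.value());

            assertEquals(2, database.get("counter").intValue());
        }
    }

    // Minimal domain object for the sketch.
    class Counter {
        private int value;

        private Counter(int value) { this.value = value; }

        static Counter from(int value) { return new Counter(value); }

        void increment() { value++; }

        int value() { return value; }
    }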

I wonder if that would break us out of the object framing?

Kata: Refactor the Wumpus

I've shared a Java port of Hunt the Wumpus on GitHub.

https://github.com/DanilSuits/refactor-the-wumpus

The port is deliberately dreadful -- I tried to simulate a legacy code base by adhering as closely as I could manage, both in structure and in style, to the original.

Java doesn't have a useful goto, and I needed line number hints to keep track of where I was in the source code, so I've introduced a few awful names as well.

But there is a battery of tests available: record-and-playback of my implementation through a number of potentially interesting transcripts.
The key thing is that correct behavior is defined by what the set of classes did yesterday -- Michael Feathers
Producing stable, repeatable behaviors took a bit of thinking, but I was able to work out (eventually) that feature flags were the right approach.  By accident, I got part of the way there early; I had elected to make the record-and-playback tests an optional part of the build via maven profiles.

The argument for a feature flag goes something like this: what I'm really doing is introducing a change in behavior that should not be visible at run time - a dark deploy, so to speak.  Therefore, all of the techniques for doing that are in bounds.

It took a bit of trial and error before I hit on the right mechanism for implementing the changed behavior.  The code in the repository is only a sketch (the object of this exercise is _wumpus_, not feature flags), but if you squint you may be able to see Mark Seemann's ideas taking form.
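
For the curious, the general shape is roughly this (a simplified sketch with invented names, not the code in the repository): both behaviors live in the production code, and the composition root decides which one runs.

    interface MessageWriter {
        void write(String message);
    }

    class LegacyWriter implements MessageWriter {
        public void write(String message) {
            System.out.println(message);                  // yesterday's behavior, pinned by the tests
        }
    }

    class NewWriter implements MessageWriter {
        public void write(String message) {
            System.out.println(message.stripTrailing());  // the changed behavior, deployed dark
        }
    }

    class CompositionRoot {
        static MessageWriter messageWriter() {
            // The flag could just as easily come from a Maven profile or a config file.
            boolean useNewBehavior = Boolean.getBoolean("wumpus.newWriter");
            return useNewBehavior ? new NewWriter() : new LegacyWriter();
        }
    }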

With the tests in place to manage expectations, you can then engage the Simple Design Dynamo and get to work. I think in practice this particular exercise is biased more toward "improve names" than toward "remove duplication", because of the style of the original.
Make the change easy, then make the easy change.  -- Kent Beck
My guess is that, rather than trying to attack the code base as a whole, it may be more effective to work toward particular goals.  Parnas taught us to limit the visibility of design decisions, so that we might more easily change them.  So look through the code for decisions that we might want to change.
  • The existing code implements its own interactive shell; how would we change the code to replace it with a library?
  • The interface for making a move or shooting an arrow is a bit clumsy, can it be replaced?
  • What changes do we need to support a web version, where each input from the player occurs in its own session?
  • Knowing the layout of the tunnel system gives the hunter a significant advantage when shooting arrows.  We could disguise the hunter's location by using randomized names.
  • Can we change the system to support mazes of different size? Mazes with more tunnels?  Mazes where the rooms are not symmetric, with "missing" tunnels? Mazes with more/fewer hazards?
  • Does the language need to be English? 

Monday, September 24, 2018

TDD: Lightning Talk

What follows is a rough sketch of the lightning talk that I gave at the Boston Software Coders meetup.

Tim Ottinger's image, hosted by Uncle Bob.

The Red Green Refactor mantra has been around for a long time, and the cycle shown above is a familiar one.  But if you look carefully, it's not really an accurate representation of what you are doing.

Let's borrow Tony Hoare's notion of preconditions and postconditions, and apply them to the various stages of the Red Green Refactor cycle.

The precondition for entering the RED stage is that we have an empty set of failing tests, and a set P (which may or may not be empty) of passing tests. The postcondition is that we have a single element set of failing tests (t), and the same set P of passing tests. (If introducing a new test causes other tests to fail, then we have some undesirable coupling that needs to be addressed).

The precondition for entering the GREEN stage is that we have a single element set of failing tests (t), and a set P of passing tests. The postcondition is that the set of failing tests is empty, and the set of passing tests is the union of (t) and P. Which is to say, we've moved test (t) from the failing column to the passing column, without breaking anything else.

The precondition for entering the REFACTORING stage is that we have an empty set of failing tests (), and a set P of passing tests. The postcondition is that we have an empty set of failing tests (), and the same set P of passing tests.
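
Putting the three contracts side by side (F is the set of failing tests, P the set of passing tests, t the new test):

    RED:       { F = {},  passing = P }   ->   { F = {t}, passing = P }
    GREEN:     { F = {t}, passing = P }   ->   { F = {},  passing = P ∪ {t} }
    REFACTOR:  { F = {},  passing = P }   ->   { F = {},  passing = P }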

What we really have here are two different cycles that happen to share starting and terminal states. The upper cycle is used to add more tests, which is to say to add constraints on the behavior of the system. The lower cycle is used to improve the implementation.

On the happy path, the upper cycle has two distinct movements; first, we introduce a new constraint by extending only the test code. Then we satisfy the constraint by modifying only the production code. This is our test calibration: we've proven that the test is actually measuring the production behavior.

But there are unhappy paths - for instance, we introduce a new test, but the test passes when first run. That puts us in an unwelcome state of green, where we haven't yet demonstrated that the test is actually measuring the production behavior. So first we have to drive ourselves to a red state by changing the production code, before we revert back to a true green.

If you are practicing test driven development, you are already doing these things. Take some time to observe yourself doing them, and I promise that you'll make some interesting discoveries.

Sunday, September 23, 2018

TDD: What do tests describe?

The second thing I want to highlight is that refactoring does not change the observable behavior of the software. The software still carries out the same function that it did before. Any user, whether an end user or another programmer, cannot tell that things have changed. -- Martin Fowler.
Most TDD kata feature functions as the system under test. Sure, the underlying implementation might be classes, or monads, or whatever the flavor of the month happens to be. But in the Bowling Game, or Fizz Buzz, or Mars Rover, the inputs completely determine the output.

"1", "2", "Fizz"... each question has one and only one right answer. There is precisely one observable behavior that satisfies the requirements.

But that's not generally true - there are a lot of ways that the requirements may not completely constrain the system. For instance, in the fractions kata, unless you introduce a constraint that further restricts the behavior in some way, adding two fractions can produce any of a number of distinguishable denominators.
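
For example, asked for 1/2 + 1/4, an implementation could legitimately answer 3/4, or 6/8, or 12/16; every one of those satisfies "adds two fractions" until a lowest-terms constraint is introduced.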

Systems with "random" behaviors, or hidden information, face this complexity. My approach to the Fischer Chess kata usually involves isolating the random number generator from a function -- but there are still 960 different layouts that could count as row(0).

So - what's really going on?

Friday, September 14, 2018

On aggregates: values, references, and transactions.

Gods help me, I'm thinking about aggregates again.

I think aggregates, and the literature around aggregates, deserve a poor reputation.

Part of the problem is that the early descriptions of the concepts have implicit in them a lot of the assumptions of enterprise solutions written in Java circa 2003.  And these assumptions haven't aged very well - we are writing a lot of software today that has different design considerations.