Sunday, October 29, 2017

Aggregate Semantics

First, we need an abstraction for encapsulating references within the model.
This past week, I had a brief, on-line discussion with Yves Reynhout about repositories.  He pointed out that there is a difference between persistence oriented repositories and collection oriented repositories.

A REPOSITORY represents all objects of a certain type as a conceptual set.  It acts like a collection....
(Repositories) present clients with a simple model for obtaining persistent objects and managing their life cycle.
The invocation of the repository might be as simple as
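
Something like this, say (a sketch only; the names here are hypothetical):

    // hypothetical names throughout; the point is the collection-like semantic
    Order order = orders.get(orderId);   // the repository behaves like a lookup in a conceptual set
    order.applyAmendment(amendment);     // then we appear to invoke a method on an entity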



The key insight is that this is just a semantic contract; behind the interface, the domain model is free to choose its implementation.



We're not really -- at least, not necessarily -- "invoking a method on an entity in the model".  What we are actually doing in this design is capturing two key pieces of information:

  • Which document in our data model are we trying to modify
  • Which behavior in our domain model are we modifying that document with

In the usual pattern, the underlying implementation looks like "an object": we acquire some state from the data store, wrap it with domain methods, let the application invoke one, extract the state back out of the object, and write it to the store.
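
A sketch of that familiar shape, again with hypothetical names:

    OrderState state = dataStore.load(orderId);      // acquire some state from the data store
    Order order = new Order(state);                  // wrap it with domain methods
    order.applyAmendment(amendment);                 // let the application invoke one
    dataStore.save(orderId, order.currentState());   // extract the state and write it back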

This pattern bothered me for a long time -- we shouldn't "need" getters and setters on the domain objects, yet somehow we need to get the current state into and out of them.  Furthermore, we've got this weird two phase commit thing going on, where we are mutating an entity in memory and the book of record.

A different analogy to consider when updating the model is the idea that we are taking a command -- sent to the domain model -- and transforming it into a command sent to the data model.  In other words, we pass the behavior that we want to the data model, saying "update your state according to these business rules".

Tell, don't ask.
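
In code, the shift might look something like this sketch (hypothetical names again):

    // instead of pulling state out and pushing state back in, hand the data model
    // the business rule to apply to its own state
    dataStore.update(orderId, state -> Order.applyAmendment(state, amendment));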





Sunday, October 22, 2017

Test Driven Diamonds

I've been reviewing, and ruminating upon, a bunch of experiences folks have reported with the diamond kata.  A few ideas finally kicked free.

Overview

Seb noted this progression in the tests
The usual approach is to start with a test for the simple case where the diamond consists of just a single ‘A’
The next test is usually for a proper diamond consisting of ‘A’ and ‘B’; It’s easy enough to get this to pass by hardcoding the result.
We move on to the letter ‘C’: The code is now screaming for us to refactor it, but to keep all the tests passing most people try to solve the entire problem at once.
Alistair expressed the same conclusion more precisely
all the work that they did so far is of no use to them whatsoever when they hit ‘C’, because that’s where all the complexity is. In essence, they have to solve the entire problem, and throw away masses of code, when they hit ‘C’

George's answer is "back up and work in smaller steps", but I don't find that answer particularly satisfactory.  I believe that there's something deeper going on.

Background

Long ago, I was trying to demonstrate the simplest thing that could possibly work on the testdrivendevelopment discussion group.  I had offered this demonstration:



And Kent Beck in turn replied 
 Do you have some refactoring to do first?
Sadly, I wasn't ready for that particular insight that day.  But I come back to it from time to time, and am starting to feel like I'm beginning to appreciate the shape of it.

On Unit Tests

Unit tests are a check that our latest change didn't break anything -- the behavior specifications that were satisfied before the change are still satisfied after the change.

Big changes can be decomposed into many small changes.  So if we are working in small increments, we are going to be running the tests a lot.  Complex behaviors can be expressed as many simple behaviors; so we expect to have a lot of tests that we run a lot.

This is going to suck unless our test suite has properties that make it tolerable; we want the tests to be reliable, we want the interruption of running the tests to be as small as possible -- we're running the automated checks to support our real work.

One way to ensure that the interruption is brief is to run the tests in parallel.  Achieving reliability with concurrently executing tests introduces some constraints - the behaviors of the tests need to be isolated from one another.  Any assets that are shared by concurrently executing tests have to be immutable (otherwise you are vulnerable to races).

This constraint effectively means that the automated checks are pure functions; given-when-then describes the inputs (a deterministic representation of the initial configuration of the system under test, a message, and a representation of the specification to be satisfied), and the output is a simple boolean.

This test function is in turn composed of some test boiler plate and a call to some function provided by the system under test.
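
For example, a check in this style can reduce to little more than the following (a sketch, assuming a hypothetical Diamond.print function):

    import static org.junit.jupiter.api.Assertions.assertEquals;

    import org.junit.jupiter.api.Test;

    class DiamondTest {
        @Test
        void diamondOfA() {
            // given/when/then collapses to: a deterministic input, one call into the
            // system under test, and a check that reduces to a boolean
            assertEquals("A", Diamond.print('A'));
        }
    }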

Implications

... since every piece of matter in the Universe is in some way affected by every other piece of matter in the Universe, it is in theory possible to extrapolate the whole of creation — every sun, every planet, their orbits, their composition and their economic and social history from, say, one small piece of fairy cake.
If we are, in fact, testing a function, then all of the variation in outputs must be encoded into the inputs.  That's the duplication Beck was describing.

Beck's ability to discover duplication has been remarked on elsewhere, and I suspect that this principle is a big part of it.  Much of the duplication is implicit -- you need to dig to make the implicit explicit.

Example

To show what I mean, let's take a look at the production code for the Moose team after the first passing test.
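
In essence, it is a hard-coded return; something like this sketch (not their actual code):

    public class Diamond {
        public static String print(char widest) {
            // first passing implementation: hard-code the expected output for 'A'
            return "A";
        }
    }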


We've got a passing test, so before we proceed: where is the duplication?  One bit of duplication is that the "A" that appears in the output is derived from the `A` in the input.  Also, the number of lines in the output is derived from the input,  as is the relative position of the A in that output.


There's nothing magic about the third test, or even the second, that removes the shackles - you can attack the duplication as soon as you have a green bar.  All of the essential complexity of the space is included in the first problem, so you can immediately engage the design dynamo and solve the entire problem.

So when do we need more than one test?  I think of the bowling game, where the nature of the problem space allows us to introduce the rules in a gradual way.  The sequencing of the tests in the bowling game allows us to introduce the complexity of the domain in a controlled fashion - marks, and the complexities of extra throws in the final frame are deferred.

But it doesn't have to be that way - the mars rover kata gives you a huge slice of navigation complexity in the first example; once you have eliminated the duplication between the input and the output, you are done with navigation, and can introduce some other complexity (wrapping, or collisions).

Of course, if the chunk of complexity is large, you can easily get stuck.   Uncle Bob touched on this in his write up of the word wrap kata; passing the next test can be really difficult if you introduce too many new ideas at once.  Using a test order that introduces new ideas one at a time is an effective way to build up a solution.  Being willing to suspend complex work to find a more manageable slice is an important skill.

And, in truth, I did the same thing in the rover; essentially pushing as far as I could with the acceptance test alone, then suspending that work in favor of a new test isolating the next behavior I needed, refactoring my solution into my acceptance work, and continuing on until I next got stuck.

In an exercise dedicated to demonstrating how TDD works, that's an important property for a problem to have: a rule set with multiple elements that can be introduced into the solution separately.

Of course, that's not the only desirable property; you also need an exercise that can be explained in a single page, one in which you can make satisfactory progress with useful lessons in a single session.  That introduces a pressure to keep things simple.

But there's a trap: if you strip down the problem to a single rule (like the diamond, or Fibonacci), then you lose the motivation for the increments - the number of test cases you introduce before actually solving the problem becomes arbitrary.

On the other hand, if simply returning a hard coded value is good enough to pass all of our acceptance tests, then why are we investing design capital in refactoring the method?  The details clearly aren't all that important to the success of the project yet.

My mental model right now is something along the lines of a budget; each requirement for a new variation in outputs gives us more budget to actually solve the problem, rather than just half-assing it.  For a problem like Fibonacci, teasing out the duplication is easy, so we don't need many iterations of capital before we just solve the damn thing.  Teasing the duplication out of diamond is harder, so we need a commensurately larger budget to justify the investment.

Increments vs Iterations



I found the approaches that used property-based testing to be absolutely fascinating.  For quite a while I was convinced that they were wrong; but my current classification of them is that they are alien.

In the incremental approach, we target a specific example which is complete, but not general.  Because the example is complete, it necessarily satisfies all invariants that are common to all possible outputs.  Any acceptance test that only needs the invariant satisfied will be able to get by with this single solution indefinitely.

The property-based approach seems to explore the invariant generally, rather than targeting a specific example.  For diamond (a couple of these are sketched as checks after the list):
  • The output is a square
  • With sides of odd length
  • With the correct length
  • With top-bottom symmetry
  • With left-right symmetry
  • With the correct positions occupied by symbols
  • With the correct positions occupied by the correct symbols
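
Two of these, sketched against the same hypothetical Diamond.print, written as plain loops over every input rather than with a property-testing library, and assuming rows are separated by newlines:

    import static org.junit.jupiter.api.Assertions.assertEquals;

    import org.junit.jupiter.api.Test;

    class DiamondPropertiesTest {
        @Test
        void outputIsASquareWithSidesOfTheCorrectOddLength() {
            for (char widest = 'A'; widest <= 'Z'; widest++) {
                String[] rows = Diamond.print(widest).split("\n");
                int expected = 2 * (widest - 'A') + 1;     // odd, and derived from the input
                assertEquals(expected, rows.length);
                for (String row : rows) {
                    assertEquals(expected, row.length());  // square: every row as wide as the output is tall
                }
            }
        }

        @Test
        void outputHasTopBottomSymmetry() {
            for (char widest = 'A'; widest <= 'Z'; widest++) {
                String[] rows = Diamond.print(widest).split("\n");
                for (int i = 0; i < rows.length; i++) {
                    assertEquals(rows[i], rows[rows.length - 1 - i]);
                }
            }
        }
    }
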
This encourages a different way of investing in the solution - which properties do we need right now?  For instance, if we are working on layout, then getting the dimensions right is urgent (so that we can properly arrange to fit things together), but we may be able to use a placeholder for the contents.

For the mars rover kata, such an approach might look like paying attention first to the shape of the output; that each row of output describes a position and an orientation, perhaps a rule that there are no collisions in the grid, that all of the positions are located within the input grid, that there is an output row for each input row, and so on.

I don't think property-based tests really help in this kata.  But I think they become very interesting if the requirements change.  To offer a naive example, suppose we complete the implementation, and the acceptance tests are passing, and then during the playback the customer recognizes that the requirements were created incorrectly.  The correct representation of the diamonds should be rotated 90 degrees.



With a property-driven test suite, you get to keep a lot of the tests; you retire the ones that conflict with the corrected requirements, and resume iterating until the new requirements are satisfied.  With the example-based approach, the entire test suite gets invalidated in one go -- except for the one trivial example where the changed requirements didn't change the outcome.

Actually, that's kind of interesting on its own.  The trivial example allows you to fully express, by means of aggressively eliminating the duplication, a complete solution to the diamond problem in one go.  But what it doesn't give you is an artifact that documents for your successor that the original orientation of the diamond, rather than the rotated alternative, is the "correct" one.

The more complicated example makes it easier to distinguish which orientation is the correct one, but is more vulnerable to changing requirements.

I don't have an answer, just the realization that I have a question.

Wednesday, October 4, 2017

Value Objects, Events, and Representations

Nick Chamberlain, writing at BuildPlease:
Domain Models are meant to change, so changing the behavior of a Value Object in the Domain Model shouldn’t, then, affect the history of things that have happened in the business before this change.
 Absolutely right.

The Domain Model is mutable, therefore having Events take dependencies on the Domain Model means that Events must become mutable - which they are not.
Arrrg.

As your domain model evolves, you may add new invariants to be checked, or change existing ones on the Value Object that you’re serializing to the Event Store.
Fundamentally, what Chamberlain is suggesting here is that you may want to replace your existing model with another that enforces stricter post conditions.

That's a backwards incompatible change - in semantic versioning, that would call for a major version change.  You want to be really careful about how you manage that, and you want to be certain that you design your solution so that the costs of doing that are in the right place.
If the Domain Model was mutable, we’d also need versioning it - having classes like RoastDate_v2… this doesn’t match the Ubiquitous Language.
Right - so that's not the right way to version the domain model.  The right way is to version the namespace.  The new post condition introduces a new contract, both at the point of change, and also bubbling up the hierarchy as necessary.  New implementations for those new contracts are introduced.
The composition root chooses the new implementations as it wires everything together.
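
For example (a sketch, with an invented package name and an invented stricter invariant):

    // Hypothetical sketch: the stricter postcondition lives in a new namespace; the
    // old contract, and the events already serialized against it, are left untouched.
    package com.example.roasting.v2;

    import java.time.LocalDate;

    public final class RoastDate {
        private final LocalDate value;

        public RoastDate(LocalDate value) {
            // v2 contract: an invariant that v1 did not enforce (invented for illustration)
            if (value.isAfter(LocalDate.now())) {
                throw new IllegalArgumentException("roast date cannot be in the future");
            }
            this.value = value;
        }

        public LocalDate value() {
            return value;
        }
    }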

"Events are immutable" is a design constraint, not an excuse to bathwater the baby.

Yes, we may need to evolve our understanding of the domain in the future in ways that are incompatible with the data artifacts that we created in the past.  From this, it follows that we need to be thinking about this as a failure mode: how do we want that failure to manifest?  What options can we support for recovery?

Presumably we want the system to fail safely; that probably means that we want a fail on load, an alert to a human operator, and some kind of compensating action that the operator can take to correct the fault and restore service.

For instance, perhaps the right kind of compensating action is correcting entries.  If your history is tracked as an append-only collection of immutable events, then the operator will need to append the compensating entries to the live end of the stream.  So your history processing will need to be aware of that requirement.
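
A sketch of that shape, with hypothetical types and names:

    // the compensating entry is itself just another event, appended to the live end
    // of the stream; the faulty history is never rewritten
    CorrectionIssued correction = new CorrectionIssued(faultyEventId, correctedValues, operatorId);
    eventStore.appendToStream(streamId, expectedVersion, correction);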

Another possibility would be to copy the existing history into a new stream, fixing the fault as you go.  This simplifies the processing of the events within the model, but introduces additional complexity in discovering the correct stream to load.

We might also want to give some thought to the impact of a failure; should it clobber all use cases that touch the broken stream?  That maximizes your chance of discovering the problem.  On the other hand, being more explicit about how data is loaded for each use case will allow you to continue to operate outside of the directly impacted areas.

My hunch is that investing early design capital to get recovery right will also ease the constraints on how we represent data within the domain model.  At the boundaries, the events are just bytes; but within the domain model, where we are describing the changes in the business, the interface of events is described in the ubiquitous language, not in primitives.

ps: Versioning in an Event Sourced System (Young, 2017) is an important read when you are thinking about messages that evolve as your requirements change.

Monday, October 2, 2017

A not so simple trick

Pawel Pacana
Event Sourcing is like having two methods when previously there was one.
 Me
Noooooooo
 In fairness, the literature is a mess.  Let's see what we can do about separating out the different ideas.

Data Models

Let's consider a naive trade book as an example; it's responsible for matching sell orders and buy orders when the order prices match.  So the "invariant", such as it is, is that we are never holding unmatched buy orders and sell orders at the same price.

Let's suppose we get three sell orders; two offering to sell 100 units at $200, and between them a third offer to sell 75 units at $201.  At that point in the action, the data model might be represented this way.
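
One possible shape, sketched with invented types (prices here in cents):

    // a document-style representation of the current state of the trade book
    TradeBookDocument book = new TradeBookDocument(
        List.of(
            new SellOrder(100, 200_00),   // 100 units at $200
            new SellOrder( 75, 201_00),   //  75 units at $201
            new SellOrder(100, 200_00)    // 100 units at $200
        ),
        List.of()                         // no unmatched buy orders, no matches yet
    );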


The next order wants to buy 150 units at $200, and our matching algorithm goes to work. The resulting position might have this representation.


And after another buy order arrives, we might see a representation like


After each order, we can represent the current state of the trade book as a document.

There is an alternative representation of the trade book; rather than documenting the outcome of the changes, we document the changes themselves. Our original document might be represented this way
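
Sketched with invented event types, the same starting point might be:

    // the same state, described as the list of changes that produced it
    List<Object> changes = List.of(
        new SellOrderPlaced(100, 200_00),
        new SellOrderPlaced( 75, 201_00),
        new SellOrderPlaced(100, 200_00)
    );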


Then, when the buy order arrives, you could represent the state this way


But in our imaginary trade book business, matches are important enough that they should be explicit, rather than implicit; so instead, we would more likely see this representation


And then, after the second buy order arrives, we might see


The two different models both suffice to describe equivalent states of the same entity. There are different trade-offs involved, but both approaches provide equivalent answers to the question "what is the state of the trade book right now?"

Domain Models

We can wrap either of these representations into a domain model.

In either case, the core interface that the application interacts with is unchanged -- the application doesn't need to know anything about how the underlying state is represented.  It just needs to know how to communicate changes to the model.  Thus, the classes playing the role of the "aggregate root" have the same exposed surface.  It might look like
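
Perhaps something like this sketch (names invented):

    // the surface the application talks to is the same in either case
    public interface TradeBook {
        void placeSellOrder(int quantity, long priceInCents);
        void placeBuyOrder(int quantity, long priceInCents);
    }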


The underlying implementation of the trade book entity is effectively the same in either case. Using a document-based representation, we would see an outline like
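
A sketch of that outline, assuming an immutable TradeBookDocument value type:

    // document-backed: each order computes the next version of the local document
    public class DocumentBackedTradeBook implements TradeBook {
        private TradeBookDocument document = TradeBookDocument.empty();

        @Override
        public void placeSellOrder(int quantity, long priceInCents) {
            document = document.withSellOrder(quantity, priceInCents);
        }

        @Override
        public void placeBuyOrder(int quantity, long priceInCents) {
            // the matching rules live behind this method on the document type
            document = document.withBuyOrder(quantity, priceInCents);
        }
    }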


Each time an order is placed, the domain model updates the local copy of the document. We get the same shape if we use the event backed approach...
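
A sketch in the same shape:

    import java.util.ArrayList;
    import java.util.List;

    // event-backed: each order appends changes to a local history
    public class EventBackedTradeBook implements TradeBook {
        private final List<Object> changes = new ArrayList<>();

        @Override
        public void placeSellOrder(int quantity, long priceInCents) {
            changes.add(new SellOrderPlaced(quantity, priceInCents));
        }

        @Override
        public void placeBuyOrder(int quantity, long priceInCents) {
            changes.add(new BuyOrderPlaced(quantity, priceInCents));
            // in a fuller sketch, the matching rules would also append OrdersMatched events here
        }
    }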


Same pattern, same shape. When you introduce the idea of persistence -- repositories, and copying the in memory representation of the data to a durable store, the analogs between the two hold. The document representation fits well with writing the state to a document store, and can of course be used to update a relational model; perhaps with assistance of an ORM. But you could just as easily copy the event "document" into the store, or use the ORM to transform the collection of events into some relational form. It's just data at that point.

There are some advantages to the event representation when copying state to the durable store. Because the events are immutable, you don't need to even evaluate whether the original entries in the list have changed. You don't have to PUT the entire history; you can simply PATCH the durable store with the updates. These are optimizations, but they don't change the core of the patterns in any way.

Projections

Event histories have an important property, thanks to their append only nature -- updates are non-destructive.  You can create from the event history any document representation you like; you only need to have an understanding of how to represent a history of no events, and how each event type in turn causes the document representation to change.
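
In code, such a projection is just a fold over the history; a sketch, reusing the invented types from above:

    import java.util.List;

    public final class Projections {
        // build a document representation by replaying the history of events
        public static TradeBookDocument project(List<Object> history) {
            TradeBookDocument document = TradeBookDocument.empty();   // how to represent a history of no events
            for (Object event : history) {
                document = document.apply(event);                     // how each event changes the document
            }
            return document;
        }
    }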


What we have here is effectively a state machine; you load the start state and then replay each of the state transitions to determine the final state.

This is a natural approach to take when trying to produce "read models", optimized for a particular search pattern.  Load the empty document, replay the available events into it, cache the result, obtain more events, replay those, cache this new result, and so on.  If the output representation is lost or corrupted, just discard it and replay the complete history of the model.

There are three important facets of these projections to pay attention to.

First, the motivation for the projections is that they serve queries much more efficiently than trying to work with the event history directly.

Second, because replaying an entire event history can be time-consuming, the ability to resume the projection from a previously cached state is a productivity win.

Third, a bit of latency in the read use case is typically acceptable, because there is no risk that querying stale data will corrupt the domain.

The Tangle

Most non-trivial domain models require some queries when updating the model.  For instance, when we are processing an order, we need to know which previously unmatched orders were posted with a matching price.  If the domain requires first in first out processing, then you need the earliest unmatched order.

Since projections are much better for query use cases than the raw event stream, the actual implementation of our event-backed model probably cheats by first creating a local copy of a suitable projection, and then using that to manage the queries.
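
Something like this sketch (readStream and unmatchedSellOrdersAt are invented here):

    // rehydrate a local projection from the history, and answer the business
    // queries from it rather than scanning the raw events on demand
    TradeBookDocument projection = Projections.project(eventStore.readStream(tradeBookId));
    List<SellOrder> candidates = projection.unmatchedSellOrdersAt(priceInCents);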


That "solves" the problem of trying to use the event history to support queries directly, but it leads directly into the second issue listed above; processing the entire event history on demand is relatively expensive. You'd much prefer to use a cached copy.

And while using a cached copy for a write is fine, using a stale copy for a write is not fine.  The domain model must be working with representations that reflect the entire history when enforcing invariants.  More particularly, if a transaction is going to be consistent, then the events calculated at the conclusion of the business logic must take into account earlier events in the same transaction.  In other words, the projection needs to be continuously updated by the work in progress.

This leads to a design where we are using two coordinated data models to support writes: the event-backed representation that will eventually be used to update the durable store, and the document-backed representation that is used to support the queries needed to enforce the invariant.  The trade book, in effect, becomes its own cache.
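
Refining the earlier sketch to show that coordination (matchingRules is an invented stand-in for the business logic):

    import java.util.ArrayList;
    import java.util.List;

    public class EventBackedTradeBook {
        private final List<Object> changes = new ArrayList<>();  // to be appended to the durable store
        private TradeBookDocument projection;                    // answers the queries the rules need

        public void placeBuyOrder(int quantity, long priceInCents) {
            // the business rules query the projection, and describe their decision as new events...
            List<Object> newEvents = matchingRules(projection, quantity, priceInCents);
            changes.addAll(newEvents);
            // ...which are immediately projected back in, so that later work in the same
            // transaction sees the events calculated earlier in it
            for (Object event : newEvents) {
                projection = projection.apply(event);
            }
        }
    }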



We could, of course, mutate the document directly, rather than projecting the new events into it. But that introduces a risk that the document representation we have now won't match the one we create later when we have only the events to work from. Projecting the events also ensures that any projections we make for supporting reads will have the same state that was used while performing the writes.

To add to the confusion: once the document representation of the model has been rehydrated, the previously committed events don't contribute; they aren't going to be changed, the document supports the queries, and updating the event store is only going to append the new information...  Consequently, the existing history gets discarded, and the use case only tracks the new events that have been discovered locally in this update.