Friday, September 30, 2016

Explicit provenance

I'm working on a rewrite of some existing automation - never where you want to be.  My justification is that the solution we have in place is overfit -- it is deliciously low maintenance on the happy path, but absurd to work with in all other circumstances.  Extending it to meet my current requirements promises to absolutely suck, so I'm going the other way around: leave the existing system untouched, prove out the new approach on the new problem, and migrate the old approach as circumstances demand.

One of the problems I'm trying to address: in the current implementation, it's very difficult to see the motivation behind what's going on.  In my new work, I'm making a deliberate effort to write things out longhand.

Making things more interesting is the fact that I've chosen to implement the "human operator override" use cases first.  Motivation: if a human must be involved, that makes the exercise appreciably more expensive.  So smoothing that path out -- in particular, making it as easy as possible to remember how the contingency designs work -- is a priority even though the use is rare.

In a manner of speaking, in the first use case the human operator is acting as a surrogate for the automation to follow.  As this exercise is intended to provide a surface for the user, I start with a single input field and a button with VERB written on it in large friendly letters.

I then proceed to hard code into the handler the remainder of the state required to achieve the desired result.  Test.  Green.  Moving on...  There are two problems with the hard coded state that need to be addressed.  First, it needs to be exposed to give the human operator additional control when required.  I had, in fact, already run into problems with this when I tried to use, as my input, data that was not representative of the happy path.  One simple input field, and I had already managed to overfit my solution.
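To make that concrete, here's a minimal sketch of the shape I'm describing -- hypothetical names, not the real system:

    // Hypothetical sketch: the operator supplies one argument, and the
    // handler hard codes the rest of the state needed to proceed.
    public class VerbHandler {
        public void onVerb(String operatorInput) {
            // Pinned to the happy path, for now; the first problem is
            // that none of this is exposed when the operator needs it.
            String environment = "production";
            String trigger = "manual-override";
            execute(operatorInput, environment, trigger);
        }

        private void execute(String input, String environment, String trigger) {
            System.out.printf("VERB %s in %s (%s)%n", input, environment, trigger);
        }
    }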

The second is to provide the correct provenance of the data. This begins with meaningful names, but also includes faithfully representing where in the business the data is coming from.  The argument from the input control is a String, but that String is really just a representation of a URL, which provides an address for the projection I'm working with.  But the spelling of the URL is not arbitrary: it in fact encodes details about the source that was used to create the projection, or more specifically about the instance of the process that created the source....
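Written out longhand, that might look something like this -- again a hypothetical sketch; the names and the URL scheme are mine, not the real system's:

    // Hypothetical sketch: a value type that carries the provenance
    // explicitly, rather than passing a bare String around.
    public final class ProjectionUrl {
        private final String spelling;

        private ProjectionUrl(String spelling) {
            this.spelling = spelling;
        }

        // The argument from the input control arrives as a String, but
        // that String is really the address of a projection.
        public static ProjectionUrl parse(String raw) {
            return new ProjectionUrl(java.net.URI.create(raw).toString());
        }

        // And the spelling is not arbitrary: it encodes which instance
        // of the process produced the source behind the projection.
        public static ProjectionUrl of(String processInstanceId, String sourceId) {
            return new ProjectionUrl("projection://" + processInstanceId + "/" + sourceId);
        }

        @Override
        public String toString() {
            return spelling;
        }
    }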

With the provenance written out longhand, it becomes much easier to see where the future seams might lie.

It also exposes that the existing process isn't particularly well thought out: names are not preserved when crossing from one context to another, and, quite honestly, they don't align particularly well with the language of the business as we speak it today.

Thursday, September 22, 2016

Set Validation

Vladimir Khorikov wrote recently about enforcing uniqueness constraints, which is the canonical example of set validation.  His essay got me to thinking about validation more generally.

Uniqueness is relatively straightforward in a relational database; you include in your schema a constraint that prevents the introduction of a duplicate entry, and the constraint acts as a guard to protect the invariant in the book of record itself -- which is, after all, where it matters.

But how does it work? The constraint is effective because it blocks the write to the book of record.  In the abstract, the constraint gets tested within the database while the write lock is held; the writes themselves have been serialized and each write in turn needs to be consistent with its predecessors.

If you try to check the constraint before obtaining the write lock, then you have a race; the book of record can be changed by another transaction that is in flight.
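A minimal in-memory sketch of the difference, with hypothetical names standing in for the database's own machinery:

    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical book of record: writes are serialized, and the
    // uniqueness check runs while the write lock is held.
    public class BookOfRecord {
        private final Map<String, String> usersByEmail = new HashMap<>();

        public synchronized void register(String email, String user) {
            // Check and write happen under the same lock: no race.
            if (usersByEmail.containsKey(email)) {
                throw new IllegalStateException("duplicate email: " + email);
            }
            usersByEmail.put(email, user);
        }

        public void registerRacy(String email, String user) {
            // The broken variant: the check runs before the lock is
            // taken, so another transaction in flight can introduce the
            // same email between the check and the put.
            if (!usersByEmail.containsKey(email)) {
                synchronized (this) {
                    usersByEmail.put(email, user);
                }
            }
        }
    }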


Single writer sidesteps this issue by effectively making the write lock private.

With multiple writers, each can check the constraint locally, but you can't prove that the two changes in flight don't conflict with each other.  The good thing is that you don't need to - it's enough to know that the book of record hasn't changed since you checked it.  Logically, each write becomes a compare and swap on the tail pointer of the model history.
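As a sketch, again with hypothetical names: the writer validates against the history it read, and the append only succeeds if that history is still current.

    import java.util.concurrent.atomic.AtomicReference;

    // Hypothetical sketch: the model history as a linked list of events,
    // where each write is a compare-and-swap on the tail pointer.
    public class ModelHistory {
        public static final class Node {
            final String event;
            final Node previous;

            Node(String event, Node previous) {
                this.event = event;
                this.previous = previous;
            }
        }

        private final AtomicReference<Node> tail = new AtomicReference<>();

        public Node currentTail() {
            return tail.get();
        }

        // The caller has already checked the constraint against
        // expectedTail; the swap fails if the book of record has changed
        // since that check.
        public boolean tryAppend(Node expectedTail, String event) {
            return tail.compareAndSet(expectedTail, new Node(event, expectedTail));
        }
    }

A false return from tryAppend means another writer got there first; re-read the tail, re-run the check, and try again.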

Of course, the book of record has to trust that the model actually performed the check before attempting the write.

And implementing the check this way isn't particularly satisfactory.  There's not generally a lot of competition for email addresses; unless your problem space is actually the assignment of mail boxes, the constraint has generally been taken care of elsewhere.  Introducing write contention (by locking the entire model) just to ensure that no duplicate email addresses exist in the book of record is a poor trade.

This is already demonstrated by the fact that this problem usually arises after the model has been chopped into aggregates; an aggregate, after all, is an arbitrary boundary drawn within the model in an attempt to avoid unnecessary conflict checks.

But to ensure that the aggregates you are checking haven't changed while waiting for your write to happen?  That requires locking those aggregates for the duration.

To enforce a check across all email addresses, you also have to lock against the creation of new aggregates that might include an address you haven't checked.  Effectively, you have to lock membership in the set.


If you are going to lock the entire set, you might as well take all of those entities and make them part of a single large aggregate.

Greg Young correctly pointed out long ago that there's not a lot of business value at the bottom of this rabbit hole.  If the business will admit that mitigation, rather than prevention, is a cost-effective solution, the relaxed constraint will be a lot easier to manage.