Cascade Failure: January 2016

Friday, January 22, 2016

REST: on Resources

I've been trying to make more progress with REST by reviewing the questions that other programmers are asking about it:

http://programmers.stackexchange.com/questions/tagged/rest

Sidestepping for the moment the whole riddle of hypermedia representations, the most common issue seems to come about when the programmer assumes that an entity in the domain must have a single URI that does everything with nothing but Four Verbs and the Truth.

The internal mapping seems to be that entities are nouns, and resources are nouns, and therefore entities are resources -- since the URI is "the" identifier for the resource, it must also be "the" identifier for the entity, and suddenly everything looks like a nail.

Clearly, identifiers should be 1-to-1 with resources -- you really need to pervert the definition of "uniform resource identifier" to reach any other conclusion. Still, it turns out that this is something we had to learn over time.

For instance, in 1999, the definition of the PUT method (RFC 2616) described resources this way:

A single resource MAY be identified by many different URIs. For example, an article might have a URI for identifying "the current version" which is separate from the URI identifying each particular version. In this case, a PUT request on a general URI might result in several other URIs being defined by the origin server.

In 2014, the definition of the PUT method (RFC 7231) changed:

A PUT request applied to the target resource can have side effects on other resources. For example, an article might have a URI for identifying "the current version" (a resource) that is separate from the URIs identifying each particular version (different resources that at one point shared the same state as the current version resource). A successful PUT request on "the current version" URI might therefore create a new version resource in addition to changing the state of the target resource, and might also cause links to be added between the related resources.

The latter interpretation allows you to sidestep one of the complicating issues with interpreting a PUT method -- the body is supposed to be a replacement for the resource, but nothing in "the rules" prevents you from inventing a resource with the specific purpose of being replaced.
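A minimal sketch of that idea, assuming a hypothetical in-memory origin server and made-up URIs (`/articles/42/current`, `/articles/42/versions/N`). The "current" resource exists to be replaced, and each PUT to it defines a new version resource as a side effect:

```python
from dataclasses import dataclass, field

CURRENT = "/articles/42/current"           # hypothetical URI
VERSION_PREFIX = "/articles/42/versions/"  # hypothetical URI template


@dataclass
class ArticleStore:
    """Hypothetical origin-server state for a single article."""

    versions: list = field(default_factory=list)  # one entry per version resource

    def put_current(self, body: str) -> dict:
        # PUT replaces the state of the "current version" resource...
        self.versions.append(body)
        # ...and, as a side effect, defines a new version resource.
        return {"target": CURRENT, "created": f"{VERSION_PREFIX}{len(self.versions)}"}

    def get(self, path: str) -> str:
        if path == CURRENT and self.versions:
            return self.versions[-1]
        if path.startswith(VERSION_PREFIX):
            number = int(path[len(VERSION_PREFIX):])
            if 1 <= number <= len(self.versions):
                return self.versions[number - 1]
        raise KeyError(path)


store = ArticleStore()
store.put_current("draft")
created = store.put_current("final")["created"]
```

Each version URI keeps answering with the same representation, while `current` moves every time it is replaced.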

Jim Webber hinted at the same thing:

You should expect to have many many more resources in your integration domain than you do business objects in your business domain.

I think the CQRS language helps here -- from our event history, we build lots of different projections that are each especially suitable to a particular use case.
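To make that concrete, here is a sketch over a hypothetical order history; the event names and URIs are invented for illustration. One entity, one history, three resources, each shaped for its own use case:

```python
from collections import Counter

# Hypothetical event history for a single order entity.
history = [
    {"type": "ItemAdded", "sku": "A", "qty": 2},
    {"type": "ItemAdded", "sku": "B", "qty": 1},
    {"type": "ItemRemoved", "sku": "A", "qty": 1},
    {"type": "OrderSubmitted"},
]


def cart_contents(events):
    """Projection backing a 'what is in the cart' resource."""
    counts = Counter()
    for e in events:
        if e["type"] == "ItemAdded":
            counts[e["sku"]] += e["qty"]
        elif e["type"] == "ItemRemoved":
            counts[e["sku"]] -= e["qty"]
    return {sku: n for sku, n in counts.items() if n > 0}


def order_status(events):
    """Projection backing a 'where is my order' resource."""
    return "submitted" if any(e["type"] == "OrderSubmitted" for e in events) else "open"


def activity_feed(events):
    """Projection backing an activity/audit resource."""
    return [e["type"] for e in events]


# Three resources, all built from the same history.
projections = {
    "/orders/7/cart": cart_contents,
    "/orders/7/status": order_status,
    "/orders/7/activity": activity_feed,
}
```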

Another idea that helped clear things up for me is that a view of an immutable resource can be cached. Trying to cache a projection of a mutable entity brings you face to face with one of the two hard problems (cache invalidation), but caching the immutable history of a mutable entity is fine. In other words, we have different resources that describe the state of an entity at different times. Couple that with the idea that there are many ways of expressing a moment in time, each of which maps to a different resource, and you get an explosion of possible resources that you can exploit.
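A sketch of how that might look, assuming a hypothetical document whose history is a list of states with monotonically increasing ISO 8601 timestamps in one fixed format (so plain string comparison orders them). The `Cache-Control` values are standard HTTP; the URI scheme is invented. Note that `/doc/version/3` and `/doc/as-of/2016-01-20T08:15Z` are different resources that happen to share a state:

```python
from bisect import bisect_right

# Hypothetical history: the state after each event, and when each event was recorded.
STATES = ["draft", "reviewed", "published"]
RECORDED_AT = ["2016-01-01T09:00Z", "2016-01-05T14:30Z", "2016-01-20T08:15Z"]

IMMUTABLE = "public, max-age=31536000, immutable"  # safe to cache indefinitely
MUTABLE = "no-cache"                               # caches must revalidate


def resolve(path: str) -> tuple:
    """Map a hypothetical URI to (representation, Cache-Control)."""
    if path == "/doc/current":
        # The current state changes whenever an event is appended.
        return STATES[-1], MUTABLE
    if path.startswith("/doc/version/"):
        n = int(path[len("/doc/version/"):])
        if not 1 <= n <= len(STATES):
            raise KeyError(path)
        # Version n never changes once it exists.
        return STATES[n - 1], IMMUTABLE
    if path.startswith("/doc/as-of/"):
        moment = path[len("/doc/as-of/"):]
        i = bisect_right(RECORDED_AT, moment)
        if i == 0:
            raise KeyError(path)  # the entity did not exist yet
        # A moment no later than the latest event is settled history;
        # a later moment could still be changed by a future event.
        settled = moment <= RECORDED_AT[-1]
        return STATES[i - 1], IMMUTABLE if settled else MUTABLE
    raise KeyError(path)
```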

Friday, January 8, 2016

Coordinated Entities

Question: how do you decide if two different entities belong in the same aggregate?

I've been puzzling over this for a while now, looking for the right sorts of heuristics to apply.

The book answer is straightforward, without being useful. The aggregate boundary encompasses all of the state required to maintain the business invariant. So if you know what the business invariant is, then the problem is easy. You start with an aggregate of a single entity, then you fold in all of the business rules that reference the state of the entity, then you fold in all of the entities touched by those rules, and then fold in more rules... it's turtles until you reach a steady state. Then that aggregate, at least, is complete. You set it aside, pick a new entity, and repeat the process until all the entities in the domain have been assigned to an aggregate.
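The book procedure is a fixed-point computation. Here is a sketch, assuming (hypothetically) that we already know which entities each rule references, which is exactly the knowledge that is usually missing. The rule and entity names are invented:

```python
# Hypothetical model: each business rule names the entities whose state it reads.
RULES = {
    "credit_limit": {"Customer", "Order"},
    "stock_reserved": {"Order", "Inventory"},
    "address_valid": {"Shipment"},
}
ENTITIES = {"Customer", "Order", "Inventory", "Shipment", "Catalog"}


def aggregate_for(seed, rules):
    """Grow an aggregate from one entity until nothing new is folded in."""
    entities, folded_rules = {seed}, set()
    changed = True
    while changed:  # turtles until a steady state
        changed = False
        for name, touched in rules.items():
            # Fold in any rule that references an entity already inside...
            if name not in folded_rules and touched & entities:
                folded_rules.add(name)
                # ...and every entity that rule touches.
                entities |= touched
                changed = True
    return entities, folded_rules


def partition(all_entities, rules):
    """Pick an unassigned entity and repeat until every entity has an aggregate."""
    unassigned, aggregates = set(all_entities), []
    while unassigned:
        seed = min(unassigned)  # deterministic choice, just for the example
        members, _ = aggregate_for(seed, rules)
        aggregates.append(members)
        unassigned -= members
    return aggregates


aggregates = partition(ENTITIES, RULES)
```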

In any interesting problem space, the invariant is not so clearly defined. Most of the discussions describing the evolution of a model talk about discovering that the model is missing some element of the Ubiquitous Language, which inspires someone to recognize why some use case has been broken, or incredibly difficult to implement. Or that the Ubiquitous Language itself has been missing some important concept that, once expressed, brings new clarity to the actual requirements of the business. Most of the refactoring exercises I have seen describe cases where entities were too tightly coupled; contention between unrelated entities was making the system harder to use.

Lesson I learned today:

Thinking about the happy path doesn't tell you anything. Any composition of the objects will do when the command history never violates any business rules. The interesting cases are partial failures.

Contention, as noted previously, is a primary pressure to separate entities. Commands are being applied to different entities, where there should be no interplay between the affected states. Yet if both commands are being run through the same aggregate root, then one otherwise satisfactory command will fail because it happened to be trying to commit after a different command has already advanced the history of the aggregate. This is a failure of interference between uncoordinated commands. The inverse problem arises when two coordinated commands are sent to separate entities, and one command succeeds while the other fails.
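The interference failure is easy to reproduce with an optimistic concurrency check, which is a common way for an event store to reject a stale commit (an assumption here; the post doesn't name a mechanism). The `EventStream` class and event names are hypothetical:

```python
class ConcurrencyConflict(Exception):
    pass


class EventStream:
    """Hypothetical append-only history for one aggregate, with optimistic locking."""

    def __init__(self):
        self.events = []

    @property
    def version(self):
        return len(self.events)

    def append(self, expected_version, new_events):
        # The commit succeeds only if nobody advanced the history since we read it.
        if expected_version != self.version:
            raise ConcurrencyConflict(f"expected {expected_version}, found {self.version}")
        self.events.extend(new_events)


# Entities A and B modelled inside one aggregate share one stream.
shared = EventStream()
read_by_cmd_a = shared.version  # command for A loads the aggregate
read_by_cmd_b = shared.version  # command for B loads it concurrently
shared.append(read_by_cmd_a, ["A.Renamed"])
try:
    shared.append(read_by_cmd_b, ["B.PriceChanged"])  # unrelated, yet rejected
    outcome = "committed"
except ConcurrencyConflict:
    outcome = "rejected"

# Modelled as separate aggregates, each command commits against its own stream.
stream_a, stream_b = EventStream(), EventStream()
stream_a.append(stream_a.version, ["A.Renamed"])
stream_b.append(stream_b.version, ["B.PriceChanged"])
```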

Thought experiment: suppose that we were to model these two entities in separate aggregates, so that they are participating in different transactions. What would this coordination failure look like in the event stream? Well, you would be watching the events go by, and you would see the history of the successful command, and then you would wait, and wait, and you would never see the history from the other aggregate.

Let's put a scope on it: we have a coordination contingency if some specified amount of time passes without our seeing the missing history. That we are watching the event history, and thinking about the passage of time, announces at once that we are considering a process manager, which is an entity that implements a state machine. Within its own transactions, a process manager will emit events describing the changes to the state machine, asynchronously schedule calls to itself (a time trigger), and perhaps dispatch asynchronous commands to the domain model.
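A sketch of such a state machine, with hypothetical message names and a made-up SLA. Commands to the domain model are left out; each transition records an event and, where needed, a time trigger addressed back to the process manager itself:

```python
from dataclasses import dataclass, field

SLA_SECONDS = 300  # hypothetical service-level agreement


@dataclass
class CoordinationProcess:
    """Hypothetical process manager waiting for the second half of a coordinated pair."""

    state: str = "Idle"
    emitted: list = field(default_factory=list)    # events for its own log
    scheduled: list = field(default_factory=list)  # (delay, message) sent to itself later

    def handle(self, message: str) -> None:
        if self.state == "Idle" and message == "FirstCommandSucceeded":
            self.state = "AwaitingSecond"
            self.scheduled.append((SLA_SECONDS, "Timeout"))  # time trigger
        elif self.state == "AwaitingSecond" and message == "SecondCommandSucceeded":
            self.state = "Completed"
        elif self.state == "AwaitingSecond" and message == "Timeout":
            self.state = "Contingency"
        else:
            return  # e.g. a late Timeout after Completed: nothing to record
        self.emitted.append(f"EnteredState:{self.state}")


late = CoordinationProcess()
late.handle("FirstCommandSucceeded")
late.handle("Timeout")

on_time = CoordinationProcess()
for message in ("FirstCommandSucceeded", "SecondCommandSucceeded", "Timeout"):
    on_time.handle(message)
```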

There's some block and tackle to be done at this point -- the process manager is an entity in its own right, and we need to be sure that the observed events are dispatched to "the right one". We're going to need some metadata in the events to ensure that they all reach the right destination.
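One way to do that block and tackle, assuming each event carries a `correlation_id` in its metadata, copied from the command that caused it (a hypothetical envelope, not a particular event store's format). The per-instance inbox stands in for loading the right process manager:

```python
from collections import defaultdict

# Hypothetical event envelopes.
events = [
    {"type": "FirstCommandSucceeded", "metadata": {"correlation_id": "transfer-1"}},
    {"type": "FirstCommandSucceeded", "metadata": {"correlation_id": "transfer-2"}},
    {"type": "SecondCommandSucceeded", "metadata": {"correlation_id": "transfer-1"}},
]


class ProcessManagerRouter:
    """Delivers each event to the process manager instance that owns its correlation id."""

    def __init__(self):
        self._inboxes = defaultdict(list)  # correlation_id -> messages delivered

    def dispatch(self, event):
        key = event["metadata"]["correlation_id"]
        self._inboxes[key].append(event["type"])

    def inbox(self, correlation_id):
        return list(self._inboxes.get(correlation_id, []))


router = ProcessManagerRouter()
for event in events:
    router.dispatch(event)
```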

Back to our experiment: the history of the first command arrives. We load a process manager and pass the event to it. The process manager uses its SLA to schedule a message to itself at some time in the future. Time passes; the scheduled message is delivered. The process manager fires the timeout trigger into its state machine, arrives at the Contingency state, and writes that event into the log.
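The same walk-through as a simulation, with a hypothetical scheduler driven by a fake clock standing in for real delayed delivery; the names and the 300-second SLA are made up:

```python
import heapq

SLA = 300  # hypothetical SLA, in seconds


class Scheduler:
    """Hypothetical delayed-message queue driven by a simulated clock."""

    def __init__(self):
        self.now = 0
        self._queue = []  # heap of (due_time, seq, correlation_id, message)
        self._seq = 0     # tie-breaker so equal due times pop in FIFO order

    def schedule(self, delay, correlation_id, message):
        heapq.heappush(self._queue, (self.now + delay, self._seq, correlation_id, message))
        self._seq += 1

    def advance(self, seconds):
        """Move the clock forward and return the messages that fell due."""
        self.now += seconds
        due = []
        while self._queue and self._queue[0][0] <= self.now:
            _, _, cid, msg = heapq.heappop(self._queue)
            due.append((cid, msg))
        return due


log = []    # the event log the process manager writes to
state = {}  # correlation_id -> process manager state
scheduler = Scheduler()


def process_manager(correlation_id, message):
    current = state.get(correlation_id, "Idle")
    if current == "Idle" and message == "FirstCommandSucceeded":
        state[correlation_id] = "AwaitingSecond"
        scheduler.schedule(SLA, correlation_id, "Timeout")  # message to itself
    elif current == "AwaitingSecond" and message == "Timeout":
        state[correlation_id] = "Contingency"
        log.append((scheduler.now, correlation_id, "CoordinationContingency"))


# The history of the first command arrives; the second never does.
process_manager("transfer-1", "FirstCommandSucceeded")
for cid, msg in scheduler.advance(SLA):  # time passes
    process_manager(cid, msg)
```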

How does that help?

It gives us something to look for in the Ubiquitous Language. If the coordinated entities really do belong in separate aggregates, then this contingency is something that really happens in the business, so somebody should know about it: the requirements for mitigating the contingency, the events that should appear in the log to track the mitigation's progress, and so on.

On the other hand, if the domain experts begin saying "that can't happen", "that MUST NOT happen", "that is too expensive when it happens, which is why we have you writing software to prevent it", and so forth, then that is strong evidence that the two entities in question need to be modeled as part of the same aggregate.