Cascade Faliure: February 2018

During a discussion today on the DDD/CQRS slack channel, I remembered Rinat Abdullin's discussion of evolving process managers, and it led to the following thinking on events and boundaries.

Let us begin with out simplified matching service; the company profits by matching buyers and sellers. To begin, the scenario we will use is that Alice, Bob, and Charlie place sell orders, and David places a buy order. When our domain expert Madhav receives this information, he uses his knowledge of the domain and recognizes that David's order should be matched with Bob's.

Let's rephrase this, using the ideas provided by Rinat, and focusing on the design of Madhav's interface to the matching process. Alice, Bob, Charlie, and David have placed their orders. Madhav's interface at this point shows the four unmatched orders; he reviews them, decides that Bob and David's orders match, and sends a message describing this decision to the system. When that decision reaches the projection, it updates the representation, and now shows two unmatched orders from Alice and Charlie, and the matched orders from Bob and David.

Repeating the exercise: if we consider the inputs of the system, then we see that Madhav's decision comes after four inputs.

So the system uses that information to build Madhav's view of the system. When Madhav reports his decision to the system, it rebuilds his view from five inputs:

With all five inputs represented in the view, Madhav can see his earlier decision to match the orders has been captured and persisted, so he doesn't need to repeat that work.

At this point in the demonstration, we don't have any intelligence built into the model. It's just capturing data; Udi Dahan might say that all we have here is a database. The database collects inputs; Madhav's interface is built by a simple function - given this collection of inputs, show that on the screen.

Now we start a new exercise, following the program suggested by Rinat; we learn from Madhav that within the business of matching, there are a lot of easy cases where we might be able to automate the decision. We're not trying to solve the whole problem yet; in this stage our ambitions are very small: we just want the interface to provide recommendations to Madhav for review. Ignore all the hard problems, don't do anything real, just highlight a possibility. We aren't changing the process at all - the inputs are the same that we saw before.

We might iterate a number of times on this, getting feedback from Madhav on the quality of the recommendations, until he announces that the recommendations are sufficiently reliable that they could be used to reduce his workload.

Now we start a new exercise, where we introduce time as an element. If there is a recommended match available, and Madhav does not override the recommendation within 10 seconds, then the system should automatically match the order.

We're introducing time into the model, and we want to do that with some care. In 1998, John Carmack told us

If you don't consider time an input value, think about it until you do -- it is an important concept

This teaches us that we should be thinking about introducing a new input into our process flow

Let's review the example, with this in mind. The orders arrive as before, and there are no significant changes to what we have seen before

But treating time as an input introduces an extra row before the match is made

While that seems simple enough, something arbitrary has crept in. For example, why would time be an input in only 10 second bursts?

Or perhaps it's a better separation of concerns to use a scheduler

And we start to notice that things are getting complicated; what's worse, they are getting complicated in a way that Madhav didn't care about when he was doing the work by hand.

What's happening here is that we've confused "inputs" with "things that we should remember". We need to remember orders, we need to remember matches -- we saw that when Madhav was doing the work. But we don't need to remember time, or scheduling; those are just plumbing constructs we introduced to allow Madhav to intercede before the automation made an error.

Inputs and thing we should remember were the same when our system was just a database. No, that's not quite right; they weren't the same, they were different things that looked the same. They were different things that happened to have the same representation because all of the complicated stuff was outside of the system. They diverged when we started making the system more intelligent.

With this revised idea, we have two different ways of thinking about the after situation; we can consider its inputs

Or instead we can think about its things to remember

"Inputs" and "Things to Remember" are really really lousy names, and those spellings don't really lend themselves well to the ubiquitous language of domain modeling. To remain in consistent alignment with the literate, we should use the more common terminology: commands and events.

In the design described so far, we happen to have an alignment of commands and events that is one to one. To illustrate, we can think of the work thus far as an enumeration of database transactions, that look something like:

In the next exercise, consider what would happen if we tried to introduce as an invariant that there should be no easily matched items (following the rules we were previously taught by Madhav) left unmatched. In other words, when David's order arrives, it should be immediately matched with Bob's, rather than waiting 10 seconds. That means that one command (the input of David's order) produces two different events; one to capture the observation of David's order, the other to capture the decision made by the automation as a proxy for Madhav.

The motivation for treating these as two separate events is this: it most closely aligns with the events we were generating when Madhav was making all of the decisions himself. Whether we use Madhav in the loop making the decisions, or simply reviewing scheduled decisions, or leaving all of the decisions to the automation, the captured list of events is the same. That in turn means that these different variations in implementation here do not impact the other implementations at all. We're limiting the impact of the change by ensuring the the observable results are consistent in all three cases.

Cascade Faliure

Monday, February 5, 2018

Events in Evolution