Thursday, June 14, 2018

CQRS Meetup

Yesterday's meetup of the Boston DDD/CQRS/ES group was at Localytics, and featured a 101 introduction talk by James Geall, and a live coding exercise by Chris Condon.
CQRS is there to allow you to optimize the models for writing and reading separately.  NOTE: unless you have a good reason to pay the overhead, you should avoid the pattern.
James also noted that good reasons to pay the overhead are common.  I would have liked to hear "temporal queries" mentioned here - what did the system look like as-at some point in the past?

As an illustration, he described possibilities for tracking stock levels as an append-only table of changes and a rolled-up/cached view of the result.  I'm not so happy with that example in this context, because it implies a coupling of CQRS to "event sourcing".  If I ran the zoo, I'd probably use a more innocuous example: OLTP vs OLAP, or a document store paired with a graph database.

The absolute simplest example I've been able to come up with is an event history; the write model is optimized for adding new information to the end of the data structure as it arrives.  In other words, the "event stream" is typically in message order; but if we want to show a time series history of those events, we need to _sort_ them first.  We might also change the underlying data structure (from a linked list to a vector) to optimize for search patterns other than "tail".
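A minimal sketch of that example, in Java (the names here are mine, not from the talk):

import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// A hypothetical event, recorded in message (arrival) order.
record StockChanged(Instant occurredAt, int delta) {}

class EventHistory {
    // Write model: optimized for appending to the tail as messages arrive.
    private final List<StockChanged> stream = new ArrayList<>();

    void append(StockChanged event) {
        stream.add(event); // message order, not time order
    }

    // Read model: the time series view has to sort by occurrence time first.
    List<StockChanged> timeSeries() {
        return stream.stream()
                .sorted(Comparator.comparing(StockChanged::occurredAt))
                .toList();
    }
}
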
Your highly optimized model for "things your user wants to do" is unlikely to be optimized for "things your user wants to look at".
This was taken from a section of James's presentation explaining why the DDD/CQRS/ES tuple appear together so frequently.  He came back to this idea later in the talk, when responding to some confusion about the read and write models.

You will be doing roll ups in the write model for different reasons than those which motivate the roll ups in the read model.
A lot of people don't seem to realize that, in certain styles, the write model has its own roll ups.  A lot of experts don't seem to realize that there is more than one style -- I tried to give a quick sketch of an alternative style at the pub afterwards, but I'm not sure how well I was able to communicate the ideas over the background noise.
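For example (my sketch, not James's): the write model might roll a stream of debits and credits up into a running balance - not because anybody wants to look at it, but because it needs that number to enforce an invariant.  The read model rolls the same events up into a monthly statement, because that's what the user wants to look at.

import java.util.List;

// Hypothetical events for a simple account domain.
record Credited(long amountInCents) {}
record Debited(long amountInCents) {}

class Account {
    // Write-model roll up: just enough state to protect the invariant.
    private long balanceInCents = 0;

    void replay(List<Object> history) {
        for (Object event : history) {
            if (event instanceof Credited c) balanceInCents += c.amountInCents();
            if (event instanceof Debited d) balanceInCents -= d.amountInCents();
        }
    }

    Debited debit(long amountInCents) {
        // The roll up exists to check this rule, not to be displayed.
        if (balanceInCents < amountInCents) {
            throw new IllegalStateException("insufficient funds");
        }
        return new Debited(amountInCents);
    }
}
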

The paper based contingency system that protects the business from the software screwing up is probably a good place to look for requirements.
DDD in a nutshell, right there.

That observation brings me back to a question I haven't found a good answer to just yet: why are we rolling our own business process systems, rather than looking to the existing tooling for process management (Camunda, Activiti, and the players in the iBPMS Magic Quadrant)?  Are we getting that much competitive advantage from rolling our own?

Event sourcing gives you a way to store the ubiquitous language - you get release from the impedance mismatch for free.  A domain expert can look at a sequence of events and understand what is going on.
A different spelling of the same idea - the domain expert can look at a given set of events, and tell you that the information displayed on the roll up screen is wrong.  You could have a field day digging into that observation: for example, what does it say about UUIDs appearing in the event data?
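To make that concrete (my example, not one from the talk): compare an event the expert can read aloud with one spelled entirely in surrogate keys.

import java.time.LocalDate;
import java.util.UUID;

// An event a domain expert can read and check against the roll up screen.
record PolicyRenewed(String policyNumber, LocalDate effectiveDate) {}

// The same fact, spelled in identifiers that only the software understands.
record Renewed(UUID policyId, UUID termId) {}
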

James raised the usual warning about not leaking the "internal" event representations into the public API.  I think as a community we've been explaining this poorly - "event" as a unit of information that we use to reconstitute state gets easily confused with "event" as a unit of information broadcast by the model to the world at large.
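For instance (my illustration, not James's): the stream might store a fine-grained event that is free to change shape as the model evolves, while the world at large gets a deliberately stable contract, translated at the boundary.

// Internal event: used to reconstitute state; free to evolve with the model.
record InventoryAdjusted(String sku, int delta, String reason) {}

// Published event: a stable contract broadcast to other systems.
record StockLevelChanged(String sku, int newLevel) {}
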

A common theme in the questions during the session was "validation"; the audience gets tangled up in questions about write model vs read model, latency, what the actual requirements of the business are, and so on.

My thinking is that we need a good vocabulary of examples of different strategies for dealing with input conflicts.  A distributed network of ATMs, both in terms of the pattern of a cash disbursement and of reconciling the disbursements from multiple machines when updating the accounts.  A seat map on an airline, where multiple travelers are competing for a single seat on the plane.
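A sketch of the contrast I have in mind (the names are hypothetical):

import java.util.HashMap;
import java.util.Map;

// The ATM pattern: accept the command locally, even when partitioned from
// the books, and reconcile the accounts from the recorded events afterwards.
record CashDisbursed(String atmId, long amountInCents) {}

// The seat map pattern: a seat can be assigned only once, so the write
// succeeds only if the seat is still free when the command arrives.
class SeatMap {
    private final Map<String, String> assignments = new HashMap<>();

    void assign(String seat, String traveler) {
        String existing = assignments.putIfAbsent(seat, traveler);
        if (existing != null) {
            throw new IllegalStateException(seat + " is already assigned to " + existing);
        }
    }
}
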

Chris fired up an instance of the open source Event Store, gave a quick tour of the portal, and then started a simple live coding exercise: a REPL for debits and credits, writing changes to the stream, and then reading them back.  In the finale, there were three processes sharing data - two copies of the REPL, and the event store itself.

The implementation of the logic was based on the Reactive-Domain toolkit, which reveals its lineage: it is an evolution of ideas acquired from working with Jonathan Oliver's Common-Domain and with Yves Reynhout, who maintains AggregateSource.
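In broad strokes, that lineage gives you aggregates that raise events and apply them to their own state; something like this sketch (from memory - not Reactive-Domain's actual API):

import java.util.ArrayList;
import java.util.List;

// Command handlers on the aggregate raise events; applying an event is the
// only way state changes; the uncommitted events get written to the stream.
abstract class AggregateRoot {
    private final List<Object> uncommitted = new ArrayList<>();

    protected void raise(Object event) {
        apply(event);
        uncommitted.add(event);
    }

    protected abstract void apply(Object event);

    List<Object> takeUncommittedEvents() {
        List<Object> events = List.copyOf(uncommitted);
        uncommitted.clear();
        return events;
    }
}
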

It's really no longer obvious to me what the advantage of that pattern is; it always looks to me as though the patterns and the type system are getting in the way.  I asked James about this later, and he remarked that no, he doesn't feel much friction there... but he writes in a slightly different style.  Alas, we didn't have time to explore further what that meant.

Sunday, June 10, 2018

Extensible message schema

I had an insight about messages earlier this week, one which perhaps ought to have been obvious.  But since I have been missing it, I figured that I should share.

When people talk about adding optional elements to a message, default values for those optional elements are not defined by the consumer -- they are defined by the message schema.

In other words, each consumer doesn't get to choose their own preferred default value.  The consumer inherits the default value defined by the schema they are choosing to implement.

For instance, if we are adding a new optional "die roll" element to our message, then consumers need to be able to make some assumption about the value of that field when it is missing.

But simply rolling a die for themselves is the "wrong" answer, in the sense that it isn't repeatable, and different consumers will end up interpreting the evidence in the message in different ways.  In other words, the meaning of the message isn't immutable under these rules.

Instead, we define the default value in the schema - documenting that the field is xkcd 221 compliant; just as every consumer that understands the schema agrees on the semantics of the new value, they also agree on the meaning of its absence.
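In code, the consumer's side of that contract might look like this (my sketch):

import java.util.OptionalInt;

class DieRollMessage {
    private final OptionalInt dieRoll;

    DieRollMessage(OptionalInt dieRoll) {
        this.dieRoll = dieRoll;
    }

    int dieRoll() {
        // The default comes from the schema, not from this consumer: the
        // schema documents the field as xkcd 221 compliant.
        return dieRoll.orElse(4); // chosen by fair dice roll; guaranteed to be random
    }
}

Every consumer that implements the schema computes the same value for the same message, whether or not the element is present.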


If two consumers "need" to have different default values, that's a big hint that you may have two subtly different message elements to tease apart.

These same messaging rules hold when your "message" is really a collection of parameters in a function call.  Adding a new argument is fine, but if you aren't changing all of the clients at the same time then you really should continue to support calls using the old parameter list.

In an ideal world, the default value of the new parameter won't surprise the old clients, by radically changing the outcome of the call.

To choose an example, suppose we've decided that some strategy used by an object should be configurable by the client.  So we are going to add to the interface a parameter that allows the client to specify the implementation of the strategy they want.

The default value, in this case, really should be the original behavior, or its semantic equivalent.
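In Java, for instance, the old parameter list can survive as an overload that fills in the original strategy (the names here are hypothetical):

import java.util.Comparator;
import java.util.List;

class Report {
    private final List<String> lines = List.of("b", "a", "c");

    // New entry point: the client chooses the ordering strategy.
    List<String> items(Comparator<String> ordering) {
        return lines.stream().sorted(ordering).toList();
    }

    // Old entry point, kept for clients we can't change at the same time.
    // The default is the original behavior, so old callers see no change.
    List<String> items() {
        return items(Comparator.naturalOrder());
    }
}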


Wednesday, June 6, 2018

Maven Dependency Management and TeamCity

A colleague's build got bricked when an engineer checked in a pom file where one of the dependencyManagement entries was missing its version element.

From what I see in the schema, the version element is minOccurs=0, so the pom was still valid.
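The offending entry presumably looked something like this (a reconstruction - the stack trace below points at slf4j-log4j12):

<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-log4j12</artifactId>
      <!-- no version element: valid per the schema, but managing nothing -->
    </dependency>
  </dependencies>
</dependencyManagement>
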

Running the build locally, the build succeeded.  A review of the dependency:tree output was consistent with an unmanaged dependency -- two of the submodules showed different resolved versions (via transitive dependencies).

Running the build locally, providing the version element, we could see the dependencies correctly managed to the same version in each of the submodules.

But in TeamCity?  Well, the version of the project with the corrected pom built just fine.  Gravity still works.  But the bricked pom produced this stack trace.


[18:28:22]W:  [Step 2/5] org.apache.maven.artifact.InvalidArtifactRTException: For artifact {org.slf4j:slf4j-log4j12:null:jar}: The version cannot be empty.
 at org.apache.maven.artifact.DefaultArtifact.validateIdentity(DefaultArtifact.java:148)
 at org.apache.maven.artifact.DefaultArtifact.<init>(DefaultArtifact.java:123)
 at org.apache.maven.bridge.MavenRepositorySystem.XcreateArtifact(MavenRepositorySystem.java:695)
 at org.apache.maven.bridge.MavenRepositorySystem.XcreateDependencyArtifact(MavenRepositorySystem.java:613)
 at org.apache.maven.bridge.MavenRepositorySystem.createDependencyArtifact(MavenRepositorySystem.java:120)
 at org.apache.maven.project.DefaultProjectBuilder.initProject(DefaultProjectBuilder.java:808)
 at org.apache.maven.project.DefaultProjectBuilder.build(DefaultProjectBuilder.java:617)
 at org.apache.maven.project.DefaultProjectBuilder.build(DefaultProjectBuilder.java:405)
 at org.apache.maven.DefaultMaven.collectProjects(DefaultMaven.java:663)
 at org.apache.maven.DefaultMaven.getProjectsForMavenReactor(DefaultMaven.java:654)
 at sun.reflect.GeneratedMethodAccessor1975.invoke(Unknown Source)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at org.jetbrains.maven.embedder.MavenEmbedderImpl$4.run(MavenEmbedderImpl.java:447)
 at org.jetbrains.maven.embedder.MavenEmbedderImpl.executeWithMavenSession(MavenEmbedderImpl.java:249)
 at org.jetbrains.maven.embedder.MavenEmbedderImpl.readProject(MavenEmbedderImpl.java:430)
 at org.jetbrains.maven.embedder.MavenEmbedderImpl.readProjectWithModules(MavenEmbedderImpl.java:336)
 at jetbrains.maven.MavenBuildService.readMavenProject(MavenBuildService.java:732)
 at jetbrains.maven.MavenBuildService.sessionStarted(MavenBuildService.java:206)
 at jetbrains.buildServer.agent.runner2.GenericCommandLineBuildProcess.start(GenericCommandLineBuildProcess.java:55)


It looks to me like some sort of pre-flight check by the MavenBuildService before it surrenders control to maven.

If I'm reading the history correctly, the key is https://issues.apache.org/jira/browse/MNG-5727

That change went into 3.2.5; TeamCity's mavenPlugin (we're still running 9.0.5, Build 32523) appears to be using 3.2.3.

Part of what was really weird? Running the build on the command line worked. In my development environment, I had been running 3.3.9, so I had this "fix", and everything was groovy. When I sshed into the build machine, I was running... 3.0.4. Maybe that's too early for the bug to appear? Who knows - I hit the end of my time box.

If you needed to read this... good luck.