Friday, May 11, 2018

Tests, Adapters, and the lifecycle of an API Contract

The problem that I faced today was preparing for a change to an API; the goal is to introduce new interfaces, with new spellings, that produce the same observable behaviors as the existing code.

Superficially, it looks a bit like paint by numbers.  I'm preparing for a world where I have two different implementations to exercise with the same behavior, ensuring that the same side effects are measured at the conclusion of the automated check.

But the pattern of the setup phase is just a little bit different.  Rather than wiring the automated check directly to the production code, we're going to wire the check to an adapter.


The basic principles at work are those of a dependency injection friendly framework, as described by Mark Seemann.
  • The client owns the interface
  • The framework is the client
In this case, the role of the framework is played by the scenario, which sets up all of the mocks, and verifies the results at the conclusion of the test.  The interface is a factory, which takes a description of the test environment and returns an instance of the system under test.
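
A minimal sketch of that shape, in Java; all of these names are illustrative, not taken from the real code:

// The scenario's description of the world the check runs in --
// which mocks to wire up, which configuration hints to apply.
class TestEnvironment {
    // mocks, hints, fixtures ...
}

// The surface that the scenario exercises and measures.
interface SystemUnderTest {
    void handle(String message);
}

// The interface owned by the client (here, the scenario): given a
// description of the test environment, produce an instance of the
// system under test.
interface SystemUnderTestFactory {
    SystemUnderTest create(TestEnvironment environment);
}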

That returned instance is then evaluated for correctness, as described by the specification.

Of course, if the client owns the interface, then the production code doesn't implement it -- the dependency arrow points the wrong direction.

We beat this with an adapter; the automated check serves as a bridge between the scenario specification and a version of the production code.  In other words, the check stands as a demonstration that the production code can be shaped into something that satisfies the specification.
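
Sketched with the same illustrative names as above, the adapter is the only piece that knows about the production spelling:

// A stand-in for the real production code, with its own spelling.
class LegacyService {
    void process(String message) {
        // ... production behavior, with observable side effects
    }
}

// The adapter: a demonstration that the production code can be
// shaped into something that satisfies the scenario's interface.
class LegacyServiceFactory implements SystemUnderTestFactory {
    @Override
    public SystemUnderTest create(TestEnvironment environment) {
        LegacyService service = new LegacyService();
        // Translate the scenario's spelling into the production spelling.
        return message -> service.process(message);
    }
}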

This pattern gives us a reasonably straightforward way to push two different implementations through the same scenario, allowing us to ensure that the implementation of the new API provides equivalent capabilities to its predecessor.
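
The payoff looks something like this sketch -- the scenario is written once, and each implementation rides through it behind its own factory (ReplacementServiceFactory here is hypothetical):

class Scenario {
    void verify(SystemUnderTestFactory factory) {
        TestEnvironment environment = new TestEnvironment();
        SystemUnderTest subject = factory.create(environment);

        subject.handle("example message");

        // ... measure the side effects; both implementations must
        // produce the same observable results to pass.
    }
}

// One check per implementation, same scenario:
//   scenario.verify(new LegacyServiceFactory());
//   scenario.verify(new ReplacementServiceFactory());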

But I didn't discover this pattern trying to solve that problem...

The problem that I faced was that I had two similar scenarios, where the observable outcome was different -- the observable behavior of the system was a consequence of some configuration settings. Most of my clients were blindly accepting the default hints, and producing the normal result. But in a few edge cases, a deviation from the default hints produced a different result.

The existing test suite was generally soft on this scenario. My desired outcome was twofold -- I wanted tests in place to capture the behavior specification now, and I wanted artifacts that would demonstrate that the edge case behavior needed to be covered in new designs.

We wouldn't normally group these two cases together like this. We're more likely to have a suite of tests ensuring that the default configuration satisfies its cases, and that the edge case configuration satisfies a different suite of results.

We can probably get closer to the desired outcome by separating the scenario and its understanding of ExpectedResult from the specifications.
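
In a sketch (again, the names are mine): the scenario takes the ExpectedResult as an argument, and the default specification supplies both its factory and the normal result.

// The outcome that a specification demands; the scenario only knows
// how to compare it against what actually happened.
class ExpectedResult {
    final String outcome;
    ExpectedResult(String outcome) { this.outcome = outcome; }
}

class ConfigurableScenario {
    void verify(SystemUnderTestFactory factory, ExpectedResult expected) {
        TestEnvironment environment = new TestEnvironment();
        SystemUnderTest subject = factory.create(environment);
        subject.handle("example message");
        // ... assert that the measured side effects match `expected`
    }
}

// The default specification: default hints, normal result.
//   scenario.verify(defaultFactory, new ExpectedResult("normal result"));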

And likewise for the edge case.
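
As a sketch reusing the types above, the edge case specification differs only in the hints baked into its factory and the result it expects:

class EdgeCaseSpecification {
    void check() {
        SystemUnderTestFactory edgeCaseFactory = environment -> {
            LegacyService service = new LegacyService();
            // ... apply the deviation from the default hints here ...
            return message -> service.process(message);
        };

        new ConfigurableScenario().verify(
            edgeCaseFactory,
            new ExpectedResult("edge case result"));
    }
}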

In short: parallel suites that run different expected results through shared scenarios, with each factory bound to a specific implementation.

The promise (actually, more of a hope) is that as we start moving the API contracts through their lifecycles -- from stable to legacy/deprecated to retired -- we will catch, along the way, the edge cases that will need resolution in the new contracts.  Choose to support them, or not, but that choice should be deliberate, and not a surprise to the participants.

Tuesday, May 1, 2018

Ruminations on State

Over the weekend, I took another swing at trying to understand how boundaries should work in our domain models.

Let's start with some assumptions.

First, we capture information in our system because we think it is going to have value to us in the future.  We think there is profit available from the information, and therefore we capture it.  Write-only databases aren't very interesting; we expect that we will want to read the data later.

Second, for any project successful enough to justify additional investment, we are going to know more later than we do today.

Software architecture is those decisions which are both important and hard to change.
We would like to defer hard-to-change decisions as late as possible in the game.

One example of such a hard decision would be carving up information into different storage locations.  So long as our state is ultimately guarded by a single lock, we can experiment freely with different logical arrangements of that data and the boundaries within the model.  But separating two bounded sets of information into separate storage areas with different locks, and then discovering that the logical boundaries are faulty, makes a big mess.

Vertical scaling allows us to concentrate on the complexity of the model first, with the aim of carving out the isolated, autonomous bits only after we've accumulated several nines of evidence that it is going to work out as a long term solution.

Put another way, we shouldn't be voluntarily entering a condition where change is expensive until we are driven there by the catastrophic success of our earlier efforts.

With that in mind, let's think about services.  State without change is fundamentally just a cache.   "Change" without state is fundamentally just a function.  The interesting work begins when we start combining new information with information that we have previously captured.

I find that Rich Hickey's language helps me to keep the various pieces separate.  In easy cases, state can be thought of as a mutable reference to a value S.  The evolution of state over time looks like a sequence of updates to the mutable reference, where some pure function calculates new values from their predecessors and the new information we have obtained, like so
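
A sketch of that shape in Java (the names are mine, not Hickey's):

import java.util.function.BiFunction;

// A mutable reference to an immutable value S -- the "identity", in
// Rich Hickey's language.  All change happens by swapping in a new value.
class StateRef<S> {
    private S current;

    StateRef(S initial) { this.current = initial; }

    // op is a pure function: the next value is calculated from the
    // previous value and the newly arrived information.
    <M> void onMessage(M message, BiFunction<S, M, S> op) {
        current = op.apply(current, message);
    }

    S read() { return current; }
}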


Now, this is logically correct, but it is very complicated to work with. op(), as shown here, is made needlessly complicated by the fact that it is managing all of the state S. Complete generality is more power than we usually need. It's more likely that we can achieve correct results by limiting the size of the working set. Generalizing that idea could look something like
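
For example (decompose, compose, and the type names here are all illustrative):

import java.util.function.BiFunction;
import java.util.function.Function;

// op limits its working set: decompose extracts the part of the state
// that matters for this operation, transform does the pure calculation
// on that part alone, and compose folds the new part back into the rest.
class LimitedOp<S, P, M> {
    final Function<S, P> decompose;
    final BiFunction<S, P, S> compose;
    final BiFunction<P, M, P> transform;

    LimitedOp(Function<S, P> decompose,
              BiFunction<S, P, S> compose,
              BiFunction<P, M, P> transform) {
        this.decompose = decompose;
        this.compose = compose;
        this.transform = transform;
    }

    S apply(S state, M message) {
        P part = decompose.apply(state);
        P changed = transform.apply(part, message);
        return compose.apply(state, changed);
    }
}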


The function decompose and its inverse compose allow us to focus our attention exclusively on those parts of the current state that are significant for this operation.

However, I find it more enlightening to consider that it will be convenient for maintenance if we can re-use elements of a decomposition for different kinds of messages. In other words, we might instead have a definition like
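
For instance, something in this direction, with a shopping cart standing in as the reusable decomposition (everything here is illustrative):

class State { /* the whole captured state */ }
class ItemAdded { /* new information */ }
class ItemRemoved { /* new information */ }

// An immutable value; calculations produce new values.
class Cart {
    Cart add(ItemAdded message) { return new Cart(); }
    Cart remove(ItemRemoved message) { return new Cart(); }
}

class Model {
    // The same decomposition is shared by both message handlers.
    Cart cartOf(State state) { /* extract the cart */ return new Cart(); }
    State withCart(State state, Cart cart) { /* fold it back in */ return state; }

    State onItemAdded(State state, ItemAdded message) {
        Cart changed = cartOf(state).add(message);
        return withCart(state, changed);
    }

    State onItemRemoved(State state, ItemRemoved message) {
        Cart changed = cartOf(state).remove(message);
        return withCart(state, changed);
    }
}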


In the language of Domain Driven Design, we've identified re-usable "aggregates" within the current state that will be needed to correctly calculate the next change, while masking away the irrelevant details. New values are calculated for these aggregates, and from them a new value for the state is calculated.

In an object oriented domain model, we normally see at least one more level of indirection - wrapping the state into objects that manage the isolated elements while the calculation is in progress.
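
Continuing the sketch above, that indirection might look like

// The aggregate is wrapped in a mutable object for the duration of
// the calculation; the state value stays immutable outside.
class MutableCart {
    private Cart value;

    MutableCart(Cart initial) { this.value = initial; }

    void add(ItemAdded message) { value = value.add(message); }

    Cart snapshot() { return value; }
}

class ObjectOrientedModel {
    private final Model model = new Model();

    State onItemAdded(State state, ItemAdded message) {
        MutableCart cart = new MutableCart(model.cartOf(state));
        cart.add(message);                 // the object manages the change
        return model.withCart(state, cart.snapshot());
    }
}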


In this spelling, the objects are mutable (we lose referential transparency), but so long as their visibility is limited to the current function, the risks are manageable.