Saturday, July 29, 2017

Testing in Threes

One of the reasons that I find immutable tests intriguing as an idea: if you don't change them, you can't break them.

What does it mean for a test to break?

There are two failure modes; a test can fail even though the implementation satisfies the specification that the test is supposed to evaluate, or the test can pass even though the implementation does not satisfy the specification.

If we want to refactor tests safely, then we really need to have checks in place to protect against these failure modes.

My first thought was that we need to keep the old implementation around.  For instance, if we are trying to fix a bug in a released library, then we write a new test, verify that the new test fails when bound to the broken implementation, then bind the test to the current implementation, do the simplest thing that could work, and so on.

Kind of heavy, but we could make that work.  I don't think it holds up very well for the ephemeral versions of the code that exist between releases.

What we really want are additional checks that are part of the specification of the test.  Turtles all the way down!  Except that we don't need to recurse very far, because the additional tests never need to be complicated.  Throughout their lifetime, they are "so simple that there are obviously no deficiencies."

Today's insight is that we are already creating those checks.  Red Green Refactor is a recipe for implicitly creating, in order:
  1. The simplest thing that could possibly break
  2. The simplest thing that could possibly work
  3. The production thing.
So at the end, we have all three of these; but because they were the same mutable entity, the intermediate stages are no longer ready at hand.
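The three stages can be made concrete with a toy example (a sketch in Python; the `add` function is invented for illustration):

```python
# 1. The simplest thing that could possibly break: a constraint with
#    no production code behind it yet (it fails until `add` exists).
def test_add():
    assert add(2, 3) == 5

# 2. The simplest thing that could possibly work: hard-code the answer.
def add_v1(a, b):
    return 5

# 3. The production thing: what refactoring eventually arrives at,
#    overwriting stage 2 in place -- which is why the intermediate
#    artifact is gone by the time we are done.
def add(a, b):
    return a + b
```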

I haven't finished untangling this snarl yet.  My best guess is that it is leading toward the idea that test driving the implementation is a spike for the specification, and that we later come back and make that robust.

Friday, July 14, 2017

On HTTP Status Codes

Originally written in response to a question on Stack Overflow; the community seemed to think the question wasn't appropriate for the site.

Overview of status codes

I'm designing a RESTful API and I have an endpoint responsible for product purchase. What HTTP status code should I return when user's balance is not enough to purchase the specified product (i.e. insufficient funds)?

The most important consideration is that you recognize who the audience of the status-code is: the participants in the document transport application.  In traditional web apis (which is to say, web sites), the audience would be the browser, and any intermediaries along the way.

For example, RFC 7231 uses status codes as a way to resolve implicit caching semantics
Responses with status codes that are defined as cacheable by default (e.g., 200, 203, 204, 206, 300, 301, 404, 405, 410, 414, and 501 in this specification) can be reused by a cache with heuristic expiration unless otherwise indicated by the method definition or explicit cache controls [RFC7234]; all other status codes are not cacheable by default.

If you think of the API consumer (aka the human being) and the API client (aka the web browser) as separate: then the semantics of the status codes are directed toward the API _client_.  This is what tells the client that it can just follow a redirect (the various 3xx status codes), that it can simply reset the previous view (205), that it should throw up a dialog asking that the consumer identify herself (401), and so on.

The information for the consumer is embedded in the message-body.

402 Payment Required

402 Payment Required, alas, is reserved.  Which is a way of saying that it doesn't have a standard meaning.  So you can't deliver a 402 in the expectation that the API client will be able to do something clever -- it's probably just going to fall back to the 4xx behavior, as described by RFC 7231
a client MUST understand the class of any status code, as indicated by the first digit, and treat an unrecognized status code as being equivalent to the x00 status code of that class, with the exception that a recipient MUST NOT cache a response with an unrecognized status code.

I wouldn't bet hard on 402; it was also reserved in RFC 2616, and there's a big gap in RFC 1945 where it should have been.

My guess would be that a 402 specification would be analogous to the requirements for 401, with additional standard headers being required to inform the client of payment options.

But we don't know what headers those would be; Taler's approach, for instance, was to stick in a custom header.  If you control the client, wiring in your own understanding of what 402 might someday be could be a reasonable option.
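The fallback rule from RFC 7231 can be sketched mechanically (Python; the set of codes this particular client "understands" is invented for illustration):

```python
# Status codes this hypothetical client recognizes.
KNOWN_CODES = {200, 301, 400, 401, 403, 404, 500}

def effective_status(code):
    """Treat an unrecognized status code as the x00 code of its class,
    per RFC 7231: the client MUST understand the class (first digit)."""
    if code in KNOWN_CODES:
        return code
    return (code // 100) * 100

# A 402 the client doesn't understand degrades to plain 400 handling.
```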

Protocol alternatives

Another option of good pedigree is to consider that collecting a payment is just another step in the integration protocol.

So, from that perspective, it's perfectly reasonable to say that the request was processed successfully, but the returned representation, rather than providing a link to the cake, provides a link to the billing system.

This is the approach described by Jim Webber when he talks about RESTBucks.  Needing to make a payment is a perfectly normal thing to do in a purchasing protocol, so there's no need to throw an exception when money is due.  Thus, 2xx Success is still a reasonable choice:
The 2xx (Successful) class of status code indicates that the client's request was successfully received, understood, and accepted.
So the _client_ knows that everything went well; and the consumer needs to review the semantics of the message in the message-body to proceed toward her goal.  This is how hypermedia is intended to work -- the current application state is described by the message.
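A sketch of what such a response might look like, as plain data (Python; the link relation names and fields are illustrative, not drawn from any standard):

```python
# The status code tells the client that all is well; the body tells
# the consumer that the next step in the protocol is payment.
order_response = {
    "status": 200,                    # the request itself succeeded
    "body": {
        "order": "/orders/12345",
        "message": "Payment required before the order is fulfilled",
        "links": [
            {"rel": "payment", "href": "/payments/order/12345"},
        ],
    },
}

def next_step(response):
    """Advance the protocol by following the link the representation offers."""
    links = {link["rel"]: link["href"] for link in response["body"]["links"]}
    return links.get("payment")
```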

Protocol violations

Now, if instead of proceeding to the payment system as directed, the consumer tries to skip past the purchasing system onto the good bits, that's not so much part of the protocol, so you needn't feel compelled to continue to provide a good experience.  400 Bad Request or 403 Forbidden are your go-to choices here.

412 Precondition Failed is just wrong; it means that the preconditions provided in the request headers were not met when the server processed the request.  Unless you've got the client providing some extra headers, it's not a fit.
409 Conflict... I believe that one is wrong, but it's less clear.  From what I can see in the literature, 409 is primarily a remote editing response -- we tried to apply some change to a resource, but our edit lost some sort of conflict battle with other changes in the system.  For instance, Amazon associates that status-code with BucketAlreadyExists; the problem with the request to create a bucket with that name is that the name has already been taken (and it is a client error, because the client didn't check first).

Sunday, July 9, 2017

Observations on Repositories

During a long brainstorming session, I finally had an important breakthrough on the role of repositories in Domain Driven Design.

In short, the repository is a seam between the application component (acting as the client) and the domain component (acting as the provider).  Persistence concerns and the business logic live within the implementation of the repository.

In the literature, I usually find examples where there is just a single implementation of "the aggregate" that is visible everywhere.  But if we think in terms of evolving the model -- in particular, of being able to replace the model with an improved version easily -- then we need to be thinking in terms of interfaces and service providers.

When Evans was first writing of aggregates, the lines between read and write were somewhat blurred; it wasn't unreasonable to expect that your repository could read state out of the aggregate interface.  With the introduction of CQRS, things get more complicated.  If the use case only calls for the application to modify some aggregate, then the interface that represents that aggregate should only have commands in it that are specific to that case.  In other words, the interface provided to the application doesn't need to have any affordances for reading the current state -- it can be tightly tailored to the specific use case.

Sidebar: this is what ensures that we end up with a "rich" domain model; because the application can't get at the state, it has no recourse except to invoke the provided command method and allow the aggregate to implement the change as it chooses.  The query/calculate/update protocol doesn't work if no queries are accessible.

For the repository to save the state of the object, it needs access to the state captured within it.  Which means that the repository needs more intimate familiarity with the aggregate than what was shared with the application.  We can achieve that in a strong typing system, using generics.
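A sketch of that idea, using Python type hints to stand in for a stronger type system (the concrete classes here are invented for illustration; only `TradeBook` comes from the example below):

```python
from abc import ABC, abstractmethod
from typing import Generic, TypeVar

class TradeBook(ABC):
    """The command-only interface handed to the application."""
    @abstractmethod
    def place_order(self, order) -> None: ...

T = TypeVar("T", bound=TradeBook)

class TradeBookRepository(Generic[T], ABC):
    """The repository is parameterized on the concrete type, so it can
    reach the full state even though the application cannot."""
    @abstractmethod
    def get_trade_book(self, book_id) -> T: ...
    @abstractmethod
    def save(self, book: T) -> None: ...

class InMemoryTradeBook(TradeBook):
    def __init__(self):
        self.orders = []              # state the repository can reach
    def place_order(self, order) -> None:
        self.orders.append(order)

class InMemoryRepository(TradeBookRepository[InMemoryTradeBook]):
    def __init__(self):
        self._books = {}
    def get_trade_book(self, book_id) -> InMemoryTradeBook:
        return self._books.setdefault(book_id, InMemoryTradeBook())
    def save(self, book: InMemoryTradeBook) -> None:
        pass                          # already in memory; a real model would persist
```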

The application no longer knows the exact type of the TradeBook it has retrieved; however, the compiler can verify that the argument passed to the repository matches the implementation that was retrieved from the repository.

All of the domain logic -- what changes when we place an order, how is that change represented in memory, how is that change durably stored -- all of those decisions live within the model, somewhere behind the repository interface.  The repository understands how this model represents all of its data because the repository is of the model.  When we swap out the domain model, the repository is exchanged as well.

Most importantly, the question of whether current state is represented as a collection of events or as an aggregate document is answered within the domain model, behind the repository facade.

Expressed another way, the composition root will wire up a persistent store, and then inject that store into a domain model that understands the store, and then will wire up the data model and its repositories with the application (as opposed to wiring the application to the persistence store and the domain model independently).
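A minimal sketch of that wiring (Python; every class name here is illustrative):

```python
class DocumentStore:
    """Persistent store; the concrete choice is made at the root."""

class TradeBookRepository:
    def __init__(self, store):
        self.store = store

class DomainModel:
    """The model understands how its state maps onto the store."""
    def __init__(self, store):
        self._store = store
    def trade_book_repository(self):
        return TradeBookRepository(self._store)

class Application:
    def __init__(self, repository):
        self.repository = repository

def composition_root():
    store = DocumentStore()            # wire up a persistent store
    model = DomainModel(store)         # inject it into a model that understands it
    return Application(model.trade_book_repository())  # app sees only the model's repository
```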

Friday, July 7, 2017

Demonstration: REST is spelling agnostic

An illustration of the power of REST.

I can search google for google
To the surprise of absolutely no one, the top hit is google
If I click on, or copy, the link, it takes me to something like
Which in turn _redirects_ me to
And my HTTP client, which knows absolutely nothing about Google, manages just fine.  Google can change their URI space any way they like, and the client just follows its nose.

My carbonware HTTP agent doesn't notice, because it's looking at the links and the semantic cues, not at the spelling of the underlying identifiers.

As far as the client and the agent are concerned, all of those URIs are opaque.  The only thing we can do with them is use them for cache lookups.  The meaning of anything that happens to be encoded in those sequences of bytes is private to the server.
This one isn't quite opaque; this URI was constructed by the HTTP client from the data in the submitted form; the pair q=google is a representation of the data entered into the form by the agent, the rest were provided by the server in its representation of the form.

The client and agent have a common understanding of form as a thing of images and UI affordances; the client and the server share a different understanding of form, derived from their common understanding of the HTML media type.

The agent and the server have a common understanding of semantics -- I understand the form from the labels; the client knows how to render the labels, and what to render in them (from parsing the HTML) but the client has no understanding that those labels _mean_ anything.

And it all "just works".

TDD and Immutable Tests

I was working through the bowling game kata again recently, and discovered that there are three distinct moves that we make when playing.  In any given move, we should see either a compile error or a test failure, but never both.

Legal TDD Moves


The most commonly discussed move is refactoring; we change the implementation of the production code, rerun the tests to ensure that the change hasn't broken any tests, and then commit/revert the change that we have made based on the outcome of the test.

The important property here is that refactoring has a precondition that all of the tests are passing, and a postcondition that all of the tests are passing.  In other words, the bar is always GREEN when you enter or leave the protocol.

It's during the refactoring move that the production code tends to evolve toward the more generic.


New behaviors are introduced by specifying.  What we are doing in this phase is documenting additional constraints on the outcome (by creating a new test), and then hacking the production code to pass this new test in addition to all of the others.

Make the test work quickly, committing whatever sins are necessary in the process.
In this protocol, we start from a GREEN state, and proceed from there to RED and then to GREEN.  This has the same cadence as the usual "[RED, GREEN], REFACTOR" mantra, but there's an additional restriction -- the new test we introduce is restricted to using the existing public API of the system under test.


New interfaces are introduced by extending.  During this phase, no new constraints (no asserts) on behavior are introduced, only new affordances for working with the production code.  The key idea in extending is this: you get from the RED bar to the GREEN bar by auto generating code (by hand, if necessary).

Because of these restricted rules, we see only compile errors in this move, no runtime errors; there can't be any runtime errors, because (a) the new interface is not yet constrained by specifications and (b) the code required to reach the new interface already satisfies its own specification.

Extension is almost purely a discovery exercise -- just sit down and write code to the API you wish that you had, then pass the test with automatically generated code.
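A sketch of the extension move (Python; `generate_report` is an invented example):

```python
def test_report_extension():
    # The API we wish we had; no asserts, so no behavioral constraint yet.
    generate_report(["alpha", "beta"])

# "Auto generated" code (by hand, if necessary): a stub that satisfies
# the extension without promising any particular behavior.
def generate_report(items):
    return None
```

Until a specifying test constrains the output, the stub is exactly as correct as anything else.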

Immutable Tests

My discovery of extending came about because I was trying to find a protocol that would support the notion of an immutable test.  I've never been particularly comfortable with "refactoring tests", and Greg Young eventually convinced me to just go with it.

The high level argument goes something like this: the tests we write are messages to future developers about the intended behavior of the production code.  As such, the messages should have semantics that can be used in a consistent way over time.  So the versioning guidelines apply.

So "refactoring" a test really looks like adding a new test that specifies the same behavior as the old test, checking that both tests are passing, committing, and then in a separate step removing the unwanted, redundant check.

Note that removing the accumulated cruft is optional; duplicate legacy tests are only a problem in two circumstances, analogous to the two phases that created them above.  If you've learned that a specification is wrong, then you simply elide the tests that enforce that specification, and replace them with new tests that specify the correct behaviors. Alternatively, you've decided that the API itself is wrong -- which is a Great Big Deal in terms of backwards compatibility.

Experiences and Observations

Bootstrapping a new module begins with Extension.  Uncle Bob's preference seems to be discovering the API in small steps, implementing enough of the API to pass each time a compile error appears.  And that's fine, but in an immutable test world it produces a lot of cruft.

There are two reasonable answers; one is to follow the same habits, creating a _new_ test for each change to the API, and then remove the duplication in the test code later.  It feels kind of dumb, to be honest, but not much more dumb than stopping after each compiler error.  Personally, I'm OK with the idea of writing a single long test that demonstrates the API, then adding afterwards a bunch of checks to specify the behavior.

I'm less happy about trying to guess the API in advance, however.  Especially if we haven't done a spike first -- we're just guessing at the names of things, and the wrong guess means a bunch of test redaction later.

At this point, I'm a big fan of hiding design decisions, so I would rather be conservative about the API.  This means I tend to think about proceeding from supporting specific use cases to supporting a general API.

More precisely, unit tests are dominated by a single shape:
Given this function
When I provide this input
Then I expect the output to satisfy these constraints.
So to my mind, the seed that is in the middle of the first extension is a function.  Because we are operating at the boundary, the input and the constraints will typically be represented using primitives.  I don't want those primitives leaking into my API until I've made a specific choice to lift them there.  So rather than extending a specific type into the test, I'll drop in a function that describes the capability we are promising.  For the bowling game, that first test would look something like

Simplest thing that could possibly work.  If this test passes, I've got code I can ship, that does exactly what it says on the tin.  Then I add a second test, that checks the result.  Then I can engage the design dynamo, and perhaps discover other abstractions that I want to lift into the API.

The same trick works when I have a requirement for a new behavior, but no clear path to get there with the existing API; just create a new function, hard code in the answer, and then start experimenting with ideas within the module boundary until the required interface becomes clear.  Behind the module, you can duplicate existing implementations and hack in the necessary changes under the green bar, and then run the dynamo from there.

And if the deadline comes crashing down on you before the refactoring is done?  SHIP IT.  All of the hacks are behind the module, behind the API, so you can clean up the code later without breaking the API.

Naming gets harder with immutable tests, because the extension needs a name separate from the specification, and you need two test names to refactor, and so on.

In the long run, extensions are just demonstrations -- here's an example of how the API can be invoked.  It's something that you can cut and paste into production code.  They are perhaps more of a thought construct than an actual artifact that should appear in the code base.

When using functions, the extension check becomes redundant as soon as you introduce the first constraint on its output.

There are some interesting similarities with TDD as if you meant it.  In my mind, green bar means you can _ship_, so putting implementation code in the test method does not appeal to me at all.  But if you apply those refactoring disciplines behind the module, being deliberately stingy about what you lift into the API, I think you've got something very interesting.

It shouldn't ever be necessary to lift an implementation into the API, other than the module itself.  Interfaces only.

Eventually, the interesting parts of the module get surfaced as abstractions in their own right, so that you can apply the same checks to a family of implementations.