Cascade Faliure: Thoughts on an Acceptance Tests

In my recent experiments with Hunt the Wumpus, I started thinking about what an "acceptance" test might look like.

To get started, I reviewed the walking skeleton example in Growing Object Oriented Software. Freeman and Pryce wrote that the initial iteration should include delivery of a completely automated checkout/build/deploy/test pipeline, front loading the work of solving a number of critical system and political issues. The acceptance test, in their example, launches the application and uses the user interface to probe and measure the app.

For an interactive shell app like Wumpus, the test harness is relatively straight forward; we control stdin to pass data to the app and control stdout to read data from the app.

What I struggle with, at this point in the narrative, is the amount of work required to create stable acceptance tests.

A point of view: automated checks are mistake detectors. They don't provide value to the user - you can delete all of your automated checks and the behavior of your production code doesn't change. Economically, the justification for the tests is that they reduce the costs of future work. More precisely, we adopt processes that shutdown when a mistake is detected, ensuring that the mistakes cannot be overlooked, and that we don't expose our test subjects to expensive evaluation when the more cost effective checks have already detected problems.

There's another potential benefit to checks, which the TDD ritual seeks to exploit: thinking about how the checks will validate the behavior of your application creates space to discover important ideas in your app before you start coding it.

The acceptance tests, what with all the work we need to do to set them up, are expensive relative to other mechanisms for checking the correctness of the program. In the case of the Auction Sniper, those tests included measuring that the app could talk to other processes.

In the case of Wumpus, there really aren't other processes to talk to unless we choose a particularly contrived design. Only the interface to the user is interesting. So there isn't a lot of complexity that
needs to be evaluated from the outside.

Which is good, because that evaluation is painful.

Wumpus has three awkward aspects to it; hidden information, non-deterministic behavior, and message schema.

The hidden information aspect is what introduces uncertainty in the game - with complete knowledge of the hazards in the maze, the game can be won trivially by shooting the wumpus in its lair. But without that hidden information, one cannot know the correct outcome of any action by the hunter.

The location of the hazards in the game is non-deterministic - that's part of the mechanism for hiding that information from the player. In addition, each of the hunters actions can induce random behavior by the hazards in the game. These random effects mean that any given action by the player can have multiple candidate responses, depending on how the dice fall.

The feedback from the game to the player is all via messages written to the console. Those messages were designed (such as it is) for human readability, rather than machine readability. Understanding the semantics of those messages requires introducing a parser into the acceptance test.

What this means is that we have some work cut out for us if we want anything more than a trivial verification that some message was written to standard out.

One possibility is that we can introduce the idea of specifying a seed for the non-deterministic behavior from outside the program. The acceptance test can fix the seed, then perform a domain agnostic comparison of the output to some golden master that we specify. This is somewhat brittle: the current mapping of random values to representations is arbitrary, and the domain agnostic match over fits the representation of the messages.

Another possibility is to introduce an affordance that allows the specification of a message schema to use; the acceptance test simply switches the application into a mode where the responses are easy to parse, much like an http request might distinguish between text/plain and application/json. Even without fixing the seed, our acceptance test can still easily identify that all of the messages are well formed.

The schema approach, while straight forward, feels like a lot of work that will not pay off. I think the issue here is that, while wumpus is a more interesting toy exercise than the bowling game or a Fibonacci calculator, it is still fundamentally a toy problem -- one with an arbitrary and limited scope.

My null-design port of Wumpus from basic to Java is only 375 lines long; it's hard to envision that project having a lifetime that justifies heavy upfront investment in acceptance tests.

What we can do, from the outset, is decide that the behaviors that the acceptance test needs to control - the random seed, the interpretation of the random values, the message schema - can be controlled from the outside, and that the idiom for changing those behaviors in the future is to extend the application with new selectable behaviors, rather than replacing the existing behaviors.

Cascade Faliure

Monday, September 2, 2019

Thoughts on an Acceptance Tests

No comments:

Post a Comment