Monday, August 27, 2018

TDD: Writing Bad Code

David Tanzer writes of his students asking why they are supposed to write bad code.

He attributes the phenomenon to the discipline of taking small steps.  Perhaps that is true, but I think there is a more precise explanation for the symptoms that he describes.

After creating a failing test, our next action is to modify the test subject such that the new test passes.  This is, of course, our red/green transition.  Because green tests give us a measure of security against various common classes of mistakes, we want to transition out of the deliberately red state as quickly as possible. 

"Quickly", here, is measured in wall clock time, but I think one could reasonably argue that what we really mean is the smallest number of code edits.

But I think we sometimes lose track of the motivation for the action -- although the edits that take us from red to green are made in the implementation of the test subject, the motivation for the action is still the test.  What we are doing is test calibration: demonstrating to ourselves that (in this moment) the test actually measures the production code.

At the point when our test goes green, we don't have a "correct" implementation of our solution.  What we have is a test that constrains the behavior of the solution.
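
As a sketch (the calculator and its single test below are invented for illustration; they aren't from Tanzer's post), the fastest transition to green often looks something like this:

```java
import static org.junit.jupiter.api.Assertions.assertEquals;
import org.junit.jupiter.api.Test;

// Written first; this test is red until PriceCalculator.total exists and
// returns the expected value.
class PriceCalculatorTest {

    @Test
    void threeItemsAtTenEachCostThirty() {
        PriceCalculator calculator = new PriceCalculator();
        assertEquals(30, calculator.total(3, 10));
    }
}

// The smallest edit that turns the test green -- deliberately "bad" code.
// Its only job right now is to prove that the test really measures this method.
class PriceCalculator {
    int total(int quantity, int unitPrice) {
        return 30;  // hard coded: calibration, not correctness
    }
}
```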

Now we have the option of changing the production implementation to something more satisfactory, confident that if a mistake changes the behavior of our test subject, the test will notify us when it is next run.  If the tests are fast, then we can afford to run them as frequently as after each edit, reducing the interval between introducing a mistake and discovering it.

To some degree, I think the real code comes after test calibration.  After the test is calibrated, we can start worrying about our code quality heuristics - transforming the implementation of the test subject from one that meets the minimum standard to one that would survive a code review.
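
Continuing the same invented example, the post-calibration pass is where the implementation becomes something we would be willing to show in a review; the already-calibrated test stays put and reports any change in observable behavior:

```java
// After calibration, the hard-coded answer is replaced by the behavior the
// test actually constrains; the test above still passes unchanged.
class PriceCalculator {
    int total(int quantity, int unitPrice) {
        return quantity * unitPrice;
    }
}
```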

Sunday, August 19, 2018

REST: Fielding 6.5.2

Reviewing Fielding's thesis yet again, I noticed a sentence that I think has not gotten as much exposure as it should:

What makes HTTP significantly different from RPC is that the requests are directed to resources using a generic interface with standard semantics that can be interpreted by intermediaries almost as well as by the machines that originate services. -- Fielding
Emphasis added.

If I'm doing it right, I can put a commodity cache in front of my origin server, and it will be able to do useful work because it understands the semantics of the metadata in the HTTP traffic.  The cache doesn't need any special knowledge about my domain, or about the payloads being exchanged.
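
As a rough illustration (the resource, the handler, and the header values here are my own invention), the origin server's only job is to decorate its responses with standard metadata that any generic intermediary understands:

```java
import com.sun.net.httpserver.HttpServer;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

// A minimal origin server. A commodity cache in front of it can reuse and
// revalidate the response purely from Cache-Control and ETag, without any
// knowledge of what the JSON payload means.
public class OriginServer {
    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/reports/monthly", exchange -> {
            byte[] body = "{\"total\": 42}".getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "application/json");
            // Generic semantics: any intermediary may reuse this response for 60 seconds...
            exchange.getResponseHeaders().set("Cache-Control", "public, max-age=60");
            // ...and may revalidate it cheaply with If-None-Match afterwards.
            exchange.getResponseHeaders().set("ETag", "\"v7\"");
            exchange.sendResponseHeaders(200, body.length);
            exchange.getResponseBody().write(body);
            exchange.close();
        });
        server.start();
    }
}
```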

Wednesday, August 15, 2018

Ian Cooper on Aggregates

I'm pretty impressed by the way that Ian Cooper describes "aggregates", and decided to capture his description in the hopes that I can keep it at the forefront of my own thinking:

Aggregates in DDD are a way of doing a coarse-grained lock.

Some assertions:
An entity is a row in a relational table, i.e. it has a unique id.
A value type is one or more columns in a row in one or more relational tables.

In order to update an entity I lock its row to ensure ACID properties.
This can scale badly if I need to lock a lot of entities, as we get page and table lock escalation.

If the problem is parent-child, e.g. an order and order lines, I could lock the parent row, and not the children, to avoid table lock escalation. To make this work, my code has to enforce the rule that no child rows can be accessed without first taking a lock on the parent entity row.
So my repository needs to enforce a strategy similar to 'lock parent for update'; if we succeed, then allow modification of parent and children.
At scale, you may want to turn off table lock escalation on the children at this point (DANGER WILL ROBINSON, DANGER, DANGER), because you don't want lock escalation when you lock the object graph.

Aggregates pre-date event sourcing and NoSQL, so it's easiest to understand the problem they were intended to solve in relational DBs.
This is the reason why you don't allow pointers to children; all access has to go through the parent, which must be locked.
Usually I don't store anything apart from the ID on the other entity for the root, because I want you to load via the repo, which does the lock for update and gives you an object if required.
You can also use a pessimistic lock if you want to report the cause of collisions to a user.
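
As a rough sketch of the shape Cooper is describing (the Order/OrderLine classes, the table names, and the SQL below are my own invention, not his), the children are only reachable through the root, and the repository takes the coarse-grained lock on the parent row before touching any child rows:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Order is the aggregate root; OrderLine rows have no repository of their own.
class OrderLine {
    final String sku;
    int quantity;
    OrderLine(String sku, int quantity) { this.sku = sku; this.quantity = quantity; }
}

class Order {
    final long id;
    private final List<OrderLine> lines = new ArrayList<>();
    Order(long id) { this.id = id; }
    void addLine(String sku, int quantity) { lines.add(new OrderLine(sku, quantity)); }
    List<OrderLine> lines() { return List.copyOf(lines); }  // children are not mutated from outside
}

class OrderRepository {
    private final Connection connection;
    OrderRepository(Connection connection) { this.connection = connection; }

    // "Lock parent for update": the SELECT ... FOR UPDATE on the order row is the
    // coarse-grained lock; only after it succeeds do we read the child rows.
    Order lockForUpdate(long orderId) throws SQLException {
        try (PreparedStatement lock = connection.prepareStatement(
                "SELECT id FROM orders WHERE id = ? FOR UPDATE")) {
            lock.setLong(1, orderId);
            try (ResultSet rs = lock.executeQuery()) {
                if (!rs.next()) throw new SQLException("no such order: " + orderId);
            }
        }
        Order order = new Order(orderId);
        try (PreparedStatement children = connection.prepareStatement(
                "SELECT sku, quantity FROM order_lines WHERE order_id = ?")) {
            children.setLong(1, orderId);
            try (ResultSet rs = children.executeQuery()) {
                while (rs.next()) order.addLine(rs.getString("sku"), rs.getInt("quantity"));
            }
        }
        return order;
    }
}
```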

Rice and Foemmel, in Patterns of Enterprise Application Architecture, write
Eric Evans and David Siegel define an aggregate as a cluster of associated objects that we treat as a unit for data changes. Each aggregate has a root that provides the only access point to members of the set and a boundary that defines what's included in the set. The aggregate's characteristics call for a Coarse-Grained Lock, since working with any of its members requires locking all of them. Locking an aggregate yields an alternative to a shared lock that I call a root lock. By definition locking the root locks all members of the aggregate. The root lock gives a single point of contention.
To my mind, there are really two distinct ideas in the Evans formulation of the aggregate:
  1. we have a lock around a set of data that describes one or more entities
  2. the expression of that lock is implicit in one of the domain entities (the "aggregate root") within that set
To be completely honest, I'm still not entirely convinced that mapping rows to "entities" is the right idea -- rows on disk look to me more like values (state) than entities (behavior).

Finally - I still feel that there isn't enough literature describing change: what are the degrees of freedom supported by this family of designs? Do they make common changes easy? Do they make rarer changes possible?