Wednesday, August 15, 2018

Ian Cooper on Aggregates

I'm pretty impressed by the way that Ian Cooper describes "aggregates", and I decided to capture his description in the hope that I can keep it at the forefront of my own thinking.

Aggregates in DDD are a way of doing a coarse-grained lock.

Some assertions:
An entity is a row in a relational table, i.e. it has a unique ID.
A value type is one or more columns in a row in one or more relational tables.

In order to update an entity I lock its row to ensure ACID properties.
This can scale badly if I need to lock a lot of entities, as we get page and table lock escalation.

If the problem is parent-child, e.g. an order and its order lines, I could lock the parent row and not the children, to avoid table lock escalation. To make this work, my code has to enforce the rule that no child row can be accessed without first taking a lock on the parent entity row.
So my repository needs to enforce a strategy similar to 'lock parent for update'; if we succeed, then allow modification of parent and children.
At scale, you may want to turn off table lock escalation on the children at this point (DANGER WILL ROBINSON, DANGER, DANGER), because you don't want lock escalation when you lock the object graph.
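
To make the 'lock parent for update' strategy concrete for myself, here is a minimal in-memory sketch in Python. It is not Ian's code: `OrderRepository`, `lock_for_update` and the rest are names I made up, and a `threading.Lock` per order stands in for the row lock a real database would take on the parent row. The children carry no lock of their own; the only way to reach them is through the locked parent.

```python
import threading
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class OrderLine:
    sku: str
    quantity: int


@dataclass
class Order:
    order_id: int
    lines: list = field(default_factory=list)

    def add_line(self, sku, quantity):
        # Children are only ever modified through the parent.
        self.lines.append(OrderLine(sku, quantity))


class OrderRepository:
    """Hypothetical repository: one lock per parent, none on the children."""

    def __init__(self):
        self._orders = {}
        self._locks = {}
        self._guard = threading.Lock()  # protects the two dicts above

    def add(self, order):
        with self._guard:
            self._orders[order.order_id] = order
            self._locks[order.order_id] = threading.Lock()

    @contextmanager
    def lock_for_update(self, order_id, timeout=1.0):
        with self._guard:
            lock = self._locks[order_id]
        # Stand-in for taking a row lock on the parent row only.
        if not lock.acquire(timeout=timeout):
            raise TimeoutError(f"order {order_id} is locked by another writer")
        try:
            yield self._orders[order_id]
        finally:
            lock.release()


repo = OrderRepository()
repo.add(Order(order_id=1))

# The parent lock is held for the whole block; lines change only inside it.
with repo.lock_for_update(1) as order:
    order.add_line("widget", 2)
```

The point of the sketch is the shape of the API: there is no method that hands out an `OrderLine` directly, so a caller cannot touch a child without having taken the parent lock first.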

Aggregates pre-date event sourcing and NoSQL, so it's easiest to understand the problem they were intended to solve in terms of relational DBs.
This is the reason why you don't allow pointers to children: all access has to go through the parent, which must be locked.
Usually I don't store anything apart from the root's ID on the other entity, because I want you to load via the repo, which does the lock for update and gives you an object if required.
You can also use a pessimistic lock if you want to report the cause of collisions to a user.
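
One way I can picture "report the cause of collisions" is a lock table keyed by the root's ID that remembers who holds each lock. The sketch below is my own assumption about what that might look like, not something from Ian's description; `RootLockTable` and `LockedByOther` are invented names, and a plain dict stands in for what would presumably be a database table.

```python
class LockedByOther(Exception):
    """Raised when another user already holds the lock on an aggregate root."""

    def __init__(self, root_id, holder):
        super().__init__(f"aggregate {root_id} is being edited by {holder}")
        self.root_id = root_id
        self.holder = holder


class RootLockTable:
    """Hypothetical lock table: maps an aggregate root ID to its current holder."""

    def __init__(self):
        self._holders = {}

    def acquire(self, root_id, user):
        # Record the user as holder unless somebody else got there first.
        holder = self._holders.setdefault(root_id, user)
        if holder != user:
            raise LockedByOther(root_id, holder)

    def release(self, root_id, user):
        # Only the current holder can release the lock.
        if self._holders.get(root_id) == user:
            del self._holders[root_id]


locks = RootLockTable()
locks.acquire(42, "alice")
try:
    locks.acquire(42, "bob")
except LockedByOther as err:
    message = str(err)  # names the current holder, which can be shown to bob
```

Because other entities hold only the root's ID, that ID is all the lock table needs as a key.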

Rice and Foemmel, in Patterns of Enterprise Application Architecture, write:
Eric Evans and David Siegel define an aggregate as a cluster of associated objects that we treat as a unit for data changes. Each aggregate has a root that provides the only access point to members of the set and a boundary that defines what's included in the set. The aggregate's characteristics call for a Coarse-Grained Lock, since working with any of its members requires locking all of them. Locking an aggregate yields an alternative to a shared lock that I call a root lock. By definition locking the root locks all members of the aggregate. The root lock gives a single point of contention.
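
A rough sketch of how I read the root lock, assuming each member keeps a reference to its parent so that any member can navigate to the root. The `Node` class and `lock_member` function are hypothetical names of mine, and a `threading.Lock` on the root stands in for whatever lock the system really uses.

```python
import threading


class Node:
    """Hypothetical aggregate member; only the root owns a lock."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.lock = threading.Lock() if parent is None else None

    def root(self):
        # Walk up the parent references until we reach the root.
        node = self
        while node.parent is not None:
            node = node.parent
        return node


def lock_member(member):
    """Locking any member means locking the root it belongs to."""
    root = member.root()
    root.lock.acquire()
    return root


order = Node("order")
line = Node("line-1", parent=order)

held = lock_member(line)  # asks for the line, actually locks the order
held.lock.release()
```

Every request ends up at the same lock, which is the "single point of contention" in the quote.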
To my mind, there are really two distinct ideas in the Evans formulation of the aggregate:
  1. we have a lock around a set of data that describes one or more entities
  2. the expression of that lock is implicit in one of the domain entities (the "aggregate root") within that set
To be completely honest, I'm still not entirely convinced that mapping rows to "entities" is the right idea -- rows on disk look to me more like values (state) than entities (behavior).

Finally, I still feel that there isn't enough literature describing change: what are the degrees of freedom supported by this family of designs? Do they make common changes easy? Do they make rarer changes possible?
