Thursday, September 22, 2016

Set Validation

Vladimir Khorikov wrote recently about enforcing uniqueness constraints, which is the canonical example of set validation.  His essay got me to thinking about validation more generally.

Uniqueness is relatively straightforward in a relational database; you include in your schema a constraint that prevents the introduction of a duplicate entry, and the constraint acts as a guard to protect the invariant in the book of record itself -- which is, after all, where it matters.

But how does it work? The constraint is effective because it blocks the write to the book of record.  In the abstract, the constraint gets tested within the database while the write lock is held; the writes themselves have been serialized and each write in turn needs to be consistent with its predecessors.
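A minimal sketch of the constraint acting as the guard, using Python's built-in sqlite3 module with an in-memory database. The users table and the register helper are hypothetical names chosen for illustration; the point is only that the duplicate is rejected at the moment of the write, inside the database.

```python
import sqlite3

def register(conn, email):
    """Try to record an email; return True if stored, False if it was a duplicate."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
        return True
    except sqlite3.IntegrityError:
        # The UNIQUE constraint blocked the write to the book of record.
        return False

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT NOT NULL UNIQUE)")

first = register(conn, "alice@example.com")
second = register(conn, "alice@example.com")
count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```

Note that the caller never checks for the duplicate ahead of time; it attempts the write and lets the book of record refuse it.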

If you try to check the constraint before obtaining the write lock, then you have a race; the book of record can be changed by another transaction that is in flight.


Single writer sidesteps this issue by effectively making the write lock private.

With multiple writers, each can check the constraint locally, but you can't prove that the two changes in flight don't conflict with each other.  The good thing is that you don't need to -- it's enough to know that the book of record hasn't changed since you checked it.  Logically, each write becomes a compare and swap on the tail pointer of the model history.
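The compare and swap can be sketched as an append that names the version of the history the writer checked against. This is an in-memory illustration under assumed names (History, ConflictError, expected_version), not a description of any particular event store.

```python
import threading

class ConflictError(Exception):
    """Raised when the history moved on after the writer read it."""

class History:
    def __init__(self):
        self._events = []
        self._lock = threading.Lock()

    def read(self):
        """Return (version, snapshot); the version is the current tail position."""
        with self._lock:
            return len(self._events), list(self._events)

    def append(self, expected_version, event):
        """Compare and swap: append only if the tail is where the writer saw it."""
        with self._lock:
            if len(self._events) != expected_version:
                raise ConflictError("history changed since it was checked")
            self._events.append(event)
            return len(self._events)

history = History()

# Two writers read the same version and each checks uniqueness locally.
version_a, seen_a = history.read()
version_b, seen_b = history.read()

history.append(version_a, "alice@example.com")  # first writer wins

try:
    history.append(version_b, "alice@example.com")  # stale check, rejected
    stale_write_rejected = False
except ConflictError:
    stale_write_rejected = True
```

The second writer is refused not because its email is a duplicate, but because its check is out of date; it has to read again and re-run the check, at which point it will see the conflict for itself.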

Of course, the book of record has to trust that the model actually performed the check before attempting the write.

And implementing the check this way isn't particularly satisfactory.  There's not generally a lot of competition for email addresses; unless your problem space is actually assignment of mail boxes, the constraint has generally been taken care of elsewhere.  Introducing write contention (by locking the entire model) to ensure that no duplicate email addresses exist in the book of record is a poor trade.

This is already demonstrated by the fact that this problem usually arises after the model has been chopped into aggregates; an aggregate, after all, is an arbitrary boundary drawn within the model in an attempt to avoid unnecessary conflict checks.

But to ensure that the aggregates you are checking haven't changed while waiting for your write to happen?  That requires locking those aggregates for the duration.

To enforce a check across all email addresses, you also have to lock against the creation of new aggregates that might include an address you haven't checked.  Effectively, you have to lock membership in the set.


If you are going to lock the entire set, you might as well take all of those entities and make them part of a single large aggregate.

Greg Young correctly pointed out long ago that there's not a lot of business value at the bottom of this rabbit hole.  If the business will admit that mitigation, rather than prevention, is a cost effective solution, the relaxed constraint will be a lot easier to manage.






Wednesday, August 10, 2016

RESTBucks Client

Where are the agents?

Personally, I never understood how overgrowth can be a good thing: 12,000 APIs means 12,000 different ways of engaging in machine-to-machine interaction. -- Ruben Verborgh
Good to know that other people have trodden this ground before I start ranting.

I was introduced to Robert Zubek's needs-based AI paper recently, which returned my thinking to a problem I've struggled with in understanding hypermedia.

To wit, what does it mean to have a client that is decoupled from the server?

We know from the experience of the web that hypermedia controls are a big part of it -- allowing the server to communicate the available affordances to the client means that the server can change the how as it likes, and the client doesn't need to survey the possibility space, but can instead limit itself to the immediately relevant set.

So we take a look at a hypermedia driven example -- for instance, Jim Webber's RESTBucks.  The domain application protocol looks something like this:

Riddle: what happens to a client coded to search for these controls if the shop decides to comp your order?  What happens when we shop at Model T Coffee?
You can have your coffee any way you like, as long as it's black.
In either case, if the client is focused on following the protocol, it's screwed.  The trap is the same client/server coupling we fled to hypermedia to escape.  Instead, we should be thinking in terms of multiple protocols -- if RESTBucks happens to be offering a take-free-coffee endpoint today, clients that understand how to negotiate it should be able to use it.

What I'm beginning to see is that there is a choice; instead of teaching a client that a specific extended protocol will produce a goal, we may be able to decompose the protocol into a number of different pieces that can be composed to reach a goal -- if we can manage to describe the affordances in a way that the client can understand, then we can start building complicated protocols out of simpler ones and glue.

Maybe -- we still need a client that can snap together the legos in the right order.
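One way to picture "snapping together the legos" is a client that searches over whatever affordances the server advertises today, rather than following a fixed script. The sketch below is purely illustrative: the affordance names, and the idea of describing each one by the facts it requires and the facts it provides, are assumptions of mine and not part of RESTBucks or any media type.

```python
from collections import deque

def plan(affordances, start, goal):
    """Breadth-first search for a sequence of affordances that reaches the goal.

    affordances maps a name to (requires, provides), both sets of facts.
    Returns a list of affordance names, or None if the goal is unreachable.
    """
    start = frozenset(start)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        facts, steps = queue.popleft()
        if goal <= facts:
            return steps
        for name, (requires, provides) in affordances.items():
            if requires <= facts:
                reached = facts | provides
                if reached not in seen:
                    seen.add(reached)
                    queue.append((reached, steps + [name]))
    return None

# Hypothetical descriptions of what the shop offers on two different days.
standard_day = {
    "place-order": (set(), {"order"}),
    "pay": ({"order"}, {"paid"}),
    "take-coffee": ({"paid"}, {"coffee"}),
}
comped_day = {
    "place-order": (set(), {"order"}),
    "take-free-coffee": ({"order"}, {"coffee"}),
}

standard_plan = plan(standard_day, set(), {"coffee"})
comped_plan = plan(comped_day, set(), {"coffee"})
stuck_plan = plan({"pay": ({"order"}, {"paid"})}, set(), {"coffee"})
```

The same client reaches the goal on both days because it is coded against the goal and the descriptions, not against one particular path. The hard part, describing affordances so that a client can actually understand them, is exactly what this sketch assumes away.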

Friday, July 22, 2016

Lessons of Pokémon GO

Chris Furniss
But what Pokémon GO is benefitting from right now is a emergent mentorship that almost completely replaces a traditional tutorial. I would argue that this emergent mentorship is even more powerful than an actual tutorial. Mentorship increases engagement through social bonds. You've probably experienced this already with Pokémon GO: teaching a confused player something about the game that you've figured out already not only makes you feel smart and altruistic, it assuages confusion for the person you've mentored in a way that is highly personalized, and therefore more impactful.
 Interesting to make that behavior part of the interface: "I'm not going to give you a hint, but I will connect you to a couple of nodes in your own social graph that have figured it out."


Friday, June 24, 2016

Data Science in a Nutshell

Generating Type I errors by using second hand data to model third order effects.

The problem with language

We only know how to share language with entities that are capable of pattern recognition.

Friday, June 3, 2016

Session Data

A recent question on Stack Exchange asked about storing session data as a resource, rather than as part of a cookie.  In the past, I had wondered if the transience of the session data is the problem, but that's really only a distraction from the real issue.

Fielding, in his thesis, wrote
Cookie interaction fails to match REST's model of application state, often resulting in confusion for the typical browser application.
 What kind of confusion is he talking about here?

The first architectural constraint of REST is Client Server (3.4.1).  The client here sends messages to the server there, and gets a message in return.

Riddle - how does the server interpret the messages that it receives?  In a stateless client server architecture, interpreting the client message is trivial, because the message itself is derived from the state of the client application.  Which is to say that the client and server are in agreement, because the client created the message from its local application state, and the message is immutable in transit, and the server has no application context to add to the mix.

When you drop the stateless architectural constraint, and introduce session data, the client and server no longer necessarily understand the message the same way: the client creates the message from its local application state, the message transits unchanged, but the server applies the application context stored in the session when interpreting the message.

In the happy path, there's no real problem: the session data applied by the server is the same data that would otherwise have been included by the client in the message, and the resulting interpretation of the message is equivalent using either architectural style.

Outside the happy path, sessions introduce the possibility that the actual state of the client and the state assumed by the server drift apart.  The client sends a message, expecting it to achieve desired end A, but because the server's copy of the session data is not in sync, the message is understood to prefer outcome A-prime.  Furthermore, with the session data "hidden" on the server, you may end up with quite a few mismatched messages going back and forth before the client begins to realize that a disaster is in the making.
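A toy illustration of that drift, with hypothetical names throughout (SessionServer, stateless_buy, and the coffee items are all invented for the example). The client rewinds its own view without telling the server, then sends a message whose meaning depends on who supplies the context.

```python
class SessionServer:
    """Server that keeps application context (the selected item) in a session."""

    def __init__(self):
        self.selected = None

    def select(self, item):
        self.selected = item

    def buy(self):
        # The message carries no context; the server fills it in from the session.
        return f"bought {self.selected}"

def stateless_buy(message):
    """Stateless handling: everything needed to interpret the message is in it."""
    return f"bought {message['item']}"

server = SessionServer()
client_history = []

# The client looks at a latte, then at a mocha; the session follows along.
for item in ["latte", "mocha"]:
    client_history.append(item)
    server.select(item)

# The user hits Back. This happens entirely on the client, so the session
# still says "mocha" while the client is looking at the latte again.
client_history.pop()
client_view = client_history[-1]

session_result = server.buy()                            # server's context wins
stateless_result = stateless_buy({"item": client_view})  # client's context travels
```

With the session, the client asks for what it sees and gets what the server remembers. With the self-contained message there is nothing for the two sides to disagree about.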

The fundamental breakdown is here: the server does not know what the state of the application on the client is.  It can know what messages have been received, it can know what messages have been sent in response, but it doesn't know (without specific correlation identifiers being built into the messages) that the dispatched messages have arrived.

Furthermore
The application state is controlled and stored by the user agent and can be composed of representations from multiple servers. In addition to freeing the server from the scalability problems of storing state, this allows the user to directly manipulate the state (e.g., a Web browser's history), anticipate changes to that state (e.g., link maps and prefetching of representations), and jump from one application to another (e.g., bookmarks and URI-entry dialogs).
Which is to say, the client is explicitly allowed to rewind to any previously available state, cached or reconstructed from its own history, without consulting the server.

The server defines allowed application states, and the hypermedia controls that lead to new application states, but it doesn't control which state in the sea of revealed application states is the current one.

The client, not the server, is the book of record for application state.

Why is it OK to PUT an order for a cup of coffee?  Because in that case, all of the ambiguity is in the state of the resource, not the state of the message.  The client and the server both share the same precise understanding of what the message means, because all of the required context appears within the message itself.  The client rewinds to some previous state, and fires off an obsolete message, and the server is able to respond "I know precisely what you mean, but you aren't allowed to do that any more".  There's no confusion here; the state of the resource has simply evolved since the hypermedia control was described.

So long as everybody agrees what the message means, there is no confusion.  That's why the happy path looks OK -- if the client and the server are still sharing the same assumptions about message context, the confusion doesn't arise; the client and the server happen to mean the same thing by fortuitous accident, and it all "just works".  Until it doesn't.



 


Friday, May 27, 2016

The name of the URI is called....

I just found myself proposing the following "RESTful" URI:


/userStories?asA=emailScammer&iWantTo=mineEmailAddresses&soThat=iCanBroadcastMoreSpam 


I'm not sure I was kidding.

Followup: I used query parameters; HTML is a successful implementation of a media-type which supports hypermedia controls.  It's an ideal reference implementation for illustrating RESTful principles.

But it is not without its limitations.  URI Templates offer a lot of flexibility; web forms -- not so much.  You expose query parameters, or you force the client to traverse a graph to find the control that they want.

Digging around for URI design guidelines, I found this summary by K. Alan Bates

Hierarchical data is supposed to be represented on the path and with path parameters. Non-hierarchical data is supposed to be represented in the query. The fragment is more complicated, because its semantics depend specifically upon the media type of the representation being requested.

Moving the parameters of the story out of the query string and into the path would be preferred, because it correctly represents that this is the known identification of a resource, rather than a specification for a resource that may not have any matches.  Furthermore, doing so better conforms to the convention that path segments represent hierarchy.  For instance, it's reasonable to suppose that the URI for the story card ought to look like:


/userStories/asA=emailScammer&iWantTo=mineEmailAddresses&soThat=iCanBroadcastMoreSpam/card

The design guidelines in the RESTful Web Services Cookbook suggest an improvement on the previous design...

Use the comma (,) and semicolon (;) to indicate nonhierarchical elements in the path portion of the URI.
Richardson & Ruby, in Chapter 5 of RESTful Web Services, were a bit more specific:
I recommend using commas when the order of the scoping information is important, and semicolons when the order doesn't matter.
I'm not actually sure if the template of the user story should be considered ordered or not.  There are a lot of references to the Cohn template, a few that point out that maybe the business value should get pride of place.

Me?  I'm going to represent that the ordering of these elements matters, because that allows me to use the delimiter that looks like the punctuation on the story card itself

/userStories/asA=emailScammer,iWantTo=mineEmailAddresses,soThat=iCanBroadcastMoreSpam/card


Better, but I don't like equals, and none of the other sub-delims improve the outcome.  Today, I learned that RFC 3986 offers me an out -- the production rules for path segments explicitly include colon!  

/userStories/asA:emailScammer,iWantTo:mineEmailAddresses,soThat:iCanBroadcastMoreSpam/card
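As a quick sanity check on that final form, here is a small sketch that tests a path segment against the pchar rule from RFC 3986 section 3.3 and splits the story back into ordered pairs. The helper names (is_valid_segment, parse_story) are mine, and the parser assumes the keys and values themselves contain no commas or colons.

```python
import re

# pchar per RFC 3986 section 3.3:
#   unreserved / pct-encoded / sub-delims / ":" / "@"
PCHAR = r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})"
SEGMENT = re.compile(rf"{PCHAR}*")

def is_valid_segment(segment):
    """True if every character of the segment is allowed by the pchar rule."""
    return SEGMENT.fullmatch(segment) is not None

def parse_story(segment):
    """Split 'key:value,key:value' into an ordered list of (key, value) pairs."""
    return [tuple(part.split(":", 1)) for part in segment.split(",")]

story = "asA:emailScammer,iWantTo:mineEmailAddresses,soThat:iCanBroadcastMoreSpam"

valid = is_valid_segment(story)
pairs = parse_story(story)
```

Because the commas are meant to signal that order matters, the parser returns a list rather than a dictionary.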

In all, an educational exercise.  Didn't learn whether or not I was kidding.