Sunday, October 13, 2024

TDDbE: xUnit Patterns

 Much more so than the previous chapters in this book, this one feels like archeology.

Write boolean expressions that automate your judgment about whether the code worked.

Primarily, the point of the automated mistake detectors is that they don't require sapience to reach a conclusion (see Bach and Bolton on checking vs testing).  There shouldn't be any maybes or fuzziness in the outcomes; we're on the happy path, or we're not.
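To make that concrete, a minimal sketch (the Account class is my own invention, not anything from the book): the print statement needs a human to render a verdict, the assert does not.

    # A hypothetical Account, used only for illustration.
    class Account:
        def __init__(self, balance):
            self.balance = balance

        def deposit(self, amount):
            self.balance += amount

    account = Account(100)
    account.deposit(50)

    # Sapience required: someone has to look at the output and decide.
    print(account.balance)

    # No sapience required: the boolean expression renders the verdict itself.
    assert account.balance == 150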

The expected value generally goes first....

Today, I read this as an indicator that the right abstractions hadn't yet emerged; we were all still thinking about the "two arguments to assert", rather than, say, a measurement that needs to satisfy a specification.  And of course some frameworks flipped the arguments, so supplying them in the reverse of the order the current framework expects is a common error.
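For contrast, a rough sketch of the two styles in Python (compute_tax is a made-up system under test; the matcher style uses PyHamcrest):

    import unittest
    from hamcrest import assert_that, equal_to  # PyHamcrest, for the matcher style

    def compute_tax(amount):
        # Hypothetical system under test, hard-coded for the sake of the example.
        return 42

    class TestTax(unittest.TestCase):
        def test_two_arguments_to_assert(self):
            # The convention Beck describes: expected value first, measured value second.
            # Frameworks that flip the order are exactly where the mix-ups happen.
            self.assertEqual(42, compute_tax(400))

        def test_measurement_satisfies_specification(self):
            # Matcher style reads as "the measurement satisfies the specification",
            # which sidesteps the argument-order question entirely.
            assert_that(compute_tax(400), equal_to(42))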

I'm aware that I am swimming against the tide in insisting that all tests be written using only public protocol.

I think "all" is a bit too extreme; also, this doesn't surprise me.  You can go back and forth on the various trade-offs, but ultimately it comes down again to the fact that the tests work for the developers, not the other way around.

I was interested to note the remarks about dropping into the Smalltalk debugger, as this same behavior was an important motivation for Mock Objects.

Each new kind of fixture should be a new subclass of TestCase.  Each new fixture is created in an instance of that subclass, used once, and then discarded.

I haven't used "fixtures", especially fixtures that require explicit setup/teardown, in a long long time.

xUnit circa 2003 implemented fixtures by overriding methods of the TestCase subclass because... well, because most of the modern alternatives hadn't been realized yet.  I don't love that design, because I don't think it is obvious enough that the fixture is being re-initialized for every test call, so the class design looks like shared, potentially mutable state.
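A minimal unittest sketch of what I mean - the attribute reads like shared state, and you have to know that setUp runs before every test method to see that it isn't:

    import unittest

    class TestShoppingCart(unittest.TestCase):
        def setUp(self):
            # Runs before *every* test method, so each test gets a fresh fixture,
            # even though self.cart reads like shared, potentially mutable state.
            self.cart = []

        def test_add_one_item(self):
            self.cart.append("apple")
            self.assertEqual(1, len(self.cart))

        def test_starts_empty(self):
            # Passes regardless of whether test_add_one_item ran first,
            # because the fixture was rebuilt.
            self.assertEqual(0, len(self.cart))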

Test "outlines" are now more commonly realized via nesting of contexts, so that you get a report that is a hierarchy of test results.

My experiences suggest that TestSuites are extinct in languages where the test runner has access to class metadata, because everything is discovered, often with an annotation language that allows for enabling and disabling tests, or mapping tests against a wider variety of inputs, or whatever.
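A rough pytest sketch of what that looks like today (is_leap is a throwaway example): no TestSuite wiring, discovery by naming convention, markers for enabling and disabling, parametrization for the wider variety of inputs, and nested classes for the outline-style grouping.

    import pytest

    def is_leap(year):
        # Trivial system under test, defined here so the sketch is self-contained.
        return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

    class TestLeapYear:
        class TestCommonCases:
            # Nested classes group the results into an outline in the report.
            @pytest.mark.parametrize("year,expected", [
                (2000, True),
                (1900, False),
                (2024, True),
                (2023, False),
            ])
            def test_is_leap(self, year, expected):
                assert is_leap(year) == expected

        @pytest.mark.skip(reason="annotation-style disable, no suite editing required")
        def test_not_ready_yet(self):
            ...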

In summary: languages, test frameworks, and development environments have made a lot of real progress over the past 20 years; the details of how we made do without modern conveniences aren't particularly compelling.

Saturday, October 12, 2024

TDDbE: Green Bar Patterns

The chapter opens with a discussion of Fake It, Triangulate, and a light comparison of the two.

Does Fake It violate the rule that says you don't write any code that isn't needed?  I don't think so, because in the refactoring step you are eliminating duplication of data between the test case and the code.

More support for my favorite lesson: duplication of data between the test case and the code is all the permission you need to continue refactoring.
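A tiny sketch of that cycle (the frame_score example is mine, not Beck's):

    def test_frame_score():
        assert frame_score(rolls=[7, 2]) == 9

    # Fake It: the first implementation just returns the constant the test expects.
    #
    #     def frame_score(rolls):
    #         return 9
    #
    # Now the 9 in the test and the 9 in the code are the same data, written twice.
    # Removing that duplication is the refactoring that produces the real code:
    def frame_score(rolls):
        return sum(rolls)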

I only use Triangulation when I'm really really unsure about the correct abstraction for the calculation.

I think Triangulation really suffers from the fact that the examples typically used to demonstrate it are contrived.

Beck uses addition here; he used an equality check back in chapter three -- both are cases where you could just type in the correct implementation, rather than faffing about with triangulating.
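For reference, triangulating an addition function looks something like this - which is exactly why it feels like ceremony here; you would just type in a + b:

    # With a single example, Fake It gets us to green with "return 4".
    def test_two_plus_two():
        assert plus(2, 2) == 4

    # Triangulate: a second example with a different answer makes the hard-coded
    # constant untenable, forcing the generalization to appear.
    def test_three_plus_five():
        assert plus(3, 5) == 8

    def plus(a, b):
        return a + b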

The sorts of problems that I would expect to benefit from triangulating would be more like sudoku, or code formatting, or even line wrapping, where you may need to work in the code for a bit before you get the key insight that gives you confidence in a specific abstraction.

Sandi Metz demonstrates with "simple green" that you might prefer the duplication to the abstraction anyway.  Her advice was to prefer duplication over the wrong abstraction.

Back in Chapter 17 Kent observed that the money example went in "a completely different direction".  And as I understand it he didn't "triangulate" to choose between candidate designs - he discovered a metaphor and just went where it led him.  This tells me that triangulation is more about the uncertainty that the abstraction will in fact be better than the duplication.

Of course, you could also try the abstraction, and then use the source control system to revert to a checkpoint if you decide that the refactoring hasn't led to an abstraction you trust.

There's no particular virtue in the halfway nature of Fake It and Triangulation.

These ceremonies are a means to an end....

  Keep track of how often you are surprised by red bars....

"Call your shots" later became a more general thing - the earliest reference I can find is Needham 2010, but he in turn attributes the idea to Kent.

Monday, August 26, 2024

TDDbE: Testing Patterns

When I write a test that is too big, I first try to learn the lesson.  Why was it too big? What could I have done differently that would have made it smaller? ... I delete the offending test and start over.

Delete and start over is a power move, and will later lead to the exploration of Test Commit/Revert.

This reminds me of the Mikado Method: we might want to systematically reset and explore smaller goals until we reach an improvement we can actually make now, and then work our way back up the graph until the code is ready for the big test.

I think you have to be fairly far along in the design to start running into this sort of thing; when the design is shallow, you can usually just drop in a guard clause and early return, which gets the test passing, and now it is "just" a matter of integrating this new logic into the rest of the design.  But when the current output is the result of information crossing many barriers, getting the guard clause to work can require rework across several boundaries.
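By "drop in a guard clause" I mean something like this (a made-up shipping rule, for illustration):

    def test_empty_order_ships_free():
        assert shipping_cost(order_total=0, items=[]) == 0

    def shipping_cost(order_total, items):
        # The guard clause and early return get the new test passing quickly;
        # folding this rule properly into the pricing logic can come afterwards.
        if not items:
            return 0
        return 5 + 0.1 * order_total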

How do you test an object that relies on an expensive or complicated resource?

Oh dear.

OK, context: Beck is writing prior to the development of the pattern language of Test Doubles by Gerard Meszaros.  He's re-purposing the label introduced in the Endo-Testing paper, but it's not actually all that great a fit.  It would be better to talk about test stubs (Binder, 1999), or doubles if you wanted to be understood by a modern audience.

Anyway, we get a couple different examples of stubs/doubles that you might find useful.
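In modern terms, the kind of thing being described is a hand-rolled stub standing in for the expensive collaborator; a rough sketch (all of the names here are mine):

    class RatePolicy:
        """Code under test; it only needs something with a lookup() method."""
        def __init__(self, rate_source):
            self.rate_source = rate_source

        def convert(self, amount, currency):
            return amount * self.rate_source.lookup(currency)

    class StubRateSource:
        """Stands in for the slow web service or database, answering from canned data."""
        def __init__(self, rates):
            self.rates = rates

        def lookup(self, currency):
            return self.rates[currency]

    def test_convert_uses_the_current_rate():
        policy = RatePolicy(StubRateSource({"CHF": 2}))
        assert policy.convert(10, "CHF") == 20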

Leaving a test RED to resume at the next session is a pretty good trick; but I think you'll do better to treat the test as a suggestion, rather than a commitment, to resume work at a particular place - the most important item to work next might change between sessions, and the failing test will be a minor obstacle if you decide that refactoring is worthwhile.

 

 

Saturday, August 24, 2024

TDDbE: Red Bar Patterns

On getting started - look for a one step test:

Pick a test that will teach you something and that you are confident you can implement.

What he argues here is that the metaphorical direction of our work is from known to unknown.

Known-to-unknown implies that we have some knowledge and experience on which to draw, and that we expect to learn in the course of development.

Pretty good stuff there.  But also I can't help but observe that we often know more about the outside/top of our implementation than we do about the core/bottom.

Beginning with a realistic test will leave you too long without feedback.

I'm not quite sold on this one, certainly not universally.  The mars rover exercise, for instance, feels straightforward when approached with a "realistic test" first.  That may be a matter of context - for the rover, choosing a representation of the input and a representation of the output is not a particularly difficult decision, unless one gets the itch to couple the implementation of the test tightly to the implementation of the solution.  Also, the rover exercise includes a specific complicated example to consider as the first test - a specific example where you are given the correct answer.

For a problem like polygon reduction, there may not be a specific complicated example to consider, so one might as well extract what value one can from a trivial example first, and then further extend.

The original polygon reduce discussion (Tom Plunket, 2002-02-01) is currently archived by Google.

Beware the enthusiasm of the newly converted.  Nothing will stop the spread of TDD faster than pushing it in people's faces.

Prophecy?  Sounds like prophecy.

Another argument in favor of checklists: as a defense against analysis paralysis - get the new idea written down on the test list, then set it aside.

Also, throw away the code and start over is a big power move - I love it.  If it's not going well, declaring bankruptcy is a strong alternative to continuing to dig.

 

Saturday, June 1, 2024

HTTP Status Code Best Practices

Let's walk through a sequence of HTTP illustrations of how status codes should work.

First example:

GET /docs/examples/health-check/healthy

Where /docs is just a mapping to some folder on disk, so this is really "just" a file, the contents of which are an example of the response content returned by a health check resource when the system is, in fact, healthy.  When a client sends this HTTP request, and the server correctly processes that request, copying the file contents into the body of the response, what should the HTTP status code be?

Answer: 200 - the request was successful, the response content is a representation of the target resource.  This is textbook, and should be unsurprising.

Second example:

GET /docs/examples/health-check/unhealthy

Assume the same mapping as before; this is really "just" a file, in the same directory as the previous example, but the contents of the file are an example of the response content returned by a health check resource when the system is unhealthy.  When a client sends this HTTP request, and the server correctly processes that request, copying the file contents into the body of the response, what should the HTTP status code be?

Answer: 200 - the request was successful, the response content is a representation of the target resource.  Again, we're just copying the contents of a file, just as though we were returning a text document, or a picture of a cat, or a file full of javascript.  The client asked for the file, here it is.

Third example:

GET /api/health-check

Here, we are connecting to a dynamic resource, something that actually runs the internal checks against the system, and sends back a report describing the health of the system.  Let's assume that the API is currently healthy, and furthermore that it is well documented; the response content returned when the API is healthy exactly matches the corresponding example in the documentation (/docs/examples/health-check/healthy).  When a client sends this HTTP request, and the server correctly processes that request while the API is healthy, therefore producing response content that describes a healthy API, what should the HTTP status code be?

Answer: 200 - the request was successful, the response content is a representation of the target resource.  Everything is in the green, we're able to do what the client asked.  It is, in fact, the same response that we would get from a boring web server returning the contents of a static document (as seen in the first example above).

Fourth example:

GET /api/health-check

Here, we are connecting to a dynamic resource, something that actually runs the internal checks against the system, and sends back a report describing the health of the system.  Let's assume that the API is currently unhealthy, and furthermore that it is well documented; the response content returned when the API is unhealthy exactly matches the corresponding example in the documentation (/docs/examples/health-check/unhealthy).  When a client sends this HTTP request, and the server correctly processes that request while the API is unhealthy, therefore producing response content that describes an unhealthy API, what should the HTTP status code be?

Answer: 200 - the request was successful, the response content is a representation of the target resource.  Everything is in the green, we're able to do what the client asked.  It is, in fact, the same response that we would get from a boring web server returning the contents of a static document (as seen in the second example above).

The key idea is this: the HTTP status code describes the semantics of the HTTP response, not the semantics of the response content.  The status code is metadata of the transfer-of-documents-over-a-network domain; it tells general purpose HTTP components what is going on, and how different headers in the response may be correctly interpreted.  It informs, for example, HTTP caches that any previously cached responses for this same request can be invalidated.

This is the uniform interface constraint at work - the semantics of all of the self descriptive messages in our system are the same, even though the semantics of the targeted resources are different.  Our API, our help docs, Pat's blog, Alex's online bookshop, Kim's archive of kitten pictures all use the same messages, and any HTTP client can understand all of them.

BUT...

We lost the war years ago.

The right way to share information with a general purpose load balancer would be to standardize some link relations, or a message schema; to encode the information that the load balancer needs into some readily standardizable form, and then have everybody code to the same standard.

But the expedient thing to do is to hijack the HTTP status code, and repurpose it to meet this specialized need.

A 429 response to a request for a health-check should mean that the client is exceeding its rate limit; a 500 response should mean that some unexpected condition has prevented the server from sending the current representation of the health check resource.

But that's not where we ended up.  And we ended up here long enough ago that the expedient choice has been normalized.  Do the Right Thing ™ did not have a sufficient competitive advantage to overcome the advantages of convenience.

And this in turn tells us that, at some scales, the expedient solutions are fine.  We can get away with writing APIs with specialized semantics that can only be correctly understood by bespoke clients that specialize in communicating with our API because, well, because it's not a problem yet; we'll probably need to redesign our system at least once before the scale of the problem makes a difference, and in the meantime the most likely failure modes are all somewhere else.  A given API may never get to a point where the concessions to convenience matter.
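To make the two stances concrete, a rough Python sketch (check_system and the handler shapes are invented for illustration): the uniform-interface reading keeps the verdict in the content and returns 200 either way; the expedient reading encodes the verdict into the status code so a general purpose component can act on it without understanding the body.

    import json

    def check_system():
        # Hypothetical internal checks, producing a report shaped like the
        # documented examples.
        return {"status": "unhealthy", "failing": ["database"]}

    def health_check_uniform(request):
        # The transfer succeeded: 200, and the health verdict lives in the content.
        report = check_system()
        return 200, {"Content-Type": "application/json"}, json.dumps(report)

    def health_check_expedient(request):
        # Repurpose the status code so generic components (load balancers,
        # uptime monitors) can react without parsing the body.
        report = check_system()
        status = 200 if report["status"] == "healthy" else 503
        return status, {"Content-Type": "application/json"}, json.dumps(report)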

So... which best practices do you want?

 


Tuesday, May 14, 2024

TDDbE: Test-Driven Development Patterns

 Gotta have patterns if you want to be published to this audience.

Why does test the noun, a procedure that runs automatically, feel different from test the verb, such as poking a few buttons and looking at answers on the screen?

I find this parallel more useful when talking about design the noun vs design the verb, especially within the context of a practice that promises to improve design.

Beck's discussion of "isolated tests" is really twisted up, in that this heading includes two very different properties that he wants:

  • Tests that are order independent
  • Tests that don't overlap (two tests broken implies two problems)

I have seen people get really twisted up on the second property, when (within the context of TDD) it really isn't all that important: if I'm running my tests regularly, then there are only a small number of edits between where I am now and my last known good state; it doesn't "matter" how many tests start failing, because I have tight control over the number of edits that introduced those problems.

A trivial example: I'm refactoring, and I make a change, and suddenly 20 tests are failing.  Disaster!  How long does it take me to get back to a green state?  Well, if I revert the changes I just made, I'm there.  It really doesn't matter whether I introduced one problem or twenty - fixing everything is a single action and easy to estimate.

The case where I care about being able to estimate the number of real problems?  Merge.

Isolating tests encourages you to compose solutions out of many highly cohesive, loosely coupled objects.  I've always heard that this was a good idea....

I'm still suspicious of this claim, as my experience is that it delivers "many" far more often than it delivers either "highly cohesive" or "loosely coupled".

I think of Beck's justifications for the test list as paging information out of (human) memory (I wrote them down in my diary so I wouldn't have to remember).  What I hadn't recalled (perhaps I should have written it down) is that in Beck's version he's not only including tests, but also operations and planned refactorings.  The Canon version ("test scenarios you want to cover") is closer to how I remember it.

Test First: "you won't test after" - Beck's claim here is interesting, in that he talks of the practice as primarily about stress management (the "virtuous cycle"), with the design and scope control as a bit of energy to keep the cycle going.

I need to think more about scope control -- that benefit feels a lot more tangible than those asserted about "design".

I find assert first interesting for a couple of reasons.  First, it seems clear to me that this is the inspiration for TDD-As-If-You-Meant-It.  Second, the bottom-up approach feels a lot like the technique used to "remove duplication" from early versions of a design (if you aren't caught in the tar pit of "triangulation").

I don't find it entirely satisfactory because... well, because it focuses the design on what I feel should be an intermediate stage.  This demonstration never reaches the point where we are hiding (in the Parnas sense) the implementation details from the test; that idea just wasn't a thing when the book was written (and probably still isn't, but it's my windmill, dammit).
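For what it's worth, working backwards from the assert looks something like this (Invoice is a throwaway example of my own):

    # 1. Write the assert first:        assert invoice.total() == 1245
    # 2. Where does invoice come from?  invoice = Invoice([("widget", 995), ("gadget", 250)])
    # Reordered, the finished test reads top to bottom:
    def test_invoice_total():
        invoice = Invoice([("widget", 995), ("gadget", 250)])  # prices in cents
        assert invoice.total() == 1245

    class Invoice:
        # Minimal implementation so the sketch runs; the point is the order
        # in which the *test* was written.
        def __init__(self, items):
            self.items = items

        def total(self):
            return sum(price for _, price in self.items)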

Never use the same constant to mean more than one thing,

This is a sneaky important idea here; fortunately the cost of learning the lesson first hand isn't too dear.
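A small illustration (my example, not the book's) of how a doubled-up constant lets a broken implementation sneak through:

    def area(width, height):
        # Broken implementation: adds instead of multiplies.
        return width + height

    def test_area_with_a_doubled_up_constant():
        # 2 is doing double duty as width and height, and 2 + 2 == 2 * 2,
        # so this test happily passes against the broken code.
        assert area(2, 2) == 4

    def test_area_with_distinct_constants():
        # Distinct values expose the bug immediately (this one fails, as it should).
        assert area(3, 5) == 15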

Evident Data makes me suspicious, because I've been burned by it more than once: broken code that passes broken tests because both components make the same errors translating from domain arithmetic to computer arithmetic.  The idea ("you are writing tests for a reader, not just the computer") is an important one, but its expression as described here has not been universally satisfactory.
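The failure mode, roughly: the test derives its expected value with the same faulty translation as the code, so the two agree and the bug survives (add_vat is a made-up example).

    def add_vat(net_cents):
        # Broken translation from domain arithmetic to computer arithmetic:
        # truncation where the domain calls for rounding.
        return int(net_cents * 1.07)

    def test_add_vat_shares_the_bug():
        net = 1999
        # The test derives its expectation with the *same* translation,
        # so it blesses the wrong answer.
        assert add_vat(net) == int(net * 1.07)

    def test_add_vat_against_a_hand_worked_value():
        # 1999 * 1.07 = 2138.93, which rounds to 2139.  This test fails against
        # the implementation above -- which is exactly what we want it to do.
        assert add_vat(1999) == 2139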

TDDbE: xUnit Retrospective

Beck offers the task of writing an xUnit clone as a possible exercise when learning a new programming language, on the grounds that you will quickly explore "many of the facilities [you] will be using in daily programming."

Given the contents of the earlier chapters, this feels like an invitation to draw the rest of the owl.