Tuesday, May 14, 2024

TDDbE: Test-Driven Development Patterns

 Gotta have patterns if you want to be published to this audience.

Why does test the noun, a procedure that runs automatically, feel different from test the verb, such as poking a few buttons and looking at answers on the screen?

I find this parallel more useful when talking about design the noun vs design the verb, especially within the context of a practice that promises to improve design.

Beck's discussion of "isolated tests" is really twisted up, in that this heading includes two very different properties that he wants:

  • Tests that are order independent
  • Tests that don't overlap (two tests broken implies two problems)

I have seen people get really twisted up on the second property, when (within the context of TDD) it really isn't all that important: if I'm running my tests regularly, then there are only a small number of edits between where I am now and my last known good state; it doesn't "matter" how many tests start failing, because I have tight control over the number of edits that introduced those problems.

A trivial example: I'm refactoring, and I make a change, and suddenly 20 tests are failing.  Disaster!  How long does it take me to get back to a green state?  Well, if I revert the changes I just made, I'm there.  It really doesn't matter whether I introduced one problem or twenty - fixing everything is a single action and easy to estimate.

The case where I care about being able to estimate the number of real problems?  Merge.

Isolating tests encourages you to compose solutions out of many highly cohesive, loosely coupled objects.  I've always heard that this was a good idea....

I'm still suspicious of this claim, as my experience is that it delivers "many" far more often than it delivers either "highly cohesive" or "loosely coupled".

I think of Beck's justifications for the test list as paging information out of (human) memory (I wrote them down in my diary so I wouldn't have to remember).  What I hadn't recalled (perhaps I should have written it down) is that in Beck's version he's not only including tests, but also operations and planned refactorings.  The Canon version ("test scenarios you want to cover") is closer to how I remember it.

Test First: "you won't test after" - Beck's claim here is interesting, in that he talks of the practice as primarily about stress management (the "virtuous cycle"), with the design and scope control as a bit of energy to keep the cycle going.

I need to think more about scope control -- that benefit feels a lot more tangible than those asserted about "design".

I find assert first interesting for a couple of reasons.  First, it seems clear to me that this is the inspiration for TDD-As-If-You-Meant-It.  Second, the bottom up approach feels a lot like the technique used to "remove duplication" from early versions of a design (if you aren't caught in the tar pit of "triangulation").

I don't find it entirely satisfactory because... well, because it focuses the design on what I feel should be an intermediate stage.  This demonstration never reaches the point where we are hiding (in the Parnas sense) the implementation details from the test; that idea just wasn't a thing when the book was written (and probably still isn't, but it's my windmill, dammit.)
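To make "assert first" concrete, here is a compressed sketch of the technique (the domain and all the names are mine, purely illustrative): start from the assertion you wish you could write, then work backwards until the test compiles.

```python
# Step 1: write the assertion first (nothing exists yet):
#     assert reconciled.is_balanced()
# Step 2: ask what object would give us that, and work upward:
#     reconciled = ledger.reconcile(statement_total)
# Step 3: fill in just enough implementation for the test to run.

class Reconciliation:
    def __init__(self, balanced):
        self.balanced = balanced

    def is_balanced(self):
        return self.balanced

class Ledger:
    def __init__(self, entries):
        self.entries = entries

    def reconcile(self, statement_total):
        # Balanced when our entries sum to the statement's total.
        return Reconciliation(sum(self.entries) == statement_total)

ledger = Ledger([100, -40])
reconciled = ledger.reconcile(60)
assert reconciled.is_balanced()
```

The point of the exercise is that the assertion, written first, dictates the shape of the API rather than the other way around.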

Never use the same constant to mean more than one thing.

This is a sneaky important idea here; fortunately the cost of learning the lesson first hand isn't too dear.
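The cheap version of the lesson looks something like this (a hypothetical Dollar class of my own devising, not the book's code): when the same constant plays two roles, 2 + 2 can't tell addition from multiplication.

```python
class Dollar:
    def __init__(self, amount):
        self.amount = amount

    def plus(self, other):
        # Bug: multiplies instead of adds.
        return Dollar(self.amount * other.amount)

# Same constant used twice: 2 + 2 == 2 * 2, so the broken code passes.
assert Dollar(2).plus(Dollar(2)).amount == 4

# Distinct constants expose the bug: 3 + 4 != 3 * 4.
try:
    assert Dollar(3).plus(Dollar(4)).amount == 7
    caught = False
except AssertionError:
    caught = True
print(caught)  # True: the better-chosen constants detect the fault
```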

Evident Data makes me suspicious, because I've been burned by it more than once: broken code that passes broken tests because both components make the same errors translating from domain arithmetic to computer arithmetic.  The idea ("you are writing tests for a reader, not just the computer") is an important one, but its expression as described here has not been universally satisfactory.
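The failure mode, sketched with an example of my own (not from the book): the test derives its expected value through the same faulty domain-to-machine translation as the production code, so both are wrong together and the bar stays green.

```python
def to_cents(dollars):
    # Bug: float truncation -- int(19.99 * 100) is int(1998.999...) == 1998.
    return int(dollars * 100)

# "Evident" expected value derived by the same faulty path: passes anyway.
assert to_cents(19.99) == int(19.99 * 100)

# A literal expected value, worked out in the domain, catches the bug.
try:
    assert to_cents(19.99) == 1999
    caught = False
except AssertionError:
    caught = True
print(caught)  # True: the literal constant detects what the formula hid
```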

TDDbE: xUnit Retrospective

Beck offers the task of writing an xUnit clone as a possible exercise when learning a new programming language, on the grounds that you will quickly explore "many of the facilities [you] will be using in daily programming."

Given the contents of the earlier chapters, this feels like an invitation to draw the rest of the owl.

Sunday, November 19, 2023

"Unit" Tests

A survey of the definition of "unit test", taken from references in my dead tree library.


I call them "unit tests" but they don't match the accepted definition of unit tests very well. -- Kent Beck, Test Driven Development by Example.


Unit testing means that one and only one unit is tested as such. -- Ivar Jacobson, Object-Oriented Software Engineering


Unit testing is the testing we do to show that the unit does not satisfy its functional specification and/or that its implementation structure does not match the intended design structure. -- Boris Beizer, Software Testing Techniques.


Module testing or unit testing is the verification of a single program module, usually in an isolated environment (i.e. isolated from all other modules).  -- Glenford Myers, Software Reliability.


The objective of unit testing is to attempt to determine the correctness and completeness of an implementation with respect to unit requirements and design documentation by attempting to uncover faults.... -- IEEE-1008-1987

Thursday, September 28, 2023

TDDbE: How Suite It Is

 As his final bow in this section, Beck writes a new test case for TestSuite.

A couple things stand out here.

First, the notion that TestCase/TestSuite is an example of the Composite "design pattern" is not something that is falling out of the test -- it's an insight that Kent Beck has because he has written multiple xUnit implementations already.  The TestCase code doesn't currently conform to that pattern because Beck was pretending that he didn't know this.
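The Composite shape he's heading toward, sketched minimally (my reconstruction, not the book's listing): TestSuite and TestCase share the same run(result) signature, so suites can contain cases or other suites transparently.

```python
class TestResult:
    def __init__(self):
        self.run_count = 0

    def test_started(self):
        self.run_count += 1

class TestCase:
    def run(self, result):
        result.test_started()

class TestSuite:
    def __init__(self):
        self.tests = []

    def add(self, test):
        # Accepts cases *or* suites -- the Composite move.
        self.tests.append(test)

    def run(self, result):
        # Same signature as TestCase.run, so nesting is transparent.
        for test in self.tests:
            test.run(result)

inner = TestSuite()
inner.add(TestCase())
outer = TestSuite()
outer.add(TestCase())
outer.add(inner)          # a suite inside a suite

result = TestResult()
outer.run(result)
print(result.run_count)   # 2
```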

Because he got this far before "discovering" TestSuite, he has a small pile of work to redo - in this case, nothing high risk (toy problem, he has tests, he understands the change, he didn't let the original implementation stray too far from where it was always going to end up, and so on).

That's the happy version - the change happens before the code really starts to ossify.

What this brings to mind for me is Jim Coplien's observation (Beust claims it is an exact quote, but I haven't been able to verify that via the provided transcript) about YAGNI leading to an architectural meltdown. 

Here, we have relatively little investment in the old idea, so the cost of change is pretty trivial.  But this example may not be representative of the general case.

Second - are we sure that the design that is emerging here is good?  The story ends in sort of an ugly spot - there's a lot of work left to do, although not necessarily any new lessons.  Don't confuse "these are the things we do" with "these are the results we settle for".

Which I think is unfortunate, in that one of the communication gaps I see is that people don't share the same understanding of how much "remove duplication" is supposed to happen before you move on.

Possibly interesting exercise: see if you can get to your favorite modern Python testing framework without binning this work and starting fresh.

TDDbE: Dealing with Failure

Beck switches into a smaller grained test; this introduces a testFailed message, which gives him the permission that he needs to extract the error count and use general formatting to eliminate the duplication in the test summary message.
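Roughly the shape the refactoring lands on, as I read it (a reconstruction, not the book's exact code): testFailed feeds an error counter, and the summary becomes plain string formatting over the two counts.

```python
class TestResult:
    def __init__(self):
        self.run_count = 0
        self.error_count = 0

    def test_started(self):
        self.run_count += 1

    def test_failed(self):
        self.error_count += 1

    def summary(self):
        # General formatting eliminates the hard-coded summary string.
        return f"{self.run_count} run, {self.error_count} failed"

result = TestResult()
result.test_started()
result.test_failed()
print(result.summary())  # 1 run, 1 failed
```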

There is a subtlety hidden inside this method....  However, we need another test before we can change the code.

I don't find this section satisfactory at all.

Let's review: in chapter 21, we started working on testFailedResult, which was intended to show that the failure count is reported correctly by TestResult when a broken test is run.   That test "fails wrong": it exits on the exception path rather than on the return path.  So we "put this test on the shelf for the moment".

We take a diversion to design the TestResult::summary without the TestCase baggage.

All good up to that point - we've got a satisfactory TestResult, and a TestResult::testFailed signal we can use to indicate failures.

So now we unshelf the test that we abandoned in the previous chapter.  It fails, but we can make it pass by introducing an except block that invokes TestResult::testFailed.

However, the scope of the try block is effectively arbitrary.  It could be fine grained or coarse grained -- Beck's choice is actually somewhere in the middle.  TADA, the test passes.  Green bar, we get to refactor.
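To see what "effectively arbitrary" means here, a sketch (method names assumed, and the coarse-grained variant shown rather than Beck's exact middle ground): moving the try boundary decides which failures get counted.

```python
class TestResult:
    def __init__(self):
        self.run_count = 0
        self.failure_count = 0

    def test_started(self):
        self.run_count += 1

    def test_failed(self):
        self.failure_count += 1

class TestCase:
    def __init__(self, name):
        self.name = name

    def set_up(self):
        pass

    def run(self, result):
        result.test_started()
        # Coarse grained: setUp is inside the try, so a setUp exception
        # counts as a test failure.  Narrow the try to just the method
        # call and the same exception escapes the framework instead.
        try:
            self.set_up()
            getattr(self, self.name)()
        except Exception:
            result.test_failed()

class BrokenSetUp(TestCase):
    def set_up(self):
        raise RuntimeError("setUp blew up")

    def test_something(self):
        pass

result = TestResult()
BrokenSetUp("test_something").run(result)
print(result.failure_count)  # 1 here; 0 (and a crash) with a narrow try
```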

But he's going to make a claim that we can't change the boundaries of the try block without another test...?

I think the useful idea is actually this: the current suite of tests does not include constraints to ensure that exceptions thrown from setUp are handled correctly.  So we'd like to have tests in the suite that make that explicit, so it should go into the todo list.

What's missing is the notion of test calibration: we should be able to introduce tests that pass, inject a fault to ensure that the tests can detect it, remove the fault, and get on with it.  Of course, if we admit that works, then why not test after...?
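Test calibration, compressed into one script for illustration (the workflow is mine; in practice you'd inject the fault into the real code, watch the bar go red, then revert):

```python
def leap_year(y):
    return y % 4 == 0 and (y % 100 != 0 or y % 400 == 0)

def the_test(fn):
    """The test under calibration, parameterized so we can aim it at a fault."""
    try:
        assert fn(2000) is True    # divisible by 400: leap year
        assert fn(1900) is False   # divisible by 100 only: not a leap year
        return "green"
    except AssertionError:
        return "red"

def broken(y):
    # Injected fault: ignores the century rule.
    return y % 4 == 0

assert the_test(leap_year) == "green"   # the test passes...
assert the_test(broken) == "red"        # ...and provably detects the fault
print("calibrated")
```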

So I think what's irritating me here is that ceremony is being introduced without really articulating the justifications for it.

Contrast with this message:

I get paid for code that works, not for tests, so my philosophy is to test as little as possible to reach a given level of confidence -- Kent Beck, 2008

Monday, September 18, 2023

TDDbE: Counting

In general, the order of implementing tests is important.  When I pick the next test to implement, I find a test that will teach me something and which I have confidence I can make work.

Reminder: step size varies with confidence - when things are clear, we set a faster pace.

Note that this isn't the only possible strategy; the "Transformation Priority Premise" proposes that we want to always make the change that is "closest" to where we are now -- with "closest" being determined by a sort of weighted priority function.

What Beck wants to do is implement tearDown.  But it is difficult to test - exception is a bit overloaded here.  So instead of pursuing the difficult implementation, he introduces an intermediary -- separating what is complicated from what is difficult.

Here, that takes the form of a data structure to keep track of what is going on within the framework; to begin, just tracking tests run and tests failed.

What caught my eye here: having introduced a trivial fake implementation, he's immediately willing to "remove duplication" in counting the number of tests run - introducing an initialized value, incrementing the value, and so on.

But he does not choose to do the analogous refactoring on the test failed count, instead sticking with the hard coded value.  "The tests don't demand it."  I'm suspicious of that justification, in so far as with a single test, they didn't demand changing the test count either.
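The asymmetric intermediate state, as I read it (my reconstruction, not the book's listing): the run count is a real counter, while the failure count is still the hard-coded fake.

```python
class TestResult:
    def __init__(self):
        self.run_count = 0              # made explicit: initialize...

    def test_started(self):
        self.run_count += 1             # ...and increment

    def summary(self):
        # The failure count is still a fake: "the tests don't demand it."
        return f"{self.run_count} run, 0 failed"

result = TestResult()
result.test_started()
print(result.summary())  # 1 run, 0 failed
```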

You might be able to make an argument from the idea that we started with an implicit counter for tests running, but there is no implicit counter for failures (because this test doesn't include the failing test code path) and therefore it's appropriate to make the former counter explicit, but not yet for the latter counter.

On the other hand, you probably can justify it via "make the next change easy".

Another point to note: in this first test, TestResult is not isolated from TestCase -- TestResult::testStarted is invoked by TestCase, not by the test.  Beck is including within his observation the behavior of TestResult and also the protocol shared between TestResult and TestCase.

Thursday, September 7, 2023

TDDbE: Cleaning Up After

Doing a refactoring based on a couple of early uses, then having to undo it soon after is fairly common.  Some folks wait until they have three or four uses before refactoring because they don't like undoing work.  I prefer to spend my thinking cycles on design, so I just reflectively do the refactorings without worrying about whether I will have to undo them immediately afterwards.

I find myself wanting to push back on this a little bit.  I don't mind refactoring the measured code on the regular - we learn as we go, and that's fine.  

 I fret a little bit about refactoring the measuring code; we red-green when the test is new, which gives us additional confidence that the test is measuring what we think it is, but if we are continuing to tinker with the guts of the test we should be recalibrating from time to time.

Which is to say, we red-green on the theory that it's not enough to assume that we're measuring what we think we are, we need additional evidence.  It seems inconsistent to assume that we can keep evolving the test on the assumption that it still measures what it did before.

Spying on the calls as the framework runs the test is a clever idea, and of course there's no reason that the telemetry needs to be so complicated as an "object".
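The spying idea with the simplest possible telemetry: a log string on the test double, appended at each lifecycle step (my condensed sketch of the book's WasRun technique; the lifecycle is driven directly here rather than through the framework).

```python
class WasRun:
    """A test double that records the order of its own lifecycle calls."""

    def __init__(self):
        self.log = ""

    def set_up(self):
        self.log += "setUp "

    def test_method(self):
        self.log += "testMethod "

    def tear_down(self):
        self.log += "tearDown "

    def run(self):
        self.set_up()
        self.test_method()
        self.tear_down()

test = WasRun()
test.run()
# One string comparison verifies the whole calling sequence.
assert test.log == "setUp testMethod tearDown "
print(test.log)
```

A plain string is enough: asserting on it checks both that each step ran and that the steps ran in the right order.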

Got myself really twisted up looking at the final code, until I realized that the code in the text doesn't necessarily match the code he's narrating -- the code in my edition of chapter 20 includes some artifacts that won't be introduced until chapter 21.

Nonetheless, I find myself thinking again that the point is to catch mistakes, not to do The Ritual flawlessly.

A git history with each change following along with the text might be an interesting exercise.  Or for this specific example, maybe a Jupyter notebook kind of thing.