Cascade Faliure: September 2023

Thursday, September 28, 2023

TDDbE: How Suite It Is

As his final bow in this section, Beck writes a new test case for TestSuite.

A couple things stand out here.

First, the notion that TestCase/TestSuite is an example of the Composite "design pattern" is not something that is falling out of the test -- it's an insight that Kent Beck has because he has written multiple xUnit implementations already. The TestCase code doesn't currently conform to that pattern because Beck was pretending that he didn't know this.
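For reference, a minimal sketch of the Composite shape in question. The method names mirror the ones used in these notes (testStarted, testFailed, summary); the bodies, and the Sample class, are my own assumptions rather than Beck's listing.

```python
class TestResult:
    """Collects counts; every runnable thing reports into one of these."""

    def __init__(self):
        self.runCount = 0
        self.errorCount = 0

    def testStarted(self):
        self.runCount += 1

    def testFailed(self):
        self.errorCount += 1

    def summary(self):
        return "%d run, %d failed" % (self.runCount, self.errorCount)


class TestCase:
    """Leaf: runs a single named method."""

    def __init__(self, name):
        self.name = name

    def run(self, result):
        result.testStarted()
        try:
            getattr(self, self.name)()
        except Exception:
            result.testFailed()


class TestSuite:
    """Composite: anything with run(result) can be added, including suites."""

    def __init__(self):
        self.tests = []

    def add(self, test):
        self.tests.append(test)

    def run(self, result):
        for test in self.tests:
            test.run(result)


class Sample(TestCase):  # hypothetical test case, for illustration only
    def testPasses(self):
        pass

    def testBroken(self):
        raise Exception("broken on purpose")


inner = TestSuite()
inner.add(Sample("testPasses"))
inner.add(Sample("testBroken"))

outer = TestSuite()
outer.add(inner)                 # a suite nested inside a suite
outer.add(Sample("testPasses"))  # alongside a plain test case

result = TestResult()
outer.run(result)
print(result.summary())  # 3 run, 1 failed
```

The caller only ever sees run(result), whether it is holding one test or a tree of them.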

Because he got this far before "discovering" TestSuite, he has a small pile of work to redo - in this case, nothing high risk (toy problem, he has tests, he understands the change, he didn't let the original implementation stray too far from where it was always going to end up, and so on).

That's the happy version - the change happens before the code really starts to ossify.

What this brings to mind for me is Jim Coplien's observation (Beust claims it is an exact quote, but I haven't been able to verify that via the provided transcript) about YAGNI leading to an architectural meltdown.

Here, we have relatively little investment in the old idea, so the cost of change is pretty trivial. But this example may not be representative of the general case.

Second - are we sure that the design that is emerging here is good? The story ends in sort of an ugly spot - there's a lot of work left to do, although not necessarily any new lessons. Don't confuse "these are the things we do" with "these are the results we settle for".

Which I think is unfortunate, in that one of the communication gaps I see is that people don't share the same understanding of how much "remove duplication" is supposed to happen before you move on.

Possibly interesting exercise: see if you can get from here to your favorite modern Python testing framework without binning this work and starting fresh.

TDDbE: Dealing with Failure

Beck switches into a smaller grained test; this introduces a testFailed message, which gives him the permission that he needs to extract the error count and use general formatting to eliminate the duplication in the test summary message.
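Roughly what the smaller grained test looks like, as a sketch: the test talks to TestResult directly, with no TestCase in the picture. The test function name and the exact message format are assumptions on my part.

```python
class TestResult:
    def __init__(self):
        self.runCount = 0
        self.errorCount = 0

    def testStarted(self):
        self.runCount += 1

    def testFailed(self):
        self.errorCount += 1

    def summary(self):
        # General formatting: neither count is hard coded in the message.
        return "%d run, %d failed" % (self.runCount, self.errorCount)


def failedResultFormatting():
    # The test plays the role of the framework, sending the messages itself.
    result = TestResult()
    result.testStarted()
    result.testFailed()
    return result.summary()


formatted = failedResultFormatting()
print(formatted)  # 1 run, 1 failed
```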

There is a subtlety hidden inside this method.... However, we need another test before we can change the code.

I don't find this section satisfactory at all.

Let's review: in chapter 21, we started working on testFailedResult, which was intended to show that the failure count is reported correctly by TestResult when a broken test is run. That test "fails wrong": it exits on the exception path rather than on the return path. So we "put this test on the shelf for the moment".

We take a diversion to design the TestResult::summary without the TestCase baggage.

All good up to that point - we've got a satisfactory TestResult, and a TestResult::testFailed signal we can use to indicate failures.

So now we unshelve the test that we abandoned in the previous chapter. It fails, but we can make it pass by introducing an except block that invokes TestResult::testFailed.

However, the scope of the try block is effectively arbitrary. It could be fine grained or coarse grained -- Beck's choice is actually somewhere in the middle. TADA, the test passes. Green bar, we get to refactor.

But he's going to make a claim that we can't change the boundaries of the try block without another test...?

I think the useful idea is actually this: the current suite of tests does not include constraints to ensure that exceptions thrown from setUp are handled correctly. So we'd like to have tests in the suite that make that explicit, so it should go into the todo list.
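A sketch of the gap, assuming the try block wraps only the test method (which is how I read the "somewhere in the middle" choice). BrokenSetUp is a hypothetical test case I made up to show the unconstrained path.

```python
class TestResult:
    def __init__(self):
        self.runCount = 0
        self.errorCount = 0

    def testStarted(self):
        self.runCount += 1

    def testFailed(self):
        self.errorCount += 1

    def summary(self):
        return "%d run, %d failed" % (self.runCount, self.errorCount)


class TestCase:
    def __init__(self, name):
        self.name = name

    def setUp(self):
        pass

    def tearDown(self):
        pass

    def run(self, result):
        result.testStarted()
        self.setUp()  # outside the try block: an exception here escapes
        try:
            getattr(self, self.name)()
        except Exception:
            result.testFailed()
        self.tearDown()


class BrokenSetUp(TestCase):  # hypothetical, for illustration only
    def setUp(self):
        raise Exception("setUp blew up")

    def testMethod(self):
        pass


result = TestResult()
try:
    BrokenSetUp("testMethod").run(result)
    escaped = False
except Exception:
    escaped = True

# The failure is not counted; the exception reaches the caller instead.
print(escaped, result.summary())  # True 1 run, 0 failed
```

Moving self.setUp() inside the try would change that output, and nothing in the current suite would notice either way.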

What's missing is the notion of test calibration: we should be able to introduce tests that pass, inject a fault to ensure that the tests can detect it, remove the fault, and get on with it. Of course, if we admit that works, then why not test after...?
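What I mean by calibration, as a toy sketch: the check is run once against the real code and once against a copy with a deliberately injected fault. All of the names here are hypothetical.

```python
def summary(runCount, errorCount):
    return "%d run, %d failed" % (runCount, errorCount)


def faultySummary(runCount, errorCount):
    # Injected fault: the error count is ignored.
    return "%d run, 0 failed" % runCount


def check(summaryFunction):
    """The test itself, parameterized on the implementation it measures."""
    return summaryFunction(1, 1) == "1 run, 1 failed"


passesOnRealCode = check(summary)
detectsTheFault = not check(faultySummary)

# Calibrated: green on the real code, red once the fault is injected.
calibrated = passesOnRealCode and detectsTheFault
print(calibrated)  # True
```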

So I think what's irritating me here is that ceremony is being introduced without really articulating the justifications for it.

Contrast with this message:

I get paid for code that works, not for tests, so my philosophy is to test as little as possible to reach a given level of confidence -- Kent Beck, 2008

Monday, September 18, 2023

TDDbE: Counting

In general, the order of implementing tests is important. When I pick the next test to implement, I find a test that will teach me something and which I have confidence I can make work.

Reminder: step size varies with confidence - when things are clear, we set a faster pace.

Note that this isn't the only possible strategy; the "Transformation Priority Premise" proposes that we want to always make the change that is "closest" to where we are now -- with "closest" being determined by a sort of weighted priority function.

What Beck wants to do is implement tearDown. But it is difficult to test - exception is a bit overloaded here. So instead of pursuing the difficult implementation, he introduces an intermediary -- separating what is complicated from what is difficult.

Here, that takes the form of a data structure to keep track of what is going on within the framework; to begin, just tracking tests run and tests failed.

What caught my eye here: having introduced a trivial fake implementation, he's immediately willing to "remove duplication" in counting the number of tests run - introducing an initialized value, incrementing the value, and so on.

But he does not choose to do the analogous refactoring on the test failed count, instead sticking with the hard coded value. "The tests don't demand it." I'm suspicious of that justification, in so far as with a single test, they didn't demand changing the test count either.
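The asymmetry, sketched. This is my reconstruction of the intermediate state, not a quotation from the book: the run count has become a real counter while the failure count is still a literal in the message.

```python
class TestResult:
    def __init__(self):
        self.runCount = 0  # duplication removed: initialized, then incremented

    def testStarted(self):
        self.runCount += 1

    def summary(self):
        # The failure count is still hard coded; no test fails yet.
        return "%d run, 0 failed" % self.runCount


result = TestResult()
result.testStarted()
print(result.summary())  # 1 run, 0 failed
```

With a single passing test in the suite, a hard coded "1 run, 0 failed" would have satisfied it just as well.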

You might be able to make an argument from the idea that we started with an implicit counter for tests running, but there is no implicit counter for failures (because this test doesn't include the failing test code path) and therefore it's appropriate to make the former counter explicit, but not yet for the latter counter.

On the other hand, you probably can justify it via "make the next change easy".

Another point to note: in this first test, TestResult is not isolated from TestCase -- TestResult::testStarted is invoked by TestCase, not by the test. Beck is including within his observation the behavior of TestResult and also the protocol shared between TestResult and TestCase.

Thursday, September 7, 2023

TDDbE: Cleaning Up After

Doing a refactoring based on a couple of early uses, then having to undo it soon after is fairly common. Some folks wait until they have three or four uses before refactoring because they don't like undoing work. I prefer to spend my thinking cycles on design, so I just reflexively do the refactorings without worrying about whether I will have to undo them immediately afterwards.

I find myself wanting to push back on this a little bit. I don't mind refactoring the measured code on the regular - we learn as we go, and that's fine.

I fret a little bit about refactoring the measuring code; we red-green when the test is new, which gives us additional confidence that the test is measuring what we think it is, but if we are continuing to tinker with the guts of the test we should be recalibrating from time to time.

Which is to say, we red-green on the theory that it's not enough to assume that we're measuring what we think we are, we need additional evidence. It seems inconsistent to assume that we can keep evolving the test on the assumption that it still measures what it did before.

Spying on the calls as the framework runs the test is a clever idea, and of course there's no reason that the telemetry needs to be so complicated as an "object".
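A sketch of the spy with a plain string as the telemetry. The class and method names follow the WasRun example as I understand it; the details are assumptions.

```python
class TestCase:
    def __init__(self, name):
        self.name = name

    def setUp(self):
        pass

    def tearDown(self):
        pass

    def run(self):
        self.setUp()
        getattr(self, self.name)()
        self.tearDown()


class WasRun(TestCase):
    # Each hook appends its own name, so the log records the call order.
    def setUp(self):
        self.log = "setUp "

    def testMethod(self):
        self.log += "testMethod "

    def tearDown(self):
        self.log += "tearDown "


test = WasRun("testMethod")
test.run()
print(repr(test.log))  # 'setUp testMethod tearDown '
```

One string comparison checks both that each hook ran and that they ran in the right order.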

Got myself really twisted up looking at the final code, until I realized that the code in the text doesn't necessarily match the code he's narrating -- the code in my edition of chapter 20 includes some artifacts that won't be introduced until chapter 21.

Nonetheless, I find myself thinking again that the point is to catch mistakes, not to do The Ritual flawlessly.

A git history with each change following along with the text might be an interesting exercise. Or for this specific example, maybe a jupyter notebook kind of thing.

Wednesday, September 6, 2023

TDDbE: Set the Table

When you begin writing tests, you will discover a common pattern...

I'm not a fan of Arrange-Act-Assert myself; (Arrange-Act)-Assert would be better -- we shouldn't entangle taking a measurement with checking that the measurement is satisfactory.

But arrange, wherever you put it, is likely to get re-used; we often have many instances of tests that are using the same constellation of objects to produce a measurement.
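A toy sketch of the split I have in mind, with hypothetical names throughout: the arrangement is shared, each measurement is a function that returns a value, and the check is a separate comparison applied afterwards.

```python
def fixture():
    # Arrange: the shared constellation of objects.
    return [5, 10, 20]


def measureTotal():
    # Arrange-Act: produce a measurement and nothing else.
    return sum(fixture())


def measureLargest():
    return max(fixture())


# Assert: a separate step that judges a measurement already taken.
measurements = {"total": measureTotal(), "largest": measureLargest()}
expected = {"total": 35, "largest": 20}
print(measurements == expected)  # True
```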

And if you are building an xUnit, then your customers are going to expect that the framework will work a particular way, including having implicit facilities for arranging object constellations. So be it.

Tuesday, September 5, 2023

TDDbE: First Steps to xUnit

Hey, we're in python!

We need a little program that will print out true if a test method gets called, and false otherwise.

Two things strike me with this exercise.

First, it reminds me of the practice of working through the "imperative shell" to discover where the boundary is going to be between the testable core and the humble object.

Second, it reminds me that this part of the exercise tends to be notable for its lack of ambition. It would be easy to imagine, for example, what the finished console output of running the test should be, and refactor the test toward that end. Instead, Beck starts with a trivial output, applying the Guru Checks Output pattern on the early runs.
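For a sense of how small that first step is, a sketch of the bootstrap program. I'm using False/True to match the quoted sentence; treat the flag values and the name-based dispatch as my assumptions about the listing.

```python
class WasRun:
    def __init__(self, name):
        self.wasRun = False
        self.name = name

    def testMethod(self):
        self.wasRun = True

    def run(self):
        # Look the test method up by name and invoke it.
        getattr(self, self.name)()


test = WasRun("testMethod")
before = test.wasRun
test.run()
after = test.wasRun

# No assertions yet: a human reads the two lines and decides whether they are right.
print(before)  # False
print(after)   # True
```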

Having created code that produces the correct answer (for this one trivial case), Beck now sets to work refactoring the exercise - beginning the march toward the design that is currently in his head (he may be taking baby steps, but the section title is "The xUnit Example" - he's working toward a specific framework).

We're not really looking at a test "driving" the design, at least not yet. Right now, the test is just the ratchet, ensuring that we don't lose ground as the code is cleaned up.

Sunday, September 3, 2023

TDDbE: Money Retrospective

The biggest surprise for me in coding the money example was how different it came out this time.

In other words, the design isn't something that emerges organically from the test; the tests and the design co-evolve under the guidance of the programmer. If the programmer has an inspiration, then the programmer writes different tests to constrain the behavior of the implementation, and then the implementation follows.

The tests that we have in this example are not metaphor agnostic, as a review of them will show. They are a sort of long term bet that this metaphor will stick in this code base.

The tests that are a natural by-product of TDD are certainly useful enough to keep running as long as the system is running. Don't expect them to replace the other types of testing.

I'm reminded of Coplien's comments about unit tests and changing signals; we should think carefully about how much we want to invest in keeping these tests running as the system changes. In this toy example, the code isn't finished - we'd likely want to keep the tests around while the system was under development, because that tells us something important about the changes that we are making.

JProbe reports only one line in one method not covered by the test cases -- Money.toString(), which we added explicitly as a debugging aid, not real model code. [emphasis added]

It seems to me that much of the extremism of TDD has built into it the idea that our focus is on model code, rather than on effects (especially true of the "classical" school).

Removing duplication between test and code as a way to drive the design

I'm happy to see this message reinforced again, but I'm not sure it's useful - until you see it, you don't see it, and you can be doing TDD for a long (long) time without seeing it.

TDDbE: Abstraction, Finally

This test is a little ugly, because it is testing the guts of the implementation, not the externally visible behavior of the objects. However, it will drive us to make the changes we need to make....

I think there's a lot to unpack here.

Writing tests that were tightly coupled to the guts of the implementation was something that people did a lot of experimenting with in the early going - how do you "test drive" recursion, for example? One important lesson: tests that overfit their implementations make those implementations harder to change.

(Part of the problem here is that this example is Java circa 2002; generics haven't been added to the language yet, and so designs like Money extends Expression<Money> aren't an option.)

I think "externally visible behavior" is a spelling error - the type of the object, in Java, certainly is externally visible - you can query the type of an object by sending it a message, just like you would for any other query (ex: toString()).

It's not a domain behavior - which is the more important distinction.

I'm somewhat amused by the fact that Beck is reluctant to couple the test to a specific return type, but has no difficulty coupling tests to specific constructors.

On the whole, the overall design strikes me as unsatisfactory; it passes the tests, which ain't nothing, but "remove duplication" has had a rather devastating impact on the design in the simple case - even within these toy examples. It's not clear to me that a good trade has been made.

Saturday, September 2, 2023

TDDbE: Mixed Currencies

We're finally ready to write the real test that motivates this entire problem... and so of course Beck wants to change up the API again.

This is what we'd like to write.

I'm not so sure. What immediately stands out to me is that the proposed test is using mixed vocabulary. We have Money, and Banks (a facade in front of an exchange rate), and Expressions. One of these things is not like the others.

Anyway, the plan is to write the test using the language that will currently work. That test fails, so the implementation gets updated to restore the green bar.

Then, under the protection of Green and the compiler, the interfaces are rewritten into the language of Expressions.

To avoid the ripple effect, we'll start at the edges and work our way back to the test case.

A good reminder that we have the tests so that we can take small steps with immediate feedback, rather than making all of the changes in one chunk.

TDDbE: Change

Describing data with Java is not as easy as it should be, and that's certainly true of Java 2003. What I'd like to see in a currency exchange test is a list/map of exchange rates, a list of amounts, and a target currency -- without necessarily committing to a specific arrangement of that information into "objects".
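Something like the following is what I have in mind, sketched in Python rather than Java. The data layout and the reduceTo function are hypothetical; the numbers ($5 plus 10 CHF at 2 francs to the dollar) are the ones from the mixed currency example.

```python
# The test's inputs as plain data, with no commitment to objects.
rates = {("CHF", "USD"): 2}  # 2 CHF buys 1 USD
amounts = [(5, "USD"), (10, "CHF")]
target = "USD"


def reduceTo(amounts, rates, target):
    """Convert every amount into the target currency and add them up."""
    total = 0
    for amount, currency in amounts:
        rate = 1 if currency == target else rates[(currency, target)]
        total += amount // rate  # integer arithmetic, as in the toy example
    return total


total = reduceTo(amounts, rates, target)
print(total)  # 10
```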

But of course, that isn't what we get - we're in the Kingdom of Nouns, and all of the information needs to be put somewhere.

It feels to me as though we're ending up with tests that are more tightly coupled to internal details than I would like. It's somewhat hard to say: are we working toward tests that describe the published interface of the utilities that are being created? Or are we working toward tests that are intended to describe the workings of the utilities to future maintainers?

In modern Java, we might get a better sense for that by looking at what is exported by the modules, but of course we don't have that advantage here (and perhaps wouldn't anyway - I wouldn't want to be introducing a bunch of module ceremony while the interface is still in flux).

I do think it's interesting to watch Beck try to remove the duplication in Bank::rate. His green bar implementation is a choice between two hard coded values - that's no good, because the hard coded values here duplicate the values in the test. So he takes a guess at how that duplication should be managed, and creates a new Pair class to manage it.

Note that Pair comes into existence without an isolated test - it's an internal detail of how Bank manages its data structures.
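The same shape sketched in Python, where a tuple can stand in for Pair because it already has the equality and hashing that a dictionary key needs. The method names are my assumptions; the identity-rate branch covers the case that the extra failing test mentioned below pins down.

```python
class Bank:
    def __init__(self):
        # Keyed by a (source, to) tuple; an internal detail with no test of its own.
        self._rates = {}

    def addRate(self, source, to, rate):
        self._rates[(source, to)] = rate

    def rate(self, source, to):
        if source == to:
            return 1  # converting a currency to itself needs no stored rate
        return self._rates[(source, to)]


bank = Bank()
bank.addRate("CHF", "USD", 2)
print(bank.rate("CHF", "USD"))  # 2
print(bank.rate("USD", "USD"))  # 1
```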

It also interests me to notice that, in this particular case of introducing a regression during a refactoring, Beck's answer is to add another failing test.

The goal isn't to "follow the rules". The goal is "clean code that works".