Sunday, November 19, 2023

"Unit" Tests

A survey of the definition of "unit test", taken from references in my dead tree library.

 

I call them "unit tests" but they don't match the accepted definition of unit tests very well. -- Kent Beck, Test Driven Development by Example.

 

Unit testing means that one and only one unit is tested as such. -- Ivar Jacobson, Object-Oriented Software Engineering

 

Unit testing is the testing we do to show that the unit does not satisfy its functional specification and/or that its implementation structure does not match the intended design structure. -- Boris Beizer, Software Testing Techniques.

 

Module testing or unit testing is the verification of a single program module, usually in an isolated environment (i.e. isolated from all other modules).  -- Glenford Myers, Software Reliability.

 

The objective of unit testing is to attempt to determine the correctness and completeness of an implementation with respect to unit requirements and design documentation by attempting to uncover faults.... -- IEEE-1008-1987

Thursday, September 28, 2023

TDDbE: How Suite It Is

 As his final bow in this section, Beck writes a new test case for TestSuite.

A couple things stand out here.

First, the notion that TestCase/TestSuite is an example of the Composite "design pattern" is not something that is falling out of the test -- it's an insight that Kent Beck has because he has written multiple xUnit implementations already.  The TestCase code doesn't currently conform to that pattern because Beck was pretending that he didn't know this.
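
The Composite shape is small enough to sketch here; this is a reconstruction from memory using the book's names, not Beck's verbatim listing.  The whole trick is that TestSuite and TestCase share the run(result) signature:

```python
# Reconstruction, not verbatim: a suite responds to run(result) just
# like a test case does, so suites can nest cases and other suites.
class TestSuite:
    def __init__(self):
        self.tests = []

    def add(self, test):
        self.tests.append(test)

    def run(self, result):
        for test in self.tests:
            test.run(result)    # each element may be a TestCase or a TestSuite
```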

Because he got this far before "discovering" TestSuite, he has a small pile of work to redo - in this case, nothing high risk (toy problem, he has tests, he understands the change, he didn't let the original implementation stray too far from where it was always going to end up, and so on).

That's the happy version - the change happens before the code really starts to ossify.

What this brings to mind for me is Jim Coplien's observation (Beust claims it is an exact quote, but I haven't been able to verify that via the provided transcript) about YAGNI leading to an architectural meltdown. 

Here, we have relatively little investment in the old idea, so the cost of change is pretty trivial.  But this example may not be representative of the general case.

Second - are we sure that the design that is emerging here is good?  The story ends in sort of an ugly spot - there's a lot of work left to do, although not necessarily any new lessons.  Don't confuse "these are the things we do" with "these are the results we settle for".

Which I think is unfortunate, in that one of the communication gaps I see is that people don't share the same understanding of how much "remove duplication" is supposed to happen before you move on.

Possibly interesting exercise: see if you can get to your favorite modern python testing framework without binning this work and starting fresh.

TDDbE: Dealing with Failure

Beck switches into a smaller grained test; this introduces a testFailed message, which gives him the permission that he needs to extract the error count and use general formatting to eliminate the duplication in the test summary message.

There is a subtlety hidden inside this method....  However, we need another test before we can change the code.

I don't find this section satisfactory at all.

Let's review: in chapter 21, we started working on testFailedResult, which was intended to show that the failure count is reported correctly by TestResult when a broken test is run.   That test "fails wrong": it exits on the exception path rather than on the return path.  So we "put this test on the shelf for the moment".

We take a diversion to design the TestResult::summary without the TestCase baggage.

All good up to that point - we've got a satisfactory TestResult, and a TestResult::testFailed signal we can use to indicate failures.

So now we unshelf the test that we abandoned in the previous chapter.  It fails, but we can make it pass by introducing an except block that invokes TestResult::testFailed.

However, the scope of the try block is effectively arbitrary.  It could be fine grained or coarse grained -- Beck's choice is actually somewhere in the middle.  TADA, the test passes.  Green bar, we get to refactor.
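
For concreteness, the shape of run at this point is roughly the following (abridged and paraphrased from memory, not Beck's verbatim listing):

```python
# Paraphrased sketch of TestCase.run at this stage, abridged:
def run(self):
    result = TestResult()
    result.testStarted()
    self.setUp()                  # outside the try block...
    try:
        method = getattr(self, self.name)
        method()                  # ...which guards only the test method itself
    except:
        result.testFailed()
    return result
```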

But he's going to make a claim that we can't change the boundaries of the try block without another test...?

I think the useful idea is actually this: the current suite of tests does not include constraints to ensure that exceptions thrown from setUp are handled correctly.  So we'd like to have tests in the suite that make that explicit, so it should go into the todo list.

What's missing is the notion of test calibration: we should be able to introduce tests that pass, inject a fault to ensure that the tests can detect it, remove the fault, and get on with it.  Of course, if we admit that works, then why not test after...?
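
A sketch of what that calibration might look like here - BrokenSetUp is my invention, not the book's, and it presumes the book's TestCase and this chapter's summary format:

```python
# Hypothetical calibration test for the setUp code path.
class BrokenSetUp(TestCase):
    def setUp(self):
        raise Exception("injected fault")

    def testMethod(self):
        pass

# today this blows up at run() -- the setUp exception escapes the try block;
# the constraint we want is that it gets counted instead:
result = BrokenSetUp("testMethod").run()
assert "1 run, 1 failed" == result.summary()
```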

So I think what's irritating me here is that ceremony is being introduced without really articulating the justifications for it.

Contrast with this message:

I get paid for code that works, not for tests, so my philosophy is to test as little as possible to reach a given level of confidence -- Kent Beck, 2008

Monday, September 18, 2023

TDDbE: Counting

In general, the order of implementing tests is important.  When I pick the next test to implement, I find a test that will teach me something and which I have confidence I can make work.

Reminder: step size varies with confidence - when things are clear, we set a faster pace.

Note that this isn't the only possible strategy; the "Transformation Priority Premise" proposes that we want to always make the change that is "closest" to where we are now -- with "closest" being determined by a sort of weighted priority function.

What Beck wants to do is implement tearDown.  But it is difficult to test - exception is a bit overloaded here.  So instead of pursuing the difficult implementation, he introduces an intermediary -- separating what is complicated from what is difficult.

Here, that takes the form of a data structure to keep track of what is going on within the framework; to begin, just tracking tests run and tests failed.

What caught my eye here: having introduced a trivial fake implementation, he's immediately willing to "remove duplication" in counting the number of tests run - introducing an initialized value, incrementing the value, and so on.

But he does not choose to do the analogous refactoring on the test failed count, instead sticking with the hard coded value.  "The tests don't demand it."  I'm suspicious of that justification, in so far as with a single test, they didn't demand changing the test count either.
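
Paraphrasing the asymmetry (not Beck's verbatim code):

```python
# Paraphrased: the run counter is initialized and incremented, while the
# failure count stays hard coded in the summary string.
class TestResult:
    def __init__(self):
        self.runCount = 0

    def testStarted(self):
        self.runCount = self.runCount + 1

    def summary(self):
        return "%d run, 0 failed" % self.runCount   # "the tests don't demand it"
```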

You might be able to make an argument from the idea that we started with an implicit counter for tests running, but there is no implicit counter for failures (because this test doesn't include the failing test code path) and therefore it's appropriate to make the former counter explicit, but not yet for the latter counter.

On the other hand, you probably can justify it via "make the next change easy".

Another point to note: in this first test, TestResult is not isolated from TestCase -- TestResult::testStarted is invoked by TestCase, not by the test.  Beck is including within his observation the behavior of TestResult and also the protocol shared between TestResult and TestCase.


Thursday, September 7, 2023

TDDbE: Cleaning Up After

Doing a refactoring based on a couple of early uses, then having to undo it soon after is fairly common.  Some folks wait until they have three or four uses before refactoring because they don't like undoing work.  I prefer to spend my thinking cycles on design, so I just reflexively do the refactorings without worrying about whether I will have to undo them immediately afterwards.

I find myself wanting to push back on this a little bit.  I don't mind refactoring the measured code on the regular - we learn as we go, and that's fine.  

 I fret a little bit about refactoring the measuring code; we red-green when the test is new, which gives us additional confidence that the test is measuring what we think it is, but if we are continuing to tinker with the guts of the test we should be recalibrating from time to time.

Which is to say, we red-green on the theory that it's not enough to assume that we're measuring what we think we are, we need additional evidence.  It seems inconsistent to assume that we can keep evolving the test on the assumption that it still measures what it did before.

Spying on the calls as the framework runs the test is a clever idea, and of course there's no reason that the telemetry needs to be so complicated as an "object".
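
Concretely, in the spirit of the book's WasRun: the "telemetry" is just a string that the lifecycle methods append to, and a single assertion checks the recorded sequence.

```python
# In the spirit of the book's WasRun: a string log as spy telemetry.
class WasRun(TestCase):
    def setUp(self):
        self.log = "setUp "

    def testMethod(self):
        self.log = self.log + "testMethod "

    def tearDown(self):
        self.log = self.log + "tearDown "

# the test then asserts on the whole recorded sequence, e.g.
#     assert "setUp testMethod tearDown " == test.log
```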

Got myself really twisted up looking at the final code, until I realized that the code in the text doesn't necessarily match the code he's narrating -- the code in my edition of chapter 20 includes some artifacts that won't be introduced until chapter 21.

Nonetheless, I find myself thinking again that the point is to catch mistakes, not to do The Ritual flawlessly.

A git history with each change following along with the text might be an interesting exercise.  Or for this specific example, maybe a jupyter notebook kind of thing.

 

Wednesday, September 6, 2023

TDDbE: Set the Table

 When you begin writing tests, you will discover a common pattern...

I'm not a fan of Arrange-Act-Assert myself; (Arrange-Act)-Assert would be better -- we shouldn't entangle taking a measurement with checking that the measurement is satisfactory.
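
A sketch of the split I have in mind - the Account class and the names here are mine, purely illustrative:

```python
# Illustrative only: keep taking the measurement separate from judging it.
class Account:
    def __init__(self):
        self.balance = 0

    def deposit(self, amount):
        self.balance += amount

def measured_balance():
    account = Account()       # arrange
    account.deposit(100)      # act
    return account.balance    # the measurement, with no judgment attached

def test_deposit():
    assert measured_balance() == 100   # the decision rule, kept separate
```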

But arrange, wherever you put it, is likely to get re-used; we often have many instances of tests that are using the same constellation of objects to produce a measurement.

And if you are building an xUnit, then your customers are going to expect that the framework will work a particular way, including having implicit facilities for arranging object constellations.  So be it.


Tuesday, September 5, 2023

TDDbE: First Steps to xUnit

 Hey, we're in python!

We need a little program that will print out true if a test method gets called, and false otherwise.

Two things strike me with this exercise.

First, it reminds me of the practice of working through the "imperative shell" to discover where the boundary is going to be between the testable core and the humble object.

Second, it reminds me that this part of the exercise tends to be notable for its lack of ambition.  It would be easy to imagine, for example, what the finished console output of running the test should be, and refactor the test toward that end.  Instead, Beck starts with a trivial output, applying the Guru Checks Output pattern on the early runs.
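
Roughly the opening move, as I remember it (paraphrased, not verbatim):

```python
# Paraphrased from memory: a flag, a runner, and two prints to eyeball.
class WasRun:
    def __init__(self, name):
        self.wasRun = None

    def testMethod(self):
        self.wasRun = 1

    def run(self):
        self.testMethod()

test = WasRun("testMethod")
print(test.wasRun)    # expect None
test.run()
print(test.wasRun)    # expect 1 -- the guru checks the output by eye
```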

Having created code that produces the correct answer (for this one trivial case), Beck now sets to work refactoring the exercise - beginning the march toward the design that is currently in his head (he may be taking baby steps, but the section title is "The xUnit Example" - he's working toward a specific framework).

We're not really looking at a test "driving" the design, at least not yet.  Right now, the test is just the ratchet, ensuring that we don't lose ground as the code is cleaned up.

Sunday, September 3, 2023

TDDbE: Money Retrospective

The biggest surprise for me in coding the money example was how different it came out this time.

In other words, the design isn't something that emerges organically from the test; the tests and the design co-evolve under the guidance of the programmer.  If the programmer has an inspiration, then the programmer writes different tests to constrain the behavior of the implementation, and then the implementation follows.

The tests that we have in this example are not metaphor agnostic, as a review of them will show.  They are a sort of long term bet that this metaphor will stick in this code base.

The tests that are a natural by-product of TDD are certainly useful enough to keep running as long as the system is running.  Don't expect them to replace the other types of testing.

I'm reminded of Coplien's comments about unit tests and changing signals; we should think carefully about how much we want to invest in keeping these tests running as the system changes.  In this toy example, the code isn't finished - we'd likely want to keep the tests around while the system was under development, because that tells us something important about the changes that we are making.

JProbe reports only one line in one method not covered by the test cases -- Money.toString(), which we added explicitly as a debugging aid, not real model code. [emphasis added]

It seems to me that much of the extremism of TDD has built into it the idea that our focus is on model code, rather than on effects (especially true of the "classical" school).

Removing duplication between test and code as a way to drive the design

I'm happy to see this message reinforced again, but I'm not sure it's useful - until you see it, you don't see it, and you can be doing TDD for a long (long) time without seeing it.

 

TDDbE: Abstraction, Finally

This test is a little ugly, because it is testing the guts of the implementation, not the externally visible behavior of the objects.  However, it will drive us to make the changes we need to make....

I think there's a lot to unpack here.

Writing tests that were tightly coupled to the guts of the implementation was something that people did a lot of experimenting with in the early going - how do you "test drive" recursion, for example?  One important lesson: tests that overfit their implementations make those implementations harder to change.

(Part of the problem here is that this example is Java circa 2002; generics haven't been added to the language yet, and so designs like Money extends Expression<Money> aren't an option.)

I think "externally visible behavior" is a spelling error - the type of the object, in Java, certainly is externally visible - you can query the type of an object by sending it a message, just like you would for any other query (ex: toString()).

It's not a domain behavior - which is the more important distinction.
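
The distinction, sketched with a throwaway stand-in (Money here is two lines of mine, not the book's class):

```python
# Both assertions are "externally visible"; only one is about the domain.
class Money:
    def __init__(self, amount):
        self.amount = amount

def test_couples_to_the_type():
    assert type(Money(10)) is Money    # visible, but an implementation detail

def test_couples_to_the_domain():
    assert Money(10).amount == 10      # visible, and something finance cares about
```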

I'm somewhat amused by the fact that Beck is reluctant to couple the test to a specific return type, but has no difficulty coupling tests to specific constructors.

On the whole, the overall design strikes me as unsatisfactory; it passes the tests, which ain't nothing, but "remove duplication" has had a rather devastating impact on the design in the simple case - even within these toy examples.  It's not clear to me that a good trade has been made.


Saturday, September 2, 2023

TDDbE: Mixed Currencies

We're finally ready to write the real test that motivates this entire problem... and so of course Beck wants to change up the API again.

This is what we'd like to write.

I'm not so sure.  What immediately stands out to me is that the proposed test is using mixed vocabulary.  We have Money, and Banks (a facade in front of an exchange rate), and Expressions.  One of these things is not like the others.

Anyway, the plan is to write the test using the language that will currently work.  That test fails, so the implementation gets updated to restore the green bar.

Then, under the protection of Green and the compiler, the interfaces are rewritten into the language of Expressions.

To avoid the ripple effect, we'll start at the edges and work our way back to the test case.

A good reminder that we have the tests so that we can take small steps with immediate feedback, rather than making all of the changes in one chunk.

TDDbE: Change

Describing data with Java is not as easy as it should be, and that's certainly true of Java 2003.  What I'd like to see in a currency exchange test is a list/map of exchange rates, a list of amounts, and a target currency -- without necessarily committing to a specific arrangement of that information into "objects".

But of course, that isn't what we get - we're in the Kingdom of Nouns, and all of the information needs to be put somewhere.
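
Sketching what I mean in python terms (toy data of my own, not the book's): notice the tuple key quietly doing the job that Beck's Pair class does below.

```python
# Illustrative data-first version: rates keyed by (from, to) tuples.
rates = {("CHF", "USD"): 0.5}
amounts = [(5, "USD"), (10, "CHF")]
target = "USD"

total = sum(amount * (1 if currency == target else rates[(currency, target)])
            for amount, currency in amounts)
assert total == 10
```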

It feels to me as though we're ending up with tests that are more tightly coupled to internal details than I would like - it's somewhat hard to say: are we working toward tests that are describing the published interface of the utilities that are being created? or are we working toward tests that are intended to describe the workings of the utilities to future maintainers?

In modern Java, we might get a better sense for that by looking at what is exported by the modules, but of course we don't have that advantage here (and perhaps wouldn't anyway - I wouldn't want to be introducing a bunch of module ceremony while the interface is still in flux).

I do think it's interesting to watch Beck try to remove the duplication in Bank::rate.  His green bar implementation is a choice between two hard coded values - that's no good, because the hard coded values here duplicate the values in the test.  So he takes a guess at how that duplication should be managed, and creates a new Pair class to manage it.

Note that Pair comes into existence without an isolated test - it's an internal detail of how Bank manages its data structures.

It also interests me to notice that, in this particular case of introducing a regression during a refactoring, Beck's answer is to add another failing test.

The goal isn't to "follow the rules".  The goal is "clean code that works".


Tuesday, August 29, 2023

TDDbE: Make It

 We can't mark our test for $5 + $5 done until we've removed all of the duplication.

Huh.  Because I have really felt like we've marked earlier tests as done before the duplication is removed.

We don't have code duplication, but we do have data duplication....

My experience is that people often miss this duplication - not necessarily in the long term, but certainly in the short term.  Data duplication gives you "permission" to do a lot of the refactoring early (after first test), rather than deferring it until later.

In this case, why 10? because there are two 5s -- great, so "let's make the code say that".

Instead... ugh, Beck's example really seems to be going off of the rails here.  Sums with augends and addends?  who ordered that?

In passing, I do think it is important to notice that the details of the implementation are bleeding into the tests.  These are programmer tests - they are there to measure the interactions of the internal elements in our module.  We aren't restricted to only testing via the "published" API.

Wrote a test to force the creation of an object we expected to need later.

This has been a peeve for a long time - a lot of the mythology of test driven development, especially early on, centered the fact that the design arrives organically from the needs of the tests.  At the same time, practitioners were constantly guessing at designs ("I'm going to need a class that does/represents mumble") and then creating a bunch of scaffolding around their guess, rather than attacking the problem that they actually have in front of them right now.

I don't think there's anything wrong, necessarily, with having a mental model of how the code should read in the end, and creating a sequence of tests to get there.  I do think, though, that if that's what we are doing we really ought to be upfront about it.

 

Monday, August 28, 2023

TDDbE: Addition, Finally

Beck introduces the idea of rewriting the TODO list as both a warmup exercise, and as permission to clean up the little items first rather than copying them to the new list.

I'm not sure how to write the story of the whole addition, so we'll start with a simpler example

Again, the lesson of shifting to smaller steps as a counter measure to uncertainty; accompanied by my usual interjection that baby-footin' is optional, not mandatory.  Have the skill to work at multiple cadences and shift between them, then use the cadence that is appropriate at the time.

I hope you will see through this how TDD gives you control over the size of the steps.

Vindication!

TDD can't guarantee that we will have flashes of insight at the right moment.  However, confidence-giving tests and carefully factored code give us preparation for insight, and preparation for applying that insight when it comes.

That said, I think that testSimpleAddition really goes sideways.  I think there are two flaws here: first, that the metaphors aren't very good.  Second, the experimenting with the metaphor is something that should emerge during refactoring, not during the red task.

What we should be doing is taking the metaphor we understand (adding dollars to dollars), and making that work, then refactoring the underlying implementation to find the working metaphor (encoding into the design our improved understanding of the proper way to think), then lifting the new metaphor into the published API and finally (assuming it makes sense) sunsetting the dollars to dollars metaphor.

In other words:

for each desired change, make the change easy (warning: this may be hard), then make the easy change -- Kent Beck, 2012

This is not a fair fight: I'm writing 10+ years after Beck shared this approach, whereas the work I'm criticizing was written ten years before.  If he waited to write the book until he understood everything thoroughly, the book would not exist for me to critique.

Part of the point of this review exercise is to rediscover where we were, in the expectation that we will find past practices that should be improved.


Thursday, August 24, 2023

TDDbE: The Root of All Evil

A quick lesson on how to get rid of a pair of classes that shouldn't have been introduced in the first place.

One thing stands out here: he doesn't do a lot of work to verify that the tests he wants to get rid of are redundant.  Think, zap, done.  And here, that's the right thing (toy problem, not too much to keep in your head, and of course it happens that the tests actually are unnecessary).

Of course, if we are deleting tests that are passing, there's no problem in the sense that the production code is still right.  The potential difficulty would be if we continue to assume that some part of the code is covered by tests when in fact it is not, and then start working on that code expecting to be afforded the same protections we had when the test was in place, and start making mistakes.

Sunday, August 20, 2023

TDDbE: Interesting Times

 Whoa! Code without a test?  Can you do that?

This is a strange little digression, on two points.  

First, the error messages for the equality tests are something that should have been observed when the equality tests were created during RED.  That would have been the obvious time to notice the messages and make improvements to them.  Perhaps addressing the issue then would have made the earlier chapter more confusing.

Second, the change to the output message doesn't actually clarify the problem - Kent calls attention to this in the text, of course, but I'm amused that the extra work thrown in at this point made the failure harder to recognize, rather than easier.

Kent decides to back out the change and start over: here, he calls it the conservative approach.  Fifteen years later, he'll introduce test && commit || revert -- a way of committing to the conservative approach.

With the tests you can decide whether an experiment would answer the question faster.  Sometimes you should just ask the computer.

 I like the experiment language quite a bit.

 I'm beginning to feel that the main lesson up to this point is something along the lines of: be flexible until you know what you are doing.

Friday, August 18, 2023

TDDbE: Times We're Livin' In

The answer in my practice is that I will entertain a brief interruption, but only a brief one, and I will never interrupt an interruption.

 That's a useful constraint right there.

This is the kind of tuning you will be doing constantly with TDD.  Are the teeny-tiny steps feeling restrictive?  Take bigger steps.  Are you feeling a little unsure?  Take smaller steps.  TDD is a steering process -- a little this way, a little that way.  There is no right step size, now and forever.

Yet another definition of TDD.  Heh.

As usual, I'm not particularly fond of the currency test; again, it lacks connection to the business needs.  Expressed another way, there's no assurance that the test, as written, is a good example for future clients because we aren't actually looking at the needs of the client.  We're just sort of assuming that we're going to end up with micro-methods eventually anyway, so we might as well proceed to them without passing go.


Sunday, August 13, 2023

TDDbE: Makin' Objects

The two subclasses of Money aren't doing enough work to justify their existence, so we would like to eliminate them.

Minor complaint: that's a pretty clear conclusion to reach as soon as you start thinking about them.  Again, my charitable interpretation is that we are witnessing a demonstration of how to climb out of a hole that you probably shouldn't have dug in the first place.

And that's perhaps concerning, in so far as we could have had a lot more coupling to the "wrong" interface if we hadn't discovered the improvement as early as we did.

The first step is interesting - replacing the direct invocation of the object constructors with factory methods (aka "named constructors").  That's a good idea often enough that I wouldn't have minded terribly if the "perfect interface" we imagined in chapter one had used them.

Kevlin Henney once observed that if your program is really object oriented, then the only place you would see the name of a concrete class is at the point that the constructor is invoked - everything else would be interfaces or (rarely) abstract classes.
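
The move is easy to sketch, loosely after the book (in the book the factory methods initially return the Dollar and Franc subclasses; I've collapsed that here):

```python
# Loose sketch of named constructors hiding the concrete classes.
class Money:
    def __init__(self, amount, currency):
        self.amount = amount
        self.currency = currency

    @staticmethod
    def dollar(amount):
        return Money(amount, "USD")

    @staticmethod
    def franc(amount):
        return Money(amount, "CHF")

five = Money.dollar(5)   # no constructor, no concrete class, at the call site
```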

 


Friday, August 11, 2023

TDDbE: Apples and Oranges

We would like to use a criterion that makes sense in the domain of finance, not in the domain of Java's objects. But we don't currently have anything like a currency, and this doesn't seem to be sufficient reason to introduce one.

Working outside in, we'd probably have one.  After all, the multi-currency report shown in Chapter 1 includes ISO 4217 currency codes.  

Many kinds of values will want to have both an amount and unit, and currency code is the logical expression of unit in this domain.

The thought occurs to me: are we floundering about because we don't actually know what we are doing (or are pretending that we don't know what we are doing)?


Thursday, August 10, 2023

TDDbE: Equality for All, Redux

 I tried this already

I'm not entirely sure how I want to interpret this remark.  On the one hand, spikes are a thing, and we might reasonably run some quick experiments in a sandbox to guess which approach we would prefer to use.  On the other hand, cooked examples are a thing; demonstrating a practiced solution is not nearly as compelling as demonstrating an unpracticed solution.

I'm not fond of this demonstration at all.

In order, it's hard to tell at this point if we're supposed to be learning that this is the way to do it, or if this is intended to be some flavor of "even though we started badly, we got there in the end."

Maybe it's a lesson about choice - introducing the complexity of Francs before we've invested too much in Dollars reduces the cost of change, with the penalty of having more "not yet done" things to keep track of?

It's hard for me to shake the notion that the viability of this approach is tightly coupled to the fact that this is a toy problem.

Wednesday, August 9, 2023

TDDbE: Franc-ly Speaking

A copy-paste chapter - I almost feel that I should copy-paste a previous post, just to stay on theme.

The different phases have different purposes.  They call for different styles of solution, different aesthetic viewpoints.

In this case: copy-and-hack is an option.  OK, that doesn't bother me too much.  Why copy and hack instead of some other idea and hack?  Well, it's easy to type?  I suppose that's reason enough.

If we weren't willing to copy-and-hack, I suppose the alternatives would be to refactor toward a more flexible design first, then leverage that into a more dignified introduction of the new functionality.

It's probably also worth noting that one reason copy-and-hack is a wall clock fast option: there's not a lot of code here yet - less than 15 lines if we are willing to ignore the white space.  So 10 out of 10 for changing course while it's still easy to do.

I think the "eliminate duplication" message would be stronger if it didn't carry over from chapter to chapter.

Tuesday, August 8, 2023

TDDbE: Privacy

So having invested in an equals method, Beck now rewrites the multiplication test to use it.  At this point, the tests for both multiplication and equality are completely decoupled from the data structures hidden behind the Dollar facade.

In effect, he takes an approach similar to the "unit test" approach described by Beizer - if Dollar::equals is well tested, then we can have the same confidence in that method that we do in the standard library, and therefore we can use it in testing other methods.

(I wouldn't call Dollar::equals well tested yet, of course.  It still doesn't satisfy the general contract of Object::equals.  "This is a risk we actively manage in TDD.")

In addition, since the tests are only using the facade, the implementation details can be made private, reducing future costs should we decide later that the underlying data representation should change - a la Parnas 1971.

Kent seems to be much happier with the clarity of the multiplication test here, but I'm not certain that this improvement is worth the emphasis.

Fundamentally, my issue is this: Dollar::equals is unmotivated.  When we review the report we have in chapter 1, what do we find?  We need to be able to "display" Dollars, and sum a column of Dollars, and multiply a price (Dollars / share) and number of shares to determine a position.  The report tells us that presentation and arithmetic have business value.

But I see nothing in the report that indicates that logic has value.  Equality isn't a thing that we need to satisfy a customer need right now.

Instead, the demand for equality came about because we wanted a "value object", and because we wanted test design that looked nice.  Which feels to me as though we're chasing our own tail down a rabbit hole.

Expressed another way - by using equality, we've decoupled the current array of tests from the amount.  However, this is unsatisfactory because some representation of amount is precisely the thing we need to get out of the Dollar object when we are producing the report.

The representation in the report is effectively part of the human interface, and we should be, perhaps, reluctant to couple our tests too tightly to that interface ("abstractions should not depend upon details"), but we do need some abstraction of amount to produce a useful report.

A black box that accepts integers and returns booleans does not satisfy.

What this reminds me of: early TDD exercises where the actual business value was delivered at the end, when all of the design-in-the-mind was finally typed into the computer, as opposed to ensuring the value first, then refactoring until there are no more improvements to be made.


Monday, August 7, 2023

TDDbE: Equality for All

If I went through these steps, though, I wouldn't be able to demonstrate the third and most conservative implementation strategy: Triangulation.

And honestly: I kind of wish he hadn't done that.

Some background - early after Kent shared the first draft of the book that eventually became Test Driven Development by Example, an objection was raised: there should be a second test to "drive" the refactoring that Kent had demonstrated in the multiplication example.  Until the second test is introduced, "You Aren't Going to Need It" should apply.  Furthermore, in Kent's approach, during the refactoring task, he makes changes that don't actually preserve the behavior; a function that ignores its inputs is converted into a function that is sensitive to its inputs.
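
For anyone who hasn't seen it, triangulation in miniature (Dollar paraphrased from the book; contrast with the remove-duplication progression sketched in the chapter one post below):

```python
# With only one example -- Dollar(5).times(2).amount == 10 -- a constant
# passes; triangulation adds a second example to force the generalization.
class Dollar:
    def __init__(self, amount):
        self.amount = amount

    def times(self, multiplier):
        return Dollar(10)     # ignores its input, yet green

# a second assertion -- Dollar(5).times(3).amount == 15 -- forces:
#     def times(self, multiplier):
#         return Dollar(self.amount * multiplier)
```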

So a bunch of things all get tangled up here.  And when the dust clears, triangulation ends up with a lot more ink than is strictly justified, remove duplication gets a lot less ink, and a bunch of beginners fall into a trap (confession - I was one who did fall in, back in the day).

Kent's earliest remarks on triangulation emphasize that he doesn't find it particularly valuable.  And certainly he didn't come up with a particularly well motivated example when introducing it.

The metaphor itself isn't awful - sometimes you do need more information to be certain of your course - just over emphasized.

For example: it takes up this entire chapter.

I find the motivation for implementing Dollar::equals to be a little bit underwhelming.

If you use Dollars as the key to a hash table, then you have to implement hashCode() if you implement equals().

This rather suggests that we should be writing a check that demonstrates that Dollar can be used as a hash key.

The "implementation" of Dollar::equals really leaves me twitching (see Effective Java by Joshua Bloch) - this is an override that doesn't match the semantics of the parent class (yet).  I'd want to add the additional checks needed to finish this implementation before moving on to some other more interesting thing.

Sunday, August 6, 2023

TDDbE: Degenerate Objects

  1. Write a test.
  2. Make it run.
  3. Make it right.

Again, "get to green quickly" implying that we are going to transition from the design we have to the design we want via many more much smaller steps.

Beck offers "architecture driven development" as a label for the opposite approach, where implementing the clean design happens before we can know whether the clean design will in fact work the problem.  I don't love the name, and to be honest I'm not sure there's that much advantage in pretending that we don't know what the final design is going to look like.

I might argue as well that "invent the interface you wish you had" sounds to me a lot like solving the clean code part first.  If we can train our brains to do that well, why not the other?  Of course, Kent immediately takes the same view, but reversed:

Our guesses about the right interface are no more likely to be perfect than our guesses about the right implementation.

But fine - sometimes we have to go back and change up the interface -- which is one reason that we implement a new failing test only when there are no other failing tests; we don't want to make extra work for ourselves by over committing when we know the least about our final design.

Something I notice: for this second chapter, Beck extends his original test into a more complicated scenario, rather than introducing a new test.  That's fine, I guess - I'm not sure I remember when disciplined assertions became a thing.

The longer I do this, the better able I am to translate my aesthetic judgments into tests.  When I can do this, my design discussions become much more interesting.

What catches my attention here is the sequence: judgment, test, implementation.  In this example, Kent doesn't wait until a mistake "discovers" that a different design would be easier to work with in the long term, but instead motivates his code change via "that doesn't work how I want it to".  And without a doubt, I think treating this Dollar implementation as an immutable value is reasonable.

But there's certainly no "the test told us to do it" here.  This design change comes about because we're all carrying memories of designs that made things unnecessarily difficult, and we have no particular interest in repeating the experience.

Thursday, August 3, 2023

TDDbE: Multi-Currency Money

 Alright, chapter one, and we're introduced to the process of writing the first test

... we imagine the perfect interface for our operation.  We are telling ourselves a story about how the operation will look from the outside.  Our story won't always come true, but it's better to start from the best-possible application program interface (API) and work backward

Which I find interesting on two fronts - first, because where does this understanding of perfection come from? and second, because immediately after his first example he itemizes a number of ways in which the provided interface is unsatisfactory.

Another aspect to this example that I find unsatisfactory is that the presentation is rather in medias res.  Beck is essentially working from the WyCash context - we already have a portfolio management system, and there's a gap in the middle of that system that we want to fill in.  Or if you prefer, a system where the simpler calculations are in place, but we want to refactor those calculations into an isolated module so that we might more easily make the new changes we want.

So we might imagine that we already have some code somewhere that knows how to multiply `dollars(5)` by `exchangeRate(2)`, and so on, and what we are trying to do is create a better replacement for that code.

I'm not entirely satisfied with this initial interface for this case, however - it rather skips past the parts of the design where we take the information in the form we have it and express it in the form that we need it.  In this case, we're looking at the production of a report, where the inputs are some transient representation of our portfolio position, then the outputs are some transient representation of the report.

In effect, `new Dollar` is a lousy way to begin, because the `Dollar` class doesn't exist yet, so the code we have can't possibly be passing information to us that way.

I don't think it matters particularly much, in the sense that I don't think that the quality of the design you achieve in the end is particularly sensitive to where you start in the solution.  And certainly there are a number of reasons that you might prefer to begin by exploring "how do we do the useful work in memory" before addressing the question of how we get the information we need to the places we need it.

Another quibble I have about the test example (although it took me many years to recognize it) is that we aren't doing a particularly good job about distinguishing between the measurement and the application of the "decision rule" that tells us if the measured value is satisfactory.

Moving on: an important lesson

We want the bar to go green as quickly as possible

The green task should be evaluated in wall clock time - we want less of that, because we want the checks in place when we are making mistakes.

A riddle - should TDD have a bias towards designs that produce quick greens, and if so is that a good thing?

(Isolating the measurement affords really quick greens via guard clauses and early returns.  I'll have to think more on that.)

Once again, I notice that Kent's example racks up four compile errors before he starts working toward the green bar, where "nanocycle TDD" would encourage prioritizing the green bar over finishing the test.  I'm not a big fan of nanocycle, myself, so I like having this evidence in hand when challenged.

We need to generalize before we move on.

You can call it generalizing, or you can call it removing duplication, but please notice that Kent is encouraging that we perform cleanup of the implementation before we introduce another failing test.

(There is, of course, room to argue with some of the labels being used - generalizing can change behaviors that we aren't constraining yet, so is it really "refactoring"?  Beck and Fowler aren't on the same page here - I think Kent addresses this in a later chapter.)

By eliminating duplication before we go on to the next test, we maximize our chances of being able to get the next test running with one and only one change.

Ten years later, this same advice

for each desired change, make the change easy (warning: this may be hard), then make the easy change

How much work we have to do to make the change easy is likely an interesting form of feedback about the design you start with; if it's often a lot of work, maybe our design heuristics need work.

The final part of the chapter is an important lesson on duplication that I don't think really got the traction that it should have.  Here, we have a function that returns 10 -- but there are lots of functions in the world that return 10, so we should do the work to make it clear why this specific function returns 10.  
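
The progression, paraphrased (the book is in Java; python here for consistency with the xUnit chapters):

```python
# Paraphrased progression for Dollar.times:
class Dollar:
    def __init__(self, amount):
        self.amount = amount

    # step 1, green but opaque:       return 10
    # step 2, saying why it's 10:     return 5 * 2
    # step 3, the fives live in the test's data, so derive them instead:
    def times(self, multiplier):
        return self.amount * multiplier
```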

(Heh: only now do I notice that we don't actually reach the multi-currency bits hinted at by the chapter title.)

Wednesday, August 2, 2023

TDDbE: Introduction

The introduction is an interesting story of technical debt: the work on improving the internal quality of the code, and in particular the outsourced `Dollar` object, had been on-going for six months before the arrival of the business opportunity that needed the improved design.

(The debt metaphor first appears in Ward's 1992 experience report, which describes this same portfolio management system).

The transition from testing that computations are within tolerance to testing that computations are exact hides an important lesson: code changes that relocate rounding within a calculation are not refactorings in domains where the calculations are regulated, and you do want tests that are sensitive to those changes if refactoring is a core value.

In a broader sense, we're also getting a statement of good design; a design is good if it is ready to be changed.

I suspect that the story takes place circa 1992, so of course the team isn't doing "TDD", and there's no particular indication that the team is taking a test first approach.  The promise here is really that TDD can bring you to this place that the Wyatt team got to another way (not too different though -- "if you assume all the good ideas in here are Ward's" and all that).

We also get our first look at the two simple rules

  • Write a failing automated test before you write any code.
  • Remove duplication.

The first of these, I think, fails, in the sense that it describes an XP approach where you turn up the dials to eleven to see how it works.  And it works a lot, but it doesn't work eleven - the costs of writing automated tests are not uniform, and expensive tests against unstable specifications are generally a miserable experience for everybody.

Regarding the second rule, it's probably worth noting that, at the time, Kent had a reputation for being something of a wizard at recognizing duplication, and pursuing a sequence of refactorings that would eventually make the duplication obvious to everyone else, and finally taking steps to remove it.

I suspect that there is a similar skill, in which we take two things that look to be the same and improve the design until it becomes clear that they are not, in fact, duplicates.

Remove duplication hints at the four rules of simple design.  One of the questions I'm still carrying around is whether the rules of simple design are part of the TDD bundle -- are we still "doing TDD" if we replace simple design with some other collection of design heuristics?

Monday, July 31, 2023

TDDbE: Preface

Opening paragraph: the goal of Test Driven Development is "clean code that works" - which is a bit of sloganeering describing the stop condition.  TDD is a way to apply our effort to achieve the stop condition.

First promise: the long bug trail after "done" goes away.  Certainly a lot of the faults in complicated logic should have been identified and eliminated, because we will be making sure to design our systems so that testing the complicated logic is cost effective (i.e. relentlessly reducing the coupling between complicated logic and modules that are difficult or expensive to test).  Depending on how much of your system is complex logic, that could be the bulk of your problem right there.

It gives you a chance to learn all of the lessons that the code has to teach you.

This one, I'll need to chew on.  You've got structured time to focus attention, but there's an opportunity cost associated with each of these lessons.

  • Write new code only if an automated test has failed
  • Eliminate duplication

These are two simple rules....

Eliminate duplication being a rule is kind of important, in so far as it pulls one of the pillars of the four rules of simple design into a definition of TDD.  Sandro Mancuso has suggested that the four rules of simple design are separate from TDD, and while I'm receptive to the idea that they should be, the rule listed here means it isn't quite so easy.

A lot of lip service was paid to the "only if" rule early on.  Michael Feathers introduced Humble Dialog before TDDbE was published, and we already had people using test doubles as an isolation mechanism.  I think you can interpret that a lot of different ways - either the rule isn't so absolute, or the rule is absolute but the TDD process isn't sensible for all coding problems.

Running code providing feedback between decisions

This is a riddle I've been thinking on lately: the obvious feedback we get between decisions is red/green bar, announcing whether the values measured in our experiments match the described specifications.  But the feedback for the design decisions - running the code doesn't give us very much there.  Feedback for design decisions must come from somewhere else.

Refactor - Eliminate all of the duplication created in merely getting the test to work.

This is a big one: in my experience, most to nearly all practitioners move on from the refactor phase while duplication is still present.  Back in the day, Beck had a reputation for being something of a wizard in spotting duplication that wasn't obvious and immediately taking steps to eliminate it.

Next, lots of emphasis on implications of reduced defect density.  Just something to notice, possibly in contrast to his earlier writings about test first being primarily an analysis and design technique.

I'm noticing in passing that he isn't making any attempt here to clarify whether he is talking about reduced defect density because defects are being discovered and removed, or reduced defect density because fewer defects are being introduced.

TDD is an awareness of the gap between decision and feedback during programming, and techniques to control that gap.

 One of my two favorite quotations from the book.

Some software engineers learn TDD and then revert to their earlier practices, reserving TDD for special occasions when ordinary programming isn't making progress.

Huh.  If this sentence is written in 2009 - fine, there's been lots of time for people to try it, learn it, and decide it's a sometimes useful tool in the kit.  On the other hand, if that sentence goes back to the earliest drafts, then it becomes a very interesting statement about some of the early adopters.

There certainly are programming tasks that can't be driven solely by tests.  Security software and concurrency, for example, are two topics where TDD is insufficient to mechanically demonstrate that the goals of the software have been met.

 Horses for courses.

Before teeing off on the examples as being too simple

Well, I certainly do do that often enough - mostly because the starting position (money as model code) already elides a bunch of interesting decisions about module boundaries.  And weren't we promising that the tests would "drive" those decisions?


 

Book Club: Test Driven Development by Example

I think I'll try a review of Test Driven Development by Example.  I'll be using the 14th printing (October 2009) paperback.

Friday, January 6, 2023

Schools of Test Driven Development

There are two schools of test driven development.  There is the school that believes that there are two schools; and there is the school that believes that there is only one school.


The school of "only one school" is correct.

As far as I can tell, "Chicago School" is an innovation introduced by Jason Gorman in late 2010.  Possibly this is a nod to the fact that Object Mentor was based in Evanston, Illinois.

"Chicago School" appears to be synonymous with "Detroit School", a term proposed by Michael Feathers in early 2009.  Detroit, here, because the Chrysler Comprehensive Compensation team was based in Detroit; and the lineage of the "classical" TDD practices could be traced there.

Feathers brought with that proposal the original London School label, for practices more readily influenced by innovations at Connextra and London Extreme Tuesday Club.

Feathers was proposing the "school" labels, because he was at that time finding that the "Mockist"  and "Classicist" labels were not satisfactory.

The notion that we have two different schools of TDDer comes from Martin Fowler in 2005.

This is the point where things get muddled - to wit, was Fowler correct to describe these schools as doing TDD differently, or are they instead applying the same TDD practices to a different style of code design?

 Steve Freeman, writing in 2011, offered this observation

...There are some differences between us and Kent. From what I remember of implementation patterns, in his tradition, the emphasis is on the classes. Interfaces are secondary to make things a little more flexible (hence the 'I' prefix). In our world, the interface is what matters and classes are somewhere to put the implementation.

In particular, there seems (in the London tradition) to be an emphasis on the interfaces between the test subject and its collaborators.

I've been recently reviewing Wirfs-Brock/Wilkerson's description of the Client/Server model.

Any object can act as either a client or a server at any given time

As far as I can tell, everybody has always been in agreement about how to design tests that evaluate whether the test subject implements its server responsibilities correctly.

But for the test subject's client responsibilities?  Well, you either ignore them, or you introduce new server responsibilities to act as a proxy measure for the client responsibilities (reducing the problem to one we have already solved), or you measure the client responsibilities directly.
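
That third option is the interesting one; a minimal sketch with the standard library (Publisher and Subscriber here are hypothetical names of mine, not from any of the cited sources):

```python
# Measuring a client responsibility directly: assert on the outgoing
# message rather than on returned state.
from unittest.mock import Mock

class Publisher:
    def __init__(self, subscriber):
        self.subscriber = subscriber

    def publish(self, event):
        self.subscriber.receive(event)   # the client responsibility

def test_publisher_notifies_its_subscriber():
    subscriber = Mock()
    Publisher(subscriber).publish("event")
    subscriber.receive.assert_called_once_with("event")
```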

Mackinnon 2008 reported that extending the test subject with testing responsibilities was "compromising too far", and John Nolan had therefore challenged the team to seek other approaches.

Reflecting in 2010, Steve Freeman observed: 

The underlying issue, which I only really understood while writing the book, is that there are different approaches to OO. Ours comes under what Ralph Johnson calls the "mystical" school of OO, which is that it's all about messages. If you don't follow this, then much of what we talk about doesn't work well. 

Similarly, Nat Pryce:

There are different ways of designing how a system is organised into modules and the interfaces between those modules....  Mock Objects are designed for test-driving code that is modularised into objects that communicate by "tell, don't ask" style message passing.

My take, today, is still in alignment with the mockists: the TDD of the London school is the same TDD as everybody else: controlling the gap between decision and feedback, test first with red green refactor, and so on.

The object designs are different, and so we also see differences in the test design - because tests should be fit for purpose.