When I write a test that is too big, I first try to learn the lesson. Why was it too big? What could I have done differently that would have made it smaller? ... I delete the offending test and start over.
"Delete and start over" is a power move, and it will later lead Beck to explore test && commit || revert (TCR).
This reminds me of the Mikado Method: we might systematically reset and explore smaller goals until we reach an improvement we can actually make now, and then work our way back up the graph until the code is ready for the big test.
I think you have to be fairly far along in the design before you start running into this sort of thing. When the design is shallow, you can usually just drop in a guard clause and an early return, which gets the test passing, and then it is "just" a matter of integrating the new logic into the rest of the design. But when the current output is the result of information crossing many boundaries, getting the guard clause to work can require rework across several of them.
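To make the "shallow design" case concrete, here is a minimal sketch of the guard-clause move. The function `shipping_cost` and its "small domestic parcels ship free" rule are hypothetical, invented only for illustration: a new test demands the special case, and a guard clause with an early return satisfies it without touching the existing logic.

```python
def shipping_cost(weight_kg: float, destination: str) -> float:
    # Hypothetical rule added to pass a new test:
    # domestic parcels under 1 kg ship free.
    # The guard clause returns early, so the existing logic below is untouched.
    if destination == "domestic" and weight_kg < 1.0:
        return 0.0

    # Existing (hypothetical) pricing logic: flat base rate plus a per-kg charge.
    base = 5.0 if destination == "domestic" else 15.0
    return base + 2.0 * weight_kg


# The new test, now passing:
assert shipping_cost(0.5, "domestic") == 0.0
# The old behavior, unchanged:
assert shipping_cost(2.0, "domestic") == 9.0
```

Folding the special case into the pricing logic proper is the later "integration" step. In a deeper design, where `weight_kg` and `destination` would have been computed several layers away, there may be no single place to put this guard.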
How do you test an object that relies on an expensive or complicated resource?
Oh dear.
OK, context: Beck is writing before Gerard Meszaros developed the pattern language of Test Doubles. He's repurposing the label "mock object" introduced in the Endo-Testing paper, but it isn't actually a great fit. It would be better to talk about test stubs (Binder, 1999), or test doubles if you want to be understood by a modern audience.
Anyway, we get a couple of different examples of stubs/doubles that you might find useful.
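For readers who want the modern vocabulary in code, here is a minimal sketch of a hand-rolled test stub standing in for an expensive resource. The names `RateSource`, `Converter`, and `StubRateSource` are hypothetical and are not taken from Beck's examples; the assumption is that the real `RateSource` would call some slow remote service.

```python
from __future__ import annotations

from typing import Protocol


class RateSource(Protocol):
    # Hypothetical interface; a production implementation might query a remote service.
    def rate(self, currency: str) -> float: ...


class Converter:
    # The object under test depends on the RateSource interface,
    # not on any particular (expensive) implementation of it.
    def __init__(self, rates: RateSource) -> None:
        self._rates = rates

    def to_usd(self, amount: float, currency: str) -> float:
        return amount * self._rates.rate(currency)


class StubRateSource:
    """Test stub: returns canned answers instead of touching the real resource."""

    def __init__(self, canned: dict[str, float]) -> None:
        self._canned = canned

    def rate(self, currency: str) -> float:
        return self._canned[currency]


def test_to_usd_uses_the_supplied_rate() -> None:
    converter = Converter(StubRateSource({"EUR": 2.0}))
    assert converter.to_usd(10.0, "EUR") == 20.0
```

The design choice that makes this work is the seam: because `Converter` receives its `RateSource` through the constructor, the test can substitute the stub without patching anything.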
Leaving a test RED to resume at the next session is a pretty good trick, but I think you'll do better to treat that test as a suggestion, rather than a commitment, to resume work at a particular place. The most important item to work on next might change between sessions, and the failing test becomes a minor obstacle if you decide that refactoring is worthwhile.