Unit tests are your specification

Recently a Schalk Cronjé forwarded me a tweet from Joshua Lewis about some unit tests he’d written.

I took a quick look and thought I may as well turn my comments into a blog post. You can see the full code on github.

Comment 1 – what a lot of member variables

Why would we use member variables in a test fixture? The fixture is recreated before each test, so it’s not to communicate between the tests (thankfully).

In this case it’s because there’s a lot of code in the setup() method (see comment 2) that initialises them, so that they can be used by the actual tests.

At least it’s well laid out, with comments and everything. If you like comments – and guess what – I don’t. And they are wrapped in a #region so we don’t even have to look at them, if our IDE understands it properly.

Comment 2 – what a big setup() you have

I admit it, I don’t like setup()s – they move important information out of the test, damaging locality of reference, and forcing me to either remember what setup happened (and my memory is not good) or to keep scrolling to the top of the page. Of course I could use a fancy split screen IDE and keep the setup() method in view too, but that just seems messy.

Why is there so much in this setup()? Is it all really necessary? For every test? Looking at the method, it’s hard to tell. I guess we’ll find out.

Comment 3 – AcceptConnectionForUnconnectedUsersWithNoPendingRequestsShouldSucceed

Ok, so I’m an apostate – I prefer test names to be snake_case. I just find it so much easier to parse.

The name, though, is pretty good. Very descriptive, although I’m still not sure what ShouldSucceed really means.

The problem starts when I try to relate the comment (C):

///GIVEN User1 exists AND User2 exists AND they are not connected AND User1 has requested to Connect to User2

with the name of the test (N), which incorporates the text: WithNoPendingRequests

and the code in the test (T):

Given(user1Registers, user2Registers, user1RequestsConnectionToUser2);
AndEventsSavedForAggregate<User>(user1Id, user1Registered, connectionRequestedFrom1To2, connectionCompleted);
AndEventsSavedForAggregate<User>(user2Id, user2Registered, connectionRequestFrom1To2Received, connectionAccepted);

So, the name (N) says that:

  1. the users should not be connected, and
  2. they (?) should have no pending requests [this is probably a cut and paste error]

The comment (C) says that:

  1. the users should not be connected, and
  2. user1 has requested a connection to user2

And the test (T) says:

  1. nothing explicit about whether they are already connected
  2. nothing explicit about pending requests [although there is a request connection command, which is implicitly pending]
  3. plenty of assertions on events that have nothing directly to do with identifying a successful connection

Why is any of this a problem? Well, in the first place the tests (a.k.a. specification) should be consistent and easy to understand. These tests are way better than many that I see, but still it’s worth thinking about how they can be even better.

And, secondly, tests should only assert on the behaviour that they are actually interested in. Over-specifying a test makes it brittle in the face of change, and one thing we can do without is brittle test suites.

Criticism is easy – is there an alternative?

  1. Use builders to create instances at the point you need them.
  2. Use methods on the builder to express attributes that are important for the behaviour being validated in the test.
  3. Only assert on the event(s) that are directly related to the behaviour being validated in the test (that may require the writing of more tests)

This is one way you could write it in Gherkin, using Roger (the requester) and Andrea (the accepter):

Given Roger and Andrea are not connected
And Roger has made a connection request to Andrea
When Andrea accepts the connection request
Then Roger and Andrea are connected

So here’s my re-write. Note that there are utility classes/methods that would need to be written to allow this to compile:

public void AcceptingConnectionRequestFromUnconnectedUserShouldSucceed
  User roger = userBuilder.withNoConnections().build();
  User andrea = userBuilder.withNoConnections().build();


  assertThatConnectionExistsBetween(roger, andrea);

I’ve dropped the Given/When/Then format, because, though it was concise, I don’t find that it adds much when we’re at the implementation level. In fact, the very cleverness that allows a list of commands or events to be handled by the G/W/T doesn’t work for me – I find that it obscures what’s going on.

Instead, I’ve stuck to old school arrange/act/assert structure delineated by white space. I’ve posited the existence of a user builder object and a user class for use in the test code. Is this an overhead? Sure, but it will get used over and over again, and it localises the behaviour in a single place, so that when the flow of events changes, or new flows are identified, there’s only a single place in the code that needs maintenance.

I’ve also suggested the withNoConnections() method – which will likely be a no-op – to emphasise that being unconnected is important. You may consider this overkill in this situation, since there’s it’s unlikely that two freshly created users will be connected. I prefer to be explicit about these things.

A question I’m still left with is “how should we implement assertThatConnectionExistsBetween()“. My initial thought would be that it would check that the connectionCompleted() event had been recorded, but that really only checks that the event has been emitted, not that the connection has actually taken place. Without digging deeper into the domain it’s hard to know which approach is more appropriate.



, , ,




9 responses to “Unit tests are your specification”

  1. Grzegorz Gałęzowski Avatar

    “The fixture is recreated before each test, so it’s not to communicate between the tests (thankfully).”

    From what I see, the tests are in NUnit. AFAIK, NUnit does not recreate the fixture for each test, at least in version 2.x (don’t know about 3.0). That’s probably one of the reasons for setup – to reassign those fields with new values for each test as otherwise, values from the previous one would be used. This said, it only makes your points stronger.

    By the way, I find GIVEN/WHEN/THEN comments in test code useful, because they help new people remember that they should not write monster tests. As we always have someone new in the team, we always use them 😉

    1. Administrator Avatar

      I didn’t realise that about the NUnit fixture lifecycle – how horrible.

      And on the subject of GWT helping keep tests small – I’ve seen some pretty monster Gherkin in my time. I think pairing and/or peer review are probably a better way to get new team members aligned with your code philosophy.

      1. Grzegorz Gałęzowski Avatar

        You’re right on the second one. Two comments:
        1. Like I said, we use those comments to help, not to make it happen ;-). That said, they serve as a gentle reminder. All the things you said are true, however, I saw being explicit about a test consisting from three (or four, if we follow Meszaros) phases helps people getting used to thinking in terms of those phases. As a side not, I lost my faith in any process or convention being able to lead people the right path. Over the last years, I have seen so much creativity working around limitations that were going to “inspire to do the good thing”, that I think I got much closer to understanding what “people and interactions over processes and tools” means 😉
        2. Usually, we say to people that they are not allowed to have “given” and “when” after “then”. In gherkin, this is “legal”. So yes, we’re back to the education stuff 😉

  2. Joshua Lewis Avatar

    Hi Seb, I am the author of the code 🙂 Thanks for your comments. Re your comments 1 & 2, relating to the number of fields and the big Setup(), the reason is simply to reduce duplication (step 3 in TDD cycle). Those variables are used in several of the tests, with the exact same definition (construction). Instead of repeating the code for every test case, I extracted them into class fields. Setup() is there merely to construct the objects. Since https://github.com/joshilewis/lending/commit/e949dd860e24ac46bef3cf50d8c230ed6c5316d8 all test data is in a single class: DefaultTestData. The tests are much cleaner now: https://github.com/joshilewis/lending/blob/experiment/eventsourcing/Tests/Commands/AcceptConnectionTests.cs

    1. Administrator Avatar

      Now the tests are much cleaner, but the offending code has been moved elsewhere. Now there is less locality of reference – to see what’s going on I need to look in two separate files.

      1. Joshua Lewis Avatar

        I agree, I’m not sure which the best trade-off/decision is here. Improved locality of reference with duplication?

  3. Joshua Lewis Avatar

    I need to point out that the system is designed with event sourcing, CQRS and DDD.

    With regard to the rest, thanks for catching the error in the test name, I have corrected this to: AcceptConnectionForUnconnectedUsersWithAPendingRequestShouldSucceed

    I agree with your comment about ‘ShouldSucceed’ and ‘Succeed’ not being obvious. My thoughts on this are that the name may become too long (itself a smell). There is a convention in the codebase that ‘Succeed’ means ‘A success result is sent to the user, and the correct domain events are raised and persisted’. Each feature spec has exactly one ‘ShouldSucceed’ scenario.

    ‘And, secondly, tests should only assert on the behaviour that they are actually interested in. Over-specifying a test makes it brittle in the face of change, and one thing we can do without is brittle test suites.’ I’ve tried very hard to make these tests non-brittle, i.e. robust. The intent of the test is to specify the system from outside the system boundaries, i.e. by specifying interactions with all 3rd parties, starting from a user issuing a command, to the persistence of domain events, and persistence of read models. Each test has only 2 types of asserts: asserting the result sent to the user, and asserting the persistence of domain events. Since some commands affect more than one aggregate, the tests for those commands assert persistence of domain events for all aggregates affected. Note that all tests in this codebase hit an embedded eventstore client, as well as a live Postgresql instance. I don’t see this as over-specification. I’d like to understand why you do.

    In fact, the test you suggest as an improvement is more brittle in my opinion, since you’ve specified a lot more implementation. For example, the line ‘ roger.requestsConnectionWith(andrea);’ implies that we have some kind of User Aggregate. The only implementation the original tests specify is the implementation of commands, events, and queries, which are actually defined as part of the domain. A user or product owner would be able to work with the original tests without knowing anything about the system implementation. My thought process is this: “We’ve specified that users aren’t connected. How do we know this? Because there’s no AcceptConnection Command issued by any user.” That’s a conversation you have with a domain expert, using the Ubiquitous Language for the domain. This reminds me of a Tweet I read recently: https://twitter.com/mike_stockdale/status/675330704721350656 – if you include the name of the SUT in the test method, its not ‘true’ TDD, since there shouldn’t be *any* implementation in the test.

    ‘My initial thought would be that it would check that the connectionCompleted() event had been recorded, but that really only checks that the event has been emitted, not that the connection has actually taken place. ‘ In event sourcing, asserting the recording of an event is the *only* way to assert this, since aggregates are built up from history when they’re manipulated. Aggregates are not persisted, only events are.

    1. Seb Rose Avatar

      I don’t think the behavioural tests should care that your implementation uses CQRS or Event Sourcing. I think they should specify and assert externally observable behaviour (although I grant you that the meaning of ‘external’ depends on where you are looking from).

      I disagree that ‘roger.requestsConnectionWith(andrea) says anything about implementation, aggregates or otherwise. It describes the behaviour of an actor interacting with the system being developed. On the other hand, mentioning commands, events and queries explicitly in the tests is demonstrably exposing the implementation. These tests will break if the events emitted change in any way, even if the specified behaviour (connection succeeding) is still implemented correctly.

      Finally, I understand that events are what is persisted, but they aren’t what’s visible externally. What’s visible externally is that the two users are now connected and that’s what I’d want to check.

Leave a Reply

Your email address will not be published. Required fields are marked *