Living documentation can be readable and fast

In an earlier blog I promised to describe how we could exercise thin slices of our application stack, while still expressing our scenarios in a business-readable, end-to-end style. I talked about this at Cuke Up! last week and published an article covering it in the ACCU journal Overload. For completeness, I’m now adding this as a blog entry too.

Let’s assume we have the example scenario below, taken from Matt Wynne’s Squeaker example, that deals with registering at some website. (This is written in Gherkin and will be executed by Cucumber.)

Feature: Sign Up
 Scenario: New user redirected to their own page
  When I sign up for a new account
  Then I should be taken to my feeds page
  And I should see a greeting message

Each line in the scenario causes a corresponding ‘step definition’ to be executed. How you implement the step definitions is up to you, and depends on the nature of your system. The example above might:

  • fire up a browser, navigate to the relevant URL, enter specific data, click the submit button and check the contents of the page that the browser is redirected to
  • call a method on a Registration object and check that the expected event is fired, with the correct textual payload
  • or anything else that makes sense (e.g. using smart card authentication or retina scanning)

The point is that the text in the example describes the behaviour, while the step definitions (the glue code) specify how to exercise the system. An example glue method would be:

@When("I sign up for a new account")
public void I_sign_up_for_new_account() {
 // Do whatever it takes to sign up for a new account
 // e.g. exercise the web UI using Selenium WebDriver
}

Newcomers to this style of working often execute every example as an end-to-end test. End-to-end tests mimic the behaviour of the entire system: they create an example’s context by interacting directly with the UI, and the full application stack (databases, app servers, etc.) is involved throughout. This sort of test is very useful for verifying that an application has deployed correctly, but can become quite a bottleneck if you use it to validate every behaviour of the system. The Testing Pyramid was created to give a visual hint about the relative number of ‘thick’ end-to-end tests and ‘thin’ unit tests. In the middle are the component/integration tests that verify interactions within a subset of the entire system.

It may be reasonable to use the example scenario above as a ‘Happy Path’ end-to-end test, demonstrating that the whole application is hanging together. However, there are some other situations that emerged when this feature was discussed, some of which were:

  • what happens if the user already exists?
  • what happens if the credentials provided are unacceptable?
  • how will errors be communicated to the user?

These questions are still independent of how the system is actually going to be implemented, and we can start fleshing out some examples:

Scenario: Duplicate user registration
 Given I already have an account
 When I sign up for a new account
 Then I should see the "User already exists" error message

Scenario: Unacceptable credentials at signup
 Given my credentials are unacceptable
 When I sign up for a new account
 Then I should see the "Unacceptable credentials" error message

These extra examples could be implemented using the whole application stack, but then the runtime of the example suite begins to rise as we execute more end-to-end tests. Instead, we could decompose these examples into:

1. examples that demonstrate that the correct feedback is given to the user in various circumstances

Scenario Outline: Display correct error message
 When the registration component returns an <error>
 Then the correct <message> should be returned
 Examples:
 | error | message |
 | error-code-user-already-exists | "User already exists" |
 | error-code-unacceptable-credentials | "Unacceptable credentials" |

2. examples that exercise the validation components

Scenario: Detect duplicate user
 Given user already exists
 When the registration component tries to create the user
 Then it will return error-code-user-already-exists

Scenario: Unacceptable credentials at signup
 Given the credentials are unacceptable
 When the registration component tries to create the user
 Then it will return error-code-unacceptable-credentials

These examples should run a lot faster, but are no longer written in business language. (If you want an explanation of Scenario Outline, look at the Cucumber documentation.) They have lost some of their benefit and have become technical tests, mainly of interest to the development team. If we choose to ‘push them down’ into the unit test suite, where they seem to belong, then we will have lost some important documentation that was meaningful to the business stakeholders.
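To make the trade-off concrete, here is a rough sketch of what those checks might look like once pushed down into plain Java. Everything in it is an assumption made for illustration: the `RegistrationResult` enum, the `RegistrationComponent` class and the eight-character password rule are hypothetical and are not taken from Squeaker or any real codebase.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical error codes, mirroring the names used in the scenarios above.
enum RegistrationResult {
    OK(""),
    USER_ALREADY_EXISTS("User already exists"),
    UNACCEPTABLE_CREDENTIALS("Unacceptable credentials");

    private final String message;

    RegistrationResult(String message) {
        this.message = message;
    }

    // The text that would be shown to the user for this result.
    String message() {
        return message;
    }
}

// Hypothetical registration component; the password rule is an invented placeholder.
class RegistrationComponent {
    private final Set<String> existingUsers = new HashSet<>();

    RegistrationResult createUser(String username, String password) {
        if (existingUsers.contains(username)) {
            return RegistrationResult.USER_ALREADY_EXISTS;
        }
        if (password == null || password.length() < 8) {
            return RegistrationResult.UNACCEPTABLE_CREDENTIALS;
        }
        existingUsers.add(username);
        return RegistrationResult.OK;
    }
}

// Small driver so the sketch can be run on its own.
class RegistrationDemo {
    public static void main(String[] args) {
        RegistrationComponent registration = new RegistrationComponent();
        System.out.println(registration.createUser("alice", "correct-horse"));
        System.out.println(registration.createUser("alice", "correct-horse").message());
        System.out.println(registration.createUser("bob", "short").message());
    }
}
```

Checks like these are quick to run, but nothing in them reads like a user signing up, which is exactly the documentation loss described above.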

This demonstrates the conflict between keeping the examples in a form that is consumable by non-technical team members and managing the runtime of the executable examples. Teams that have ignored this issue and allowed their example suite to grow have seen runtimes that are counted in hours rather than minutes. Clearly this limits how quickly feedback can be obtained, and has led teams to try different approaches, none of which is ideal:
– partition the example suite and only run small subsets regularly
– speed up execution through improved hardware or parallel execution
– push some tests into the technical (unit test) suite

In a recent blog post I introduced the Testing Iceberg, which takes the traditional Testing Pyramid and introduces a readability waterline. This graphically shows that some technical tests can be made visible to the business, while there are some end-to-end tests that the business are not interested in. We want to implement our business examples in such a way that they:
– document everything relevant to the business
– do not duplicate technical tests
– minimise the execution time of the examples

I have been experimenting with a technique that uses Cucumber’s Tagged Hooks to vary the depth of the stack exercised by a scenario without affecting the scenario’s readability. The scenario below is tagged as executing without the UI:

@without_ui
Scenario: Duplicate user registration
  Given I already have an account
  When I sign up for a new account
  Then I should see the "User already exists" error message

This tag causes the following tagged hook to execute before the scenario runs:

@Before("@without_ui")
public void beforeScenario() {
  without_ui = true;
}

This in turn changes the behaviour of our re-written step definition:

@When("I sign up for a new account")
public void I_sign_up_for_new_account() {
  if (without_ui) {
    // Send information directly to registration component
  } else {
    // Drive UI directly using Selenium or similar.
  }
}

The benefits of working like this are:
– we can write our examples from a user perspective (which makes it easy for the business to understand)
– we can execute the examples as thinner component or unit style tests (which keeps the runtime down)
– we can avoid duplication by using the glue to delegate directly to the unit tests where appropriate
– we can run the examples using the whole application stack and begin to thin down the stack using tags once we have built some trust in our initial implementation.
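As suggested in the comments below, the if statement in the step definition could be replaced by a polymorphic call, with the hook choosing which implementation to use. The following is only a sketch in plain Java with no Cucumber dependency. The names `SignUpDriver`, `UiSignUpDriver`, `DirectSignUpDriver` and `SignUpSteps` are hypothetical, and the drivers return a string where real code would drive a browser or call the registration component. It also simplifies the hook: here a single method receives the scenario's tags, whereas a Cucumber tagged hook only runs for scenarios carrying its tag.

```java
import java.util.Set;

// Hypothetical abstraction over how a sign-up is performed.
interface SignUpDriver {
    String signUp(String username);
}

// Stands in for driving the browser with Selenium WebDriver or similar.
class UiSignUpDriver implements SignUpDriver {
    @Override
    public String signUp(String username) {
        return "signed up " + username + " via UI";
    }
}

// Stands in for sending information directly to the registration component.
class DirectSignUpDriver implements SignUpDriver {
    @Override
    public String signUp(String username) {
        return "signed up " + username + " via component";
    }
}

// Plain-Java model of the step definition class.
class SignUpSteps {
    private SignUpDriver driver = new UiSignUpDriver(); // full stack by default
    private String lastOutcome;

    // Models the hook: called before each scenario with that scenario's tags.
    void beforeScenario(Set<String> tags) {
        driver = tags.contains("@without_ui")
                ? new DirectSignUpDriver()
                : new UiSignUpDriver();
    }

    // Models the step "When I sign up for a new account"; no if/else is needed here.
    void iSignUpForNewAccount() {
        lastOutcome = driver.signUp("new-user");
    }

    String lastOutcome() {
        return lastOutcome;
    }
}

// Small driver so the sketch can be run on its own.
class SignUpDemo {
    public static void main(String[] args) {
        SignUpSteps steps = new SignUpSteps();
        steps.beforeScenario(Set.of("@without_ui"));
        steps.iSignUpForNewAccount();
        System.out.println(steps.lastOutcome());
    }
}
```

With this shape, the step definition stays the same however many stack depths are supported. Adding another depth would mean adding a driver and a tag, not another branch.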

It is the business who should prioritise how to evolve a product, based on their understanding of the customers’ needs. Face-to-face communication between the business and the development team can help develop a ubiquitous language that can be used to document the behaviour of the system in a manner that is clear and unambiguous to all concerned. The examples that are produced during these conversations can then be automated, but there is an ongoing tension between the comprehensibility of end-to-end scenarios and the quick feedback of unit tests. Using Cucumber and tags, it is possible to write the examples in an end-to-end style, but modify how they are executed (and hence their runtime costs) by applying or removing tags, without adversely affecting the comprehensibility of the example itself.






6 responses to “Living documentation can be readable and fast”

  1. Jessica

    How would you handle running that test both without AND with the UI? If the tag is always there, won’t you always be testing it without the UI, and therefore never use the else section?

  2. Seb Rose

    The piece of code that contains the if … else section is the step definition. It can be used from multiple scenarios. Some scenarios will have the without_ui tag, some won’t. Those without the tag will go through the UI, those with the tag won’t.

    The idea is that, as trust builds, we can decrease the number of scenarios that depend on thick slices of the application stack. This improves run times, while keeping the living documentation comprehensive.

  3. Dan Haywood

    Hi Seb,
    you might remember, we met at AOTB last year in Cornwall. I enjoyed watching this talk, and it inspired me to implement something similar in Apache Isis (now a TLP:

    I just committed some code in support of this, with a demonstration of how to use it in one of our examples (a hackneyed todo app). The Isis code is mirrored in github, so you can take a look-see at

    The interesting bit is the @unit and @integration tags, and the corresponding @Before annotations in the underlying step definitions. These delegate to a ScenarioExecution class, basically (if I have the terminology right) an equivalent of a “World” object. The tags cause the appropriate subclass of ScenarioExecution to be instantiated; see and

    One slight ugliness: I had to introduce a supportsMocks() guard because attempting to set JMock expectations on “real” domain services throws an exception that cannot be caught because it is thrown in the ExpectationBuilder.

    Anyway, would be interested in your thoughts.


  4. Roberto Lo Giacco

    While I appreciate the idea and the rationale, I have to say I dislike the implementation, because the if statement introduces unnecessary complexity into the step def. My suggestion is to replace the if with a polymorphic method invocation.

    1. Seb Rose

      I agree with you.

      I used an if statement because I thought it made the example clearer, but that may have been a bad decision.
