From Unit Testing by Vladimir Khorikov
This part covers:
• The two schools of unit testing: classical and London
• The differences between unit, integration, and end-to-end tests
For an introduction to unit testing, see part 1.
How the classical and London schools handle dependencies
The differences between the classical and London schools were discussed in part 1, but so you don’t have to flip back and forth, here’s a table that lays them out.
Note that despite the ubiquitous use of test doubles, the London school still allows for utilizing some dependencies in tests as is. The litmus test here is whether this dependency is mutable. It’s fine not to substitute objects that don’t ever change — immutable objects.
And you saw in the earlier examples that, when I refactored the tests towards the London style, I didn’t replace the Product instances with mocks but rather used the real objects, as shown in the following code (I copied it from listing 2 in part 1 for your convenience).
[Fact]
public void Purchase_fails_when_not_enough_inventory()
{
    var storeMock = new Mock<IStore>();
    storeMock
        .Setup(x => x.HasEnoughInventory(Product.Shampoo, 5))
        .Returns(false);    // the store reports insufficient inventory
    var customer = new Customer();
    bool success = customer.Purchase(storeMock.Object, Product.Shampoo, 5);
    Assert.False(success);
    storeMock.Verify(x => x.RemoveInventory(Product.Shampoo, 5), Times.Never);
}
Of the two dependencies of Customer, only Store contains an internal state that can change over time. The Product instances are immutable (Product itself is a C# enum). For this reason, I substituted the Store instance only.
And it makes sense if you think about it. You wouldn’t use a test double for the number 5 in the above test either, would you? That’s because it’s also immutable: you can’t possibly modify this number. Note that I’m not talking about a variable containing the number, but rather about the number itself. In the statement RemoveInventory(Product.Shampoo, 5), we don’t even use a variable; 5 is declared right away. The same is true for Product.Shampoo.
Such immutable objects are called value objects or values. Their main trait is that they have no individual identity; they’re identified solely by their content. As a corollary, if two such objects have the same content, it doesn’t matter which of them you’re working with: these instances are interchangeable. For example, if you’ve got two 5 integers, you can use them in place of one another.
The same is true for the products in our case. You can declare an instance of Product.Shampoo once and reuse it, or declare several of them; it wouldn’t make any difference. These instances have the same content and can be used interchangeably. You can read more about value objects in my article Entity vs Value Object: the ultimate list of differences (https://enterprisecraftsmanship.com/2016/01/11/entity-vs-valueobject-the-ultimate-list-of-differences/).
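As a minimal sketch of what "identified solely by their content" means in C#, the snippet below compares two values with the same content and two mutable objects with the same content. Product follows the article (an enum); InventoryRecord and ValueObjectDemo are hypothetical names introduced only for this illustration.

```csharp
// Product follows the article: a C# enum, so every instance is an immutable value.
public enum Product { Shampoo, Book }

// Hypothetical mutable class, included only for contrast with the values above.
public sealed class InventoryRecord
{
    public int Quantity { get; private set; }
    public InventoryRecord(int quantity) { Quantity = quantity; }
    public void Remove(int amount) { Quantity -= amount; }
}

public static class ValueObjectDemo
{
    // Values are compared by content, so two "copies" are interchangeable.
    public static bool ValuesWithSameContentAreEqual()
    {
        Product first = Product.Shampoo;
        Product second = Product.Shampoo;
        int five = 5;
        int anotherFive = 5;
        return first == second && five == anotherFive;
    }

    // A class without custom equality is compared by reference (identity),
    // so two instances with the same content are still distinct objects.
    public static bool MutableInstancesWithSameContentAreEqual()
    {
        var first = new InventoryRecord(10);
        var second = new InventoryRecord(10);
        return first.Equals(second);
    }
}
```

The first method returns true and the second returns false, which is why substituting a value with a test double buys nothing, while a mutable object is a distinct thing whose state a test may need to control.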
Figure 1 shows the categorization of dependencies and how both schools of unit testing treat them.
A dependency can be either shared or private. A private dependency, in turn, can be either mutable or immutable. In the latter case, it’s called a value object. For example, a database is a shared dependency: its internal state is shared across all automated tests (that don’t substitute it with a test double). A Store instance is a private dependency which is mutable, and a Product instance (or an instance of the number 5, for that matter) is an example of a private dependency which is immutable, a value object.
The classical school advocates replacing shared dependencies with test doubles, whereas the London school advocates replacing private dependencies as well, as long as they’re mutable.
Collaborator vs dependency
Note that earlier, I used the term collaborator when describing the classes other than the system under test itself, and later switched to dependency. Now we can reconcile the two terms. A collaborator is a dependency which is either shared or mutable. For example, a class providing access to the database is a collaborator because the database is a shared dependency. Store is a collaborator too, as its state can change over time. Product and the number 5 are dependencies too, but they aren’t collaborators; they’re values or value objects. A typical class may work with dependencies of both types: collaborators and values. Look at the method call below for example:
customer.Purchase(store, Product.Shampoo, 5)
Three dependencies are here, of which one (store) is a collaborator, and two (Product.Shampoo, 5) aren’t.
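To make the split between collaborators and values concrete, here is a self-contained sketch of the Purchase call exercised in both styles. The bodies of Store and Customer are assumptions (the article shows only how they are called), and StoreSpy is a hypothetical hand-written test double used in place of the Moq mock so that the snippet has no external dependencies.

```csharp
using System.Collections.Generic;

public enum Product { Shampoo, Book }

public interface IStore
{
    bool HasEnoughInventory(Product product, int quantity);
    void RemoveInventory(Product product, int quantity);
}

// Assumed implementation: a private, mutable collaborator.
public sealed class Store : IStore
{
    private readonly Dictionary<Product, int> _inventory = new Dictionary<Product, int>();

    public void AddInventory(Product product, int quantity)
    {
        _inventory[product] = GetInventory(product) + quantity;
    }

    public int GetInventory(Product product)
    {
        int quantity;
        return _inventory.TryGetValue(product, out quantity) ? quantity : 0;
    }

    public bool HasEnoughInventory(Product product, int quantity)
    {
        return GetInventory(product) >= quantity;
    }

    public void RemoveInventory(Product product, int quantity)
    {
        _inventory[product] = GetInventory(product) - quantity;
    }
}

// Assumed implementation of the system under test.
public sealed class Customer
{
    public bool Purchase(IStore store, Product product, int quantity)
    {
        if (!store.HasEnoughInventory(product, quantity)) return false;
        store.RemoveInventory(product, quantity);
        return true;
    }
}

// Hypothetical hand-written double: returns a canned answer and counts calls.
public sealed class StoreSpy : IStore
{
    private readonly bool _hasEnough;
    public int RemoveInventoryCalls { get; private set; }
    public StoreSpy(bool hasEnough) { _hasEnough = hasEnough; }
    public bool HasEnoughInventory(Product product, int quantity) { return _hasEnough; }
    public void RemoveInventory(Product product, int quantity) { RemoveInventoryCalls++; }
}

public static class PurchaseExamples
{
    // Classical style: the real Store is used; the outcome is checked via its state.
    public static bool ClassicalStyle()
    {
        var store = new Store();
        store.AddInventory(Product.Shampoo, 10);
        bool success = new Customer().Purchase(store, Product.Shampoo, 5);
        return success && store.GetInventory(Product.Shampoo) == 5;
    }

    // London style: the collaborator is replaced; the values are passed as-is.
    public static bool LondonStyle()
    {
        var storeSpy = new StoreSpy(false);
        bool success = new Customer().Purchase(storeSpy, Product.Shampoo, 5);
        return !success && storeSpy.RemoveInventoryCalls == 0;
    }
}
```

In both methods, Product.Shampoo and 5 are used directly; only the store argument differs between the styles.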
And the last point about the dependency types: not all out-of-process dependencies fall into the category of shared dependencies. A shared dependency almost always resides outside the application’s process, but the opposite isn’t true (see figure 5). In order for an out-of-process dependency to be shared, it has to provide a means for unit tests to communicate with each other. The communication is done through modifications of the dependency’s internal state. In that sense, an immutable out-of-process dependency doesn’t provide such a means. The tests can’t modify anything in it, so they can’t interfere with each other’s execution context.
For example, if there’s an API somewhere that returns a catalog of all products the organization sells, this wouldn’t be a shared dependency as long as the API doesn’t expose the functionality to change the catalog. It’s true that such a dependency is volatile and sits outside the application’s boundary, but because the tests can’t affect the data it returns, it isn’t shared. This doesn’t mean you must include such a dependency in the testing scope. In most cases, you still need to replace it with a test double to keep the test fast, but if the out-of-process dependency is quick enough and the connection to it is stable, you can make a good case for using it as-is.
That said, in this article, I use the terms shared dependency and out-of-process dependency interchangeably, unless I explicitly state otherwise. In real-world projects you rarely have a shared dependency that isn’t out-of-process. If a dependency is in-process, you can easily supply a separate instance of it to each test; there’s no need to share it between tests. Similarly, you normally don’t encounter an out-of-process dependency that isn’t shared. Most such dependencies are mutable and can be modified by tests.
All right, enough of this exercise in vocabulary and definitions. Let’s contrast the classical and London schools on their merits.
Contrasting the classical and London schools of unit testing
To reiterate, the main difference between the classical and London schools is in how they treat the isolation issue in the definition of a unit test. This, in turn, spills over to the treatment of a unit — the atomic piece of the code base that should be put under test — and the approach to handling dependencies.
As I mentioned in part 1, I prefer the classical school of unit testing. It tends to produce tests of higher quality and it’s better suited for achieving the ultimate goal of unit testing, which is the sustainable growth of your project. For now, let’s take the main selling points of the London school and evaluate them one by one.
The London school’s approach provides the following benefits:
- Better granularity. The tests are fine-grained and check only one class at a time.
- It’s easier to unit test a larger graph of interconnected classes. Because all collaborators are replaced by test doubles, you don’t need to worry about them at the time of writing the test.
- If a test fails, you know for sure which functionality has failed. Without the class’ collaborators, there could be no suspects other than the class under test itself.
There may still be situations where the system under test uses a value object and it’s the change in this value object that makes the test fail, but these cases wouldn’t be that frequent because all other dependencies are eliminated in tests.
Unit testing one class at a time
The point about better granularity relates to the discussion about what constitutes a unit in unit testing. The London school considers a class as such a unit. Coming from the object-oriented programming background, developers usually regard classes as the atomic building blocks that lie at the foundation of every code base. This naturally leads to treating classes as the atomic units to be verified in tests too. This tendency is understandable but misleading.
Tests shouldn’t verify units of code. Rather, they should verify units of behavior: something which is meaningful for the problem domain and, ideally, something that a business person can recognize as useful. The number of classes it takes to implement such a unit of behavior is irrelevant. It could span across multiple classes, only one class, or even take up a tiny method.
Aiming for finer code granularity isn’t helpful. As long as the test checks a single unit of behavior, it’s a good test. Targeting something less than that can, in fact, damage your unit tests, as it becomes harder to understand what exactly these tests verify. Remember, a test should tell a story about the problem your code helps to solve. And this story should be cohesive and meaningful to a nonprogrammer.
For instance, this is an example of a cohesive story:
When I call my dog, he comes right to me.
Now compare it to the following:
When I call my dog, he moves his front left leg first, then the front right leg, his head turns, the tail starts wagging...
The second story makes much less sense. What’s the purpose of all those movements? Is the dog coming to me? Or is he running away? You can’t know. This is what your tests start to look like when you target individual classes (the dog’s legs, head, tail) instead of the actual behavior (the dog coming to his master).
Unit testing a large graph of interconnected classes
It’s true that the use of mocks in place of real collaborators can make it easier to test a class, especially when there’s a complicated dependency graph: the class under test has dependencies, each of which relies on dependencies of its own, and so on, several layers deep.
With test doubles, you can substitute the class’s immediate dependencies and break up the graph, which can significantly reduce the amount of preparation you must do in a unit test.
If you follow the classical school, you need to re-create the full object graph (with the exception of shared dependencies) for the sake of setting up the system under test, which can be a lot of work.
Although this is all true, this line of reasoning focuses on the wrong problem. Instead of finding ways to test a large and complicated graph of interconnected classes, you should focus on not having such a graph of classes in the first place. More often than not, this is a code design problem.
It’s a good thing that the tests point this problem out. How easily a piece of code can be unit tested is a good negative indicator: code that is hard to unit test is, with relatively high precision, code of poor quality. If, in order to unit test, you need to extend the test’s arrange phase beyond all reasonable limits, it’s a sure sign of trouble. The use of mocks only hides this problem; it doesn’t tackle the root cause.
We’ll talk about how to fix the underlying code design problem later in this article.
Test failures point to a specific class with the bug
If you introduce a bug to a system with London-style tests, it normally causes only tests with SUTs that contain the bug to fail. With the classical approach, tests that target the clients of the malfunctioning class can also fail. This leads to a ripple effect where a single bug can cause test failures across the whole system. As a result, it becomes harder to find the root of the issue, and you might need to spend some time debugging the tests to figure it out.
It’s a valid concern, but I don’t see it as a big problem. If you run your tests regularly (ideally, after each source code change), then you know what caused the bug — it’s what you edited last, and it’s not that difficult to find the issue. Also, you don’t have to look at all the failing tests. Fixing one automatically fixes all others.
Furthermore, there’s some value in failures cascading all over the test suite. If a bug leads to a fault in a lot of tests, it shows that this piece of code which you broke is of great value — the entire system depends on it. And it’s a useful piece of information to keep in mind when working with code.
Other differences between the classical and London schools
Other differences exist between these styles of testing. For example, with test-driven development, the schools lead to different takes on how to tackle the system design.
Test-driven development (TDD) is a software development process that relies on tests to drive the project development. The process consists of three (some authors specify four) stages which you repeat for every test case:
- Write a failing test to indicate which functionality needs to be added and how it should behave.
- Write enough code to make the test pass. At this stage, the code doesn’t have to be elegant or clean.
- Refactor the code. Under the protection of the passing test, you can safely clean up the code to make it more readable and maintainable.
Good sources on this topic are the two books I recommended in part 1: Kent Beck’s Test-Driven Development: By Example and Growing Object-Oriented Software, Guided by Tests by Steve Freeman and Nat Pryce.
The London style of unit testing leads to outside-in TDD, in which you start from the higher-level tests that set expectations for the whole system. By using mocks, you specify which collaborators the system should communicate with to achieve the expected result. You then work your way through the graph of classes until you implement every one of them.
Mocks make this design process possible because you can focus on one class at a time. You can cut off all the SUT’s collaborators when testing it and postpone implementing those collaborators to a later time.
The classical school doesn’t provide quite the same guidance because you must deal with the real objects in tests. Instead, you normally use the inside-out approach. In this style, you start from the domain model first and then put additional layers on top of it until the software becomes usable by the end user.
The most crucial distinction between the schools is the issue of overspecification, that is, coupling the tests to the SUT’s implementation details. The London style tends to produce tests that couple to the implementation more often than the classical style, and this is the main objection against the ubiquitous use of mocks and the London style in general.
Integration tests in the two schools
The London and classical schools also have different definitions of an integration test. This distinction flows naturally from the difference in their views on the isolation issue.
The London school considers any test that uses a real collaborator object an integration test. Most of the tests written in the classical style are deemed integration tests by the London school proponents. For an example, see Listing 1, in which I first introduced the two tests covering the customer purchase functionality. It’s a typical unit test from the classical perspective, but it’s an integration test for a follower of the London school.
In this article, I use the classical definitions of both unit and integration testing. Again, a unit test is an automated test that has the following characteristics:
- It verifies a small piece of code.
- Does it quickly.
- And in an isolated manner.
Now that I’ve clarified what the first and the third attributes mean (more on that can be found in part 1), I’ll re-define them for accuracy. A unit test is a test that:
- Verifies a single unit of behavior.
- Does it quickly.
- And in isolation from other tests.
An integration test, then, is a test that doesn’t meet at least one of the above criteria. For example, a test that reaches out to a shared dependency, say a database, can’t run in isolation from other tests. A change in the database’s state introduced by one test alters the outcome of all other tests that rely on the same database if run in parallel. You’d need to take additional steps to avoid this interference. In particular, you’d have to run such tests sequentially, so each test waits its turn to work with the shared dependency.
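The interference described above can be sketched without a real database. In the snippet below, a static field plays the role of the shared dependency; SharedInventory, PrivateInventory, and IsolationDemo are hypothetical names, and the static field is only an in-memory stand-in for the database’s state.

```csharp
// In-memory stand-in for a shared dependency such as a database:
// static state is visible to every test running in the process.
public static class SharedInventory
{
    public static int Shampoo;
    public static void Reset() { Shampoo = 0; }
}

// A private dependency: each test creates its own instance.
public sealed class PrivateInventory
{
    public int Shampoo;
}

public static class IsolationDemo
{
    // The "test" adds 10 units and expects to see exactly 10.
    // Its outcome depends on what earlier tests left behind.
    public static bool TestAgainstShared()
    {
        SharedInventory.Shampoo += 10;
        return SharedInventory.Shampoo == 10;
    }

    // The same check against a fresh instance gives the same result every time.
    public static bool TestAgainstPrivate()
    {
        var inventory = new PrivateInventory();
        inventory.Shampoo += 10;
        return inventory.Shampoo == 10;
    }
}
```

Starting from a reset state, the first call to TestAgainstShared passes and the second fails because it observes the leftover 10 units, whereas TestAgainstPrivate passes no matter how often or in what order it runs. With truly parallel execution against a shared dependency, the interleaving of reads and writes would make the outcome unpredictable as well.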
Similarly, an outreach to an out-of-process dependency makes the test slow. A call to a database adds hundreds of milliseconds, potentially up to a second, of additional execution time. It might not seem like a big deal at first, but when your test suite grows large enough, every second counts.
In theory, you could write a slow test that works with in-memory objects only, but it’s highly unlikely. Communications between objects inside the same memory space are much less expensive than between separate processes. Even if the test works with hundreds of in-memory objects, such a test would still run faster than if it worked with the database.
Finally, a test is an integration test when it verifies two or more units of behavior. This is often a result of trying to optimize the test suite execution speed. When you have two slow tests that follow similar steps but verify different units of behavior, it might make sense to merge them into one: one test checking two similar things runs faster than two more granular tests. But then again, the two original tests would have been integration tests already (due to them being slow), and this characteristic usually isn’t decisive.
An integration test can also verify how two or more modules developed by separate teams work together. This also falls into the third bucket of tests that verify multiple units of behavior at once. But again, because such an integration normally requires an out-of-process dependency, the test fails to meet all three criteria, not only one.
Integration testing plays a significant part in contributing to software quality by verifying the system as a whole.
End-to-end tests are a subset of integration tests
In short, an integration test is a test that verifies that your code works in integration with shared dependencies, out-of-process dependencies, or with code developed by other teams in the organization.
End-to-end tests are a separate notion, but they’re a subset of integration tests. They too check how your code works with out-of-process dependencies. The difference between integration and end-to-end tests is that end-to-end tests usually include more of such dependencies.
The line is blurred at times, but in general, a test needs to work with only one of the many out-of-process dependencies to be considered an integration test. An end-to-end test, on the other hand, works with all out-of-process dependencies, or with the vast majority of them. Hence the name end-to-end, which means that the test verifies the system from the end user’s point of view, including all the external applications this system integrates with (see figure 6).
You can also see people use such terms as UI, GUI, or functional tests. The terminology is ill-defined, but in general, these terms are all synonyms, as long as the tests in question verify the system from the end user’s perspective.
Let’s say that your application works with three out-of-process dependencies: a database, the file system, and a payment gateway. A typical integration test includes only the database and the file system in scope and uses a test double to replace the payment gateway. This is because you have full control over the database and the file system, and can easily bring them to the required state in tests, but you don’t have the same degree of control over the payment gateway. You may need to contact the payment processor organization to set up a special test account. You might also need to check that account from time to time to manually clean up all the payment charges left over from the past test executions.
Because end-to-end tests are the most expensive in terms of maintenance, it’s better to run them late in the build process, after all the unit and integration tests have passed — possibly even only on the build server, not individual developers’ machines.
Keep in mind that even with end-to-end tests, you might not be able to tackle all of the out-of-process dependencies. There may be no test version of some dependencies, or it may be impossible to bring them to the required state automatically, and you may still need to use a test double. This once again shows that there’s no distinct line between integration and end-to-end tests.
- The classical school states that it’s unit tests that need to be isolated from each other, not units. Also, a unit under test is a unit of behavior, not unit of code. Only shared dependencies should be replaced with test doubles. Shared dependencies are dependencies that provide means for tests to affect each other’s execution flow.
- The London school provides the benefits of better granularity, the ease of testing large graphs of interconnected classes, and the ease of finding which functionality contains a bug after a test failure.
- The benefits of the London school look appealing at first, but there are several issues with them. First, the focus on classes under test is misplaced: tests should verify units of behavior, not units of code. Furthermore, the inability to unit test a piece of code is a strong sign of a problem with the code design. The use of test doubles doesn’t fix this problem; it only hides it. Although the ease of finding which functionality contains a bug after a test failure is helpful, it’s not that big of a deal, as you often know what caused the bug anyway: it’s what you edited last.
- The biggest issue with the London school of unit testing is the problem of overspecification — coupling tests to the SUT’s implementation details.
- Throughout this article, I’ve refined the definition of a unit test. Here’s the updated and final version:
- A unit test verifies a single unit of behavior.
- Does it quickly.
- And in isolation from other tests.
- An integration test is a test that doesn’t meet at least one of the above criteria. End-to-end tests are a subset of integration tests. They verify the system from the end user’s point of view. End-to-end tests reach out directly to all or almost all out-of-process dependencies your application works with.
- I recommend Kent Beck’s Test-Driven Development: By Example as a canonical book about the classical style. Growing Object-Oriented Software, Guided by Tests by Steve Freeman and Nat Pryce is a good source on the London style. I also recommend Dependency Injection: Principles, Practices, Patterns by Steven van Deursen and Mark Seemann for further reading about working with dependencies.