Effective Testing Part I - Foundations
Software testing is an area I have spent a great deal of time and energy on over my career. Test infrastructure and frameworks are often treated as afterthoughts to high-end development and architecture work. This is despite the fact that the effectiveness of a test suite can be one of the key enablers of, or hindrances to, quick and pain-free software delivery.
In this series, I want to explain my thoughts and philosophy on software testing. It isn’t meant to be overly prescriptive or to exclude other approaches. It is only what I have found works well over the years, in the hope that it might give some readers new ideas and inspiration for their own testing troubles.
One of my first forays into understanding how to improve an ineffective test suite came from reading XUnit Test Patterns by Gerard Meszaros. The book shaped my views on testing enormously, and this post is largely a distillation of some of its wisdom. Nearly 20 years after its publication it is still highly relevant, and I would recommend it to anyone; much of its subject matter is far broader than unit tests alone.
In future posts I hope to expand on this material, but for now I want to focus on a single important question: what are good tests?
What Are Good Tests?
In my view, an effective test suite has the following properties:
- Tests should stop bugs.
- Tests should be fast to run.
- Tests should be easy to read and understand.
Tests Should Stop Bugs
I’ll never forget how annoyed I was when, during our QA cycle, we found a showstopper bug just before release: a fatal crash. The crash occurred in a file with 100% test coverage, and yet in reality the code crashed 100% of the time and worked 0% of the time[1].
This might seem obvious at first glance, but the principle shows up in a few ways. The software testing literature has the idea of Defect Detection Efficiency: a measure of how effective our tests actually are at detecting defects. Different kinds of testing can be compared by this measure. In The Economics of Software Quality, the authors state that, according to their research, unit tests remove about 35% of software defects.
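One common way to write this metric (a general formulation I am assuming here, not one quoted from the book) is the share of all defects that a given kind of testing catches, out of those it catches plus those that escape to a later stage:

```latex
\mathrm{DDE} = \frac{D_{\text{found}}}{D_{\text{found}} + D_{\text{escaped}}} \times 100\%
```

With hypothetical numbers, if unit tests catch 35 defects and 65 more are only found later in QA or production, then DDE = 35 / (35 + 65) = 35%, the kind of figure quoted above.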
Tests as written often do not stop bugs, and two of the key reasons I’ve seen for this are:
- Some tests rely so heavily on mocks that they become tautological.
- Tests are written for reasons other than stopping bugs: to hit a code coverage target, on developer autopilot, or because an AI agent auto-generated them.
Something I keep in mind, and advise others to keep in mind, is whether a test is actually going to stop any bugs. It’s not that rare to come across a test that really won’t, and in those cases I advise people to delete it.
Tests Should Be Fast
Another important facet of effective testing is execution speed.
Slow tests are the bane of most developers’ existence and lead to worse outcomes: tests get skipped or not run, or releases slow down. The test pyramid encourages developers to avoid end-to-end (E2E) tests because they are “much slower”, and often these tests can only be run nightly, or there is push-back against running them in regular pipelines. While I was optimizing the release process at my last company, there was considerable push-back against running E2E tests as part of a release when they took 90 minutes, but they became much more tolerable once they took 10 minutes[2].
There isn’t much more to say here, other than that driving down test execution time improves both the CI/CD life cycle and the developer experience.
What I will say is that this is often the area that benefits most from heavy involvement by senior technical leaders, especially those outside QA roles. Changes to how a platform’s code or architecture is structured can enable much faster, more streamlined, and more reliable tests. For example, using knowledge I had gleaned from High Performance MySQL, 3rd Edition, I was able to reduce the overhead of our database tests from about 5 seconds to under 100 ms[3].
Tests Should Help Developers Understand The System
Finally, a key point that is perhaps the least appreciated is that tests should help developers understand the system. Tests are key to building developer confidence that the system works, and that confidence can only be established if we trust and understand the tests.
The Four-Phase Test from XUnit Test Patterns, or Given/When/Then from Cucumber, provides a coherent structure for individual tests that aids understandability. When I gave a talk on this topic at work, I showed our longest test: four screenfuls of code that no one understood.
Perhaps the clearest idea that sticks out from XUnit Test Patterns is what I call Meszaros’ Law[4]:
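As a rough illustration, here is a hypothetical Python test laid out in the four phases (setup, exercise, verify, teardown), with the corresponding Given/When/Then step noted in comments. The `Inventory` class is invented purely for this sketch.

```python
import shutil
import tempfile
from pathlib import Path


class Inventory:
    """Hypothetical system under test: a tiny file-backed stock counter."""

    def __init__(self, path: Path):
        self._path = path
        if not path.exists():
            path.write_text("0")

    def add(self, qty: int) -> None:
        self._path.write_text(str(self.count() + qty))

    def count(self) -> int:
        return int(self._path.read_text())


def test_add_increases_count():
    # Phase 1: Setup (Given) - build the fixture this test needs.
    workdir = Path(tempfile.mkdtemp())
    inventory = Inventory(workdir / "stock.txt")
    try:
        # Phase 2: Exercise (When) - perform the one behaviour under test.
        inventory.add(3)
        # Phase 3: Verify (Then) - check the expected outcome.
        assert inventory.count() == 3
    finally:
        # Phase 4: Teardown - remove the fixture so tests stay independent.
        shutil.rmtree(workdir)
```

In practice a framework fixture (for example a pytest fixture or a unittest `tearDown` method) would usually own setup and teardown; they are inlined here so all four phases are visible in one place.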
When something is important to understanding a test it is important that it be in the test.
When something is NOT important to understanding a test it is important that it NOT be in the test.
As AI agents write more and more of our code (nearly 100% of mine since late 2025) and we move from authors to reviewers, this principle matters even more. What has worked very well for me is making sure I can understand the tests quickly: how much is being tested, and how much confidence I can have that things are working.
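One way to apply both halves of Meszaros’ Law is a Creation Method (a pattern also described in XUnit Test Patterns) that fills in valid defaults for details a test doesn’t care about, so each test states only what matters to it. The sketch below is hypothetical Python; `Customer`, `can_buy_alcohol`, and `a_customer` are invented for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Customer:
    """Hypothetical domain object with many fields, most irrelevant to any one test."""
    name: str
    email: str
    address: str
    age: int


def can_buy_alcohol(customer: Customer) -> bool:
    """Hypothetical business rule under test."""
    return customer.age >= 18


def a_customer(**overrides) -> Customer:
    # Creation Method: supplies valid defaults for details the test does not
    # care about, keeping them out of the test body.
    default = Customer(name="Any Name", email="any@example.com",
                       address="1 Any Street", age=30)
    return replace(default, **overrides)


def test_minor_cannot_buy_alcohol():
    # Only the age matters to this test, so only the age appears in it.
    assert not can_buy_alcohol(a_customer(age=17))


def test_adult_can_buy_alcohol():
    assert can_buy_alcohol(a_customer(age=18))
```

A reader of either test sees the one detail that decides the outcome, while the name, email, and address that would otherwise clutter it stay hidden in the helper.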
Summary
In this post, I gave an overview of what I think good tests look like: they stop bugs, execute quickly, and are understandable. This largely comes from my reading of XUnit Test Patterns, which, as I mentioned, was a game changer for me. The catch, however, was that XUnit Test Patterns could only take us so far, and something was still missing.
In the next part of this series, I hope to build on this one by focusing on the level at which I think tests are best written (hint: think pentagon, not pyramid 😲), and then on techniques for doing that at scale.
[1] The issue was that the constructor had been mocked out while the superclass was throwing an exception, so the class could never be successfully loaded. Again, 100% coverage :D
[2] Books such as Accelerate suggest perhaps running these tests only daily because they are slower (p. 90).
[3] The first major improvement was running the tests against a MySQL instance on a RAM disk, since we needed no durability. The second was a hand-tuned my.cnf file that turned off or changed almost everything not needed for tests.
[4] The second half of this quote is on p. 90. I’m not sure the first half is stated directly, but it is heavily implied in the discussions of Test Smells and Tests as Documentation.