Occasionally, software developers get involved in discussions about what a good goal for code coverage is.
What’s the right answer? Well, clearly it’s 100%. After all, why wouldn’t you want to have all of your codebase covered by automated tests?
That was easy. Next question?
Actually, it might not be that easy. This is a hotly debated topic within the field of software development - certainly deserving of a better answer than a flippant two-sentence response. And most folks who are familiar with code coverage as a metric would strongly disagree with my answer. There are serious problems with trying to achieve 100% code coverage:
- It’s almost impossible to get 100% code coverage for any significant body of code. Covering every branch, every piece of error handling code, and every piece of UI code is extremely difficult. (I say “almost impossible” because SQLite has done it. SQLite has 711 times as much test code as it has production code.)
- Even if it were possible, it’s likely to be so labor-intensive that you’d get more value from your software development efforts elsewhere. (The effort was valuable for SQLite, but SQLite may be the most widely deployed software library in the world; its requirements differ from those of many projects.)
- Even if you attained 100% coverage, it doesn’t mean that the software actually works. Maybe you attained 100% line coverage but missed a branch. Coverage can’t tell you about code that doesn’t exist but should (i.e., bugs because of cases that you didn’t consider). Maybe your code does what you intend but not what the customer needs.
- Making 100% code coverage the target can encourage counterproductive behaviors. Maybe you warp the code to facilitate testing (“test-induced design damage”). Maybe you consciously or subconsciously start to focus on covering low-value or low-risk code that’s easy to test instead of focusing testing efforts where they do the most good. Or worse, developers can start gaming the metrics: writing “tests” that run the code but don’t actually validate anything about its behavior.
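The last two bullets are easy to see in a few lines of code. The following is a minimal Python sketch; `shipping_cost` and both tests are hypothetical examples invented for illustration, not code from any real project. The first test executes every line of the function yet asserts nothing, and neither test ever takes the path where `express` is false.

```python
def shipping_cost(weight_kg: float, express: bool) -> float:
    """Hypothetical function with a bug on the non-express path."""
    rate = None
    if express:
        rate = 4.0
    # Bug: when express is False, rate is still None and this raises TypeError.
    return weight_kg * rate


def test_runs_but_checks_nothing():
    # Executes every line of shipping_cost, so line coverage reports 100%,
    # but there is no assertion: this "test" passes whatever is returned.
    shipping_cost(2.0, express=True)


def test_express_rate():
    # A real assertion, but still only the express=True path is exercised.
    assert shipping_cost(2.0, express=True) == 8.0
```

Either test alone yields 100% line coverage of `shipping_cost`. A branch-coverage measurement would flag that the `if` condition was never false, but only a test that actually calls `shipping_cost(2.0, express=False)` reveals the `TypeError`.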
So why do I say 100%? Because, even if it’s not practically attainable, it represents a worthwhile ideal: the ability to have a computer test every piece of our code and provide at least some level of evidence that it’s doing what we intend. To further explain, let’s consider another question. What’s a good goal for the number of bugs in a program?
The obvious answer is zero. But anyone who’s been involved in software development for any length of time will notice serious problems with that answer:
- It’s impossible, or at least effectively impossible, for software of any complexity to be bug-free. All software of any complexity has bugs. Even the space shuttle software project, which was legendary for the care and time and budget that it devoted to writing thoroughly checked, bug-free code, had a few errors.
- Even if it were possible, it’s likely to be so labor-intensive that you’d get more value from your software development efforts elsewhere. The space shuttle program had a team of 260 people working on a codebase of 420,000 lines. It had a 40,000-page written specification and a budget of $35 million per year.
- Even if you attained it, it doesn’t mean that the software does what you need. Software that’s bug-free but is too hard to use or that solves the wrong problem is of little value.
- Making zero bugs the target can encourage counterproductive behaviors. The easiest way to not ship any bugs is to not ship any code. Even if you don’t go to that extreme, an overemphasis on avoiding bugs could lead to slowing down the development process by continually adding checks and layers and quality gateways, instead of finding ways to move with agility while maintaining an appropriate level of quality.
So, zero bugs is impossible, and making it the top priority is a bad idea, but we have no problem saying that it’s our goal. We recognize it’s a worthy aim, and we practice skills and techniques to help us get closer, and, when we do create bugs, we look for ways to do better in the future. But we can also have intelligent conversations about bug priority, and about when the software has an acceptable level of quality, and about whether bugs need to be fixed now or scheduled for later or, if they’re prohibitively difficult to fix or are too low-impact, deferred indefinitely in favor of features that bring more benefit to more users.
I’d suggest that code coverage is the same. 100% code coverage is not practical, but we can make it a goal to have complete automated test coverage, and we can look for ways to help us get closer, while also having intelligent conversations about how to balance that with other priorities. After all, design is the art of balancing constraints and balancing sometimes conflicting goals.
What does this look like in practice?
- We can establish the habit that, whenever we touch a piece of code to add or change functionality, we also try to add one or more automated tests relating to that code. This lets us start small and improve test coverage as part of ongoing software development. Making quality part of the software development process like this can actually pay for itself, by speeding up future development, instead of taking time away as a separate activity.
- We can trend code coverage metrics as part of our build process to verify that, even if they’re not where we’d want, they’re improving over time rather than getting worse.
- We can look for ways to make our code more testable: “clean code” principles such as separation of concerns and dependency injection, functional programming ideas of clearly defining inputs and outputs and avoiding side effects, moving business logic out of the view or UI layer, and so on. There will always be regions of our codebase that aren’t practical to automatically test, but these practices help keep those not-practically-testable areas small.
- If we find a bug, we can look to see if it’s practical to improve our test coverage in that area to help prevent future bugs - and, if not, we get on with our work, instead of spending disproportionate development time just to meet a code coverage metric.
- We can adopt the mantra of “I better have a good reason” to change a piece of code without adding automated tests. This helps give us the discipline of moving toward the goal of better test coverage (instead of rationalizing lack of testing due to bad habits or broken windows) while staying pragmatic enough to recognize that sometimes there are good reasons to omit a test.
- We recognize that code coverage is asymptotic: the closer we get to 100%, the harder it gets. As we increase our test coverage, we pay attention to how much energy and effort those automated testing efforts are costing us compared to how much value they’re delivering.
- If it helps us, we can pick a code coverage target as a rule of thumb. Many folks in the industry have found that 80-85% coverage strikes a good balance of getting value from automated tests without getting into the asymptotic costs that come from trying to actually achieve 100%. (Corgibytes has given similar rules of thumb to our own clients.) But these are just rules of thumb. One of Corgibytes’ core values is craft in context: we’re trying to balance the goal of being able to fully test our code against our other goals, instead of just targeting some semi-arbitrary code coverage percentage.
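As one illustration of the testability practices listed above, here is a small Python sketch of dependency injection. The `greeting` function and its `now` parameter are hypothetical names chosen for this example. Because the clock is passed in rather than read directly inside the logic, a test can reach every branch without waiting for a particular time of day.

```python
from datetime import datetime, timezone
from typing import Callable


def utc_now() -> datetime:
    """Default clock used in production."""
    return datetime.now(timezone.utc)


def greeting(name: str, now: Callable[[], datetime] = utc_now) -> str:
    """Business logic with the clock injected instead of read directly."""
    hour = now().hour
    if hour < 12:
        period = "morning"
    elif hour < 18:
        period = "afternoon"
    else:
        period = "evening"
    return f"Good {period}, {name}"


def test_greeting_covers_every_branch():
    # Build a fake clock that always reports the given hour.
    def fixed(hour: int) -> Callable[[], datetime]:
        return lambda: datetime(2024, 1, 1, hour, tzinfo=timezone.utc)

    assert greeting("Ada", now=fixed(9)) == "Good morning, Ada"
    assert greeting("Ada", now=fixed(14)) == "Good afternoon, Ada"
    assert greeting("Ada", now=fixed(21)) == "Good evening, Ada"
```

Production callers simply write `greeting("Ada")` and get the real clock; only the tests need to know that the dependency can be swapped.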
Simply picking a coverage number as a target has problems. First, I’ve seen large codebases with coverage values of 40% or 15% or 0%; for a codebase like that, arguing over whether the goal should be 100% or 90% or 85% seems purely academic because any of those numbers are so far off as to seem unattainable. Often, a better answer to the question, “What should be my goal for test coverage?” is “0.1% more than yesterday.” Second, some of the online discussions advocating for high code coverage communicate not only “Here is a useful practice for your software development” but also “and you’re a bad developer if you don’t do it.” I must confess here that my own testing skills and habits aren’t where I’d like for them to be, and they fall short of what many automated-test purists argue for, and I’ve written plenty of poorly covered production code. Improving test coverage is often as much a matter of practicing my own skills - looking for opportunities to add tests, making sure I take the time to practice writing tests - as it is simply a task of incrementing a coverage percentage.
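The “0.1% more than yesterday” goal can be enforced mechanically with a coverage ratchet: the build fails only if coverage drops below the best value recorded so far. The sketch below is one possible shape for such a check, not a description of any particular tool. The function name is hypothetical, and how the percentage is measured and where the baseline is stored are left out as assumptions about your build. (For a fixed threshold rather than a moving one, coverage.py's `coverage report --fail-under` option serves a similar purpose.)

```python
def check_coverage_ratchet(current: float, baseline: float) -> tuple[bool, float]:
    """Compare today's coverage percentage with the stored baseline.

    Returns (passed, new_baseline). The baseline only ever moves up.
    """
    if current < baseline:
        return False, baseline
    return True, current


# Example: a legacy codebase sitting at 15.0% coverage.
passed, baseline = check_coverage_ratchet(current=15.1, baseline=15.0)
assert passed and baseline == 15.1

# A later change that drops coverage fails the check; the baseline stays put.
passed, baseline = check_coverage_ratchet(current=14.8, baseline=baseline)
assert not passed and baseline == 15.1
```

A ratchet like this sidesteps the argument over 85% versus 100%: the only question each build asks is whether things got worse.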
The idea here is incremental improvement. We’ve had good success taking poorly tested codebases and, through consistent focus on improving automated testing and the code’s testability, making slow and steady improvement in turning brittle code into something much healthier. And, really, “A little bit better than the day before” is a good goal for any of us.
- “Flaws in coverage measurement”, by Ned Batchelder, looks at more ways in which code coverage can fail to find bugs.
- Using bugs found as feedback for areas in which test coverage can be improved is one example of what Arlo Belshee calls safeguarding.
- Thoughtfully applying additional testing techniques such as mutation testing (like Stryker) and property-based testing (like Python’s Hypothesis) can be more effective than simply trying to drive up the code coverage percentage.
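To show why mutation testing can reveal more than a coverage percentage, here is a hand-rolled Python sketch of the idea. Real tools such as Stryker generate and run mutants automatically; in this example the mutant is written by hand, and every name is hypothetical. Both test suites give `is_adult` 100% coverage, but only one of them notices when `>=` is changed to `>`.

```python
def is_adult(age: int) -> bool:
    return age >= 18


def is_adult_mutant(age: int) -> bool:
    # Hand-written mutant: ">=" changed to ">".
    return age > 18


def weak_suite(fn) -> bool:
    """Covers every line of fn but never probes the boundary."""
    return fn(30) is True and fn(5) is False


def strong_suite(fn) -> bool:
    """Adds the boundary case age == 18."""
    return weak_suite(fn) and fn(18) is True


# Both suites pass on the real code and fully cover it...
assert weak_suite(is_adult) and strong_suite(is_adult)
# ...but the weak suite also passes on the mutant (the mutant "survives"),
assert weak_suite(is_adult_mutant)
# while the strong suite fails on it (the mutant is "killed").
assert not strong_suite(is_adult_mutant)
```

A surviving mutant points at behavior that the tests execute but do not actually check, which is exactly the gap a coverage number cannot show.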