Embracing the Red Bar: Safely Refactoring Tests

SEP 20, 2016 • Written by M. Scott Ford

Do you ever refactor your test code? If not, I hope you consider making this part of your normal practice. Test code is still code and should adhere to the same high standards as the code that’s running directly in production.

As important as it is, refactoring your test code is actually a little risky. It's all too easy to turn a perfectly valid test into one that always passes, regardless of whether or not the code that it covers is correct. Let's explore a technique for protecting against that possibility.

But before I dive into the nitty-gritty, let me tell you where I discovered this technique and why I feel it should be part of everyone’s refactoring practice.

Backstory

I first read about “Refactoring Against the Red Bar” years ago while subscribed to a blog that was published by Michael Feathers. The article is still available, which is good, because it turns out that Michael Feathers isn’t the originator of this idea either. He discovered it while talking with Elizabeth Keogh at the XP2005 conference, and his article points his readers to an article that she wrote about the technique.

I'm often guilty of assuming that because I've read something once, everyone else already knows about it. And that happened with this technique as well. I was at the first Mob Programming conference held by Agile New England in Cambridge, MA, and I was participating in a mob where one of the members suggested refactoring the tests. I mentioned the risk of invalidating the tests by doing that and suggested that we refactor against the red bar to defend against it. The initial response was a mixture of blank stares and confused looks. Once I described the technique and demonstrated its merits by navigating the mob, people were very excited, and several asked why they'd never encountered the technique before. I mentioned its origin, and I was encouraged by the rest of the mob to promote the idea further.

That led to both this article and a talk that I'll be presenting at Agile DC 2016. And that's also why I'm rehashing someone else's idea: it's a really good one, and more people need to hear about it. A big thanks to Elizabeth and Michael for their earlier work on this technique.

What Makes TDD Work So Well?

The short answer is that when you strictly follow the TDD steps, you’re guaranteed that your tests will fail if the implementation breaks. And that’s the bedrock that makes it so safe to continue refactoring your code.

In case that’s not super clear:

There are three simple steps to TDD which we continue to repeat until our application does everything that we want it to do.

  1. Write a failing test
  2. Write the simplest production code possible that will make it pass
  3. Refactor production code to make the implementation clean
  4. Do we need more tests? If ‘yes’ then start back at 1, if ‘no’ then we’re done

I kind of lied about there being only 3 steps. There’s an important bit that the typical TDD process description glosses over. What often gets left out is a check step that I’ve included above as step number four.

I find that to be a very important addition to the process. If you don't know that step is coming, you may be tempted to violate the most important clause of step 2: writing the simplest production code possible. And I think this is worth demonstrating with a simple example.

TDD Example: Calculator.Add()

My favorite kata for demonstrating this concept is building a simple calculator using TDD, starting with the addition function. Although I'm using Ruby for the code samples, please remember that this technique is not language specific. I'll intentionally avoid any Ruby features that might not read well for those less familiar with the language.

Here’s what our first failing test might look like:

describe 'addition' do
  specify 'adding two numbers that result in 4' do
    calculator = Calculator.new
    result = calculator.add(2, 2)
    expect(result).to eq(4)
  end
end

So, what would be the simplest way to make that test pass? How about this:

class Calculator
  def add(left, right)
    return 4
  end
end

The reaction I often hear at this point is: “But that’s not complete. It’s not going to work for 2+5!” That’s when I remind people about the implicit step 4 that I outlined above: Do we need more tests? It sounds like the answer is a resounding yes. At least if we want this calculator to be able to do more than just claim that any two numbers result in 4 when added together.

Here’s what the tests might look like now:

describe 'addition' do
  specify 'adding two numbers that result in 4' do
    calculator = Calculator.new
    result = calculator.add(2, 2)
    expect(result).to eq(4)
  end

  specify 'adding two numbers that result in 5' do
    calculator = Calculator.new
    result = calculator.add(2, 3)
    expect(result).to eq(5)
  end
end

And now we have a failing test which confirms the objection to the original implementation: just returning 4 isn’t good enough.

So what’s the easiest way to make just the new test pass? Well, that would be to return 5.

class Calculator
  def add(left, right)
    return 5
  end
end

And that will cause our new test to pass. But we’ve got a problem. Our previous test is now failing.

So what’s the easiest way to make them both pass? That would be to actually do the work.

class Calculator
  def add(left, right)
    return left + right
  end
end

Now, we have two tests that are working together to force our implementation to work the way we want it to. And it's only by having both tests that we're able to safely refactor.

That's how TDD is able to guarantee that our test suite is a complete description of our production code. But that guarantee gets invalidated as soon as we refactor our test code without any safeguard.

Why is it dangerous to blindly refactor test code?

Strictly speaking, refactoring should never result in a change in behavior. That would violate the definition of refactoring: changing an implementation without changing its behavior. But mistakes sometimes happen, especially if you're working without an automated refactoring tool and applying the refactoring by hand. And keep in mind that a bug in your refactoring tool isn't likely, but it's not impossible, either.

Let’s look at a hypothetical refactoring scenario where we attempt to remove some duplication in our test suite.

We’ll start with the test suite that we finished with earlier:

describe 'addition' do
  specify 'adding two numbers that result in 4' do
    calculator = Calculator.new
    result = calculator.add(2, 2)
    expect(result).to eq(4)
  end

  specify 'adding two numbers that result in 5' do
    calculator = Calculator.new
    result = calculator.add(2, 3)
    expect(result).to eq(5)
  end
end

Let’s refactor this to remove some of the duplication:

describe 'addition' do
  [
    { left: 2, right: 2, result: 4 },
    { left: 2, right: 3, result: 5 }
  ].each do |example|
    specify "adding two numbers that result in #{example[:result]}" do
      calculator = Calculator.new
      result = calculator.add(example[:left], example[:right])
    end
  end
end

If we were to run that, it would pass. But did you catch the mistake that was made during the refactoring? The assertion was accidentally removed. The code should look like this:

describe 'addition' do
  [
    { left: 2, right: 2, result: 4 },
    { left: 2, right: 3, result: 5 }
  ].each do |example|
    specify "adding two numbers that result in #{example[:result]}" do
      calculator = Calculator.new
      result = calculator.add(example[:left], example[:right])
      expect(result).to eq(example[:result])
    end
  end
end

How do we defend against breaking our tests when we refactor them?

When we refactor our production code, it's the safety provided by our test suite that makes the refactoring safe. So how can we get that same safety when we need to refactor our test code? To do that, we have to break our production code in a way that will cause the tests we want to refactor to fail. If they don't fail, then we've got a problem: either we didn't break the production code correctly, or our tests never worked in the first place. If we can't force our tests to fail, then they're not doing us any good, and that would need to be addressed before continuing any further.

Once those tests fail correctly, we can refactor them. And after every tiny refactoring we do, the tests should still fail. If any of those tests start to pass, then we’ve made a mistake in our refactoring somewhere. That’s what would have happened with the mistake that was introduced in the example above.

After we’re done refactoring our tests, we can revert the changes we made to break our production code, and that should cause all of our refactored tests to start passing. If any of our tests still fail, then we’ve also made a mistake in our refactoring. But this time, instead of creating a test that always passes, we’ve created one that always fails.

Before we walk through a couple of examples, let’s simplify the workflow a little bit for review.

  1. Break our production code to cause our test to fail
  2. Refactor our test code
  3. Ensure that our tests still fail
  4. Revert changes to production code
  5. Verify that tests once again pass
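The five steps above can be sketched as a single runnable script. This is a minimal sketch in plain Ruby rather than RSpec, so the whole break/refactor/revert cycle fits in one file; the `addition_tests_pass?` helper is a hypothetical stand-in for running the real test suite.

```ruby
# A minimal sketch of the red bar workflow, using plain Ruby
# assertions instead of RSpec so the whole cycle is one script.

class Calculator
  def add(left, right)
    left + right
  end
end

# Stand-in for "run the test suite": true if the addition tests
# pass against the given calculator.
def addition_tests_pass?(calculator)
  calculator.add(2, 2) == 4 && calculator.add(2, 3) == 5
end

# Step 1: break the production code so the tests fail.
# (Subclassing here simulates editing Calculator#add in place.)
class BrokenCalculator < Calculator
  def add(left, right)
    0 # deliberately wrong
  end
end

raise 'tests should fail against broken code' if addition_tests_pass?(BrokenCalculator.new)

# Steps 2 and 3: refactor the test code (imagine reshaping the body
# of addition_tests_pass?) and confirm it STILL fails against the break.

# Step 4: revert to the real production code.
# Step 5: verify that the tests pass once again.
raise 'tests should pass against real code' unless addition_tests_pass?(Calculator.new)

puts 'red bar workflow verified'
```

The key observation is that the two `raise` lines encode both halves of the safety net: a test suite that can't be forced red is just as suspect as one that can't be brought back to green.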

A Red Bar Refactoring Example

Let’s start with the example code that we created in the TDD example above.

describe 'addition' do
  specify 'adding two numbers that result in 4' do
    calculator = Calculator.new
    result = calculator.add(2, 2)
    expect(result).to eq(4)
  end

  specify 'adding two numbers that result in 5' do
    calculator = Calculator.new
    result = calculator.add(2, 3)
    expect(result).to eq(5)
  end
end

And here’s the production code that makes those tests pass:

class Calculator
  def add(left, right)
    return left + right
  end
end

Time to refactor our tests. There’s a little bit of duplication in there. Here’s what I see:

  • The Calculator class is instantiated by every test
  • Each test just calls add with different parameters and expects a different result

Let’s tackle just the first one while refactoring against the red bar.

The first step is to force the tests we’re changing to fail by intentionally breaking the production code. This should do that:

class Calculator
  def add(left, right)
    return 0
  end
end

We verify that by running our test suite and making sure that the tests that we want to change are failing. It’s okay if more tests are failing, too. But it’s super important that all of the tests that you intend to refactor are failing. If the tests that you want to refactor are not failing, then you need to keep changing your production code until they do. And if you’re unable to make those tests fail, then you need to jump down to the “What if something goes wrong?” section.

Now, we can safely extract the instantiation of the Calculator class, which might look something like this:

describe 'addition' do
  let(:calculator) { Calculator.new }

  specify 'adding two numbers that result in 4' do
    result = calculator.add(2, 2)
    expect(result).to eq(4)
  end

  specify 'adding two numbers that result in 5' do
    result = calculator.add(2, 3)
    expect(result).to eq(5)
  end
end

Now the creation logic for building a proper Calculator instance has been factored out into one spot.

We need to re-run our test suite to make sure that we didn’t make a mistake. Remember, though, we’re actually expecting the tests to fail.

In this case, our tests still fail in the way we expected them to. So we can now proceed by restoring the original implementation of our production code:

class Calculator
  def add(left, right)
    return left + right
  end
end

Finally, we run our test suite to make sure that everything is passing.

What if something goes wrong?

There are a few points during the red bar refactoring process where you might encounter a surprise. I mentioned these earlier, but I’m reiterating them for easy reference.

I’m changing my production code, but I can’t make my tests fail!

If you encounter this issue, then you already have a test that’s producing a false positive. You’ve got a couple of options at this point.

  1. Delete the test and treat the production code as untested, legacy code

    A test that always passes provides just as much value as one that always fails: none. At best, reading it will give you an indication of the test author's original intent, but it's essentially documentation that's drifted out of sync with your implementation. Deleting the test is a perfectly acceptable option in this scenario; your test suite will be no weaker without the false positive test than it was with it. Now that you've discovered the false positive, it's best to go ahead and write something that covers the production code the test was intended to exercise.

  2. Review change history and try to restore a working version of the test

    Take a peek at the test's history in your version control system if you have that available. If you discover that you're looking at the first version of the test that was committed, then this test has always been a false positive, and you'll need to follow the previous option instead. If there is history for the test that you're working on, then attempt to revert those changes while your production code is still modified to simulate a failure. If the older version of the test fails, then you can revert your production code modifications and see if the test passes. If so, then you've found a valid version of the test. If that version still has some refactoring that you'd like to do, then you can go ahead and start the process over again.

I refactored my test, and the production code is broken, but now the test is passing!

In this scenario, you've broken your test. The most common cause is accidentally removing a critical assertion that would otherwise force the test to fail; another is a subtle change to the way the test setup runs. Revert the changes that you've made to your test and try again.

I’ve successfully refactored my test, and I’ve reverted my changes to the production code, but now the test is still failing!

Take a close look at the reason that the test is failing. I’ve seen this scenario happen most often because of an invalid type reference or syntax error. Since you’re refactoring against the red bar, those things can sometimes slip into your test as you’re refactoring. In this case, fixing that error should make your tests pass again, but you’ll want to repeat your production code changes to double-check that you haven’t accidentally introduced a false positive while trying to get the test to pass again.

It’s also possible that you’ve added an assertion that the production code can’t make pass or the test setup is different than it used to be, and the logic that used to be running no longer applies. If this is the case, then you’ll need to revert the changes that you’ve made to your test and start over.

What have we learned?

Following this simple technique is a great way to ensure that your test suite still correctly tests your production code while you're refactoring your test code. This is a practice that I follow any time I modify a test, and I've been doing so successfully for several years now. It's practically second nature at this point.

I’d love to hear from others who are using this technique, and I’d also enjoy hearing from people who are starting to work this technique into their practice. Are there any challenges that you’re encountering? Figured anything out that others should know about? Please share in the comments below.