“The regression tests are all passing, so we can merge this code.” How many times have you heard or said something like this? It is a very common way of thinking, and it reveals something about how we view automated regression tests. We see their main purpose as giving us confidence that it is ok to promote new code to the next stage on the way to production. This is all well and good, and probably should be one of the main purposes of automated regression tests, but do you know how good your tests are at doing this? If you trust that running your regression suite means it is probably ok to release your code, is that trust well founded? How do you know?
I was recently faced with these questions. A regression test started failing, so I dug into it. Nothing out of the ordinary there, but as I looked at the failure I was puzzled. The code under test looked like it was doing the right thing, and sure enough, after a bit more digging it turned out that we had encoded a check asserting that the wrong behavior was correct. Recent changes had fixed the bug, but we had been running this test for months with it explicitly checking that the bug was still there and passing if it was.
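To make the failure mode concrete, here is a contrived sketch (the function and numbers are hypothetical, not the actual code involved) of a test that encodes a bug as the expected behavior:

```python
def apply_discount(price, percent):
    # Bug: divides by 10 instead of 100, so a "10% discount"
    # actually removes 100% of the price.
    return price - price * percent / 10

def test_apply_discount():
    # This assertion encodes the buggy result as the expected one.
    # It passes today, and will start failing the day someone
    # fixes the bug -- exactly the situation described above.
    assert apply_discount(100, 10) == 0

test_apply_discount()  # passes silently, giving false confidence
```

The correct expectation would of course be `90`, and a fix to `apply_discount` is what finally makes a test like this fail.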
This led to a bit of an existential crisis. If this test was asserting the existence of a bug, how many other tests are doing the same thing? How can I know if my tests are any good? Do I have to test all my tests? And then who tests the testing of the testing? This could get really crazy really quickly, so what should I do?
Ok, what I need to do is think about ways that I can evaluate how well founded my confidence is in these tests. What are some heuristics or indicators I could use to let me know if my tests should be trusted when they tell me everything is ok?
I want to emphasize that the ideas below are just indicators. They don’t cover every circumstance and I certainly wouldn’t want them to be applied as hard and fast rules, but they might indicate something about the trustworthiness of the tests.
Do they fail?
One indicator is how often the tests fail. If the tests rarely fail, then they might not be a helpful indicator of the quality of a build. After all, if the tests never fail, then either we haven’t broken anything at all (hmm), or we aren’t checking for the kinds of things we actually have broken. A note here: when I say fail, I mean fail in ways that find bugs. Failures that merely require test updates don’t count.
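One way to put a number on this is to track, for each run, whether a failure found a real bug or merely required a test update. A rough sketch, assuming a hypothetical record shape you would populate from your own CI history:

```python
# Each record is one test run. Failures are classified as "bug"
# (found a real regression) or "maintenance" (the test just needed
# updating). The data here is made up for illustration.
runs = [
    {"outcome": "pass"},
    {"outcome": "pass"},
    {"outcome": "fail", "reason": "maintenance"},
    {"outcome": "pass"},
    {"outcome": "fail", "reason": "bug"},
]

# Only bug-finding failures count toward this indicator.
bug_failures = sum(
    1 for r in runs
    if r["outcome"] == "fail" and r.get("reason") == "bug"
)
bug_failure_rate = bug_failures / len(runs)
print(f"bug-finding failure rate: {bug_failure_rate:.0%}")
```

A rate that sits at zero for months is worth investigating: either nothing is breaking, or the suite isn’t looking where things break.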
Do they miss a lot of bugs?
The point of regression tests is to find out if you have caused things to break that were working before. If we find a lot of bugs of this sort after the automated scripts have run, those scripts might not be checking the right things.
Do they take a long time to run?
What is the average run time per test? Long-running tests may be an indicator that we are not checking as much as we could. As a rule of thumb, long-running tests spend a lot of their time on setup and other activities that aren’t actively checking or asserting anything. If you have a lot of these in your test suite, you might not be getting the coverage that you think you are.
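A quick sketch of how you might surface the outliers, assuming hypothetical test names and made-up timings (in practice you would pull these from your test runner’s timing report):

```python
# Seconds per test -- illustrative numbers only.
durations = {
    "test_login": 2.0,
    "test_checkout_end_to_end": 95.0,
    "test_price_calculation": 0.3,
    "test_inventory_sync": 110.0,
}

# Flag tests that take more than twice the suite average; these are
# the ones most likely to be dominated by setup rather than checks.
average = sum(durations.values()) / len(durations)
slow = [name for name, t in durations.items() if t > 2 * average]
print(f"average: {average:.1f}s, suspiciously slow: {slow}")
```

The threshold of twice the average is arbitrary; the point is to have some regular report that makes the slowest tests visible so you can ask what they are actually asserting.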
When is the last time I looked at this test?
Tests age, and they usually don’t age well. If you haven’t looked at a test script in a long time, it is quite likely that it’s checking things that just aren’t as important as they used to be. Regular test maintenance and review is essential to keeping a useful and trustworthy test suite. And don’t be afraid to delete tests either – sometimes things just need to go!
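One simple way to find review candidates is to flag tests nobody has touched in a long while. A sketch under assumed inputs (in a real setup you might derive the dates from version-control history, e.g. each file’s last commit):

```python
import datetime

# Hypothetical metadata: when each test file was last modified.
last_touched = {
    "test_login.py": datetime.date(2024, 11, 2),
    "test_legacy_export.py": datetime.date(2022, 3, 15),
}

def stale_tests(last_touched, today, max_age_days=365):
    """Return tests not touched within max_age_days of `today`,
    as candidates for review or deletion."""
    cutoff = today - datetime.timedelta(days=max_age_days)
    return sorted(name for name, d in last_touched.items() if d < cutoff)

print(stale_tests(last_touched, datetime.date(2025, 1, 1)))
```

A test showing up on this list isn’t automatically bad; it’s just a prompt to look at it again and decide whether it still earns its keep.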
So there you have it. A few indicators that can be used to give you an idea of the trustworthiness of your automated regression tests. I’m sure you can add many more. Feel free to share indicators you use in the comments below!