Our automated tests help us find a lot of bugs. In our reporting system we link bug investigations to the tests that are failing because of them. I looked through this the other day and was surprised by the number of bugs that have been found with our automated tests. They hit a lot of issues.
At first blush this seems like a good thing. They are doing what they are meant to, right? They are helping us find regressions in the product before we ship to customers. Isn’t this the whole point of having automated regression tests? Well, yes and no.
Yes, the point of these tests is to help make sure product regressions don’t escape to the users, and yes, it is good that they are finding these kinds of issues. The problem is when we are finding these issues. I don’t want our automation to find bugs, at least not the kind that get put into a bug tracking system. I don’t want those bugs to get to customers, but I also don’t want them to even get to the main branch. I want them taken care of well before they are something that needs to be tracked or managed in some bug tracking system.
We find many bugs – good. But we find them days removed from the code changes that introduced them – bad. When bugs are found this late, they take a lot of time to get resolved and they end up creating a lot of extra overhead around filing and managing official defect reports. I want a system that finds these bugs before they officially become bugs, but why doesn’t it work that way already? What needs to change to get us there? There are a few ideas we are currently working on as a team.
Faster and more accessible test runs
One of the main reasons bugs don’t get found early is that tests don’t get run early. There are two primary pain points that are preventing that. One is how long it takes to execute the tests. When the feedback time on running a set of tests starts at hours and goes up from there to a day or more depending on which sets you run, it is no wonder we don’t run them. We can’t. We don’t have enough machine resources to spend that much time on each merge.
Another pain point is around how easy it is to select particular sets of tests to run. We can easily do this manually on our own machines, but we don’t have an easy way to pick particular sets of tests to run as part of the merge build chain. This means we default to a minimum set of ‘smoke’ tests that get run as part of every merge. This is helpful, but often there are other sets of tests that would make sense to run, yet they don’t get run because selecting them is too difficult.
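As a sketch of what easier test selection could look like, here is one way a merge build might pick suites based on the product areas a change touches, rather than always falling back to the smoke set. The suite names, area tags, and the selection logic are all illustrative assumptions, not how our build chain actually works:

```python
# Hypothetical sketch: map named test suites to the product areas they
# cover, so a merge build can pick suites by the areas a change touches
# instead of defaulting to only the smoke tests. All names are made up.

def select_suites(suites, changed_areas, always_run=("smoke",)):
    """Return the sorted list of suites to run for a merge touching
    the given product areas. `suites` maps suite name -> set of areas."""
    selected = set(always_run)  # the smoke set still runs on every merge
    for name, areas in suites.items():
        if areas & changed_areas:  # suite covers a touched area
            selected.add(name)
    return sorted(selected)

SUITES = {
    "smoke": {"core"},
    "billing-integration": {"billing", "payments"},
    "search-regression": {"search", "indexing"},
}

print(select_suites(SUITES, {"payments"}))
# prints ['billing-integration', 'smoke']
```

Even a simple mapping like this would let a merge that touches only one component run the relevant deeper suites without paying for the full multi-hour run.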
Improper weighting for test runs
We do have some concept of rings of deployment in our process, and so we run different sets of tests along the way. The purpose of this is to have finer and finer nets as we get closer to releasing to customers. This is a good system overall, but right now it would seem that the holes in the early stages of the net are too big. We need to run more of the tests earlier in the process so that we don’t have so many defects getting through the early rings. Part of the issue here is that we don’t yet have good data around how many issues are caught at each stage in the deployment, and so we are running blind when trying to figure out which sets of tests to run at each stage. We don’t need to catch every single bug in the first ring (as that kind of defeats part of the purpose of rings of deployment), but we do need to catch a high percentage of them. At this point we don’t have hard numbers on how many are getting caught at each stage, but looking at the number that make it to the final stage, we can be pretty confident that the holes are too big in the early stages.
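The data we are missing is simple in principle: for each ring, how many of the known bugs were caught at or before it. A minimal sketch of that calculation, with entirely made-up ring names and counts, might look like this:

```python
# Hypothetical sketch of the per-ring data we'd want: bugs caught at
# each stage, turned into cumulative catch rates so we can see where
# the net's holes are. The ring names and counts below are invented.

def catch_rates(caught_per_ring):
    """Return, per ring, the fraction of all known bugs caught at or
    before that ring. Input preserves stage order (merge -> release)."""
    total = sum(caught_per_ring.values())
    cumulative, rates = 0, {}
    for ring, caught in caught_per_ring.items():
        cumulative += caught
        rates[ring] = cumulative / total
    return rates

caught = {"merge": 12, "ring1": 30, "ring2": 25, "release": 33}
print(catch_rates(caught))
# With these made-up numbers only 42% of bugs are caught by ring1 --
# the kind of signal that would tell us the early holes are too big.
```

Even rough numbers like these would replace running blind with a concrete target, e.g. deciding that the first ring should catch some fixed high percentage of defects.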
We work on a large application that has a lot of different parts that need to be integrated together. Many of the defects that our integration tests find are, well, integration defects. The challenge here is that much of the integration happens as one batch process where we pull together many different parts of the software. If we could integrate the parts in a more piece-wise fashion we could find some of these integration bugs sooner without needing to wait for many different moving parts to all come together. We need to continue to work on making better components and also on improving our build chains and processes to enable earlier integration of different parts of the product.
Our automation finds a lot of bugs, but instead of celebrating this we took a few minutes to think about it and came to realize that this wasn’t the good news it seemed at first. This gives us the ability to take steps towards improving it. What about your automation? Does it find a lot of bugs? Should it? What can you do to make your automation better serve the needs of the team?
Hey, interesting article. You don’t mention what type of automation you are doing, but I’m guessing it’s coded gui tests based upon the fact they take too long to run consistently.
I always consider coded gui tests as “Automated manual regression”, which is why they take so long to run. They can replace people doing what people do. They shouldn’t find many bugs because your other automation and manual testing should have picked those up long before.
Ideally you will have a set of integration tests, api tests, unit tests etc etc which can be run against each build, component or integration.
However, I don’t know your product and for some, particularly older developments, automation is hard and even not cost effective.
Our tests aren’t quite GUI tests (although we do have some of those). They are more like very high level integration tests. I think they have been historically thought of as having value by replacing the work that people do. I don’t particularly agree with that line of thinking as representing valuable automation, hence the work to dramatically change these tests to fit more with the situation you described as ideal.