Too Many Tests?

As I mentioned before, I have moved to a new team, and we are working on a consolidation effort for automated regression tests that we inherited from several different teams.  I have spent the last two weeks setting up builds to run some of the tests in TeamCity.  Two weeks seems like a long time to get some testing builds up and running, so I stopped to think about why it was taking me so long.  I know TeamCity quite well, as I have been using it to do similar test runs for years.  The automation framework I’m working with now is new to me, but it is pretty easy and straightforward to use, and I would say I only lost about a day figuring it out and getting it set up on my VMs.  So what is making this take so long?

The most important factor at play here is how long it takes to run the tests.  I broke the tests up between several builds, but each build still takes between 3 and 6 hours to run.  What that means is that if I have the wrong configuration on a machine, or have set up the wrong permissions, or anything like that, I have a 3 to 6 hour turnaround on feedback.  I did manage to help with that a bit by creating some dummy test runs that were much faster, which let me do some of the debugging more quickly, but at the end of the day these long run times badly slowed my ability to get the tests up and running.
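
To make those dummy runs concrete, here is a minimal sketch of the kind of near-instant environment check I mean, assuming a pytest-style runner (the environment variables and paths are hypothetical, not my actual setup):

```python
import os
import socket


def test_test_data_share_is_readable():
    """Fail fast if the build agent cannot read the shared test data."""
    share = os.environ.get("TEST_DATA_DIR", "/mnt/testdata")  # hypothetical location
    assert os.access(share, os.R_OK), f"cannot read {share}; check agent permissions"


def test_system_under_test_is_reachable():
    """Fail fast if the agent cannot even open a connection to the product."""
    host = os.environ.get("SUT_HOST", "sut.example.test")  # hypothetical host
    with socket.create_connection((host, 443), timeout=5):
        pass
```

A build that runs only checks like these finishes in seconds, so a bad permission or firewall rule shows up immediately instead of hours into the real suite.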

But let’s stop and think about that for a minute.  What is the point of running automated regression tests?  To give feedback on any regressions we might have introduced while making changes to the code, right?  Well, if that is the case, how effective are these test runs at doing that?  The length of the feedback loop made it a slow process for me to find and fix issues with my builds, and setting them up took much longer than it would have otherwise.  What does the length of that feedback loop do for finding and fixing issues in the code?  The slow feedback meant that I had to go work on other things while I waited for a build to finish, which brought the inefficiencies of multitasking along with it.  Does the same thing happen with long feedback loops for code changes?

I could go on, but I think the point has been made.  The pain I felt here applies to code changes as well.  Having a long-running set of regression tests reduces the value of those tests in many ways! Do we perhaps have too many tests?

5 Comments

  1. Great article, and yes, definitely. I see this happening a lot with automation: everything has to get run every time.

    The problem is, some of those tests are wrapped around code that’s unlikely to break. For example, I’ve seen tests that check that the order of columns in certain UI tables is maintained. Is it a problem if the order changes? Yes. Is it likely to happen? No. Does that test add value? I kinda think not.

    Same goes for localized bugs. Once a bug is fixed, it’s unlikely to break in the same way again, especially if it’s some weird, esoteric edge case. If a test around such a bug fails, I’ve learned it’s usually not because that particular bug happened again; it’s because of some other higher-up issue that’s causing a bunch of tests to fail. Do these tests add value? Or are they just there to confirm that the bug got fixed, and they never got removed because, hey, it’s coverage?

    The best automation I’ve seen, and the model I apply when building it, is the stuff that lets a human know that, hey, there’s something wonky going on in [this part of the system], prompting the tester to go take a look. We can’t get there with tons of tests running.

    1. offbeattesting says:

      Yes, I love so much of this comment. I’ve been trying to hash out for myself some heuristics that help me figure out which tests to keep and which to kill (especially now that we are inheriting a bunch of tests). This comment has given me some ideas 🙂 Thank you.

  2. majd says:

    Thanks for writing this, as just last week we were discussing something similar. We have set up all of our long-running tests (performance, compatibility, data conversion, regression) to run on a weekly basis, but then the problem is that feedback to the developers takes a week. If we run them with every build, that will increase the build time for everyone. So we were thinking of having a subset of those tests run to give quick feedback. What’s your take on that?

    Also, when you say the automation should prompt a tester to take a look, do you mean that when the first test fails in the automated run, it should stop and report to testing? Or do you mean that it logs incrementally so that the tester can review results while the tests are running? Thanks.

    1. offbeattesting says:

      I used to split up my tests into three types: Hourly, Nightly, and Weekend. I’ve been moving away from that, though, because the feedback on the Weekend tests was just too long in coming. I’ve also tried splitting tests between ‘Quick’ and ‘Full’, where the Quick version of a test would do a simplified, much faster run. But once again, the Full tests took too long to give feedback when run on too long a schedule, and the Quick runs failed too often for invalid reasons, leading to lots of test maintenance.
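
      The mechanics of that Quick/Full split were never the hard part. In pytest terms it is just a pair of markers (a sketch; the marker and test names here are made up for illustration):

      ```python
      import pytest


      @pytest.mark.quick
      def test_invoice_total_happy_path():
          """Cheap, stable check meant to run on every build."""
          ...


      @pytest.mark.full
      def test_invoice_totals_across_all_locales():
          """Expensive sweep meant for a scheduled run."""
          ...
      ```

      The frequent build runs "pytest -m quick" and the scheduled build runs "pytest -m full" (with both markers registered in pytest.ini). The hard part was the policy: deciding what belongs in Quick, and how long Full is allowed to take.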

      The core of the issue, I think, is that we have too many tests. Which raises the question: why do we have too many tests? I’ve become more and more convinced that it is an information/communication problem. We don’t understand the system and its risks, so we overcompensate by creating far too many high-level end-to-end tests. If we understood the systems and the risks better, and were able to push testing ‘down’ to lower levels, we would end up with far fewer tests and be less hesitant about deleting the ones we have. One of the most important things I have learned in dealing with test automation is that it is crucial to delete tests regularly.
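
      To make ‘pushing testing down’ concrete, here is a hypothetical example (the pricing rule is invented for illustration): a rule that an end-to-end suite might verify by driving the UI through login, order entry, and checkout can instead be pinned at the unit level, leaving only a thin end-to-end check on top:

      ```python
      from decimal import Decimal


      def apply_volume_discount(price: Decimal, quantity: int) -> Decimal:
          """Hypothetical pricing rule: 10% off orders of ten or more units."""
          return price * Decimal("0.90") if quantity >= 10 else price


      def test_discount_applies_at_ten_units():
          # The rule itself, checked in milliseconds instead of a multi-minute UI run.
          assert apply_volume_discount(Decimal("100.00"), quantity=10) == Decimal("90.00")


      def test_no_discount_below_threshold():
          assert apply_volume_discount(Decimal("100.00"), quantity=9) == Decimal("100.00")
      ```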
