“Now that is weird.”
I stared at my screen and puzzled over the problem. How could this happen?
It had all started with a 500 error due to a missing config. When I checked the config panel, sure enough, the config was missing. We were searching through pull requests trying to figure out what had gone wrong, when one of my co-workers said that it seemed to be there for her.
So I refreshed the page. And there it was.
How can that be? We didn’t make any code changes to the site, so how can a config that wasn’t there a minute ago just appear?
But oh well, right? The initial thing I was trying to do was working now, so maybe it was just some network glitch or something. I felt the temptation to just shake my head, attribute it to the whims of the software gods, and move on. But I decided to be curious and to see if I could find out anything else.
I posted on our company’s testing Slack channel asking for ideas, and within minutes I had a response that led to understanding the issue. We were creating the variable too late in the process, so the site didn’t know about it the first time we tried to use it. We got a fix together and took care of the issue, but I think there are a few important lessons to learn.
Be curious – It would have been easy to shrug my shoulders and move on, but that would have led to a bunch of clients hitting weird random errors when they first started using the feature. Not a good first impression!
It takes a team – I might have noticed the issue, but without the help of others it would have taken a long time to understand what was going on. We can’t know everything about everything. It really does take a team to solve problems like this.
Open communication – A team isn’t a team without good communication, and if it weren’t for open communication channels like Slack, it would have taken much longer to get to the bottom of an issue like this.
Intermittent bugs are still bugs – Just because you can’t easily reproduce something doesn’t mean it doesn’t matter. The only way to reproduce this bug is to create a whole new site. The catch is that we create hundreds of sites for clients with each release. So even though I can’t reproduce this bug without spinning up a whole new site, it is still a high-impact bug. The more users and usage you have, the more important intermittent bugs become. Don’t just ignore them!
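For readers who like to see this kind of bug in code, here is a minimal sketch of the failure mode described above: a value that gets created too late, so the first use fails and every later use works. It is an illustration only, not our actual code. The `Site` class, the `feature_flag` key, and the handler names are all hypothetical.

```python
class Site:
    """Hypothetical stand-in for a freshly provisioned site."""

    def __init__(self):
        # A brand-new site starts with no config entries at all.
        self.config = {}

    def handle_request_buggy(self):
        """Reads the config first and only creates it afterwards."""
        try:
            _ = self.config["feature_flag"]
            status = 200
        except KeyError:
            # Missing config on first use: the "500 error".
            status = 500
        # The entry is created here, too late for this request,
        # but in time for every request that follows.
        self.config.setdefault("feature_flag", "on")
        return status

    def handle_request_fixed(self):
        """Creates the config entry before anything tries to read it."""
        self.config.setdefault("feature_flag", "on")
        _ = self.config["feature_flag"]
        return 200


def first_request_failures(num_sites, handler_name):
    """Count how many brand-new sites fail on their very first request."""
    failures = 0
    for _ in range(num_sites):
        site = Site()
        if getattr(site, handler_name)() == 500:
            failures += 1
    return failures
```

On a single site the buggy handler fails once and then works forever, which is why a refresh made the problem seem to vanish. In this sketch, `first_request_failures(300, "handle_request_buggy")` counts a failure for every one of the 300 fresh sites, while the fixed handler counts none. That is the point of the last lesson: a bug that looks like a one-off on one site shows up every time when you create sites by the hundreds.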
Did that bug really go away? Don’t assume it too quickly!
Photo by Sebastian Pichler on Unsplash
I’ve seen a few bugs where my initial reaction was “Did I just see that?” It was the sort of bug that you aren’t even certain you have actually seen. These are normally bugs that occur under very particular circumstances and the trick is spotting exactly which combination of circumstances those are. The main thing is to remember them and look out for them in future, even if it is days or weeks between their occurrence.
The best example I recollect was some very anomalous behaviour in the front-end of an app I was testing. It turned out that it only manifested itself on certain pages where the path from the “Next” button at the foot of one page to the first field for completion at the top of the next page passed through a clickable incremental time counter, and where the user had a wheel mouse and used it to speed up pointer movement whilst scrolling up the page. Only certain pages in the workflow were configured so that the path of the pointer passed through these counters. It took three months for me to confirm a) that the bug was actually happening, and b) how to replicate it. The devs spotted it on the same day that I did!
Thanks for sharing that story. There sure are a lot of sneaky bugs hiding out there 🙂