One of the difficult problems in testing is coming up with good test data. In my world that often means finding particular files with particular types of data in them, so I have written a utility that searches for these kinds of files. However, I have been thinking recently about how to find or generate data that I might not otherwise think of. Humans are notorious for having cognitive biases and, as a human, I am subject to them as well. Mindfulness and other techniques can help reduce the effect of cognitive biases, but they will never fully remove them, so I’ve been thinking about how we can use computers to generate useful input data.
There are a couple of ideas I have had so far:
There are some random generators on the web that will generate random data of a particular type (e.g. floats or strings), which you could then use in your test fields. In general, the ones I have been able to find are a bit too ‘nice’ though. For example, floats like 0.0 or 1.0/3.0 are much more likely to cause issues than other random floats we might generate, so it seems that it would make sense to bias your float generator towards those kinds of numbers. Randomness by itself can help you find some issues you might not otherwise find, but coupling it with built-in biases that counteract or complement your normal biases is helpful as well. I’ve been looking into writing some of my own data generators to help with these kinds of issues, as this seems to be an area where the current test tooling options are weak.
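To make the biased generator idea concrete, here is a minimal sketch in Python. The names `EDGE_CASES`, `biased_float` and `sample_floats` are hypothetical, and the list of "interesting" values beyond 0.0 and 1.0/3.0 (signed zero, infinities, NaN, the largest and smallest normal floats) is my own assumption about what tends to cause trouble, not a definitive list. The 30% default bias is also just a starting guess.

```python
import random
import sys

# Assumed set of troublesome floats; extend or trim it for your own application.
EDGE_CASES = [
    0.0, -0.0, 1.0, -1.0, 1.0 / 3.0,
    sys.float_info.max, sys.float_info.min,
    float("inf"), float("-inf"), float("nan"),
]


def biased_float(rng, edge_probability=0.3, low=-1e6, high=1e6):
    """Return an edge-case float with probability edge_probability,
    otherwise a uniformly distributed float between low and high."""
    if rng.random() < edge_probability:
        return rng.choice(EDGE_CASES)
    return rng.uniform(low, high)


def sample_floats(count, seed=None, edge_probability=0.3):
    """Generate count floats; pass a seed so a failing run can be replayed."""
    rng = random.Random(seed)
    return [biased_float(rng, edge_probability) for _ in range(count)]


if __name__ == "__main__":
    for value in sample_floats(10, seed=2024):
        print(repr(value))
```

Passing a seed matters more than it might seem: when one of the generated values breaks something, you want to be able to regenerate exactly the same list.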
Another idea is to let a random number generator choose my input files for me. As a creature of habit, I tend to have a favorite set of input files that I use. By creating a script that randomly picks a file from the database, I could increase coverage of the characteristics in the files while also increasing the likelihood of running into some totally unexpected issues. By adding randomness to the test, I may increase the probability of running into a black swan event.
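A script like that could be as small as the sketch below. It assumes the "database" is simply a directory tree of files on disk, which may not match your setup, and the function name `pick_random_files` and its parameters are hypothetical.

```python
import random
from pathlib import Path


def pick_random_files(root, count=1, pattern="*", seed=None):
    """Pick up to count distinct files at random from under root.

    Assumes the test files live in a directory tree; pattern is a glob
    such as "*.dat". Candidates are sorted first so that the same seed
    always gives the same picks.
    """
    candidates = sorted(p for p in Path(root).rglob(pattern) if p.is_file())
    if not candidates:
        raise ValueError(f"no files matching {pattern!r} under {root}")
    rng = random.Random(seed)
    return rng.sample(candidates, min(count, len(candidates)))
```

Calling something like `pick_random_files("test_data", count=5, seed=1234)` (with `test_data` standing in for wherever your files live) and recording the seed alongside the test results would let you rerun the same selection when something unexpected turns up.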
Those are just a couple of ideas off the top of my head. What about you? What do you use when generating input data for your tests? Have you been able to use data generators to help overcome some of your natural biases?