Dummy Data – to use or not to use for testing?

I haven’t posted for a while, for a number of reasons, so it’s time to pick up another random topic – the use of dummy data for testing.

It came up in conversation last week, and I wondered what other people’s views on this are, and how other teams approach it.

There are obvious pros and cons, which I’ll summarise as follows:

Pros:

  • Easy to create a dummy set of data for testing as and when needed.
  • There is no need to obfuscate live data.
  • Testers can create the data they need without depending upon other teams.
  • A smaller data set can be created to test against where the testers know exactly what data exists (controlled sample).
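
To make the "controlled sample" point concrete, here is a minimal sketch of how a tester might generate a small, repeatable dummy data set. It is only an illustration: the function name `make_dummy_customers`, the customer fields and the name lists are all made up for this example, not taken from any real system.

```python
import random


def make_dummy_customers(count, seed=42):
    """Build a small, repeatable set of made-up customer records.

    The fields (id, name, email, balance) are hypothetical and exist
    only for this illustration.
    """
    rng = random.Random(seed)  # fixed seed: the same data on every run
    first_names = ["Alice", "Bob", "Chen", "Dana"]
    last_names = ["Smith", "Jones", "Patel", "Garcia"]

    customers = []
    for i in range(1, count + 1):
        first = rng.choice(first_names)
        last = rng.choice(last_names)
        customers.append({
            "id": i,
            "name": f"{first} {last}",
            # The .test domain is reserved, so it can never be a real address.
            "email": f"{first}.{last}.{i}@example.test".lower(),
            "balance": round(rng.uniform(0, 1000), 2),
        })
    return customers


customers = make_dummy_customers(5)
```

Because the seed is fixed, every run produces the same five records, so a test can assert against known values. The flip side is the first con below: a hand-picked list like this will only ever contain the kinds of data somebody thought to put in it.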

Cons:

  • Dummy data cannot fully replicate every single type of data that exists in production, thus defects could be missed.
  • Using a smaller data set means that load test results (e.g. web page or web service response times) may not reflect how the system behaves against production-sized data.
  • Processing times on a smaller dataset will not accurately reflect what will happen in production (e.g. on an Oracle Financials database).

Ok, there are many more, but it’s food for thought.

The biggest change in recent years is the legislation requiring live data to be obfuscated on pre-live environments. I remember the days of having copies of live data (with people’s personal information) installed on numerous development and test environments! The challenge is to replicate live issues on non-live environments, and to test on live-like data prior to releasing code to production. Failure to do so can lead to defects being uncovered in production, simply because the data on the test environment was not representative enough or there was not enough of it.
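
For the obfuscation route, one common approach is to replace personal fields with a keyed hash, so that the same live value always maps to the same masked value while the non-personal fields stay untouched. The sketch below assumes a made-up customer record, and the names `SECRET_KEY`, `pseudonym` and `obfuscate_customer` are hypothetical. Whether this level of masking satisfies a given piece of legislation is a separate question that the code does not answer.

```python
import hashlib
import hmac

# Hypothetical key for illustration; a real one would live outside the code.
SECRET_KEY = b"replace-me"


def pseudonym(value, length=10):
    """Return a short, repeatable token derived from a personal value."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:length]


def obfuscate_customer(row):
    """Copy a record, masking the personal fields and keeping the rest."""
    masked = dict(row)  # leave the original record untouched
    token = pseudonym(row["email"])
    masked["name"] = f"Customer {token}"
    masked["email"] = f"{token}@example.test"
    return masked


# A made-up "live" record, not real data.
live_row = {"id": 7, "name": "Jane Doe",
            "email": "jane.doe@example.com", "balance": 120.50}
safe_row = obfuscate_customer(live_row)
```

The id and balance survive, so volumes and value distributions still look like production, but the name and email no longer identify anyone by sight. Because the token is repeatable, the same customer gets the same masked email wherever the same key is used, which helps when records need to match up across tables.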

It’s a challenge, but one that cannot be ignored. Either you use hand-crafted dummy data, or obfuscated live data – either way, you cannot just take live data and test against it unchanged!