Building proper testing environments to perform quality assurance for new applications requires considerable time, effort, and resources. One of the most difficult aspects of setting up an effective environment is the creation of comprehensive test data for debugging the code, verifying its accuracy, and ensuring that the application adheres to the intended business rules. Let’s look at some of the issues involved in creating test data. Then, we’ll offer a few pointers and look at some tools that can make the job easier.
Problems with manually created data
Creating test data has traditionally been a manual process often performed by the same technical team that designed and built the application. Unfortunately, there are potential drawbacks to the manual approach.
One risk is that you’ll end up with an incomplete test database. Since new applications consist of hundreds of data elements and business rules, extensive time and resources are needed to create a test database that verifies all the application’s functions with every possible combination of input values. And even with all of that effort, it is still possible (even likely) that something will be missed.
Another problem with manually produced test data is data bias: the tendency to create data focused on a particular subset of the application's functionality. This leads to a tunnel-visioned approach to test data creation.
The entire issue of test data creation is exacerbated by complicated data relationships. If you create a table of test data containing order header information, you will need to make sure you have appropriate related records in other tables for information such as customer name and address (the customer master file), order line items, shipping address files, and product information files. Some of these ancillary files can be included in their entirety, but in the case of the order header and line items, you must be sure to maintain the referential integrity of those files.
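One way to keep those relationships intact is to sample only the "driver" table (order headers) and then pull every related child and parent record by key. The sketch below uses an in-memory SQLite database with a hypothetical three-table schema (customers, orders, order_lines); the table and column names are illustrative, not taken from any particular system.

```python
import sqlite3

# Hypothetical schema: a customer master, order headers, and order line items.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers   (cust_id  INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders      (order_id INTEGER PRIMARY KEY, cust_id INTEGER);
    CREATE TABLE order_lines (order_id INTEGER, line_no INTEGER, sku TEXT);
""")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(i, f"Customer {i}") for i in range(1, 6)])
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(i, (i % 5) + 1) for i in range(1, 21)])
conn.executemany("INSERT INTO order_lines VALUES (?, ?, ?)",
                 [(i, 1, f"SKU-{i}") for i in range(1, 21)])

# Step 1: sample only the order headers (here, an arbitrary subset).
sample_ids = [row[0] for row in
              conn.execute("SELECT order_id FROM orders WHERE order_id % 4 = 0")]
ph = ",".join("?" * len(sample_ids))

# Step 2: pull the child line items for exactly those orders...
lines = conn.execute(
    f"SELECT order_id, line_no, sku FROM order_lines "
    f"WHERE order_id IN ({ph})", sample_ids).fetchall()

# ...and the parent customer records they reference, so no sampled
# order points at a customer that is missing from the test database.
custs = conn.execute(
    f"SELECT DISTINCT c.cust_id FROM customers c "
    f"JOIN orders o ON o.cust_id = c.cust_id "
    f"WHERE o.order_id IN ({ph})", sample_ids).fetchall()
```

Because every child and parent row is fetched by the sampled keys rather than sampled independently, the extract has no orphaned line items and no dangling customer references.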
These problems, as well as other shortcomings of manual test data creation, can be avoided by carefully considering how you are getting your data. The next section outlines some best practices to follow to create better, and more meaningful, test data.
Keys to building good data
Here are some general guidelines to follow when building a test database:
- Randomize: Select records at random when building test data from an existing database. Don’t select only the first 1,000 records or all records between two given dates. This leads to “scoping” your data to a given environment and reduces your chances of obtaining the widest variety of feedback. When creating data, involve several people with different backgrounds. To eliminate bias, do not allow development team members to create this data. One simple algorithm for creating a sample might be to extract every nth record. So, for example, a thousand-record sample can be created from a million-record database by selecting every 1000th record.
- Ensure even sampling: In the case of geographic information, you can ensure even sampling when extracting data from existing databases by including an equal number of records for all regions.
- Build an adequate sample: Large samples will better represent the live data and will be less affected by sampling error. A thousand records are better than a hundred. A million are better than a thousand, and so forth.
- Assure completeness: Verify that all data elements will be populated and tested. Make sure that samples verify all required attributes for each data element, such as data type, length, date ranges, and code sets. If there is a low frequency of records that carry certain data, you will need to modify your extraction utility to specifically seek out these records or manually create them. Further, make sure that you have appropriate child records for any parent records you extract or create.
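The every-nth-record extraction and the even-sampling-by-region guidelines above can both be sketched in a few lines. This is a minimal illustration, not a production extraction utility; the record layout and region values are invented for the example.

```python
import random

def every_nth(records, n, offset=0):
    """Systematic sample: extract every nth record, as suggested above."""
    return records[offset::n]

def stratified(records, key, per_group, seed=0):
    """Even sampling: draw an equal number of records from each group
    (e.g. each geographic region), at random within the group."""
    rng = random.Random(seed)
    groups = {}
    for rec in records:
        groups.setdefault(key(rec), []).append(rec)
    sample = []
    for members in groups.values():
        sample.extend(rng.sample(members, min(per_group, len(members))))
    return sample

# Toy data: 300 records spread across three regions.
data = [{"id": i, "region": ["East", "West", "North"][i % 3]}
        for i in range(300)]

nth_sample = every_nth(data, 100)                          # 3 of 300 records
even_sample = stratified(data, lambda r: r["region"], 5)   # 5 per region = 15
```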
Data generation tools
If all this seems like an impossible task, help is available. A new generation of tools is available for building test data. Popular examples include Datatect, CA-Datamacs/II, QTEST, and DataFactory.
Most of these tools generate test data by focusing on the application's data structures. Among the features you'll find are:
- Elaborate GUIs to drill down into individual tables.
- Filters to enable parameter-driven data selection.
- Large prepopulated databases for common business data elements, such as name, address, phone, Y2K dates, and Social Security numbers.
- Maintenance of referential integrity.
- Direct access to common database environments, including Oracle, DB2, MS SQL Server, flat files, VSAM, DL/I, and so on.
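The "prepopulated databases of common business data elements" feature boils down to combining canned value lists with formatted random fields. The sketch below shows the general idea only; it is not the API of any of the tools named above, and the name lists and field formats are invented for illustration.

```python
import random
from datetime import date, timedelta

# Small stand-ins for the large prepopulated name databases a real tool ships.
FIRST_NAMES = ["Ann", "Bob", "Carla", "Deepak", "Elena"]
LAST_NAMES = ["Garcia", "Ito", "Jones", "Kim", "Lopez"]

def fake_row(rng):
    """Generate one realistic-looking test record: a name drawn from the
    prepopulated lists, a formatted phone number, and a date in a range
    spanning the Y2K boundary."""
    start = date(1999, 1, 1)
    return {
        "name": f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}",
        "phone": f"{rng.randint(200, 999)}-555-{rng.randint(0, 9999):04d}",
        "order_date": (start + timedelta(days=rng.randint(0, 730))).isoformat(),
    }

rng = random.Random(42)          # fixed seed so the test data is reproducible
rows = [fake_row(rng) for _ in range(1000)]
```

A fixed seed is worth noting as a design choice: it lets the same "random" test database be regenerated exactly, so a failing test case can be reproduced later.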
Worth the effort
Creating good test data can be very difficult, especially when you’re working with multiple related tables. However, whether you have to create the data manually or you have tools and utilities to do it for you, it’s essential that you produce data that will test all of the requirements of the new system.