Invariably, whenever I teach a lesson on database query and reporting, questions arise involving data quality:

  • Why can’t we find out how many days our invoices are overdue?
  • Why do our age-based promotions (such as ones directed toward 50 and over) have such low response rates?
  • Why do over 20 percent of our catalog mailings come back with addresses unknown, or worse still, with customers complaining they are getting catalogs in triplicate?

No matter how well users master the material, they won’t trust a CRM or ERP reporting tool unless they can be assured that it will return correct results. In this article, we’ll discuss some methods your organization can use to address these data-quality issues.

Help users trust your data
When database administrators ask me what they should do to handle user complaints about their databases, I usually respond by asking them if they’ve looked into the quality of their data. Good-quality data is accurate, consistent, and nonredundant. Have they examined their data for misspellings, typographical errors, out-of-range values, or incorrect data types? For example, if users are complaining that they cannot determine how many days their invoices are overdue, the database administrator should examine the Date fields to ensure the data type is Date/Time and not Text. Most data entry mistakes can be eliminated through validation routines, but if the problems involve older databases that have no such routines, you’ll need to write conditional logic to identify and correct existing bad data.
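As a rough sketch of that kind of conditional cleanup logic, the snippet below scans records whose date field was stored as text and flags any value that doesn’t parse as a real date. (The invoice records, field names, and date format here are hypothetical, not from any particular system.)

```python
from datetime import datetime

# Hypothetical sample of invoice records whose Due_Date field was
# stored as text rather than a Date/Time type.
invoices = [
    {"invoice_id": 1001, "due_date": "2001-03-15"},
    {"invoice_id": 1002, "due_date": "overdue"},     # free-text entry
    {"invoice_id": 1003, "due_date": "15/03/2001"},  # wrong format
]

def find_bad_dates(records, fmt="%Y-%m-%d"):
    """Return the records whose due_date text does not parse as a date."""
    bad = []
    for rec in records:
        try:
            datetime.strptime(rec["due_date"], fmt)
        except ValueError:
            bad.append(rec)
    return bad

print(find_bad_dates(invoices))
```

Once the bad records are isolated, they can be corrected by hand or by a follow-up routine, and the column converted to a true Date/Time type.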

Sponsored by
PeopleSoft is the exclusive sponsor of TechRepublic’s special series on Enterprise Applications.

For more information, check out TechRepublic’s Enterprise Application Center, or visit PeopleSoft

Use statistical checks to ensure validity
There are many instances where data will pass a validity check but still be incorrect. Take the case of a user who needs to know which customers qualify for an “over-50” promotion. If the validation rule for the Birth_Date field says it must include a date from 01/01/1911 to today’s date, users will typically enter 01/01/1911 if the birth date is unknown. Hence, customers whose birthday is unknown would be included in the mailing even though they might have been under 50. One organization I worked with had this type of validation rule. They never thought to allow null values until they performed a statistical analysis of the data. What they found was that a disproportionate number of their customers were 90 years old!

Proper training and data filters are both important
Data quality, at the very minimum, implies that name and address fields are correct down to the character level. This means that not only are names spelled correctly but also that the entries agree with the field definition. For example, a person’s first name should not be confused with his or her last name and placed in the Last_Name field. One student of mine was able to trace the severe backlog of claims in her government agency to this type of data-quality error.

Ensuring accuracy, however, does not necessarily eliminate redundancy. For example, if a customer named James Robert Jones buys three products from a company at three different times, the first time he may be listed as J. R. Jones; the second time, James Jones; and the third time, Jim Jones. When you query this database, instead of seeing Mr. Jones as one of your best customers, you see him as three different customers, each a one-time buyer. While you may be able to avoid this type of redundancy through proper training of your sales staff, how can you control duplicate entries if Mr. Jones orders online? For this, you’ll need real-time data filters that correct data coming in from the Web and match it with existing data to avoid duplicates.
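The core of such a filter is a fuzzy match between an incoming name and existing customer records. A minimal sketch using Python’s standard-library string similarity (the names and the 0.5 threshold are illustrative assumptions; production filters typically also compare address and other fields):

```python
from difflib import SequenceMatcher

# Hypothetical existing customer records and an incoming web order.
existing = ["James Robert Jones", "Mary Smith"]

def best_match(name, candidates, threshold=0.5):
    """Return the closest existing name above a similarity threshold,
    or None if the record looks genuinely new."""
    scored = [(SequenceMatcher(None, name.lower(), c.lower()).ratio(), c)
              for c in candidates]
    score, match = max(scored)
    return match if score >= threshold else None

print(best_match("Jim Jones", existing))
```

Instead of creating a new record, the filter can route near-matches like this one to the existing customer, or to a human reviewer when the score is borderline.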

Try to standardize
Real-time data filters can help consolidate Web data with existing data sources, provided they conform to a consistent set of data-quality standards. This is particularly true when it comes to organizational acquisitions and mergers. For example, standardization proved to be the biggest data-quality issue for users who were consolidating the client records of several county agencies being brought under state government jurisdiction. They not only had to deal with inaccuracies and redundancies but also different data standards involving systems ranging from highly automated mainframe databases to manual systems that employed index cards!
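In practice, much of that standardization work comes down to mapping each system’s local codes and spellings onto one agreed-upon standard. A toy sketch of the idea, using an invented marital-status field (the variant values and codes are hypothetical):

```python
# Hypothetical cleanup: different county systems recorded marital
# status in incompatible ways; map every known variant to one
# standard code, and flag anything unrecognized for manual review.
STATUS_MAP = {
    "m": "MARRIED", "married": "MARRIED", "mar.": "MARRIED",
    "s": "SINGLE", "single": "SINGLE", "sgl": "SINGLE",
}

def standardize(value):
    """Normalize a raw status string to the shared standard."""
    return STATUS_MAP.get(value.strip().lower(), "REVIEW")

print([standardize(v) for v in ["M", " Married ", "sgl", "widowed"]])
```

The "REVIEW" bucket is important: values you haven’t mapped yet (here, "widowed") surface immediately instead of silently corrupting the consolidated database.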

Help is available
While data quality can seem daunting at first, there are tools to automate the process. In my next article, I’ll introduce some of these tools and explain what you should expect from them.
Does your organization emphasize training and end-user confidence during an enterprise application implementation? If so, how? Post a comment or write to Mary Ann Richardson and share your thoughts.