Dirty data costs the US economy up to $3.1 trillion a year, and organizations have tried to deal with it through methods like deduplication and normalization, or even by removing or correcting broken or incomplete pieces of data by hand.
There are tools that do this, but there are also times when manual data correction is necessary. Organizations are drowning in data and don't have the person power to keep up, to the point where data can't be cleaned quickly enough.
In short, it's time to rethink data cleanup in order to get on top of it.
How do you do this so you can keep moving forward?
1. Reframe data cleanup as a business integration problem
A major issue is data that doesn't work with other data because it is in a different format, or is known under a different name. A similar piece of data might even have different field sizes in different systems, and be referenced differently among departments.
Problems like these are bigger than just data discrepancies and inaccuracies. They could reflect disparate, overlapping systems that duplicate effort because different business units are doing the same things without being aware of it. In these cases, it makes more sense to get together with all of the users and departments to decide which system gets retired and which system stays so you can reduce the confusion. You might have a one-time job of porting over data from an old system to a new one, but you'll never have to do that job again.
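To make the naming and field-size problem concrete, here is a minimal Python sketch of what a one-time porting job might look like once departments agree on a single system. Everything in it is an assumption for illustration: the "crm" and "billing" source systems, the field names, and the field sizes are hypothetical and do not come from any real product.

```python
# Hypothetical canonical schema: field name -> maximum length in characters.
CANONICAL_FIELDS = {"customer_id": 10, "customer_name": 40}

# Hypothetical aliases: how each source system names the same data.
FIELD_ALIASES = {
    "crm": {"cust_no": "customer_id", "cust_name": "customer_name"},
    "billing": {"AccountID": "customer_id", "AccountName": "customer_name"},
}


def to_canonical(record, source):
    """Rename a source record's fields to canonical names.

    Returns the renamed record plus a list of problems for someone to review.
    """
    aliases = FIELD_ALIASES[source]
    canonical = {}
    problems = []
    for field, value in record.items():
        target = aliases.get(field)
        if target is None:
            # No agreed mapping yet: flag the field instead of guessing.
            problems.append(f"unmapped field: {field}")
            continue
        text = str(value).strip()
        if len(text) > CANONICAL_FIELDS[target]:
            problems.append(
                f"{target} exceeds {CANONICAL_FIELDS[target]} characters"
            )
        canonical[target] = text
    return canonical, problems


# The same customer as two departments might store it.
crm_row = {"cust_no": "10042", "cust_name": "  Acme Industrial  "}
billing_row = {"AccountID": "10042", "AccountName": "Acme Industrial", "Region": "NW"}

crm_clean, crm_problems = to_canonical(crm_row, "crm")
billing_clean, billing_problems = to_canonical(billing_row, "billing")
```

In this sketch, both rows end up with identical canonical fields, while the billing system's extra "Region" field is reported as unmapped rather than silently dropped. That list of unmapped and oversized fields is the kind of thing users and departments would need to settle together before an old system is retired.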
2. Use tools and services that minimize impact to business processes
"When companies come to us, they tell us that they have data that is scattered over many different systems and in all types of formats and the quality of this data is poor," said Vivek Joshi, CEO of Entytle, which focuses on helping manufacturers sell more product to their after-market customer bases. "Companies want to make sense of this data and they don't know where to begin. They hire us because we have experience with purposeful algorithms and analytics that we use to clean and align this data so it is usable for them."
Joshi says that his company also makes integration of the data less painful for clients. "We do this by adding a click-on tab to existing system menus so users can just click and be routed to the data and the analytics as they need them, and then return to the system they are working in," he said. "The integration of the data analytics with a given system platform can be that simple."
The benefit for companies selecting tools and methods from outside vendors is that data becomes usable quickly and the pain of business process revision is all but avoided.
3. Don't avoid business process revisions where they are needed
While quick data and system integration tools are appreciated, companies should still understand that it is necessary to revise business processes so the business is operationally aligned with its mission and its customers.
For instance, a product engineer who once handed off a design to manufacturing and then forgot about it might need to continuously collaborate with manufacturing and even customer service in today's world of shrinking product cycles. If engineering doesn't stay "plugged in," it risks missing valuable product improvement insights that are gathered from customer feedback on product performance and ease of use.
4. Decide which data you want to keep
It is impossible for organizations to clean and store every piece of data that rushes into their businesses, even with the most efficient data cleaning and preparation tools. This is why it is essential that IT sit down with the different user departments throughout the company to determine which data gets stored and which gets dumped. Once you make the decision, keeping up with your mission-critical data concerns, and keeping your data clean, gets easier.
Mary E. Shacklett is president of Transworld Data, a technology research and market development firm. Prior to founding the company, Mary was Senior Vice President of Marketing and Technology at TCCU, Inc., a financial services firm; Vice President of Product Research and Software Development for Summit Information Systems, a computer software company; and Vice President of Strategic Planning and Technology at FSI International, a multinational manufacturing company in the semiconductor industry. Mary is a keynote speaker and has more than 1,000 articles, research studies, and technology publications in print.