In a 2012 study conducted by IDC and sponsored by EMC, worldwide data volumes were projected to reach 40 zettabytes by 2020. The same study reported that only 3% of all data in 2012 was tagged and ready for manipulation, and that only 0.5% of total data was being used for analysis.
"As the volume and complexity of data barraging businesses from all angles increases, IT organizations have a choice: they can either succumb to information-overload paralysis, or they can take steps to harness the tremendous potential teeming within all of those data streams," said EMC's Jeremy Burton.
Enterprises have made headway in their data management since then, given the number of big data initiatives and the thousands of organizations of all sizes that are undertaking them. If nothing else, there has been a reshaping of thinking in the data center about data under management.
In the past, data that went seldom or never used for a period of time defined by the business was jettisoned from data stores as part of general storage housekeeping. However, because no one can predict future uses of big data or where they might take companies, this is no longer the case.
"I would very much like to get rid of all of this data that is building," a transportation services data manager recently told me. "But the problem for us is, how do we know that this very data won't be needed in new analytics questions that we haven't even thought of today, but that we might want to query our data with years from now?"
Herein lies the dilemma, or at least part of it.
The other difficult question for enterprises and their IT departments is how to make sense of all of the data they have accumulated from the outside, and also from the various silos of data that exist within their own enterprises. This is where semantic web technology and the use of data visualization engines come in. Semantic web technology is an extension of the World Wide Web that enables people to share content beyond the boundaries of applications and websites.
"What we can do is to map both structured and unstructured data from enterprise and outside systems into a single database that creates a model of the business and enables people to perform analytics on this data through the specific lens of their business," said John Reuter, vice president of marketing for Cambridge Semantics, which provides "smart data" solutions to enterprises.
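To make the idea of a single mapped model a little more concrete, here is a minimal sketch in Python. It assumes facts from every source are stored as subject-predicate-object triples, the basic data shape behind semantic web standards such as RDF. All of the names and sample data are hypothetical, the email mentions are assumed to have been extracted from the message text already, and this is an illustration of the general approach, not a description of how Cambridge Semantics' product works.

```python
from typing import Iterable, NamedTuple, Optional


class Triple(NamedTuple):
    """One fact: subject --predicate--> obj."""
    subject: str
    predicate: str
    obj: str


def from_crm_rows(rows: Iterable[dict]) -> list[Triple]:
    """Map structured rows (a hypothetical CRM export) to triples."""
    triples = []
    for row in rows:
        triples.append(Triple(row["customer"], "located_in", row["city"]))
        triples.append(Triple(row["customer"], "buys", row["product"]))
    return triples


def from_email_mentions(mentions: Iterable[tuple[str, str]]) -> list[Triple]:
    """Map (sender, product) pairs, assumed already extracted from email text."""
    return [Triple(sender, "asked_about", product) for sender, product in mentions]


def match(store: Iterable[Triple],
          subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None) -> list[Triple]:
    """Return the triples that match every field given; None is a wildcard."""
    return [
        t for t in store
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# Hypothetical sample data from a structured and an unstructured source.
crm_rows = [
    {"customer": "Acme", "city": "Boston", "product": "Widget"},
    {"customer": "Globex", "city": "Denver", "product": "Gadget"},
]
email_mentions = [("Acme", "Gadget"), ("Initech", "Widget")]

# One store, one query interface, regardless of where a fact came from.
store = from_crm_rows(crm_rows) + from_email_mentions(email_mentions)
gadget_links = match(store, obj="Gadget")
```

Because both sources end up in the same shape, a single query such as `match(store, obj="Gadget")` returns the existing buyer from the structured source alongside the prospect who only asked about the product by email.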
At the end of 2014, Cambridge Semantics and Cambridge Intelligence decided to integrate their respective tools for Semantic Web technology and network visualization into a solution that enables companies in the pharmaceutical and financial services industries to obtain fast and accurate data-driven visualization of relationships, hierarchies, and patterns within their big data. The technology combination lets companies explore not only the logical linkages between elements of data but also the deeply embedded associational ones that would escape the "normal eye" of even a keen researcher.
In the pharmaceutical industry, for example, the data analysis and visualization tools can provide the ability to visualize the connections between research and results during drug discovery, eliminating the risk of duplicated effort while also identifying gaps in understanding. In clinical trials, data semantics and visualization tools can assist in targeting optimal trial participants with the required profiles for the trials. And in security and regulatory compliance surveillance, web logs, email, phone archives, instant messaging, and other communications sources can be linked together to uncover potential violations of regulatory requirements and internal policies and procedures. Best of all, the results of these data analyses are sent to users in Excel-like spreadsheets, so users don't have to learn new skills for keying on and sorting data.
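The compliance example, in which several communications sources are linked together, can be sketched as a small graph problem. The Python below is only an illustration under stated assumptions: the contact records, the person labels, and the function names are all hypothetical, and a breadth-first search stands in for whatever analysis a commercial tool actually performs. It finds the shortest chain of contacts between two people, even when each hop happened on a different channel.

```python
from collections import defaultdict, deque

# Hypothetical contact records pulled from three channels:
# (channel, person_a, person_b)
records = [
    ("email", "trader_a", "analyst_b"),
    ("phone", "analyst_b", "broker_c"),
    ("chat", "broker_c", "client_d"),
    ("email", "trader_a", "assistant_e"),
]


def build_graph(records):
    """Build an undirected graph: person -> {contact: channel}.

    Simplification: if two people used several channels, only the
    last one seen is kept.
    """
    graph = defaultdict(dict)
    for channel, a, b in records:
        graph[a][b] = channel
        graph[b][a] = channel
    return graph


def find_chain(graph, start, goal):
    """Breadth-first search for the shortest chain of contacts.

    Returns a list of (person, channel, person) hops, or None if the
    two people are not connected.
    """
    queue = deque([start])
    came_from = {start: None}
    while queue:
        person = queue.popleft()
        if person == goal:
            break
        for neighbor in graph.get(person, {}):
            if neighbor not in came_from:
                came_from[neighbor] = person
                queue.append(neighbor)
    if goal not in came_from:
        return None
    # Walk back from the goal to the start to recover the hops.
    hops = []
    node = goal
    while came_from[node] is not None:
        prev = came_from[node]
        hops.append((prev, graph[prev][node], node))
        node = prev
    hops.reverse()
    return hops


graph = build_graph(records)
chain = find_chain(graph, "trader_a", "client_d")
```

No single channel's archive shows a link between `trader_a` and `client_d` here; the connection only appears once the email, phone, and chat records sit in the same graph, which is the kind of associational linkage a visualization layer would then draw for a reviewer.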
Is this an answer to the entire data usage dilemma now facing enterprises? Of course not. But with the availability of both associational and logical data probe tools for mixed-source unstructured and structured data, it's a start.
Mary E. Shacklett is president of Transworld Data, a technology research and market development firm. Prior to founding the company, Mary was Senior Vice President of Marketing and Technology at TCCU, Inc., a financial services firm; Vice President of Product Research and Software Development for Summit Information Systems, a computer software company; and Vice President of Strategic Planning and Technology at FSI International, a multinational manufacturing company in the semiconductor industry. Mary is a keynote speaker and has more than 1,000 articles, research studies, and technology publications in print.