According to Samsung, global internet traffic surpassed one zettabyte — or one billion terabytes — in 2016. That number is huge, but it doesn't begin to approach the total data that companies are storing.
Even more concerning is the possibility that, at most companies, data "under management" is a misnomer.
The key data management challenge areas are:
- Understanding dark data
- Data retention
- Data integration for best analytics results
- Data access
IT departments struggle in these areas for the following reasons:
- The flow of incoming data of all types, much of it unstructured, is too great to manage on a daily basis, so IT ends up putting the data wherever there is room.
- Uncertainty about how much historical data legal and audit processes like eDiscovery and industry regulations will demand makes business decision makers reluctant to discard data; and end users have never liked sitting down in annual review meetings to discuss data retention policies, either.
- Data integration is one of the most difficult tasks IT performs, and it is only getting harder as concepts like data aggregation play greater roles in analytics, where seemingly unlike sets of data must be combined into a searchable repository for new types of business queries.
- Rapid access to data is a business demand, but high-speed storage, whether on premises or in the cloud, is expensive, so some data must be archived off to slower, cheaper storage. To address these issues, management throws person power at projects, which takes time away from other important goals.
The question now is: can machine learning, artificial intelligence (AI), and analytics provide assistance in the area of data management, especially for the large amount of unstructured data?
Here is where machine learning, AI and analytics can help:
Sorting through dark data
Every corporate system, and every business department, has troves of accumulated data that no one knows anything about. By combining machine learning with algorithms that sort and handle the different types of emails, documents, images, etc., stored on servers, automation can go to work on this unplumbed data and pre-sort it for you. A knowledgeable human can then review the data classification scheme the automation recommends, tweak it, and apply it. Part of the process could also address data retention, with the analytics producing a set of recommendations on which data could potentially be purged from files.
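To make the pre-sorting step concrete, here is a minimal rule-based sketch in Python. The category names and keyword lists are hypothetical examples; a production system would learn its classification scheme from the data rather than hard-code it, and the low-confidence fallback is where the human reviewer described above steps in.

```python
import re
from collections import Counter

# Hypothetical categories and keywords for illustration only; a real
# system would derive these from the data with machine learning.
CATEGORY_KEYWORDS = {
    "invoice": {"invoice", "payment", "amount", "due"},
    "hr": {"employee", "benefits", "payroll", "leave"},
    "engineering": {"server", "deploy", "bug", "release"},
}

def presort(text):
    """Score a document against each category and return the best match."""
    tokens = Counter(re.findall(r"[a-z]+", text.lower()))
    scores = {cat: sum(tokens[w] for w in words)
              for cat, words in CATEGORY_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    # Documents that match nothing are flagged for human review.
    return best if scores[best] > 0 else "unclassified"
```

The "unclassified" bucket is the important design choice: automation proposes, but a person still reviews anything the rules cannot place.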
Deciding what to throw away
Machine learning, analytics, and AI can objectively identify data that is seldom or never used and recommend throwing it away, but they don't have the discernment that employees do. For instance, these processes can pick out pieces of data or records that haven't been accessed for more than five years, indicating that the data could be obsolete. This saves an employee time hunting down potentially obsolete data, because now all they need to do is determine whether there is any reason to keep it.
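The five-year access check described above can be sketched with the Python standard library alone. Note the assumption: this relies on filesystem access times (`st_atime`), which some systems do not update (for example, volumes mounted with `noatime`), so treat it as illustrative rather than authoritative.

```python
import time
from pathlib import Path

# The five-year threshold from the article, expressed in seconds.
FIVE_YEARS = 5 * 365 * 24 * 3600

def stale_candidates(root, now=None):
    """Yield files not accessed in five years; a human decides whether to keep them."""
    now = now or time.time()
    for path in Path(root).rglob("*"):
        # st_atime is the last access time; unreliable on noatime mounts.
        if path.is_file() and now - path.stat().st_atime > FIVE_YEARS:
            yield path
```

The output is a candidate list, not a deletion list: the employee still makes the keep-or-purge call.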
Integrating data for analytics
When analytics developers determine the kinds of data they need to aggregate for queries, they often produce a repository for the application, and then pull in various types of data from different sources to make up an analytics data pool. To do this, they must develop integration methods to access the different sources from which they pull data. Machine learning can make this still largely manual process more efficient by automatically developing "mappings" between data sources and the application's data repository, cutting down integration and aggregation times.
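As a rough sketch of what an automatic mapping suggestion looks like, even simple name similarity can propose source-to-target field mappings for a developer to confirm; a learned model would go further, but the shape of the output is similar. The field names below are hypothetical.

```python
import difflib

def suggest_mappings(source_fields, target_fields, cutoff=0.6):
    """Suggest source -> target field mappings by name similarity.

    A stand-in for learned mappings: each suggestion should still be
    confirmed by a developer before it is used for integration.
    """
    targets_lower = [t.lower() for t in target_fields]
    mapping = {}
    for src in source_fields:
        match = difflib.get_close_matches(src.lower(), targets_lower,
                                          n=1, cutoff=cutoff)
        if match:
            # Map back to the original-cased target field name.
            mapping[src] = target_fields[targets_lower.index(match[0])]
    return mapping
```

For example, `suggest_mappings(["cust_name", "order_dt"], ["customer_name", "order_date"])` pairs each abbreviated source field with its likely repository column, which is exactly the kind of tedious matching that eats up integration time when done by hand.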
Organizing data storage for best access
Over the past five years, data storage vendors have made significant inroads into automating storage management, thanks to the development of lower cost solid state storage. These technology advances have enabled IT departments to use "smart" storage engines that use machine learning to see which types of data are used most often, and which are seldom or never used. These engines can automatically place data in fast or slow storage, based on the business rules inserted into their algorithms, saving storage managers from having to address storage optimization manually.
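A business rule of the kind described, routing frequently accessed objects to fast storage and the rest to slower, cheaper storage, can be sketched in a few lines. The threshold below is an assumed example for illustration, not any vendor's default.

```python
def assign_tier(access_counts, hot_threshold=10):
    """Apply a simple tiering rule: frequently read objects go to fast
    (e.g., solid state) storage, the rest to slow, cheap storage.

    access_counts maps an object name to its recent access count; the
    hot_threshold of 10 is an assumed example, not a vendor default.
    """
    return {obj: ("fast" if count >= hot_threshold else "slow")
            for obj, count in access_counts.items()}
```

Real "smart" storage engines learn access patterns over time rather than applying a fixed count, but the principle, business rules driving automatic placement, is the same.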
Data management is a major IT challenge that is not close to resolution in most organizations—and it is going to get worse as the data continues to stream in.
CIOs, data architects, and storage managers need to highlight the issue to C-level executives, but data management projects are not easy "sells."
Nevertheless, by pointing out the value of faster times to market for analytics and potential person power and storage cost reductions for data management, IT managers at least have viable entry points into C-level discussions about how to increase strategic agility and reduce cost of operations at the same time.
Mary E. Shacklett is president of Transworld Data, a technology research and market development firm. Prior to founding the company, Mary was Senior Vice President of Marketing and Technology at TCCU, Inc., a financial services firm; Vice President of Product Research and Software Development for Summit Information Systems, a computer software company; and Vice President of Strategic Planning and Technology at FSI International, a multinational manufacturing company in the semiconductor industry. Mary is a keynote speaker and has more than 1,000 articles, research studies, and technology publications in print.