IDC and EMC estimate that by 2020 the world will have 40 zettabytes of data, and a 2016 Veritas Global Databerg Survey indicates that as much as 85% of this data will be “dark data.”

What is dark data?

Gartner defines dark data as “the information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes (for example, analytics, business relationships and direct monetizing).”

From an organizational standpoint, dark data may or may not reside in systems of record (SOR). It can include paper documents in file cabinets, photos, videos, or any number of other information artifacts that are overlooked or neglected in the course of doing business because they were initially judged nonessential.

However, organizations are not throwing this data out, so shouldn’t there be a way to put the data to use?


“If companies can learn how to harness this data, it can yield new insights,” said Mads C. Brink Hansen, product manager at TARGIT, a business intelligence and analytics solution provider. “In one case, a company wanted to assess the efficiency of its field-based salesforce. By looking at the travel expense reports submitted by its salespersons, it was able to determine the number of meetings that each salesperson had while in the field each day and then measure this against what should normally be expected in the way of meetings. This was one way in which an HR-based reporting function (travel and expense reports) was repurposed to provide insights into how many meetings per day an in-field salesperson was likely to have, and who was hitting those targets.”
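The salesforce example boils down to a simple calculation: count customer meetings per salesperson per field day, average them, and compare the result with an expected benchmark. The Python sketch below illustrates that arithmetic only. It is not TARGIT's tooling or the company's actual method. The `ExpenseLine` schema, the rule that each distinct customer named on a receipt counts as one meeting, the sample data, and the target of 2.0 meetings per day are all hypothetical assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class ExpenseLine:
    """One line of a travel expense report (hypothetical schema)."""
    salesperson: str
    day: date
    customer: Optional[str]  # customer named on the receipt; None for hotels, fuel, etc.


def meetings_per_day(lines: Iterable[ExpenseLine]) -> Dict[str, float]:
    """Average number of distinct customers met per field day, per salesperson.

    Assumption: a day counts as a field day if any expense was filed for it,
    and each distinct customer on that day's receipts counts as one meeting.
    """
    customers_met = defaultdict(set)  # (salesperson, day) -> customers seen that day
    field_days = defaultdict(set)     # salesperson -> days with any expense filed
    for line in lines:
        field_days[line.salesperson].add(line.day)
        if line.customer is not None:
            customers_met[(line.salesperson, line.day)].add(line.customer)
    return {
        person: sum(len(customers_met[(person, d)]) for d in days) / len(days)
        for person, days in field_days.items()
    }


def hits_target(averages: Dict[str, float], target: float) -> Dict[str, bool]:
    """Flag who meets the (hypothetical) expected meetings-per-day benchmark."""
    return {person: avg >= target for person, avg in averages.items()}


# Hypothetical sample data standing in for digitized expense reports.
reports = [
    ExpenseLine("alice", date(2016, 3, 7), "Acme"),
    ExpenseLine("alice", date(2016, 3, 7), "Globex"),
    ExpenseLine("alice", date(2016, 3, 7), None),  # hotel, not a meeting
    ExpenseLine("alice", date(2016, 3, 8), "Acme"),
    ExpenseLine("bob", date(2016, 3, 7), "Initech"),
    ExpenseLine("bob", date(2016, 3, 7), "Umbrella"),
    ExpenseLine("bob", date(2016, 3, 7), "Hooli"),
]

averages = meetings_per_day(reports)
print(averages)                    # {'alice': 1.5, 'bob': 3.0}
print(hits_target(averages, 2.0))  # {'alice': False, 'bob': True}
```

Counting distinct customers per day is a deliberately conservative choice: two receipts from the same client lunch should not register as two meetings. A real analysis would need whatever meeting signal the actual expense data provides.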

Hansen says companies should devote as much time to plumbing the depths of their dark data as they do to their big data, and his company has a toolset that facilitates the process. But for CIOs and others in charge of data analytics, dark data projects can be a tough sell. After all, the data's very inactivity suggests it has already been judged relatively useless. What if you don't find anything and lose hours of staff time and budget in the process?

How to explore dark data

“There are two ways to go about exploring your dark data,” said Hansen. “The first method is to explore all of this data in the hope of unearthing unique business insights that can transform your business. The second is setting specific goals and then pursuing an exploration through this data to see if you can solve the business problem you have identified.”

Hansen favors the second approach, a business-case-oriented probe into dark data, because it is results-driven. It also lets a project manager call off the search if a reasonable amount of time passes without results.

“We use a combination of query and data algorithms to get at this data,” said Hansen. “With query tools, users without developed skills in data science can explore data. The tools also give the ability for more skilled data analysts to develop algorithms with languages such as R.”

The best way to approach dark data

“By ‘walking the floor,’ managers can observe what is really going on and how the business is doing,” said Hansen. “More than likely, individual departments have already developed their own ‘offline’ spreadsheets and small databases to record this activity into pools of dark data that IT and other areas of the enterprise aren’t even aware of. If you keep your eyes open and discover these troves of data, and if you have a business context that you can apply them to, dark data can contribute significant insights.”

Here’s how such a method might work:

The warehouse notices a high number of returns for a given item and records them, but doesn't know why the item is being returned beyond customers saying it didn't work. The returned items are sent to manufacturing for rework, where it turns out that an electrical contact on this particular widget keeps breaking. Manufacturing replaces the faulty contact, and the item is re-marketed. As part of its performance tracking, manufacturing logs this work in a rework activity database. Unfortunately, the electrical contact problem is never reported to engineering, so the item is never redesigned. A redesign would eliminate the returns, the customer disappointment, and the drain on the company's profit margins.
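In this scenario, the insight is lost because two departmental records never meet. The Python sketch below shows how cross-referencing them could surface a recurring root cause worth escalating to engineering. It illustrates the idea only and is not TARGIT's tooling or any company's actual process. It assumes both the warehouse returns log and manufacturing's rework database can be exported with a shared SKU field. The field names, SKUs, sample records, and the `min_count` threshold of 3 are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical exports of the warehouse returns log and manufacturing's
# rework activity database. Field names and values are illustrative only.
returns: List[dict] = (
    [{"sku": "WIDGET-7", "reason": "did not work"} for _ in range(4)]
    + [{"sku": "GADGET-2", "reason": "did not work"}]
)
rework: List[dict] = (
    [{"sku": "WIDGET-7", "root_cause": "broken electrical contact"} for _ in range(4)]
    + [{"sku": "GADGET-2", "root_cause": "scratched housing"}]
    # In-house rework with no customer returns; should not be flagged.
    + [{"sku": "SPROCKET-9", "root_cause": "loose screw"} for _ in range(3)]
)


def recurring_defects(
    returns: List[dict], rework: List[dict], min_count: int = 3
) -> Dict[Tuple[str, str], int]:
    """Count rework root causes for items customers also returned,
    keeping only (sku, root_cause) pairs seen at least min_count times."""
    returned_skus = {r["sku"] for r in returns}
    causes = Counter(
        (w["sku"], w["root_cause"]) for w in rework if w["sku"] in returned_skus
    )
    return {key: n for key, n in causes.items() if n >= min_count}


def engineering_report(defects: Dict[Tuple[str, str], int]) -> List[str]:
    """Format flagged defects as lines a design team could act on."""
    return [f"{sku}: {cause} ({n} reworks)" for (sku, cause), n in sorted(defects.items())]


print(engineering_report(recurring_defects(returns, rework)))
# ['WIDGET-7: broken electrical contact (4 reworks)']
```

Filtering on returned SKUs keeps the focus on defects that reach customers, which is where the returns, disappointment, and margin drain in the scenario come from.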

“This is why it’s important to look into your dark data,” said Hansen. “What you are able to find there can make a tremendous difference in your business.”