Gartner defines dark data as "the information assets organizations collect, process and store during regular business activities, but generally, fail to use for other purposes."
Often, this unused data is saved for compliance or legal discovery purposes, and all too often, consists of unstructured big data that organizations are reluctant to eliminate, although they seldom or never use the data.
Dark data tracks climate change
Leaving this data unexplored is a mistake. Take, for example, a zooplankton data restoration project, which is recovering dark datasets of oceanic data collected in the 1970s and 1980s, a time when technologies for data curation, storage, and dissemination were almost nonexistent.
This data is important because it helps to track the impact of climate change on populations of zooplankton, the microscopic animals that sustain many forms of sea life and are essential elements of the oceanic food chain. Current zooplankton populations can be compared with what they were 50 years ago.
As this project demonstrates, you could miss vital information by failing to explore all of your stored data. Organizations don't consciously set out to do this. In many cases, the task of restoring data in the form of old videos, documents, photos, etc., is so daunting that the personnel and budget to perform a full restoration simply aren't available.
Help on the way
The good news is that help is on the way.
Advances in computer vision, pattern recognition, and cognitive analytics are delivering tools that make it easier to process and probe unstructured dark data that organizations previously left unexplored.
This paves the way to new insights and has prompted industry researchers such as Deloitte to say that "[By] leveraging these advanced tools and skill sets, over the next 18 to 24 months an increasing number of CIOs, business leaders, and data scientists will begin experimenting with 'dark analytics': focused explorations of the vast universe of unstructured and 'dark' data with the goal of unearthing the kind of highly nuanced business, customer, and operational insights that structured data assets currently in their possession may not reveal."
What steps can CIOs, CDOs, and other IT professionals with big data responsibilities take to include dark data in their analytics strategies? Read the six suggestions below.
1. Find out what you've got under management
"The problem for most companies is that in the past data was always an afterthought. You built a system or application and there was data associated with it. Then you said, 'I'll figure out what I want to do with that later on,'" said Anil Chakravarthy, Chief Executive of Informatica.
That's exactly what many companies discover as they move through digital transformations. There are literally closets and storerooms full of unstructured data in hard copy that no one thought to digitalize until now. This data can provide valuable insights.
The goal for CIOs is simple: Find out what data the company has under management, including data it might not have known it had. Then, develop a strategic data plan with executives that addresses what to do with this data so that it delivers its highest value to the company.
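For data that is already sitting on file shares, a first inventory can be as simple as counting what is there. The following is a minimal sketch, not a full discovery tool: it assumes the data is reachable as an ordinary directory tree, and the function name `inventory` and the summary fields are illustrative choices rather than part of any product.

```python
from collections import defaultdict
from datetime import datetime, timezone
from pathlib import Path


def inventory(root):
    """Summarize files under root by extension.

    Returns a dict mapping each lowercased extension to the number of
    files, their total size in bytes, and the oldest modification time.
    """
    summary = defaultdict(lambda: {"files": 0, "bytes": 0, "oldest": None})
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue  # skip directories and other non-file entries
        info = path.stat()
        modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
        # Files without an extension are grouped under "(none)".
        entry = summary[path.suffix.lower() or "(none)"]
        entry["files"] += 1
        entry["bytes"] += info.st_size
        if entry["oldest"] is None or modified < entry["oldest"]:
            entry["oldest"] = modified
    return dict(summary)
```

Calling `inventory()` on a shared drive's root folder gives a rough picture of which file types dominate and how old the oldest material is, which can help decide where to look first. It says nothing about hard copy, which still has to be cataloged by hand.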
2. Tap into what you've got
As soon as you determine that certain areas of data are useful, begin to digitalize them and exploit them for value so you can get the data working for you.
3. Look for outside data that can augment your decision making
Outside data sources can enhance the value of data you already have under management. A prime example is the monitoring of Greenland's ice pack. If you monitor climate change and are concerned about the pace of global warming, you can study historical photos of Greenland's land mass from decades ago. Comparing how Greenland looked decades ago with how it looks today can demonstrate both the impact and the progression of global warming.
4. Curate data for privacy, integrity and data quality
As paper-based forms of unstructured data are digitalized, it is essential that the data undergo quality assurance checks for integrity and quality. During this process, data errors should be detected and corrected. In some cases, there might also be privacy concerns that should be vetted. All of these issues should be addressed in your data cleanup exercises before any of the newly digitalized content is admitted into a new data repository.
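The checks described above can be partly automated before a person reviews the results. The sketch below is illustrative only: it assumes each digitalized item arrives as a dictionary, and the field names (`record_id`, `collected_on`, `text`), the date format, and the two patterns are hypothetical stand-ins for whatever your own records and privacy rules require.

```python
import re
from datetime import datetime

# Hypothetical field names; adjust to the actual record layout.
REQUIRED_FIELDS = ("record_id", "collected_on", "text")

# Deliberately simple patterns. They flag candidates for human review
# and will both miss some personal data and raise false alarms.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
US_SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def check_record(record):
    """Return a list of issues found in one digitalized record (a dict)."""
    issues = []

    # Integrity: every required field must be present and non-blank.
    for field in REQUIRED_FIELDS:
        if not str(record.get(field) or "").strip():
            issues.append(f"missing {field}")

    # Quality: the collection date must parse as YYYY-MM-DD.
    collected_on = str(record.get("collected_on") or "").strip()
    if collected_on:
        try:
            datetime.strptime(collected_on, "%Y-%m-%d")
        except ValueError:
            issues.append("collected_on is not a valid YYYY-MM-DD date")

    # Privacy: flag text that looks like it contains personal data.
    text = str(record.get("text") or "")
    if EMAIL.search(text):
        issues.append("possible email address")
    if US_SSN.search(text):
        issues.append("possible US Social Security number")

    return issues
```

A record that returns an empty list has passed these particular checks, nothing more. Anything flagged should go to a person for correction or privacy review before it is admitted to the repository.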
5. Develop proactive data management strategies for new technologies like IoT
Getting a handle on all of the data that you have under management (but might not know you have) isn't the end of the data management story. Cisco estimates that by 2019, IoT will generate more than 500 zettabytes of data per year. As companies bring on IoT technology, every implementation plan should address what to do with the data that will be collected.
6. Demonstrate results
Data is meant to be used. If you can't demonstrate compelling business cases for using unstructured data that you want to digitalize, reconsider retaining and investing in the data. For example, a majority of satellite images, old photos, documents, videos, etc., are used for long-term historical trend analysis. The data enables companies to learn from history and position themselves for the future. In other cases, the data can be used in short-term projects.
Just remember, whatever the use, it must bring immediate value to the business.
Mary E. Shacklett is president of Transworld Data, a technology research and market development firm. Prior to founding the company, Mary was Senior Vice President of Marketing and Technology at TCCU, Inc., a financial services firm; Vice President of Product Research and Software Development for Summit Information Systems, a computer software company; and Vice President of Strategic Planning and Technology at FSI International, a multinational manufacturing company in the semiconductor industry. Mary is a keynote speaker and has more than 1,000 articles, research studies, and technology publications in print.