One way to think about dark (or invisible) data is as all of the paper files stored on and off premises, the files sitting in closets and on shelves, and any other information that hasn’t been digitized and that employees can’t readily access. Despite recent digitization efforts, a great deal of this hidden data remains.
There is also a broader, simpler definition of dark data: any data that you do not have ready visibility into. This can include electronic information that you might not be aware of, such as instant messages and video call transcripts, or any other content that has not been centralized in a searchable data repository.
“Dark data is data that is not under management, or on a corporate resource where it can be discovered,” said Brian Remmington, CTO at Hyland, which provides enterprise content management solutions. “Once discovered, dark data is no longer dark data.”
During the COVID-19 pandemic, many organizations have been working in highly distributed, remote environments, which has accelerated the growth of dark data.
“Employees also use company-appointed laptops for everything from video calls to sharing documents and accessing email,” Remmington said. “This creates an excessive amount of dark data that looms in company storage. In addition, employees may copy centralized content to their local laptops and not upload it after altering it.”
Remmington believes that the enduring challenge with the growing volume of dark data is managing it and making sure it is revisited and used appropriately for business intelligence and governance. In addition, when employees leave the business, enterprises risk losing intellectual property or corporate memory if the dark data those employees held is never discovered.
How serious is the risk of not managing dark data?
First, not having your dark data under management can create legal, security and compliance risks. What if your company is in a lawsuit and must get to this data as part of the legal discovery process?
Second, not having access to dark data can lead key decision makers to make the wrong decisions for the organization.
Third, significant waste can result. Employee productivity suffers because people take longer to search for and obtain the information they need. Storage costs also rise, because the organization pays to keep data that it never uses.
Fourth, the dark data problem contributes to enterprise data silos. “The number one objective needs to be discovering dark data so that it can be classified and managed appropriately, and potentially analyzed for business intelligence,” Remmington said. “One of the key things to examine is how each silo of dark data came to be that way, and to identify processes and tooling that can help prevent it from happening in the future.”
One proactive approach that companies can use to reduce dark data is to create data safekeeping policies and to train, and periodically retrain, employees on them.
Companies can also conduct audits to ensure that data (and data changes) are synchronized between individual workstations and central data repositories.
Both of these steps can help to eliminate the “dark holes” in enterprise information that dark data creates.
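To make the audit idea concrete, here is a minimal Python sketch of one way such a check might work. It assumes, purely for illustration, that both an employee's working folder and the central repository are reachable as ordinary directories. The function names (`file_digest`, `snapshot`, `audit`) are hypothetical and do not come from Hyland or any particular product. The script compares content hashes and reports files that exist only on the workstation, along with files whose local copy differs from the central one.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def snapshot(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its content digest."""
    return {
        path.relative_to(root).as_posix(): file_digest(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def audit(local_root: Path, central_root: Path) -> dict[str, list[str]]:
    """Report files that may be 'dark' from the central repository's view.

    local_only:       files on the workstation with no central copy.
    modified_locally: files present in both places with different content.
    """
    local = snapshot(local_root)
    central = snapshot(central_root)
    shared = local.keys() & central.keys()
    return {
        "local_only": sorted(local.keys() - central.keys()),
        "modified_locally": sorted(
            name for name in shared if local[name] != central[name]
        ),
    }
```

Calling `audit(Path("/path/to/laptop/folder"), Path("/path/to/central/share"))` with two placeholder paths would return the two lists, which an administrator could then review. A real audit would also need to handle access permissions, renamed files and content that lives in applications rather than on a file system, none of which this sketch attempts.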
“It’s going to be a while before dark data disappears,” Remmington acknowledged. “But in the meantime, organizations should do their best to tackle their data being created in real time as they work through backlogs of data. In the long run, this will help businesses reach their goals.”