Data is plentiful and cheap to store, but that doesn't mean it should be hoarded indiscriminately. Here's why companies should be more careful about what to keep and what to discard.
Digitalization and big data continue to bulk up enterprise storage. This avalanche of data should make IT managers wary of keeping all of it, yet it doesn't seem to.
Let's take eDiscovery as a use case. It requires companies to retain vast troves of legal documents, emails, social media commentary, photos, recorded conversations, etc.
Yet areas like eDiscovery are exactly where rigorous data retention policies are needed. At least two cloud-based eDiscovery attorneys I have spoken with said that only 6% of the documents and data reviewed for discovery end up being relevant to the case at hand. If you don't decide upfront which data to keep and which to discard, your company pays very expensive attorneys for a full review of an unedited pile of data.
The bottom line for business leaders who want their money's worth from eDiscovery and big data is that indiscriminately packing away all this data on cheap storage will ultimately be expensive and counterproductive when it's time to use it.
The eDiscovery use case is equally applicable to other types of big data streaming into the enterprise.
Here are three strategies for getting your growing big data stockpiles under control and retaining only what your company needs:
1. Get guidance from your industry regulators
If you deal with the federal government, there are published data retention policies that different government agencies expect their vendors to adhere to. For instance, if you deal with the Department of Defense, the data retention requirement is three years. The FDIC's data retention requirement is five years. If you deal with the SEC, data retention can be seven years or more. Aligning your data retention policies with the standards of your industry is a solid first step for any company.
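As a rough illustration of how such periods might feed a retention policy, here is a minimal Python sketch. The year counts are the ones quoted above and should be checked against each agency's current rules; the dictionary keys and the function names (`retention_years`, `earliest_disposal_date`) are hypothetical. It assumes that when several regulators apply, the longest period governs.

```python
from datetime import date

# Retention periods in years, as quoted in this article. Treat these as
# placeholders and verify them against each agency's current rules.
RETENTION_YEARS = {"DoD": 3, "FDIC": 5, "SEC": 7}


def retention_years(regulators):
    """Return the longest retention period among the regulators that apply."""
    return max(RETENTION_YEARS[name] for name in regulators)


def earliest_disposal_date(created, regulators):
    """Return the first date on which a record could be considered for disposal."""
    target_year = created.year + retention_years(regulators)
    try:
        return created.replace(year=target_year)
    except ValueError:
        # A Feb. 29 creation date has no counterpart in a non-leap target year.
        return created.replace(year=target_year, day=28)
```

A record created on Jan. 15, 2018 by a company answering to both the DoD and the SEC would, under these assumptions, be held for the SEC's seven years rather than the DoD's three.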
2. Set data retention rules for big data within your company
Companies are already used to users and IT working together to set data retention policies for transactional systems, but big data is still not reviewed for retention. It's time to change this thinking, since big data systems have matured and are in production. For example, if IoT machines and automation are at work in factories, manufacturing and IT should discuss the types of machine-emitted data that are relevant to production, machine maintenance, safety and other business needs as defined by manufacturing and the company. Data extraneous to these business objectives should be eliminated.
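Once manufacturing and IT agree on which kinds of machine data matter, the rule itself can be quite simple. The following Python sketch assumes each record carries a hypothetical `category` field and that the agreed categories are the three named above; real machine data would need a mapping step to assign those categories first.

```python
# Categories the business has agreed are worth keeping (an assumption for
# illustration, taken from the examples in the text).
RELEVANT_CATEGORIES = {"production", "maintenance", "safety"}


def apply_retention_rules(records, relevant=RELEVANT_CATEGORIES):
    """Split records into those to keep and those to discard.

    Each record is a dict with a hypothetical "category" key. Records with
    no category, or an unrecognized one, are treated as extraneous.
    """
    kept, discarded = [], []
    for record in records:
        if record.get("category") in relevant:
            kept.append(record)
        else:
            discarded.append(record)
    return kept, discarded
```

Returning the discarded records instead of silently dropping them lets the business review what the rule removes before anything is actually deleted.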
3. When possible, clean big data at its source
Useless data isn't always useless because it is extraneous. Sometimes, the data is a duplicate of other data, although in a slightly different form. In other cases, data is just "noise," like machine or network jitter. In still other cases, data can be compressed as it is being ingested. The more these data cleaning operations can be done at the point where data enters the enterprise, the less follow-on work IT will have to do later. A number of cloud-based vendors offer edge computing data cleaning services. They clean the data and then route it to you. In this way, none of your extraneous or "junk" data gets in the door.
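The three cleaning operations described above (dropping near-duplicates, filtering jitter and compressing on ingest) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular vendor's service works: the `sensor` and `value` field names, the `jitter_threshold` default and the function names are all hypothetical, and the example uses only the standard library.

```python
import gzip
import hashlib
import json


def canonical_key(record):
    """Hash a normalized form of the record so that duplicates differing only
    in letter case or stray whitespace produce the same key."""
    normalized = {
        k.strip().lower(): (v.strip().lower() if isinstance(v, str) else v)
        for k, v in record.items()
    }
    payload = json.dumps(normalized, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def clean_at_ingest(records, jitter_threshold=0.5):
    """Drop duplicate records and readings that barely move from the last
    reading kept for the same sensor."""
    seen = set()
    last_value = {}
    cleaned = []
    for record in records:
        key = canonical_key(record)
        if key in seen:
            continue  # same data in a slightly different form
        seen.add(key)
        sensor = record.get("sensor")
        value = record.get("value")
        if isinstance(value, (int, float)):
            previous = last_value.get(sensor)
            if previous is not None and abs(value - previous) < jitter_threshold:
                continue  # change is small enough to be treated as noise
            last_value[sensor] = value
        cleaned.append(record)
    return cleaned


def compress_batch(records):
    """Serialize a cleaned batch to JSON and gzip it before storage."""
    return gzip.compress(json.dumps(records).encode("utf-8"))
```

What counts as a duplicate or as noise is a business decision, so the normalization rules and the threshold here are the parts most likely to need tuning for real data.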