Concerns about big data security are heightened because big data is much less predictable than the transactional data in systems of record that IT routinely patrols. Big data's unpredictability starts with the fact that it enters data repositories from so many sources. For instance, if you collect customer data, that data can come from your website, in-house system transactions, the call center, a marketing service, Facebook, or other sources. The more sources there are, the more complex it becomes to secure that data.
This point was recently emphasized by Brian Christian, CTO at Zettaset, which provides big data solutions. Christian characterized big data as a major security challenge because so much of it has been stockpiled into open source data warehouses that "were never written with security in mind."
To get a handle on big data security, IT needs to take additional steps outside the security realm. These steps can be grouped under the general heading of data stewardship.
What is data stewardship?
The U.S. Geological Survey (USGS) considers data a "natural resource" and defines a data steward as "one who manages another's facts or information to ensure that they can be used to draw conclusions or make decisions. Data stewards are 'keepers of the flame' in terms of data quality. They are responsible as stewards to serve and protect the customers' needs or assets…".
Jonathan G. Geiger says a "data stewardship program includes a governance structure and a set of responsibilities that enables managing data as an enterprise asset," and that "the soldiers in this program are the data stewards from the business areas, each of whom is responsible for a set of data."
In an ideal world, data stewards come from functional areas of the business, and they work fastidiously with IT to ensure the quality of big data and to define the policies for storing and accessing it. They also have the personal and political skills needed to gain organizational consensus on which data should be "cleaned" and stored, who accesses it, and when it gets purged.
Even with standard data, these tasks have never been easy; in fact, they usually fall to IT by default since IT is regarded as the ultimate "data steward" in most organizations, and no one really wants to do the job. There is no reason to believe this dynamic will change with big data.
How IT can ensure big data stewardship happens
1: Act now
It's easy to get rolling with a big data implementation, since you are under pressure to produce analytics that demonstrate quick results from an IT big data investment. However, if you also take the time to decide how many big data marts you will have, whom the data marts will serve, and what the data retention rules will be and why, you can avoid the stockpiles of "junk data" that many are forecasting for big data. Equally important are front-end efforts and technologies (like deduplication) that "clean" this data so users can rely on it.
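To make the deduplication idea concrete, here is a minimal sketch of cleaning customer records that arrive from several sources (website, call center, marketing service) before they are stored. It assumes a hypothetical `CustomerRecord` shape and assumes that a trimmed, lowercased email address identifies the same customer; commercial deduplication tools typically use fuzzier matching across many fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerRecord:
    # Hypothetical record shape; field names are illustrative assumptions.
    source: str  # e.g. "website", "call_center", "marketing_service"
    email: str
    name: str


def normalize_email(email: str) -> str:
    """Build the match key: trimmed and lowercased (an assumed matching rule)."""
    return email.strip().lower()


def deduplicate(records: list[CustomerRecord]) -> list[CustomerRecord]:
    """Keep only the first record seen for each normalized email address."""
    seen: set[str] = set()
    unique: list[CustomerRecord] = []
    for record in records:
        key = normalize_email(record.email)
        if key in seen:
            continue  # same customer already loaded from another source
        seen.add(key)
        unique.append(record)
    return unique


if __name__ == "__main__":
    incoming = [
        CustomerRecord("website", "Ann@Example.com", "Ann Lee"),
        CustomerRecord("call_center", " ann@example.com", "Ann Lee"),
        CustomerRecord("marketing_service", "bob@example.com", "Bob Roy"),
    ]
    for r in deduplicate(incoming):
        print(r.source, normalize_email(r.email))
```

Keeping the first record per key is the simplest policy; a real pipeline might instead merge fields from all matching records or prefer the most trusted source.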
2: Form a data governance committee
This data governance committee should consist of representatives from every functional area of the business actively planning to use big data. The individuals appointed from the business units will be responsible for collaborating with IT on big data policy formation and execution.
3: Call your IT auditor
No one likes to spend more time with auditors and examiners than they have to, but these specialists can provide useful recommendations on new data security processes and best practices. Their input will save you time and help ensure that your policies and practices are in line with those of your industry.
4: Find ways to take out the garbage
Organizations and regulators like to hang on to data for long periods of time, but to survive the big data deluge, companies have to find ways to "take out the garbage" so useless data doesn't accumulate. Here are three possible methods:
- Develop data retention policies with regulators and auditors to establish when big data gets purged;
- Develop an archiving procedure to remove big data from production (while still retaining it); and
- Throttle data accumulation with upfront data cleanup and deduplication.
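The first two methods can be combined into a single decision rule: data stays in production for a set period, then moves to an archive, and is purged once its total retention period ends. The sketch below assumes a hypothetical `RetentionPolicy` with two periods in days; the actual periods would come from the regulators, auditors, and business units the article mentions, not from code.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum


class Action(Enum):
    KEEP = "keep in production"
    ARCHIVE = "move to archive"
    PURGE = "purge"


@dataclass(frozen=True)
class RetentionPolicy:
    # Hypothetical policy; real values are agreed with regulators and auditors.
    production_days: int  # how long data stays in production systems
    retention_days: int   # total time before the data may be purged

    def __post_init__(self) -> None:
        if self.production_days > self.retention_days:
            raise ValueError("production period cannot exceed total retention")


def decide(created: date, policy: RetentionPolicy, today: date) -> Action:
    """Classify a data set by its age under the given retention policy."""
    age = today - created
    if age > timedelta(days=policy.retention_days):
        return Action.PURGE
    if age > timedelta(days=policy.production_days):
        return Action.ARCHIVE
    return Action.KEEP


if __name__ == "__main__":
    # Illustrative (made-up) policy: 90 days in production, purge after a year.
    clickstream = RetentionPolicy(production_days=90, retention_days=365)
    today = date(2024, 1, 1)
    for created in (date(2023, 12, 1), date(2023, 6, 1), date(2022, 6, 1)):
        print(created, decide(created, clickstream, today).value)
```

Separating the archive threshold from the purge threshold reflects the distinction in the list above: archiving removes data from production while still retaining it, and purging happens only when the agreed retention period has passed.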
It's also very important to get consensus from business units on what data retention policies should be. In addition, DBAs should assign specific usage timeframes to end-user big data "sandbox" databases in test systems.
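Assigning usage timeframes only helps if someone checks them. As a minimal sketch, assuming the DBA team keeps a hypothetical registry of sandboxes with an owner and an expiry date, flagging the ones whose window has closed is a simple filter:

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Sandbox:
    # Hypothetical registry entry maintained by the DBA team.
    name: str
    owner: str
    expires_on: date  # last day of the usage timeframe the DBA assigned


def expired_sandboxes(sandboxes: list[Sandbox], today: date) -> list[Sandbox]:
    """Return sandboxes whose assigned usage window ended before today."""
    return [s for s in sandboxes if s.expires_on < today]


if __name__ == "__main__":
    registry = [
        Sandbox("mkt_churn_test", "marketing", date(2024, 3, 31)),
        Sandbox("fin_forecast_test", "finance", date(2024, 9, 30)),
    ]
    for s in expired_sandboxes(registry, today=date(2024, 6, 1)):
        print(f"Notify {s.owner}: sandbox {s.name} expired on {s.expires_on}")
```

Notifying the owning business unit before dropping anything keeps the cleanup consistent with the consensus-building the article recommends.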
Make stewardship part of your big data strategy
Forward-thinking IT departments understand that taking these stewardship steps now will spare their organizations pain and reduce storage and processing consumption as big data comes online. These IT departments are putting data stewardship objectives at the forefront of their big data strategies as a way to control the avalanche of data that is likely to come under corporate management as big data use expands.
Mary E. Shacklett is president of Transworld Data, a technology research and market development firm. Prior to founding the company, Mary was Senior Vice President of Marketing and Technology at TCCU, Inc., a financial services firm; Vice President of Product Research and Software Development for Summit Information Systems, a computer software company; and Vice President of Strategic Planning and Technology at FSI International, a multinational manufacturing company in the semiconductor industry. Mary is a keynote speaker and has more than 1,000 articles, research studies, and technology publications in print.