Concerns about big data security are heightened because big data is
much less predictable than the transactional data in systems of record
that IT routinely patrols. Big data’s unpredictability starts with the
fact that it enters into data repositories from so many sources. For
instance, if you collect customer data, that data can come from your
website, in-house system transactions, the call center, a marketing
service, Facebook, or other sources. Securing that data becomes more
complex with every source you add.

This point was recently emphasized by Brian Christian, CTO at Zettaset, which provides big data solutions. Christian characterized big data as a major security challenge because so much of it has been stockpiled into open source data warehouses that “were never written with security in mind.”

In order to get a handle on big data security, IT needs to take
additional steps outside of the security realm. These steps can be
grouped together under the general heading of data stewardship.

What is data stewardship?

The U.S. Geological Survey (USGS) considers data a “natural resource” and defines
a data steward as “one who manages another’s facts or information to ensure that they
can be used to draw conclusions or make decisions. Data stewards are
‘keepers of the flame’ in terms of data quality. They are responsible as
stewards to serve and protect the customers’ needs or assets…”

Jonathan G. Geiger says a “data stewardship program includes a governance structure and a set of responsibilities that enables managing data as an enterprise asset,”
and that “the soldiers in this program are the data stewards from the
business areas, each of whom is responsible for a set of data.”

In an ideal world, data stewards come from functional areas of the
business, and they work fastidiously with IT to ensure the quality of
big data and to set the policies for storing and accessing it. They also
have the personal and political skills needed to gain organizational
consensus on which data should be “cleaned” and stored, who accesses it,
and when it gets purged.

Even with standard data, these tasks have never been easy; in fact,
they usually fall to IT by default since IT is regarded as the ultimate
“data steward” in most organizations, and no one really wants to do the
job. There is no reason to believe this dynamic will change with big
data.

How IT can ensure big data stewardship happens

1: Act now

It’s tempting to rush into a big data implementation when you are
under pressure to produce analytics that demonstrate quick results
from an IT big data investment. However, if you also take the time to
decide how many big data marts you will have, who the data marts will
be for, and what the data retention rules will be and why, you can avoid
the stockpiles of “junk data” that many are forecasting for big data.
Equally important are upfront efforts and technologies (like
deduplication) that can “clean” this data so users can rely on it.
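
As a rough illustration of what upfront deduplication can look like, here is a minimal Python sketch. It assumes customer records arrive as dictionaries with hypothetical "name", "email", and "source" fields, and it treats two records as duplicates when their normalized email and name match. Real deduplication tools use far more sophisticated matching; this only shows the idea.

```python
def normalize(record):
    """Build a comparison key from a record (field names are assumptions)."""
    email = record.get("email", "").strip().lower()
    # Collapse repeated whitespace and ignore case in the name.
    name = " ".join(record.get("name", "").split()).lower()
    return (email, name)


def deduplicate(records):
    """Keep the first record seen for each normalized key."""
    seen = set()
    unique = []
    for record in records:
        key = normalize(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# The same customer arriving from two different sources.
incoming = [
    {"name": "Ana Diaz", "email": "ana@example.com", "source": "website"},
    {"name": "ana  diaz", "email": " ANA@example.com", "source": "call_center"},
    {"name": "Bo Chen", "email": "bo@example.com", "source": "marketing"},
]
cleaned = deduplicate(incoming)
```

In this sketch the website and call center entries collapse into one record, so only two customers are stored instead of three.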

2: Form a data governance committee

This data governance committee should consist of representatives from
every functional area of the business actively planning to use big
data. The individuals appointed from the business units will be
responsible for collaborating with IT on big data policy formation and
execution.

3: Call your IT auditor

No one likes to spend more time with auditors and examiners than they
have to, but these specialists can provide useful recommendations for
new processes for data security and best practices. Consulting them will
save you time and ensure that your policies and practices are on track
with those in your industry.

4: Find ways to take out the garbage

Organizations and regulators like to hang on to data for long periods
of time, but to survive the big data deluge, companies have to find
ways to “take out the garbage” so useless data doesn’t accumulate. Here
are three possible methods:

  • Develop data retention policies with regulators and auditors to establish when big data gets purged;
  • Develop an archiving procedure to remove big data from production (while still retaining it); and
  • Throttle data accumulation with upfront data cleanup and deduplication.

It’s also very important to get consensus from business units on what
data retention policies should be. In addition, DBAs should assign
specific usage timeframes for end user big data “sandbox” databases that are in test systems.
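
One way to make retention and archiving rules concrete is to write them down as data that a scheduled job can apply. The Python sketch below is only an illustration: the category names and day counts are invented for the example, and the actual values would come from the agreements you reach with regulators, auditors, and business units.

```python
from datetime import date

# Hypothetical retention rules, in days. A value of None for
# "archive_after" means the data is never archived, only purged.
RETENTION_RULES = {
    "clickstream": {"archive_after": 90, "purge_after": 365},
    "sandbox": {"archive_after": None, "purge_after": 30},
}


def retention_action(category, created, today):
    """Return 'keep', 'archive', 'purge', or 'review' for a data set."""
    rule = RETENTION_RULES.get(category)
    if rule is None:
        # No agreed policy yet: flag for the governance committee.
        return "review"
    age_days = (today - created).days
    if age_days >= rule["purge_after"]:
        return "purge"
    if rule["archive_after"] is not None and age_days >= rule["archive_after"]:
        return "archive"
    return "keep"


today = date(2024, 1, 1)
print(retention_action("clickstream", date(2023, 6, 1), today))
print(retention_action("sandbox", date(2023, 11, 15), today))
```

Data sets with no agreed rule come back as "review" rather than being kept or purged silently, which gives the data stewards a list of decisions that still need consensus.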

Make stewardship part of your big data strategy

Forward-thinking IT departments understand that taking these stewardship
steps now will save the organization pain, as well as storage and
processing costs, as big data comes online. These IT
departments are writing data stewardship objectives into the forefront
of their big data strategies as a mechanism to control the avalanche of
data that is likely to come under corporate management as big data use
expands.