Read about how you can eliminate data "scoop outs" by adding governance to big data discovery.
When an enterprise launches its big data initiatives, the first implementation is often found in a single server acquired by an end business unit or department that uses the resource for its own big data capture and queries. Once organizations are well into the big data and analytics process, a growing number of them either consolidate all of these individual departmental servers in IT (asking IT to administer their big data) or plan to do so in the near future.
Moving big data operations to IT, however, doesn't mean an end to the "democratization" of big data, which every C-level executive and middle manager in the enterprise wants to access; it simply means that departments and business units are too busy meeting their normal day-in, day-out objectives to babysit servers and initiate problem solving with vendors.
Is the democratization of big data a bad thing?
Assuredly, it is not. As we all know, "knowledge is power." Companies benefit to the extent that they can make the power of big data and analytics available to executives and managers with a need to know, equipping these individuals with actionable intelligence.
But in these big data democratization scenarios, the key big data words for regulators and IT are increasingly becoming "appropriate" and "need to know." One risk is that an indiscriminate democratization of big data can unwittingly place information into the hands of individuals in the organization who either don't need or shouldn't have the data (i.e., the data should be confidential and available at only certain levels of the organization). Another risk is that individual business departments with the power to manipulate this data can transform it into data that no longer is consistent with the data that other departments are using. In these cases, there no longer is a single data "version of the truth," and an enterprise can run the risk of initiating potentially incompatible business decisions.
During a visit this month, Michael Hiskey, Vice President of MicroStrategy, a business intelligence and analytics provider, described early corporate forays into big data where "chunks of data were scooped out of data repositories and then downloaded to laptops where end users performed analytics." (Disclaimer: CBS Interactive uses MicroStrategy.)
In these early "Excel spreadsheet empires," it wasn't hard to see how data could be potentially manipulated into individualized "versions of the truth" that drifted away from what was represented in the corporate data repository, or even perused by individuals who didn't require the data in their work.
"Because of this, we are seeing more companies asking themselves how they can add governance to the big data discovery that they do," said Hiskey.
Hiskey's (and MicroStrategy's) solution to the problem is to let a single version of the data truth reside on the corporate server in the data center (as it has in the past), and to no longer "copy out chunks of data that are then dished out to departmental servers and laptops."
In an environment where all big data is centralized, data "scoop outs" are replaced by access and authorization setups that let end users retrieve their chunks of data from the corporate server according to their clearances. In this regard, obtaining big data becomes much like obtaining data from systems of record, which has long been governed by access and authorization controls.
"For most organizations, moving to this model of big data distribution is a fairly straightforward process, because even in the largest organizations, there are usually only five to seven different classes of business users when it comes down to data access, and the data access is role-based," noted Hiskey. "On one level of access, you might find a financial analyst and a higher level marketing executive. On another level, you might find a marketing analyst and a customer support analyst."
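To make the role-based model concrete, here is a minimal sketch in Python of how a centralized store might answer requests according to a user's access class. It is an illustration only, not MicroStrategy's implementation: the level names, column names, sample rows, and the `query` function are all hypothetical, and the pairing of roles to levels simply mirrors the examples Hiskey gives.

```python
# Hypothetical mapping of business roles to access levels. The article says
# most organizations have only five to seven such classes; two are shown here.
ROLE_TO_LEVEL = {
    "financial_analyst": "level_a",
    "marketing_executive": "level_a",
    "marketing_analyst": "level_b",
    "customer_support_analyst": "level_b",
}

# Hypothetical columns that each access level is cleared to read.
LEVEL_TO_COLUMNS = {
    "level_a": {"customer_id", "region", "revenue", "margin"},
    "level_b": {"customer_id", "region", "ticket_count"},
}

# Stand-in for the single corporate repository (made-up sample rows).
CENTRAL_STORE = [
    {"customer_id": 1, "region": "EMEA", "revenue": 1200.0, "margin": 0.31, "ticket_count": 4},
    {"customer_id": 2, "region": "APAC", "revenue": 800.0, "margin": 0.27, "ticket_count": 0},
]


def query(role, columns):
    """Return the requested columns from the central store if the role may read them."""
    level = ROLE_TO_LEVEL.get(role)
    if level is None:
        raise PermissionError(f"unknown role: {role}")
    # Refuse the whole request if any column is outside the role's clearance.
    denied = set(columns) - LEVEL_TO_COLUMNS[level]
    if denied:
        raise PermissionError(f"{role} may not read: {sorted(denied)}")
    # Data is served from the one central copy; nothing is exported wholesale.
    return [{col: row[col] for col in columns} for row in CENTRAL_STORE]


if __name__ == "__main__":
    print(query("marketing_analyst", ["customer_id", "ticket_count"]))
```

In this sketch every request is checked against the central copy, so a marketing analyst asking for `revenue` is refused rather than handed a private extract, which is the behavior that replaces the "scoop out."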
The approach is realistic and one that both regulators and IT (which is charged with governance) can live with. Just as importantly, the approach uses data access and authorization techniques that organizations are already familiar with and that can now be applied to big data, where more rigorous governance is needed.