Toward an Ecosystem for Precision Sharing of Segmented Big Data
As the amount of data created and stored by organizations continues to increase, attention is turning to extracting knowledge from that raw data, including making some data available outside of the organization to enable crowd analytics. The adoption of the MapReduce paradigm has made processing Big Data more accessible, but processing is still limited to data that is currently available, often only within an organization. Fine-grained control over what information is shared outside an organization is difficult to achieve with Big Data, particularly in the MapReduce model. The authors introduce a novel approach to sharing that enables fine-grained control over what data is shared. Users submit analytics tasks that run on infrastructure near the actual data, reducing network bottlenecks.