On Availability of Intermediate Data in Cloud Computations

Source: University of Illinois

This paper takes a renewed look at the problem of managing intermediate data generated during dataflow computations (e.g., MapReduce, Pig, Dryad) within clouds. It discusses salient features of intermediate data and outlines requirements for a solution. Its experiments show that existing local-write, remote-read solutions, traditional distributed file systems (e.g., HDFS), and support from transport protocols (e.g., TCP-Nice) cannot guarantee both data availability and minimal interference, which are key requirements.
Format: PDF Size: 73.80
Date: Apr 2009