Science and Development Network (SciDev.Net)
Because the volume of data generated by scientific experiments has grown exponentially, new scientific methods are required to analyze and organize the data. These methods, in turn, require an effective infrastructure of computing resources for pre-processing and post-processing the data. This demanding requirement has led to the development of methods for reducing dataset size and to the adoption of new programming models, such as MapReduce, and their implementations. In this paper, the authors describe an empirical study of handling the dataset of a scientific data experiment to support data transformation, an essential phase in handling large-scale data in scientific data experiments.
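MapReduce structures a computation in three phases. A map phase emits key-value pairs from each input record, a shuffle groups the values by key, and a reduce phase aggregates each group into a single result. The following is a minimal single-process Python sketch of that model. It is not the authors' implementation and not a distributed framework such as Hadoop. The record format ("sensor_id, value") and the helper names (`map_reduce`, `parse_reading`, `mean_reducer`) are hypothetical, chosen only to show how a transformation step can shrink a dataset by aggregating raw readings per key.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_reduce(
    records: Iterable[str],
    mapper: Callable[[str], Iterator[Tuple[str, float]]],
    reducer: Callable[[str, List[float]], float],
) -> Dict[str, float]:
    """Hypothetical in-memory MapReduce: map, shuffle (group by key), reduce."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for record in records:
        # Map phase: each record emits zero or more (key, value) pairs.
        for key, value in mapper(record):
            # Shuffle phase: collect all values that share a key.
            groups[key].append(value)
    # Reduce phase: collapse each key's values into one result.
    return {key: reducer(key, values) for key, values in groups.items()}


def parse_reading(line: str) -> Iterator[Tuple[str, float]]:
    # Assumed record format: "sensor_id, value".
    sensor_id, value = line.split(",")
    yield sensor_id.strip(), float(value)


def mean_reducer(_key: str, values: List[float]) -> float:
    # Replace many raw readings with their mean, reducing dataset size.
    return sum(values) / len(values)


readings = ["s1, 2.0", "s2, 5.0", "s1, 4.0", "s2, 7.0", "s3, 1.5"]
summary = map_reduce(readings, parse_reading, mean_reducer)
print(summary)  # {'s1': 3.0, 's2': 6.0, 's3': 1.5}
```

Five raw readings become three aggregated values. In a real MapReduce deployment, the map and reduce functions would run in parallel across machines, with the framework handling the shuffle.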