George Mason University
Big data processing is generally defined as a situation in which the size of the data itself becomes part of the computational problem. This has made divide-and-conquer algorithms, implemented on clusters of multi-core CPUs in Hadoop/MapReduce environments, an important data processing tool for many organizations. Jobs of various kinds, each consisting of a number of automatically parallelized tasks, are scheduled on distributed nodes according to the capacity of the machines. A key challenge in provisioning such jobs in a Hadoop/MapReduce cluster is predicting their completion times from various job characteristics.
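The divide-and-conquer pattern underlying MapReduce can be sketched as follows. This is a minimal, single-process illustration (not the Hadoop API): input chunks stand in for data splits on distributed nodes, the map phase processes each chunk independently, and the reduce phase merges the partial results. The word-count task and all function names here are illustrative assumptions.

```python
from collections import Counter
from functools import reduce

def map_phase(chunk):
    # Map: each "node" counts words in its local chunk of the input.
    return Counter(chunk.split())

def reduce_phase(counts_a, counts_b):
    # Reduce: merge two partial results into one.
    return counts_a + counts_b

def mapreduce_wordcount(chunks):
    # Divide-and-conquer: chunks are processed independently
    # (parallelizable), then the partial counts are merged.
    return reduce(reduce_phase, (map_phase(c) for c in chunks), Counter())

chunks = ["big data big", "data processing"]   # stand-ins for HDFS splits
counts = mapreduce_wordcount(chunks)
```

In a real Hadoop cluster, each map task would run on the node holding its data split, and the framework would shuffle intermediate keys to reduce tasks; the merge step here collapses that machinery into a single in-memory fold.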