Association for Computing Machinery
Many distributed computing models have been developed for high-performance processing of large-scale scientific data. Among them, MapReduce is a popular and widely used coarse-grained parallel runtime. Workflows integrate and coordinate distributed, heterogeneous components to solve computational problems that may involve several MapReduce jobs. However, existing workflow solutions offer limited support for important features such as fault tolerance and the efficient execution of iterative applications. In this paper, the authors propose HyMR, a hybrid MapReduce workflow system based on two different MapReduce frameworks.