Scalable Parallel Computing on Clouds Using Twister4Azure Iterative MapReduce
Recent advances in data-intensive computing for scientific discovery are fueling dramatic growth in the use of data-intensive iterative computations. The utility computing model introduced by cloud computing, combined with the rich set of cloud infrastructure and storage services, offers a very attractive environment in which scientists can perform data analytics. The challenges of large-scale distributed computation in cloud environments demand innovative computational frameworks that are specifically tailored to cloud characteristics in order to harness the power of clouds easily and effectively. Twister4Azure is a distributed, decentralized iterative MapReduce runtime for the Windows Azure Cloud. Twister4Azure extends the familiar, easy-to-use MapReduce programming model with iterative extensions, enabling fault-tolerant execution of a wide array of data mining and data analysis applications on the Azure cloud.