Towards a High Performance Virtual Hadoop Cluster

Provided by: AICIT
Topic: Cloud
Format: PDF
Data-intensive computing has emerged as the fourth paradigm of modern scientific discovery. MapReduce, a programming paradigm for large-scale data-parallel applications, is widely applied to web indexing, machine learning, and scientific simulation in both industry and academia. Recently, virtualized "utility computing" environments provided by cloud computing services have become an important setting for running MapReduce jobs. However, in such a virtualized environment, the network bandwidth between pairs of virtual machines degrades badly, which makes data locality an even more crucial goal for guaranteeing the performance of a Virtual Hadoop Cluster (VHC).
