Attaching Cloud Storage to a Campus Grid Using Parrot, Chirp, and Hadoop
The Hadoop filesystem is a large-scale distributed filesystem used to manage and quickly process extremely large data sets. The authors want to use Hadoop to assist with data-intensive workloads in a distributed campus grid environment. Unfortunately, the Hadoop filesystem is not designed to work easily or securely in such an environment. They present a solution that bridges the Chirp distributed filesystem to Hadoop for simple access to large data sets. Chirp layers on top of Hadoop many grid computing desirables, including simple deployment without special privileges, easy access via Parrot, and strong, flexible security through Access Control Lists (ACLs). The authors discuss the challenges involved in using Hadoop on a campus grid and evaluate the performance of the combined systems.