A Novel Approach to Enhance Reliability in Hadoop with Reduced Storage Requirement
A data grid should provide fast, reliable, and transparent access to data for a large number of heterogeneous, geographically distributed users. The volume of data handled in a data grid is on the order of petabytes. Hadoop is a data grid framework in which data is split into blocks that are distributed across the cluster. A data grid has to be reliable: the information should remain available even when nodes, communication links, rack switches, or data centers fail. Hadoop currently follows a tree-based cluster architecture and uses block replication for availability. This paper analyses the current approach used in Hadoop to maintain block-level reliability.
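
To make the storage cost of block replication concrete, the following minimal sketch estimates how many blocks a file occupies and how much raw storage its replicas consume. It assumes a 128 MB block size and a replication factor of 3, which are common HDFS defaults rather than values stated in this paper, and the function name `replicated_storage` is hypothetical.

```python
import math


def replicated_storage(file_size_mb, block_size_mb=128, replication=3):
    """Estimate block count and raw storage for one replicated file.

    Assumed defaults (not taken from the paper): 128 MB blocks and
    a replication factor of 3.
    """
    # A file is split into fixed-size blocks; the last block may be partial.
    blocks = math.ceil(file_size_mb / block_size_mb)
    # Every block is stored `replication` times, so the raw storage is
    # the file size multiplied by the replication factor.
    raw_mb = file_size_mb * replication
    return blocks, raw_mb


blocks, raw_mb = replicated_storage(1024)
print(f"{blocks} blocks, {raw_mb} MB of raw storage")
```

Under these assumptions, each block survives the loss of up to two of its three replicas, at the price of storing three times the original data volume.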