Reduction of Data at Namenode in HDFS Using Harballing Technique
HDFS stands for the Hadoop Distributed File System. It is designed to handle large files (megabytes, gigabytes, or terabytes in size). Scientific applications have adopted HDFS/MapReduce for large-scale data analytics. A major problem, however, is that small files are common in these applications. HDFS manages all of these small files through a single Namenode server. Storing and processing small files in HDFS adds overhead to MapReduce programs and also degrades the performance of the Namenode.
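
To make the Namenode overhead concrete, the following minimal sketch estimates how much Namenode memory file metadata might occupy. It assumes the commonly quoted rule of thumb of roughly 150 bytes per namespace object (file or block); that figure, the function name, and the file counts are illustrative assumptions, not measurements from this work.

```python
# Rough estimate of Namenode memory consumed by file metadata.
# ASSUMPTION: about 150 bytes per namespace object (a file or a block).
# This is a widely quoted rule of thumb, not a value measured here.
BYTES_PER_OBJECT = 150


def namenode_bytes(num_files: int, blocks_per_file: int = 1) -> int:
    """Estimate metadata bytes: one file object plus one object per block."""
    return num_files * (1 + blocks_per_file) * BYTES_PER_OBJECT


# Hypothetical workload: ten million small files, one block each.
many_small = namenode_bytes(10_000_000)

# Hypothetical alternative: the same data packed into 1,000 larger
# files of ten blocks each.
few_large = namenode_bytes(1_000, blocks_per_file=10)

print(f"small files: {many_small / 1e9:.2f} GB of Namenode memory")
print(f"packed files: {few_large / 1e6:.2f} MB of Namenode memory")
```

Under these assumptions the metadata cost grows with the number of files rather than with the volume of data, which is why packing many small files into fewer large ones can relieve the Namenode.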