An Improved Small File Processing Method for HDFS

Provided by: AICIT
The Hadoop Distributed File System (HDFS) is widely used to build large-scale, high-performance storage clusters. However, it is designed primarily to handle large files, so its performance degrades when processing massive numbers of small files, whose metadata places a heavy burden on the Namenode. Targeting this small-file problem, the authors introduce an approach to improve the I/O performance of small files on HDFS. Their main idea is to merge the small files in a directory into one large file and to build an index for each small file, improving storage efficiency and reducing the metadata load on the Namenode.
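The abstract only outlines the merge-and-index idea, so the following is a minimal sketch of how such a scheme might look using the standard Hadoop FileSystem API; it is not the authors' implementation. The class name SmallFileMerger, the example paths, and the in-memory Map index are all illustrative assumptions (the paper presumably persists its index alongside the merged file rather than keeping it in memory).

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.io.IOUtils;

import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch: append every small file in one directory into a
 * single merged HDFS file, and record each original file's (offset, length)
 * in an index so it can be read back without its own Namenode entry.
 */
public class SmallFileMerger {

    /** Index entry: where a small file lives inside the merged file. */
    public static class Extent {
        public final long offset;
        public final long length;
        Extent(long offset, long length) { this.offset = offset; this.length = length; }
    }

    /** Merges every file in srcDir into mergedFile; returns a name -> extent index. */
    public static Map<String, Extent> merge(FileSystem fs, Path srcDir, Path mergedFile)
            throws IOException {
        Map<String, Extent> index = new LinkedHashMap<>();
        long offset = 0;
        try (FSDataOutputStream out = fs.create(mergedFile)) {
            for (FileStatus status : fs.listStatus(srcDir)) {
                if (!status.isFile()) continue;       // skip subdirectories
                long len = status.getLen();
                try (FSDataInputStream in = fs.open(status.getPath())) {
                    IOUtils.copyBytes(in, out, 4096, false);
                }
                index.put(status.getPath().getName(), new Extent(offset, len));
                offset += len;
            }
        }
        return index;
    }

    /** Reads one small file back out of the merged file via its index entry. */
    public static byte[] read(FileSystem fs, Path mergedFile, Extent extent)
            throws IOException {
        byte[] buf = new byte[(int) extent.length];
        try (FSDataInputStream in = fs.open(mergedFile)) {
            in.readFully(extent.offset, buf);  // positioned read at the recorded offset
        }
        return buf;
    }

    public static void main(String[] args) throws IOException {
        FileSystem fs = FileSystem.get(new Configuration());
        Path srcDir = new Path("/data/small");       // assumed example paths
        Path merged = new Path("/data/merged.bin");
        Map<String, Extent> index = merge(fs, srcDir, merged);
        index.forEach((name, e) ->
                System.out.printf("%s -> offset=%d length=%d%n", name, e.offset, e.length));
    }
}
```

Under this scheme the Namenode tracks only the merged file, while lookups of individual small files go through the index, which is the mechanism by which the metadata burden is reduced.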
