International Journal on Computer Science and Technology (IJCST)
We live in an on-demand, on-command digital universe in which institutions, individuals, and tools generate data at a very high rate. This data is categorized as "big data" because of its sheer volume, variety, velocity, and veracity. Most of it is unstructured or semi-structured, and it is heterogeneous in nature. Because of these characteristics, big data is typically stored in distributed file system architectures. Apache Hadoop and its Hadoop Distributed File System (HDFS) are widely used for storing and managing big data.