International Journal of Emerging Technology and Advanced Engineering (IJETAE)
Big data processing is becoming increasingly important because of the continuous growth of data generated in fields such as particle physics, human genomics, and earth observation. However, how efficiently large-scale data can be processed on modern infrastructure remains unclear. Machine Learning (ML) concepts play a major role in data analysis. Traditional practice is restricted to analyzing only a subset of a large data set, because the size of the data grows exponentially. Frameworks for distributed processing address this restriction; Hadoop is one such framework, offering distributed storage and parallel data processing.