International Journal on Computer Science and Technology (IJCST)
Big data refers to a combination of large datasets, and managing data at this scale is very difficult, so new techniques are required to handle it. The challenge is to collect or extract the data from multiple sources, process or transform it according to the analytical need, and then load it for analysis; this process is known as "Extract, Transform & Load" (ETL). In this paper, Hadoop is first implemented in pseudo-distributed mode, and Hive is then implemented on Hadoop to analyze a large dataset.
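The three ETL stages can be illustrated with a minimal Python sketch. It is not the Hadoop/Hive implementation used in this paper: the two CSV sources, the column names (`city`, `sales`) and the in-memory "warehouse" are all hypothetical, chosen only to show how extraction, transformation and loading follow one another.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw inputs standing in for two separate data sources.
SOURCE_A = "city,sales\nDelhi,100\nMumbai,250\n"
SOURCE_B = "city,sales\ndelhi,50\nPune,\n"


def extract(*sources):
    """Extract: read rows from every CSV source into one stream of dicts."""
    for text in sources:
        yield from csv.DictReader(io.StringIO(text))


def transform(rows):
    """Transform: normalize city names and drop rows with missing sales."""
    for row in rows:
        if not row["sales"]:
            continue  # skip incomplete records
        yield {"city": row["city"].strip().title(), "sales": int(row["sales"])}


def load(rows):
    """Load: aggregate into an in-memory table (a stand-in for a warehouse)."""
    table = defaultdict(int)
    for row in rows:
        table[row["city"]] += row["sales"]
    return dict(table)


def run_etl():
    """Chain the three stages: extract, then transform, then load."""
    return load(transform(extract(SOURCE_A, SOURCE_B)))


if __name__ == "__main__":
    print(run_etl())
```

In a Hadoop deployment the same roles would typically be played by files stored in HDFS (extract), Hive queries (transform) and Hive tables (load); the sketch only mirrors that flow at toy scale.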