A Survey on Hadoop Mapreduce Framework and the Data Skew Issues

Provided by: International Journal of Scientific Research Engineering &Technology (IJSRET)
Topic: Enterprise Software
Format: PDF
Hadoop is an open source implementation of Google's MapReduce framework. MapReduce is the heart of the apache's Hadoop. The file system which is used by the Hadoop for storing the files is known as Hadoop Distributed File System (HDFS) which is an open source implementation of the Google File System (GFS). Hadoop allows the parallel processing of the large data sets by splitting the larger data set into smaller partitions and each partition is fed to the separate task in the data node by the job tracker. The data node is the node where the data actually resides.

Find By Topic