The Anatomy of MapReduce Jobs, Scheduling, and Performance Challenges

Hadoop is a leading open-source tool underpinning the big data revolution, based on Google's pioneering MapReduce work on storing and processing ultra-large data sets. Instead of relying on expensive proprietary hardware, Hadoop clusters typically consist of hundreds or thousands of multi-core commodity machines. And instead of moving data to the processing nodes, Hadoop moves the code to the machines where the data reside, which is inherently more scalable.
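The map, shuffle, and reduce phases that a Hadoop job runs in a distributed fashion can be sketched in miniature as a single-process simulation (a hypothetical illustration of the programming model, not Hadoop's actual Java API), here counting word occurrences:

```python
from collections import defaultdict

def map_fn(line):
    # Map phase: emit an intermediate (word, 1) pair for every word.
    for word in line.split():
        yield (word.lower(), 1)

def reduce_fn(word, counts):
    # Reduce phase: sum the partial counts collected for one word.
    return sum(counts)

def mapreduce(records):
    # Apply the map function to every input record.
    intermediate = []
    for record in records:
        intermediate.extend(map_fn(record))
    # Shuffle: group all intermediate pairs by key.
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)
    # Apply the reduce function once per distinct key.
    return {key: reduce_fn(key, values) for key, values in groups.items()}

result = mapreduce(["the quick brown fox", "the lazy dog jumps over the fox"])
```

In a real Hadoop job, each call to `map_fn` would run on the node holding the input split, and the shuffle would route intermediate pairs across the network to the reducer responsible for that key.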

Provided by: George Mason University | Topic: Storage | Date Added: Nov 2013 | Format: PDF
