The Anatomy of MapReduce Jobs, Scheduling, and Performance Challenges

Executive Summary

Hadoop is a leading open-source tool underpinning the big data revolution. It is based on Google's pioneering MapReduce work on storing and processing ultra-large volumes of data. Instead of relying on expensive proprietary hardware, Hadoop clusters typically consist of hundreds or thousands of multi-core commodity machines; and instead of moving data to the processing nodes, Hadoop moves the code to the machines where the data reside, which is inherently more scalable.
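As a minimal illustration of the MapReduce programming model the summary describes (a sketch in plain Python, not Hadoop's actual API), a word count can be expressed as a map phase, a shuffle that groups intermediate pairs by key, and a reduce phase:

```python
from collections import defaultdict

def map_phase(document):
    # Emit an intermediate (word, 1) pair for every word in the input split.
    for word in document.split():
        yield (word, 1)

def shuffle(pairs):
    # Group intermediate values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups.items()

def reduce_phase(key, values):
    # Sum the counts emitted for one word.
    return (key, sum(values))

docs = ["hadoop moves code to data", "data locality makes hadoop scalable"]
pairs = [p for d in docs for p in map_phase(d)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs))
print(counts["hadoop"])  # → 2
```

In a real cluster each `map_phase` call would run on the node holding its input split, which is the data-locality idea mentioned above: the small map function travels to the data rather than the data traveling to the code.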

  • Format: PDF
  • Size: 1518.22 KB