Implementations of map-reduce are used to perform many operations on very large data sets. The authors explore alternative ways that a system could use the environment and capabilities of map-reduce implementations such as Hadoop while performing operations that are not identical to map-reduce. The centerpiece of this exploration is a computational model that captures the essentials of the environment in which systems like Hadoop operate: files are unordered sets of tuples that can be read and/or written in parallel, processes are limited in the amount of input/output they can perform, and processors are available in essentially unlimited supply. Within this model, the authors develop a sorting algorithm whose worst-case running time is better than that of the obvious implementations of parallel sorting.
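To make the model's constraints concrete, the following is a minimal Python sketch of one "obvious" parallel sort: a single round of range partitioning followed by independent local sorts. It is not the authors' algorithm, which the abstract does not describe. The function name `range_partition_sort`, the per-process I/O limit `q`, and the sampling scheme are all assumptions made for illustration.

```python
import bisect
import random


def range_partition_sort(tuples, q, seed=0):
    """Sort `tuples` with one round of range partitioning.

    `q` is a hypothetical per-process I/O limit: the number of tuples
    a single process is supposed to read. Returns the sorted list and
    the size of the largest bucket, so the caller can see whether any
    process would have exceeded the limit.
    """
    if q < 1:
        raise ValueError("q must be at least 1")
    data = list(tuples)
    if not data:
        return [], 0

    rng = random.Random(seed)
    num_buckets = -(-len(data) // q)  # ceil(n / q) processes

    # One process reads a sample of at most q tuples and picks
    # evenly spaced splitters from it.
    sample = sorted(rng.sample(data, min(len(data), q)))
    splitters = [
        sample[(i * len(sample)) // num_buckets]
        for i in range(1, num_buckets)
    ]

    # Route every tuple to the bucket for its key range. The input
    # is an unordered set, so arrival order does not matter.
    buckets = [[] for _ in range(num_buckets)]
    for t in data:
        buckets[bisect.bisect_right(splitters, t)].append(t)

    # Each bucket is sorted by its own (conceptual) process; buckets
    # cover disjoint, increasing key ranges, so concatenation is sorted.
    result = []
    for bucket in buckets:
        result.extend(sorted(bucket))
    return result, max(len(b) for b in buckets)
```

With skewed data or an unlucky sample, the largest bucket can exceed `q`. That worst case is the kind of behavior an algorithm designed for an I/O-limited model has to avoid, and this sketch only reports it rather than handling it.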
- Format: PDF
- Size: 233.23 KB