Provided by: Cornell University
Date Added: Oct 2014
The problem of evaluating joins efficiently in distributed environments has gained importance since the advent of Google's MapReduce and the emergence of a series of distributed systems with relational operators, such as Pig, Hive, Spark SQL, and Myria. The costs of join algorithms in such systems can be broken down into three components: local computation on each machine; communication between the machines; and the number of global synchronizations that need to take place between the machines, for example, the number of rounds of MapReduce jobs that need to be executed.
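To make the three cost components concrete, here is a minimal sketch that simulates a one-round hash-partitioned join of two binary relations on a single machine and tallies each cost. It is an illustration under stated assumptions, not an algorithm from the paper: the function name `hash_join_one_round`, the relation shapes `r(a, b)` and `s(b, c)`, and the unit-cost counting scheme are all hypothetical.

```python
from collections import defaultdict


def hash_join_one_round(r, s, num_machines):
    """Join r(a, b) with s(b, c) on b in one simulated round.

    Hypothetical cost model: one unit per tuple shipped, one unit per
    tuple inserted, probed, or emitted locally, one unit per round.
    """
    # Communication: each tuple is sent once to the machine that owns hash(b).
    parts_r = defaultdict(list)
    parts_s = defaultdict(list)
    for a, b in r:
        parts_r[hash(b) % num_machines].append((a, b))
    for b, c in s:
        parts_s[hash(b) % num_machines].append((b, c))
    communication = len(r) + len(s)

    # Local computation: each machine indexes its share of s, then probes with r.
    output = []
    local_work = 0
    for m in range(num_machines):
        index = defaultdict(list)
        for b, c in parts_s[m]:
            index[b].append(c)
            local_work += 1
        for a, b in parts_r[m]:
            local_work += 1
            for c in index.get(b, []):
                output.append((a, b, c))
                local_work += 1

    # Synchronization: a single shuffle followed by local joins is one round.
    costs = {
        "communication": communication,
        "local_work": local_work,
        "rounds": 1,
    }
    return sorted(output), costs


if __name__ == "__main__":
    r = [(1, 10), (2, 10), (3, 20)]
    s = [(10, "x"), (20, "y"), (30, "z")]
    result, costs = hash_join_one_round(r, s, num_machines=4)
    print(result)
    print(costs)
```

In this toy model the totals do not depend on the number of machines; what changes is how the work is spread across them. A multiway join evaluated as a sequence of such binary joins would add one round per join, which is the kind of trade-off between communication and synchronization that the cost breakdown above is meant to capture.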