GYM: A Multiround Join Algorithm in MapReduce

Provided by: Cornell University
Topic: Software
Format: PDF
The problem of evaluating joins efficiently in distributed environments has gained importance since the advent of Google's MapReduce and the emergence of a series of distributed systems with relational operators, such as Pig, Hive, SparkSQL, and Myria. The cost of a join algorithm in such systems can be broken down into three components: local computation on each machine; communication between the machines; and the number of global synchronizations that must take place between the machines, e.g., the number of rounds of MapReduce jobs that need to be executed.
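To make the three cost components concrete, here is a minimal sketch of a standard single-round repartition (hash) join of two relations R(A, B) and S(B, C). It is not the GYM algorithm, only an illustration of how rounds, communication and per-machine load can be counted. The p machines are simulated inside one Python process. The function name `repartition_join` and the keys of the returned cost dictionary are hypothetical choices for this example. "Communication" counts tuples shuffled, and "max_load" (tuples received by the busiest machine) stands in for local computation.

```python
from collections import defaultdict


def repartition_join(R, S, p):
    """Hypothetical sketch: one-round hash join of R(A, B) and S(B, C) on B over p simulated machines."""
    # Map/shuffle phase: send each tuple to the machine chosen by hashing its join key B.
    machines = [([], []) for _ in range(p)]
    for a, b in R:
        machines[hash(b) % p][0].append((a, b))
    for b, c in S:
        machines[hash(b) % p][1].append((b, c))
    # Every input tuple crosses the network exactly once in this scheme.
    communication = len(R) + len(S)

    # Reduce phase: each machine joins its local fragments with an in-memory hash index.
    output = []
    max_load = 0
    for r_part, s_part in machines:
        max_load = max(max_load, len(r_part) + len(s_part))
        index = defaultdict(list)
        for a, b in r_part:
            index[b].append(a)
        for b, c in s_part:
            for a in index[b]:
                output.append((a, b, c))

    # A single shuffle means a single global synchronization (one MapReduce round).
    cost = {"rounds": 1, "communication": communication, "max_load": max_load}
    return output, cost


if __name__ == "__main__":
    R = [(1, 10), (2, 20), (3, 10)]
    S = [(10, "x"), (20, "y"), (30, "z")]
    result, cost = repartition_join(R, S, p=2)
    print(sorted(result), cost)
```

A skewed join key sends many tuples to one machine and drives up `max_load` even though communication stays fixed. Tensions of this kind between load, communication and the number of rounds are what the cost breakdown above is meant to capture.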
