RWTH Aachen University
Data-intensive applications include large-scale data warehouse systems, cloud computing, and data-intensive analysis. Each of these applications has its own characteristic computational workload. For example, analytic systems issue relatively rare updates but heavy select operations that process millions of records, often with aggregations. Applications for large-scale data analysis rely on techniques such as parallel DBMSs (database management systems), the MapReduce (MR) paradigm, and columnar storage. In this paper, the authors focus on the MapReduce environment. The paper compares different join algorithms and designs cost models for further use in the query optimizer.
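To make "join algorithms in a MapReduce environment" concrete, the following is a minimal single-process sketch of one well-known strategy, the reduce-side (repartition) join. It is an illustration under stated assumptions, not the specific algorithms or cost models evaluated in this paper: the function names (`map_phase`, `shuffle`, `reduce_phase`, `repartition_join`), the "L"/"R" relation tags, and the tuple-based records are all hypothetical. Mappers tag each record with its source relation and emit it under the join key; the shuffle groups records by key; each reducer pairs left and right records that share a key.

```python
from collections import defaultdict
from itertools import product


def map_phase(relation_tag, records, key_index):
    """Hypothetical mapper: emit (join_key, (tag, record)) for every record."""
    for record in records:
        yield record[key_index], (relation_tag, record)


def shuffle(pairs):
    """Group intermediate pairs by key, standing in for the framework's shuffle.

    A real MapReduce cluster would hash-partition keys across reducers;
    here all groups live in one dictionary.
    """
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Hypothetical reducer: for each key, combine every left record with every right record."""
    for key in sorted(groups):
        left = [rec for tag, rec in groups[key] if tag == "L"]
        right = [rec for tag, rec in groups[key] if tag == "R"]
        # Keys present in only one relation produce no output (inner join semantics).
        for l_rec, r_rec in product(left, right):
            yield l_rec + r_rec


def repartition_join(left, right, left_key=0, right_key=0):
    """Inner equi-join of two lists of tuples on the given key positions."""
    pairs = list(map_phase("L", left, left_key)) + list(map_phase("R", right, right_key))
    return list(reduce_phase(shuffle(pairs)))


if __name__ == "__main__":
    customers = [(1, "Alice"), (2, "Bob"), (3, "Carol")]
    orders = [(101, 1, 30.0), (102, 1, 12.5), (103, 2, 7.0)]
    for row in repartition_join(customers, orders, left_key=0, right_key=1):
        print(row)
```

In this strategy both input relations pass through the shuffle in full. The volume of data moved between mappers and reducers is therefore a natural quantity for a cost model to estimate when comparing it against alternatives that avoid or reduce the shuffle.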