Provided by: Cornell University
Data cubes are widely used to provide multidimensional views in data warehousing and On-Line Analytical Processing (OLAP). However, as data sizes grow, data cube analysis is becoming computationally expensive. The problem is exacerbated by the demand for more complicated aggregate functions (e.g., correlation and other statistical analyses) and for frequent view updates in data cubes. This calls for new scalable and efficient data cube analysis systems. In this paper, the authors introduce HaCube, an extension of MapReduce designed for efficient parallel data cube analysis on large-scale data. It combines the advantages of MapReduce (scalability) and parallel DBMSs (efficiency).
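To make the notion of a data cube concrete, the following is a minimal single-process sketch of computing every group-by view (cuboid) of a small table in a map-then-reduce style. It is not HaCube's algorithm, which the abstract does not describe. The function names, the `sales` table, and the choice of SUM as the aggregate are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(rows, dims, measure):
    """Emit one (cell key, measure value) pair per row for every subset of dims."""
    for row in rows:
        for k in range(len(dims) + 1):
            for subset in combinations(dims, k):
                # A cell key names the grouped dimensions and their values.
                key = tuple((d, row[d]) for d in subset)
                yield key, row[measure]


def reduce_phase(pairs):
    """Sum the measure values that share the same cell key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def compute_cube(rows, dims, measure):
    """Naive full cube: 2^len(dims) cuboids, SUM aggregate (illustrative only)."""
    return reduce_phase(map_phase(rows, dims, measure))


# Hypothetical input data, invented for this example.
sales = [
    {"region": "east", "product": "pen", "amount": 10},
    {"region": "east", "product": "ink", "amount": 5},
    {"region": "west", "product": "pen", "amount": 7},
]

cube = compute_cube(sales, ["region", "product"], "amount")
print(cube[()])                      # grand total over all rows
print(cube[(("region", "east"),)])   # total for region = east
```

Each input row contributes to one cell in each of the 2^d cuboids, so the number of intermediate pairs grows exponentially with the number of dimensions d. This illustrates why naive cube computation becomes expensive as data sizes grow, and why an aggregate such as correlation, which cannot be combined by simple summation, adds further difficulty.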