University of Sfax
Aggregate queries are an essential means of extracting valuable summarized information from massive data sets, and they play an important role in data-intensive cloud computing applications. MapReduce, a popular cloud computing platform, is a promising paradigm for processing massive data. However, executing an aggregate query over massive data sets is very time-consuming, and running such a query directly on the MapReduce platform is inefficient. To process aggregate queries efficiently, this paper proposes a cache-based approach for the MapReduce platform: pre-processing results are cached before the aggregate query is executed, which improves query performance.
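The caching idea can be illustrated with a minimal in-memory sketch. It is not the paper's implementation: the abstract does not say what the pre-processing results are, so the sketch assumes they are per-split partial aggregates (a sum and a count per group), and it simulates the map and reduce phases with plain Python functions rather than a real MapReduce cluster. All names (`PartialAggregateCache`, `map_split`, `reduce_partials`, `aggregate`) are hypothetical.

```python
from collections import defaultdict


class PartialAggregateCache:
    """Hypothetical cache of pre-processing results, keyed by split and fields."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, key, compute):
        # Reuse a cached partial result if present; otherwise compute and store it.
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[key] = compute()
        return self._store[key]


def map_split(records, group_field, value_field):
    """Map phase (assumed pre-processing): per-group (sum, count) for one split."""
    partial = defaultdict(lambda: [0.0, 0])
    for rec in records:
        acc = partial[rec[group_field]]
        acc[0] += rec[value_field]
        acc[1] += 1
    return {group: (s, c) for group, (s, c) in partial.items()}


def reduce_partials(partials):
    """Reduce phase: merge the per-split (sum, count) pairs for each group."""
    merged = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for group, (s, c) in partial.items():
            merged[group][0] += s
            merged[group][1] += c
    return merged


def aggregate(splits, group_field, value_field, func, cache):
    """Answer SUM, COUNT, or AVG from cached partials, scanning a split only on a miss."""
    partials = [
        cache.get_or_compute(
            (split_id, group_field, value_field),
            # Bind the current split's records as a default argument.
            lambda recs=records: map_split(recs, group_field, value_field),
        )
        for split_id, records in splits.items()
    ]
    merged = reduce_partials(partials)
    if func == "sum":
        return {g: s for g, (s, c) in merged.items()}
    if func == "count":
        return {g: c for g, (s, c) in merged.items()}
    if func == "avg":
        return {g: s / c for g, (s, c) in merged.items()}
    raise ValueError(f"unsupported aggregate function: {func}")
```

Under these assumptions, the first query over a set of splits pays the full scan cost, while a later query on the same grouping and value fields (for example, AVG after SUM) is answered from the cached partials without rescanning the raw records.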