Improving the Performance of Aggregate Queries with Cached Tuples in MapReduce
Aggregate queries are an essential means of extracting valuable summarized information from massive data sets, and they play an important role in data-intensive cloud computing applications. MapReduce, a popular cloud computing platform, is a promising paradigm for processing massive data. However, executing an aggregate query over massive data sets is very time-consuming, and running such a query directly on the MapReduce platform is inefficient. This paper therefore proposes a cache-based approach that improves the performance of aggregate queries on the MapReduce platform by caching pre-processing results before the aggregate query is executed.
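As a rough illustration of the general idea only, the sketch below simulates caching pre-processed results in plain Python rather than on an actual MapReduce cluster. The abstract does not specify what is cached or how, so everything here is an assumption: the cached tuples are taken to be per-split (sum, count) pairs for each group key, and the names `map_partial`, `build_cache`, and `aggregate` are hypothetical.

```python
from collections import defaultdict


def map_partial(records):
    """Map-side pre-processing (assumed): reduce one input split
    to a (sum, count) tuple per group key."""
    partial = defaultdict(lambda: [0, 0])
    for key, value in records:
        partial[key][0] += value
        partial[key][1] += 1
    return {key: (s, c) for key, (s, c) in partial.items()}


def build_cache(splits):
    """Run the pre-processing once per split and keep the resulting
    tuples. `splits` maps a hypothetical split id to its records."""
    return {split_id: map_partial(records) for split_id, records in splits.items()}


def aggregate(cache, func):
    """Reduce-side step: answer an aggregate query by merging the
    cached tuples instead of rescanning the raw records."""
    merged = defaultdict(lambda: [0, 0])
    for partial in cache.values():
        for key, (s, c) in partial.items():
            merged[key][0] += s
            merged[key][1] += c
    if func == "sum":
        return {key: s for key, (s, c) in merged.items()}
    if func == "count":
        return {key: c for key, (s, c) in merged.items()}
    if func == "avg":
        return {key: s / c for key, (s, c) in merged.items()}
    raise ValueError(f"unsupported aggregate: {func}")


# Two toy splits of (group key, value) records.
splits = {0: [("a", 1), ("b", 2)], 1: [("a", 3)]}
cache = build_cache(splits)
sum_result = aggregate(cache, "sum")
avg_result = aggregate(cache, "avg")
```

In this sketch the raw records are scanned once, when the cache is built; later SUM, COUNT, and AVG queries reuse the same cached tuples. Keeping (sum, count) pairs rather than final values is a design choice made here so that AVG can be merged correctly across splits.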