Parallel Bulk Insertion for Large-Scale Analytics Applications

Modern data analytics applications, such as Internet-scale indexing, system trace analysis, and recommender engines, operate on massive amounts of data and call for a parallel approach to data processing. In this paper, the authors focus on the popular MapReduce framework for carrying out such tasks and identify bulk data insertion as a critical preliminary step in reducing processing times, especially when new data is generated and processed at regular intervals.

Provided by: Association for Computing Machinery
Topic: Data Management
Date Added: Aug 2010
Format: PDF
