Parallel Bulk Insertion for Large-Scale Analytics Applications

Executive Summary

Modern data analytics applications, such as Internet-scale indexing, system trace analysis, and recommender engines, operate on massive amounts of data and call for a parallel approach to data processing. In this paper, the authors focus on the popular MapReduce framework for carrying out such tasks and identify bulk data insertion as a critical preliminary step in reducing processing times, especially when new data is generated and processed at regular intervals.

  • Format: PDF
  • Size: 327.1 KB