Two Parallel Approaches to Network Data Analysis
In this paper, the authors compare two alternative approaches to large-scale analytic applications. They focus on network data analysis and define four sample jobs that operate over a publicly available dataset collected on a trans-Pacific Internet link. First, they present an approach based on a shared-nothing parallel database and discuss the key ingredients of its design. Then, they present an approach based on MapReduce, focusing on the design of the data analysis jobs and their optimization. Beyond a plain performance comparison, the lessons learned from several experiments with these systems highlight the challenges of performing the same computations over the same datasets with two orthogonal approaches.