Association for Computing Machinery
It is cumbersome to write machine learning and graph algorithms in data-parallel models such as MapReduce and Dryad. The authors observe that these algorithms are built on matrix computations and are therefore inefficient to implement through the restrictive programming and communication interfaces of such frameworks. In this paper, they show that array-based languages such as R are well suited to implementing complex algorithms and can outperform current data-parallel solutions. Because R is single-threaded and does not scale to large datasets, they have built Presto, a distributed system that extends R and addresses many of its limitations.
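The claim that graph algorithms reduce to matrix computations can be made concrete with PageRank, which is repeated sparse matrix-vector multiplication (power iteration). The sketch below is a hedged illustration only. It is written in plain Python rather than R, the helper names (`matvec`, `pagerank`) are hypothetical, and it assumes every node has at least one outgoing edge so that no rank mass leaks. It is not Presto's interface.

```python
# Hypothetical illustration (not Presto's API): PageRank as repeated
# sparse matrix-vector multiplication. Assumes no dangling nodes,
# i.e. every node has at least one outgoing edge.


def matvec(rows, x):
    """Multiply a sparse matrix stored as {row: [(col, value), ...]} by vector x."""
    y = [0.0] * len(x)
    for i, entries in rows.items():
        y[i] = sum(v * x[j] for j, v in entries)
    return y


def pagerank(edges, n, damping=0.85, iters=50):
    """Compute PageRank for n nodes given directed edges as (src, dst) pairs."""
    out_degree = [0] * n
    for src, _ in edges:
        out_degree[src] += 1
    # Column-stochastic link matrix M, stored by row: M[dst][src] = 1 / out_degree[src].
    rows = {i: [] for i in range(n)}
    for src, dst in edges:
        rows[dst].append((src, 1.0 / out_degree[src]))
    # Start from the uniform distribution and apply r <- (1 - d)/n + d * M r.
    rank = [1.0 / n] * n
    for _ in range(iters):
        contrib = matvec(rows, rank)
        rank = [(1.0 - damping) / n + damping * c for c in contrib]
    return rank


if __name__ == "__main__":
    # Node 2 links to node 0 but nothing links to node 2.
    print(pagerank([(1, 0), (2, 0), (0, 1)], 3))
```

Each iteration touches only the nonzero entries of the link matrix, which is why a system built around distributed sparse arrays can express this kind of algorithm directly instead of encoding it as chains of map and reduce steps.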