University of Mary Washington
The underlying assumption behind Hadoop and, more generally, the need for distributed processing is that the data to be analyzed cannot be held in memory on a single machine. Today, this assumption needs to be re-evaluated. Although petabyte-scale data-stores are increasingly common, it is unclear whether \"Typical\" analytics tasks require more than a single high-end server. Additionally, the authors are seeing increased sophistication in analytics, e.g., machine learning, which generally operates over smaller and more refined datasets. To address these trends, they propose \"Scaling down\" Hadoop to run on shared-memory machines.