Shark: Fast Data Analysis Using Coarse-Grained Distributed Memory

Shark is a research data analysis system built on a novel coarse-grained distributed shared-memory abstraction. Shark marries query processing with deep data analysis, providing a unified system for easy data manipulation using SQL and pushing sophisticated analysis closer to data. It scales to thousands of nodes in a fault-tolerant manner. Shark can answer queries 40X faster than Apache Hive and run machine learning programs 25X faster than MapReduce programs in Apache Hadoop on large datasets.

Provided by: Association for Computing Machinery
Topic: Data Management
Date Added: May 2012
Format: PDF
