In Search of an API for Scalable File Systems: Under the Table or Above It?
Source: Carnegie Mellon University
Big Data is everywhere: both the IT industry and the scientific computing community routinely handle terabytes to petabytes of data. This preponderance of data has fueled the development of Data-Intensive Scalable Computing (DISC) systems that manage, process, and store massive data sets in a distributed manner. For example, Google and Yahoo have each built an Internet services stack to distribute processing (MapReduce and Hadoop), to program computation (Sawzall and Pig), and to store structured output data (Bigtable and HBase). Both stacks are layered on their respective distributed file systems, GoogleFS [12] and the Hadoop Distributed File System (HDFS) [15], which were designed from scratch to deliver high performance primarily for their anticipated DISC workloads.
Date: May 2009



