Institute of Electrical & Electronic Engineers
Hadoop provides a distributed file system and a framework for the analysis and transformation of very large data sets using the MapReduce paradigm. The Hadoop Distributed File System (HDFS) is designed to store very large data sets reliably, and to stream those data sets at high bandwidth to user applications. In a large cluster, thousands of servers both host directly attached storage and execute user application tasks. By distributing storage and computation across many servers, the resource can grow with demand while remaining economical at every size.