Big Data Processing Using Apache Hadoop in Cloud System
Rapid technological growth has created the need to store and process extremely large volumes of data in the cloud. The current volume of data is enormous and is projected to grow more than 650-fold by the year 2014, of which roughly 85% will be unstructured. This is known as the 'Big Data' problem. Two techniques built on Hadoop, an efficient resource-scheduling method and a probabilistic redundant scheduling method, are presented so that the system can organize idle ("free") storage resources existing within enterprises into low-cost, high-quality storage services. The proposed methods and system provide a valuable reference for implementing a cloud storage system. The proposed method is built on a Linux-based cloud.
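The abstract does not specify how the probabilistic redundant scheduling works; as an illustration only, the sketch below shows one plausible interpretation: each data block is replicated onto several distinct storage nodes, with nodes drawn at random with probability proportional to their free capacity. The function name, node labels, and capacity figures are all hypothetical, not taken from the paper.

```python
import random

def schedule_replicas(blocks, nodes, replicas=3, seed=0):
    """Assign each block to `replicas` distinct nodes, choosing nodes
    with probability proportional to their free capacity (in GB).

    blocks: list of block identifiers
    nodes:  dict mapping node name -> free capacity
    Returns a dict mapping block -> list of chosen node names.
    """
    rng = random.Random(seed)
    placement = {}
    for block in blocks:
        candidates = dict(nodes)  # copy so removals stay local to this block
        chosen = []
        # Weighted sampling without replacement: larger free capacity
        # means a higher chance of receiving a replica.
        for _ in range(min(replicas, len(candidates))):
            total = sum(candidates.values())
            r = rng.uniform(0, total)
            acc = 0.0
            for node, cap in candidates.items():
                acc += cap
                if r <= acc:
                    chosen.append(node)
                    del candidates[node]
                    break
        placement[block] = chosen
    return placement

if __name__ == "__main__":
    nodes = {"n1": 500, "n2": 200, "n3": 800, "n4": 100}
    print(schedule_replicas(["blk-1", "blk-2"], nodes, replicas=3))
```

In a real deployment the placement decision would also account for node reliability and network topology, but capacity-weighted random replication already spreads load toward nodes with the most spare storage.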