Improving Efficiency of Geo-Distributed Data Sets Using Pact
In the Internet era, an estimated 2.5 quintillion bytes of data are created every day. This data comes from many sources: sensors that gather climate information, trajectory data, transaction records, web site usage logs, and so on. Collectively, this data is known as big data. Hadoop is designed to scale reliably, storing and processing petabytes of data. It plays an important role in processing and handling big data, and it includes MapReduce, an offline computing engine; HDFS, the Hadoop Distributed File System; and HBase, which provides online data access.
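As a rough illustration of the MapReduce model that Hadoop implements, the following is a minimal in-memory word-count sketch. It is a toy simulation of the map and reduce phases only, not the actual Hadoop API; the function names and input data are illustrative assumptions.

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit a (word, 1) pair for every word in every input document.
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def reduce_phase(pairs):
    # Shuffle/reduce: group the pairs by key and sum the counts per word.
    counts = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return dict(counts)

# Hypothetical input: in Hadoop these documents would be blocks stored in HDFS.
docs = ["big data big cluster", "data node"]
result = reduce_phase(map_phase(docs))
print(result)
```

In a real Hadoop job, the map and reduce functions run in parallel across many nodes, and the framework handles the shuffle between the two phases.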