MapReduce is a widely used data-parallel programming model for large-scale data analysis. The framework is shown to be scalable to thousands of computing nodes and reliable on commodity clusters. MapReduce provides simple programming interfaces with two functions: map and reduce. The functions can be automatically executed in parallel on a cluster without requiring any intervention from the programmer. Moreover, MapReduce offers other benefits, including load balancing, high scalability, and fault tolerance. The challenge escalates when the authors consider that data are dynamically and continuously produced, from different geographical locations.