International Journal of Emerging Technology and Advanced Engineering (IJETAE)
Frequent itemset mining is a classical problem at the core of many data mining applications. It demands heavy computation and high I/O throughput, and the limited memory and CPU of a single processor degrade the algorithm's performance. In this paper, the authors propose a distributed algorithm that runs on Hadoop, one of the most popular recent distributed frameworks, which is built mainly around the MapReduce paradigm. The proposed approach takes into account the inherent characteristics of the Apriori algorithm related to frequent itemset generation and uses block-based partitioning to manage the workload dynamically.
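To make the idea concrete, the following is a minimal sketch of level-wise Apriori counting expressed in a MapReduce style over blocks of transactions. It is not the authors' implementation: it runs sequentially in plain Python rather than on Hadoop, it uses a fixed block size instead of dynamic workload management, and all function names and parameters (`split_into_blocks`, `map_block`, `reduce_counts`, `frequent_itemsets`, `block_size`) are hypothetical names chosen for illustration.

```python
from collections import Counter
from itertools import combinations


def split_into_blocks(transactions, block_size):
    """Partition the transaction list into fixed-size blocks (assumed scheme)."""
    return [transactions[i:i + block_size]
            for i in range(0, len(transactions), block_size)]


def map_block(block, k, prev_frequent):
    """Map step: count the k-itemsets of one block locally.

    Apriori pruning: a k-itemset is counted only if every one of its
    (k-1)-subsets was frequent in the previous level.
    """
    counts = Counter()
    for transaction in block:
        for candidate in combinations(sorted(set(transaction)), k):
            if k == 1 or all(sub in prev_frequent
                             for sub in combinations(candidate, k - 1)):
                counts[candidate] += 1
    return counts


def reduce_counts(partial_counts, min_support):
    """Reduce step: sum per-block counts and keep itemsets meeting min_support."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return {itemset: n for itemset, n in total.items() if n >= min_support}


def frequent_itemsets(transactions, min_support, block_size=2):
    """Run one map/reduce round per itemset size until no itemset is frequent."""
    blocks = split_into_blocks(transactions, block_size)
    result, prev_frequent, k = {}, set(), 1
    while True:
        level = reduce_counts(
            [map_block(block, k, prev_frequent) for block in blocks],
            min_support,
        )
        if not level:
            return result
        result.update(level)
        prev_frequent, k = set(level), k + 1


if __name__ == "__main__":
    data = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
    print(frequent_itemsets(data, min_support=3))
```

In this sketch each block plays the role of one map task's input split, and `min_support` is an absolute transaction count rather than a fraction. In a real Hadoop job the per-block counts would be emitted as key-value pairs and summed by reducers.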