Provided by: Science and Development Network (SciDev.Net)
Topic: Big Data
Extracting useful knowledge from data sets measured in gigabytes or even terabytes is a challenging research area for the data mining community. Sequential approaches perform poorly because they must mine voluminous databases. Parallelism is an important solution that can improve the response time and scalability of these approaches. However, parallelization is not trivial and still faces many challenges, including the workload balancing problem. In this paper, the authors propose a hierarchical dynamic load balancing strategy for parallel association rule mining algorithms in a Grid computing environment. The French research grid "Grid'5000" is used as their experimental test-bed.