Date Added: Dec 2011
Sampling, a powerful data-reduction technique, can be applied to a variety of problems in database systems and data mining. In this paper, the authors present an extensive analysis of the progressive sampling-based approach using two association rule mining algorithms (Apriori and FP-growth) and two sample-selection methods (random and systematic sampling). The datasets used in the experiments include the real-world retail and connect datasets, along with the synthetic dataset T10I4D100K, produced by the IBM dataset generator. The performance of the progressive sampling-based approach is evaluated with three metrics: accuracy, computational time, and optimal sample size.
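The following is a minimal sketch of how progressive sampling could be combined with random and systematic sample selection for frequent-itemset mining. The abstract does not give the sampling schedule, the stopping rule, or the accuracy measure, so several details here are assumptions. The sample grows geometrically (hypothetical parameters `n0`, `growth`). Sampling stops once the frequent itemsets from two successive samples reach a Jaccard similarity of at least `tol`. Accuracy is taken to be the Jaccard similarity with the itemsets mined from the full data. All function names are hypothetical. The miner is a small textbook Apriori; FP-growth is not shown.

```python
import random


def random_sample(transactions, n, seed=0):
    """Simple random sampling of n transactions without replacement."""
    return random.Random(seed).sample(transactions, n)


def systematic_sample(transactions, n, seed=0):
    """Systematic sampling: a random start, then every (N/n)-th transaction."""
    total = len(transactions)
    offset = random.Random(seed).randrange(total)  # random start, scaled by n
    # Integer form of floor(start + i * N/n); indices stay in [0, N).
    return [transactions[(offset + i * total) // n] for i in range(n)]


def apriori(transactions, min_support):
    """Minimal level-wise Apriori; returns the set of frequent itemsets."""
    sets = [frozenset(t) for t in transactions]
    level = {frozenset([item]) for t in sets for item in t}
    frequent, size = set(), 1
    while level:
        counts = {c: sum(1 for t in sets if c <= t) for c in level}
        current = {c for c, cnt in counts.items() if cnt / len(sets) >= min_support}
        frequent |= current
        size += 1
        # Join step: merge frequent itemsets into candidates one item larger.
        joined = {a | b for a in current for b in current if len(a | b) == size}
        # Prune step: drop candidates that have an infrequent subset.
        level = {c for c in joined if all(c - {x} in current for x in c)}
    return frequent


def similarity(a, b):
    """Jaccard similarity of two collections of itemsets (1.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def progressive_sampling(transactions, sampler, min_support,
                         n0=100, growth=2, tol=0.95):
    """Grow the sample geometrically until successive results agree.

    Returns (sample_size, frequent_itemsets) at the point where sampling stopped.
    """
    previous, n = None, n0
    while True:
        n = min(n, len(transactions))
        result = apriori(sampler(transactions, n), min_support)
        converged = previous is not None and similarity(previous, result) >= tol
        if converged or n == len(transactions):
            return n, result
        previous, n = result, n * growth


if __name__ == "__main__":
    # Hypothetical toy data: 2000 transactions of 3 items drawn from "abcdef".
    rng = random.Random(1)
    data = [rng.sample("abcdef", 3) for _ in range(2000)]
    full = apriori(data, 0.3)
    for sampler in (random_sample, systematic_sample):
        size, itemsets = progressive_sampling(data, sampler, min_support=0.3)
        print(sampler.__name__, size, similarity(itemsets, full))
```

Computing accuracy against the full-data result is only useful for evaluation, as in an experimental study. In a deployment, the convergence test alone decides the sample size, because mining the full data is exactly what sampling is meant to avoid.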