Download now Free registration required
Discovery of association rules is a prototypical problem in data mining. The current algorithms proposed for data mining of association rules make repeated passes over the database to determine the commonly occurring itemsets (or set of items). For large databases, the I/O overhead in scanning the database can be extremely high. This paper shows that random sampling of transactions in the database is an effective method for finding association rules. Sampling can speed up the mining process by more than an order of magnitude by reducing I/O costs and drastically shrinking the number of transaction to be considered. The paper may also be able to make the sampled database resident in main-memory.
- Format: PDF
- Size: 139.6 KB