Association for Computing Machinery
Data mining research is extensive, but most work has proposed efficient algorithms, data structures, and optimizations that work outside a DBMS, mostly on flat files. In contrast, the authors present a data mining system that works on top of a relational DBMS by combining SQL queries and User-Defined Functions (UDFs), debunking the common perception that SQL is inefficient or inadequate for data mining. They show that their system can analyze large data sets significantly faster than external data mining tools. Moreover, their UDF-based algorithms can process a data set in one pass and scale linearly.
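The abstract does not say what the UDFs compute. To show what "one pass" can mean for an aggregate UDF, the Python sketch below follows the usual aggregate lifecycle: initialize a state, accumulate each row, merge partial states from parallel partitions, and finalize. It assumes the state is a compact summary made of the row count n, per-column sums L, and cross-product sums Q, from which a mean vector and covariance matrix can be derived. That choice of summary, and the function names `init_state`, `accumulate`, `merge`, and `finalize`, are hypothetical illustrations and not the authors' implementation or any DBMS's UDF API.

```python
from typing import List, Sequence, Tuple

# Hypothetical aggregate-UDF state: row count n, per-column sums L,
# and the d x d matrix Q of cross-product sums.
State = Tuple[int, List[float], List[List[float]]]

def init_state(d: int) -> State:
    """Create an empty summary for d numeric columns."""
    return 0, [0.0] * d, [[0.0] * d for _ in range(d)]

def accumulate(state: State, row: Sequence[float]) -> State:
    """Fold one row into the summary (called once per row, so one pass)."""
    n, L, Q = state
    d = len(L)
    for a in range(d):
        L[a] += row[a]
        for b in range(d):
            Q[a][b] += row[a] * row[b]
    return n + 1, L, Q

def merge(s1: State, s2: State) -> State:
    """Combine partial summaries computed on separate partitions."""
    n1, L1, Q1 = s1
    n2, L2, Q2 = s2
    d = len(L1)
    L = [L1[a] + L2[a] for a in range(d)]
    Q = [[Q1[a][b] + Q2[a][b] for b in range(d)] for a in range(d)]
    return n1 + n2, L, Q

def finalize(state: State):
    """Derive the mean vector and (population) covariance matrix."""
    n, L, Q = state
    d = len(L)
    mean = [L[a] / n for a in range(d)]
    cov = [[Q[a][b] / n - mean[a] * mean[b] for b in range(d)] for a in range(d)]
    return mean, cov

def one_pass(rows: Sequence[Sequence[float]], d: int) -> State:
    """Scan rows once, as a DBMS would when it evaluates the aggregate."""
    state = init_state(d)
    for row in rows:
        state = accumulate(state, row)
    return state

if __name__ == "__main__":
    rows = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    # Two partitions summarized independently, then merged.
    state = merge(one_pass(rows[:1], 2), one_pass(rows[1:], 2))
    print(finalize(state))
```

Each row is touched exactly once, and the per-row work depends only on the number of columns d. Under these assumptions, the cost grows linearly with the number of rows, and `merge` lets partitions be summarized in parallel before the final step.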