Integrating K-Means Clustering With a Relational DBMS Using SQL
Integrating data mining algorithms with a relational DBMS is an important problem for database programmers. The authors introduce three SQL implementations of the popular K-means clustering algorithm to integrate it with a relational DBMS: (1) a straightforward translation of K-means computations into SQL; (2) an optimized version based on improved data organization, efficient indexing, sufficient statistics, and rewritten queries; and (3) an incremental version that uses the optimized version as a building block and adds fast convergence and automated reseeding. They show experimentally that the proposed K-means implementations work correctly and can cluster large data sets, and they identify which K-means computations are most critical for performance.
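To make the idea of a straightforward translation concrete, the following is a minimal sketch of K-means iterations expressed as SQL statements. It is not the authors' implementation: the table names (Y, C, YD, YNN), the two-dimensional layout, the function name kmeans_sql, and the use of SQLite through Python's sqlite3 module are all assumptions chosen for illustration. It also omits the indexing, sufficient statistics, and reseeding that the optimized and incremental versions rely on.

```python
import sqlite3


def kmeans_sql(points, centroids, iterations=10):
    """Run a fixed number of K-means iterations using plain SQL.

    Hypothetical schema (an assumption, not taken from the paper):
      Y(i, x1, x2)    input points
      C(j, x1, x2)    current centroids
      YD(i, j, dist)  squared distance from every point to every centroid
      YNN(i, j)       nearest centroid for each point
    """
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE Y (i INTEGER PRIMARY KEY, x1 REAL, x2 REAL)")
    cur.execute("CREATE TABLE C (j INTEGER PRIMARY KEY, x1 REAL, x2 REAL)")
    cur.executemany(
        "INSERT INTO Y VALUES (?, ?, ?)",
        [(i, float(p[0]), float(p[1])) for i, p in enumerate(points)],
    )
    cur.executemany(
        "INSERT INTO C VALUES (?, ?, ?)",
        [(j, float(c[0]), float(c[1])) for j, c in enumerate(centroids)],
    )

    for _ in range(iterations):
        # Distance step: cross join points with centroids.
        cur.execute("DROP TABLE IF EXISTS YD")
        cur.execute(
            """
            CREATE TABLE YD AS
            SELECT Y.i AS i, C.j AS j,
                   (Y.x1 - C.x1) * (Y.x1 - C.x1)
                 + (Y.x2 - C.x2) * (Y.x2 - C.x2) AS dist
            FROM Y CROSS JOIN C
            """
        )
        # Assignment step: keep the closest centroid per point
        # (ties are broken by the smallest centroid id).
        cur.execute("DROP TABLE IF EXISTS YNN")
        cur.execute(
            """
            CREATE TABLE YNN AS
            SELECT YD.i AS i, MIN(YD.j) AS j
            FROM YD
            JOIN (SELECT i, MIN(dist) AS mindist FROM YD GROUP BY i) AS M
              ON YD.i = M.i AND YD.dist = M.mindist
            GROUP BY YD.i
            """
        )
        # Update step: each centroid becomes the mean of its points.
        # A cluster that receives no points disappears in this sketch.
        cur.execute("DELETE FROM C")
        cur.execute(
            """
            INSERT INTO C
            SELECT YNN.j, AVG(Y.x1), AVG(Y.x2)
            FROM Y JOIN YNN ON Y.i = YNN.i
            GROUP BY YNN.j
            """
        )

    final_centroids = cur.execute(
        "SELECT j, x1, x2 FROM C ORDER BY j"
    ).fetchall()
    assignments = cur.execute("SELECT i, j FROM YNN ORDER BY i").fetchall()
    conn.close()
    return final_centroids, assignments
```

Calling kmeans_sql([(0, 0), (0, 2), (10, 0), (10, 2)], [(1, 1), (9, 1)]) groups the first two points into one cluster and the last two into the other. Materializing the full point-by-centroid distance table on every iteration is the kind of cost that an optimized version would presumably aim to reduce.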