Integrating K-Means Clustering With a Relational DBMS Using SQL

Integrating data mining algorithms with a relational DBMS is an important problem for database programmers. The authors introduce three SQL implementations of the popular K-means clustering algorithm to integrate it with a relational DBMS: (1) a straightforward translation of K-means computations into SQL; (2) an optimized version based on improved data organization, efficient indexing, sufficient statistics, and rewritten queries; and (3) an incremental version that uses the optimized version as a building block, with fast convergence and automated reseeding. They show experimentally that the proposed K-means implementations work correctly and can cluster large data sets, and they identify which K-means computations are most critical for performance.
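To make the idea of a "straightforward translation of K-means computations into SQL" concrete, the following is a minimal sketch and not the authors' implementation. It assumes two-dimensional points, uses SQLite through Python's standard sqlite3 module as a stand-in for a relational DBMS, and the table names (pts, cen, dist, asg) and the function name kmeans_sql are hypothetical. The distance, assignment, and centroid-update steps each run as a single SQL statement.

```python
import sqlite3


def kmeans_sql(points, seeds, max_iter=20):
    """Illustrative K-means where each step is one SQL statement.

    Assumptions (not from the paper): 2-D points, SQLite as the DBMS,
    hypothetical table names pts, cen, dist and asg.
    """
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE pts  (id INTEGER PRIMARY KEY, x REAL, y REAL);
        CREATE TABLE cen  (j  INTEGER PRIMARY KEY, x REAL, y REAL);
        CREATE TABLE dist (id INTEGER, j INTEGER, dist2 REAL);
        CREATE TABLE asg  (id INTEGER PRIMARY KEY, j INTEGER);
    """)
    con.executemany("INSERT INTO pts (id, x, y) VALUES (?, ?, ?)",
                    [(i, x, y) for i, (x, y) in enumerate(points)])
    con.executemany("INSERT INTO cen (j, x, y) VALUES (?, ?, ?)",
                    [(j, x, y) for j, (x, y) in enumerate(seeds)])

    new = []
    for _ in range(max_iter):
        old = new
        con.execute("DELETE FROM dist")
        con.execute("DELETE FROM asg")
        # Squared Euclidean distance from every point to every centroid.
        con.execute("""
            INSERT INTO dist (id, j, dist2)
            SELECT p.id, c.j,
                   (p.x - c.x) * (p.x - c.x) + (p.y - c.y) * (p.y - c.y)
            FROM pts p CROSS JOIN cen c
        """)
        # Assignment step: nearest centroid per point (lowest j on ties).
        con.execute("""
            INSERT INTO asg (id, j)
            SELECT d.id, MIN(d.j)
            FROM dist d
            JOIN (SELECT id, MIN(dist2) AS best FROM dist GROUP BY id) m
              ON m.id = d.id AND d.dist2 = m.best
            GROUP BY d.id
        """)
        new = con.execute("SELECT id, j FROM asg ORDER BY id").fetchall()
        if new == old:
            break  # assignments unchanged, so centroids are stable
        # Update step: each non-empty cluster's centroid becomes its mean.
        con.execute("""
            UPDATE cen
            SET x = (SELECT AVG(p.x) FROM pts p
                     JOIN asg a ON a.id = p.id WHERE a.j = cen.j),
                y = (SELECT AVG(p.y) FROM pts p
                     JOIN asg a ON a.id = p.id WHERE a.j = cen.j)
            WHERE j IN (SELECT j FROM asg)
        """)

    centroids = con.execute("SELECT x, y FROM cen ORDER BY j").fetchall()
    con.close()
    return centroids, [j for _, j in new]
```

Calling kmeans_sql with a list of (x, y) tuples and a list of seed centroids returns the final centroids and one cluster label per point. The sketch recomputes every point-to-centroid distance on each pass; the optimized version described in the abstract relies on sufficient statistics, indexing, and rewritten queries, none of which are shown here.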

Provided by: Institute of Electrical & Electronic Engineers
Topic: Data Management
Date Added: Jan 2012
Format: PDF
