Comparative Study of EM and K-Means Clustering Techniques in Weka Interface

Provided by: Creative Commons
Topic: Data Management
Format: PDF
The K-means algorithm handles overlapping data points poorly, because it classifies a point solely by its distance from the estimated cluster means. When the data overlaps, no clear line can be drawn to separate the points closest to one mean from the points closest to another. EM, on the other hand, performs much better on overlapping data. Its strength is that it can incorporate underlying assumptions about how the data was generated.
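The contrast can be illustrated with a minimal sketch. It is not the paper's experiment and not Weka's implementation; it assumes one-dimensional synthetic data drawn from two overlapping Gaussians, and it assumes that the "underlying assumption" EM uses is a two-component Gaussian mixture. The function names, the sample sizes, and the initialization (smallest and largest data value) are hypothetical choices made for illustration only.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a 1-D normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def nearest(x, means):
    """Index of the mean closest to x (the K-means hard assignment rule)."""
    return min(range(len(means)), key=lambda j: abs(x - means[j]))


def kmeans_1d(data, init_means, iters=50):
    """Plain K-means in one dimension: assign by distance, then recompute means."""
    means = list(init_means)
    for _ in range(iters):
        labels = [nearest(x, means) for x in data]
        for j in range(len(means)):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:  # keep the old mean if a cluster becomes empty
                means[j] = sum(members) / len(members)
    return means, [nearest(x, means) for x in data]


def em_1d(data, init_means, iters=100, var_floor=1e-6):
    """EM for a 1-D Gaussian mixture (assumed model): soft assignments."""
    k = len(init_means)
    means = list(init_means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    resp = []
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            dens = [weights[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(dens)
            resp.append([d / total for d in dens] if total > 0 else [1.0 / k] * k)
        # M-step: re-estimate weight, mean, and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj == 0:
                continue
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_floor, var)
            weights[j] = nj / len(data)
    return means, variances, weights, resp


# Synthetic, overlapping data (assumed for illustration): a narrow and a wide cluster.
random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(200)]
data += [random.gauss(2.0, 2.0) for _ in range(200)]

init = [min(data), max(data)]
km_means, km_labels = kmeans_1d(data, init)
em_means, em_vars, em_weights, em_resp = em_1d(data, init)

print("K-means means:", km_means)
print("EM means:", em_means, "variances:", em_vars, "weights:", em_weights)
```

K-means returns only a hard label per point, so every point in the overlap region is forced into whichever cluster has the nearer mean. EM returns a responsibility vector per point (each row of `em_resp` sums to 1) together with per-component variances and weights, which is how the model assumption about the data enters the result.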
