Algorithm for Outlier Detection Based on Utility and Clustering (ODUC)
Outlier analysis is one of the applied data mining techniques. Outliers are data objects that do not comply with the general behavior or model of the data. Statistical, distance-based, and deviation-based approaches are some of the outlier detection methods. Clustering, a data mining technique, groups similar data objects into clusters, which indirectly eliminates outliers as noise. The proposed system finds outliers based on utilities and k-means clustering: the first step prunes the data objects whose utility value is less than the user's minimum threshold value, and the second step employs repeated k-means clustering. At each iteration, the repeated k-means clustering prunes the data objects that lie near the centroid of their cluster.
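
The two steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the fixed `radius` used to decide what "near the centroid" means, the `rounds` limit, the stopping rule, and the farthest-point initialisation are all assumptions made for illustration, since the abstract does not specify them.

```python
import math


def prune_low_utility(objects, utilities, min_utility):
    """Step 1: keep only objects whose utility meets the minimum threshold."""
    return [obj for obj, u in zip(objects, utilities) if u >= min_utility]


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's k-means; assumes len(points) >= k.

    Initialisation is deterministic farthest-point seeding (an assumption,
    chosen so the sketch gives repeatable results).
    """
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, labels


def oduc_sketch(objects, utilities, min_utility, k, radius, rounds=3):
    """Return outlier candidates: objects that survive both pruning steps.

    `radius` and `rounds` are hypothetical parameters, not taken from the paper.
    """
    remaining = prune_low_utility(objects, utilities, min_utility)
    for _ in range(rounds):
        if len(remaining) <= k:  # too few objects left to cluster meaningfully
            break
        centroids, labels = kmeans(remaining, k)
        # Step 2: prune objects lying within `radius` of their cluster centroid.
        survivors = [
            p for p, lab in zip(remaining, labels)
            if math.dist(p, centroids[lab]) > radius
        ]
        if len(survivors) == len(remaining):  # nothing pruned this round
            break
        remaining = survivors
    return remaining


# Toy data: two tight groups, one isolated object, one low-utility object.
points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (-0.1, 0.0), (0.0, -0.1),
    (10.0, 10.0), (10.1, 10.0), (10.0, 10.1), (9.9, 10.0), (10.0, 9.9),
    (5.0, 5.0), (20.0, 20.0),
]
utilities = [10] * 11 + [1]  # the last object falls below the threshold
candidates = oduc_sketch(points, utilities, min_utility=5, k=2, radius=2.0)
print(candidates)
```

In this sketch, the objects remaining after the final round are treated as outlier candidates. The stopping rule (halt when no more than `k` objects remain or when a round prunes nothing) is needed because repeated pruning would otherwise eventually remove every object.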