International Journal of Advanced Research in Computer Science and Software Engineering (IJARCSSE)
Clustering real-world data sets currently requires scalability and the ability to handle any kind of data, both categorical and numerical. A clustering method should also be able to handle noisy and missing data. Traditional algorithms can cluster either categorical or numerical data, but not both. In general, clustering mixed data types is tedious, but it gives the best clusters, with more accurate results. Another important factor that affects the quality of the clusters is the choice of pre-processing techniques.