International Journal of Computer Applications
The exponential growth in the amount of data brings new challenges for data analysis. Gene expression datasets are one such type of data, requiring analytical methods that can mine the patterns implicit in them. Although clustering has been a popular way to analyze such datasets, their increasing size calls for more efficient clustering methods. In this paper, the authors study the use of Principal Components (PCs) as a pre-processing step that provides a more efficient data structure to a parallel formulation of the sequential K-Means algorithm, which utilizes the multiple cores available in a desktop computer via the Simple Network Of Workstations (SNOW) package.
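The abstract does not spell out the algorithmic details, so the following is only a minimal Python sketch of the general pipeline under stated assumptions, not the authors' R/SNOW implementation. Rows are mean-centred and projected onto the leading PCs, which are computed here by power iteration with deflation (a stand-in for whatever PCA routine the authors used). K-Means is then parallelized in a data-parallel style: the data are split into chunks, each worker assigns its chunk to the nearest centroid and returns per-cluster partial sums, and a master step combines those sums into new centroids. All function names, the `map_fn` hook and the optional `init` argument are hypothetical.

```python
import random


def mean_center(rows):
    """Subtract the column mean from every row."""
    d = len(rows[0])
    means = [sum(r[j] for r in rows) / len(rows) for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(rows):
    """Sample covariance matrix of mean-centred rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_components(cov, n_pc, iters=200, seed=0):
    """Leading eigenvectors of cov via power iteration with deflation."""
    rng = random.Random(seed)
    d = len(cov)
    cov = [row[:] for row in cov]  # work on a copy
    comps = []
    for _ in range(n_pc):
        v = [rng.random() + 0.1 for _ in range(d)]
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = sum(x * x for x in w) ** 0.5
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        comps.append(v)
        for i in range(d):  # deflate: remove this component's variance
            for j in range(d):
                cov[i][j] -= lam * v[i] * v[j]
    return comps


def project(rows, comps):
    """Scores of each row on the chosen principal components."""
    return [[sum(x * c for x, c in zip(r, comp)) for comp in comps] for r in rows]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def partial_step(task):
    """Worker task: assign one chunk and return per-cluster partial sums."""
    chunk, centroids = task
    k, d = len(centroids), len(centroids[0])
    sums = [[0.0] * d for _ in range(k)]
    counts = [0] * k
    labels = []
    for p in chunk:
        c = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
        labels.append(c)
        counts[c] += 1
        for j in range(d):
            sums[c][j] += p[j]
    return sums, counts, labels


def parallel_kmeans(points, k, n_chunks=4, max_iter=100, seed=0,
                    map_fn=map, init=None):
    """Lloyd's K-Means; the assignment step runs chunk-wise through map_fn.

    Initial centroids default to k random points unless init is given.
    """
    if init is None:
        init = random.Random(seed).sample(points, k)
    centroids = [list(p) for p in init]
    size = -(-len(points) // n_chunks)  # ceiling division
    chunks = [points[i:i + size] for i in range(0, len(points), size)]
    labels = []
    for _ in range(max_iter):
        results = list(map_fn(partial_step, [(ch, centroids) for ch in chunks]))
        labels = [lab for _, _, chunk_labels in results for lab in chunk_labels]
        new = []
        for c in range(len(centroids)):  # master step: reduce partial sums
            total = sum(counts[c] for _, counts, _ in results)
            if total == 0:
                new.append(centroids[c])  # keep an empty cluster's centroid
            else:
                new.append([sum(sums[c][j] for sums, _, _ in results) / total
                            for j in range(len(centroids[c]))])
        if new == centroids:  # assignments were stable, so centroids unchanged
            break
        centroids = new
    return centroids, labels
```

With the default `map_fn=map` the chunks are processed sequentially. To spread them across cores, one could pass a process pool's map, e.g. `with multiprocessing.Pool(4) as pool: parallel_kmeans(scores, k, map_fn=pool.map)`, placed inside an `if __name__ == "__main__":` guard on platforms that spawn worker processes. This loosely mirrors distributing data chunks to SNOW workers, but it is an analogy, not a port of the paper's code.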