High Dimensional Data Clustering With Dynamic Error Threshold Estimation Model

Date Added: Mar 2012
Format: PDF

Gene data are considered high-dimensional, and clustering high-dimensional data is very challenging, especially when the data are skewed. The existing system relies on popular algorithms such as k-means and CAST, but applying these algorithms to a large genome-scale gene expression data set is practically infeasible. A novel method for clustering large gene data sets is introduced. An enhanced Tanimoto clustering method is implemented that exploits co-connectedness to cluster large, sparse expression data efficiently. A dynamic error threshold estimation model computes threshold values and filters out data below the given threshold. In the proposed work, a tree structure is constructed to represent the input samples.
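The abstract does not give the method's formulas, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes the continuous Tanimoto coefficient T(a, b) = a·b / (|a|² + |b|² − a·b) as the similarity measure, a hypothetical data-derived threshold (the global mean of all expression values) standing in for the dynamic error threshold, and single-linkage grouping with union-find as a stand-in for "co-connectedness". The function names and parameters are illustrative, and the tree structure from the proposed work is not modeled.

```python
from typing import Dict, List, Sequence


def tanimoto(a: Sequence[float], b: Sequence[float]) -> float:
    """Continuous Tanimoto coefficient: a.b / (|a|^2 + |b|^2 - a.b)."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    # denom is 0 only when both vectors are all zeros.
    return dot / denom if denom else 0.0


def dynamic_threshold(rows: List[List[float]]) -> float:
    """Hypothetical stand-in for the dynamic error threshold: global mean."""
    values = [x for row in rows for x in row]
    return sum(values) / len(values)


def filter_below(row: List[float], threshold: float) -> List[float]:
    """Zero out values below the threshold (assumed filtering rule)."""
    return [x if x >= threshold else 0.0 for x in row]


def cluster(rows: List[List[float]], sim_threshold: float) -> List[List[int]]:
    """Link rows whose Tanimoto similarity meets sim_threshold, then
    return the connected components as sorted lists of row indices."""
    t = dynamic_threshold(rows)
    filtered = [filter_below(r, t) for r in rows]
    parent = list(range(len(rows)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            if tanimoto(filtered[i], filtered[j]) >= sim_threshold:
                parent[find(i)] = find(j)

    groups: Dict[int, List[int]] = {}
    for i in range(len(rows)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    # Toy expression matrix: rows are genes, columns are samples (made-up values).
    rows = [
        [5.0, 4.0, 0.1, 0.0],
        [4.5, 4.2, 0.2, 0.1],
        [0.0, 0.3, 6.0, 5.5],
        [0.1, 0.0, 5.8, 6.1],
    ]
    print(cluster(rows, sim_threshold=0.7))  # [[0, 1], [2, 3]]
```

On this toy matrix the mean-based threshold zeroes the small background values, so the two co-expressed gene pairs end up in separate components. The all-pairs loop is quadratic and is meant only to show the similarity and filtering steps, not the scalability the abstract claims.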