Data Management

On the Storage, Management and Analysis of (Multi) Similarity for Large Scale Protein Structure Datasets in the Grid

Assessing the (multi) similarity among a set of protein structures relies on an ensemble of protein structure comparison methods. This generates a multitude of data that varies in both type and size. After standardization and normalization, the data are used to build a consensus that provides a domain-independent and highly reliable view of the (dis)similarities among the structures. This paper briefly describes some of the techniques used to estimate the missing or invalid values that arise when very large datasets are multi-compared in a distributed/grid environment. It then presents an empirical study of the storage capacity and query processing time required to cope with the results of such comparisons.
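The pipeline summarized above (standardize each method's scores, estimate missing values, then combine into a consensus) can be illustrated with a minimal sketch. The abstract does not name the specific techniques, so everything here is an assumption chosen for illustration: z-score standardization, mean substitution for missing values, and an unweighted average as the consensus. The function names and the `score_table` layout are hypothetical, and all scores are assumed to be oriented so that higher means more similar.

```python
import math
from statistics import mean, pstdev


def _is_missing(value):
    """Treat None and NaN as missing or invalid scores."""
    return value is None or math.isnan(value)


def standardize(scores):
    """Z-score one method's scores; missing entries are returned as None."""
    valid = [s for s in scores if not _is_missing(s)]
    if not valid:
        raise ValueError("a method needs at least one valid score")
    mu, sigma = mean(valid), pstdev(valid)
    return [
        None if _is_missing(s) else ((s - mu) / sigma if sigma > 0 else 0.0)
        for s in scores
    ]


def impute_missing(column):
    """Fill missing entries with the mean of the valid ones.

    Assumption: mean substitution stands in for whatever estimator
    the paper actually uses.
    """
    fill = mean(s for s in column if s is not None)
    return [fill if s is None else s for s in column]


def consensus(score_table):
    """Average the cleaned scores across methods.

    score_table maps a method name to a list of scores, one entry per
    structure pair, with every list in the same pair order.
    """
    cleaned = [impute_missing(standardize(col)) for col in score_table.values()]
    return [mean(pair_scores) for pair_scores in zip(*cleaned)]


# Hypothetical scores from two methods on three structure pairs;
# method_b failed on the second pair.
example = {
    "method_a": [1.0, 2.0, 3.0],
    "method_b": [10.0, None, 30.0],
}
consensus_scores = consensus(example)
```

Standardizing before imputing keeps the fill value on the same scale as the other methods' scores, so a failed comparison contributes a neutral value to the consensus rather than skewing it.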

