Data Integration via Constrained Clustering: An Application to Enzyme Clustering

Provided by: Rensselaer at Hartford
Topic: Big Data
Format: PDF
When multiple data sources are available for clustering, an a priori data integration step is usually required. This step can be costly and may not lead to good clusters, since important information is likely to be discarded. In this paper, the authors propose constrained clustering as a strategy for integrating data sources without losing information: the complementary data sources are added as constraints that the clustering algorithm must satisfy. As a concrete application of their approach, they focus on enzyme function prediction, a hard task usually carried out through intensive experimental work.
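
To make the idea of "a second data source as constraints" concrete, here is a minimal sketch. The abstract does not say which constrained clustering algorithm or which kind of constraint the authors use, so everything below is an assumption: the sketch uses a COP-KMeans-style greedy assignment with pairwise must-link and cannot-link constraints, and the function names, toy points, and constraint pairs are hypothetical. One source supplies the feature vectors; the other supplies only the pairs.

```python
import math
import random


def violates(i, cluster, labels, must_link, cannot_link):
    """True if placing point i in `cluster` breaks a constraint against
    a point that has already been labelled in the current pass."""
    for a, b in must_link:
        other = b if a == i else a if b == i else None
        if other is not None and labels[other] not in (None, cluster):
            return True
    for a, b in cannot_link:
        other = b if a == i else a if b == i else None
        if other is not None and labels[other] == cluster:
            return True
    return False


def constrained_kmeans(points, k, must_link=(), cannot_link=(), iters=50, seed=0):
    """COP-KMeans-style sketch (assumed technique, not the paper's method):
    each point goes to the nearest centroid that violates no constraint."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids drawn from the data
    labels = None
    for _ in range(iters):
        new_labels = [None] * len(points)
        for i, p in enumerate(points):
            # Try clusters from nearest to farthest centroid.
            order = sorted(range(k), key=lambda c: math.dist(p, centroids[c]))
            for c in order:
                if not violates(i, c, new_labels, must_link, cannot_link):
                    new_labels[i] = c
                    break
            else:
                raise ValueError(f"no feasible cluster for point {i}")
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [points[i] for i, lab in enumerate(labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


# Source A (hypothetical): feature vectors, one per item.
points = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (0.3, 0.2)]
# Source B (hypothetical): pairwise knowledge, used only as constraints.
must_link = [(4, 2)]    # items 4 and 2 must share a cluster
cannot_link = [(0, 2)]  # items 0 and 2 must be separated

labels = constrained_kmeans(points, 2, must_link, cannot_link)
print(labels)
```

In this toy data, item 4 sits next to items 0 and 1 in feature space, yet the must-link pair from the second source forces it into the same cluster as item 2. Neither source is merged into the other or discarded, which is the property the abstract emphasizes. Greedy assignment of this kind can fail on some constraint sets and point orderings, which is why the sketch raises an error rather than silently dropping a constraint.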
