A Clustering Approach to Generalized Pattern Identification Based on Multi-instanced Objects with DARA

Provided by: RWTH Aachen University
Topic: Data Management
Format: PDF
Clustering is an essential data mining task with many types of applications. Traditional clustering algorithms are based on a vector space model representation. A relational database system, however, often contains multi-relational information spread across multiple relations (tables). To cluster such data, one must either restrict the analysis to a single representation or construct a feature space comprising all possible representations of the data stored in the multiple tables. In this paper, the authors present a data summarization approach, borrowed from information retrieval theory, to clustering in a multi-relational environment.
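
To make the idea of summarizing multi-instance data concrete, here is a minimal sketch. It is not the authors' DARA implementation. It assumes that each row of a related table can be encoded as a single "term" and that TF-IDF weighting is the information retrieval technique applied. The names summarize and cosine and the sample rows are hypothetical.

```python
import math
from collections import Counter, defaultdict


def summarize(rows):
    """Collapse many rows per object into one TF-IDF weighted vector.

    rows: iterable of (object_id, attribute_tuple) pairs, for example the
    rows of a child table joined to a target table by object_id.
    Returns a dict {object_id: {term: weight}}.
    """
    # Assumption: each row becomes one term by joining its attribute values.
    bags = defaultdict(Counter)
    for obj_id, attrs in rows:
        term = "|".join(str(a) for a in attrs)
        bags[obj_id][term] += 1

    # Document frequency: in how many objects does each term occur?
    n_objects = len(bags)
    doc_freq = Counter()
    for bag in bags.values():
        doc_freq.update(bag.keys())

    # Weight each term by tf * log(N / df).
    return {
        obj_id: {t: tf * math.log(n_objects / doc_freq[t]) for t, tf in bag.items()}
        for obj_id, bag in bags.items()
    }


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical purchase rows: (customer_id, (category, price_band)).
rows = [
    (1, ("book", "low")), (1, ("book", "high")),
    (2, ("book", "low")), (2, ("book", "high")),
    (3, ("toy", "mid")), (3, ("toy", "low")),
]
vectors = summarize(rows)
sim_12 = cosine(vectors[1], vectors[2])
sim_13 = cosine(vectors[1], vectors[3])
```

Each object ends up with a single vector regardless of how many rows it had, so the result can be passed to any clustering algorithm that expects a vector space representation. In this toy data, customers 1 and 2 share the same row patterns and are therefore more similar to each other than to customer 3.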

