International Journal of Computing Science and Information Technology (IJCSIT)
Unsupervised learning explores unlabeled data to find intrinsic or hidden structure. Duplicate detection identifies records that represent the same real-world entity. In the field of data mining, the amount of available data is growing exponentially. Linking or matching records from multiple web databases is therefore a major challenge, because it requires comparing each record in one database with all the records in the other databases. Supervised learning methods fail in the web database scenario because the records to be matched are query dependent.
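
The cost of comparing every record in one database with every record in another can be illustrated with a minimal sketch. It is not the method proposed in this paper. The record layout (dictionaries of field names to values), the helper names `record_similarity` and `naive_match`, the use of Python's `difflib.SequenceMatcher` as the similarity measure, and the threshold of 0.85 are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from itertools import product


def record_similarity(a, b):
    """Average string similarity over the fields that both records share.

    Hypothetical helper: the similarity measure is an illustrative choice.
    """
    shared = [key for key in a if key in b]
    if not shared:
        return 0.0
    scores = [
        SequenceMatcher(None, str(a[key]).lower(), str(b[key]).lower()).ratio()
        for key in shared
    ]
    return sum(scores) / len(scores)


def naive_match(db_a, db_b, threshold=0.85):
    """Compare each record in db_a with each record in db_b.

    Returns the index pairs whose similarity reaches the (assumed) threshold,
    together with the number of comparisons performed, which is always
    len(db_a) * len(db_b).
    """
    matches = []
    comparisons = 0
    for (i, a), (j, b) in product(enumerate(db_a), enumerate(db_b)):
        comparisons += 1
        if record_similarity(a, b) >= threshold:
            matches.append((i, j))
    return matches, comparisons


if __name__ == "__main__":
    # Made-up example records, used only to exercise the sketch.
    db_a = [
        {"title": "Data Mining Concepts", "author": "J. Han"},
        {"title": "Pattern Recognition", "author": "C. Bishop"},
    ]
    db_b = [
        {"title": "Data Mining Concepts", "author": "J. Han"},
        {"title": "Operating Systems", "author": "A. Tanenbaum"},
        {"title": "Computer Networks", "author": "A. Tanenbaum"},
    ]
    matches, comparisons = naive_match(db_a, db_b)
    print(matches, comparisons)
```

With n records in one database and m in the other, this sketch always performs n × m comparisons, which is the complexity the abstract refers to. Note also that the threshold is fixed by hand here, whereas a supervised method would learn it from labeled pairs, which are not available in advance when the records are produced by a query.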