An Unsupervised Approach for Mining Multiple Web Databases
Well-trained record matching methods such as SVM, OSVM, PEBL, and Christen offer better performance when mining and filtering duplicate query results from multiple Web databases, but they require large training data sets for pre-learning. Unsupervised Duplicate Detection (UDD), a query-dependent record matching method that requires no pre-training, was developed earlier. Because non-duplicate records from the same source can be used as training examples, UDD uses, for a given query, two cooperating classifiers, a weighted component similarity summing classifier and an SVM classifier, which iteratively identify duplicates in the query results from multiple Web databases.
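To make the idea of weighted component similarity summing concrete, the following minimal sketch scores record pairs from two sources by a weighted sum of per-field similarities. It is an illustration under stated assumptions, not the UDD implementation: the function names, the field weights, the threshold, the sample records, and the use of `difflib.SequenceMatcher` as the field similarity measure are all hypothetical choices. The SVM classifier and the iterative cooperation between the two classifiers are omitted.

```python
from difflib import SequenceMatcher


def field_similarity(a: str, b: str) -> float:
    """Similarity of two field values in [0, 1] (stand-in measure, an assumption)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def weighted_similarity(r1, r2, weights):
    """Weighted sum of per-field similarities for two records with aligned fields."""
    return sum(w * field_similarity(x, y) for w, x, y in zip(weights, r1, r2))


def find_duplicates(source_a, source_b, weights, threshold):
    """Return index pairs (i, j) whose weighted similarity reaches the threshold."""
    return [
        (i, j)
        for i, ra in enumerate(source_a)
        for j, rb in enumerate(source_b)
        if weighted_similarity(ra, rb, weights) >= threshold
    ]


# Hypothetical query results from two Web databases: (title, author).
source_a = [("Data Mining Concepts", "Han"), ("Pattern Recognition", "Bishop")]
source_b = [("Data Mining: Concepts", "J. Han"), ("Learning Python", "Lutz")]

# Illustrative values only; here the weights are fixed by hand.
weights = [0.7, 0.3]
threshold = 0.8

print(find_duplicates(source_a, source_b, weights, threshold))
```

In this sketch the weights are constants, whereas the abstract describes a query-dependent method, so a fuller version would derive them from the records returned for each query and pass the resulting pairs to the SVM classifier for further rounds.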