International Journal Of Engineering And Computer Science
In data mining, duplicate detection is an important step in data integration, and most state-of-the-art methods are supervised. Existing record matching methods include record linkage techniques, SVM, OSVM, PEBL, and Christen's method. In the web database scenario, the records to match are highly query-dependent, so a pretrained approach is not applicable: the set of records in each query's results is a biased subset of the full data set. Even if such an approach were applicable, the field weights should probably change with each new query, depending on the results returned, which makes supervised-learning-based methods even less applicable.
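The point about query-dependent field weights can be illustrated with a minimal Python sketch. It is not the method of any of the techniques named above. The function names (`field_similarity`, `query_dependent_weights`, `record_similarity`) and the weighting heuristic (weight a field by how many distinct values it takes in the current result set) are assumptions made purely for illustration.

```python
from difflib import SequenceMatcher


def field_similarity(a: str, b: str) -> float:
    """String similarity in [0, 1] between two field values (case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def query_dependent_weights(records, fields):
    """Hypothetical heuristic: a field that takes the same value in every
    returned record cannot separate duplicates from non-duplicates, so it
    gets weight 0. Other fields are weighted by their number of distinct
    values minus one, normalized so the weights sum to 1."""
    raw = {f: len({r[f].lower() for r in records}) - 1 for f in fields}
    total = sum(raw.values())
    if total == 0:
        # Every field is constant in this result set: fall back to uniform weights.
        return {f: 1.0 / len(fields) for f in fields}
    return {f: raw[f] / total for f in fields}


def record_similarity(r1, r2, weights):
    """Weighted sum of per-field similarities."""
    return sum(w * field_similarity(r1[f], r2[f]) for f, w in weights.items())


# Illustrative results for a query on a single author: the author field is
# identical in every record, so only the title carries information here.
results = [
    {"title": "The Hobbit", "author": "J. R. R. Tolkien"},
    {"title": "The Silmarillion", "author": "J. R. R. Tolkien"},
    {"title": "The Hobbit", "author": "J. R. R. Tolkien"},
]
weights = query_dependent_weights(results, ["title", "author"])
```

For this result set the author field receives weight 0 and the title field weight 1. Weights learned in advance on the full data set would likely still credit the author field, which would inflate the similarity of every pair of records returned by this query.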