A Technique for Data Deduplication using Q-Gram Concept with Support Vector Machine

Date Added: Jan 2013
Format: PDF

Several systems that rely on consistent data to offer high-quality services, such as digital libraries and e-commerce brokers, may be affected by the existence of duplicates, quasi-replicas, or near-duplicate entries in their repositories. Because of that, private and government organizations have invested significantly in developing methods for removing replicas from their data repositories. In this paper, the authors propose a data deduplication technique that uses the q-gram concept with a support vector machine. In the previous work, duplicate record detection was done using three different similarity measures and a neural network: the authors generated a feature vector based on the similarity measures, and the neural network was then used to find the duplicate records.
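
As a rough illustration of the general idea, the sketch below builds a per-field similarity feature vector for a pair of records using q-grams. It is not the authors' method: the abstract does not specify how q-grams are formed or compared, so the `#` padding, the Jaccard overlap, and the names `qgrams`, `qgram_similarity`, and `feature_vector` are all assumptions made for this example.

```python
def qgrams(text, q=2):
    """Return the set of overlapping q-character substrings of text.

    Assumption: the text is lowercased and padded with '#' on both
    sides so that leading and trailing characters also form q-grams.
    """
    padded = "#" * (q - 1) + text.lower() + "#" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def qgram_similarity(a, b, q=2):
    """Jaccard overlap of the q-gram sets of two strings (0.0 to 1.0)."""
    grams_a, grams_b = qgrams(a, q), qgrams(b, q)
    if not grams_a and not grams_b:
        return 1.0  # two strings with no q-grams are treated as identical
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def feature_vector(rec_a, rec_b, fields, q=2):
    """One similarity score per field for a candidate pair of records."""
    return [qgram_similarity(rec_a[f], rec_b[f], q) for f in fields]


# Hypothetical records, invented for illustration only.
r1 = {"name": "John Smith", "city": "New York"}
r2 = {"name": "Jon Smith", "city": "New York"}
vector = feature_vector(r1, r2, ["name", "city"])
```

Vectors like `vector`, labelled as duplicate or non-duplicate, could then be used to train a classifier such as a support vector machine (for example scikit-learn's `sklearn.svm.SVC`), which would predict whether an unseen pair of records is a duplicate.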