A Technique for Data Deduplication using Q-Gram Concept with Support Vector Machine
Several systems that rely on consistent data to offer high-quality services, such as digital libraries and e-commerce brokers, may be affected by the existence of duplicates, quasi-replicas, or near-duplicate entries in their repositories. Because of that, private and government organizations have invested significantly in developing methods for removing replicas from their data repositories. Accordingly, this paper proposes a technique for data deduplication that uses the q-gram concept with a support vector machine. In the previous work, duplicate record detection was done using three different similarity measures and a neural network: feature vectors were generated from the similarity measures, and the neural network was then used to find the duplicate records.
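To make the idea of a similarity-based feature vector concrete, the following is a minimal sketch of how a q-gram comparison between two records might be computed. The abstract does not specify the exact similarity measure, so the Jaccard overlap of q-gram sets, the `#` padding character, and the function names `qgrams`, `qgram_similarity`, and `pair_features` are all assumptions made for illustration, not the authors' implementation.

```python
def qgrams(text, q=2):
    """Return the set of overlapping q-character substrings of a string.

    Assumption: the string is lowercased and padded with '#' so that
    leading and trailing characters also contribute q-grams.
    """
    padded = "#" * (q - 1) + text.lower() + "#" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def qgram_similarity(a, b, q=2):
    """Jaccard overlap of the q-gram sets of a and b, a value in [0, 1]."""
    ga, gb = qgrams(a, q), qgrams(b, q)
    if not ga and not gb:
        return 1.0  # two empty strings are treated as identical
    return len(ga & gb) / len(ga | gb)


def pair_features(rec_a, rec_b, fields, q=2):
    """Build one feature vector for a record pair: one similarity per field."""
    return [qgram_similarity(rec_a[f], rec_b[f], q) for f in fields]


# Hypothetical records used only to illustrate the call.
r1 = {"name": "Jon Smith", "city": "Boston"}
r2 = {"name": "John Smith", "city": "Boston"}
features = pair_features(r1, r2, ["name", "city"])
```

Each record pair yields one such vector, and a classifier such as a support vector machine could then be trained on labeled vectors to separate duplicate pairs from non-duplicate pairs.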