Data Management

Identifying Nearly Duplicate Records In Relational Database

Date Added: Jun 2012
Format: PDF

Entity resolution is an important process for many database-based applications. Accurately identifying duplicate records across multiple data sources is a persistent problem and a major challenge for organizations and researchers. The aim of this process is to detect approximately duplicate records that refer to the same real-world entity, making the database cleaner and achieving higher data quality. Ideally, each record would be compared with every other record in the dataset, which for n records means n(n-1)/2 comparisons. However, the search space for record comparisons can be reduced by using the mutual exclusion property of tuples.
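The abstract does not define the mutual exclusion property, so the sketch below rests on an interpretation of our own. It assumes that two tuples whose values on some attribute are mutually exclusive cannot describe the same entity, so comparisons are needed only within groups that share that attribute's value. The attribute name `gender`, the sample records, the use of Python's `difflib.SequenceMatcher` as the similarity measure, and the 0.9 threshold are all hypothetical choices for illustration. None of them is the paper's method.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Normalized string similarity in [0, 1] (difflib's ratio), case-insensitive."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def naive_pairs(records):
    """Compare every record with every other: n*(n-1)/2 candidate pairs."""
    return list(combinations(range(len(records)), 2))


def partitioned_pairs(records, exclusive_attr):
    """Hypothetical pruning: records that differ on `exclusive_attr` are
    ASSUMED never to be duplicates, so only records sharing a value are paired."""
    groups = defaultdict(list)
    for i, rec in enumerate(records):
        groups[rec[exclusive_attr]].append(i)
    pairs = []
    for members in groups.values():
        pairs.extend(combinations(members, 2))
    return pairs


def find_duplicates(records, pairs, field, threshold=0.9):
    """Flag candidate pairs whose `field` similarity meets the (arbitrary) threshold."""
    return [(i, j) for i, j in pairs
            if similarity(records[i][field], records[j][field]) >= threshold]


# Hypothetical sample data.
records = [
    {"name": "John Smith",  "gender": "M"},
    {"name": "Jon Smith",   "gender": "M"},
    {"name": "Jane Smith",  "gender": "F"},
    {"name": "Mary Jones",  "gender": "F"},
    {"name": "Mary Jones.", "gender": "F"},
]

all_pairs = naive_pairs(records)
pruned = partitioned_pairs(records, "gender")
print(len(all_pairs))                             # 10
print(len(pruned))                                # 4
print(find_duplicates(records, pruned, "name"))   # [(0, 1), (3, 4)]
```

On this toy data both strategies flag the same two pairs, but the partitioned version performs 4 comparisons instead of 10. The saving only holds if the chosen attribute is genuinely reliable: a data-entry error in that attribute would place true duplicates in different groups, where they would never be compared.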