Efficient Record De-Duplication Identifying Using Febrl Framework

Record linkage is the problem of identifying similar records across different data sources. The similarity between two records is defined by domain-specific similarity functions over several attributes. De-duplicating a single data set and linking several data sets are increasingly important tasks in the data preparation steps of many data mining projects. The aim is to match all records that relate to the same entity. Different measures have been used to characterize the quality and complexity of data linkage algorithms, and several new metrics have been proposed.
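To make the idea of attribute-based record similarity concrete, the following is a minimal sketch in Python using only the standard library. It does not use Febrl or reproduce its API. The function names, the attribute weights, and the 0.85 threshold are all hypothetical choices made for illustration, and the string comparison relies on difflib's SequenceMatcher rather than any comparator described in the paper.

```python
from difflib import SequenceMatcher
from itertools import combinations


def field_similarity(a: str, b: str) -> float:
    """Return a similarity in [0, 1] for two attribute values, ignoring case."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def record_similarity(r1: dict, r2: dict, weights: dict) -> float:
    """Weighted average of per-attribute similarities (weights are assumptions)."""
    total = sum(weights.values())
    score = sum(w * field_similarity(r1[f], r2[f]) for f, w in weights.items())
    return score / total


def find_duplicates(records: list, weights: dict, threshold: float = 0.85) -> list:
    """Compare every pair of records and keep those at or above the threshold.

    This is the naive all-pairs approach, quadratic in the number of records.
    """
    matches = []
    for i, j in combinations(range(len(records)), 2):
        score = record_similarity(records[i], records[j], weights)
        if score >= threshold:
            matches.append((i, j, score))
    return matches


if __name__ == "__main__":
    # Hypothetical sample data: the first two records describe the same person.
    people = [
        {"name": "John Smith", "city": "Sydney"},
        {"name": "Jon Smith", "city": "Sydney"},
        {"name": "Mary Jones", "city": "Perth"},
    ]
    for i, j, score in find_duplicates(people, {"name": 2.0, "city": 1.0}):
        print(f"records {i} and {j} look like duplicates (score {score:.2f})")
```

Comparing every pair is only practical for small data sets. Larger linkage tasks usually reduce the number of candidate pairs first, for example by grouping records on a shared key before comparing them.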

Provided by: Iosrjournals Topic: Big Data Date Added: Apr 2013 Format: PDF
