Large Web Search Engine for Cluster filtering Processing

Provided by: The World
Topic: Big Data
Format: PDF
Data cleaning and integration is typically the most expensive step in the knowledge discovery in databases (KDD) process. A key part of it, known as record linkage or de-duplication, is identifying which records in a database refer to the same entity. This problem is traditionally solved separately for each candidate record pair. The authors instead propose a multi-relational approach that performs simultaneous inference for all candidate pairs and allows information to propagate from one candidate match to another via the attributes they have in common.
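The difference between pairwise and simultaneous matching can be illustrated with a small sketch. The abstract does not describe the authors' inference method, so the code below is not their model: it substitutes a simple greedy fixed-point loop, and the names `jaccard`, `ValueClusters`, `link_records`, the sample records, and the 0.65 threshold are all hypothetical. When two records are matched, their attribute values are merged into one cluster, and that merge can raise the score of other candidate pairs that share those attributes.

```python
from itertools import combinations


def jaccard(a, b):
    """Token-level Jaccard similarity between two strings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


class ValueClusters:
    """Union-find over (field, value) pairs believed to be equivalent."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, x, y):
        self.parent[self.find(x)] = self.find(y)


def link_records(records, fields, threshold, propagate):
    """Return the set of index pairs (i, j), i < j, judged to be duplicates.

    propagate=False scores every pair independently (the traditional setup).
    propagate=True merges the attribute values of each accepted match, so
    later pairs sharing those attributes are scored with that knowledge.
    This greedy loop is an illustrative assumption, not the paper's method.
    """
    clusters = ValueClusters()
    matches = set()
    changed = True
    while changed:  # repeat until no new match is found
        changed = False
        for i, j in combinations(range(len(records)), 2):
            if (i, j) in matches:
                continue
            total = 0.0
            for f in fields:
                a, b = records[i][f], records[j][f]
                same = clusters.find((f, a)) == clusters.find((f, b))
                total += 1.0 if same else jaccard(a, b)
            if total / len(fields) >= threshold:
                matches.add((i, j))
                changed = True
                if propagate:
                    for f in fields:
                        clusters.union((f, records[i][f]), (f, records[j][f]))
    return matches


# Hypothetical sample data: two papers, each cited with two venue spellings.
FIELDS = ["author", "title", "venue"]
RECORDS = [
    {"author": "j smith", "title": "collective record linkage", "venue": "KDD"},
    {"author": "j smith", "title": "collective record linkage",
     "venue": "Knowledge Discovery and Data Mining"},
    {"author": "a jones", "title": "fast entity matching methods", "venue": "KDD"},
    {"author": "a jones", "title": "fast entity matching",
     "venue": "Knowledge Discovery and Data Mining"},
]

if __name__ == "__main__":
    print(sorted(link_records(RECORDS, FIELDS, 0.65, propagate=False)))
    print(sorted(link_records(RECORDS, FIELDS, 0.65, propagate=True)))
```

In this toy data, scoring pairs independently links only records 0 and 1. With propagation, that match establishes that the two venue strings name the same venue, which lifts records 2 and 3 over the threshold as well.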