Clean Answers Over Dirty Databases: A Probabilistic Approach

Executive Summary

Detecting duplicate tuples that correspond to the same real-world entity is an important task in data integration and cleaning. Many techniques exist to identify such tuples, but merging or eliminating the duplicates is difficult and often relies on ad hoc, manual solutions. The authors propose a complementary approach that permits declarative query answering over duplicated data, where each duplicate is associated with a probability of being in the clean database.
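
As a rough illustration of querying duplicated data without first merging it, the Python sketch below attaches a probability to each suspected duplicate and sums the probabilities of the tuples that satisfy a selection. It is not the authors' algorithm: the `customers` data, the `cluster_id` grouping, the function names, and the assumptions that probabilities within a cluster sum to 1 and that clusters are independent are all hypothetical choices made for this example.

```python
from collections import defaultdict

# Hypothetical dirty relation: (cluster_id, name, city, probability).
# Tuples sharing a cluster_id are suspected duplicates of one entity.
# Assumption: probabilities within a cluster sum to 1, and exactly one
# tuple per cluster belongs to the clean database.
customers = [
    ("c1", "John Smith", "Toronto", 0.7),
    ("c1", "J. Smith", "Ottawa", 0.3),
    ("c2", "Mary Jones", "Toronto", 0.4),
    ("c2", "Mary Jones", "Boston", 0.6),
]


def answer_probabilities(rows, predicate):
    """Per cluster, the probability that its clean tuple satisfies predicate."""
    result = defaultdict(float)
    for cluster_id, name, city, prob in rows:
        if predicate(name, city):
            result[cluster_id] += prob
    return dict(result)


def probability_any(per_cluster):
    """Probability that at least one cluster satisfies the predicate,
    assuming clusters are independent of each other."""
    none_match = 1.0
    for prob in per_cluster.values():
        none_match *= 1.0 - prob
    return 1.0 - none_match


in_toronto = answer_probabilities(customers, lambda name, city: city == "Toronto")
any_in_toronto = probability_any(in_toronto)
```

Here `in_toronto` maps each entity to the probability that its clean record lists Toronto, and `any_in_toronto` combines those values under the independence assumption. No duplicate is deleted or merged; the uncertainty is carried into the answer instead.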

  • Format: PDF
  • Size: 183.9 KB