Clean Answers Over Dirty Databases: A Probabilistic Approach

Detecting duplicate tuples that correspond to the same real-world entity is an important task in data integration and cleaning. While many techniques exist to identify such tuples, merging or eliminating the duplicates remains difficult and relies on ad hoc, often manual solutions. The authors propose a complementary approach that permits declarative query answering over duplicated data, where each duplicate is associated with a probability of being in the clean database.
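
To make the idea of querying duplicated data concrete, here is a minimal sketch in Python. It is an illustration, not the authors' algorithm. The table, the column names, and the function `clean_answer_probabilities` are hypothetical. The sketch also assumes that duplicates are already grouped into clusters, that exactly one tuple per cluster belongs to the clean database, and that the probabilities within a cluster sum to 1. The abstract states none of these assumptions. Under them, the probability that an entity satisfies a selection predicate is the sum of the probabilities of its duplicates that satisfy it.

```python
from collections import defaultdict

# Hypothetical dirty relation: (cluster_id, name, city, prob).
# Tuples sharing a cluster_id are duplicates of one real-world entity;
# prob is the (assumed) probability that the tuple is the clean one.
customers = [
    ("c1", "John Smith", "Toronto", 0.7),
    ("c1", "J. Smith", "Trento", 0.3),
    ("c2", "Mary Jones", "Toronto", 0.5),
    ("c2", "M. Jones", "Toronto", 0.5),
    ("c3", "Ann Lee", "Rome", 1.0),
]


def clean_answer_probabilities(rows, predicate):
    """Return {cluster_id: probability that the entity satisfies predicate}.

    Assumes the duplicates in a cluster are mutually exclusive, so the
    probabilities of the matching duplicates can simply be added.
    """
    totals = defaultdict(float)
    for cluster_id, name, city, prob in rows:
        if predicate(name, city):
            totals[cluster_id] += prob
    return dict(totals)


# Query: which entities are located in Toronto, and how likely is each?
result = clean_answer_probabilities(
    customers, lambda name, city: city == "Toronto"
)
```

Here entity `c1` is an answer with probability 0.7, because only one of its two duplicates is in Toronto. Entity `c2` is an answer with probability 1.0, because all of its duplicates agree on the city. Entity `c3` does not appear in the result at all. No duplicate has to be merged or deleted to answer the query.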

Provided by: University of Trento
Topic: Data Management
Date Added: Jan 2011
Format: PDF
