Set Similarity Join on Probabilistic Data

Date Added: Aug 2010
Format: PDF

Set similarity join plays an important role in many real-world applications, such as data cleaning, near-duplicate detection, and data integration. In these applications, set data often contain noise and are therefore uncertain and imprecise. In this paper, the authors model such probabilistic set data at two levels of uncertainty: the set level and the element level. Based on these models, they investigate the problem of Probabilistic Set Similarity Join (PS2J) over two probabilistic set databases under possible-worlds semantics.
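
To make the possible-worlds semantics concrete, the following is a minimal brute-force sketch of element-level uncertainty only. It is not the authors' algorithm. Several details are assumptions that the abstract does not state: each element is taken to exist independently with a given probability, Jaccard is used as the similarity measure, and a pair is reported when the probability that its similarity reaches a threshold `gamma` is at least `alpha`. All function and parameter names are hypothetical.

```python
from itertools import product


def possible_worlds(uncertain_set):
    """Yield (world, probability) pairs for an element-level uncertain set.

    `uncertain_set` maps each element to its existence probability.
    Elements are assumed independent (an assumption of this sketch).
    """
    items = list(uncertain_set.items())
    for mask in product([False, True], repeat=len(items)):
        world = frozenset(e for (e, _), keep in zip(items, mask) if keep)
        prob = 1.0
        for (_, p), keep in zip(items, mask):
            prob *= p if keep else 1.0 - p
        yield world, prob


def jaccard(a, b):
    """Jaccard similarity; two empty sets count as identical by convention."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity_probability(r, s, gamma):
    """Total probability of the world pairs in which jaccard(r, s) >= gamma."""
    total = 0.0
    for world_r, prob_r in possible_worlds(r):
        for world_s, prob_s in possible_worlds(s):
            if jaccard(world_r, world_s) >= gamma:
                total += prob_r * prob_s
    return total


def naive_join(R, S, gamma, alpha):
    """Return id pairs whose similarity probability is at least alpha.

    R and S map set ids to uncertain sets. This enumerates every world,
    so it is exponential in set size and only suitable for tiny inputs.
    """
    return [
        (i, j)
        for i, r in R.items()
        for j, s in S.items()
        if similarity_probability(r, s, gamma) >= alpha
    ]


if __name__ == "__main__":
    R = {"r1": {"a": 1.0, "b": 0.5}}
    S = {"s1": {"a": 1.0, "b": 1.0}}
    print(naive_join(R, S, gamma=0.6, alpha=0.4))
```

In this example `r1` has two possible worlds, `{a}` and `{a, b}`, each with probability 0.5, and only the second reaches a Jaccard similarity of 0.6 with `s1`. The exponential enumeration is what makes the problem hard at scale and is used here purely for illustration.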