Ed-Join: An Efficient Algorithm for Similarity Joins With Edit Distance Constraints

Source: VLDB Endowment

Similarity joins have recently attracted considerable interest in the research community. A similarity join is a fundamental operation in many application areas, such as data integration and cleaning, bioinformatics, and pattern recognition. The authors focus on efficient algorithms for similarity joins with edit distance constraints. Existing approaches are mainly based on converting the edit distance constraint into a weaker constraint on the number of matching q-grams between a pair of strings. In this paper, they propose the novel perspective of investigating mismatching q-grams.
Format: PDF Size: 3205.12
Date: Aug 2008
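
The matching q-gram constraint that the abstract attributes to existing approaches can be illustrated with a minimal Python sketch. It relies on the standard count-filtering bound: a single edit operation can destroy at most q q-grams, so two strings within edit distance tau must share at least max(|s|, |t|) - q + 1 - q * tau q-grams. This is only an illustration of that weaker constraint, not the Ed-Join algorithm or its mismatch-based filtering; all function names here are hypothetical, and the nested-loop join stands in for the indexing a real system would use.

```python
from collections import Counter


def qgrams(s, q):
    """Return the multiset of q-grams (length-q substrings) of s."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def edit_distance(s, t):
    """Classic dynamic-programming (Levenshtein) edit distance."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # delete cs
                            curr[j - 1] + 1,      # insert ct
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def passes_count_filter(s, t, q, tau):
    """Necessary (not sufficient) condition for edit_distance(s, t) <= tau.

    Each edit operation affects at most q q-grams, so the two strings
    must share at least max(|s|, |t|) - q + 1 - q * tau q-grams.
    """
    shared = sum((qgrams(s, q) & qgrams(t, q)).values())
    required = max(len(s), len(t)) - q + 1 - q * tau
    return shared >= required


def similarity_join(strings, q, tau):
    """Naive self-join: filter candidate pairs by q-gram count, then verify."""
    results = []
    for i in range(len(strings)):
        for j in range(i + 1, len(strings)):
            s, t = strings[i], strings[j]
            if passes_count_filter(s, t, q, tau) and edit_distance(s, t) <= tau:
                results.append((s, t))
    return results


if __name__ == "__main__":
    words = ["similarity", "similarly", "dissimilar", "join"]
    print(similarity_join(words, q=2, tau=2))
```

With q = 2 and tau = 2, the strings "similarity" and "similarly" share 6 bigrams against a required minimum of 5, so the pair survives the filter and is then confirmed by the exact edit distance of 2. Because the filter is only a necessary condition, the verification step cannot be skipped.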