Entity Matching: How Similar Is Similar
Entity matching, which finds records that refer to the same entity, is an important operation in data cleaning and integration. Existing studies usually take a given similarity function to quantify the similarity of records, and focus on devising index structures and algorithms for efficient entity matching. However, defining "how similar is similar" is a major challenge in real applications, since it is hard to automatically select appropriate similarity functions. In this paper, the authors attempt to address this problem. Because there are a large number of similarity functions and, even worse, a threshold can take infinitely many values, it is expensive to find appropriate similarity functions and thresholds.
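To make the problem concrete, the following minimal sketch shows how the match decision for one pair of records depends on both the similarity function and the threshold. It is an illustration only, not the authors' method: the function names (jaccard, edit_similarity, is_match), the two similarity functions chosen, and the sample records are all assumptions introduced here.

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard similarity over lowercase, whitespace-delimited tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def edit_similarity(a: str, b: str) -> float:
    """Normalized edit similarity: 1 - Levenshtein(a, b) / max(len(a), len(b))."""
    if not a and not b:
        return 1.0
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or keep
        prev = cur
    return 1 - prev[-1] / max(len(a), len(b))


def is_match(a: str, b: str, sim, threshold: float) -> bool:
    """Hypothetical matching rule: two records match if sim(a, b) >= threshold."""
    return sim(a, b) >= threshold


# Hypothetical records that plausibly refer to the same person.
r1, r2 = "Jon Smith", "John Smith"

print(jaccard(r1, r2))          # 1 shared token out of 3 distinct tokens
print(edit_similarity(r1, r2))  # 1 edit over a maximum length of 10
print(is_match(r1, r2, edit_similarity, 0.8))   # True
print(is_match(r1, r2, jaccard, 0.8))           # False
print(is_match(r1, r2, edit_similarity, 0.95))  # False
```

With the same pair of records, the character-level function reports a match at threshold 0.8 while the token-level function does not, and raising the threshold to 0.95 rejects the pair under both. Since the threshold ranges over a continuous interval and many similarity functions are available, trying every combination is impractical, which is the difficulty the abstract describes.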