Record Matching for Web Databases by Domain-Specific Query Probing
Record matching, the task of finding entries that refer to the same entity in two or more files, is a vital process in data integration. Most record matching methods are supervised and therefore require the user to provide training data. Such methods are not applicable to the Web database scenario, where query results are generated dynamically, on the fly. To address record matching in this scenario, the authors present an unsupervised, online record matching method, UDD, which effectively identifies duplicates among the query result records of multiple Web databases.
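
To make the setting concrete, the following is a minimal sketch of matching query results from two sources without any training data. It is not the UDD algorithm, whose details are not given in this summary. The token-overlap (Jaccard) similarity, the equal field weights, the 0.8 threshold, and all function and field names are assumptions chosen only for illustration.

```python
import re

def tokens(value):
    """Lowercase a field value and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", value.lower()))

def field_similarity(a, b):
    """Jaccard overlap between the token sets of two field values."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0  # two empty values are treated as identical
    return len(ta & tb) / len(ta | tb)

def record_similarity(r1, r2, fields):
    """Unweighted mean of per-field similarities (equal weights are an assumption)."""
    total = sum(field_similarity(r1.get(f, ""), r2.get(f, "")) for f in fields)
    return total / len(fields)

def find_duplicates(results_a, results_b, fields, threshold=0.8):
    """Return index pairs (i, j) whose similarity meets the hypothetical threshold."""
    return [
        (i, j)
        for i, ra in enumerate(results_a)
        for j, rb in enumerate(results_b)
        if record_similarity(ra, rb, fields) >= threshold
    ]

# Hypothetical query results returned by two different Web databases.
results_a = [
    {"title": "Data Integration Basics", "author": "J. Smith"},
    {"title": "Web Mining", "author": "A. Lee"},
]
results_b = [
    {"title": "Data integration basics", "author": "J. Smith"},
    {"title": "Graph Theory", "author": "B. Chen"},
]

pairs = find_duplicates(results_a, results_b, fields=["title", "author"])
print(pairs)
```

A fixed threshold and equal weights are the simplest possible choices. An unsupervised method would typically have to derive both from the query results themselves, since no labeled pairs are available.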