Data Management

Mining Document Collections to Facilitate Accurate Approximate Entity Matching

Date Added: Aug 2009
Format: PDF

Many entity extraction techniques leverage large reference entity tables to identify entities in documents. Often, an entity is referred to in a document collection differently from how it appears in the reference entity table. The authors therefore study the problem of determining whether a candidate substring approximately matches a reference entity. Similarity measures that exploit the correlation between candidate substrings and reference entities across a large number of documents are known to be more robust than traditional standalone string-based similarity functions.
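
The contrast between a standalone string-based measure and a document-correlation measure can be illustrated with a minimal sketch. It is not the authors' method: the function names, the whitespace tokenization, the toy documents, and the scoring rule (the fraction of documents mentioning the candidate that also contain the entity tokens the candidate leaves out) are all assumptions chosen for illustration.

```python
def tokens(text):
    """Lowercase whitespace tokenization (an assumption; real systems normalize more)."""
    return text.lower().split()


def token_jaccard(candidate, entity):
    """Standalone string-based similarity: Jaccard overlap of the two token sets."""
    a, b = set(tokens(candidate)), set(tokens(entity))
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cooccurrence_score(candidate, entity, documents):
    """Hypothetical correlation measure: among documents that mention the
    candidate, the fraction that also contain every entity token the
    candidate is missing."""
    cand = candidate.lower()
    missing = set(tokens(entity)) - set(tokens(candidate))
    mentioning = [d.lower() for d in documents if cand in d.lower()]
    if not mentioning:
        return 0.0
    supported = sum(1 for d in mentioning if missing <= set(d.split()))
    return supported / len(mentioning)


# Toy data, invented for this example.
ENTITY = "Canon EOS Digital Rebel XT"
CANDIDATE = "Rebel XT"
DOCUMENTS = [
    "the canon eos 350d is also sold as the digital rebel xt",
    "review of the digital rebel xt by canon with eos lens mount",
    "my rebel xt takes great photos",
]

if __name__ == "__main__":
    print("token Jaccard:", token_jaccard(CANDIDATE, ENTITY))
    print("co-occurrence:", cooccurrence_score(CANDIDATE, ENTITY, DOCUMENTS))
```

On this toy data the candidate shares only two of the entity's five tokens, so the string-based score is low, while two of the three documents that mention the candidate also supply the missing tokens, which raises the correlation-based score.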