Entity Resolution: Theory, Practice & Open Challenges
Entity Resolution (ER), the problem of extracting, matching and resolving entity mentions in structured and unstructured data, is a long-standing challenge in database management, information retrieval, machine learning, natural language processing and statistics. Ironically, different sub-disciplines refer to it by a variety of names, including record linkage, deduplication, co-reference resolution, reference reconciliation, object consolidation, identity uncertainty and database hardening. This paper brings together perspectives on ER from a variety of fields, including databases, machine learning, natural language processing and information retrieval, to provide, in one setting, a survey of a large body of work. The authors discuss both the practical aspects and theoretical underpinnings of ER. They describe existing solutions, current challenges, and open research problems.