Discovering Linkage Points over Web Data
A basic step in data integration is the identification of linkage points, i.e., attributes that are shared (or related) between data sources and that can be used to match records or entities across those sources. This is usually performed using a match operator that associates the attributes of one database with those of another. However, the massive growth in the amount and variety of unstructured and semi-structured data on the web has created new challenges for this task. Such data sources often lack a fixed, predefined schema and contain large numbers of diverse attributes.
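To make the notion of a linkage point concrete, the following is a minimal sketch, not the method of this paper. It assumes that each source is a list of schema-less records (Python dictionaries) and that a pair of attributes is a candidate linkage point when the Jaccard similarity of their normalized value sets reaches a threshold. The function names, the threshold, and the toy records are all hypothetical.

```python
from itertools import product


def attribute_values(records):
    """Map each attribute name to the set of normalized values seen under it."""
    values = {}
    for record in records:
        for attr, value in record.items():
            values.setdefault(attr, set()).add(str(value).strip().lower())
    return values


def candidate_linkage_points(source_a, source_b, threshold=0.5):
    """Return (attr_a, attr_b, score) triples whose value sets overlap enough.

    The score is the Jaccard similarity of the two value sets; this choice of
    measure is an assumption made for illustration.
    """
    values_a = attribute_values(source_a)
    values_b = attribute_values(source_b)
    candidates = []
    for (attr_a, set_a), (attr_b, set_b) in product(values_a.items(), values_b.items()):
        union = set_a | set_b
        score = len(set_a & set_b) / len(union) if union else 0.0
        if score >= threshold:
            candidates.append((attr_a, attr_b, score))
    # Highest-scoring attribute pairs first.
    return sorted(candidates, key=lambda c: c[2], reverse=True)


# Made-up records; neither source has a fixed schema.
companies = [
    {"name": "Acme Corp", "ticker": "ACM"},
    {"name": "Globex", "ticker": "GBX", "hq": "Springfield"},
]
filings = [
    {"symbol": "acm", "year": 2012},
    {"symbol": "GBX", "year": 2012, "filer": "Globex"},
]

links = candidate_linkage_points(companies, filings)
```

On this toy input, the pair (ticker, symbol) scores highest because the two attributes share all of their normalized values, even though their names differ. Comparing every attribute of one source with every attribute of the other grows quadratically with the number of attributes, which hints at why sources with many diverse attributes make the task harder.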