Integrating and Querying Web Databases and Documents
The Internet hosts many interrelated information sources, which fall into two broad categories: structured sources (databases) and semi-structured sources (documents). A key challenge is to integrate, query, and analyze such heterogeneous collections of information. In this paper, the authors argue for building web metadata repositories in which relational databases serve as both the main source of structured data and the central data management technology, with the structured data enriched by the semi-structured data surrounding it. Their proposal rests on the assumption that heterogeneous relational databases can be integrated (i.e., that entity resolution works well) and can therefore serve as references for external data.