A Bayesian Approach to Discovering Truth from Conflicting Sources for Data Integration
In practical data integration systems, it is common for the data sources being integrated to provide conflicting information about the same entity. Consequently, a major challenge for data integration is to derive the most complete and accurate integrated records from diverse and sometimes conflicting sources. The authors term this challenge the truth finding problem. They observe that some sources are generally more reliable than others, and therefore a good model of source quality is the key to solving the truth finding problem. In this paper, they propose a probabilistic graphical model that can automatically infer true records and source quality without any supervision.
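To make the interplay between source quality and truth concrete, the following is a minimal sketch of unsupervised truth finding. It is not the authors' probabilistic graphical model: it swaps in a much simpler iterative, accuracy-weighted voting scheme. The function name `find_truth`, the `(source, entity, value)` claim format, the prior accuracy of 0.8, the add-one smoothing, and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict


def find_truth(claims, n_iter=10, prior_accuracy=0.8):
    """Illustrative only: alternate between voting on truths and
    re-estimating source accuracy. `claims` holds hypothetical
    (source, entity, value) triples."""
    claims = list(claims)
    # Every source starts with the same assumed accuracy.
    accuracy = {s: prior_accuracy for s, _, _ in claims}
    truth = {}
    for _ in range(n_iter):
        # Step 1: each value is scored by the summed accuracy of the
        # sources that claim it; the top-scoring value becomes the truth.
        votes = defaultdict(lambda: defaultdict(float))
        for s, e, v in claims:
            votes[e][v] += accuracy[s]
        truth = {e: max(vals, key=vals.get) for e, vals in votes.items()}

        # Step 2: a source's accuracy is the smoothed fraction of its
        # claims that agree with the current truth estimates.
        hits, total = defaultdict(int), defaultdict(int)
        for s, e, v in claims:
            total[s] += 1
            hits[s] += truth[e] == v
        accuracy = {s: (hits[s] + 1) / (total[s] + 2) for s in total}
    return truth, accuracy


# Made-up data: sources A and B agree on two books, C disagrees with both,
# and only A and C make a (conflicting) claim about book3.
claims = [
    ("A", "book1", "Smith"), ("B", "book1", "Smith"), ("C", "book1", "Smyth"),
    ("A", "book2", "Jones"), ("B", "book2", "Jones"), ("C", "book2", "Jonas"),
    ("C", "book3", "Li"), ("A", "book3", "Lee"),
]
truth, accuracy = find_truth(claims)
print(truth)
print(accuracy)
```

In this toy data, a plain majority vote cannot decide book3 because exactly one source backs each value. Once the estimated accuracy of C drops after it disagrees with A and B elsewhere, the weighted vote favors the value given by A. A full Bayesian treatment, as proposed in the paper, would replace both steps with posterior inference over the truth and quality variables.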