Improving the Quality of Linked Data Using Statistical Distributions
Linked data on the Web is created from structured data sources (such as relational databases), semi-structured sources (such as Wikipedia), or unstructured sources (such as text). In the latter two cases, the generated linked data is likely to be noisy and incomplete. In this paper, the authors present two algorithms that exploit statistical distributions of properties and types to enhance the quality of incomplete and noisy linked data sets: SDType adds missing type statements, and SDValidate identifies faulty statements. Neither algorithm uses external knowledge; both operate only on the data itself.
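To make the idea of inferring types from property distributions concrete, the following is a minimal sketch and not the authors' implementation. It assumes triples are plain (subject, property, object) tuples and that known types are held in a dictionary. The function names, the uniform averaging over properties, and the 0.5 threshold are all illustrative assumptions; the abstract does not specify how SDType weights its evidence.

```python
from collections import Counter, defaultdict


def property_type_distributions(triples, types):
    """Estimate, per property, the share of typed subjects that have each type.

    Hypothetical helper: `triples` is an iterable of (subject, property, object)
    tuples and `types` maps a resource to the set of its known types.
    """
    subjects_by_prop = defaultdict(set)
    for s, p, _ in triples:
        subjects_by_prop[p].add(s)

    dist = {}
    for p, subjects in subjects_by_prop.items():
        typed = [s for s in subjects if s in types]
        if not typed:
            continue  # no typed evidence for this property
        counts = Counter(t for s in typed for t in types[s])
        dist[p] = {t: c / len(typed) for t, c in counts.items()}
    return dist


def predict_types(resource, triples, dist, threshold=0.5):
    """Score candidate types for `resource` by averaging the per-property
    distributions of the properties it uses (uniform weights are an assumption)."""
    props = {p for s, p, _ in triples if s == resource and p in dist}
    if not props:
        return {}
    scores = defaultdict(float)
    for p in props:
        for t, prob in dist[p].items():
            scores[t] += prob / len(props)
    return {t: score for t, score in scores.items() if score >= threshold}


# Toy data, invented for illustration only.
triples = [
    ("Berlin", "population", "3.6M"),
    ("Berlin", "country", "Germany"),
    ("Paris", "population", "2.1M"),
    ("Paris", "country", "France"),
    ("Einstein", "birthPlace", "Ulm"),
    ("Einstein", "country", "Germany"),
    ("Hamburg", "population", "1.8M"),
    ("Hamburg", "country", "Germany"),
]
types = {"Berlin": {"City"}, "Paris": {"City"}, "Einstein": {"Person"}}

dist = property_type_distributions(triples, types)
predicted = predict_types("Hamburg", triples, dist)
```

In this toy data set the untyped resource Hamburg uses the properties population and country, whose typed subjects are mostly cities, so City is the only candidate that clears the threshold. A similar distribution could in principle be used in the opposite direction, flagging statements whose types are rare for a given property, which is the kind of check the abstract attributes to SDValidate.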