Of Sampling and Smoothing: Approximating Distributions over Linked Open Data

Knowledge about the distribution of data provides the basis for various tasks in the context of Linked Open Data (LOD), e.g. estimating the result set size of a query, performing statistical schema induction, or using information-theoretic metrics to detect patterns. This paper investigates the potential of obtaining estimates for such distributions from samples of linked data. To this end, it considers three sampling methods applicable to public RDF data on the web, as well as smoothing techniques that overcome the problem of unseen events in the sample space of a distribution.
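To make the "unseen events" problem concrete, here is a minimal sketch of estimating a predicate distribution from a sample of RDF triples. The abstract does not name the smoothing techniques the paper evaluates, so this uses additive (Laplace) smoothing purely as an illustrative stand-in. The function name `additive_smoothing`, the representation of triples as `(subject, predicate, object)` string tuples, the `alpha` parameter, and the assumption that the full set of predicates (the `vocabulary`) is known in advance are all hypothetical choices for this example.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical representation: a triple is a (subject, predicate, object) tuple of strings.
Triple = Tuple[str, str, str]


def additive_smoothing(
    sample: Iterable[Triple], vocabulary: Iterable[str], alpha: float = 1.0
) -> Dict[str, float]:
    """Estimate P(predicate) from sampled triples using add-alpha smoothing.

    `vocabulary` lists every predicate assumed to occur in the full dataset,
    including predicates never observed in the sample (an assumption).
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive so unseen predicates get mass")
    # Count how often each predicate appears in the sample.
    counts = Counter(p for _, p, _ in sample)
    # The event space is the known vocabulary plus anything seen in the sample.
    vocab = set(vocabulary) | set(counts)
    total = sum(counts.values())
    denom = total + alpha * len(vocab)
    # Every predicate, seen or unseen, receives (count + alpha) / denom.
    return {p: (counts[p] + alpha) / denom for p in vocab}


sample = [
    ("ex:alice", "foaf:name", '"Alice"'),
    ("ex:bob", "foaf:name", '"Bob"'),
    ("ex:alice", "rdf:type", "foaf:Person"),
    ("ex:bob", "rdf:type", "foaf:Person"),
]
# foaf:knows exists in the (assumed) full dataset but is absent from the sample.
probs = additive_smoothing(sample, ["foaf:name", "rdf:type", "foaf:knows"])
# foaf:name and rdf:type each get 3/7; the unseen foaf:knows gets 1/7 instead of 0.
```

A plain maximum-likelihood estimate would assign `foaf:knows` a probability of zero, which breaks downstream uses such as log-based information-theoretic measures; the smoothed estimate reserves a small share of probability mass for such unseen events.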

Provided by: University of Koblenz-Landau Topic: Data Management Date Added: Apr 2014 Format: PDF
