Universitat Oberta de Catalunya
Dealing with sparsity is still an open question in data mining. As soon as the dimension of the sample space becomes high, unseen events and rare configurations in the sample contribute a great amount of uncertainty. Existing methodologies offer partial solutions, often based on assumptions about prior distributions that are in fact unknown. In this paper, the authors present an assumption-free approach. They define a statistic that has a clear interpretation as a measure of certainty, and they build a plausible hypothesis that offers comprehensible insight into knowledge, with a consistent algebraic structure and a consistent set of properties, yielding a native value of uncertainty for unseen events.