Date Added: Dec 2010
This paper deals with data mining in uncertain XML data models, where the uncertainty typically comes from imprecise automatic processes. The authors first review the literature on modeling uncertain data, starting with well-studied relational models and then moving to their semi-structured counterparts. They focus on a specific probabilistic XML model that can represent arbitrary finite distributions of XML documents and has been extended to allow continuous distributions of data values. They summarize previous work on querying this uncertain data model and show how to apply the corresponding techniques to several data mining tasks, illustrated through use cases on two running examples.