A Survey on Uncertain Data & Its Clustering
A growing volume of uncertain data arises from applications such as sensor network measurements and record linkage, and from the output of mining algorithms. Databases holding such data are more complex than conventional ones because they must also represent the probabilistic information. This uncertainty is typically formalized as probability density functions (pdfs) over tuple values: a data object is represented by an uncertainty region over which a pdf is defined. Beyond storing and processing such data in a DBMS, it is necessary to support other data analysis tasks such as data mining. When data mining techniques are applied to these data, their uncertainty has to be taken into account to obtain high-quality results.
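As a rough illustration of the representation described above, the following sketch models an uncertain object as an axis-aligned rectangular region with a density over it. It then computes the expected squared distance to a fixed point, a quantity often used when clustering such objects. The class name `UncertainObject`, the function `expected_sq_distance`, the rectangular region, and the midpoint-grid discretization are all assumptions made for this example and are not taken from any particular system.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Point = Tuple[float, float]


@dataclass
class UncertainObject:
    """Hypothetical 2-D uncertain object: a bounding box plus a density on it."""
    lo: Point                        # lower-left corner of the uncertainty region
    hi: Point                        # upper-right corner of the uncertainty region
    pdf: Callable[[Point], float]    # density inside the region (may be unnormalized)

    def samples(self, n: int = 20) -> List[Tuple[Point, float]]:
        """Discretize the region into n*n cell midpoints with weights summing to 1."""
        dx = (self.hi[0] - self.lo[0]) / n
        dy = (self.hi[1] - self.lo[1]) / n
        pts = [(self.lo[0] + (i + 0.5) * dx, self.lo[1] + (j + 0.5) * dy)
               for i in range(n) for j in range(n)]
        weights = [self.pdf(p) for p in pts]
        total = sum(weights)
        return [(p, w / total) for p, w in zip(pts, weights)]


def expected_sq_distance(obj: UncertainObject, c: Point, n: int = 20) -> float:
    """Approximate E[||X - c||^2] for X distributed according to obj.pdf."""
    return sum(w * ((p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2)
               for p, w in obj.samples(n))


# Uniform density over the square [0, 2] x [0, 2], measured from its centre.
obj = UncertainObject(lo=(0.0, 0.0), hi=(2.0, 2.0), pdf=lambda p: 1.0)
d = expected_sq_distance(obj, (1.0, 1.0))  # close to the analytic value 2/3
```

A grid discretization keeps the sketch deterministic; for higher-dimensional regions or irregular shapes, random sampling from the pdf would be a more practical choice.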