Date Added: Mar 2012
In this paper, the authors present a semantic annotation framework that is capable of extracting appropriate information from unstructured, ungrammatical and unintelligible data sources. The framework, named BNOSA, uses ontology to conceptualize a problem domain and to extract data from the given corpora and Bayesian networks to resolve conflicts and to predict missing data. The framework is extensible as it is capable of dynamically extracting data from any problem domain given a pre-defined ontology and a corresponding Bayesian network. Experiments have been conducted to analyze the performance of BNOSA on several problem domains. The sets of corpora used in the experiments belong to selling - purchasing websites where product information is entered by ordinary web users in a structure-free format.