International Journal of Computer Applications
Information on the WWW is available in various formats. RDF and XML representations provide semantic knowledge about a document, whereas HTML markup indicates only the structure and layout of a document, not its semantics. Converting HTML documents to a semantic representation makes it possible to extract knowledge from them more efficiently. This paper proposes a technique that gives HTML documents a semantic structure and stores that structure in a knowledge base as predicates, which helps in retrieving context-related documents.
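To make the idea of storing HTML structure as predicates concrete, the following is a minimal illustrative sketch and not the technique proposed in the paper. It assumes a hypothetical mapping from a few HTML tags to predicate names (has_title, has_heading, links_to), represents each fact as a (predicate, document, value) tuple, and uses a plain Python list as a stand-in for the knowledge base. The names PredicateExtractor, extract_predicates, and query are introduced here for illustration only.

```python
from html.parser import HTMLParser


class PredicateExtractor(HTMLParser):
    """Collects (predicate, document, value) facts from a few HTML tags."""

    # Hypothetical tag-to-predicate mapping (an assumption, not from the paper).
    TAG_TO_PREDICATE = {"title": "has_title", "h1": "has_heading", "h2": "has_heading"}

    def __init__(self, doc_id):
        super().__init__()
        self.doc_id = doc_id
        self.facts = []        # stand-in for the knowledge base
        self._current = None   # predicate of the element currently being read
        self._buffer = []      # text fragments collected inside that element

    def handle_starttag(self, tag, attrs):
        if tag in self.TAG_TO_PREDICATE:
            self._current = self.TAG_TO_PREDICATE[tag]
            self._buffer = []
        elif tag == "a":
            # Record hyperlinks as a relation between this document and its target.
            href = dict(attrs).get("href")
            if href:
                self.facts.append(("links_to", self.doc_id, href))

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag in self.TAG_TO_PREDICATE and self._current is not None:
            # Join the fragments and normalize whitespace before storing the fact.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.facts.append((self._current, self.doc_id, text))
            self._current = None


def extract_predicates(doc_id, html):
    """Parse one HTML document and return its facts."""
    parser = PredicateExtractor(doc_id)
    parser.feed(html)
    parser.close()
    return parser.facts


def query(facts, predicate, keyword):
    """Return ids of documents whose `predicate` value mentions `keyword`."""
    keyword = keyword.lower()
    return sorted({doc for pred, doc, value in facts
                   if pred == predicate and keyword in value.lower()})
```

With facts of this shape, retrieving context-related documents reduces to matching on predicates rather than scanning raw markup; for example, query(facts, "has_heading", "rdf") returns the ids of documents that have a heading mentioning RDF.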