Semantic-Sensitive Web Information Retrieval Model for HTML Documents
Source: Cornell University
With the advent of the Internet, a new era of digital information exchange has begun. Currently, the Internet encompasses more than five billion online sites and this number is exponentially increasing every day. Fundamentally, Information Retrieval (IR) is the science and practice of storing documents and retrieving information from within these documents. Mathematically, IR systems are at the core based on a feature vector model coupled with a term weighting scheme that weights terms in a document according to their significance with respect to the context in which they appear. Practically, Vector Space Model (VSM), Term Frequency (TF), and Inverse Term Frequency (IDF) are among other long-established techniques employed in mainstream IR systems.