International Journal of Computer and Information Technology (IJCIT)
Relevant information can be retrieved from the web quickly if logically similar web pages are grouped together. Clustering web pages makes an entire group of related pages available to the user at once, thereby increasing the efficiency of web browsing. The quality of the clustering, however, depends largely on how accurately the similarity among the pages is computed. In this paper, the authors propose a new weighted keyword-based similarity measure for determining how alike two pages are. They represent each page as a vector of extracted keywords, which is then converted into a weighted vector by considering both the frequency and the position of each keyword in the page.
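The weighting idea described above can be illustrated with a minimal sketch. This is not the authors' actual formulation, which the abstract does not specify. The position labels (`title`, `heading`, `body`), their numeric weights, the small stopword list, and the use of cosine similarity to compare the weighted vectors are all assumptions made purely for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical position weights: the abstract does not give the actual values.
POSITION_WEIGHTS = {"title": 3.0, "heading": 2.0, "body": 1.0}

# Tiny illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "is", "on"}


def extract_keywords(text):
    """Lowercase the text, split it into alphanumeric tokens, drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def weighted_vector(page):
    """Map each keyword to a weight that reflects frequency and position.

    `page` maps a position label (e.g. "title") to the text found there.
    Every occurrence of a keyword adds the weight of its position, so a
    keyword that is frequent or prominently placed gets a larger weight.
    """
    vector = Counter()
    for position, text in page.items():
        weight = POSITION_WEIGHTS.get(position, 1.0)
        for keyword in extract_keywords(text):
            vector[keyword] += weight
    return dict(vector)


def cosine_similarity(u, v):
    """Cosine similarity of two sparse keyword vectors (assumed measure)."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)
```

Under these assumptions, a page is passed in as a dictionary such as `{"title": "Python web clustering", "body": "clustering of web pages"}`, and two pages are compared with `cosine_similarity(weighted_vector(a), weighted_vector(b))`. Pages that share prominently placed keywords score closer to 1, and pages with no keywords in common score 0.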