Web Page Clustering Using Latent Semantic Analysis
Web mining techniques such as clustering help organize web content into subject-based categories so that pages can be searched and retrieved efficiently. Traditional web page clustering typically uses only the page content (usually the page text), represented as a feature vector such as a bag of words or term frequency-inverse document frequency (TF-IDF) weights, and then applies a standard clustering algorithm (e.g. k-means, suffix tree clustering, or query directed clustering). For example, users can provide captions for images on the internet, or tags for the web pages and other media content they regularly browse.
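The traditional content-only pipeline described above (page text turned into TF-IDF vectors, then clustered with k-means) can be illustrated with a minimal sketch. It is not the method of this paper: the function names `tfidf_vectors` and `kmeans`, the smoothed IDF formula, the choice of the first k pages as initial centroids, and the four toy pages are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn whitespace-tokenised documents into L2-normalised TF-IDF vectors."""
    tokenised = [doc.lower().split() for doc in docs]
    vocab = sorted({term for tokens in tokenised for term in tokens})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        tf = Counter(tokens)
        # Smoothed IDF (an assumed variant): frequent terms keep a small weight.
        vec = [tf[t] * (math.log((1 + n_docs) / (1 + df[t])) + 1) for t in vocab]
        norm = math.sqrt(sum(w * w for w in vec)) or 1.0
        vectors.append([w / norm for w in vec])
    return vocab, vectors


def kmeans(vectors, k, n_iter=20):
    """Plain Lloyd's k-means; the first k vectors seed the centroids."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical page texts standing in for extracted web page content.
pages = [
    "football match goal team",
    "python code compiler bug",
    "team goal league football",
    "compiler code python release",
]
vocab, vectors = tfidf_vectors(pages)
labels = kmeans(vectors, k=2)
```

On these toy pages, the two football pages receive one cluster label and the two programming pages the other. Seeding the centroids with the first k pages keeps the sketch deterministic; a practical implementation would more likely use random or k-means++ initialisation with several restarts.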