Web Clustering Based On Tag Set Similarity
Tagging is a service that allows users to associate a set of freely chosen tags with web content. Clustering web documents by their tag sets eliminates the time-consuming preprocessing step of word stemming. This paper proposes a novel method that computes the similarity between tag sets and uses it as the distance measure for clustering web documents into groups. The major steps of the method are computing a tag similarity matrix with a set-based vector space model, smoothing the similarity matrix to obtain a set of linearly independent vectors, and computing the tag set similarity from these vectors.
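
The three steps named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not give the formulas, so every concrete choice here is an assumption. Tag similarity is taken to be the cosine between tag occurrence sets, smoothing is done by blending the matrix with the identity (which makes it positive definite, so its rows are linearly independent), a tag set is represented by the sum of its tags' smoothed rows, and documents are grouped by simple single-link thresholding. All function names and the parameters `alpha` and `threshold` are hypothetical.

```python
import math


def tag_similarity_matrix(docs):
    """Assumed step 1: cosine similarity between tags, where each tag is
    represented by the set of documents it occurs in."""
    tags = sorted({t for d in docs for t in d})
    occ = {t: {i for i, d in enumerate(docs) if t in d} for t in tags}
    sim = [[len(occ[a] & occ[b]) / math.sqrt(len(occ[a]) * len(occ[b]))
            for b in tags] for a in tags]
    return tags, sim


def smooth(sim, alpha=0.5):
    """Assumed step 2: S' = alpha * S + (1 - alpha) * I. For alpha < 1 the
    result is positive definite, so its rows are linearly independent."""
    n = len(sim)
    return [[alpha * sim[i][j] + (1 - alpha) * (1.0 if i == j else 0.0)
             for j in range(n)] for i in range(n)]


def tag_set_vector(tag_set, tags, smoothed):
    """Represent a tag set as the sum of the smoothed rows of its tags."""
    index = {t: i for i, t in enumerate(tags)}
    vec = [0.0] * len(tags)
    for t in tag_set:
        for j, value in enumerate(smoothed[index[t]]):
            vec[j] += value
    return vec


def cosine(u, v):
    """Assumed step 3: tag set similarity as the cosine of two set vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster(docs, threshold=0.5, alpha=0.5):
    """Single-link grouping: documents whose tag set similarity reaches the
    threshold end up in the same cluster (union-find over document indices)."""
    tags, sim = tag_similarity_matrix(docs)
    smoothed = smooth(sim, alpha)
    vecs = [tag_set_vector(d, tags, smoothed) for d in docs]
    parent = list(range(len(docs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if cosine(vecs[i], vecs[j]) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(docs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy data: each document is given only by its (already tagged) tag set.
docs = [
    {"python", "code"},
    {"python", "programming"},
    {"recipe", "cooking"},
    {"cooking", "food"},
]
print(cluster(docs))  # [[0, 1], [2, 3]]
```

In this toy example the two programming documents share no tag with the two cooking documents, so their cross similarity is zero, while documents within each group are linked through a shared tag. No stemming is needed at any point because the tags are compared as whole tokens.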