Date Added: Jun 2011
As the number of available Web pages grows, it becomes more difficult for users to find documents relevant to their interests. Clustering is the partitioning of a data set into subsets (clusters), so that the data in each subset (ideally) share some common trait - often proximity according to some defined distance measure. Because queries are short, keyword-based approaches are not suitable for document clustering. This paper describes a new Web document clustering method that makes use of user logs, which identify the documents users have selected for a query. The similarity between two queries may be deduced from the documents that users selected for both of them.
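
The idea of deducing query similarity from commonly selected documents can be illustrated with a minimal sketch. The abstract does not state which similarity measure the paper uses, so the Jaccard coefficient below is an assumption chosen for illustration. The log format (pairs of query and selected document), the function names `build_click_sets` and `query_similarity`, and the sample data are likewise hypothetical.

```python
from collections import defaultdict


def build_click_sets(log):
    """Map each query to the set of documents users selected for it.

    `log` is an iterable of (query, document_id) pairs; this format is
    an assumption, not the paper's actual log layout.
    """
    clicks = defaultdict(set)
    for query, doc in log:
        clicks[query].add(doc)
    return dict(clicks)


def query_similarity(clicks, q1, q2):
    """Jaccard coefficient of the two queries' selected-document sets.

    Jaccard is an assumed measure for illustration only.
    """
    d1 = clicks.get(q1, set())
    d2 = clicks.get(q2, set())
    union = d1 | d2
    if not union:
        # Neither query has any selected documents.
        return 0.0
    return len(d1 & d2) / len(union)


# Hypothetical user log: each entry is (query, selected document).
log = [
    ("jaguar car", "doc_a"),
    ("jaguar car", "doc_b"),
    ("jaguar price", "doc_b"),
    ("jaguar price", "doc_c"),
    ("big cats", "doc_d"),
]

clicks = build_click_sets(log)
print(query_similarity(clicks, "jaguar car", "jaguar price"))
print(query_similarity(clicks, "jaguar car", "big cats"))
```

In this sample, "jaguar car" and "jaguar price" share one selected document out of three distinct ones, while "jaguar car" and "big cats" share none. The two related queries share only one keyword, which shows how selection data can supply evidence that short queries lack.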