Incremental Clustering Crawler for Community-Limited Search
Source: University of Vermont
The authors propose the incremental clustering crawler, a novel algorithm for finding communities for community-limited search on the web. A web community is a set of semantically related sites found through link-based clustering. The key idea of the proposed algorithm is to perform clustering incrementally while crawling is in progress. The algorithm does not need to crawl all web pages a priori; it crawls only the pages that are relevant to the clusters being formed. This ability to crawl on the fly is an important advantage, because crawling every web page in the world is infeasible and because people often do not know in advance which pages or sites to crawl.
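
The abstract does not state the clustering criterion, so the following is only a minimal sketch of the general idea of clustering while crawling, not the authors' algorithm. It assumes a single cluster grown from seed sites, and it assumes a crawled site is relevant when it shares at least min_links links (in either direction) with current members. The names incremental_cluster_crawl, fetch_links, min_links, max_pages, and TOY_WEB are hypothetical and chosen for illustration.

```python
from collections import deque


def incremental_cluster_crawl(seeds, fetch_links, min_links=2, max_pages=50):
    """Grow one cluster from seed sites while the crawl is in progress.

    fetch_links(site) is a hypothetical callback returning the sites that
    `site` links to. The membership rule below is an assumption made for
    this sketch, not the criterion used in the paper.
    """
    cluster = set(seeds)
    outlinks = {}             # crawled site -> set of sites it links to
    frontier = deque(seeds)   # sites waiting to be crawled
    seen = set(seeds)

    while frontier and len(outlinks) < max_pages:
        site = frontier.popleft()
        links = list(fetch_links(site))
        outlinks[site] = set(links)

        if site not in cluster:
            # Count links from the candidate into the cluster and
            # links from already-crawled members to the candidate.
            to_cluster = len(outlinks[site] & cluster)
            from_cluster = sum(
                1 for member in cluster if site in outlinks.get(member, ())
            )
            if to_cluster + from_cluster < min_links:
                continue  # not relevant: its out-links are never followed
            cluster.add(site)

        # Only cluster members contribute new sites to the frontier.
        for target in links:
            if target not in seen:
                seen.add(target)
                frontier.append(target)

    return cluster, set(outlinks)


# Toy link graph standing in for the web (hypothetical data).
TOY_WEB = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["a.example", "c.example"],
    "c.example": ["a.example", "b.example", "news.example"],
    "news.example": ["shop.example", "sports.example"],
    "shop.example": ["ads.example"],
    "sports.example": [],
    "ads.example": [],
}

cluster, crawled = incremental_cluster_crawl(
    ["a.example"], lambda site: TOY_WEB.get(site, [])
)
```

In this toy run the crawl stops expanding at news.example, which has only one link shared with the cluster, so shop.example, sports.example, and ads.example are never fetched. The result depends on crawl order, since a candidate is judged against the cluster as it exists when the candidate is crawled; a fuller treatment would likely revisit rejected candidates as the cluster grows.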