A focused crawler is a web crawler that attempts to download only web pages that are relevant to a predefined topic or set of topics. Focused crawling also assumes that some labeled examples of relevant and non-relevant pages are available. The topic can be represented by a set of keywords (the authors call them seed keywords) or by example URLs. The key to designing an efficient focused crawler is deciding whether a given web page is relevant to the topic. The paper defines several relevance computation strategies and reports an empirical evaluation that shows promising results. The authors also developed a framework for fairly evaluating topical crawling algorithms under a number of performance metrics.
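The abstract does not spell out the paper's relevance computation strategies, so the following is only a minimal sketch of the general idea, under several assumptions. Relevance is scored as the cosine similarity between a page's term-frequency vector and the seed keywords. Pages are expanded best-first from a priority queue. Links found on a page inherit that page's score as their priority, and pages scoring below a cutoff are not expanded. The names `cosine_relevance`, `focused_crawl`, `threshold`, and the `fetch` callback are hypothetical. An in-memory dictionary stands in for real HTTP fetching, and the sketch ignores the example-URL and labeled-page variants mentioned above.

```python
import heapq
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_relevance(page_text, seed_keywords):
    """Cosine similarity between the page's term counts and the seed keyword counts (assumed scoring)."""
    page_tf = Counter(tokenize(page_text))
    seed_tf = Counter(tokenize(" ".join(seed_keywords)))
    dot = sum(page_tf[t] * seed_tf[t] for t in seed_tf)
    norm = (math.sqrt(sum(v * v for v in page_tf.values()))
            * math.sqrt(sum(v * v for v in seed_tf.values())))
    return dot / norm if norm else 0.0


def focused_crawl(start_urls, fetch, seed_keywords, threshold=0.1, max_pages=100):
    """Best-first crawl: always expand the highest-priority URL; skip pages below `threshold`.

    `fetch(url)` is a hypothetical callback returning (page_text, outgoing_links).
    """
    frontier = [(-1.0, url) for url in start_urls]  # heapq is a min-heap, so store negated scores
    heapq.heapify(frontier)
    seen = set(start_urls)
    relevant = []
    fetched = 0
    while frontier and fetched < max_pages:
        _, url = heapq.heappop(frontier)
        text, links = fetch(url)
        fetched += 1
        score = cosine_relevance(text, seed_keywords)
        if score < threshold:
            continue  # judged irrelevant: do not follow this page's links
        relevant.append((url, round(score, 3)))
        for link in links:
            if link not in seen:
                seen.add(link)
                # assumption: an unvisited link inherits its parent page's relevance as priority
                heapq.heappush(frontier, (-score, link))
    return relevant


# Toy in-memory "web" (hypothetical data): url -> (page text, outgoing links)
web = {
    "a": ("focused crawler downloads topic relevant pages", ["b", "c"]),
    "b": ("recipes for chocolate cake", ["d"]),
    "c": ("a topic crawler ranks relevant pages", ["d"]),
    "d": ("crawler evaluation metrics", []),
}

print(focused_crawl(["a"], lambda u: web[u], ["crawler", "topic", "relevant"]))
# prints [('a', 0.707), ('c', 0.707), ('d', 0.333)]
```

Page `b` scores 0 and is discarded, so its link to `d` is never followed from there. `d` is still reached through the relevant page `c`. This is the pruning behavior that lets a focused crawler avoid spending downloads on off-topic regions of the web. Swapping `cosine_relevance` for another scoring function, for example a classifier trained on the labeled examples, would leave the crawl loop unchanged.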
- Format: PDF
- Size: 506.1 KB