International Journal on Computer Science and Technology (IJCST)
This paper presents EWFPC (Extraction of Web Forums using Page type Classifier), a supervised web-scale forum crawler. The goal of EWFPC is to crawl only relevant forum content from the web with minimal overhead. Forum threads contain the informational content that forum crawlers target. Although forums differ in layout and style and are powered by different forum software packages, they always have similar implicit navigation paths, connected by specific URL types, that lead users from entry pages to thread pages.
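
The idea of following an implicit navigation path from an entry page to thread pages can be illustrated with a minimal sketch. It is not the EWFPC implementation: the regular expressions stand in for the supervised page type classifier, and the URL patterns, the `classify_url` and `crawl_threads` functions, and the in-memory `TOY_SITE` link graph are all hypothetical names assumed for illustration.

```python
import re
from collections import deque

# Hypothetical URL patterns (assumption): a supervised system would learn
# page types from labeled training pages instead of hard-coding them.
PAGE_TYPE_PATTERNS = {
    "index": re.compile(r"/forum/\d+"),
    "thread": re.compile(r"/thread/\d+"),
}


def classify_url(url):
    """Return the page type a URL appears to lead to, or "other"."""
    for page_type, pattern in PAGE_TYPE_PATTERNS.items():
        if pattern.search(url):
            return page_type
    return "other"


def crawl_threads(entry_url, links):
    """Breadth-first walk from the entry page, following only index and
    thread links, and return the thread URLs in discovery order.

    `links` maps each URL to the URLs found on that page; it replaces
    real HTTP fetching so the sketch runs offline.
    """
    seen = {entry_url}
    queue = deque([entry_url])
    threads = []
    while queue:
        url = queue.popleft()
        for target in links.get(url, []):
            if target in seen:
                continue
            page_type = classify_url(target)
            if page_type == "other":
                continue  # skip login, profile, and similar pages
            seen.add(target)
            if page_type == "thread":
                threads.append(target)
            queue.append(target)
    return threads


# Toy forum (assumption): entry page -> index pages -> thread pages.
TOY_SITE = {
    "http://example.com/": [
        "http://example.com/forum/1",
        "http://example.com/login",
        "http://example.com/forum/2",
    ],
    "http://example.com/forum/1": [
        "http://example.com/thread/10",
        "http://example.com/thread/11",
        "http://example.com/profile/7",
    ],
    "http://example.com/forum/2": [
        "http://example.com/thread/11",
        "http://example.com/thread/12",
    ],
}

if __name__ == "__main__":
    for thread_url in crawl_threads("http://example.com/", TOY_SITE):
        print(thread_url)
```

In this sketch the login and profile links are never queued, which mirrors the goal of avoiding irrelevant pages, and the thread linked from both index pages is collected only once.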