International Journal of Advanced Research in Computer Science and Software Engineering (IJARCSSE)
A supervised web-scale forum crawler aims to crawl relevant forum content from the web with minimum overhead. Forum threads contain the information content that forum crawlers are meant to collect. The approach reduces the web forum crawling problem to a URL-type recognition problem. It shows how to learn accurate and effective regular expression patterns of constant navigation paths from automatically created training sets, which are built from the aggregated results of weak page type classifiers. Although forums have different layouts or styles and are powered by different forum software packages, they always have similar constant navigation paths, connected by specific URL types, that direct users from entry pages to thread pages.
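The idea of recognizing URL types with learned regular expressions can be illustrated with a minimal Python sketch. It is not the learning procedure of this work: the generalization step (replacing digit runs with `\d+`) is a deliberately simple stand-in, the labels are supplied by hand rather than derived from weak page type classifiers, and the URLs, the type names (`index`, `thread`, `page-flipping`), and the helper names `generalize`, `learn_patterns`, and `classify` are all hypothetical.

```python
import re
from urllib.parse import urlparse


def generalize(url):
    """Build an anchored regex from one URL by replacing digit runs with \\d+.

    Simplifying assumption: only the path and query string matter,
    and numeric IDs are the only parts that vary between pages.
    """
    parsed = urlparse(url)
    target = parsed.path + ("?" + parsed.query if parsed.query else "")
    # re.split with a capturing group puts the digit runs at odd indices.
    parts = re.split(r"(\d+)", target)
    body = "".join(
        r"\d+" if i % 2 == 1 else re.escape(part)
        for i, part in enumerate(parts)
    )
    return "^" + body + "$"


def learn_patterns(labeled_urls):
    """Group generalized patterns by URL type: {url_type: set of regex strings}."""
    patterns = {}
    for url, url_type in labeled_urls:
        patterns.setdefault(url_type, set()).add(generalize(url))
    return patterns


def classify(url, patterns):
    """Return the first URL type with a matching pattern, or None."""
    parsed = urlparse(url)
    target = parsed.path + ("?" + parsed.query if parsed.query else "")
    for url_type, regexes in patterns.items():
        if any(re.match(regex, target) for regex in regexes):
            return url_type
    return None


# Hypothetical training set; in this sketch the labels are given by hand.
TRAINING = [
    ("http://forum.example.com/forumdisplay.php?f=12", "index"),
    ("http://forum.example.com/forumdisplay.php?f=7", "index"),
    ("http://forum.example.com/showthread.php?t=1001", "thread"),
    ("http://forum.example.com/showthread.php?t=1001&page=2", "page-flipping"),
]

patterns = learn_patterns(TRAINING)
```

A crawler built on such patterns would follow only links whose URL type lies on the path from entry page to thread page, for example by calling `classify(link, patterns)` on each discovered link and discarding those that return `None`.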