A Methodology for Template Extraction From Heterogeneous Web Pages

The World Wide Web is a vast and highly useful collection of information. To achieve high publishing productivity, web pages are often populated automatically by filling common templates with content. These templates are considered harmful because they compromise the relevance judgments of many web information retrieval and web mining methods, such as clustering and classification, and they degrade the performance and resource usage of tools that process web pages. Template detection techniques have therefore received a lot of attention as a way to improve the performance of search engines and the clustering and classification of web documents.

Provided by: Indian Journal of Computer Science and Engineering (IJCSE) Topic: Developer Date Added: Jul 2012 Format: PDF
