Clustering and MDL Techniques to Improve the Efficiency and Scalability of Template Detection and Extraction Algorithms

Provided by: International Journal of Engineering Associates
Topic: Data Management
Format: PDF
The World Wide Web has emerged as a valuable source of information. To produce web pages of consistent quality, many websites generate their pages from common templates. These templates are considered harmful because the irrelevant terms they contain reduce the accuracy and performance of web applications. Template detection techniques are therefore needed to improve the performance of search engines and of the clustering and classification of web documents. In this paper, the authors present an algorithm for extracting templates from a large number of web documents that are generated from heterogeneous templates. Their goal is to handle an unknown number of templates and to improve the efficiency and scalability of template detection and extraction algorithms. The unknown number of templates can be dealt with by selecting a good partitioning of all web documents.