Cleaning Various Noise Patterns in Web Pages for Web Data Extraction

Cleaning Web pages before mining is critical for improving the performance of information retrieval and information extraction. With the amount of information available on the Internet growing exponentially, users urgently need an effective technique for separating useful information from unnecessary information. The authors therefore investigate removing the various noisy data patterns in Web pages, rather than extracting the relevant content directly, in order to obtain the main content. In this paper, they propose an approach, Noise-Eliminator, that detects multiple noise patterns and removes them from the Web pages of any Web site.

Provided by: INTI University College
Topic: Software
Date Added: Nov 2010
Format: PDF
