Cleaning Various Noise Patterns in Web Pages for Web Data Extraction
Cleaning Web pages before mining is critical for improving the performance of information retrieval and information extraction. With the amount of information available on the Internet growing exponentially, an effective technique that lets users separate useful information from unnecessary information is urgently required. The authors therefore investigate removing the various noisy data patterns in Web pages, rather than extracting the relevant content directly, in order to obtain the main content. In this paper, they propose an approach, Noise-Eliminator, that detects multiple noise patterns and removes them from the Web pages of any Web site.