Extraction of Page-Level Data for Efficient Webpage Indexing

Provided by: Auricle Technologies
Topic: Big Data
Format: PDF
A commercial web page typically contains many information blocks. Besides the main content blocks, it usually includes navigation panels, copyright and privacy notices, and advertisements, which serve business purposes and make the site easier to use. The authors call these blocks, which are not part of the page's main content, noisy blocks. They show that the information in noisy blocks can seriously harm web data mining, so eliminating this noise is of great importance.
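To make the idea of noisy blocks concrete, here is a minimal Python sketch that discards text inside a few tags commonly used for navigation, footers, and sidebars, and keeps the rest. The abstract does not describe how the authors detect noise, so the tag list `NOISY_TAGS`, the class `MainTextExtractor`, and the function `extract_main_text` are all hypothetical names chosen for illustration, and this naive tag-based heuristic should not be read as the paper's method.

```python
from html.parser import HTMLParser

# Assumption: tags treated as "noisy" blocks. This list is illustrative only;
# the abstract does not say how noise is actually identified.
NOISY_TAGS = {"nav", "footer", "aside", "script", "style"}


class MainTextExtractor(HTMLParser):
    """Collects text that is not nested inside any tag in NOISY_TAGS."""

    def __init__(self):
        super().__init__()
        self.noisy_depth = 0  # number of noisy tags currently open
        self.chunks = []      # text fragments kept as main content

    def handle_starttag(self, tag, attrs):
        if tag in NOISY_TAGS:
            self.noisy_depth += 1

    def handle_endtag(self, tag):
        if tag in NOISY_TAGS and self.noisy_depth > 0:
            self.noisy_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep text only when we are outside every noisy block.
        if text and self.noisy_depth == 0:
            self.chunks.append(text)


def extract_main_text(html):
    """Return the page text with the assumed noisy blocks removed."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

Real commercial pages often mark navigation and advertisements with generic `div` elements rather than semantic tags, so a fixed tag list like this one would miss much of the noise in practice.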