Main Content Extraction from Detailed Web Pages

Provided by: Islamic Azad University
Topic: Big Data
Format: PDF
The structure and layout of a web page vary with the type of content it presents and with the tastes of the designer who styles it. As a result, the position of the main content, or the tag that contains it, differs from one website to another. Elements that appear side by side in the rendered page may even sit at different levels of the DOM tree, under different parents. Because pages follow no fixed rules for arranging and positioning elements, finding the main content requires complicated and costly algorithms. An algorithm that simulates a user visiting a website is likely to find the informative content, because in most cases real users can locate the main content area.
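To make the problem concrete, here is a minimal sketch of a much simpler text-density heuristic. It is not the user-simulating approach described above: it credits the non-link text of each paragraph to the element that contains it and picks the container with the most text. The names `ContainerScorer`, `guess_main_container`, and `SAMPLE`, and the sample markup itself, are assumptions made for illustration. In the sample, the article and the sidebar could be rendered next to each other although they sit at different depths and under different parents.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not go on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


def label(element):
    """Render a (tag, id) pair as 'tag#id', or just 'tag' when there is no id."""
    tag, element_id = element
    return f"{tag}#{element_id}" if element_id else tag


class ContainerScorer(HTMLParser):
    """Credit the non-link text of every <p> to the element that contains it."""

    def __init__(self):
        super().__init__()
        self.stack = []      # currently open elements as (tag, id) pairs
        self.scores = {}     # DOM path of a container -> characters of text
        self.link_depth = 0  # > 0 while inside an <a> element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        self.stack.append((tag, dict(attrs).get("id")))
        if tag == "a":
            self.link_depth += 1

    def handle_endtag(self, tag):
        # Close the innermost matching element; tolerate missing end tags.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                closed = self.stack[i:]
                del self.stack[i:]
                self.link_depth -= sum(1 for t, _ in closed if t == "a")
                break

    def handle_data(self, data):
        text = data.strip()
        if not text or self.link_depth:
            return  # ignore whitespace and link text (typical of navigation)
        # Find the innermost open <p>; one at index 0 has no container.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i][0] == "p":
                path = "/".join(label(e) for e in self.stack[:i])
                self.scores[path] = self.scores.get(path, 0) + len(text)
                break


def guess_main_container(html):
    """Return the DOM path of the highest-scoring container, or None."""
    scorer = ContainerScorer()
    scorer.feed(html)
    scorer.close()
    if not scorer.scores:
        return None
    return max(scorer.scores, key=scorer.scores.get)


# Hypothetical page: the article is nested two levels deeper than the sidebar.
SAMPLE = """
<html><body>
<div id="nav"><p><a href="/">Home</a> <a href="/topics">Topics</a></p></div>
<div id="wrap">
  <div id="article">
    <p>Main content extraction tries to locate the informative part of a page.</p>
    <p>Layouts differ between sites, so no fixed tag or position can be assumed.</p>
  </div>
</div>
<aside id="side"><p>Related links</p></aside>
</body></html>
"""

print(guess_main_container(SAMPLE))
```

On this sample the sketch selects the `div` with id `article`. A heuristic this simple is easily misled by pages that do not wrap their text in paragraphs or that carry long sidebars, which is part of why more elaborate algorithms are needed.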
