A Survey on Web Page Segmentation and Its Applications
A web page is a document, typically written in HTML, that a browser renders when a user types in or navigates to the page's address. Information extraction from the Web involves two problems: understanding webpage structure and processing natural-language sentences. The webpage understanding problem consists of three subtasks: webpage segmentation, webpage structure labeling, and webpage text segmentation and labeling. This survey reviews the effectiveness of leveraging layout and tag-tree structure for segmenting webpages and labeling HTML elements.
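To make the idea of tag-tree-based segmentation concrete, the following is a minimal sketch, not a method taken from any surveyed work. It assumes that a fixed set of block-level tags (the hypothetical `BLOCK_TAGS` below) marks segment boundaries, and it ignores visual layout entirely. The names `TagTreeSegmenter` and `segment_page` are illustrative.

```python
from html.parser import HTMLParser

# Assumption: these tags are treated as segment boundaries (illustrative choice).
BLOCK_TAGS = {"header", "nav", "main", "section", "article", "aside",
              "footer", "div", "p", "ul", "ol", "table"}


class TagTreeSegmenter(HTMLParser):
    """Groups page text into segments keyed by their block-level tag path."""

    def __init__(self):
        super().__init__()
        self.stack = []        # currently open block-level tags
        self.segments = []     # list of (tag_path, text) pairs
        self._boundary = True  # True when the next text starts a new segment

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.stack.append(tag)
            self._boundary = True

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS and tag in self.stack:
            # Pop up to and including the matching tag (tolerates unclosed tags).
            while self.stack and self.stack.pop() != tag:
                pass
            self._boundary = True

    def handle_data(self, data):
        text = " ".join(data.split())  # normalize whitespace
        if not text:
            return
        path = "/".join(self.stack) or "(root)"
        if self._boundary or not self.segments:
            self.segments.append((path, text))
            self._boundary = False
        else:
            # Inline tags (a, b, span, ...) do not split a segment.
            prev_path, prev_text = self.segments[-1]
            self.segments[-1] = (prev_path, prev_text + " " + text)


def segment_page(html):
    """Return a list of (tag_path, text) segments for an HTML string."""
    parser = TagTreeSegmenter()
    parser.feed(html)
    parser.close()
    return parser.segments


if __name__ == "__main__":
    page = ("<html><body><nav><a href='/'>Home</a> <a href='/about'>About</a></nav>"
            "<main><p>First paragraph.</p><p>Second <b>bold</b> text.</p></main>"
            "<footer>Contact us</footer></body></html>")
    for path, text in segment_page(page):
        print(path, "->", text)
```

Each resulting segment carries its tag path, which a later structure-labeling step could use as a feature (for example, treating `nav` as navigation and `main/p` as body text). Layout-aware approaches would additionally need rendered positions and sizes, which this sketch does not attempt to model.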