Partitioning Based Web Content Mining
The Web has become the largest information source for people. Most information retrieval systems on the Web treat a web page as the smallest, indivisible unit, but a page as a whole may not represent a single semantic topic. Furthermore, a web page often contains multiple topics that are not necessarily relevant to each other. This dissertation proposes a web content structure analysis based on the visual representation of the page. Many web applications, such as information retrieval, information extraction, and automatic page adaptation, can benefit from this structure.