International Journal of Computer Applications
In this paper, the authors present a robust unsupervised approach for extracting data records from dynamic web pages using tag tree comparison. Extraction proceeds in the following steps. First, they download the web pages of interest to their system. Next, they construct a DOM tree for each page using a parser. They then compare two or more pages to eliminate noisy, unwanted content such as headers, menu bars, navigation bars, and advertisements, and to locate the region of interest, called the data region or object region.
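
The comparison step can be illustrated with a minimal sketch. It is not the authors' algorithm, whose details the abstract does not give; it only assumes that subtrees identical across two pages from the same site are template noise and that subtrees which differ are candidate data regions. The names `Node`, `TreeBuilder`, `build_tree`, and `differing_regions` are hypothetical, and Python's standard `html.parser` stands in for a full DOM parser.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so the builder must not descend into them.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class Node:
    """One element of a simplified tag tree (hypothetical helper)."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = ""

    def signature(self):
        # Serialise tags and text so that identical subtrees compare equal.
        inner = "".join(child.signature() for child in self.children)
        return f"<{self.tag}>{self.text.strip()}{inner}</{self.tag}>"


class TreeBuilder(HTMLParser):
    """Builds a tag tree from HTML; attributes are ignored in this sketch."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Climb to the nearest open element with this tag; ignore stray closers.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        self.current.text += data


def build_tree(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def differing_regions(a, b, path="root"):
    """Return paths of the topmost subtrees that differ between two pages."""
    if a.signature() == b.signature():
        return []  # identical on both pages: assumed to be template noise
    tags_a = [child.tag for child in a.children]
    tags_b = [child.tag for child in b.children]
    if tags_a != tags_b or a.text.strip() != b.text.strip():
        return [path]  # structure or text changes here: candidate data region
    regions = []
    for i, (child_a, child_b) in enumerate(zip(a.children, b.children)):
        regions.extend(differing_regions(child_a, child_b, f"{path}/{child_a.tag}[{i}]"))
    return regions


page1 = ("<html><body><div><a>Home</a></div>"
         "<ul><li>Item A</li><li>Item B</li></ul><p>Footer</p></body></html>")
page2 = ("<html><body><div><a>Home</a></div>"
         "<ul><li>Item C</li></ul><p>Footer</p></body></html>")

regions = differing_regions(build_tree(page1), build_tree(page2))
print(regions)
```

In this toy input the navigation `div` and the footer `p` are identical on both pages and are discarded, while the `ul` holding the records is reported. A real system would need a tolerant similarity measure rather than exact equality, since templates often vary slightly between pages.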