IJCTT-International Journal of Computer Trends and Technology
An increasing number of databases have become web accessible through HTML form-based search interfaces. The data units returned from the underlying database are usually encoded into the result pages dynamically for human browsing. For the encoded data units to be machine processable, which is necessary for many applications such as deep web data collection and Internet comparison shopping, they need to be extracted and assigned meaningful labels. In this paper, the authors present an automatic annotation approach that first aligns the data units on a result page into different groups such that the data in the same group have the same semantics.
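To make the alignment step concrete, the following is a minimal sketch of grouping data units from several search result records so that units in one group plausibly share the same meaning. It is not the authors' algorithm: the coarse content-type rules in `data_type`, the grouping key used in `align`, and the sample records are all assumptions introduced here for illustration only.

```python
import re
from collections import defaultdict


def data_type(text):
    """Classify a data unit by a coarse content type (hypothetical rules)."""
    text = text.strip()
    if re.fullmatch(r"[$€£]\s?\d+(\.\d+)?", text):
        return "price"
    if re.fullmatch(r"\d{4}", text):
        return "year"
    if re.fullmatch(r"\d+(\.\d+)?", text):
        return "number"
    return "text"


def align(records):
    """Group data units across records by (content type, occurrence index).

    Each record is a list of data unit strings taken from one search result.
    Units with the same key are assumed to share the same semantics.
    """
    groups = defaultdict(list)
    for record in records:
        seen = defaultdict(int)  # how many units of each type seen so far in this record
        for unit in record:
            kind = data_type(unit)
            groups[(kind, seen[kind])].append(unit)
            seen[kind] += 1
    return dict(groups)


# Illustrative records; the second one has no author field.
records = [
    ["Database Systems", "J. Smith", "2009", "$45.00"],
    ["Web Data Mining", "2011", "$59.95"],
]
groups = align(records)
for key, units in groups.items():
    print(key, units)
```

In this sketch the years and prices land in the same groups even though the second record is missing a field, which is the property that makes a later labeling step possible. A purely positional alignment would instead pair "J. Smith" with "2011".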