International Journal of Advanced Research in Computer Science and Software Engineering (IJARCSSE)
Data extraction from HTML is usually performed by software modules called wrappers. Manually coded wrappers pose two key problems: writing them is difficult and labor-intensive, and by their nature they tend to be brittle and hard to maintain. This paper investigates techniques for extracting data from HTML sites through the use of automatically generated wrappers. An increasing number of databases have become web-accessible through HTML form-based search interfaces. The data units returned from the underlying database are usually encoded into the result pages dynamically for human browsing.
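
To make the notion of a hand-coded wrapper and its brittleness concrete, the following Python sketch extracts records from a hypothetical result page. The page layout, the class name `result`, and the names `ResultWrapper` and `extract` are assumptions made for illustration only; they are not taken from the paper or from any particular site.

```python
from html.parser import HTMLParser


class ResultWrapper(HTMLParser):
    """Hand-coded wrapper: collects the text of each <td> inside <tr class="result">."""

    def __init__(self):
        super().__init__()
        self.rows = []          # extracted records, one list of cell strings per row
        self._in_row = False    # True while inside a <tr class="result">
        self._in_cell = False   # True while inside a <td> of such a row

    def handle_starttag(self, tag, attrs):
        if tag == "tr" and dict(attrs).get("class") == "result":
            self._in_row = True
            self.rows.append([])
        elif tag == "td" and self._in_row:
            self._in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag == "td" and self._in_cell:
            # Trim surrounding whitespace once the cell is complete.
            self.rows[-1][-1] = self.rows[-1][-1].strip()
            self._in_cell = False
        elif tag == "tr":
            self._in_row = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data


def extract(html):
    """Run the wrapper over one result page and return the extracted rows."""
    wrapper = ResultWrapper()
    wrapper.feed(html)
    wrapper.close()
    return wrapper.rows


# Hypothetical result page, as a form-based search interface might return it.
PAGE = """
<table>
  <tr><th>Title</th><th>Price</th></tr>
  <tr class="result"><td>Data Mining</td><td>42.00</td></tr>
  <tr class="result"><td>Web Wrappers</td><td>35.50</td></tr>
</table>
"""

# The same page after a cosmetic redesign that renames the row class.
REDESIGNED = PAGE.replace('class="result"', 'class="item"')

records = extract(PAGE)
broken = extract(REDESIGNED)
```

In this sketch `records` holds the two data rows, while `broken` is empty: a purely cosmetic change to the page defeats the hand-written rule without raising any error. This is the kind of maintenance burden that automatically generated wrappers are meant to reduce.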