International Association of Engineers
The Web is a huge reservoir of information, and the data available on it is extremely diverse and abundant. Many types of data can be extracted easily from the Internet, although not all of it is relevant to users. Most web pages are written in unstructured HTML, which makes web data extraction time-consuming and costly. Unstructured HTML therefore needs to be converted into a structured format such as XML or XHTML. The authors propose an approach for implementing web data extraction and developing a Mashup from HTML web pages.
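To make the conversion step concrete, the following is a minimal sketch of turning loosely written HTML into well-formed XML using only the Python standard library. It is an illustration under stated assumptions, not the authors' method: the class `HtmlToXml`, the function `html_to_xml`, and the `document` wrapper element are hypothetical names introduced here, and the tag-closing rule is deliberately simple.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# HTML elements that never take a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class HtmlToXml(HTMLParser):
    """Builds a well-formed XML tree from loosely written HTML (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        # Assumption: wrap everything in one root so the output is a single XML document.
        self.root = ET.Element("document")
        self.stack = [self.root]  # currently open elements, innermost last

    def handle_starttag(self, tag, attrs):
        # Valueless attributes (e.g. <input disabled>) arrive as None.
        attrib = {name: (value if value is not None else "") for name, value in attrs}
        element = ET.SubElement(self.stack[-1], tag, attrib)
        if tag not in VOID_TAGS:
            self.stack.append(element)

    def handle_endtag(self, tag):
        # Close the nearest matching open element, implicitly closing anything
        # left open inside it; stray end tags are ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if not data.strip():
            return  # drop whitespace-only text between tags
        parent = self.stack[-1]
        if len(parent):
            # Text that follows a child element belongs in that child's tail.
            last = parent[-1]
            last.tail = (last.tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def html_to_xml(html: str) -> str:
    """Return the HTML fragment serialized as a well-formed XML string."""
    parser = HtmlToXml()
    parser.feed(html)
    parser.close()
    return ET.tostring(parser.root, encoding="unicode")


if __name__ == "__main__":
    # The <p> element is never closed and <br> is a void tag in this input.
    print(html_to_xml('<div class="item"><p>Widget<br>Price: <b>42</b></div>'))
```

Once the page is in this form, fields can be selected with ordinary XML tooling, for example `ET.fromstring(xml).find("./div/p/b").text`. A production extractor would need more careful recovery rules for malformed markup than this sketch provides.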