International Journal of Computer Applications
The Web is a vast reservoir of information, and the data available on it is both abundant and highly diverse. To find specific information, a user must browse many web pages, filter the data, and download the relevant documents and files, which makes searching and downloading time-consuming. Web pages are published in HTML, an unstructured format that is difficult to process automatically. It is therefore necessary to convert this unstructured HTML into a structured format such as XML or XHTML. The authors propose an approach for implementing web data extraction and for developing a mashup from HTML web pages.
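To make the HTML-to-XML conversion step concrete, the following is a minimal sketch using only the Python standard library. It is not the authors' implementation. The class `HtmlToXml`, the function `html_to_xml`, and the wrapper element `document` are hypothetical names introduced for illustration, and the sketch assumes that tag and attribute names in the input are already valid XML names.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# HTML elements that never have a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class HtmlToXml(HTMLParser):
    """Builds an ElementTree from lenient HTML (illustrative sketch only)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        # Hypothetical wrapper element so the output has a single root.
        self.root = ET.Element("document")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        # Attributes without a value (e.g. `disabled`) arrive as None.
        attrib = {name: (value if value is not None else "")
                  for name, value in attrs}
        node = ET.SubElement(self.stack[-1], tag, attrib)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name, implicitly
        # closing any unclosed elements nested inside it.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if not data.strip():
            return  # skip whitespace-only text
        parent = self.stack[-1]
        if len(parent):
            # Text after a closed child belongs to that child's tail.
            last = parent[-1]
            last.tail = (last.tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def html_to_xml(html: str) -> str:
    """Return a well-formed XML string for the given HTML fragment."""
    parser = HtmlToXml()
    parser.feed(html)
    parser.close()
    return ET.tostring(parser.root, encoding="unicode")


if __name__ == "__main__":
    # Unclosed <p>, void <br> and <img>: typical of real-world HTML.
    print(html_to_xml('<p>Price: <b>42</b> USD<br><img src="a.png">'))
```

The resulting string can be parsed by any XML parser, so individual fields can be addressed by path (for example `p/b` in the sample input) rather than by ad hoc string matching. A production extractor would also need to handle malformed nesting and invalid names, which this sketch does not attempt.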