Crawling the Web Surface Databases
The World Wide Web is growing at a rapid rate; as of February 2007 it was estimated at 29 billion pages. A web crawler is a program that browses the Web autonomously, and one of its most important uses is indexing pages and keeping them up to date so that a search engine can serve end-user queries. Because the Web is dynamic in nature, the copies of pages held in a web repository must be refreshed constantly. In this paper, the authors put forward an efficient technique for refreshing a page stored in a web repository. They propose two methods that decide whether a page needs refreshing by comparing page structure: the first compares the pages using the tags they contain, and the second builds a document tree for each page and compares the trees.
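The two structural comparisons described above could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the "tag" method means comparing the sequence of opening tags, and the "document tree" method means comparing nested tag trees; all function and class names here are hypothetical.

```python
from html.parser import HTMLParser

class TagCollector(HTMLParser):
    """Records the opening tags of a page in document order (method 1 sketch)."""
    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

def tag_sequence(html):
    parser = TagCollector()
    parser.feed(html)
    return parser.tags

def structure_changed_by_tags(old_html, new_html):
    """Method 1 (sketch): flag a structural change if the tag sequences differ."""
    return tag_sequence(old_html) != tag_sequence(new_html)

class TreeBuilder(HTMLParser):
    """Builds a nested (tag, children) tree of the page (method 2 sketch)."""
    def __init__(self):
        super().__init__()
        self.root = ("root", [])
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = (tag, [])
        self.stack[-1][1].append(node)  # attach to current parent
        self.stack.append(node)

    def handle_endtag(self, tag):
        if len(self.stack) > 1:
            self.stack.pop()

def document_tree(html):
    parser = TreeBuilder()
    parser.feed(html)
    return parser.root

def structure_changed_by_tree(old_html, new_html):
    """Method 2 (sketch): flag a structural change if the document trees differ."""
    return document_tree(old_html) != document_tree(new_html)
```

Note that under either sketch a page whose text changed but whose markup did not (e.g. an updated headline inside the same `<p>`) is treated as structurally unchanged, which is the point of comparing structure rather than raw content.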
Provided by: International Journal of Computer Applications Topic: Developer Date Added: Aug 2012 Format: PDF