Differences in Caching of Robots.txt by Search Engine Crawlers

Web log mining gives insight into the behavior of search engine crawlers accessing a website. Search engines depend on Web crawlers, which can generate 90% of the traffic on websites. Crawlers access websites for diverse purposes, including some that involve security violations. They periodically visit a website and update its contents in their indexes. The behavior of a search engine crawler gives vital information about the crawler's ethics, the dynamicity of its crawling, how much it contributes to server load, and so on. An ethical crawler first accesses the "Robots.txt" file and then proceeds with the crawling process according to the permissions and restrictions given in this file. This paper is an attempt to identify the differences among various search engine crawlers and the time delays with which they cache the "Robots.txt" file. In this data set, the time delay in seconds for caching the robots.txt file of 4 search engine crawlers is chosen for study.
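The behavior described above (an ethical crawler fetching robots.txt and honoring its rules before crawling) can be sketched with Python's standard `urllib.robotparser` module. The robots.txt content below is a hypothetical example, not taken from the paper's data set:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration only.
robots_txt = """\
User-agent: *
Disallow: /private/
Crawl-delay: 10
"""

rp = RobotFileParser()
# parse() accepts the file's lines; a real crawler would instead call
# rp.set_url(...) and rp.read() to fetch the live file over HTTP.
rp.parse(robots_txt.splitlines())

# An ethical crawler checks each URL against the parsed rules:
print(rp.can_fetch("MyBot", "https://example.com/private/page.html"))  # False
print(rp.can_fetch("MyBot", "https://example.com/public/page.html"))   # True
# ...and waits the requested number of seconds between requests:
print(rp.crawl_delay("MyBot"))  # 10
```

How long a crawler keeps this parsed copy before re-fetching robots.txt is exactly the caching delay the paper studies; the parser itself does not mandate a refresh interval.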

Provided by: Creative Commons | Topic: Developer | Date Added: Sep 2015 | Format: PDF
