Clickstream Data Warehousing for Web Crawlers Profiling
Web sites routinely monitor visitor traffic as a useful measure of their overall success. However, simple summaries such as the total number of visits per month provide little insight into the usage patterns of an individual site, especially in a changing environment like the Web. This paper describes an approach to usage profiling based on clickstream data collected from several Web sites and stored in a specialized clickstream data warehouse. The aim is to provide valuable insights about regular users, while also preventing unauthorised access to content and any form of overload that might degrade site performance. Common crawler detection heuristics are used to classify sessions, enabling the construction of site-specific training sets for profiling.
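As an illustration of how heuristic session classification might look, the following is a minimal sketch in Python. It is not the authors' implementation: the abstract does not list the heuristics used, so the three rules shown here (a request for robots.txt, a user-agent string matching a crawler-like pattern, and a session made up only of HEAD requests) are assumptions drawn from commonly cited practice, and the names `Request`, `Session`, `BOT_AGENT` and `classify` are hypothetical.

```python
import re
from dataclasses import dataclass, field

# Hypothetical pattern; a real deployment would rely on a maintained
# list of known crawler user agents rather than a few keywords.
BOT_AGENT = re.compile(r"bot|crawler|spider", re.IGNORECASE)


@dataclass
class Request:
    method: str  # HTTP method, e.g. "GET" or "HEAD"
    path: str    # requested path, e.g. "/index.html"


@dataclass
class Session:
    user_agent: str
    requests: list = field(default_factory=list)


def classify(session):
    """Label a session 'crawler' or 'human' using three assumed heuristics."""
    # Heuristic 1: browsers rarely request robots.txt; crawlers often do.
    if any(r.path == "/robots.txt" for r in session.requests):
        return "crawler"
    # Heuristic 2: the user agent announces itself as a crawler.
    if BOT_AGENT.search(session.user_agent):
        return "crawler"
    # Heuristic 3: a non-empty session consisting only of HEAD requests.
    if session.requests and all(r.method == "HEAD" for r in session.requests):
        return "crawler"
    return "human"


if __name__ == "__main__":
    visitor = Session("Mozilla/5.0 (X11; Linux x86_64)",
                      [Request("GET", "/"), Request("GET", "/about.html")])
    print(classify(visitor))
```

Sessions labelled in this way could then serve as the training examples for a site-specific profile; the labels are only as reliable as the heuristics, so they are best treated as a starting point rather than ground truth.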