Finding Generalized Path Patterns for Web Log Data Mining
Mining web server logs involves determining frequently occurring access sequences. This paper examines the problem of finding traversal patterns in web logs, taking into account that accesses to irrelevant documents, made purely for navigational purposes, may be interleaved within access patterns. A general type of pattern that accounts for this interleaving is defined, and a level-wise algorithm for finding these patterns, based on the underlying structure of the web site, is presented. The performance of the algorithm and its sensitivity to several parameters are examined experimentally with synthetic data.
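To make the idea of a pattern with interleaved navigational accesses concrete, the following is a minimal illustrative sketch and not the algorithm of the paper. It assumes that a pattern is supported by a session when its pages appear in order with arbitrary other pages in between, and that candidates are grown level by level only along hyperlinks of the site. The names `contains_with_gaps`, `support`, `frequent_patterns`, the toy `sessions`, and the `links` graph are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[str, ...]


def contains_with_gaps(session: Sequence[str], pattern: Sequence[str]) -> bool:
    """True if `pattern` occurs in `session` in order, allowing other pages in between."""
    remaining = iter(session)
    # `page in remaining` advances the iterator until it finds `page` (or runs out),
    # so each pattern page must be found after the previous one.
    return all(page in remaining for page in pattern)


def support(sessions: List[Sequence[str]], pattern: Sequence[str]) -> int:
    """Number of sessions that contain `pattern` as a gapped subsequence."""
    return sum(contains_with_gaps(s, pattern) for s in sessions)


def frequent_patterns(
    sessions: List[Sequence[str]],
    links: Dict[str, List[str]],
    min_support: int,
) -> Dict[Pattern, int]:
    """Level-wise search: extend frequent patterns only along site links (an assumption)."""
    if min_support < 1:
        raise ValueError("min_support must be at least 1")
    pages = sorted({p for s in sessions for p in s})
    level = [(p,) for p in pages if support(sessions, (p,)) >= min_support]
    result: Dict[Pattern, int] = {}
    while level:
        for pat in level:
            result[pat] = support(sessions, pat)
        # Candidates of length k+1: append a page linked from the last page.
        candidates = {pat + (nxt,) for pat in level for nxt in links.get(pat[-1], [])}
        level = sorted(c for c in candidates if support(sessions, c) >= min_support)
    return result


# Toy data: page "X" plays the role of an irrelevant navigational access.
sessions = [
    ["A", "B", "X", "C"],
    ["A", "B", "C"],
    ["A", "X", "B", "D"],
]
links = {"A": ["B"], "B": ["C", "D"]}
result = frequent_patterns(sessions, links, min_support=2)
```

In this toy run the pattern `("A", "B", "C")` is supported by the first two sessions even though `"X"` is interleaved in the first one, while `("B", "D")` is dropped because only one session supports it. The search stops on its own because no pattern longer than the longest session can have positive support.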