Inferring Web User Sessions by Clustering Techniques
This paper focuses on the definition and identification of "Web user-sessions": aggregations of several TCP connections generated by the same source host. Identifying a user-session is nontrivial. Traditional approaches rely on threshold-based mechanisms, but these are very sensitive to the threshold value, which may be difficult to set correctly. By applying clustering techniques, the authors define a novel methodology that identifies Web user-sessions without requiring an a priori definition of threshold values. They describe the clustering-based approach, discuss its pros and cons, and apply it to real traffic traces.
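
To make the threshold sensitivity concrete, the following is a minimal sketch and not the authors' implementation. It assumes a session is closed whenever the gap between consecutive TCP connection start times from one host exceeds a fixed threshold. The function names, the example timestamps, and the threshold values are all hypothetical. The second function is a simple stand-in for a data-driven alternative (one-dimensional 2-means on the inter-connection gaps); the paper's actual clustering procedure is not specified in this abstract and may well differ.

```python
from typing import List


def threshold_sessions(start_times: List[float], threshold: float) -> List[List[float]]:
    """Group connection start times (in seconds) from one host into sessions.

    A new session begins whenever the gap to the previous connection
    exceeds `threshold` (an assumed, hand-picked value).
    """
    sessions: List[List[float]] = []
    for t in sorted(start_times):
        if sessions and t - sessions[-1][-1] <= threshold:
            sessions[-1].append(t)  # gap is small: same session
        else:
            sessions.append([t])    # first connection or large gap: new session
    return sessions


def two_means_gap_split(start_times: List[float], iterations: int = 20) -> float:
    """Derive a split value from the data by running 1-D 2-means on the gaps.

    Illustrative stand-in only: gaps are partitioned into a "short" and a
    "long" group, and the midpoint between the two group means is returned.
    """
    times = sorted(start_times)
    gaps = [b - a for a, b in zip(times, times[1:])]
    if not gaps:
        return 0.0
    lo, hi = min(gaps), max(gaps)  # initial centroids
    for _ in range(iterations):
        mid = (lo + hi) / 2
        short = [g for g in gaps if g <= mid]
        long_ = [g for g in gaps if g > mid]
        if not long_:  # all gaps fall in one group; nothing to split
            break
        lo = sum(short) / len(short)
        hi = sum(long_) / len(long_)
    return (lo + hi) / 2


# Hypothetical connection start times (seconds) for a single source host.
times = [0, 1, 2, 40, 41, 300, 302, 303]

# The same trace yields a different number of sessions for each threshold.
counts = {th: len(threshold_sessions(times, th)) for th in (0.5, 10, 60)}

# Split value derived from the gaps themselves instead of chosen by hand.
derived = two_means_gap_split(times)
derived_count = len(threshold_sessions(times, derived))
```

On this toy trace the session count swings from one session per connection to only two sessions as the threshold grows, which is the sensitivity the abstract refers to. The data-derived split removes the hand-picked constant, but it is only one of many possible clustering choices.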