The Data Integration Architecture "Build the Foundation for Success"
For many network measurement and monitoring tasks, it is desirable to have an accurate yet information-compact profile of the behavior of massive event sequences. The goal of this paper is to develop a general methodology for completely profiling such sequences. The primary challenge in achieving this goal is the multi-dimensional behavior that event sequences exhibit. The paper provides a novel method that captures the behavior of a massive number of sequences in a compact model. The learned mixture model is information-compact because it classifies sequences into a set of behavior templates, each of which is described by a Markov chain. The most salient feature of the model is that it simultaneously captures both the order of events and the duration between events. Prior domain knowledge about the event sequences can be seamlessly integrated into the model to improve accuracy and reduce complexity. To estimate the parameters of the model, an iterative algorithm based on the Expectation-Maximization (EM) algorithm is developed. The method is evaluated on multiple network traces, including a TCP packet trace, a VoIP call collection, and a Wi-Fi syslog. The paper also explores the applicability of the method to two network monitoring tasks, and it describes a visualization tool and an anomaly detection scheme.
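To make the idea of a mixture of Markov-chain behavior templates fitted by EM more concrete, the following is a minimal sketch and not the paper's algorithm. It assumes events are coded as integers, uses first-order chains, and models only the order of events; the duration between events and any prior domain knowledge are left out. The function name `fit_markov_mixture`, its parameters, and the smoothing constant are all hypothetical choices made for illustration.

```python
import math
import random


def _normalize(values):
    """Scale a list of positive numbers so that it sums to 1."""
    total = sum(values)
    return [v / total for v in values]


def fit_markov_mixture(sequences, n_states, n_templates,
                       n_iter=50, seed=0, smooth=1e-3):
    """EM for a mixture of first-order Markov chains (illustrative only).

    sequences: non-empty lists of event codes in range(n_states).
    Returns (weights, inits, trans, resp), where resp[i][k] is the
    posterior probability that sequence i follows template k.
    """
    rng = random.Random(seed)
    # Start from random soft assignments of sequences to templates.
    resp = [_normalize([rng.random() + 1e-3 for _ in range(n_templates)])
            for _ in sequences]

    for _ in range(n_iter):
        # M-step: re-estimate each template from responsibility-weighted counts.
        weights = _normalize([smooth + sum(r[k] for r in resp)
                              for k in range(n_templates)])
        inits, trans = [], []
        for k in range(n_templates):
            init = [smooth] * n_states
            counts = [[smooth] * n_states for _ in range(n_states)]
            for seq, r in zip(sequences, resp):
                init[seq[0]] += r[k]
                for a, b in zip(seq, seq[1:]):
                    counts[a][b] += r[k]
            inits.append(_normalize(init))
            trans.append([_normalize(row) for row in counts])

        # E-step: posterior over templates for each sequence, in log space.
        resp = []
        for seq in sequences:
            logp = []
            for k in range(n_templates):
                lp = math.log(weights[k]) + math.log(inits[k][seq[0]])
                lp += sum(math.log(trans[k][a][b])
                          for a, b in zip(seq, seq[1:]))
                logp.append(lp)
            top = max(logp)
            resp.append(_normalize([math.exp(lp - top) for lp in logp]))

    return weights, inits, trans, resp
```

As a usage example, calling `fit_markov_mixture(seqs, n_states=2, n_templates=2)` on a set of sequences that either alternate between two events or repeat a single event returns one row of template responsibilities per sequence. Taking the largest entry of each row gives a hard assignment of that sequence to a behavior template.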