Lightweight Hierarchical Clustering of Network Packets Using (p,n)-Grams
Source: Carleton University
The complexity of current Internet applications makes understanding network traffic a challenging task. By grouping packets into larger-scale aggregates for analysis, unsupervised clustering approaches can greatly aid in identifying new applications, attacks, and other changes in network usage patterns. In this paper, the authors introduce ADHIC, a new algorithm that clusters similar network traffic together without prior knowledge of protocol structures. Packet similarity is determined by comparing (p,n)-grams (substrings within packets at distinguishing offsets).
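As a rough illustration of the (p,n)-gram idea, the following Python sketch extracts a substring from a packet and groups packets that share it. It assumes that p is a byte offset and n is the substring length, which the abstract implies but does not spell out. The helper names `pn_gram` and `group_by_pn_gram` and the sample packets are hypothetical. This is not the ADHIC algorithm itself, which the abstract does not describe in detail; it only shows how a single (p,n)-gram could separate packets without any protocol knowledge.

```python
from collections import defaultdict


def pn_gram(packet: bytes, p: int, n: int):
    """Return the n-byte substring starting at offset p.

    Assumption: p is a byte offset and n is a length. Returns None when
    the packet is too short to contain the full substring.
    """
    if p < 0 or n <= 0 or p + n > len(packet):
        return None
    return packet[p:p + n]


def group_by_pn_gram(packets, p: int, n: int):
    """Group packets by the value of their (p,n)-gram (illustrative only)."""
    groups = defaultdict(list)
    for pkt in packets:
        groups[pn_gram(pkt, p, n)].append(pkt)
    return dict(groups)


# Hypothetical payloads, made up for this example.
packets = [
    b"GET /index.html HTTP/1.1",
    b"GET /about.html HTTP/1.1",
    b"POST /login HTTP/1.1",
    b"\x16\x03\x01",
]

# Split on the 3 bytes at offset 0.
groups = group_by_pn_gram(packets, p=0, n=3)
for gram, members in groups.items():
    print(gram, len(members))
```

Applying such a split repeatedly, with a different offset and length at each level, would give a hierarchy of clusters. How ADHIC chooses its distinguishing offsets is not covered by the abstract.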