Concept Mining of High Volume Data Streams in Network Traffic Using Hierarchical Clustering
This paper is concerned with the problem of mining network traffic data: discovering useful associations, relationships, and groupings in large collections of data. Mathematical transformation algorithms have proven effective at reducing the content of multilingual, unstructured data to a vector that describes that content. Such methods are particularly desirable in fields undergoing information explosions, such as network traffic analysis, bioinformatics, and intelligence analysis. In response, traffic mining methodology is being extended to improve performance and to scale sufficiently. There is an additional need within the intelligence community to cluster related sets of data obtained from network traffic. To allow this clustering to happen at high speed, this work implements a system that hierarchically clusters streaming content.
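
To make the idea of hierarchically clustering a stream of content vectors concrete, the following minimal Python sketch may help. It is an illustration under stated assumptions, not the system this paper implements: the cosine similarity measure, the per-level thresholds, and the names `Node`, `insert`, and `cluster_stream` are all hypothetical choices made for this example. Each arriving vector descends a tree of cluster centroids, joins the most similar node at each level if the similarity meets that level's threshold, and otherwise starts a new branch.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class Node:
    """One cluster in the hierarchy: a running-mean centroid plus child clusters."""

    def __init__(self, vector):
        self.centroid = list(vector)
        self.count = 1
        self.children = []

    def absorb(self, vector):
        # Update the centroid as an incremental mean, so no past vectors are stored.
        self.count += 1
        self.centroid = [c + (v - c) / self.count
                         for c, v in zip(self.centroid, vector)]


def insert(nodes, vector, thresholds, depth=0):
    """Place one vector into the hierarchy and return its path of cluster indices.

    `thresholds` is a hypothetical tuning parameter: one minimum cosine
    similarity per level, typically stricter at deeper levels.
    """
    if depth >= len(thresholds):
        return []
    best_index, best_score = None, -1.0
    for i, node in enumerate(nodes):
        score = cosine(node.centroid, vector)
        if score > best_score:
            best_index, best_score = i, score
    if best_index is None or best_score < thresholds[depth]:
        # Nothing at this level is similar enough: start a new cluster.
        nodes.append(Node(vector))
        best_index = len(nodes) - 1
    else:
        nodes[best_index].absorb(vector)
    child_path = insert(nodes[best_index].children, vector, thresholds, depth + 1)
    return [best_index] + child_path


def cluster_stream(vectors, thresholds=(0.5, 0.95)):
    """Process vectors one at a time; return the top-level clusters and each path."""
    roots, paths = [], []
    for vector in vectors:
        paths.append(insert(roots, vector, thresholds))
    return roots, paths


stream = [[1, 0, 0], [0.9, 0.1, 0], [0, 1, 0], [0.7, 0.7, 0]]
roots, paths = cluster_stream(stream)
```

With the illustrative thresholds above, the four sample vectors receive the paths `[0, 0]`, `[0, 0]`, `[1, 0]`, and `[0, 1]`: the first two share a fine-grained cluster, the third opens a new top-level cluster, and the fourth joins the first top-level cluster but forms its own sub-cluster. Because each vector is handled once and only centroids are kept, a scheme of this kind suits streaming input, although the result depends on arrival order.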