Academy & Industry Research Collaboration Center
Network traffic data is huge, variable, and imbalanced, because its classes are not equally distributed. Machine Learning (ML) algorithms for traffic analysis use samples from this data to recommend actions to network administrators. Because of the imbalance in the dataset, these algorithms may give biased or false results, which seriously degrades their performance. Moreover, because the network dataset is huge, training a machine learning algorithm on all of it takes a long time, so sampling should be used to reduce the training time.
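To make the idea of sampling an imbalanced dataset concrete, the following is a minimal sketch of stratified random sampling, one common way to shrink a training set while keeping every class represented. It is an illustration only: the text above does not say which sampling technique is intended, and the function name `stratified_sample`, the class labels "normal" and "attack", and the toy flow records are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(records, label_of, fraction, seed=0):
    """Draw about `fraction` of each class, keeping at least one record per class."""
    rng = random.Random(seed)

    # Group the records by class label.
    by_class = defaultdict(list)
    for rec in records:
        by_class[label_of(rec)].append(rec)

    # Sample each class separately so minority classes are not lost.
    sample = []
    for group in by_class.values():
        k = max(1, round(len(group) * fraction))
        sample.extend(rng.sample(group, k))
    return sample


# Hypothetical toy traffic: 950 "normal" flows and 50 "attack" flows.
flows = [("normal", i) for i in range(950)] + [("attack", i) for i in range(50)]

subset = stratified_sample(flows, label_of=lambda f: f[0], fraction=0.1)
counts = Counter(label for label, _ in subset)
print(len(subset), dict(counts))
```

Sampling within each class keeps the original class proportions in the smaller training set. A plain random sample of the same size could, by chance, contain very few minority-class records or none at all.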