Journal of Theoretical and Applied Information Technology
In anomaly intrusion detection systems, machine learning algorithms, e.g. KNN, SOM, and SVM, are widely used to construct a model of normal system activity that are designed to work with numeric data. Consequently, symbolic data (e.g., TCP, SMTP, FTP, OTH, etc.) need to be converted into numeric data prior to being analyzed. From the previous works, there were different methods proposed for handling the symbolic data; for example, excluding symbolic data, arbitrary assignment, and indicator variables. However, these methods may entail a very difficult classification problem, especially an increase of the dimensionality of data that directly affect the computational complexity of machine learning algorithm.