Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The train set and test set of NSL-KDD
There are no duplicate records in the proposed test sets; therefore, the performance of the learners is not biased by methods that have better detection rates on the frequent records.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
NSL-KDD is a data set suggested to solve some of the inherent problems of the KDD'99 data set that are mentioned in [1]. Although this new version of the KDD data set still suffers from some of the problems discussed by McHugh and may not be a perfect representative of existing real networks, we believe it can still be applied as an effective benchmark data set to help researchers compare different intrusion detection methods, given the lack of public data sets for network-based IDSs.
Furthermore, the number of records in the NSL-KDD train and test sets is reasonable. This advantage makes it affordable to run the experiments on the complete set without the need to randomly select a small portion. Consequently, evaluation results of different research works will be consistent and comparable.
KDD_test+.TXT: the full NSL-KDD test set, including attack-type labels and difficulty level, in CSV format
KDD_train+.TXT: the full NSL-KDD train set, including attack-type labels and difficulty level, in CSV format
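For quick reference, here is a minimal Python loading sketch (pandas assumed available). Each record carries 41 features followed by the attack-type label and the difficulty level, so the generic column names below are placeholders; the file names follow the listing above and are assumed to sit in the working directory.

```python
import pandas as pd

# 41 KDD features (generic placeholder names), then attack-type label and
# difficulty level (how many of the 21 learners classified the record correctly).
COLUMNS = [f"f{i}" for i in range(1, 42)] + ["attack_type", "difficulty"]

# File names as given in the listing above, assumed to be in the working directory.
train = pd.read_csv("KDD_train+.TXT", header=None, names=COLUMNS)
test = pd.read_csv("KDD_test+.TXT", header=None, names=COLUMNS)

print(train["attack_type"].value_counts().head())
print(test["difficulty"].describe())
```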
The NSL-KDD data set offers the following improvements over the original KDD data set:
It does not include redundant records in the train set, so the classifiers will not be biased towards the more frequent records.
There are no duplicate records in the proposed test sets; therefore, the performance of the learners is not biased by methods that have better detection rates on the frequent records.
The number of selected records from each difficulty-level group is inversely proportional to the percentage of records in the original KDD data set. As a result, the classification rates of distinct machine learning methods vary over a wider range, which makes an accurate evaluation of different learning techniques more feasible.
The number of records in the train and test sets is reasonable, which makes it affordable to run the experiments on the complete set without the need to randomly select a small portion. Consequently, evaluation results of different research works will be consistent and comparable.
Statistical observations
One of the most important deficiencies of the KDD data set is the huge number of redundant records, which causes the learning algorithms to be biased towards the frequent records and thus prevents them from learning the infrequent records, which are usually more harmful to networks, such as U2R and R2L attacks. In addition, the existence of these repeated records in the test set causes the evaluation results to be biased towards methods that have better detection rates on the frequent records.
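The redundancy removal behind the statistics reported below can be approximated with a simple deduplication pass. This is only a sketch: the file names are placeholders for local copies of the raw KDD'99 train and test sets, and NSL-KDD's additional difficulty-based resampling is not shown.

```python
import pandas as pd

# Placeholder paths for local copies of the raw KDD'99 train and test sets.
RAW_FILES = {"train": "kddcup_train.csv", "test": "kddcup_test.csv"}

for split, path in RAW_FILES.items():
    records = pd.read_csv(path, header=None)
    distinct = records.drop_duplicates()          # drop exact duplicate records
    reduction = 1 - len(distinct) / len(records)  # reduction rate, as in the tables below
    print(f"{split}: {len(records):,} -> {len(distinct):,} records "
          f"({reduction:.2%} reduction)")
```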
In addition, we analyzed the difficulty level of the records in the KDD data set. Surprisingly, about 98% of the records in the train set and 86% of the records in the test set were correctly classified by all 21 learners.
In order to perform our experiments, we randomly created three smaller subsets of the KDD train set, each of which included fifty thousand records. Each of the learners was trained over these created train sets. We then employed the 21 learned machines (7 learners, each trained 3 times) to label the records of the entire KDD train and test sets, which provides us with 21 predicted labels for each record. Further, we annotated each record of the data set with a #successfulPrediction value, which was initialized to zero. Since the KDD data set provides the correct label for each record, we compared the predicted label of each record given by a specific learner with the actual label, and incremented #successfulPrediction by one if a match was found. Through this process, we calculated the number of learners that were able to correctly label each given record. The highest value for #successfulPrediction is 21, which means that all learners were able to correctly predict the label of that record.
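The counting procedure can be illustrated with a short, self-contained sketch; the random predictions below are only stand-ins for the outputs of the 21 trained learners, and only the #successfulPrediction bookkeeping mirrors the description above.

```python
import numpy as np

rng = np.random.default_rng(0)
n_records, n_learners = 1000, 21

# Stand-ins: actual labels of the KDD records and the predicted labels from
# the 21 learned machines (7 learners, each trained on 3 subsets).
true_labels = rng.integers(0, 2, size=n_records)
predictions = rng.integers(0, 2, size=(n_learners, n_records))

# #successfulPrediction: for each record, the number of learners (0..21) whose
# predicted label matches the actual label.
successful_prediction = (predictions == true_labels).sum(axis=0)

# Records correctly classified by all 21 learners form the easiest group.
print("fraction classified correctly by all 21 learners:",
      np.mean(successful_prediction == n_learners))
```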
Statistics of redundant records in the KDD train set
Original records | Distinct records | Reduction rate
Attacks: 3,925,650 | 262,178 | 93.32%
Normal: 972,781 | 812,814 | 16.44%
Total: 4,898,431 | 1,074,992 | 78.05%
Statistics of redundant records in the KDD test set
Original records | Distinct records | Reduction rate
Attacks: 250,436 | 29,378 | 88.26%
Normal: 60,591 | 47,911 | 20.92%
Total: 311,027 | 77,289 | 75.15%
License
You may redistribute, republish, and mirror the NSL-KDD dataset in any form. However, any use or redistribution of the data must include a citation to the NSL-KDD dataset and the paper referenced below.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Over the years, intrusion detection systems have played a crucial role in network security by discovering attacks in network traffic and raising an alarm for the security team. Machine learning methods, e.g., Support Vector Machine and K-Nearest Neighbour, have been used to build intrusion detection systems, but such systems still suffer from low accuracy and a high false alarm rate. Deep learning models (e.g., Long Short-Term Memory, LSTM) have been employed in designing intrusion detection systems to address this issue. However, LSTM needs a high number of iterations to achieve high performance. In this paper, a novel, improved version of the Long Short-Term Memory algorithm (ILSTM) is proposed. The ILSTM is based on a novel integration of the chaotic butterfly optimization algorithm (CBOA) and particle swarm optimization (PSO) to improve the accuracy of the LSTM algorithm. The ILSTM was then used to build an efficient intrusion detection system for binary and multi-class classification. The proposed algorithm has two phases: phase one trains a conventional LSTM network to obtain initial weights, and phase two uses the hybrid swarm algorithms, CBOA and PSO, to optimize the weights of the LSTM and improve its accuracy. The performance of ILSTM and the intrusion detection system was evaluated on two public datasets (NSL-KDD and LITNET-2020) under nine performance metrics. The results showed that the proposed ILSTM algorithm outperformed the original LSTM and other related deep-learning algorithms in accuracy and precision. The ILSTM achieved an accuracy of 93.09% and a precision of 96.86%, while LSTM gave an accuracy of 82.74% and a precision of 76.49%. The ILSTM also performed better than LSTM on both datasets, and statistical analysis showed that the improvement of ILSTM over LSTM is statistically significant. Further, the proposed ILSTM gave better results for multi-class classification of intrusion types such as DoS, Probe, and U2R attacks.
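The two-phase idea in this abstract (train a network, then let a swarm optimizer refine its weights against a fitness function) can be sketched generically. The sketch below is not the authors' ILSTM: it swaps the LSTM for a tiny linear classifier on synthetic data and uses plain PSO only, with no CBOA component, purely to illustrate how phase two treats the pretrained weights as the swarm's starting point.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in "phase one": synthetic data and roughly pretrained weights for a
# tiny linear classifier (the paper would train an LSTM by backpropagation here).
X = rng.normal(size=(300, 5))
y = (X @ np.array([1.0, -2.0, 0.5, 0.0, 1.5]) > 0).astype(int)
w_pretrained = 0.1 * rng.normal(size=5)

def fitness(w):
    """Validation accuracy used as the swarm's objective."""
    return np.mean((X @ w > 0).astype(int) == y)

# Stand-in "phase two": plain PSO refining the weights, seeded near w_pretrained.
n_particles, n_iters = 20, 50
pos = w_pretrained + 0.1 * rng.normal(size=(n_particles, w_pretrained.size))
vel = np.zeros_like(pos)
pbest = pos.copy()
pbest_fit = np.array([fitness(p) for p in pos])
gbest = pbest[np.argmax(pbest_fit)].copy()

for _ in range(n_iters):
    r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = pos + vel
    fit = np.array([fitness(p) for p in pos])
    improved = fit > pbest_fit
    pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
    gbest = pbest[np.argmax(pbest_fit)].copy()

print("pretrained accuracy:", fitness(w_pretrained))
print("after PSO refinement:", fitness(gbest))
```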
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
As the Internet of Things (IoT) continues to evolve
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The NSL-KDD and CICIDS2017 datasets used in the research.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description of the NSL-KDD dataset attack categories.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Performance metrics on the NSL-KDD dataset using the MCL-FWA-BILSTM model.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The escalating prevalence of cybersecurity risks calls for a focused strategy to attain efficient solutions. This study introduces a detection model built on a tailored methodology that integrates feature selection using SHAP values, a shallow learning algorithm called PV-DM, and machine learning classifiers such as XGBOOST. The efficacy of the suggested methodology is demonstrated on the NSL-KDD and UNSW-NB15 datasets. On the NSL-KDD dataset, the approach exhibits exceptional performance, with an accuracy of 98.92%, precision of 98.92%, recall of 95.44%, and an F1-score of 96.77%. Notably, this performance is achieved using only four features, indicating the efficiency of the approach. On the UNSW-NB15 dataset, the proposed methodology achieves an accuracy of 82.86%, precision of 84.07%, recall of 77.70%, and an F1-score of 80.20% using only six features. The findings provide substantial evidence that the proposed model outperforms a traditional deep-learning model across all performance metrics.
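A minimal sketch of the SHAP-based feature selection feeding an XGBoost classifier, as described above, run on synthetic stand-in data: the PV-DM embedding stage is omitted, and keeping four features simply mirrors the figure quoted in the abstract rather than any tuned choice (shap, xgboost, and scikit-learn assumed installed).

```python
import numpy as np
import shap
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the tabular intrusion data.
X, y = make_classification(n_samples=2000, n_features=20, n_informative=6, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Step 1: fit a baseline XGBoost model on all features.
model = xgb.XGBClassifier(n_estimators=200, max_depth=4)
model.fit(X_tr, y_tr)

# Step 2: rank features by mean absolute SHAP value on the training set.
sv = np.asarray(shap.TreeExplainer(model).shap_values(X_tr))
if sv.ndim == 3:          # some shap versions return one array per class
    sv = sv[-1]
importance = np.abs(sv).mean(axis=0)
top_k = np.argsort(importance)[::-1][:4]   # keep the 4 most influential features

# Step 3: retrain on the selected features only and compare test accuracy.
model_k = xgb.XGBClassifier(n_estimators=200, max_depth=4)
model_k.fit(X_tr[:, top_k], y_tr)
print("all 20 features:", accuracy_score(y_te, model.predict(X_te)))
print("top-4 features :", accuracy_score(y_te, model_k.predict(X_te[:, top_k])))
```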
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
MCL-FWA-BILSTM accuracy comparison with existing state-of-the-art approaches for multiclass classification on UNSW-NB15 and NSL-KDD.