Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The train set and test set of NSL-KDD
There are no duplicate records in the proposed test sets; therefore, the performance of the learners is not biased by methods that have better detection rates on the frequent records.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
NSL-KDD is a data set suggested to solve some of the inherent problems of the KDD'99 data set that are mentioned in [1]. Although this new version of the KDD data set still suffers from some of the problems discussed by McHugh and may not be a perfect representative of existing real networks, we believe it can still be applied as an effective benchmark data set to help researchers compare different intrusion detection methods, given the lack of public data sets for network-based IDSs.
Furthermore, the number of records in the NSL-KDD train and test sets is reasonable. This advantage makes it affordable to run the experiments on the complete set without the need to randomly select a small portion. Consequently, evaluation results of different research works will be consistent and comparable.
KDD_test+.TXT: the full NSL-KDD test set, including attack-type labels and difficulty level, in CSV format
KDD_train+.TXT: the full NSL-KDD train set, including attack-type labels and difficulty level, in CSV format
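For quick reference, here is a minimal Python loading sketch (pandas assumed available). Each record carries 41 features followed by the attack-type label and the difficulty level, so the generic column names below are placeholders; the file names follow the listing above and are assumed to sit in the working directory.

```python
import pandas as pd

# 41 KDD features (generic placeholder names), then attack-type label and
# difficulty level (how many of the 21 learners classified the record correctly).
COLUMNS = [f"f{i}" for i in range(1, 42)] + ["attack_type", "difficulty"]

# File names as given in the listing above, assumed to be in the working directory.
train = pd.read_csv("KDD_train+.TXT", header=None, names=COLUMNS)
test = pd.read_csv("KDD_test+.TXT", header=None, names=COLUMNS)

print(train["attack_type"].value_counts().head())
print(test["difficulty"].describe())
```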
The NSL-KDD data set offers the following improvements over the original KDD data set:
It does not include redundant records in the train set, so the classifiers will not be biased towards the more frequent records.
There are no duplicate records in the proposed test sets; therefore, the performance of the learners is not biased by methods that have better detection rates on the frequent records.
The number of selected records from each difficulty-level group is inversely proportional to the percentage of records in the original KDD data set. As a result, the classification rates of distinct machine learning methods vary over a wider range, which makes an accurate evaluation of different learning techniques more feasible.
The number of records in the train and test sets is reasonable, which makes it affordable to run the experiments on the complete set without the need to randomly select a small portion. Consequently, evaluation results of different research works will be consistent and comparable.
Statistical observations
One of the most important deficiencies of the KDD data set is the huge number of redundant records, which causes the learning algorithms to be biased towards the frequent records and thus prevents them from learning the infrequent records, which are usually more harmful to networks, such as U2R and R2L attacks. In addition, the existence of these repeated records in the test set causes the evaluation results to be biased towards methods that have better detection rates on the frequent records.
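The redundancy removal behind the statistics reported below can be approximated with a simple deduplication pass. This is only a sketch: the file names are placeholders for local copies of the raw KDD'99 train and test sets, and NSL-KDD's additional difficulty-based resampling is not shown.

```python
import pandas as pd

# Placeholder paths for local copies of the raw KDD'99 train and test sets.
RAW_FILES = {"train": "kddcup_train.csv", "test": "kddcup_test.csv"}

for split, path in RAW_FILES.items():
    records = pd.read_csv(path, header=None)
    distinct = records.drop_duplicates()          # drop exact duplicate records
    reduction = 1 - len(distinct) / len(records)  # reduction rate, as in the tables below
    print(f"{split}: {len(records):,} -> {len(distinct):,} records "
          f"({reduction:.2%} reduction)")
```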
In addition, we analyzed the difficulty level of the records in the KDD data set. Surprisingly, about 98% of the records in the train set and 86% of the records in the test set were correctly classified by all 21 learners.
In order to perform our experiments, we randomly created three smaller subsets of the KDD train set, each of which included fifty thousand records. Each of the learners was trained over these created train sets. We then employed the 21 learned machines (7 learners, each trained 3 times) to label the records of the entire KDD train and test sets, which provides us with 21 predicted labels for each record. Further, we annotated each record of the data set with a #successfulPrediction value, which was initialized to zero. Since the KDD data set provides the correct label for each record, we compared the predicted label of each record given by a specific learner with the actual label, and incremented #successfulPrediction by one if a match was found. Through this process, we calculated the number of learners that were able to correctly label each given record. The highest value for #successfulPrediction is 21, which means that all learners were able to correctly predict the label of that record.
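The counting procedure can be illustrated with a short, self-contained sketch; the random predictions below are only stand-ins for the outputs of the 21 trained learners, and only the #successfulPrediction bookkeeping mirrors the description above.

```python
import numpy as np

rng = np.random.default_rng(0)
n_records, n_learners = 1000, 21

# Stand-ins: actual labels of the KDD records and the predicted labels from
# the 21 learned machines (7 learners, each trained on 3 subsets).
true_labels = rng.integers(0, 2, size=n_records)
predictions = rng.integers(0, 2, size=(n_learners, n_records))

# #successfulPrediction: for each record, the number of learners (0..21) whose
# predicted label matches the actual label.
successful_prediction = (predictions == true_labels).sum(axis=0)

# Records correctly classified by all 21 learners form the easiest group.
print("fraction classified correctly by all 21 learners:",
      np.mean(successful_prediction == n_learners))
```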
Statistics of redundant records in the KDD train set
Original records | Distinct records | Reduction rate
Attacks: 3,925,650 | 262,178 | 93.32%
Normal: 972,781 | 812,814 | 16.44%
Total: 4,898,431 | 1,074,992 | 78.05%
Statistics of redundant records in the KDD test set
Original records | Distinct records | Reduction rate
Attacks: 250,436 | 29,378 | 88.26%
Normal: 60,591 | 47,911 | 20.92%
Total: 311,027 | 77,289 | 75.15%
License
You may redistribute, republish, and mirror the NSL-KDD dataset in any form. However, any use or redistribution of the data must include a citation to the NSL-KDD dataset and the paper referenced below.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Over the years, intrusion detection systems have played a crucial role in network security by discovering attacks in network traffic and raising an alarm for the security team. Machine learning methods, e.g., Support Vector Machine and K-Nearest Neighbour, have been used to build intrusion detection systems, but such systems still suffer from low accuracy and a high false alarm rate. Deep learning models (e.g., Long Short-Term Memory, LSTM) have been employed in designing intrusion detection systems to address this issue. However, LSTM needs a high number of iterations to achieve high performance. In this paper, a novel, improved version of the Long Short-Term Memory algorithm (ILSTM) is proposed. The ILSTM is based on a novel integration of the chaotic butterfly optimization algorithm (CBOA) and particle swarm optimization (PSO) to improve the accuracy of the LSTM algorithm. The ILSTM was then used to build an efficient intrusion detection system for binary and multi-class classification. The proposed algorithm has two phases: phase one trains a conventional LSTM network to obtain initial weights, and phase two uses the hybrid swarm algorithms, CBOA and PSO, to optimize the weights of the LSTM and improve its accuracy. The performance of ILSTM and the intrusion detection system was evaluated on two public datasets (NSL-KDD and LITNET-2020) under nine performance metrics. The results showed that the proposed ILSTM algorithm outperformed the original LSTM and other related deep-learning algorithms in accuracy and precision. The ILSTM achieved an accuracy of 93.09% and a precision of 96.86%, while LSTM gave an accuracy of 82.74% and a precision of 76.49%. The ILSTM also performed better than LSTM on both datasets, and statistical analysis showed that the improvement of ILSTM over LSTM is statistically significant. Further, the proposed ILSTM gave better results for multi-class classification of intrusion types such as DoS, Probe, and U2R attacks.
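The two-phase idea in this abstract (train a network, then let a swarm optimizer refine its weights against a fitness function) can be sketched generically. The sketch below is not the authors' ILSTM: it swaps the LSTM for a tiny linear classifier on synthetic data and uses plain PSO only, with no CBOA component, purely to illustrate how phase two treats the pretrained weights as the swarm's starting point.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in "phase one": synthetic data and roughly pretrained weights for a
# tiny linear classifier (the paper would train an LSTM by backpropagation here).
X = rng.normal(size=(300, 5))
y = (X @ np.array([1.0, -2.0, 0.5, 0.0, 1.5]) > 0).astype(int)
w_pretrained = 0.1 * rng.normal(size=5)

def fitness(w):
    """Validation accuracy used as the swarm's objective."""
    return np.mean((X @ w > 0).astype(int) == y)

# Stand-in "phase two": plain PSO refining the weights, seeded near w_pretrained.
n_particles, n_iters = 20, 50
pos = w_pretrained + 0.1 * rng.normal(size=(n_particles, w_pretrained.size))
vel = np.zeros_like(pos)
pbest = pos.copy()
pbest_fit = np.array([fitness(p) for p in pos])
gbest = pbest[np.argmax(pbest_fit)].copy()

for _ in range(n_iters):
    r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = pos + vel
    fit = np.array([fitness(p) for p in pos])
    improved = fit > pbest_fit
    pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
    gbest = pbest[np.argmax(pbest_fit)].copy()

print("pretrained accuracy:", fitness(w_pretrained))
print("after PSO refinement:", fitness(gbest))
```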
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
As the Internet of Things (IoT) continues to evolve
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The NSL-KDD and CICIDS2017 datasets used in the research.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description of the NSL-KDD dataset attack categories.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Performance metrics on the NSL-KDD dataset using the MCL-FWA-BILSTM model.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The escalating prevalence of cybersecurity risks calls for a focused strategy to attain efficient solutions. This study introduces a detection model built on a tailored methodology that integrates feature selection using SHAP values, a shallow learning algorithm called PV-DM, and machine learning classifiers such as XGBOOST. The efficacy of the suggested methodology is demonstrated on the NSL-KDD and UNSW-NB15 datasets. On the NSL-KDD dataset, the approach exhibits exceptional performance, with an accuracy of 98.92%, precision of 98.92%, recall of 95.44%, and an F1-score of 96.77%. Notably, this performance is achieved using only four features, indicating the efficiency of the approach. On the UNSW-NB15 dataset, the proposed methodology achieves an accuracy of 82.86%, precision of 84.07%, recall of 77.70%, and an F1-score of 80.20% using only six features. The findings provide substantial evidence that the proposed model outperforms a traditional deep-learning model across all performance metrics.
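A minimal sketch of the SHAP-based feature selection feeding an XGBoost classifier, as described above, run on synthetic stand-in data: the PV-DM embedding stage is omitted, and keeping four features simply mirrors the figure quoted in the abstract rather than any tuned choice (shap, xgboost, and scikit-learn assumed installed).

```python
import numpy as np
import shap
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the tabular intrusion data.
X, y = make_classification(n_samples=2000, n_features=20, n_informative=6, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Step 1: fit a baseline XGBoost model on all features.
model = xgb.XGBClassifier(n_estimators=200, max_depth=4)
model.fit(X_tr, y_tr)

# Step 2: rank features by mean absolute SHAP value on the training set.
sv = np.asarray(shap.TreeExplainer(model).shap_values(X_tr))
if sv.ndim == 3:          # some shap versions return one array per class
    sv = sv[-1]
importance = np.abs(sv).mean(axis=0)
top_k = np.argsort(importance)[::-1][:4]   # keep the 4 most influential features

# Step 3: retrain on the selected features only and compare test accuracy.
model_k = xgb.XGBClassifier(n_estimators=200, max_depth=4)
model_k.fit(X_tr[:, top_k], y_tr)
print("all 20 features:", accuracy_score(y_te, model.predict(X_te)))
print("top-4 features :", accuracy_score(y_te, model_k.predict(X_te[:, top_k])))
```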
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
MCL-FWA-BILSTM accuracy comparison with existing state-of-the-art approaches for multiclass classification on UNSW-NB15 and NSL-KDD.