12 datasets found
  1. NSL-KDD

    • ieee-dataport.org
    Updated Feb 2, 2022
    Cite
    RUIZHE ZHAO (2022). NSL-KDD [Dataset]. https://ieee-dataport.org/documents/nsl-kdd-0
    Explore at:
    9 scholarly articles cite this dataset (View in Google Scholar)
    Dataset updated
    Feb 2, 2022
    Authors
    RUIZHE ZHAO
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The train set and test set of NSL-KDD

  2. NSL-KDD

    • ieee-dataport.org
    Updated Jun 4, 2025
    Cite
    Seong-Tae Kim (2025). NSL-KDD [Dataset]. https://ieee-dataport.org/documents/nsl-kdd-1
    Dataset updated
    Jun 4, 2025
    Authors
    Seong-Tae Kim
    Description

    The performance of the learners is not biased by methods that have better detection rates on the frequent records.

  3. NSL_KDD Intrsuin Detection Dataset

    • kaggle.com
    Updated Jan 23, 2024
    Cite
    Shujaat Ali Shariati (2024). NSL_KDD Intrsuin Detection Dataset [Dataset]. https://www.kaggle.com/datasets/shujaatalishariati/nsl-kdd-intrsuin-detection-dataset/code
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Dataset updated
    Jan 23, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Shujaat Ali Shariati
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    NSL-KDD dataset

    NSL-KDD is a data set suggested to solve some of the inherent problems of the KDD'99 data set mentioned in [1]. Although this new version of the KDD data set still suffers from some of the problems discussed by McHugh and may not be a perfect representative of existing real networks, we believe it can still be applied as an effective benchmark data set, given the lack of public data sets for network-based IDSs, to help researchers compare different intrusion detection methods.

    Furthermore, the number of records in the NSL-KDD train and test sets is reasonable. This advantage makes it affordable to run the experiments on the complete set without the need to randomly select a small portion. Consequently, evaluation results of different research works will be consistent and comparable.

    Data files

    KDD_test+.TXT: the full NSL-KDD test set, including attack-type labels and difficulty levels, in CSV format
    KDD_train+.TXT: the full NSL-KDD train set, including attack-type labels and difficulty levels, in CSV format
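    Each line of these files is a plain CSV record: 41 feature values followed by the attack-type label and the difficulty level. A minimal parsing sketch using only the Python standard library (the sample record below is an illustrative placeholder, not a real dataset row):

```python
import csv
import io

# Illustrative record in the documented layout: 41 comma-separated
# feature values, then the attack-type label and the difficulty level.
# The feature values here are placeholders, not real dataset values.
sample = ",".join(["0"] * 41 + ["normal", "21"]) + "\n"

row = next(csv.reader(io.StringIO(sample)))
features, label, difficulty = row[:-2], row[-2], int(row[-1])
print(len(features), label, difficulty)  # 41 normal 21
```

    For the real files, replace io.StringIO(sample) with open("KDD_train+.TXT") and iterate over the reader.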

    It does not include redundant records in the train set, so the classifiers will not be biased towards more frequent records. There are no duplicate records in the proposed test sets; therefore, the performance of the learners is not biased by methods that have better detection rates on the frequent records. The number of selected records from each difficulty-level group is inversely proportional to the percentage of records in the original KDD data set. As a result, the classification rates of distinct machine learning methods vary over a wider range, which makes it possible to evaluate different learning techniques more accurately. The number of records in the train and test sets is reasonable, which makes it affordable to run the experiments on the complete set without the need to randomly select a small portion. Consequently, evaluation results of different research works will be consistent and comparable.

    Statistical observations

    One of the most important deficiencies in the KDD data set is the huge number of redundant records, which causes the learning algorithms to be biased towards the frequent records and thus prevents them from learning infrequent records, which are usually more harmful to networks, such as U2R and R2L attacks. In addition, the existence of these repeated records in the test set causes the evaluation results to be biased by methods that have better detection rates on the frequent records.

    In addition, we analyzed the difficulty level of the records in the KDD data set. Surprisingly, about 98% of the records in the train set and 86% of the records in the test set were correctly classified by all 21 learners.

    In order to perform our experiments, we randomly created three smaller subsets of the KDD train set, each of which included fifty thousand records. Each of the learners was trained over the created train sets. We then employed the 21 learned machines (7 learners, each trained 3 times) to label the records of the entire KDD train and test sets, which provided us with 21 predicted labels for each record. Further, we annotated each record of the data set with a #successfulPrediction value, which was initialized to zero. Since the KDD data set provides the correct label for each record, we compared the predicted label of each record given by a specific learner with the actual label and incremented #successfulPrediction by one if a match was found. Through this process, we calculated the number of learners that were able to correctly label each given record. The highest value for #successfulPrediction is 21, which means that all learners were able to correctly predict the label of that record.
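    The #successfulPrediction bookkeeping described above can be sketched as follows; the labels and learner outputs here are made-up placeholders, and the actual study used 21 trained learners:

```python
def successful_predictions(actual, predictions_per_learner):
    """For each record, count how many learners predicted its correct label."""
    counts = [0] * len(actual)
    for learner_preds in predictions_per_learner:
        for i, pred in enumerate(learner_preds):
            if pred == actual[i]:
                counts[i] += 1
    return counts

# Two hypothetical learners labelling three records (the study used 21).
actual = ["normal", "dos", "r2l"]
preds = [
    ["normal", "dos", "normal"],  # learner 1: records 1 and 2 correct
    ["normal", "probe", "r2l"],   # learner 2: records 1 and 3 correct
]
print(successful_predictions(actual, preds))  # [2, 1, 1]
```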

    Statistics of redundant records in the KDD train set

                Original records | Distinct records | Reduction rate
    Attacks:    3,925,650        | 262,178          | 93.32%
    Normal:     972,781          | 812,814          | 16.44%
    Total:      4,898,431        | 1,074,992        | 78.05%

    Statistics of redundant records in the KDD test set

                Original records | Distinct records | Reduction rate
    Attacks:    250,436          | 29,378           | 88.26%
    Normal:     60,591           | 47,911           | 20.92%
    Total:      311,027          | 77,289           | 75.15%

    License

    You may redistribute, republish, and mirror the NSL-KDD dataset in any form. However, any use or redistribution of the data must include a citation to the NSL-KDD dataset and the paper referenced below.
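    The reduction rates in these statistics follow directly from rate = 1 - distinct/original; a quick check against the reported figures:

```python
def reduction_rate(original, distinct):
    # Percentage of records removed when duplicates are collapsed.
    return round((1 - distinct / original) * 100, 2)

print(reduction_rate(3_925_650, 262_178))  # 93.32 (train set, attacks)
print(reduction_rate(972_781, 812_814))    # 16.44 (train set, normal)
print(reduction_rate(311_027, 77_289))     # 75.15 (test set, total)
```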

  4. Transformation of symbolic features in NSL-KDD.

    • plos.figshare.com
    bin
    Updated Aug 1, 2023
    Cite
    Asmaa Ahmed Awad; Ahmed Fouad Ali; Tarek Gaber (2023). Transformation of symbolic features in NSL-KDD. [Dataset]. http://doi.org/10.1371/journal.pone.0284795.t003
    Explore at:
    Available download formats: bin
    Dataset updated
    Aug 1, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Asmaa Ahmed Awad; Ahmed Fouad Ali; Tarek Gaber
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Over the years, intrusion detection systems have played a crucial role in network security by discovering attacks in network traffic and generating an alarm signal to be sent to the security team. Machine learning methods, e.g., Support Vector Machine and K-Nearest Neighbour, have been used to build intrusion detection systems, but such systems still suffer from low accuracy and high false-alarm rates. Deep learning models (e.g., Long Short-Term Memory, LSTM) have been employed in designing intrusion detection systems to address this issue. However, LSTM needs a high number of iterations to achieve high performance. In this paper, a novel and improved version of the Long Short-Term Memory (ILSTM) algorithm is proposed. The ILSTM is based on a novel integration of the chaotic butterfly optimization algorithm (CBOA) and particle swarm optimization (PSO) to improve the accuracy of the LSTM algorithm. The ILSTM was then used to build an efficient intrusion detection system for binary and multi-class classification cases. The proposed algorithm has two phases: phase one trains a conventional LSTM network to obtain initial weights, and phase two uses the hybrid swarm algorithms, CBOA and PSO, to optimize the weights of the LSTM to improve the accuracy. The performance of ILSTM and the intrusion detection system was evaluated using two public datasets (NSL-KDD and LITNET-2020) under nine performance metrics. The results showed that the proposed ILSTM algorithm outperformed the original LSTM and other related deep-learning algorithms in accuracy and precision. The ILSTM achieved an accuracy of 93.09% and a precision of 96.86%, while LSTM gave an accuracy of 82.74% and a precision of 76.49%. The ILSTM also performed better than LSTM on both datasets, and statistical analysis showed that ILSTM's improvement over LSTM is statistically significant. Further, the proposed ILSTM gave better results for multi-class classification of intrusion types such as DoS, Probe, and U2R attacks.

  5. IoT Data from IoT-23

    • ieee-dataport.org
    Updated Jul 8, 2024
    Cite
    Maria Balega (2024). IoT Data from IoT-23 [Dataset]. https://ieee-dataport.org/documents/iot-data-iot-23-nsl-kdd-and-toniot
    Dataset updated
    Jul 8, 2024
    Authors
    Maria Balega
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    As the Internet of Things (IoT) continues to evolve

  6. Datasets Repository

    • figshare.com
    html
    Updated Apr 20, 2024
    Cite
    Fabio Schuartz (2024). Datasets Repository [Dataset]. http://doi.org/10.6084/m9.figshare.25656966.v1
    Explore at:
    Available download formats: html
    Dataset updated
    Apr 20, 2024
    Dataset provided by
    figshare
    Authors
    Fabio Schuartz
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The NSL-KDD and CICIDS2017 datasets used in the research.

  7. Description of the NSL-KDD dataset attack categories.

    • plos.figshare.com
    xls
    Updated May 23, 2024
    Cite
    Arshad Hashmi; Omar M. Barukab; Ahmad Hamza Osman (2024). Description of the NSL-KDD dataset attack categories. [Dataset]. http://doi.org/10.1371/journal.pone.0302294.t002
    Explore at:
    Available download formats: xls
    Dataset updated
    May 23, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Arshad Hashmi; Omar M. Barukab; Ahmad Hamza Osman
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Description of the NSL-KDD dataset attack categories.

  8. Performance metrics of NSL-KDD dataset using MCL-FWA-BILSTM model.

    • figshare.com
    xls
    Updated May 23, 2024
    Cite
    Arshad Hashmi; Omar M. Barukab; Ahmad Hamza Osman (2024). Performance metrics of NSL-KDD dataset using MCL-FWA-BILSTM model. [Dataset]. http://doi.org/10.1371/journal.pone.0302294.t005
    Explore at:
    Available download formats: xls
    Dataset updated
    May 23, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Arshad Hashmi; Omar M. Barukab; Ahmad Hamza Osman
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Performance metrics of NSL-KDD dataset using MCL-FWA-BILSTM model.

  9. Summary of LITNET-2020 dataset.

    • plos.figshare.com
    bin
    Updated Aug 1, 2023
    Cite
    Asmaa Ahmed Awad; Ahmed Fouad Ali; Tarek Gaber (2023). Summary of LITNET-2020 dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0284795.t006
    Explore at:
    Available download formats: bin
    Dataset updated
    Aug 1, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Asmaa Ahmed Awad; Ahmed Fouad Ali; Tarek Gaber
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Over the years, intrusion detection systems have played a crucial role in network security by discovering attacks in network traffic and generating an alarm signal to be sent to the security team. Machine learning methods, e.g., Support Vector Machine and K-Nearest Neighbour, have been used to build intrusion detection systems, but such systems still suffer from low accuracy and high false-alarm rates. Deep learning models (e.g., Long Short-Term Memory, LSTM) have been employed in designing intrusion detection systems to address this issue. However, LSTM needs a high number of iterations to achieve high performance. In this paper, a novel and improved version of the Long Short-Term Memory (ILSTM) algorithm is proposed. The ILSTM is based on a novel integration of the chaotic butterfly optimization algorithm (CBOA) and particle swarm optimization (PSO) to improve the accuracy of the LSTM algorithm. The ILSTM was then used to build an efficient intrusion detection system for binary and multi-class classification cases. The proposed algorithm has two phases: phase one trains a conventional LSTM network to obtain initial weights, and phase two uses the hybrid swarm algorithms, CBOA and PSO, to optimize the weights of the LSTM to improve the accuracy. The performance of ILSTM and the intrusion detection system was evaluated using two public datasets (NSL-KDD and LITNET-2020) under nine performance metrics. The results showed that the proposed ILSTM algorithm outperformed the original LSTM and other related deep-learning algorithms in accuracy and precision. The ILSTM achieved an accuracy of 93.09% and a precision of 96.86%, while LSTM gave an accuracy of 82.74% and a precision of 76.49%. The ILSTM also performed better than LSTM on both datasets, and statistical analysis showed that ILSTM's improvement over LSTM is statistically significant. Further, the proposed ILSTM gave better results for multi-class classification of intrusion types such as DoS, Probe, and U2R attacks.

  10. LSTM model parameters.

    • plos.figshare.com
    xls
    Updated Jan 24, 2024
    + more versions
    Cite
    Chadia E. L. Asry; Ibtissam Benchaji; Samira Douzi; Bouabid E. L. Ouahidi (2024). LSTM model parameters. [Dataset]. http://doi.org/10.1371/journal.pone.0295801.t007
    Explore at:
    Available download formats: xls
    Dataset updated
    Jan 24, 2024
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Chadia E. L. Asry; Ibtissam Benchaji; Samira Douzi; Bouabid E. L. Ouahidi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The escalating prevalence of cybersecurity risks calls for a focused strategy to attain efficient solutions. This study introduces a detection model that employs a tailored methodology integrating feature selection using SHAP values, a shallow learning algorithm called PV-DM, and machine learning classifiers like XGBoost. The efficacy of our suggested methodology is demonstrated on the NSL-KDD and UNSW-NB15 datasets. Our approach on the NSL-KDD dataset exhibits exceptional performance, with an accuracy of 98.92%, precision of 98.92%, recall of 95.44%, and an F1-score of 96.77%. Notably, this performance is achieved using only four features, indicating the efficiency of our approach. On the UNSW-NB15 dataset, the proposed methodology achieves an accuracy of 82.86%, precision of 84.07%, recall of 77.70%, and an F1-score of 80.20%, using only six features. Our research findings provide substantial evidence of the enhanced performance of the proposed model compared to a traditional deep-learning model across all performance metrics.

  11. MCL-FWA-BILSTM accuracy comparison with existing approaches for multiclass...

    • plos.figshare.com
    xls
    Updated May 23, 2024
    + more versions
    Cite
    Arshad Hashmi; Omar M. Barukab; Ahmad Hamza Osman (2024). MCL-FWA-BILSTM accuracy comparison with existing approaches for multiclass classification with state of art on UNSW-NB15 and NSL-KDD. [Dataset]. http://doi.org/10.1371/journal.pone.0302294.t010
    Explore at:
    Available download formats: xls
    Dataset updated
    May 23, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Arshad Hashmi; Omar M. Barukab; Ahmad Hamza Osman
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    MCL-FWA-BILSTM accuracy comparison with existing approaches for multiclass classification with state of art on UNSW-NB15 and NSL-KDD.

  12. Table of parameters of the PVDM.

    • plos.figshare.com
    xls
    Updated Jan 24, 2024
    + more versions
    Cite
    Chadia E. L. Asry; Ibtissam Benchaji; Samira Douzi; Bouabid E. L. Ouahidi (2024). Table of parameters of the PVDM. [Dataset]. http://doi.org/10.1371/journal.pone.0295801.t005
    Explore at:
    Available download formats: xls
    Dataset updated
    Jan 24, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Chadia E. L. Asry; Ibtissam Benchaji; Samira Douzi; Bouabid E. L. Ouahidi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The escalating prevalence of cybersecurity risks calls for a focused strategy to attain efficient solutions. This study introduces a detection model that employs a tailored methodology integrating feature selection using SHAP values, a shallow learning algorithm called PV-DM, and machine learning classifiers like XGBoost. The efficacy of our suggested methodology is demonstrated on the NSL-KDD and UNSW-NB15 datasets. Our approach on the NSL-KDD dataset exhibits exceptional performance, with an accuracy of 98.92%, precision of 98.92%, recall of 95.44%, and an F1-score of 96.77%. Notably, this performance is achieved using only four features, indicating the efficiency of our approach. On the UNSW-NB15 dataset, the proposed methodology achieves an accuracy of 82.86%, precision of 84.07%, recall of 77.70%, and an F1-score of 80.20%, using only six features. Our research findings provide substantial evidence of the enhanced performance of the proposed model compared to a traditional deep-learning model across all performance metrics.

