9 datasets found
  1. The accuracy result on NSL-KDD dataset.

    • plos.figshare.com
    xls
    Updated Jun 21, 2024
    Cite
    Nasrullah Khan; Muhammad Ismail Mohmand; Sadaqat ur Rehman; Zia Ullah; Zahid Khan; Wadii Boulila (2024). The accuracy result on NSL-KDD dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0299666.t004
    Available download formats: xls
    Dataset updated
    Jun 21, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Nasrullah Khan; Muhammad Ismail Mohmand; Sadaqat ur Rehman; Zia Ullah; Zahid Khan; Wadii Boulila
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Computer networks face vulnerability to numerous attacks, which pose significant threats to our data security and the freedom of communication. This paper introduces a novel intrusion detection technique that diverges from traditional methods by leveraging Recurrent Neural Networks (RNNs) for both data preprocessing and feature extraction. The proposed process is based on the following steps: (1) training the data using RNNs, (2) extracting features from their hidden layers, and (3) applying various classification algorithms. This methodology offers significant advantages and greatly differs from existing intrusion detection practices. The effectiveness of our method is demonstrated through trials on the Network Security Laboratory (NSL) and Canadian Institute for Cybersecurity (CIC) 2017 datasets, where the application of RNNs for intrusion detection shows substantial practical implications. Specifically, we achieved accuracy scores of 99.6% with Decision Tree, Random Forest, and CatBoost classifiers on the NSL dataset, and 99.8% and 99.9%, respectively, on the CIC 2017 dataset. By reversing the conventional sequence of training data with RNNs and then extracting features before applying classification algorithms, our approach provides a major shift in intrusion detection methodologies. This modification in the pipeline underscores the benefits of utilizing RNNs for feature extraction and data preprocessing, meeting the critical need to safeguard data security and communication freedom against ever-evolving network threats.
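
    The three steps in the description above can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the weights are randomly initialised rather than trained (so step 1 is omitted), a nearest-centroid rule stands in for the Decision Tree, Random Forest, and CatBoost classifiers, and all function names are hypothetical.

```python
import math
import random


def random_rnn_weights(n_hidden, n_input, seed=0):
    """Create untrained Elman-RNN weights (assumption: a stand-in for step 1, training)."""
    rng = random.Random(seed)
    w_in = [[rng.uniform(-0.5, 0.5) for _ in range(n_input)] for _ in range(n_hidden)]
    w_rec = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_hidden)]
    bias = [0.0] * n_hidden
    return w_in, w_rec, bias


def hidden_features(sequence, w_in, w_rec, bias):
    """Step 2: run the RNN over a sequence and return the final hidden state."""
    hidden = [0.0] * len(bias)
    for x in sequence:
        # Each new hidden unit combines the current input with the previous hidden state.
        hidden = [
            math.tanh(
                sum(w * xi for w, xi in zip(w_in[j], x))
                + sum(w * hk for w, hk in zip(w_rec[j], hidden))
                + bias[j]
            )
            for j in range(len(bias))
        ]
    return hidden


def fit_centroids(features, labels):
    """Step 3 (stand-in classifier): average the feature vectors of each class."""
    sums, counts = {}, {}
    for vec, label in zip(features, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, vec):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(centroids[label], vec)),
    )
```

    In a real pipeline the hidden-state vectors would come from a trained network and would be passed to whichever classifier is being evaluated; the sketch only shows how the stages connect.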

  2. BETH Dataset

    • kaggle.com
    Updated Jul 29, 2021
    Cite
    Kate Highnam (2021). BETH Dataset [Dataset]. https://www.kaggle.com/katehighnam/beth-dataset/code
    Dataset updated
    Jul 29, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Kate Highnam
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset corresponds to the paper "BETH Dataset: Real Cybersecurity Data for Anomaly Detection Research" by Kate Highnam* (@jinxmirror13), Kai Arulkumaran* (@kaixhin), Zachary Hanif*, and Nicholas R. Jennings (@LboroVC).

    This paper was published in the ICML Workshop on Uncertainty and Robustness in Deep Learning 2021 and the Conference on Applied Machine Learning for Information Security (CAMLIS 2021).

    THIS DATASET IS STILL BEING UPDATED

    Context

    When deploying machine learning (ML) models in the real world, anomalous data points and shifts in the data distribution are inevitable. From a cyber security perspective, these anomalies and dataset shifts are driven by both defensive and adversarial advancement. To withstand the cost of critical system failure, the development of robust models is therefore key to the performance, protection, and longevity of deployed defensive systems.

    We present the BPF-extended tracking honeypot (BETH) dataset as the first cybersecurity dataset for uncertainty and robustness benchmarking. Collected using a novel honeypot tracking system, our dataset has the following properties that make it attractive for the development of robust ML methods:

    1. At over eight million data points, this is one of the largest cyber security datasets available.
    2. It contains modern host activity and attacks.
    3. It is fully labelled.
    4. It contains highly structured but heterogeneous features.
    5. Each host contains benign activity and at most a single attack, which is ideal for behavioural analysis and other research tasks.

    In addition to the described dataset, further data is currently being collected and analysed to add alternative attack vectors to the dataset.

    There are several existing cyber security datasets used in ML research, including the KDD Cup 1999 Data (Hettich & Bay, 1999), the 1998 DARPA Intrusion Detection Evaluation Dataset (Labs, 1998; Lippmann et al., 2000), the ISCX IDS 2012 dataset (Shiravi et al., 2012), and NSL-KDD (Tavallaee et al., 2009), which primarily removes duplicates from the KDD Cup 1999 Data. Each includes millions of records of realistic activity for enterprise applications, with labels for attacks or benign activity. The KDD1999, NSL-KDD, and ISCX datasets contain network traffic, while the DARPA1998 dataset also includes limited process calls. However, these datasets are at best almost a decade old and were collected on on-premises servers. In contrast, BETH contains modern host activity and activity collected from cloud services, making it relevant for current real-world deployments. In addition, some datasets include artificial user activity (Shiravi et al., 2012), while BETH contains only real activity. BETH is also one of the few datasets to include both kernel-process and network logs, providing a holistic view of malicious behaviour.

    Content

    The BETH dataset currently represents 8,004,918 events collected over 23 honeypots, running for about five noncontiguous hours on a major cloud provider. For benchmarking and discussion, we selected the initial subset of the process logs. This subset was further divided into training, validation, and testing sets with a rough 60/20/20 split based on host, quantity of logs generated, and the activity logged; only the test set includes an attack.
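
    A host-level split like the one described above can be sketched as follows. This is only an illustration under stated assumptions: it balances the split by event count per host alone (the paper also takes the logged activity into account), and the function name and host labels are hypothetical.

```python
from collections import Counter


def split_hosts(host_of_event, fractions=(0.6, 0.2, 0.2)):
    """Assign whole hosts to train/val/test so event counts roughly match `fractions`.

    `host_of_event` holds one host identifier per logged event (hypothetical input).
    """
    names = ("train", "val", "test")
    counts = Counter(host_of_event)
    total = sum(counts.values())
    targets = [fraction * total for fraction in fractions]
    sizes = [0, 0, 0]
    assignment = {}
    # Place the busiest hosts first; ties are broken by host name for determinism.
    for host, n_events in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        # Pick the split that is currently furthest below its target size.
        index = max(range(3), key=lambda k: targets[k] - sizes[k])
        assignment[host] = names[index]
        sizes[index] += n_events
    return assignment
```

    Because each host is assigned as a unit, no host contributes events to more than one split, which is why the resulting proportions are only approximately 60/20/20.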

    The dataset is composed of two sensor logs: kernel-level process calls and network traffic. The initial benchmark subset only includes process logs. Each process call consists of 14 raw features and 2 hand-crafted labels.

    See the paper for more details. For details on the events recorded within the logs, see this report.

    Benchmarks

    Code for our benchmarks, as detailed in the paper, is available through GitHub at: https://github.com/jinxmirror13/BETH_Dataset_Analysis

    Acknowledgements

    Thank you to Dr. Arinbjörn Kolbeinsson for his assistance in analysing the data and the reviewers for their positive feedback.

  3. Confusion matrix of NSL-KDD dataset.

    • plos.figshare.com
    xls
    Updated May 31, 2023
    Cite
    Magdy M. Fadel; Sally M. El-Ghamrawy; Amr M. T. Ali-Eldin; Mohammed K. Hassan; Ali I. El-Desoky (2023). Confusion matrix of NSL-KDD dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0271436.t005
    Available download formats: xls
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Magdy M. Fadel; Sally M. El-Ghamrawy; Amr M. T. Ali-Eldin; Mohammed K. Hassan; Ali I. El-Desoky
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Confusion matrix of NSL-KDD dataset.

  4. MTA-KDD'19 Dataset

    • paperswithcode.com
    Updated Feb 13, 2020
    Cite
    Ivan Letteri; Antonio Di Cecco; Abeer Dyoub; Giuseppe Della Penna (2020). MTA-KDD'19 Dataset [Dataset]. https://paperswithcode.com/dataset/mta-kdd-19
    Dataset updated
    Feb 13, 2020
    Authors
    Ivan Letteri; Antonio Di Cecco; Abeer Dyoub; Giuseppe Della Penna
    Description

    Malware Traffic Analysis Knowledge Dataset 2019 (MTA-KDD'19) is an updated and refined dataset specifically tailored to train and evaluate machine-learning-based malware traffic analysis algorithms. To generate it, the authors started from the largest databases of network traffic captures available online, deriving a dataset with a set of widely applicable features and then cleaning and preprocessing it to remove noise, handle missing data, and keep its size as small as possible. The resulting dataset is not biased by any specific application (although specifically addressed to machine learning algorithms), and the entire process can run automatically to keep it updated.

  5. CSE-CIC-IDS2018 and NSLKDD Image Dataset

    • ieee-dataport.org
    Updated Jun 19, 2023
    Cite
    Yonatan Embiza Tadesse (2023). CSE-CIC-IDS2018 and NSLKDD Image Dataset [Dataset]. https://ieee-dataport.org/documents/cse-cic-ids2018-and-nslkdd-image-dataset
    Dataset updated
    Jun 19, 2023
    Authors
    Yonatan Embiza Tadesse
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    CSE-CIC-IDS2018

  6. Performance metrics of NSL-KDD dataset using MCL-FWA-BILSTM model.

    • figshare.com
    xls
    Updated May 23, 2024
    Cite
    Arshad Hashmi; Omar M. Barukab; Ahmad Hamza Osman (2024). Performance metrics of NSL-KDD dataset using MCL-FWA-BILSTM model. [Dataset]. http://doi.org/10.1371/journal.pone.0302294.t005
    Available download formats: xls
    Dataset updated
    May 23, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Arshad Hashmi; Omar M. Barukab; Ahmad Hamza Osman
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Performance metrics of NSL-KDD dataset using MCL-FWA-BILSTM model.

  7. Classification performance of our method with 6 and 4 attributes on NSL-KDD....

    • figshare.com
    xls
    Updated Jan 24, 2024
    Cite
    Chadia E. L. Asry; Ibtissam Benchaji; Samira Douzi; Bouabid E. L. Ouahidi (2024). Classification performance of our method with 6 and 4 attributes on NSL-KDD. [Dataset]. http://doi.org/10.1371/journal.pone.0295801.t008
    Available download formats: xls
    Dataset updated
    Jan 24, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Chadia E. L. Asry; Ibtissam Benchaji; Samira Douzi; Bouabid E. L. Ouahidi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Classification performance of our method with 6 and 4 attributes on NSL-KDD.

  8. T3Set: Table Tennis Training Multimodal Dataset

    • zenodo.org
    bin, pdf
    Updated May 27, 2025
    Cite
    Ji Ma (2025). T3Set: Table Tennis Training Multimodal Dataset [Dataset]. http://doi.org/10.5281/zenodo.15516144
    Available download formats: bin, pdf
    Dataset updated
    May 27, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Ji Ma
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Time period covered
    May 26, 2025
    Description

    This is the dataset for the KDD'25 (dataset and benchmark track) full paper "T3Set: A Multimodal Dataset with Targeted Suggestions for LLM-based Virtual Coach in Table Tennis Training".

    T3Set (Table Tennis Training) is a multimodal dataset with aligned video-sensor-text data in table tennis training. The key features of T3Set include (1) temporal alignment between sensor data, video data, and text data, and (2) high-quality targeted suggestions that are consistent with a predefined suggestion taxonomy.

    The scripts we used for dataset construction and data cleaning are provided in the GitHub repository: https://github.com/jima-cs/t3set

    If you find this dataset useful, please cite our paper:
    ```bibtex
    @inproceedings{
    ma2025t3set,
    title={T3Set: A Multimodal Dataset with Targeted Suggestions for LLM-based Virtual Coach in Table Tennis Training},
    author={Ji Ma and Jiale Wu and Haoyu Wang and Yanze Zhang and Xiao Xie and Zheng Zhou and Jiachen Wang and Yingcai Wu},
    year={2025},
    booktitle={Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2},
    doi={10.1145/3711896.3737407},
    pages={}
    }
    ```

  9. Datasets characteristics of KDDCup 2015 and XuetangX.

    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Feng Pan; Bingyao Huang; Chunhong Zhang; Xinning Zhu; Zhenyu Wu; Moyu Zhang; Yang Ji; Zhanfei Ma; Zhengchen Li (2023). Datasets characteristics of KDDCup 2015 and XuetangX. [Dataset]. http://doi.org/10.1371/journal.pone.0267138.t001
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Feng Pan; Bingyao Huang; Chunhong Zhang; Xinning Zhu; Zhenyu Wu; Moyu Zhang; Yang Ji; Zhanfei Ma; Zhengchen Li
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Datasets characteristics of KDDCup 2015 and XuetangX.

