9 datasets found

The accuracy result on NSL-KDD dataset.
plos.figshare.com
xls
Updated Jun 21, 2024
Cite
Nasrullah Khan; Muhammad Ismail Mohmand; Sadaqat ur Rehman; Zia Ullah; Zahid Khan; Wadii Boulila (2024). The accuracy result on NSL-KDD dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0299666.t004
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0299666.t004
Dataset updated
Jun 21, 2024
Dataset provided by
PLOS ONE
Authors
Nasrullah Khan; Muhammad Ismail Mohmand; Sadaqat ur Rehman; Zia Ullah; Zahid Khan; Wadii Boulila
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Computer networks face vulnerability to numerous attacks, which pose significant threats to our data security and the freedom of communication. This paper introduces a novel intrusion detection technique that diverges from traditional methods by leveraging Recurrent Neural Networks (RNNs) for both data preprocessing and feature extraction. The proposed process is based on the following steps: (1) training the data using RNNs, (2) extracting features from their hidden layers, and (3) applying various classification algorithms. This methodology offers significant advantages and greatly differs from existing intrusion detection practices. The effectiveness of our method is demonstrated through trials on the Network Security Laboratory (NSL) and Canadian Institute for Cybersecurity (CIC) 2017 datasets, where the application of RNNs for intrusion detection shows substantial practical implications. Specifically, we achieved accuracy scores of 99.6% with Decision Tree, Random Forest, and CatBoost classifiers on the NSL dataset, and 99.8% and 99.9%, respectively, on the CIC 2017 dataset. By reversing the conventional sequence of training data with RNNs and then extracting features before applying classification algorithms, our approach provides a major shift in intrusion detection methodologies. This modification in the pipeline underscores the benefits of utilizing RNNs for feature extraction and data preprocessing, meeting the critical need to safeguard data security and communication freedom against ever-evolving network threats.
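The description's steps (2) and (3), reading an RNN's hidden state as a feature vector and handing it to a separate classifier, can be illustrated with a minimal sketch. This is not the authors' code. It assumes a plain Elman RNN with random, untrained weights (step (1), training, is omitted), made-up toy sequences rather than NSL-KDD records, and a nearest-centroid classifier standing in for the Decision Tree, Random Forest, and CatBoost models named above. All function and variable names are hypothetical.

```python
import math
import random


def rnn_final_hidden(sequence, w_in, w_rec, bias):
    """Run a plain Elman RNN over one sequence and return its last hidden state."""
    hidden = [0.0] * len(bias)
    for x_t in sequence:
        hidden = [
            math.tanh(
                bias[j]
                + sum(w * x for w, x in zip(w_in[j], x_t))
                + sum(w * h for w, h in zip(w_rec[j], hidden))
            )
            for j in range(len(bias))
        ]
    return hidden


def fit_centroids(features, labels):
    """Average the feature vectors of each class (stand-in classifier)."""
    grouped = {}
    for vector, label in zip(features, labels):
        grouped.setdefault(label, []).append(vector)
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def predict(centroids, vector):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda label: sum((c - v) ** 2 for c, v in zip(centroids[label], vector)),
    )


random.seed(0)
N_INPUT, N_HIDDEN = 3, 4
# ASSUMPTION: random, untrained weights; in step (1) these would be learned.
w_in = [[random.uniform(-1, 1) for _ in range(N_INPUT)] for _ in range(N_HIDDEN)]
w_rec = [[random.uniform(-1, 1) for _ in range(N_HIDDEN)] for _ in range(N_HIDDEN)]
bias = [0.0] * N_HIDDEN

# Made-up toy sequences (5 time steps, 3 features each), not real traffic records.
sequences = [[[random.random() for _ in range(N_INPUT)] for _ in range(5)] for _ in range(6)]
labels = ["benign", "attack", "benign", "attack", "benign", "attack"]

# Step (2): hidden states become feature vectors. Step (3): a separate classifier uses them.
features = [rnn_final_hidden(seq, w_in, w_rec, bias) for seq in sequences]
centroids = fit_centroids(features, labels)
predictions = [predict(centroids, f) for f in features]
```

With untrained weights the predictions carry no meaning; the sketch only shows how the hidden-state features flow into a downstream classifier.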
BETH Dataset
kaggle.com
Updated Jul 29, 2021
Cite
Kate Highnam (2021). BETH Dataset [Dataset]. https://www.kaggle.com/katehighnam/beth-dataset/code
Available formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Dataset updated
Jul 29, 2021
Dataset provided by
Kaggle: http://kaggle.com/
Authors
Kate Highnam
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This dataset corresponds to the paper "BETH Dataset: Real Cybersecurity Data for Anomaly Detection Research" by Kate Highnam* (@jinxmirror13), Kai Arulkumaran* (@kaixhin), Zachary Hanif*, and Nicholas R. Jennings (@LboroVC).

This paper was published in the ICML Workshop on Uncertainty and Robustness in Deep Learning 2021 and the Conference on Applied Machine Learning for Information Security (CAMLIS 2021).

THIS DATASET IS STILL BEING UPDATED

Context

When deploying machine learning (ML) models in the real world, anomalous data points and shifts in the data distribution are inevitable. From a cyber security perspective, these anomalies and dataset shifts are driven by both defensive and adversarial advancement. To withstand the cost of critical system failure, the development of robust models is therefore key to the performance, protection, and longevity of deployed defensive systems.

We present the BPF-extended tracking honeypot (BETH) dataset as the first cybersecurity dataset for uncertainty and robustness benchmarking. Collected using a novel honeypot tracking system, our dataset has the following properties that make it attractive for the development of robust ML methods:
1. At over eight million data points, this is one of the largest cyber security datasets available
2. It contains modern host activity and attacks
3. It is fully labelled
4. It contains highly structured but heterogeneous features
5. Each host contains benign activity and at most a single attack, which is ideal for behavioural analysis and other research tasks.

In addition to the described dataset

Further data is currently being collected and analysed to add alternative attack vectors to the dataset.

There are several existing cyber security datasets used in ML research, including the KDD Cup 1999 Data (Hettich & Bay, 1999), the 1998 DARPA Intrusion Detection Evaluation Dataset (Labs, 1998; Lippmann et al., 2000), the ISCX IDS 2012 dataset (Shiravi et al., 2012), and NSL-KDD (Tavallaee et al., 2009), which primarily removes duplicates from the KDD Cup 1999 Data. Each includes millions of records of realistic activity for enterprise applications, with labels for attacks or benign activity. The KDD1999, NSL-KDD, and ISCX datasets contain network traffic, while the DARPA1998 dataset also includes limited process calls. However, these datasets are at best almost a decade old and were collected on on-premises servers. In contrast, BETH contains modern host activity and activity collected from cloud services, making it relevant for current real-world deployments. In addition, some datasets include artificial user activity (Shiravi et al., 2012), while BETH contains only real activity. BETH is also one of the few datasets to include both kernel-process and network logs, providing a holistic view of malicious behaviour.

Content

The BETH dataset currently represents 8,004,918 events collected over 23 honeypots, running for about five noncontiguous hours on a major cloud provider. For benchmarking and discussion, we selected the initial subset of the process logs. This subset was further divided into training, validation, and testing sets with a rough 60/20/20 split based on host, quantity of logs generated, and the activity logged; only the test set includes an attack.
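A host-level split of this kind can be sketched as follows. This is an illustrative assumption, not the procedure the BETH authors used: whole hosts are assigned greedily, largest first, to training and validation until rough 60% and 20% event budgets are filled, and any host marked as attacked is forced into the test set. The function name, host names, and event counts are all hypothetical.

```python
def split_by_host(events_per_host, attacked_hosts, train_pct=60, validation_pct=20):
    """Assign whole hosts to train/validation/test by event count.

    Hypothetical helper: hosts never straddle two sets, and attacked hosts
    always land in the test set.
    """
    total = sum(events_per_host.values())
    budgets = {"train": train_pct, "validation": validation_pct}
    filled = {"train": 0, "validation": 0}
    split = {"train": [], "validation": [], "test": []}

    # Visit the largest hosts first so the big ones fill the budgets early.
    for host in sorted(events_per_host, key=events_per_host.get, reverse=True):
        count = events_per_host[host]
        if host in attacked_hosts:
            split["test"].append(host)
            continue
        for name in ("train", "validation"):
            # Integer comparison avoids floating-point rounding on the budget.
            if (filled[name] + count) * 100 <= budgets[name] * total:
                split[name].append(host)
                filled[name] += count
                break
        else:
            # Neither budget has room left, so the host goes to the test set.
            split["test"].append(host)
    return split


# Made-up event counts for five hypothetical hosts; "h5" is the attacked one.
events_per_host = {"h1": 300, "h2": 300, "h3": 200, "h4": 100, "h5": 100}
split = split_by_host(events_per_host, attacked_hosts={"h5"})
print(split)
```

Splitting on hosts rather than on individual events keeps all logs from one machine in the same set, so validation and test data come from hosts the model has not seen.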

The dataset is composed of two sensor logs: kernel-level process calls and network traffic. The initial benchmark subset only includes process logs. Each process call consists of 14 raw features and 2 hand-crafted labels.

See the paper for more details. For details on the events recorded within the logs, see this report.

Benchmarks

Code for our benchmarks, as detailed in the paper, is available on GitHub at: https://github.com/jinxmirror13/BETH_Dataset_Analysis

Acknowledgements

Thank you to Dr. Arinbjörn Kolbeinsson for his assistance in analysing the data and the reviewers for their positive feedback.
Confusion matrix of NSL-KDD dataset.
plos.figshare.com
xls
Updated May 31, 2023
Cite
Magdy M. Fadel; Sally M. El-Ghamrawy; Amr M. T. Ali-Eldin; Mohammed K. Hassan; Ali I. El-Desoky (2023). Confusion matrix of NSL-KDD dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0271436.t005
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0271436.t005
Dataset updated
May 31, 2023
Dataset provided by
PLOS ONE
Authors
Magdy M. Fadel; Sally M. El-Ghamrawy; Amr M. T. Ali-Eldin; Mohammed K. Hassan; Ali I. El-Desoky
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Confusion matrix of NSL-KDD dataset.
MTA-KDD'19 Dataset
paperswithcode.com
Updated Feb 13, 2020
Cite
Ivan Letteri; Antonio Di Cecco; Abeer Dyoub; Giuseppe Della Penna (2020). MTA-KDD'19 Dataset [Dataset]. https://paperswithcode.com/dataset/mta-kdd-19
Dataset updated
Feb 13, 2020
Authors
Ivan Letteri; Antonio Di Cecco; Abeer Dyoub; Giuseppe Della Penna
Description
Malware Traffic Analysis Knowledge Dataset 2019 (MTA-KDD'19) is an updated and refined dataset specifically tailored to train and evaluate machine-learning-based malware traffic analysis algorithms. To generate it, the authors started from the largest databases of network traffic captures available online, derived a dataset with a set of widely applicable features, and then cleaned and preprocessed it to remove noise, handle missing data, and keep its size as small as possible. The resulting dataset is not biased by any specific application (although it is specifically addressed to machine learning algorithms), and the entire process can run automatically to keep it updated.
CSE-CIC-IDS2018 and NSLKDD Image Dataset
ieee-dataport.org
Updated Jun 19, 2023
Cite
Yonatan Embiza Tadesse (2023). CSE-CIC-IDS2018 and NSLKDD Image Dataset [Dataset]. https://ieee-dataport.org/documents/cse-cic-ids2018-and-nslkdd-image-dataset
Dataset updated
Jun 19, 2023
Authors
Yonatan Embiza Tadesse
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
CSE-CIC-IDS2018
Performance metrics of NSL-KDD dataset using MCL-FWA-BILSTM model.
figshare.com
xls
Updated May 23, 2024
Cite
Arshad Hashmi; Omar M. Barukab; Ahmad Hamza Osman (2024). Performance metrics of NSL-KDD dataset using MCL-FWA-BILSTM model. [Dataset]. http://doi.org/10.1371/journal.pone.0302294.t005
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0302294.t005
Dataset updated
May 23, 2024
Dataset provided by
PLOS ONE
Authors
Arshad Hashmi; Omar M. Barukab; Ahmad Hamza Osman
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Performance metrics of NSL-KDD dataset using MCL-FWA-BILSTM model.
Classification performance of our method with 6 and 4 attributes on NSL-KDD....
figshare.com
xls
Updated Jan 24, 2024
Cite
Chadia E. L. Asry; Ibtissam Benchaji; Samira Douzi; Bouabid E. L. Ouahidi (2024). Classification performance of our method with 6 and 4 attributes on NSL-KDD. [Dataset]. http://doi.org/10.1371/journal.pone.0295801.t008
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0295801.t008
Dataset updated
Jan 24, 2024
Dataset provided by
PLOS ONE
Authors
Chadia E. L. Asry; Ibtissam Benchaji; Samira Douzi; Bouabid E. L. Ouahidi
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Classification performance of our method with 6 and 4 attributes on NSL-KDD.
T3Set: Table Tennis Training Multimodal Dataset
zenodo.org
bin, pdf
Updated May 27, 2025
Cite
Ji Ma (2025). T3Set: Table Tennis Training Multimodal Dataset [Dataset]. http://doi.org/10.5281/zenodo.15516144
Available download formats: bin, pdf
Unique identifier
https://doi.org/10.5281/zenodo.15516144
Dataset updated
May 27, 2025
Dataset provided by
Zenodo: http://zenodo.org/
Authors
Ji Ma
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Time period covered
May 26, 2025
Description
This is the dataset for the KDD'25 (dataset and benchmark track) full paper "T3Set: A Multimodal Dataset with Targeted Suggestions for LLM-based Virtual Coach in Table Tennis Training".

T3Set (Table Tennis Training) is a multimodal dataset with aligned video-sensor-text data in table tennis training. The key features of T3Set include (1) temporal alignment between sensor data, video data, and text data, and (2) high-quality targeted suggestions that are consistent with a predefined suggestion taxonomy.

The scripts we used for dataset construction and data cleaning are provided in the GitHub repo: https://github.com/jima-cs/t3set

If you find this dataset useful, please cite our paper:
```bibtex
@inproceedings{ma2025t3set,
title={T3Set: A Multimodal Dataset with Targeted Suggestions for LLM-based Virtual Coach in Table Tennis Training},
author={Ji Ma and Jiale Wu and Haoyu Wang and Yanze Zhang and Xiao Xie and Zheng Zhou and Jiachen Wang and Yingcai Wu},
year={2025},
booktitle={Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2},
doi={10.1145/3711896.3737407},
pages={}
}
```
Datasets characteristics of KDDCup 2015 and XuetangX.
plos.figshare.com
xls
Updated Jun 1, 2023
Cite
Feng Pan; Bingyao Huang; Chunhong Zhang; Xinning Zhu; Zhenyu Wu; Moyu Zhang; Yang Ji; Zhanfei Ma; Zhengchen Li (2023). Datasets characteristics of KDDCup 2015 and XuetangX. [Dataset]. http://doi.org/10.1371/journal.pone.0267138.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0267138.t001
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS ONE
Authors
Feng Pan; Bingyao Huang; Chunhong Zhang; Xinning Zhu; Zhenyu Wu; Moyu Zhang; Yang Ji; Zhanfei Ma; Zhengchen Li
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Datasets characteristics of KDDCup 2015 and XuetangX.