Source
https://www.kaggle.com/datasets/dhoogla/unswnb15?resource=download
Dataset
This is an academic intrusion detection dataset. All credit goes to the original authors: Dr. Nour Moustafa and Dr. Jill Slay. Please cite their original paper and all other appropriate articles listed on the UNSW-NB15 page. The full dataset also offers the pcap, BRO and Argus files along with additional documentation. The modifications to the predesignated train-test sets are minimal… See the full description on the dataset page: https://huggingface.co/datasets/wwydmanski/UNSW-NB15.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
KDD98
https://choosealicense.com/licenses/gpl-3.0/
UNSW-NB15
This data comes from the Train and Test CSV files provided by UNSW-NB15.
Labels
The columns of the dataset are as follows.
#    Column    Non-Null Count    Dtype
0    id        82332 non-null    int64
1    dur       82332 non-null    float64
2    proto     82332 non-null    object
3    service   82332 non-null    object
4    state     82332 non-null    object
5    spkts     82332 non-null    int64
6    dpkts     82332 non-null    int64
7    sbytes    82332 non-null    int64
8    dbytes    82332 non-null    int64
9    rate…
See the full description on the dataset page: https://huggingface.co/datasets/Mireu-Lab/UNSW-NB15.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The dataset is an extended version of UNSW-NB 15. It has 1 additional class synthesised and the data is normalised for ease of use. To cite the dataset, please reference the original paper with DOI: 10.1109/SmartNets61466.2024.10577645. The paper is published in IEEE SmartNets and can be accessed here: https://www.researchgate.net/publication/382034618_Blender-GAN_Multi-Target_Conditional_Generative_Adversarial_Network_for_Novel_Class_Synthetic_Data_Generation. Citation info: Madhubalan, Akshayraj & Gautam, Amit & Tiwary, Priya. (2024). Blender-GAN: Multi-Target Conditional Generative Adversarial Network for Novel Class Synthetic Data Generation. 1-7. 10.1109/SmartNets61466.2024.10577645. This dataset was made by Abluva Inc, a Palo Alto based, research-driven Data Protection firm. Our data protection platform empowers customers to secure data through advanced security mechanisms such as Fine Grained Access control and sophisticated depersonalization algorithms (e.g. Pseudonymization, Anonymization and Randomization). Abluva's Data Protection solutions facilitate data democratization within and outside the organizations, mitigating the concerns related to theft and compliance. The innovative intrusion detection algorithm by Abluva employs patented technologies for an intricately balanced approach that excludes normal access deviations, ensuring intrusion detection without disrupting the business operations. Abluva’s Solution enables organizations to extract further value from their data by enabling secure Knowledge Graphs and deploying Secure Data as a Service among other novel uses of data. Committed to providing a safe and secure environment, Abluva empowers organizations to unlock the full potential of their data.
The UNSW-NB 15 dataset is a hybrid dataset of real-world normal activities and synthetic contemporary attack behaviors.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Distribution of training and testing data by connection type from the UNSW-NB15 dataset [25].
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The rapid growth of the Internet and its integration into networks worldwide have made Intrusion Detection Systems (IDS) essential for preserving network security. Nonetheless, Intrusion Detection Systems face considerable difficulties, especially in precisely identifying attacks from minority classes. Current methodologies in the literature predominantly follow one of two strategies: either disregarding minority classes or using resampling techniques to balance class distributions. However, these methods may constrain overall system efficacy. This research uses Shapley Additive Explanations (SHAP) for feature selection with Recursive Feature Elimination with Cross-Validation (RFECV), employing XGBoost as the classifier. The model attained precision, recall, and F1-scores of 0.8095, 0.8293, and 0.8193, respectively, indicating improved identification of minority class attacks, namely “worms,” within the UNSW-NB15 dataset. To further validate the proposed approach, we used the CICIDS2019 and CICIoT2023 datasets, with findings affirming its efficacy in detecting and classifying minority class attacks.
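As a quick consistency check on the figures quoted above, the F1-score is the harmonic mean of precision and recall. The short sketch below recomputes it from the reported precision and recall; the function name `f1_score` is our own and is not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract for the "worms" class.
reported_precision = 0.8095
reported_recall = 0.8293

f1 = f1_score(reported_precision, reported_recall)
print(round(f1, 4))  # 0.8193, matching the reported F1-score
```

The recomputed value agrees with the reported F1-score to four decimal places, so the three numbers are mutually consistent.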
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Preprocessed UNSW-NB15 dataset without a header. The dataset is provided as a NumPy array for efficient loading. The header is in a separate file for ease of loading.
The train and test sets are identical to those of the original dataset.
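A headerless array plus a separate header file can be recombined along the following lines. This is only a sketch: the file names `train.npy` and `header.txt`, the comma-separated header layout, and the three example columns are assumptions made for illustration, so check the actual files before relying on it.

```python
import tempfile
from pathlib import Path

import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file names; the real dataset may use different ones.
    data_path = Path(tmp) / "train.npy"
    header_path = Path(tmp) / "header.txt"

    # Stand-in data so the sketch runs without the real download.
    np.save(data_path, np.array([[0.12, 6.0, 258.0], [1.5, 14.0, 1024.0]]))
    header_path.write_text("dur,spkts,sbytes\n")

    # Load the numeric array and the column names separately.
    data = np.load(data_path)
    columns = header_path.read_text().strip().split(",")

# Every column in the array should have exactly one name.
assert data.shape[1] == len(columns)

# Map names to positions so columns can be selected by name.
column_index = {name: i for i, name in enumerate(columns)}
sbytes = data[:, column_index["sbytes"]]
print(sbytes.tolist())  # [258.0, 1024.0]
```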
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Packet Capture (PCAP) files of the UNSW-NB15 and CIC-IDS2017 datasets are processed and labelled using the CSV files. Each packet is labelled by comparing eight distinct features: *Source IP, Destination IP, Source Port, Destination Port, Starting time, Ending time, Protocol and Time to live*. The dataset has dimensions N x 1504. All columns of the dataset are integers, so you can use this dataset directly in your machine learning models. Details of the whole processing and transformation are provided in the following GitHub repo:
https://github.com/Yasir-ali-farrukh/Payload-Byte
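The labelling idea described above (match each packet against the flow-level CSV records on its header fields and on the flow's time window) can be sketched as follows. This is not the Payload-Byte implementation: the class names, field names, addresses and the exact matching rule are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class FlowRecord:
    """One labelled row from the flow-level CSV (hypothetical field names)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    start_time: float
    end_time: float
    protocol: str
    ttl: int
    label: str


@dataclass(frozen=True)
class Packet:
    """Header fields of one captured packet (hypothetical field names)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    timestamp: float
    protocol: str
    ttl: int


def label_packet(packet: Packet, flows: Iterable[FlowRecord]) -> Optional[str]:
    """Return the label of the first flow matching the packet, else None."""
    for flow in flows:
        same_header = (
            packet.src_ip == flow.src_ip
            and packet.dst_ip == flow.dst_ip
            and packet.src_port == flow.src_port
            and packet.dst_port == flow.dst_port
            and packet.protocol == flow.protocol
            and packet.ttl == flow.ttl
        )
        # The packet must fall between the flow's start and end times.
        in_window = flow.start_time <= packet.timestamp <= flow.end_time
        if same_header and in_window:
            return flow.label
    return None


flows = [FlowRecord("10.0.0.1", "10.0.0.2", 1234, 80, 100.0, 105.0, "tcp", 64, "Exploits")]
inside = Packet("10.0.0.1", "10.0.0.2", 1234, 80, 102.5, "tcp", 64)
outside = Packet("10.0.0.1", "10.0.0.2", 1234, 80, 110.0, "tcp", 64)

print(label_packet(inside, flows))   # Exploits
print(label_packet(outside, flows))  # None
```

A linear scan keeps the sketch short; for real captures, indexing the flows by their header fields would avoid comparing every packet against every flow.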
You can use the tool available at the above-mentioned GitHub repo to generate the labelled dataset from scratch. All details of the processing and transformation are provided in the following paper:
```bibtex
@article{Payload,
  author = "Yasir Ali Farrukh and Irfan Khan and Syed Wali and David Bierbrauer and Nathaniel Bastian",
  title = "{Payload-Byte: A Tool for Extracting and Labeling Packet Capture Files of Modern Network Intrusion Detection Datasets}",
  year = "2022",
  month = "9",
  url = "https://www.techrxiv.org/articles/preprint/Payload-Byte_A_Tool_for_Extracting_and_Labeling_Packet_Capture_Files_of_Modern_Network_Intrusion_Detection_Datasets/20714221",
  doi = "10.36227/techrxiv.20714221.v1"
}
```
http://guides.library.uq.edu.au/deposit_your_data/terms_and_conditions
This dataset is an enhanced version of NetFlow-based datasets, incorporating 53 extracted features that provide detailed insights into network flows. The dataset includes binary and multi-class labels, distinguishing between normal traffic and nine different types of attacks. It is structured in CSV format, with each row representing a single labelled network flow. A key aspect of this dataset is the inclusion of temporal features, which allow traffic to be analysed in more detail over time. The dataset records precise timestamps for each flow, including start and end times, giving a more structured view of flow duration and activity patterns. Additionally, it captures inter-packet arrival time (IAT) statistics, including minimum, maximum, average, and standard deviation values for both source-to-destination and destination-to-source packet transmissions. Note that this data record contains minor changes to the dataset description, so it differs slightly from the description in the download files. The information presented in this data record is the most up-to-date.
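The IAT statistics mentioned above can be illustrated with a small sketch that derives the minimum, maximum, average and standard deviation of the gaps between consecutive packet timestamps in one direction of a flow. The function name `iat_stats`, the example timestamps and the use of the population standard deviation are assumptions; the dataset's own feature extractor may define these values differently.

```python
from statistics import mean, pstdev
from typing import Dict, Optional, Sequence


def iat_stats(timestamps: Sequence[float]) -> Optional[Dict[str, float]]:
    """Summarise gaps between consecutive packet timestamps (in seconds)."""
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    if not gaps:
        # Fewer than two packets: no inter-arrival time exists.
        return None
    return {
        "min": min(gaps),
        "max": max(gaps),
        "avg": mean(gaps),
        "std": pstdev(gaps),  # population standard deviation (an assumption)
    }


# Hypothetical source-to-destination packet timestamps for one flow.
src_to_dst = [0.0, 0.5, 1.5, 3.5]
stats = iat_stats(src_to_dst)
print(stats["min"], stats["max"])  # 0.5 2.0
```

Running the same function on the destination-to-source timestamps would give the second set of four statistics for the flow.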
Mouwiya/UNSW-NB15-small dataset hosted on Hugging Face and contributed by the HF Datasets community
This dataset was created by Phạm Xuân Hinh
FibonacciNeu/TestTemplateLLM-UNSW-NB15 dataset hosted on Hugging Face and contributed by the HF Datasets community
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Mukhlis nst
Released under CC0: Public Domain
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
bastyje/UNSW-NB15 dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Network traffic datasets created by Single Flow Time Series Analysis
Datasets were created for the paper: Network Traffic Classification based on Single Flow Time Series Analysis -- Josef Koumar, Karel Hynek, Tomáš Čejka -- which was published at The 19th International Conference on Network and Service Management (CNSM) 2023. Please cite usage of our datasets as:
J. Koumar, K. Hynek and T. Čejka, "Network Traffic Classification Based on Single Flow Time Series Analysis," 2023 19th International Conference on Network and Service Management (CNSM), Niagara Falls, ON, Canada, 2023, pp. 1-7, doi: 10.23919/CNSM59352.2023.10327876.
This Zenodo repository contains 23 datasets created from 15 well-known published datasets which are cited in the table below. Each dataset contains 69 features created by Time Series Analysis of Single Flow Time Series. The detailed description of features from datasets is in the file: feature_description.pdf
The following table describes each dataset file:
| File name | Detection problem | Citation of original raw dataset |
| --- | --- | --- |
| botnet_binary.csv | Binary detection of botnet | S. García et al. An Empirical Comparison of Botnet Detection Methods. Computers & Security, 45:100–123, 2014. |
| botnet_multiclass.csv | Multi-class classification of botnet | S. García et al. An Empirical Comparison of Botnet Detection Methods. Computers & Security, 45:100–123, 2014. |
| cryptomining_design.csv | Binary detection of cryptomining; the design part | Richard Plný et al. Datasets of Cryptomining Communication. Zenodo, October 2022 |
| cryptomining_evaluation.csv | Binary detection of cryptomining; the evaluation part | Richard Plný et al. Datasets of Cryptomining Communication. Zenodo, October 2022 |
| dns_malware.csv | Binary detection of malware DNS | Samaneh Mahdavifar et al. Classifying Malicious Domains using DNS Traffic Analysis. In DASC/PiCom/CBDCom/CyberSciTech 2021, pages 60–67. IEEE, 2021. |
| doh_cic.csv | Binary detection of DoH | Mohammadreza MontazeriShatoori et al. Detection of doh tunnels using time-series classification of encrypted traffic. In DASC/PiCom/CBDCom/CyberSciTech 2020, pages 63–70. IEEE, 2020 |
| doh_real_world.csv | Binary detection of DoH | Kamil Jeřábek et al. Collection of datasets with DNS over HTTPS traffic. Data in Brief, 42:108310, 2022 |
| dos.csv | Binary detection of DoS | Nickolaos Koroniotis et al. Towards the development of realistic botnet dataset in the Internet of Things for network forensic analytics: Bot-IoT dataset. Future Gener. Comput. Syst., 100:779–796, 2019. |
| edge_iiot_binary.csv | Binary detection of IoT malware | Mohamed Amine Ferrag et al. Edge-iiotset: A new comprehensive realistic cyber security dataset of iot and iiot applications: Centralized and federated learning, 2022. |
| edge_iiot_multiclass.csv | Multi-class classification of IoT malware | Mohamed Amine Ferrag et al. Edge-iiotset: A new comprehensive realistic cyber security dataset of iot and iiot applications: Centralized and federated learning, 2022. |
| https_brute_force.csv | Binary detection of HTTPS Brute Force | Jan Luxemburk et al. HTTPS Brute-force dataset with extended network flows, November 2020 |
| ids_cic_binary.csv | Binary detection of intrusion in IDS | Iman Sharafaldin et al. Toward generating a new intrusion detection dataset and intrusion traffic characterization. ICISSP, 1:108–116, 2018. |
| ids_cic_multiclass.csv | Multi-class classification of intrusion in IDS | Iman Sharafaldin et al. Toward generating a new intrusion detection dataset and intrusion traffic characterization. ICISSP, 1:108–116, 2018. |
| ids_unsw_nb_15_binary.csv | Binary detection of intrusion in IDS | Nour Moustafa and Jill Slay. Unsw-nb15: a comprehensive data set for network intrusion detection systems (unsw-nb15 network data set). In 2015 military communications and information systems conference (MilCIS), pages 1–6. IEEE, 2015. |
| ids_unsw_nb_15_multiclass.csv | Multi-class classification of intrusion in IDS | Nour Moustafa and Jill Slay. Unsw-nb15: a comprehensive data set for network intrusion detection systems (unsw-nb15 network data set). In 2015 military communications and information systems conference (MilCIS), pages 1–6. IEEE, 2015. |
| iot_23.csv | Binary detection of IoT malware | Sebastian Garcia et al. IoT-23: A labeled dataset with malicious and benign IoT network traffic, January 2020. More details here https://www.stratosphereips.org/datasets-iot23 |
| ton_iot_binary.csv | Binary detection of IoT malware | Nour Moustafa. A new distributed architecture for evaluating ai-based security systems at the edge: Network ton iot datasets. Sustainable Cities and Society, 72:102994, 2021 |
| ton_iot_multiclass.csv | Multi-class classification of IoT malware | Nour Moustafa. A new distributed architecture for evaluating ai-based security systems at the edge: Network ton iot datasets. Sustainable Cities and Society, 72:102994, 2021 |
| tor_binary.csv | Binary detection of TOR | Arash Habibi Lashkari et al. Characterization of Tor Traffic using Time based Features. In ICISSP 2017, pages 253–262. SciTePress, 2017. |
| tor_multiclass.csv | Multi-class classification of TOR | Arash Habibi Lashkari et al. Characterization of Tor Traffic using Time based Features. In ICISSP 2017, pages 253–262. SciTePress, 2017. |
| vpn_iscx_binary.csv | Binary detection of VPN | Gerard Draper-Gil et al. Characterization of Encrypted and VPN Traffic Using Time-related. In ICISSP, pages 407–414, 2016. |
| vpn_iscx_multiclass.csv | Multi-class classification of VPN | Gerard Draper-Gil et al. Characterization of Encrypted and VPN Traffic Using Time-related. In ICISSP, pages 407–414, 2016. |
| vpn_vnat_binary.csv | Binary detection of VPN | Steven Jorgensen et al. Extensible Machine Learning for Encrypted Network Traffic Application Labeling via Uncertainty Quantification. CoRR, abs/2205.05628, 2022 |
| vpn_vnat_multiclass.csv | Multi-class classification of VPN | Steven Jorgensen et al. Extensible Machine Learning for Encrypted Network Traffic Application Labeling via Uncertainty Quantification. CoRR, abs/2205.05628, 2022 |
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Performance evaluation of the proposed model across various classes in the CICIDS2019 and CICIoT2023 datasets.
This dataset was created by M Dil-Khan