Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This is an academic intrusion detection dataset. All the credit goes to the original authors: dr. Nour Moustafa and dr. Jill Slay.
Please cite their original paper and all other appropriate articles listed on the UNSW-NB15 page.
The full dataset also offers the pcap, BRO and Argus files along with additional documentation.
V1: Original CSVs obtained from here V2: Cleaning -> parquet V3: Reorganize to save storage, only keep original CSVs in V1/V2 V4: Update to remove contaminating features [presentation] & [conference article]
My modifications to the predesignated train-test sets are minimal and designed to decrease disk storage and increase performance & reliability.
In its current iteration, the dataset can be loaded trivially with pd.read_parquet()
. All data types are already set correctly and there are 0 records with missing information.
Reading parquet files does require fastparquet and / or pyarrow
Exploratory Data Analysis (EDA) through classification with very simple models to .877 AUROC.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Binarized version of the UNSW-NB15 dataset, where the original features (a mix of strings, categorical values, floating point values etc) are converted to a bit string of 593 bits. Each value in each feature is either 0 or 1, stored as a uint8 value. The uint8 values are represented as numpy arrays, provided separately for training and test data (same train/test split as the original dataset is used). The final binary value in each sample is the expected output.
Among others, this dataset has been used for quantized neural network research:
Umuroglu, Y., Akhauri, Y., Fraser, N. J., & Blott, M. (2020, August). LogicNets: Co-Designed Neural Networks and Circuits for Extreme-Throughput Applications. In 2020 30th International Conference on Field-Programmable Logic and Applications (FPL) (pp. 291-297). IEEE.
The method for binarization is identical to the one described in 10.5281/zenodo.3258657 :
"T. Murovič, A. Trost, Massively Parallel Combinational Binary Neural Networks for Edge Processing, Elektrotehniški vestnik, vol. 86, no. 1-2, pp. 47-53, 2019"
The original UNSW-NB15 dataaset is by:
Moustafa, Nour, and Jill Slay. "UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set)." Military Communications and Information Systems Conference (MilCIS), 2015. IEEE, 2015.
This dataset was created by Mahdi Mesfar
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The dataset is an extended version of UNSW-NB 15. It has 1 additional class synthesised and the data is normalised for ease of use. To cite the dataset, please reference the original paper with DOI: 10.1109/SmartNets61466.2024.10577645. The paper is published in IEEE SmartNets and can be accessed here: https://www.researchgate.net/publication/382034618_Blender-GAN_Multi-Target_Conditional_Generative_Adversarial_Network_for_Novel_Class_Synthetic_Data_Generation. Citation info: Madhubalan… See the full description on the dataset page: https://huggingface.co/datasets/abluva/UNSW-NB15-V3.
http://opendatacommons.org/licenses/dbcl/1.0/http://opendatacommons.org/licenses/dbcl/1.0/
Packet Capture (PCAP) files of UNSW-NB15 and CIC-IDS2017 dataset are processed and labelled utilizing the CSV files. Each packet is labelled by comparing the eight distinct features: Source IP, Destination IP, Source Port, Destination Port, Starting time, Ending time, Protocol and Time to live. The dimensions for the dataset is Nx1504. All column of the dataset are integers, therefore you can directly utilize this dataset in you machine learning models. Moreover, details of the whole processing and transformation is provided in the following GitHub Repo:
https://github.com/Yasir-ali-farrukh/Payload-Byte
You can utilize the tool available at the above mentioned GitHub repo to generate labelled dataset from scratch. All of the detail of processing and transformation is provided in the following paper:
@article{Payload,
author = "Yasir Ali Farrukh and Irfan Khan and Syed Wali and David Bierbrauer and Nathaniel Bastian",
title = "{Payload-Byte: A Tool for Extracting and Labeling Packet Capture Files of Modern Network Intrusion Detection Datasets}",
year = "2022",
month = "9",
url = "https://www.techrxiv.org/articles/preprint/Payload-Byte_A_Tool_for_Extracting_and_Labeling_Packet_Capture_Files_of_Modern_Network_Intrusion_Detection_Datasets/20714221",
doi = "10.36227/techrxiv.20714221.v1"
}
```
If you are using our tool or dataset, kindly cite our related paper which outlines the details of the tools and its processing.
This dataset was created by Saba898
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These datasets provide packet-level labeling of the payloads in the CIC-IDS-2017 and UNSW-NB15 network intrusion detection datasets. A full discussion of the data processing can be found in our Transactions on Machine Learning Research journal paper SAFE-NID: Self-Attention with Normalizing-Flow Encodings for Network Intrusion Detection. Code for additional processing and experimentation can be found here. The UNSW-NB15 dataset contains over 50 million non-empty payloads coming from nine attack classes with benign background traffic. The CIC-IDS-2017 dataset contains over 30 million non-empty payloads coming from fourteen attack classes with benign background traffic. Both datasets are highly imbalanced, with 20-25x more benign packets than malicious ones.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The escalating prevalence of cybersecurity risks calls for a focused strategy in order to attain efficient resolutions. This study introduces a detection model that employs a tailored methodology integrating feature selection using SHAP values, a shallow learning algorithm called PV-DM, and machine learning classifiers like XGBOOST. The efficacy of our suggested methodology is highlighted by employing the NSL-KDD and UNSW-NB15 datasets. Our approach in the NSL-KDD dataset exhibits exceptional performance, with an accuracy of 98.92%, precision of 98.92%, recall of 95.44%, and an F1-score of 96.77%. Notably, this performance is achieved by utilizing only four characteristics, indicating the efficiency of our approach. The proposed methodology achieves an accuracy of 82.86%, precision of 84.07%, recall of 77.70%, and an F1-score of 80.20% in the UNSW-NB15 dataset, using only six features. Our research findings provide substantial evidence of the enhanced performance of the proposed model compared to a traditional deep-learning model across all performance metrics.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
MCL-FWA-BILSTM accuracy comparison with existing approaches for multiclass classification with state of art on UNSW-NB15 or NSL-KDD.
This dataset was created by lengxingxin
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The escalating prevalence of cybersecurity risks calls for a focused strategy in order to attain efficient resolutions. This study introduces a detection model that employs a tailored methodology integrating feature selection using SHAP values, a shallow learning algorithm called PV-DM, and machine learning classifiers like XGBOOST. The efficacy of our suggested methodology is highlighted by employing the NSL-KDD and UNSW-NB15 datasets. Our approach in the NSL-KDD dataset exhibits exceptional performance, with an accuracy of 98.92%, precision of 98.92%, recall of 95.44%, and an F1-score of 96.77%. Notably, this performance is achieved by utilizing only four characteristics, indicating the efficiency of our approach. The proposed methodology achieves an accuracy of 82.86%, precision of 84.07%, recall of 77.70%, and an F1-score of 80.20% in the UNSW-NB15 dataset, using only six features. Our research findings provide substantial evidence of the enhanced performance of the proposed model compared to a traditional deep-learning model across all performance metrics.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Network traffic datasets created by Single Flow Time Series Analysis
Datasets were created for the paper: Network Traffic Classification based on Single Flow Time Series Analysis -- Josef Koumar, Karel Hynek, Tomáš Čejka -- which was published at The 19th International Conference on Network and Service Management (CNSM) 2023. Please cite usage of our datasets as:
J. Koumar, K. Hynek and T. Čejka, "Network Traffic Classification Based on Single Flow Time Series Analysis," 2023 19th International Conference on Network and Service Management (CNSM), Niagara Falls, ON, Canada, 2023, pp. 1-7, doi: 10.23919/CNSM59352.2023.10327876.
This Zenodo repository contains 23 datasets created from 15 well-known published datasets which are cited in the table below. Each dataset contains 69 features created by Time Series Analysis of Single Flow Time Series. The detailed description of features from datasets is in the file: feature_description.pdf
In the following table is a description of each dataset file:
File name | Detection problem | Citation of original raw dataset |
botnet_binary.csv | Binary detection of botnet | S. García et al. An Empirical Comparison of Botnet Detection Methods. Computers & Security, 45:100–123, 2014. |
botnet_multiclass.csv | Multi-class classification of botnet | S. García et al. An Empirical Comparison of Botnet Detection Methods. Computers & Security, 45:100–123, 2014. |
cryptomining_design.csv | Binary detection of cryptomining; the design part | Richard Plný et al. Datasets of Cryptomining Communication. Zenodo, October 2022 |
cryptomining_evaluation.csv | Binary detection of cryptomining; the evaluation part | Richard Plný et al. Datasets of Cryptomining Communication. Zenodo, October 2022 |
dns_malware.csv | Binary detection of malware DNS | Samaneh Mahdavifar et al. Classifying Malicious Domains using DNS Traffic Analysis. In DASC/PiCom/CBDCom/CyberSciTech 2021, pages 60–67. IEEE, 2021. |
doh_cic.csv | Binary detection of DoH |
Mohammadreza MontazeriShatoori et al. Detection of doh tunnels using time-series classification of encrypted traffic. In DASC/PiCom/CBDCom/CyberSciTech 2020, pages 63–70. IEEE, 2020 |
doh_real_world.csv | Binary detection of DoH | Kamil Jeřábek et al. Collection of datasets with DNS over HTTPS traffic. Data in Brief, 42:108310, 2022 |
dos.csv | Binary detection of DoS | Nickolaos Koroniotis et al. Towards the development of realistic botnet dataset in the Internet of Things for network forensic analytics: Bot-IoT dataset. Future Gener. Comput. Syst., 100:779–796, 2019. |
edge_iiot_binary.csv | Binary detection of IoT malware | Mohamed Amine Ferrag et al. Edge-iiotset: A new comprehensive realistic cyber security dataset of iot and iiot applications: Centralized and federated learning, 2022. |
edge_iiot_multiclass.csv | Multi-class classification of IoT malware | Mohamed Amine Ferrag et al. Edge-iiotset: A new comprehensive realistic cyber security dataset of iot and iiot applications: Centralized and federated learning, 2022. |
https_brute_force.csv | Binary detection of HTTPS Brute Force | Jan Luxemburk et al. HTTPS Brute-force dataset with extended network flows, November 2020 |
ids_cic_binary.csv | Binary detection of intrusion in IDS | Iman Sharafaldin et al. Toward generating a new intrusion detection dataset and intrusion traffic characterization. ICISSp, 1:108–116, 2018. |
ids_cic_multiclass.csv | Multi-class classification of intrusion in IDS | Iman Sharafaldin et al. Toward generating a new intrusion detection dataset and intrusion traffic characterization. ICISSp, 1:108–116, 2018. |
ids_unsw_nb_15_binary.csv | Binary detection of intrusion in IDS | Nour Moustafa and Jill Slay. Unsw-nb15: a comprehensive data set for network intrusion detection systems (unsw-nb15 network data set). In 2015 military communications and information systems conference (MilCIS), pages 1–6. IEEE, 2015. |
ids_unsw_nb_15_multiclass.csv | Multi-class classification of intrusion in IDS | Nour Moustafa and Jill Slay. Unsw-nb15: a comprehensive data set for network intrusion detection systems (unsw-nb15 network data set). In 2015 military communications and information systems conference (MilCIS), pages 1–6. IEEE, 2015. |
iot_23.csv | Binary detection of IoT malware | Sebastian Garcia et al. IoT-23: A labeled dataset with malicious and benign IoT network traffic, January 2020. More details here https://www.stratosphereips.org /datasets-iot23 |
ton_iot_binary.csv | Binary detection of IoT malware | Nour Moustafa. A new distributed architecture for evaluating ai-based security systems at the edge: Network ton iot datasets. Sustainable Cities and Society, 72:102994, 2021 |
ton_iot_multiclass.csv | Multi-class classification of IoT malware | Nour Moustafa. A new distributed architecture for evaluating ai-based security systems at the edge: Network ton iot datasets. Sustainable Cities and Society, 72:102994, 2021 |
tor_binary.csv | Binary detection of TOR | Arash Habibi Lashkari et al. Characterization of Tor Traffic using Time based Features. In ICISSP 2017, pages 253–262. SciTePress, 2017. |
tor_multiclass.csv | Multi-class classification of TOR | Arash Habibi Lashkari et al. Characterization of Tor Traffic using Time based Features. In ICISSP 2017, pages 253–262. SciTePress, 2017. |
vpn_iscx_binary.csv | Binary detection of VPN | Gerard Draper-Gil et al. Characterization of Encrypted and VPN Traffic Using Time-related. In ICISSP, pages 407–414, 2016. |
vpn_iscx_multiclass.csv | Multi-class classification of VPN | Gerard Draper-Gil et al. Characterization of Encrypted and VPN Traffic Using Time-related. In ICISSP, pages 407–414, 2016. |
vpn_vnat_binary.csv | Binary detection of VPN | Steven Jorgensen et al. Extensible Machine Learning for Encrypted Network Traffic Application Labeling via Uncertainty Quantification. CoRR, abs/2205.05628, 2022 |
vpn_vnat_multiclass.csv | Multi-class classification of VPN | Steven Jorgensen et al. Extensible Machine Learning for Encrypted Network Traffic Application Labeling via Uncertainty Quantification. CoRR, abs/2205.05628, 2022 |
This dataset was created by Galan Ramadan
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comparison of the detection performance of different classification methods and oversampling methods with the strategy proposed on the UNSW-NB15 dataset.
This dataset was created by bahar n
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Intrusion detection plays a significant role in the provision of information security. The most critical element is the ability to precisely identify different types of intrusions into the network. However, the detection of intrusions poses a important challenge, as many new types of intrusion are now generated by cyber-attackers every day. A robust system is still elusive, despite the various strategies that have been proposed in recent years. Hence, a novel deep-learning-based architecture for detecting intrusions into a computer network is proposed in this paper. The aim is to construct a hybrid system that enhances the efficiency and accuracy of intrusion detection. The main contribution of our work is a novel deep learning-based hybrid architecture in which PSO is used for hyperparameter optimisation and three well-known pre-trained network models are combined in an optimised way. The suggested method involves six key stages: data gathering, pre-processing, deep neural network (DNN) architecture design, optimisation of hyperparameters, training, and evaluation of the trained DNN. To verify the superiority of the suggested method over alternative state-of-the-art schemes, it was evaluated on the KDDCUP’99, NSL-KDD and UNSW-NB15 datasets. Our empirical findings show that the proposed model successfully and correctly classifies different types of attacks with 82.44%, 90.42% and 93.55% accuracy values obtained on UNSW-B15, NSL-KDD and KDDCUP’99 datasets, respectively, and outperforms alternative schemes in the literature.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The proliferation of Internet of Things (IoT) devices and fog computing architectures has introduced major security and cyber threats. Intrusion detection systems have become effective in monitoring network traffic and activities to identify anomalies that are indicative of attacks. However, constraints such as limited computing resources at fog nodes render conventional intrusion detection techniques impractical. This paper proposes a novel framework that integrates stacked autoencoders, CatBoost, and an optimised transformer-CNN-LSTM ensemble tailored for intrusion detection in fog and IoT networks. Autoencoders extract robust features from high-dimensional traffic data while reducing the dimensionality of the efficiency at fog nodes. CatBoost refines features through predictive selection. The ensemble model combines self-attention, convolutions, and recurrence for comprehensive traffic analysis in the cloud. Evaluations of the NSL-KDD, UNSW-NB15, and AWID benchmarks demonstrate an accuracy of over 99% in detecting threats across traditional, hybrid enterprises and wireless environments. Integrated edge preprocessing and cloud-based ensemble learning pipelines enable efficient and accurate anomaly detection. The results highlight the viability of securing real-world fog and the IoT infrastructure against continuously evolving cyber-attacks.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In response to the rapidly evolving threat landscape in network security, this paper proposes an Evolutionary Machine Learning Algorithm designed for robust intrusion detection. We specifically address challenges such as adaptability to new threats and scalability across diverse network environments. Our approach is validated using two distinct datasets: BoT-IoT, reflecting a range of IoT-specific attacks, and UNSW-NB15, offering a broader context of network intrusion scenarios using GA based hybrid DT-SVM. This selection facilitates a comprehensive evaluation of the algorithm’s effectiveness across varying attack vectors. Performance metrics including accuracy, recall, and false positive rates are meticulously chosen to demonstrate the algorithm’s capability to accurately identify and adapt to both known and novel threats, thereby substantiating the algorithm’s potential as a scalable and adaptable security solution. This study aims to advance the development of intrusion detection systems that are not only reactive but also preemptively adaptive to emerging cyber threats.” During the feature selection step, a GA is used to discover and preserve the most relevant characteristics from the dataset by using evolutionary principles. Through the use of this technology based on genetic algorithms, the subset of features is optimised, enabling the subsequent classification model to focus on the most relevant components of network data. In order to accomplish this, DT-SVM classification and GA-driven feature selection are integrated in an effort to strike a balance between efficiency and accuracy. The system has been purposefully designed to efficiently handle data streams in real-time, ensuring that intrusions are promptly and precisely detected. The empirical results corroborate the study’s assertion that the IDS outperforms traditional methodologies.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The proliferation of Internet of Things (IoT) devices and fog computing architectures has introduced major security and cyber threats. Intrusion detection systems have become effective in monitoring network traffic and activities to identify anomalies that are indicative of attacks. However, constraints such as limited computing resources at fog nodes render conventional intrusion detection techniques impractical. This paper proposes a novel framework that integrates stacked autoencoders, CatBoost, and an optimised transformer-CNN-LSTM ensemble tailored for intrusion detection in fog and IoT networks. Autoencoders extract robust features from high-dimensional traffic data while reducing the dimensionality of the efficiency at fog nodes. CatBoost refines features through predictive selection. The ensemble model combines self-attention, convolutions, and recurrence for comprehensive traffic analysis in the cloud. Evaluations of the NSL-KDD, UNSW-NB15, and AWID benchmarks demonstrate an accuracy of over 99% in detecting threats across traditional, hybrid enterprises and wireless environments. Integrated edge preprocessing and cloud-based ensemble learning pipelines enable efficient and accurate anomaly detection. The results highlight the viability of securing real-world fog and the IoT infrastructure against continuously evolving cyber-attacks.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The proliferation of Internet of Things (IoT) devices and fog computing architectures has introduced major security and cyber threats. Intrusion detection systems have become effective in monitoring network traffic and activities to identify anomalies that are indicative of attacks. However, constraints such as limited computing resources at fog nodes render conventional intrusion detection techniques impractical. This paper proposes a novel framework that integrates stacked autoencoders, CatBoost, and an optimised transformer-CNN-LSTM ensemble tailored for intrusion detection in fog and IoT networks. Autoencoders extract robust features from high-dimensional traffic data while reducing the dimensionality of the efficiency at fog nodes. CatBoost refines features through predictive selection. The ensemble model combines self-attention, convolutions, and recurrence for comprehensive traffic analysis in the cloud. Evaluations of the NSL-KDD, UNSW-NB15, and AWID benchmarks demonstrate an accuracy of over 99% in detecting threats across traditional, hybrid enterprises and wireless environments. Integrated edge preprocessing and cloud-based ensemble learning pipelines enable efficient and accurate anomaly detection. The results highlight the viability of securing real-world fog and the IoT infrastructure against continuously evolving cyber-attacks.
Not seeing a result you expected?
Learn how you can add new datasets to our index.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This is an academic intrusion detection dataset. All the credit goes to the original authors: dr. Nour Moustafa and dr. Jill Slay.
Please cite their original paper and all other appropriate articles listed on the UNSW-NB15 page.
The full dataset also offers the pcap, BRO and Argus files along with additional documentation.
V1: Original CSVs obtained from here V2: Cleaning -> parquet V3: Reorganize to save storage, only keep original CSVs in V1/V2 V4: Update to remove contaminating features [presentation] & [conference article]
My modifications to the predesignated train-test sets are minimal and designed to decrease disk storage and increase performance & reliability.
In its current iteration, the dataset can be loaded trivially with pd.read_parquet()
. All data types are already set correctly and there are 0 records with missing information.
Reading parquet files does require fastparquet and / or pyarrow
Exploratory Data Analysis (EDA) through classification with very simple models to .877 AUROC.