20 datasets found

UNSW-NB15
kaggle.com
Updated Sep 9, 2024
Cite
StrGenIx | Laurens D'hooge (2024). UNSW-NB15 [Dataset]. http://doi.org/10.34740/kaggle/dsv/9350725
Unique identifier
https://doi.org/10.34740/kaggle/dsv/9350725
Dataset updated
Sep 9, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
StrGenIx | Laurens D'hooge
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
This is an academic intrusion detection dataset. All credit goes to the original authors: Dr. Nour Moustafa and Dr. Jill Slay.

Please cite their original paper and all other appropriate articles listed on the UNSW-NB15 page.

The full dataset also offers the PCAP, Bro and Argus files along with additional documentation.

V1: Original CSVs, obtained from here
V2: Cleaned and converted to Parquet
V3: Reorganized to save storage; the original CSVs are kept only in V1/V2
V4: Updated to remove contaminating features [presentation] & [conference article]

My modifications to the predesignated train/test sets are minimal and are designed to reduce disk usage and to improve performance and reliability.

In its current iteration, the dataset can be loaded directly with pd.read_parquet(). All data types are already set correctly, and no records have missing values. Reading Parquet files requires fastparquet or pyarrow.
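A minimal loading sketch, assuming pandas with pyarrow or fastparquet installed. The file name `UNSW_NB15_training-set.parquet` is a hypothetical placeholder, not necessarily a file in this dataset. The helper only checks the two properties claimed above (no missing values, types already set) and does not reproduce the author's cleaning.

```python
import os

import pandas as pd


def load_split(path: str) -> pd.DataFrame:
    """Load one split from a Parquet file (needs pyarrow or fastparquet)."""
    # engine="auto" (the default) tries pyarrow first, then fastparquet
    return pd.read_parquet(path)


def check_split(df: pd.DataFrame) -> dict:
    """Summarize properties the description says already hold."""
    return {
        "rows": len(df),
        # number of records with at least one missing value (expected: 0)
        "records_with_missing": int(df.isna().any(axis=1).sum()),
        # column -> dtype name, to confirm types were set when the file was written
        "dtypes": {col: str(dtype) for col, dtype in df.dtypes.items()},
    }


if __name__ == "__main__":
    # Hypothetical file name; substitute a file from the dataset page
    path = "UNSW_NB15_training-set.parquet"
    if os.path.exists(path):
        print(check_split(load_split(path)))
```

Keeping the check separate from the loader means it can be run on any DataFrame, including one built in memory for testing.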

Exploratory data analysis (EDA) through classification with very simple models reaches an AUROC of 0.877.
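AUROC, the metric quoted above, equals the probability that a randomly chosen attack record scores higher than a randomly chosen normal record, with ties counted as half. The pairwise sketch below is only for checking small examples and is not the author's evaluation code. For full-size splits, a library routine such as scikit-learn's `roc_auc_score` is the practical choice.

```python
from itertools import product


def auroc(labels, scores):
    """AUROC as P(score of a random positive > score of a random negative).

    labels: 0/1 ground truth (1 = attack); scores: model outputs.
    Ties count as half a win. O(P*N) pairwise version, fine for small checks.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```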
The UNSW-NB15 dataset with binarized features
data.niaid.nih.gov
Updated Feb 9, 2021
Cite
Umuroglu, Yaman (2021). The UNSW-NB15 dataset with binarized features [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4519766
Dataset updated
Feb 9, 2021
Dataset authored and provided by
Umuroglu, Yaman
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Binarized version of the UNSW-NB15 dataset, where the original features (a mix of strings, categorical values, floating-point values, etc.) are converted to a bit string of 593 bits. Each bit is either 0 or 1 and is stored as a uint8 value. The uint8 values are provided as numpy arrays, separately for training and test data (the same train/test split as the original dataset is used). The final binary value in each sample is the expected output.
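Given the layout described above (one row of 0/1 uint8 values per sample, with the label as the last value), separating features from labels is a column slice. This is a sketch, assuming the arrays have already been loaded as a 2-D numpy array, for example with `np.load`. The actual file names are not given here, and a toy array stands in for the real data.

```python
import numpy as np


def split_features_label(samples: np.ndarray):
    """Split binarized samples into feature bits and the label bit.

    Assumes a 2-D uint8 array of 0/1 values whose last column is the
    expected output, as the dataset description states.
    """
    if samples.ndim != 2:
        raise ValueError("expected a 2-D array of shape (n_samples, n_bits)")
    if not np.isin(samples, (0, 1)).all():
        raise ValueError("all entries should be 0 or 1")
    features = samples[:, :-1]  # every column except the last
    labels = samples[:, -1]     # final bit = expected output
    return features, labels


if __name__ == "__main__":
    # Toy stand-in for the real arrays: 3 samples, 4 feature bits + 1 label bit
    toy = np.array([[0, 1, 1, 0, 1],
                    [1, 0, 0, 0, 0],
                    [1, 1, 1, 1, 1]], dtype=np.uint8)
    X, y = split_features_label(toy)
    print(X.shape, y.tolist())  # (3, 4) [1, 0, 1]
```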

Among others, this dataset has been used for quantized neural network research:

Umuroglu, Y., Akhauri, Y., Fraser, N. J., & Blott, M. (2020, August). LogicNets: Co-Designed Neural Networks and Circuits for Extreme-Throughput Applications. In 2020 30th International Conference on Field-Programmable Logic and Applications (FPL) (pp. 291-297). IEEE.

The binarization method is identical to the one described in 10.5281/zenodo.3258657:

"T. Murovič, A. Trost, Massively Parallel Combinational Binary Neural Networks for Edge Processing, Elektrotehniški vestnik, vol. 86, no. 1-2, pp. 47-53, 2019"

The original UNSW-NB15 dataset is by:

Moustafa, Nour, and Jill Slay. "UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set)." Military Communications and Information Systems Conference (MilCIS), 2015. IEEE, 2015.
UNSW-NB15-Dataset
kaggle.com
zip
Updated Oct 18, 2024
Cite
Mahdi Mesfar (2024). UNSW-NB15-Dataset [Dataset]. https://www.kaggle.com/datasets/mahdimesfar0123/unsw-nb15-dataset/data
Available download formats
zip (164483690 bytes)
Dataset updated
Oct 18, 2024
Authors
Mahdi Mesfar
Description
Dataset

This dataset was created by Mahdi Mesfar

UNSW-NB15-V3
huggingface.co
dataverse.harvard.edu
Updated Nov 27, 2024
Cite
Abluva Inc (2024). UNSW-NB15-V3 [Dataset]. https://huggingface.co/datasets/abluva/UNSW-NB15-V3
Dataset updated
Nov 27, 2024
Dataset authored and provided by
Abluva Inc
License
Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
The dataset is an extended version of UNSW-NB15. It adds one synthesised class, and the data is normalised for ease of use. To cite the dataset, please reference the original paper with DOI: 10.1109/SmartNets61466.2024.10577645. The paper is published in IEEE SmartNets and can be accessed here: https://www.researchgate.net/publication/382034618_Blender-GAN_Multi-Target_Conditional_Generative_Adversarial_Network_for_Novel_Class_Synthetic_Data_Generation. Citation info: Madhubalan… See the full description on the dataset page: https://huggingface.co/datasets/abluva/UNSW-NB15-V3.
UNSW-NB15 and CIC-IDS2017 Labelled PCAP Data
kaggle.com
Updated Sep 8, 2022
Cite
Yasir-Ali (2022). UNSW-NB15 and CIC-IDS2017 Labelled PCAP Data [Dataset]. http://doi.org/10.34740/kaggle/dsv/4170054
Unique identifier
https://doi.org/10.34740/kaggle/dsv/4170054
Dataset updated
Sep 8, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Yasir-Ali
License
Open Data Commons Database Contents License (DbCL) v1.0, http://opendatacommons.org/licenses/dbcl/1.0/
Description
Packet Capture (PCAP) files of the UNSW-NB15 and CIC-IDS2017 datasets are processed and labelled using the CSV files. Each packet is labelled by comparing eight distinct features: Source IP, Destination IP, Source Port, Destination Port, Starting time, Ending time, Protocol and Time to live. The dataset has dimensions N x 1504. All columns of the dataset are integers, so the dataset can be used directly in machine learning models. Details of the whole processing and transformation are provided in the following GitHub repo:

https://github.com/Yasir-ali-farrukh/Payload-Byte

You can use the tool in the above-mentioned GitHub repo to generate the labelled dataset from scratch. Full details of the processing and transformation are given in the following paper:

@article{Payload,
  author = "Yasir Ali Farrukh and Irfan Khan and Syed Wali and David Bierbrauer and Nathaniel Bastian",
  title = "{Payload-Byte: A Tool for Extracting and Labeling Packet Capture Files of Modern Network Intrusion Detection Datasets}",
  year = "2022",
  month = "9",
  url = "https://www.techrxiv.org/articles/preprint/Payload-Byte_A_Tool_for_Extracting_and_Labeling_Packet_Capture_Files_of_Modern_Network_Intrusion_Detection_Datasets/20714221",
  doi = "10.36227/techrxiv.20714221.v1"
}

If you use our tool or dataset, kindly cite our related paper, which outlines the details of the tool and its processing.
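The labelling rule described for this dataset matches each packet to a CSV flow record on eight fields. The sketch below illustrates that rule, but it is a simplified, hypothetical version and not code from the Payload-Byte repository. The field names are invented, each flow carries a single TTL, and packets travelling in the reverse direction of a flow are not handled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class FlowRecord:
    # Hypothetical field names mirroring the eight matching criteria
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str
    start_time: float  # seconds
    end_time: float    # seconds
    ttl: int
    label: str


@dataclass(frozen=True)
class Packet:
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str
    timestamp: float
    ttl: int


def label_packet(pkt: Packet, flows: List[FlowRecord]) -> Optional[str]:
    """Return the label of the first flow matching all eight fields, else None."""
    key = (pkt.src_ip, pkt.dst_ip, pkt.src_port, pkt.dst_port, pkt.protocol)
    for f in flows:
        if key != (f.src_ip, f.dst_ip, f.src_port, f.dst_port, f.protocol):
            continue  # 5-tuple differs
        # Remaining criteria: timestamp inside the flow window and equal TTL
        if f.start_time <= pkt.timestamp <= f.end_time and pkt.ttl == f.ttl:
            return f.label
    return None
```

The linear scan keeps the rule easy to read. For millions of packets, indexing the flows in a dict keyed by the 5-tuple would avoid rescanning every record.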
UNSW-NB15
kaggle.com
Updated May 1, 2024
Cite
Saba898 (2024). UNSW-NB15 [Dataset]. https://www.kaggle.com/datasets/saba898/unsw-nb15/code
Dataset updated
May 1, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Saba898
Description
Dataset

This dataset was created by Saba898

SAFE-NID: Self-Attention with Normalizing-Flow Encodings for Network...
zenodo.org
tar
Updated Mar 20, 2025
Cite
Brian Matejek; Ashish Gehani; Nathaniel Bastian; Daniel Clouse; Bradford Kline; Susmit Jha (2025). SAFE-NID: Self-Attention with Normalizing-Flow Encodings for Network Intrusion Detection Dataset [Dataset]. http://doi.org/10.5281/zenodo.15046995
Available download formats
tar
Unique identifier
https://doi.org/10.5281/zenodo.15046995
Dataset updated
Mar 20, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Brian Matejek; Ashish Gehani; Nathaniel Bastian; Daniel Clouse; Bradford Kline; Susmit Jha
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
These datasets provide packet-level labeling of the payloads in the CIC-IDS-2017 and UNSW-NB15 network intrusion detection datasets. A full discussion of the data processing can be found in our Transactions on Machine Learning Research journal paper SAFE-NID: Self-Attention with Normalizing-Flow Encodings for Network Intrusion Detection. Code for additional processing and experimentation can be found here. The UNSW-NB15 dataset contains over 50 million non-empty payloads coming from nine attack classes with benign background traffic. The CIC-IDS-2017 dataset contains over 30 million non-empty payloads coming from fourteen attack classes with benign background traffic. Both datasets are highly imbalanced, with 20-25x more benign packets than malicious ones.
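The stated imbalance of 20 to 25 benign packets per malicious packet has a direct consequence. Malicious packets make up only about 3.8% to 4.8% of the data, so a model that labels everything benign already scores roughly 95% to 96% accuracy. The arithmetic sketch below shows this and also computes "balanced" class weights, total / (n_classes * class_count). That formula matches scikit-learn's `class_weight="balanced"` heuristic and is one common way to compensate for the imbalance. The function name and its use here are illustrative and are not taken from the SAFE-NID paper.

```python
def imbalance_summary(benign: int, malicious: int) -> dict:
    """Basic figures for a two-class split with the given counts."""
    total = benign + malicious
    return {
        # Share of malicious packets in the data
        "malicious_fraction": malicious / total,
        # Accuracy of a trivial model that labels every packet benign
        "all_benign_accuracy": benign / total,
        # "Balanced" class weights: total / (n_classes * class_count)
        "weight_benign": total / (2 * benign),
        "weight_malicious": total / (2 * malicious),
    }


for ratio in (20, 25):
    s = imbalance_summary(benign=ratio, malicious=1)
    print(ratio, round(s["malicious_fraction"], 4), round(s["all_benign_accuracy"], 4))
# 20 0.0476 0.9524
# 25 0.0385 0.9615
```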
Features of reduced datasets.
plos.figshare.com
xls
Updated Jan 24, 2024
Cite
Chadia E. L. Asry; Ibtissam Benchaji; Samira Douzi; Bouabid E. L. Ouahidi (2024). Features of reduced datasets. [Dataset]. http://doi.org/10.1371/journal.pone.0295801.t004
Available download formats
xls
Unique identifier
https://doi.org/10.1371/journal.pone.0295801.t004
Dataset updated
Jan 24, 2024
Dataset provided by
PLOS ONE
Authors
Chadia E. L. Asry; Ibtissam Benchaji; Samira Douzi; Bouabid E. L. Ouahidi
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The escalating prevalence of cybersecurity risks calls for a focused strategy in order to attain efficient resolutions. This study introduces a detection model that employs a tailored methodology integrating feature selection using SHAP values, a shallow learning algorithm called PV-DM, and machine learning classifiers such as XGBoost. The efficacy of the proposed methodology is demonstrated on the NSL-KDD and UNSW-NB15 datasets. On the NSL-KDD dataset, our approach achieves an accuracy of 98.92%, precision of 98.92%, recall of 95.44%, and an F1-score of 96.77%. Notably, this performance is achieved using only four features, indicating the efficiency of our approach. On the UNSW-NB15 dataset, the proposed methodology achieves an accuracy of 82.86%, precision of 84.07%, recall of 77.70%, and an F1-score of 80.20%, using only six features. Our findings provide substantial evidence of the improved performance of the proposed model compared to a traditional deep-learning model across all performance metrics.
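The study above keeps only four and six features selected with SHAP values. A common way to turn SHAP values into a feature ranking is to average their absolute values per feature over samples and keep the top k. The sketch below assumes this generic approach, not the paper's exact procedure. It also assumes the SHAP matrix was computed beforehand, for example with the `shap` package, and the toy values and the borrowed UNSW-NB15 column names are for illustration only.

```python
import numpy as np


def top_k_by_mean_abs_shap(shap_values: np.ndarray, feature_names, k: int):
    """Rank features by mean |SHAP value| over samples and keep the top k.

    shap_values: array of shape (n_samples, n_features), computed beforehand;
    this function only does the ranking.
    """
    importance = np.abs(shap_values).mean(axis=0)
    # Negate for descending order; "stable" keeps ties in column order
    order = np.argsort(-importance, kind="stable")[:k]
    return [feature_names[i] for i in order]


toy_shap = np.array([[0.5, -0.1, 0.0, 0.3],
                     [-0.7, 0.2, 0.1, -0.2]])
print(top_k_by_mean_abs_shap(toy_shap, ["sbytes", "dur", "proto", "sttl"], k=2))
# ['sbytes', 'sttl']
```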
MCL-FWA-BILSTM accuracy comparison with existing approaches for multiclass...
plos.figshare.com
xls
Updated May 23, 2024
Cite
Arshad Hashmi; Omar M. Barukab; Ahmad Hamza Osman (2024). MCL-FWA-BILSTM accuracy comparison with existing approaches for multiclass classification with state of art on UNSW-NB15 or NSL-KDD. [Dataset]. http://doi.org/10.1371/journal.pone.0302294.t011
Available download formats
xls
Unique identifier
https://doi.org/10.1371/journal.pone.0302294.t011
Dataset updated
May 23, 2024
Dataset provided by
PLOS ONE
Authors
Arshad Hashmi; Omar M. Barukab; Ahmad Hamza Osman
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
MCL-FWA-BILSTM accuracy comparison with existing approaches for multiclass classification with state of art on UNSW-NB15 or NSL-KDD.
unsw-nb15-20000.json
kaggle.com
Updated Aug 17, 2024
Cite
lengxingxin (2024). unsw-nb15-20000.json [Dataset]. https://www.kaggle.com/datasets/lengxingxin/unsw-nb15-20000-json/discussion?sort=undefined
Dataset updated
Aug 17, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
lengxingxin
Description
Dataset

This dataset was created by lengxingxin

LSTM model parameters.
plos.figshare.com
xls
Updated Jan 24, 2024
Cite
Chadia E. L. Asry; Ibtissam Benchaji; Samira Douzi; Bouabid E. L. Ouahidi (2024). LSTM model parameters. [Dataset]. http://doi.org/10.1371/journal.pone.0295801.t007
Available download formats
xls
Unique identifier
https://doi.org/10.1371/journal.pone.0295801.t007
Dataset updated
Jan 24, 2024
Dataset provided by
PLOS ONE
Authors
Chadia E. L. Asry; Ibtissam Benchaji; Samira Douzi; Bouabid E. L. Ouahidi
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The escalating prevalence of cybersecurity risks calls for a focused strategy in order to attain efficient resolutions. This study introduces a detection model that employs a tailored methodology integrating feature selection using SHAP values, a shallow learning algorithm called PV-DM, and machine learning classifiers such as XGBoost. The efficacy of the proposed methodology is demonstrated on the NSL-KDD and UNSW-NB15 datasets. On the NSL-KDD dataset, our approach achieves an accuracy of 98.92%, precision of 98.92%, recall of 95.44%, and an F1-score of 96.77%. Notably, this performance is achieved using only four features, indicating the efficiency of our approach. On the UNSW-NB15 dataset, the proposed methodology achieves an accuracy of 82.86%, precision of 84.07%, recall of 77.70%, and an F1-score of 80.20%, using only six features. Our findings provide substantial evidence of the improved performance of the proposed model compared to a traditional deep-learning model across all performance metrics.

Network traffic datasets created by Single Flow Time Series Analysis

zenodo.org
explore.openaire.eu

csv, pdf

Updated Jul 11, 2024

Cite

Josef Koumar; Karel Hynek; Tomáš Čejka (2024). Network traffic datasets created by Single Flow Time Series Analysis [Dataset]. http://doi.org/10.5281/zenodo.8035724

Available download formats

csv, pdf

Unique identifier

https://doi.org/10.5281/zenodo.8035724

Dataset updated

Jul 11, 2024

Dataset provided by

Zenodo (http://zenodo.org/)

Authors

Josef Koumar; Karel Hynek; Tomáš Čejka

License

Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Network traffic datasets created by Single Flow Time Series Analysis

The datasets were created for the paper "Network Traffic Classification Based on Single Flow Time Series Analysis" by Josef Koumar, Karel Hynek and Tomáš Čejka, published at the 19th International Conference on Network and Service Management (CNSM) 2023. If you use our datasets, please cite:

J. Koumar, K. Hynek and T. Čejka, "Network Traffic Classification Based on Single Flow Time Series Analysis," 2023 19th International Conference on Network and Service Management (CNSM), Niagara Falls, ON, Canada, 2023, pp. 1-7, doi: 10.23919/CNSM59352.2023.10327876.

This Zenodo repository contains 23 datasets created from 15 well-known published datasets, which are cited in the table below. Each dataset contains 69 features created by time series analysis of single flow time series. A detailed description of the features is given in the file feature_description.pdf.
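The 69 features are defined in feature_description.pdf and are not reproduced here. As a rough illustration of what "features from a single flow time series" can look like, the sketch below turns one flow's packet timestamps and sizes into a handful of summary statistics. The feature names and choices are assumptions for illustration, not the repository's definitions.

```python
import statistics


def flow_time_series_features(timestamps, sizes):
    """A few illustrative statistics over one flow's packet time series.

    timestamps: packet arrival times in seconds (ascending)
    sizes: packet sizes in bytes, same length as timestamps
    """
    if len(timestamps) != len(sizes) or len(timestamps) < 2:
        raise ValueError("need two or more packets with matching sizes")
    # Inter-arrival gaps between consecutive packets
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    duration = timestamps[-1] - timestamps[0]
    return {
        "packets": len(sizes),
        "bytes": sum(sizes),
        "duration": duration,
        "mean_size": statistics.mean(sizes),
        "stdev_size": statistics.pstdev(sizes),
        "mean_gap": statistics.mean(gaps),
        "max_gap": max(gaps),
        # Throughput over the flow's lifetime (bytes per second)
        "bytes_per_s": sum(sizes) / duration if duration > 0 else 0.0,
    }


print(flow_time_series_features([0.0, 0.5, 1.5, 2.0], [100, 200, 300, 400]))
```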

The following table describes each dataset file:

File name	Detection problem	Citation of original raw dataset
botnet_binary.csv	Binary detection of botnet	S. García et al. An Empirical Comparison of Botnet Detection Methods. Computers & Security, 45:100–123, 2014.
botnet_multiclass.csv	Multi-class classification of botnet	S. García et al. An Empirical Comparison of Botnet Detection Methods. Computers & Security, 45:100–123, 2014.
cryptomining_design.csv	Binary detection of cryptomining; the design part	Richard Plný et al. Datasets of Cryptomining Communication. Zenodo, October 2022
cryptomining_evaluation.csv	Binary detection of cryptomining; the evaluation part	Richard Plný et al. Datasets of Cryptomining Communication. Zenodo, October 2022
dns_malware.csv	Binary detection of malware DNS	Samaneh Mahdavifar et al. Classifying Malicious Domains using DNS Traffic Analysis. In DASC/PiCom/CBDCom/CyberSciTech 2021, pages 60–67. IEEE, 2021.
doh_cic.csv	Binary detection of DoH	Mohammadreza MontazeriShatoori et al. Detection of DoH tunnels using time-series classification of encrypted traffic. In DASC/PiCom/CBDCom/CyberSciTech 2020, pages 63–70. IEEE, 2020.
doh_real_world.csv	Binary detection of DoH	Kamil Jeřábek et al. Collection of datasets with DNS over HTTPS traffic. Data in Brief, 42:108310, 2022
dos.csv	Binary detection of DoS	Nickolaos Koroniotis et al. Towards the development of realistic botnet dataset in the Internet of Things for network forensic analytics: Bot-IoT dataset. Future Gener. Comput. Syst., 100:779–796, 2019.
edge_iiot_binary.csv	Binary detection of IoT malware	Mohamed Amine Ferrag et al. Edge-IIoTset: A new comprehensive realistic cyber security dataset of IoT and IIoT applications: Centralized and federated learning, 2022.
edge_iiot_multiclass.csv	Multi-class classification of IoT malware	Mohamed Amine Ferrag et al. Edge-IIoTset: A new comprehensive realistic cyber security dataset of IoT and IIoT applications: Centralized and federated learning, 2022.
https_brute_force.csv	Binary detection of HTTPS Brute Force	Jan Luxemburk et al. HTTPS Brute-force dataset with extended network flows, November 2020.
ids_cic_binary.csv	Binary detection of intrusion in IDS	Iman Sharafaldin et al. Toward generating a new intrusion detection dataset and intrusion traffic characterization. ICISSP, 1:108–116, 2018.
ids_cic_multiclass.csv	Multi-class classification of intrusion in IDS	Iman Sharafaldin et al. Toward generating a new intrusion detection dataset and intrusion traffic characterization. ICISSP, 1:108–116, 2018.
ids_unsw_nb_15_binary.csv	Binary detection of intrusion in IDS	Nour Moustafa and Jill Slay. UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set). In 2015 Military Communications and Information Systems Conference (MilCIS), pages 1–6. IEEE, 2015.
ids_unsw_nb_15_multiclass.csv	Multi-class classification of intrusion in IDS	Nour Moustafa and Jill Slay. UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set). In 2015 Military Communications and Information Systems Conference (MilCIS), pages 1–6. IEEE, 2015.
iot_23.csv	Binary detection of IoT malware	Sebastian Garcia et al. IoT-23: A labeled dataset with malicious and benign IoT network traffic, January 2020. More details here: https://www.stratosphereips.org/datasets-iot23
ton_iot_binary.csv	Binary detection of IoT malware	Nour Moustafa. A new distributed architecture for evaluating AI-based security systems at the edge: Network TON_IoT datasets. Sustainable Cities and Society, 72:102994, 2021.
ton_iot_multiclass.csv	Multi-class classification of IoT malware	Nour Moustafa. A new distributed architecture for evaluating AI-based security systems at the edge: Network TON_IoT datasets. Sustainable Cities and Society, 72:102994, 2021.
tor_binary.csv	Binary detection of TOR	Arash Habibi Lashkari et al. Characterization of Tor Traffic using Time based Features. In ICISSP 2017, pages 253–262. SciTePress, 2017.
tor_multiclass.csv	Multi-class classification of TOR	Arash Habibi Lashkari et al. Characterization of Tor Traffic using Time based Features. In ICISSP 2017, pages 253–262. SciTePress, 2017.
vpn_iscx_binary.csv	Binary detection of VPN	Gerard Draper-Gil et al. Characterization of Encrypted and VPN Traffic Using Time-related Features. In ICISSP, pages 407–414, 2016.
vpn_iscx_multiclass.csv	Multi-class classification of VPN	Gerard Draper-Gil et al. Characterization of Encrypted and VPN Traffic Using Time-related Features. In ICISSP, pages 407–414, 2016.
vpn_vnat_binary.csv	Binary detection of VPN	Steven Jorgensen et al. Extensible Machine Learning for Encrypted Network Traffic Application Labeling via Uncertainty Quantification. CoRR, abs/2205.05628, 2022
vpn_vnat_multiclass.csv	Multi-class classification of VPN	Steven Jorgensen et al. Extensible Machine Learning for Encrypted Network Traffic Application Labeling via Uncertainty Quantification. CoRR, abs/2205.05628, 2022

UNSW-NB15
kaggle.com
zip
Updated Nov 4, 2024
Cite
Galan Ramadan (2024). UNSW-NB15 [Dataset]. https://www.kaggle.com/datasets/galanramadan/unsw-nb15/discussion
Available download formats
zip (156257637 bytes)
Dataset updated
Nov 4, 2024
Authors
Galan Ramadan
Description
Dataset

This dataset was created by Galan Ramadan

Comparison of the detection performance of different classification methods...
figshare.com
xls
Updated Oct 10, 2023
Cite
Guangyu Zhao; Peng Liu; Ke Sun; Yang Yang; Tianyu Lan; Han Yang (2023). Comparison of the detection performance of different classification methods and oversampling methods with the strategy proposed on the UNSW-NB15 dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0291750.t011
Available download formats
xls
Unique identifier
https://doi.org/10.1371/journal.pone.0291750.t011
Dataset updated
Oct 10, 2023
Dataset provided by
PLOS ONE
Authors
Guangyu Zhao; Peng Liu; Ke Sun; Yang Yang; Tianyu Lan; Han Yang
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Comparison of the detection performance of different classification methods and oversampling methods with the strategy proposed on the UNSW-NB15 dataset.
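The description does not say which oversampling methods were compared. For reference, the sketch below shows the interpolation rule behind SMOTE (Synthetic Minority Over-sampling Technique), one widely used oversampling method, in plain Python. It is not the strategy proposed in the paper. For real use, the `imblearn.over_sampling.SMOTE` class from imbalanced-learn is the usual choice. Synthetic samples should be generated from the training split only, so that no information leaks into the test set.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points with the SMOTE interpolation rule.

    For a random minority point x, pick one of its k nearest minority
    neighbours z and return x + u * (z - x) with u drawn from [0, 1).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest neighbours of x among the other minority points (Euclidean)
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(x, p))
        z = rng.choice(others[:k])
        u = rng.random()
        # New point lies on the segment between x and its chosen neighbour z
        synthetic.append([xi + u * (zi - xi) for xi, zi in zip(x, z)])
    return synthetic
```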
unsw_nb15_mydata
kaggle.com
zip
Updated Jan 15, 2025
Cite
bahar n (2025). unsw_nb15_mydata [Dataset]. https://www.kaggle.com/datasets/baharn/unsw-nb15-mydata/code
Available download formats
zip (12487688 bytes)
Dataset updated
Jan 15, 2025
Authors
bahar n
Description
Dataset

This dataset was created by bahar n

The evaluation metrics formulas.
plos.figshare.com
bin
Updated Feb 12, 2025
Cite
Abdullah Asım Yılmaz (2025). The evaluation metrics formulas. [Dataset]. http://doi.org/10.1371/journal.pone.0316253.t006
Available download formats
bin
Unique identifier
https://doi.org/10.1371/journal.pone.0316253.t006
Dataset updated
Feb 12, 2025
Dataset provided by
PLOS ONE
Authors
Abdullah Asım Yılmaz
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Intrusion detection plays a significant role in the provision of information security. The most critical element is the ability to precisely identify different types of intrusions into the network. However, detecting intrusions poses an important challenge, as cyber-attackers generate many new types of intrusion every day. A robust system is still elusive, despite the various strategies proposed in recent years. Hence, this paper proposes a novel deep-learning-based architecture for detecting intrusions into a computer network. The aim is to construct a hybrid system that enhances the efficiency and accuracy of intrusion detection. The main contribution of our work is a novel deep-learning-based hybrid architecture in which PSO is used for hyperparameter optimisation and three well-known pre-trained network models are combined in an optimised way. The suggested method involves six key stages: data gathering, pre-processing, deep neural network (DNN) architecture design, optimisation of hyperparameters, training, and evaluation of the trained DNN. To verify the superiority of the suggested method over alternative state-of-the-art schemes, it was evaluated on the KDDCUP’99, NSL-KDD and UNSW-NB15 datasets. Our empirical findings show that the proposed model correctly classifies different types of attacks, with accuracies of 82.44%, 90.42% and 93.55% on the UNSW-NB15, NSL-KDD and KDDCUP’99 datasets, respectively, and outperforms alternative schemes in the literature.
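The table's own formulas are not reproduced on this page. As a sketch, these are the standard binary definitions commonly reported for intrusion detection, with attack as the positive class. Multiclass results are usually macro- or weighted-averaged over per-class values, which this sketch does not cover.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard formulas from confusion-matrix counts (attack = positive class)."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # also called detection rate
    # F1 is the harmonic mean of precision and recall
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    fpr = fp / (fp + tn) if fp + tn else 0.0  # false positive rate
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "fpr": fpr}


print(binary_metrics(tp=80, fp=20, tn=90, fn=10))
```

For the example counts, accuracy is 170/200 = 0.85, precision is 80/100 = 0.8, and F1 simplifies to 2TP / (2TP + FP + FN) = 160/190.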
UNSWB15 multi-classification results.
plos.figshare.com
xls
Updated Aug 1, 2024
Cite
Mohammed Tawfik (2024). UNSWB15 multi-classification results. [Dataset]. http://doi.org/10.1371/journal.pone.0304082.t010
Available download formats
xls
Unique identifier
https://doi.org/10.1371/journal.pone.0304082.t010
Dataset updated
Aug 1, 2024
Dataset provided by
PLOS ONE
Authors
Mohammed Tawfik
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The proliferation of Internet of Things (IoT) devices and fog computing architectures has introduced major security and cyber threats. Intrusion detection systems have become effective in monitoring network traffic and activities to identify anomalies that are indicative of attacks. However, constraints such as limited computing resources at fog nodes render conventional intrusion detection techniques impractical. This paper proposes a novel framework that integrates stacked autoencoders, CatBoost, and an optimised transformer-CNN-LSTM ensemble tailored for intrusion detection in fog and IoT networks. Autoencoders extract robust features from high-dimensional traffic data while reducing dimensionality to improve efficiency at fog nodes. CatBoost refines features through predictive selection. The ensemble model combines self-attention, convolutions, and recurrence for comprehensive traffic analysis in the cloud. Evaluations on the NSL-KDD, UNSW-NB15, and AWID benchmarks demonstrate an accuracy of over 99% in detecting threats across traditional, hybrid enterprise and wireless environments. Integrated edge preprocessing and cloud-based ensemble learning pipelines enable efficient and accurate anomaly detection. The results highlight the viability of securing real-world fog and IoT infrastructure against continuously evolving cyber-attacks.
Evolution parameters.
plos.figshare.com
xls
Updated Sep 12, 2024
Cite
Ankita Sharma; Shalli Rani; Maha Driss (2024). Evolution parameters. [Dataset]. http://doi.org/10.1371/journal.pone.0308206.t005
Available download formats
xls
Unique identifier
https://doi.org/10.1371/journal.pone.0308206.t005
Dataset updated
Sep 12, 2024
Dataset provided by
PLOS ONE
Authors
Ankita Sharma; Shalli Rani; Maha Driss
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
In response to the rapidly evolving threat landscape in network security, this paper proposes an Evolutionary Machine Learning Algorithm designed for robust intrusion detection. We specifically address challenges such as adaptability to new threats and scalability across diverse network environments. Our approach, a GA-based hybrid DT-SVM, is validated using two distinct datasets: BoT-IoT, reflecting a range of IoT-specific attacks, and UNSW-NB15, offering a broader context of network intrusion scenarios. This selection facilitates a comprehensive evaluation of the algorithm’s effectiveness across varying attack vectors. Performance metrics including accuracy, recall, and false positive rate are chosen to demonstrate the algorithm’s capability to accurately identify and adapt to both known and novel threats, thereby substantiating the algorithm’s potential as a scalable and adaptable security solution. This study aims to advance the development of intrusion detection systems that are not only reactive but also preemptively adaptive to emerging cyber threats. During the feature selection step, a GA applies evolutionary principles to discover and preserve the most relevant features of the dataset. This genetic-algorithm-based search optimises the feature subset, enabling the subsequent classification model to focus on the most relevant components of the network data. To accomplish this, DT-SVM classification and GA-driven feature selection are integrated in an effort to strike a balance between efficiency and accuracy. The system is designed to handle data streams efficiently in real time, so that intrusions are detected promptly and precisely. The empirical results corroborate the study’s assertion that the IDS outperforms traditional methodologies.
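GA-driven feature selection, as described above, searches over binary masks where 1 means "keep this feature". The sketch below is a generic version under stated assumptions. The operators (tournament selection, one-point crossover, bit-flip mutation, elitism) and the default parameters are common choices, not the values in this paper's "Evolution parameters" table. The `fitness` callable is hypothetical: in the paper's setting it would be something like the validation performance of the DT-SVM classifier on the selected columns, which is not implemented here.

```python
import random


def ga_feature_selection(n_features, fitness, pop_size=20, generations=50,
                         mutation_rate=None, seed=0):
    """Search binary feature masks (1 = keep the feature) with a simple GA.

    fitness(mask) -> float is supplied by the caller, for example the
    validation accuracy of a classifier trained on the kept columns.
    Operators: tournament selection, one-point crossover, bit-flip mutation,
    and elitism (the best mask found so far always survives).
    """
    if n_features < 2 or pop_size < 2:
        raise ValueError("need at least 2 features and a population of 2")
    rng = random.Random(seed)
    if mutation_rate is None:
        mutation_rate = 1.0 / n_features
    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    best = max(pop, key=fitness)

    def tournament(population):
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [best[:]]  # elitism: carry the best mask forward unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation, independently per gene
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = children
        candidate = max(pop, key=fitness)
        if fitness(candidate) > fitness(best):
            best = candidate
    return best, fitness(best)
```

With a real classifier, the fitness function dominates run time. Caching scores per mask (for example keyed by `tuple(mask)`) is a natural addition.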
Comparison with other methods.
plos.figshare.com
xls
Updated Aug 1, 2024
Cite
Mohammed Tawfik (2024). Comparison with other methods. [Dataset]. http://doi.org/10.1371/journal.pone.0304082.t015
Available download formats
xls
Unique identifier
https://doi.org/10.1371/journal.pone.0304082.t015
Dataset updated
Aug 1, 2024
Dataset provided by
PLOS ONE
Authors
Mohammed Tawfik
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The proliferation of Internet of Things (IoT) devices and fog computing architectures has introduced major security and cyber threats. Intrusion detection systems have become effective in monitoring network traffic and activities to identify anomalies that are indicative of attacks. However, constraints such as limited computing resources at fog nodes render conventional intrusion detection techniques impractical. This paper proposes a novel framework that integrates stacked autoencoders, CatBoost, and an optimised transformer-CNN-LSTM ensemble tailored for intrusion detection in fog and IoT networks. Autoencoders extract robust features from high-dimensional traffic data while reducing dimensionality to improve efficiency at fog nodes. CatBoost refines features through predictive selection. The ensemble model combines self-attention, convolutions, and recurrence for comprehensive traffic analysis in the cloud. Evaluations on the NSL-KDD, UNSW-NB15, and AWID benchmarks demonstrate an accuracy of over 99% in detecting threats across traditional, hybrid enterprise and wireless environments. Integrated edge preprocessing and cloud-based ensemble learning pipelines enable efficient and accurate anomaly detection. The results highlight the viability of securing real-world fog and IoT infrastructure against continuously evolving cyber-attacks.
AWID multi-classification results.
plos.figshare.com
xls
Updated Aug 1, 2024
Cite
Mohammed Tawfik (2024). AWID multi-classification results. [Dataset]. http://doi.org/10.1371/journal.pone.0304082.t012
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0304082.t012
Dataset updated
Aug 1, 2024
Dataset provided by
PLOS ONE
Authors
Mohammed Tawfik
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The proliferation of Internet of Things (IoT) devices and fog computing architectures has introduced major security and cyber threats. Intrusion detection systems have become effective at monitoring network traffic and activities to identify anomalies indicative of attacks. However, constraints such as limited computing resources at fog nodes render conventional intrusion detection techniques impractical. This paper proposes a novel framework that integrates stacked autoencoders, CatBoost, and an optimised transformer-CNN-LSTM ensemble tailored for intrusion detection in fog and IoT networks. Autoencoders extract robust features from high-dimensional traffic data while reducing dimensionality to improve efficiency at fog nodes. CatBoost refines features through predictive selection. The ensemble model combines self-attention, convolutions, and recurrence for comprehensive traffic analysis in the cloud. Evaluations on the NSL-KDD, UNSW-NB15, and AWID benchmarks demonstrate an accuracy of over 99% in detecting threats across traditional, hybrid enterprise, and wireless environments. Integrated edge preprocessing and cloud-based ensemble learning pipelines enable efficient and accurate anomaly detection. The results highlight the viability of securing real-world fog and IoT infrastructure against continuously evolving cyber-attacks.

StrGenIx | Laurens D'hooge (2024). UNSW-NB15 [Dataset]. http://doi.org/10.34740/kaggle/dsv/9350725

UNSW-NB15

Network Intrusion Detection, ISG group @UNSW Canberra

Explore at:

Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)

Unique identifier

https://doi.org/10.34740/kaggle/dsv/9350725

Dataset updated

Sep 9, 2024

Dataset provided by

Kaggle (http://kaggle.com/)

Authors

StrGenIx | Laurens D'hooge

License

Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically

Description

This is an academic intrusion detection dataset. All the credit goes to the original authors: Dr. Nour Moustafa and Dr. Jill Slay.

Please cite their original paper and all other appropriate articles listed on the UNSW-NB15 page.

The full dataset also offers the pcap, Bro, and Argus files, along with additional documentation.

V1: Original CSVs obtained from here
V2: Cleaning -> parquet
V3: Reorganize to save storage; only keep original CSVs in V1/V2
V4: Update to remove contaminating features [presentation] & [conference article]

My modifications to the pre-designated train-test sets are minimal and designed to reduce disk storage and increase performance and reliability.

In its current iteration, the dataset can be loaded directly with pd.read_parquet(). All data types are already set correctly, and no records have missing values. Reading Parquet files requires fastparquet and/or pyarrow.
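
A minimal loading sketch, assuming pandas plus a Parquet engine (pyarrow or fastparquet) is installed. The directory and file names `UNSW-NB15/train.parquet` and `UNSW-NB15/test.parquet` are hypothetical placeholders, not the dataset's actual file names, and `summarize` is an illustrative helper (not part of the dataset) that reports shape, dtypes, and the missing-value count so the claims above can be checked after download.

```python
from pathlib import Path

import pandas as pd


def load_split(path):
    """Load one pre-designated split from a Parquet file.

    pd.read_parquet needs a Parquet engine (pyarrow or fastparquet) installed.
    """
    return pd.read_parquet(path)


def summarize(df: pd.DataFrame) -> dict:
    """Report shape, per-column dtypes and the total missing-value count."""
    return {
        "rows": len(df),
        "columns": df.shape[1],
        "missing": int(df.isna().sum().sum()),
        "dtypes": df.dtypes.astype(str).to_dict(),
    }


if __name__ == "__main__":
    # Hypothetical location and file names: check the dataset page for the real ones.
    data_dir = Path("UNSW-NB15")
    for name in ("train.parquet", "test.parquet"):
        path = data_dir / name
        if path.exists():
            print(name, summarize(load_split(path)))
```

If the description holds, `summarize` should report `"missing": 0` for both splits, with numeric columns already typed rather than stored as strings.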

Exploratory Data Analysis (EDA): classification with very simple models reaches .877 AUROC.
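
AUROC is the area under the ROC curve. As a hedged illustration of what a figure like .877 measures, the sketch below computes AUROC with the Mann-Whitney pairwise formulation in plain Python: the fraction of (positive, negative) pairs in which the positive record receives the higher score, with ties counted as half. The function name `auroc` and the label convention (1 = attack, 0 = normal) are assumptions for illustration. The pairwise loop is O(n*m), so for full-size splits a rank-based or library implementation such as scikit-learn's `roc_auc_score` is the practical choice.

```python
def auroc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney pairwise statistic.

    Assumes labels are 1 (positive, e.g. attack) or 0 (negative, e.g. normal).
    Returns the probability that a random positive outscores a random
    negative, counting tied scores as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # 3 of the 4 (positive, negative) pairs are ranked correctly.
    print(auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

A score of 0.5 corresponds to random ranking and 1.0 to a perfect separation of attacks from normal traffic.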

UNSW-NB15

The UNSW-NB15 dataset with binarized features

UNSW-NB15-Dataset

UNSW-NB15-V3

UNSW-NB15 and CIC-IDS2017 Labelled PCAP Data

UNSW-NB15

SAFE-NID: Self-Attention with Normalizing-Flow Encodings for Network...

Features of reduced datasets.

MCL-FWA-BILSTM accuracy comparison with existing approaches for multiclass...

unsw-nb15-20000.json

LSTM model parameters.

Network traffic datasets created by Single Flow Time Series Analysis

UNSW-NB15

Comparison of the detection performance of different classification methods...

unsw_nb15_mydata

The evaluation metrics formulas.

UNSWB15 multi-classification results.

Evolution parameters.

Comparison with other methods.

AWID multi-classification results.

UNSW-NB15

Network Intrusion Detection, ISG group @UNSW Canberra

UNSW-NB15