20 datasets found
  1. UNSW-NB15

    • kaggle.com
    Updated Sep 9, 2024
    + more versions
    Cite
    StrGenIx | Laurens D'hooge (2024). UNSW-NB15 [Dataset]. http://doi.org/10.34740/kaggle/dsv/9350725
    Dataset updated
    Sep 9, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    StrGenIx | Laurens D'hooge
    License
    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    This is an academic intrusion detection dataset. All the credit goes to the original authors: Dr. Nour Moustafa and Dr. Jill Slay.

    Please cite their original paper and all other appropriate articles listed on the UNSW-NB15 page.

    The full dataset also offers the pcap, BRO and Argus files along with additional documentation.

    V1: Original CSVs obtained from here
    V2: Cleaning -> parquet
    V3: Reorganize to save storage, only keep original CSVs in V1/V2
    V4: Update to remove contaminating features [presentation] & [conference article]

    My modifications to the predesignated train-test sets are minimal and designed to decrease disk storage and increase performance & reliability.

    In its current iteration, the dataset can be loaded trivially with pd.read_parquet(). All data types are already set correctly and there are 0 records with missing information. Reading parquet files requires fastparquet and/or pyarrow.
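A minimal loading sketch for the parquet files described above. The file name `UNSW_NB15_training-set.parquet` and both helper names are assumptions made for illustration, not taken from the dataset page; substitute the path of the file you actually downloaded.

```python
import importlib.util
from pathlib import Path


def available_parquet_engine():
    """Return the first installed parquet engine name, or None if neither is present."""
    for name in ("pyarrow", "fastparquet"):
        if importlib.util.find_spec(name) is not None:
            return name
    return None


def load_split(path):
    """Read one parquet file and count missing values (the description says to expect 0)."""
    import pandas as pd  # imported here so the engine check above runs without pandas

    engine = available_parquet_engine()
    if engine is None:
        raise ImportError("Install pyarrow or fastparquet to read parquet files.")
    frame = pd.read_parquet(path, engine=engine)
    missing = int(frame.isna().sum().sum())
    return frame, missing


# Hypothetical file name; replace with the path of the downloaded file.
split_path = Path("UNSW_NB15_training-set.parquet")
if split_path.exists():
    frame, missing = load_split(split_path)
    print(frame.shape, "missing values:", missing)
    print(frame.dtypes.head())
else:
    print("Parquet file not found; engine available:", available_parquet_engine())
```

Checking the engine first gives a clearer error than the one pandas raises when neither library is installed.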

    Exploratory data analysis (EDA) through classification with very simple models reaches 0.877 AUROC.

  2. The UNSW-NB15 dataset with binarized features

    • data.niaid.nih.gov
    Updated Feb 9, 2021
    Cite
    Umuroglu, Yaman (2021). The UNSW-NB15 dataset with binarized features [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4519766
    Dataset updated
    Feb 9, 2021
    Dataset authored and provided by
    Umuroglu, Yaman
    License
    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Binarized version of the UNSW-NB15 dataset, where the original features (a mix of strings, categorical values, floating point values etc) are converted to a bit string of 593 bits. Each value in each feature is either 0 or 1, stored as a uint8 value. The uint8 values are represented as numpy arrays, provided separately for training and test data (same train/test split as the original dataset is used). The final binary value in each sample is the expected output.
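A small sketch of how such an array could be split into feature bits and the expected output, assuming each row is one sample and the label sits in the last column as described. The array here is synthetic (8 feature bits plus a label) and the function name is hypothetical; the real files would be loaded with NumPy instead.

```python
import numpy as np

# Synthetic stand-in for the real data: 4 samples, 8 feature bits + 1 label bit.
rng = np.random.default_rng(0)
samples = rng.integers(0, 2, size=(4, 9), dtype=np.uint8)


def split_features_label(array):
    """Split a 2-D 0/1 array into (feature bits, label), taking the label from the last column."""
    if array.ndim != 2:
        raise ValueError("expected a 2-D array of samples")
    if not np.isin(array, (0, 1)).all():
        raise ValueError("expected only 0/1 values")
    return array[:, :-1], array[:, -1]


features, labels = split_features_label(samples)

# Optional: pack each row's bits into bytes to save memory (8 bits per byte).
packed = np.packbits(features, axis=1)

print(features.shape, labels.shape, packed.shape)
```

Storing one bit per `uint8` costs eight times the memory of a packed representation, so `np.packbits` can help if the arrays need to be kept in memory alongside a model.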

    Among others, this dataset has been used for quantized neural network research:

    Umuroglu, Y., Akhauri, Y., Fraser, N. J., & Blott, M. (2020, August). LogicNets: Co-Designed Neural Networks and Circuits for Extreme-Throughput Applications. In 2020 30th International Conference on Field-Programmable Logic and Applications (FPL) (pp. 291-297). IEEE.

    The method for binarization is identical to the one described in 10.5281/zenodo.3258657 :

    "T. Murovič, A. Trost, Massively Parallel Combinational Binary Neural Networks for Edge Processing, Elektrotehniški vestnik, vol. 86, no. 1-2, pp. 47-53, 2019"

    The original UNSW-NB15 dataset is by:

    Moustafa, Nour, and Jill Slay. "UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set)." Military Communications and Information Systems Conference (MilCIS), 2015. IEEE, 2015.

  3. UNSW-NB15-Dataset

    • kaggle.com
    zip
    Updated Oct 18, 2024
    Cite
    Mahdi Mesfar (2024). UNSW-NB15-Dataset [Dataset]. https://www.kaggle.com/datasets/mahdimesfar0123/unsw-nb15-dataset/data
    Available download formats
    zip (164483690 bytes)
    Dataset updated
    Oct 18, 2024
    Authors
    Mahdi Mesfar
    Description

    Dataset

    This dataset was created by Mahdi Mesfar

    Contents

  4. UNSW-NB15-V3

    • huggingface.co
    • dataverse.harvard.edu
    Updated Nov 27, 2024
    Cite
    Abluva Inc (2024). UNSW-NB15-V3 [Dataset]. https://huggingface.co/datasets/abluva/UNSW-NB15-V3
    Dataset updated
    Nov 27, 2024
    Dataset authored and provided by
    Abluva Inc
    License
    Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    The dataset is an extended version of UNSW-NB15. It has one additional synthesised class, and the data is normalised for ease of use. To cite the dataset, please reference the original paper with DOI: 10.1109/SmartNets61466.2024.10577645. The paper is published in IEEE SmartNets and can be accessed here: https://www.researchgate.net/publication/382034618_Blender-GAN_Multi-Target_Conditional_Generative_Adversarial_Network_for_Novel_Class_Synthetic_Data_Generation. Citation info: Madhubalan… See the full description on the dataset page: https://huggingface.co/datasets/abluva/UNSW-NB15-V3.

  5. UNSW-NB15 and CIC-IDS2017 Labelled PCAP Data

    • kaggle.com
    Updated Sep 8, 2022
    Cite
    Yasir-Ali (2022). UNSW-NB15 and CIC-IDS2017 Labelled PCAP Data [Dataset]. http://doi.org/10.34740/kaggle/dsv/4170054
    Dataset updated
    Sep 8, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Yasir-Ali
    License
    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    Packet capture (PCAP) files of the UNSW-NB15 and CIC-IDS2017 datasets are processed and labelled using the CSV files. Each packet is labelled by comparing eight features: source IP, destination IP, source port, destination port, starting time, ending time, protocol and time to live. The dataset has dimensions N x 1504. All columns of the dataset are integers, so you can use it directly in your machine learning models. Details of the whole processing and transformation are provided in the following GitHub repo:

    https://github.com/Yasir-ali-farrukh/Payload-Byte

    You can use the tool available at the above-mentioned GitHub repo to generate a labelled dataset from scratch. All details of the processing and transformation are provided in the following paper:

    @article{Payload, 
    author = "Yasir Ali Farrukh and Irfan Khan and Syed Wali and David Bierbrauer and Nathaniel Bastian", 
    title = "{Payload-Byte: A Tool for Extracting and Labeling Packet Capture Files of Modern Network Intrusion Detection Datasets}", 
    year = "2022", 
    month = "9", 
    url = "https://www.techrxiv.org/articles/preprint/Payload-Byte_A_Tool_for_Extracting_and_Labeling_Packet_Capture_Files_of_Modern_Network_Intrusion_Detection_Datasets/20714221", 
    doi = "10.36227/techrxiv.20714221.v1" 
    }
    
    If you are using our tool or dataset, kindly cite our related paper, which outlines the details of the tool and its processing.
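The packet labelling idea in this entry can be sketched as below. This is an illustration only, not the Payload-Byte implementation: the class names, field names, the exact-match rule and the `"unlabelled"` default are all assumptions. It treats a packet as belonging to a flow when the addresses, ports, protocol and time to live agree and the packet timestamp falls between the flow's starting and ending time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowRecord:
    """One labelled row from the flow-level CSV (hypothetical field names)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str
    ttl: int
    start_time: float
    end_time: float
    label: str


@dataclass(frozen=True)
class Packet:
    """One packet parsed from a PCAP file (hypothetical field names)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str
    ttl: int
    timestamp: float


def label_packet(packet, flows, default="unlabelled"):
    """Return the label of the first flow whose fields match the packet, else the default."""
    for flow in flows:
        same_fields = (
            packet.src_ip == flow.src_ip
            and packet.dst_ip == flow.dst_ip
            and packet.src_port == flow.src_port
            and packet.dst_port == flow.dst_port
            and packet.protocol == flow.protocol
            and packet.ttl == flow.ttl
        )
        in_window = flow.start_time <= packet.timestamp <= flow.end_time
        if same_fields and in_window:
            return flow.label
    return default


# Made-up example data.
flows = [FlowRecord("10.0.0.1", "10.0.0.2", 1043, 80, "tcp", 31, 100.0, 105.0, "exploits")]
inside = Packet("10.0.0.1", "10.0.0.2", 1043, 80, "tcp", 31, 102.5)
outside = Packet("10.0.0.1", "10.0.0.2", 1043, 80, "tcp", 31, 110.0)

print(label_packet(inside, flows))
print(label_packet(outside, flows))
```

A linear scan over flows is fine for a demonstration; for real captures, indexing the flows by their address, port and protocol fields in a dictionary would avoid comparing every packet with every flow.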
    
  6. UNSW-NB15

    • kaggle.com
    Updated May 1, 2024
    + more versions
    Cite
    Saba898 (2024). UNSW-NB15 [Dataset]. https://www.kaggle.com/datasets/saba898/unsw-nb15/code
    Dataset updated
    May 1, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Saba898
    Description

    Dataset

    This dataset was created by Saba898

    Contents

  7. SAFE-NID: Self-Attention with Normalizing-Flow Encodings for Network...

    • zenodo.org
    tar
    Updated Mar 20, 2025
    Cite
    Brian Matejek; Ashish Gehani; Nathaniel Bastian; Daniel Clouse; Bradford Kline; Susmit Jha (2025). SAFE-NID: Self-Attention with Normalizing-Flow Encodings for Network Intrusion Detection Dataset [Dataset]. http://doi.org/10.5281/zenodo.15046995
    Dataset updated
    Mar 20, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Brian Matejek; Ashish Gehani; Nathaniel Bastian; Daniel Clouse; Bradford Kline; Susmit Jha
    License
    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    These datasets provide packet-level labeling of the payloads in the CIC-IDS-2017 and UNSW-NB15 network intrusion detection datasets. A full discussion of the data processing can be found in our Transactions on Machine Learning Research journal paper SAFE-NID: Self-Attention with Normalizing-Flow Encodings for Network Intrusion Detection. Code for additional processing and experimentation can be found here. The UNSW-NB15 dataset contains over 50 million non-empty payloads coming from nine attack classes with benign background traffic. The CIC-IDS-2017 dataset contains over 30 million non-empty payloads coming from fourteen attack classes with benign background traffic. Both datasets are highly imbalanced, with 20-25x more benign packets than malicious ones.
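With 20-25x more benign packets than malicious ones, a classifier can score well by predicting only the majority class. One common mitigation is to weight classes inversely to their frequency. The sketch below uses synthetic labels with a 22:1 ratio; the function name is hypothetical and the weighting scheme is a general technique, not something the dataset page prescribes.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by total / (number of classes * class count)."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: total / (len(counts) * count) for label, count in counts.items()}


# Synthetic labels with a 22:1 benign-to-malicious ratio.
labels = ["benign"] * 2200 + ["malicious"] * 100
weights = inverse_frequency_weights(labels)

for label, weight in weights.items():
    print(label, round(weight, 4))
```

The ratio between the two weights equals the inverse of the class ratio, so each class contributes the same total weight to a weighted loss.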

  8. Features of reduced datasets.

    • plos.figshare.com
    xls
    Updated Jan 24, 2024
    Cite
    Chadia E. L. Asry; Ibtissam Benchaji; Samira Douzi; Bouabid E. L. Ouahidi (2024). Features of reduced datasets. [Dataset]. http://doi.org/10.1371/journal.pone.0295801.t004
    Dataset updated
    Jan 24, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Chadia E. L. Asry; Ibtissam Benchaji; Samira Douzi; Bouabid E. L. Ouahidi
    License
    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The escalating prevalence of cybersecurity risks calls for a focused strategy in order to attain efficient resolutions. This study introduces a detection model that employs a tailored methodology integrating feature selection using SHAP values, a shallow learning algorithm called PV-DM, and machine learning classifiers like XGBOOST. The efficacy of our suggested methodology is highlighted by employing the NSL-KDD and UNSW-NB15 datasets. Our approach in the NSL-KDD dataset exhibits exceptional performance, with an accuracy of 98.92%, precision of 98.92%, recall of 95.44%, and an F1-score of 96.77%. Notably, this performance is achieved by utilizing only four characteristics, indicating the efficiency of our approach. The proposed methodology achieves an accuracy of 82.86%, precision of 84.07%, recall of 77.70%, and an F1-score of 80.20% in the UNSW-NB15 dataset, using only six features. Our research findings provide substantial evidence of the enhanced performance of the proposed model compared to a traditional deep-learning model across all performance metrics.

  9. MCL-FWA-BILSTM accuracy comparison with existing approaches for multiclass...

    • plos.figshare.com
    xls
    Updated May 23, 2024
    + more versions
    Cite
    Arshad Hashmi; Omar M. Barukab; Ahmad Hamza Osman (2024). MCL-FWA-BILSTM accuracy comparison with existing approaches for multiclass classification with state of art on UNSW-NB15 or NSL-KDD. [Dataset]. http://doi.org/10.1371/journal.pone.0302294.t011
    Dataset updated
    May 23, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Arshad Hashmi; Omar M. Barukab; Ahmad Hamza Osman
    License
    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    MCL-FWA-BILSTM accuracy comparison with existing approaches for multiclass classification with state of art on UNSW-NB15 or NSL-KDD.

  10. unsw-nb15-20000.json

    • kaggle.com
    Updated Aug 17, 2024
    Cite
    lengxingxin (2024). unsw-nb15-20000.json [Dataset]. https://www.kaggle.com/datasets/lengxingxin/unsw-nb15-20000-json/discussion?sort=undefined
    Dataset updated
    Aug 17, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    lengxingxin
    Description

    Dataset

    This dataset was created by lengxingxin

    Contents

  11. LSTM model parameters.

    • plos.figshare.com
    xls
    Updated Jan 24, 2024
    Cite
    Chadia E. L. Asry; Ibtissam Benchaji; Samira Douzi; Bouabid E. L. Ouahidi (2024). LSTM model parameters. [Dataset]. http://doi.org/10.1371/journal.pone.0295801.t007
    Dataset updated
    Jan 24, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Chadia E. L. Asry; Ibtissam Benchaji; Samira Douzi; Bouabid E. L. Ouahidi
    License
    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The escalating prevalence of cybersecurity risks calls for a focused strategy in order to attain efficient resolutions. This study introduces a detection model that employs a tailored methodology integrating feature selection using SHAP values, a shallow learning algorithm called PV-DM, and machine learning classifiers like XGBOOST. The efficacy of our suggested methodology is highlighted by employing the NSL-KDD and UNSW-NB15 datasets. Our approach in the NSL-KDD dataset exhibits exceptional performance, with an accuracy of 98.92%, precision of 98.92%, recall of 95.44%, and an F1-score of 96.77%. Notably, this performance is achieved by utilizing only four characteristics, indicating the efficiency of our approach. The proposed methodology achieves an accuracy of 82.86%, precision of 84.07%, recall of 77.70%, and an F1-score of 80.20% in the UNSW-NB15 dataset, using only six features. Our research findings provide substantial evidence of the enhanced performance of the proposed model compared to a traditional deep-learning model across all performance metrics.

  12. Network traffic datasets created by Single Flow Time Series Analysis

    • zenodo.org
    • explore.openaire.eu
    csv, pdf
    Updated Jul 11, 2024
    Cite
    Josef Koumar; Karel Hynek; Tomáš Čejka (2024). Network traffic datasets created by Single Flow Time Series Analysis [Dataset]. http://doi.org/10.5281/zenodo.8035724
    Dataset updated
    Jul 11, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Josef Koumar; Karel Hynek; Tomáš Čejka
    License
    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Network traffic datasets created by Single Flow Time Series Analysis

    Datasets were created for the paper: Network Traffic Classification based on Single Flow Time Series Analysis -- Josef Koumar, Karel Hynek, Tomáš Čejka -- which was published at The 19th International Conference on Network and Service Management (CNSM) 2023. Please cite usage of our datasets as:

    J. Koumar, K. Hynek and T. Čejka, "Network Traffic Classification Based on Single Flow Time Series Analysis," 2023 19th International Conference on Network and Service Management (CNSM), Niagara Falls, ON, Canada, 2023, pp. 1-7, doi: 10.23919/CNSM59352.2023.10327876.

    This Zenodo repository contains 23 datasets created from 15 well-known published datasets which are cited in the table below. Each dataset contains 69 features created by Time Series Analysis of Single Flow Time Series. The detailed description of features from datasets is in the file: feature_description.pdf

    In the following table is a description of each dataset file:

    File name | Detection problem | Citation of original raw dataset
    botnet_binary.csv | Binary detection of botnet | S. García et al. An Empirical Comparison of Botnet Detection Methods. Computers & Security, 45:100–123, 2014.
    botnet_multiclass.csv | Multi-class classification of botnet | S. García et al. An Empirical Comparison of Botnet Detection Methods. Computers & Security, 45:100–123, 2014.
    cryptomining_design.csv | Binary detection of cryptomining; the design part | Richard Plný et al. Datasets of Cryptomining Communication. Zenodo, October 2022
    cryptomining_evaluation.csv | Binary detection of cryptomining; the evaluation part | Richard Plný et al. Datasets of Cryptomining Communication. Zenodo, October 2022
    dns_malware.csv | Binary detection of malware DNS | Samaneh Mahdavifar et al. Classifying Malicious Domains using DNS Traffic Analysis. In DASC/PiCom/CBDCom/CyberSciTech 2021, pages 60–67. IEEE, 2021.
    doh_cic.csv | Binary detection of DoH | Mohammadreza MontazeriShatoori et al. Detection of doh tunnels using time-series classification of encrypted traffic. In DASC/PiCom/CBDCom/CyberSciTech 2020, pages 63–70. IEEE, 2020
    doh_real_world.csv | Binary detection of DoH | Kamil Jeřábek et al. Collection of datasets with DNS over HTTPS traffic. Data in Brief, 42:108310, 2022
    dos.csv | Binary detection of DoS | Nickolaos Koroniotis et al. Towards the development of realistic botnet dataset in the Internet of Things for network forensic analytics: Bot-IoT dataset. Future Gener. Comput. Syst., 100:779–796, 2019.
    edge_iiot_binary.csv | Binary detection of IoT malware | Mohamed Amine Ferrag et al. Edge-iiotset: A new comprehensive realistic cyber security dataset of iot and iiot applications: Centralized and federated learning, 2022.
    edge_iiot_multiclass.csv | Multi-class classification of IoT malware | Mohamed Amine Ferrag et al. Edge-iiotset: A new comprehensive realistic cyber security dataset of iot and iiot applications: Centralized and federated learning, 2022.
    https_brute_force.csv | Binary detection of HTTPS Brute Force | Jan Luxemburk et al. HTTPS Brute-force dataset with extended network flows, November 2020
    ids_cic_binary.csv | Binary detection of intrusion in IDS | Iman Sharafaldin et al. Toward generating a new intrusion detection dataset and intrusion traffic characterization. ICISSp, 1:108–116, 2018.
    ids_cic_multiclass.csv | Multi-class classification of intrusion in IDS | Iman Sharafaldin et al. Toward generating a new intrusion detection dataset and intrusion traffic characterization. ICISSp, 1:108–116, 2018.
    ids_unsw_nb_15_binary.csv | Binary detection of intrusion in IDS | Nour Moustafa and Jill Slay. Unsw-nb15: a comprehensive data set for network intrusion detection systems (unsw-nb15 network data set). In 2015 military communications and information systems conference (MilCIS), pages 1–6. IEEE, 2015.
    ids_unsw_nb_15_multiclass.csv | Multi-class classification of intrusion in IDS | Nour Moustafa and Jill Slay. Unsw-nb15: a comprehensive data set for network intrusion detection systems (unsw-nb15 network data set). In 2015 military communications and information systems conference (MilCIS), pages 1–6. IEEE, 2015.
    iot_23.csv | Binary detection of IoT malware | Sebastian Garcia et al. IoT-23: A labeled dataset with malicious and benign IoT network traffic, January 2020. More details here https://www.stratosphereips.org/datasets-iot23
    ton_iot_binary.csv | Binary detection of IoT malware | Nour Moustafa. A new distributed architecture for evaluating ai-based security systems at the edge: Network ton iot datasets. Sustainable Cities and Society, 72:102994, 2021
    ton_iot_multiclass.csv | Multi-class classification of IoT malware | Nour Moustafa. A new distributed architecture for evaluating ai-based security systems at the edge: Network ton iot datasets. Sustainable Cities and Society, 72:102994, 2021
    tor_binary.csv | Binary detection of TOR | Arash Habibi Lashkari et al. Characterization of Tor Traffic using Time based Features. In ICISSP 2017, pages 253–262. SciTePress, 2017.
    tor_multiclass.csv | Multi-class classification of TOR | Arash Habibi Lashkari et al. Characterization of Tor Traffic using Time based Features. In ICISSP 2017, pages 253–262. SciTePress, 2017.
    vpn_iscx_binary.csv | Binary detection of VPN | Gerard Draper-Gil et al. Characterization of Encrypted and VPN Traffic Using Time-related. In ICISSP, pages 407–414, 2016.
    vpn_iscx_multiclass.csv | Multi-class classification of VPN | Gerard Draper-Gil et al. Characterization of Encrypted and VPN Traffic Using Time-related. In ICISSP, pages 407–414, 2016.
    vpn_vnat_binary.csv | Binary detection of VPN | Steven Jorgensen et al. Extensible Machine Learning for Encrypted Network Traffic Application Labeling via Uncertainty Quantification. CoRR, abs/2205.05628, 2022
    vpn_vnat_multiclass.csv | Multi-class classification of VPN | Steven Jorgensen et al. Extensible Machine Learning for Encrypted Network Traffic Application Labeling via Uncertainty Quantification. CoRR, abs/2205.05628, 2022

  13. UNSW-NB15

    • kaggle.com
    zip
    Updated Nov 4, 2024
    Cite
    Galan Ramadan (2024). UNSW-NB15 [Dataset]. https://www.kaggle.com/datasets/galanramadan/unsw-nb15/discussion
    Available download formats
    zip (156257637 bytes)
    Dataset updated
    Nov 4, 2024
    Authors
    Galan Ramadan
    Description

    Dataset

    This dataset was created by Galan Ramadan

    Contents

  14. Comparison of the detection performance of different classification methods...

    • figshare.com
    xls
    Updated Oct 10, 2023
    Cite
    Guangyu Zhao; Peng Liu; Ke Sun; Yang Yang; Tianyu Lan; Han Yang (2023). Comparison of the detection performance of different classification methods and oversampling methods with the strategy proposed on the UNSW-NB15 dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0291750.t011
    Dataset updated
    Oct 10, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Guangyu Zhao; Peng Liu; Ke Sun; Yang Yang; Tianyu Lan; Han Yang
    License
    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Comparison of the detection performance of different classification methods and oversampling methods with the strategy proposed on the UNSW-NB15 dataset.

  15. unsw_nb15_mydata

    • kaggle.com
    zip
    Updated Jan 15, 2025
    + more versions
    Cite
    bahar n (2025). unsw_nb15_mydata [Dataset]. https://www.kaggle.com/datasets/baharn/unsw-nb15-mydata/code
    Available download formats
    zip (12487688 bytes)
    Dataset updated
    Jan 15, 2025
    Authors
    bahar n
    Description

    Dataset

    This dataset was created by bahar n

    Contents

  16. The evaluation metrics formulas.

    • plos.figshare.com
    bin
    Updated Feb 12, 2025
    + more versions
    Cite
    Abdullah Asım Yılmaz (2025). The evaluation metrics formulas. [Dataset]. http://doi.org/10.1371/journal.pone.0316253.t006
    Dataset updated
    Feb 12, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Abdullah Asım Yılmaz
    License
    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Intrusion detection plays a significant role in the provision of information security. The most critical element is the ability to precisely identify different types of intrusions into the network. However, the detection of intrusions poses an important challenge, as many new types of intrusion are now generated by cyber-attackers every day. A robust system is still elusive, despite the various strategies that have been proposed in recent years. Hence, a novel deep-learning-based architecture for detecting intrusions into a computer network is proposed in this paper. The aim is to construct a hybrid system that enhances the efficiency and accuracy of intrusion detection. The main contribution of our work is a novel deep learning-based hybrid architecture in which PSO is used for hyperparameter optimisation and three well-known pre-trained network models are combined in an optimised way. The suggested method involves six key stages: data gathering, pre-processing, deep neural network (DNN) architecture design, optimisation of hyperparameters, training, and evaluation of the trained DNN. To verify the superiority of the suggested method over alternative state-of-the-art schemes, it was evaluated on the KDDCUP’99, NSL-KDD and UNSW-NB15 datasets. Our empirical findings show that the proposed model successfully and correctly classifies different types of attacks with 82.44%, 90.42% and 93.55% accuracy values obtained on UNSW-NB15, NSL-KDD and KDDCUP’99 datasets, respectively, and outperforms alternative schemes in the literature.

  17. UNSWB15 multi-classification results.

    • plos.figshare.com
    xls
    Updated Aug 1, 2024
    Cite
    Mohammed Tawfik (2024). UNSWB15 multi-classification results. [Dataset]. http://doi.org/10.1371/journal.pone.0304082.t010
    Dataset updated
    Aug 1, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Mohammed Tawfik
    License
    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The proliferation of Internet of Things (IoT) devices and fog computing architectures has introduced major security and cyber threats. Intrusion detection systems have become effective in monitoring network traffic and activities to identify anomalies that are indicative of attacks. However, constraints such as limited computing resources at fog nodes render conventional intrusion detection techniques impractical. This paper proposes a novel framework that integrates stacked autoencoders, CatBoost, and an optimised transformer-CNN-LSTM ensemble tailored for intrusion detection in fog and IoT networks. Autoencoders extract robust features from high-dimensional traffic data while reducing the dimensionality of the efficiency at fog nodes. CatBoost refines features through predictive selection. The ensemble model combines self-attention, convolutions, and recurrence for comprehensive traffic analysis in the cloud. Evaluations of the NSL-KDD, UNSW-NB15, and AWID benchmarks demonstrate an accuracy of over 99% in detecting threats across traditional, hybrid enterprises and wireless environments. Integrated edge preprocessing and cloud-based ensemble learning pipelines enable efficient and accurate anomaly detection. The results highlight the viability of securing real-world fog and the IoT infrastructure against continuously evolving cyber-attacks.

  18. Evolution parameters.

    • plos.figshare.com
    xls
    Updated Sep 12, 2024
    + more versions
    Cite
    Ankita Sharma; Shalli Rani; Maha Driss (2024). Evolution parameters. [Dataset]. http://doi.org/10.1371/journal.pone.0308206.t005
    Dataset updated
    Sep 12, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Ankita Sharma; Shalli Rani; Maha Driss
    License
    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In response to the rapidly evolving threat landscape in network security, this paper proposes an Evolutionary Machine Learning Algorithm designed for robust intrusion detection. We specifically address challenges such as adaptability to new threats and scalability across diverse network environments. Our approach is validated using two distinct datasets: BoT-IoT, reflecting a range of IoT-specific attacks, and UNSW-NB15, offering a broader context of network intrusion scenarios using GA-based hybrid DT-SVM. This selection facilitates a comprehensive evaluation of the algorithm’s effectiveness across varying attack vectors. Performance metrics including accuracy, recall, and false positive rates are meticulously chosen to demonstrate the algorithm’s capability to accurately identify and adapt to both known and novel threats, thereby substantiating the algorithm’s potential as a scalable and adaptable security solution. This study aims to advance the development of intrusion detection systems that are not only reactive but also preemptively adaptive to emerging cyber threats. During the feature selection step, a GA is used to discover and preserve the most relevant characteristics from the dataset by using evolutionary principles. Through the use of this technology based on genetic algorithms, the subset of features is optimised, enabling the subsequent classification model to focus on the most relevant components of network data. In order to accomplish this, DT-SVM classification and GA-driven feature selection are integrated in an effort to strike a balance between efficiency and accuracy. The system has been purposefully designed to efficiently handle data streams in real-time, ensuring that intrusions are promptly and precisely detected. The empirical results corroborate the study’s assertion that the IDS outperforms traditional methodologies.

  19. Comparison with other methods.

    • plos.figshare.com
    xls
    Updated Aug 1, 2024
    + more versions
    Cite
    Mohammed Tawfik (2024). Comparison with other methods. [Dataset]. http://doi.org/10.1371/journal.pone.0304082.t015
    Available download formats: xls
    Dataset updated
    Aug 1, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Mohammed Tawfik
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The proliferation of Internet of Things (IoT) devices and fog computing architectures has introduced major security and cyber threats. Intrusion detection systems have become effective in monitoring network traffic and activities to identify anomalies that are indicative of attacks. However, constraints such as limited computing resources at fog nodes render conventional intrusion detection techniques impractical. This paper proposes a novel framework that integrates stacked autoencoders, CatBoost, and an optimised transformer-CNN-LSTM ensemble tailored for intrusion detection in fog and IoT networks. Autoencoders extract robust features from high-dimensional traffic data while reducing its dimensionality for efficiency at fog nodes. CatBoost refines features through predictive selection. The ensemble model combines self-attention, convolutions, and recurrence for comprehensive traffic analysis in the cloud. Evaluations on the NSL-KDD, UNSW-NB15, and AWID benchmarks demonstrate an accuracy of over 99% in detecting threats across traditional, hybrid enterprise, and wireless environments. Integrated edge preprocessing and cloud-based ensemble learning pipelines enable efficient and accurate anomaly detection. The results highlight the viability of securing real-world fog and IoT infrastructure against continuously evolving cyber-attacks.

  20. AWID multi-classification results.

    • plos.figshare.com
    xls
    Updated Aug 1, 2024
    Cite
    Mohammed Tawfik (2024). AWID multi-classification results. [Dataset]. http://doi.org/10.1371/journal.pone.0304082.t012
    Available download formats: xls
    Dataset updated
    Aug 1, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Mohammed Tawfik
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The proliferation of Internet of Things (IoT) devices and fog computing architectures has introduced major security and cyber threats. Intrusion detection systems have become effective in monitoring network traffic and activities to identify anomalies that are indicative of attacks. However, constraints such as limited computing resources at fog nodes render conventional intrusion detection techniques impractical. This paper proposes a novel framework that integrates stacked autoencoders, CatBoost, and an optimised transformer-CNN-LSTM ensemble tailored for intrusion detection in fog and IoT networks. Autoencoders extract robust features from high-dimensional traffic data while reducing its dimensionality for efficiency at fog nodes. CatBoost refines features through predictive selection. The ensemble model combines self-attention, convolutions, and recurrence for comprehensive traffic analysis in the cloud. Evaluations on the NSL-KDD, UNSW-NB15, and AWID benchmarks demonstrate an accuracy of over 99% in detecting threats across traditional, hybrid enterprise, and wireless environments. Integrated edge preprocessing and cloud-based ensemble learning pipelines enable efficient and accurate anomaly detection. The results highlight the viability of securing real-world fog and IoT infrastructure against continuously evolving cyber-attacks.

StrGenIx | Laurens D'hooge (2024). UNSW-NB15 [Dataset]. http://doi.org/10.34740/kaggle/dsv/9350725

UNSW-NB15

Network Intrusion Detection, ISG group @UNSW Canberra

Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Sep 9, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
StrGenIx | Laurens D'hooge
License

Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically

Description

This is an academic intrusion detection dataset. All credit goes to the original authors: Dr. Nour Moustafa and Dr. Jill Slay.

Please cite their original paper and all other appropriate articles listed on the UNSW-NB15 page.

The full dataset also offers the pcap, BRO and Argus files along with additional documentation.

V1: Original CSVs obtained from here
V2: Cleaning -> parquet
V3: Reorganize to save storage, only keep original CSVs in V1/V2
V4: Update to remove contaminating features [presentation] & [conference article]

My modifications to the predesignated train-test sets are minimal and designed to decrease disk storage and increase performance & reliability.

In its current iteration, the dataset can be loaded trivially with pd.read_parquet(). All data types are already set correctly and no records have missing values. Reading parquet files requires fastparquet and/or pyarrow.
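
A minimal loading sketch, assuming pandas plus at least one of the two parquet engines is installed. The helper names `available_parquet_engines` and `load_split` and the file path in the usage note are hypothetical and do not come from the dataset page.

```python
from importlib.util import find_spec


def available_parquet_engines():
    """Return the parquet engines that can be imported in this environment."""
    return [name for name in ("pyarrow", "fastparquet")
            if find_spec(name) is not None]


def load_split(path):
    """Load one parquet file; return (DataFrame, number of missing cells)."""
    engines = available_parquet_engines()
    if not engines:
        raise ImportError("install pyarrow or fastparquet to read parquet files")
    # pandas is imported lazily so the engine check above can run without it.
    import pandas as pd
    df = pd.read_parquet(path, engine=engines[0])
    # Count missing cells across all columns.
    missing = int(df.isna().sum().sum())
    return df, missing
```

For example, `df, missing = load_split("UNSW_NB15_training-set.parquet")` (hypothetical file name) would return the frame and its missing-cell count; `df.dtypes` then shows the column types stored in the file.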

Exploratory data analysis (EDA) through classification with very simple models reaches .877 AUROC.
