56 datasets found
  1. CICIDS2017

    • ieee-dataport.org
    Updated Jul 21, 2025
    Cite
    Haolei Chen (2025). CICIDS2017 [Dataset]. https://ieee-dataport.org/documents/cicids2017
    Dataset updated
    Jul 21, 2025
    Authors
    Haolei Chen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    It has been found that the dataset has a few major shortcomings. These issues are sufficient to bias the detection engine of any typical IDS.

  2. CICIDS2017 and UNBSW-NB15

    • ieee-dataport.org
    Updated Dec 13, 2023
    Cite
    xinpeng chen (2023). CICIDS2017 and UNBSW-NB15 [Dataset]. https://ieee-dataport.org/documents/cicids2017-and-unbsw-nb15
    Dataset updated
    Dec 13, 2023
    Authors
    xinpeng chen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    DoS

  3. cyberbert_dataset

    • huggingface.co
    Updated Apr 10, 2025
    Cite
    Chaitany Agrawal (2025). cyberbert_dataset [Dataset]. https://huggingface.co/datasets/agrawalchaitany/cyberbert_dataset
    Dataset updated
    Apr 10, 2025
    Authors
    Chaitany Agrawal
    License

    https://choosealicense.com/licenses/other/

    Description

    Cleaned CICIDS2017 Dataset

    This dataset is a cleaned and preprocessed version of the CICIDS2017 dataset created by the Canadian Institute for Cybersecurity, University of New Brunswick.

      Modifications
    

    Removed duplicate records
    Normalized feature names
    Filtered specific attack types
    Pivoted the different attack data into a single dataset

      Source
    

    Original dataset: CICIDS2017

      License & Citation
    

    This dataset is provided for research purposes. Please refer… See the full description on the dataset page: https://huggingface.co/datasets/agrawalchaitany/cyberbert_dataset.
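The listing above names duplicate removal and feature-name normalization as cleaning steps but does not say how they were done. The following is a minimal sketch of those two steps only, assuming a snake_case naming rule and exact-match deduplication; `normalize_name`, `clean_rows`, and the sample rows are hypothetical and are not taken from the dataset author's code.

```python
import re


def normalize_name(name: str) -> str:
    """Trim a raw column header and convert it to lower-case snake_case."""
    cleaned = re.sub(r"[^0-9a-zA-Z]+", "_", name.strip())
    return cleaned.strip("_").lower()


def clean_rows(header, rows):
    """Return normalized column names and rows with exact duplicates removed.

    The first occurrence of each row is kept and the original order is preserved.
    """
    names = [normalize_name(h) for h in header]
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(list(row))
    return names, unique


# Hypothetical sample: headers with stray leading spaces and one repeated row.
header = [" Destination Port", " Flow Duration", "Total Fwd Packets", " Label"]
rows = [
    ["80", "1200", "3", "BENIGN"],
    ["80", "1200", "3", "BENIGN"],
    ["443", "560", "7", "DoS Hulk"],
]

names, unique = clean_rows(header, rows)
print(names)
print(len(unique))
```

Keeping the first occurrence of each row is one reasonable choice; a different policy would be needed if near-duplicates should also be merged.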

  4. CIC-IDS-2017 V2

    • zenodo.org
    zip
    Updated Nov 26, 2024
    + more versions
    Cite
    Akshayraj Madhubalan; Amit Gautam; Priya Tiwary (2024). CIC-IDS-2017 V2 [Dataset]. http://doi.org/10.5281/zenodo.10141593
    Available download formats: zip
    Dataset updated
    Nov 26, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Akshayraj Madhubalan; Amit Gautam; Priya Tiwary
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The CIC-IDS-V2 is an extended version of the original CIC-IDS 2017 dataset. The dataset is normalised, and one new class called "Comb" is added, which combines synthesised data from multiple non-benign classes.

    To cite the dataset, please reference the original paper with DOI: 10.1109/SmartNets61466.2024.10577645. The paper is published in IEEE SmartNets.

    Citation info:

    Madhubalan, Akshayraj & Gautam, Amit & Tiwary, Priya. (2024). Blender-GAN: Multi-Target Conditional Generative Adversarial Network for Novel Class Synthetic Data Generation. 1-7. 10.1109/SmartNets61466.2024.10577645.

    This dataset was made by Abluva Inc, a Palo Alto based, research-driven Data Protection firm. Our data protection platform empowers customers to secure data through advanced security mechanisms such as Fine Grained Access control and sophisticated depersonalization algorithms (e.g. Pseudonymization, Anonymization and Randomization). Abluva's Data Protection solutions facilitate data democratization within and outside the organizations, mitigating the concerns related to theft and compliance. The innovative intrusion detection algorithm by Abluva employs patented technologies for an intricately balanced approach that excludes normal access deviations, ensuring intrusion detection without disrupting the business operations. Abluva’s Solution enables organizations to extract further value from their data by enabling secure Knowledge Graphs and deploying Secure Data as a Service among other novel uses of data. Committed to providing a safe and secure environment, Abluva empowers organizations to unlock the full potential of their data.

  5. Detection results on the CICIDS2017 dataset (K = 10).

    • plos.figshare.com
    xls
    Updated Jul 2, 2025
    + more versions
    Cite
    Congyuan Xu; Donghui Li; Zihao Liu; Jun Yang; Qinfeng Shen; Ningbing Tong (2025). Detection results on the CICIDS2017 dataset (K = 10). [Dataset]. http://doi.org/10.1371/journal.pone.0327161.t004
    Available download formats: xls
    Dataset updated
    Jul 2, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Congyuan Xu; Donghui Li; Zihao Liu; Jun Yang; Qinfeng Shen; Ningbing Tong
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Detection results on the CICIDS2017 dataset (K = 10).

  6. CIC-IDS-2017

    • huggingface.co
    Cite
    makekali, CIC-IDS-2017 [Dataset]. https://huggingface.co/datasets/makekali/CIC-IDS-2017
    Authors
    makekali
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    makekali/CIC-IDS-2017 dataset hosted on Hugging Face and contributed by the HF Datasets community

  7. cicids2017

    • kaggle.com
    Updated Jun 18, 2025
    Cite
    Mohaned Mohammed Naji (2025). cicids2017 [Dataset]. https://www.kaggle.com/datasets/mohanedmohammednaji/cicids2017/code
    Explore at: Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jun 18, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Mohaned Mohammed Naji
    Description

    Dataset

    This dataset was created by Mohaned Mohammed Naji

  8. Model performance results based on the CICIDS2017 dataset.

    • plos.figshare.com
    xls
    Updated Jul 2, 2025
    Cite
    Ahmed Muqdad Alnasrallah; Maheyzah Md Siraj; Hanan Ali Alrikabi (2025). Model performance results based on the CICIDS2017 dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0327137.t008
    Available download formats: xls
    Dataset updated
    Jul 2, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Ahmed Muqdad Alnasrallah; Maheyzah Md Siraj; Hanan Ali Alrikabi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Model performance results based on the CICIDS2017 dataset.

  9. CIC-IDS-2017

    • huggingface.co
    + more versions
    Cite
    Fikri Mulyana Setiawan, CIC-IDS-2017 [Dataset]. https://huggingface.co/datasets/fikrimulyana/CIC-IDS-2017
    Authors
    Fikri Mulyana Setiawan
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    fikrimulyana/CIC-IDS-2017 dataset hosted on Hugging Face and contributed by the HF Datasets community

  10. cic-ids-2017-textual

    • huggingface.co
    Cite
    Shiva Prasad Gyawali, cic-ids-2017-textual [Dataset]. https://huggingface.co/datasets/gyawalishiva/cic-ids-2017-textual
    Authors
    Shiva Prasad Gyawali
    Description

    gyawalishiva/cic-ids-2017-textual dataset hosted on Hugging Face and contributed by the HF Datasets community

  11. CICID2017 dataset information.

    • plos.figshare.com
    xls
    Updated Jul 2, 2025
    Cite
    Ahmed Muqdad Alnasrallah; Maheyzah Md Siraj; Hanan Ali Alrikabi (2025). CICID2017 dataset information. [Dataset]. http://doi.org/10.1371/journal.pone.0327137.t004
    Available download formats: xls
    Dataset updated
    Jul 2, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Ahmed Muqdad Alnasrallah; Maheyzah Md Siraj; Hanan Ali Alrikabi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Information technology has significantly impacted society. IoT and its specialized variant, IoMT, enable remote patient monitoring and improve healthcare. While IoMT contributes to improving healthcare services, it may pose significant security challenges, especially due to the growing interconnectivity of IoMT devices. Hence, a robust IDS is required to handle these issues and prevent future intrusions in a timely manner. This study proposes an IDS model for the IoMT that integrates advanced feature selection techniques and deep learning to enhance detection performance. The proposed model employs Information Gain (IG) and Recursive Feature Elimination (RFE) in parallel to select the top 50% of features, from which intersection and union subsets are created, followed by a deep autoencoder (DAE) to reduce dimensionality without losing important data. Finally, a deep neural network (DNN) classifies traffic as normal or anomalous. The experimental results demonstrate superior performance in terms of accuracy, precision, recall, and F1 score. The model achieves an accuracy of 99.93% on the WUSTL-EHMS-2020 dataset while reducing training time, and attains 99.61% accuracy on the CICIDS2017 dataset. The model performance was validated with an average accuracy of 99.82% ± 0.16% and a statistically significant p-value of 0.0001 on the WUSTL-EHMS-2020 dataset, which indicates a stable, statistically significant improvement. This study indicates that the proposed strategy decreases computational complexity and enhances IDS efficiency in resource-constrained IoMT environments.
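The feature-selection step in the abstract above (two rankings, top 50% from each, then intersection and union subsets) can be illustrated with a small sketch. This is not the authors' implementation: the feature names, the Information Gain scores, and the RFE ranks are made-up placeholders, and `top_half` is a hypothetical helper.

```python
def top_half(scores: dict) -> set:
    """Return the top 50% of feature names by score (ties broken by name)."""
    ranked = sorted(scores, key=lambda feature: (-scores[feature], feature))
    return set(ranked[: len(ranked) // 2])


# Hypothetical Information Gain scores (higher is better).
ig_scores = {"f1": 0.90, "f2": 0.10, "f3": 0.75, "f4": 0.40, "f5": 0.05, "f6": 0.60}

# Hypothetical RFE ranks (1 is best); negate them so higher means better.
rfe_ranks = {"f1": 1, "f2": 5, "f3": 4, "f4": 2, "f5": 6, "f6": 3}
rfe_scores = {feature: -rank for feature, rank in rfe_ranks.items()}

ig_top = top_half(ig_scores)
rfe_top = top_half(rfe_scores)

# Features both methods agree on, and features either method selected.
intersection = ig_top & rfe_top
union = ig_top | rfe_top

print(sorted(intersection))
print(sorted(union))
```

The intersection is the more conservative subset (both selectors agree), while the union keeps anything either selector found useful; the abstract does not state which subset fed the final model.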

  12. CICIDS-2017 TUE

    • kaggle.com
    zip
    Updated May 14, 2020
    Cite
    Sweety (2020). CICIDS-2017 TUE [Dataset]. https://www.kaggle.com/sweety18/cicids2017-tue
    Available download formats: zip (40046227 bytes)
    Dataset updated
    May 14, 2020
    Authors
    Sweety
    Description

    Dataset

    This dataset was created by Sweety

  13. UNSW-NB15 and CIC-IDS2017 Labelled PCAP Data

    • zenodo.org
    • explore.openaire.eu
    csv
    Updated Oct 28, 2022
    Cite
    Yasir Ali Farrukh Farrukh; Irfan Khan; Syed Wali; David Bierbrauer; John A Pavlik; Nathaniel D. Bastian (2022). UNSW-NB15 and CIC-IDS2017 Labelled PCAP Data [Dataset]. http://doi.org/10.5281/zenodo.7258579
    Available download formats: csv
    Dataset updated
    Oct 28, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Yasir Ali Farrukh Farrukh; Irfan Khan; Syed Wali; David Bierbrauer; John A Pavlik; Nathaniel D. Bastian
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Packet Capture (PCAP) files of the UNSW-NB15 and CIC-IDS2017 datasets are processed and labelled using the CSV files. Each packet is labelled by comparing eight distinct features: Source IP, Destination IP, Source Port, Destination Port, Starting time, Ending time, Protocol and Time to live. The dimensions of the dataset are Nx1504. All columns of the dataset are integers, so you can use this dataset directly in your machine learning models. Details of the whole processing and transformation are provided in the following GitHub repo:

    https://github.com/Yasir-ali-farrukh/Payload-Byte

    You can use the tool available at the above-mentioned GitHub repo to generate the labelled dataset from scratch. All details of the processing and transformation are provided in the following paper:

    ```bibtex
    @article{Payload,
    author = "Yasir Ali Farrukh and Irfan Khan and Syed Wali and David Bierbrauer and Nathaniel Bastian",
    title = "{Payload-Byte: A Tool for Extracting and Labeling Packet Capture Files of Modern Network Intrusion Detection Datasets}",
    year = "2022",
    month = "9",
    url = "https://www.techrxiv.org/articles/preprint/Payload-Byte_A_Tool_for_Extracting_and_Labeling_Packet_Capture_Files_of_Modern_Network_Intrusion_Detection_Datasets/20714221",
    doi = "10.36227/techrxiv.20714221.v1"
    }
    ```

  14. Selected Features via a proposed approach from CICIDS2017.

    • plos.figshare.com
    xls
    Updated Jul 2, 2025
    Cite
    Ahmed Muqdad Alnasrallah; Maheyzah Md Siraj; Hanan Ali Alrikabi (2025). Selected Features via a proposed approach from CICIDS2017. [Dataset]. http://doi.org/10.1371/journal.pone.0327137.t006
    Available download formats: xls
    Dataset updated
    Jul 2, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Ahmed Muqdad Alnasrallah; Maheyzah Md Siraj; Hanan Ali Alrikabi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Selected Features via a proposed approach from CICIDS2017.

  15. CICIDS-2017-plus

    • huggingface.co
    + more versions
    Cite
    bert van keulen, CICIDS-2017-plus [Dataset]. https://huggingface.co/datasets/bvk/CICIDS-2017-plus
    Explore at: Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Authors
    bert van keulen
    Description

    bvk/CICIDS-2017-plus dataset hosted on Hugging Face and contributed by the HF Datasets community

  16. Comparison with recent methods on the CICIDS2017 dataset.

    • figshare.com
    • plos.figshare.com
    xls
    Updated Jul 2, 2025
    Cite
    Ahmed Muqdad Alnasrallah; Maheyzah Md Siraj; Hanan Ali Alrikabi (2025). Comparison with recent methods on the CICIDS2017 dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0327137.t010
    Available download formats: xls
    Dataset updated
    Jul 2, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Ahmed Muqdad Alnasrallah; Maheyzah Md Siraj; Hanan Ali Alrikabi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Comparison with recent methods on the CICIDS2017 dataset.

  17. CICIDS2017_reduced

    • kaggle.com
    Updated May 2, 2021
    Cite
    Will (2021). CICIDS2017_reduced [Dataset]. https://www.kaggle.com/wmanka/cicids2017-reduced/discussion
    Explore at: Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    May 2, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Will
    Description

    Dataset

    This dataset was created by Will

  18. CICIDS2017_18

    • kaggle.com
    Updated Mar 24, 2021
    Cite
    saifullah saif (2021). CICIDS2017_18 [Dataset]. https://www.kaggle.com/saifullahsaif/cicids2017-18/code
    Explore at: Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 24, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    saifullah saif
    Description

    Dataset

    This dataset was created by saifullah saif

  19. gtl-hids-embeddings

    • huggingface.co
    Updated May 25, 2025
    Cite
    Hasan Mehdi (2025). gtl-hids-embeddings [Dataset]. https://huggingface.co/datasets/Hmehdi515/gtl-hids-embeddings
    Dataset updated
    May 25, 2025
    Authors
    Hasan Mehdi
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Network Traffic Embeddings Dataset

      Model Description
    

    This dataset contains embeddings generated from the CICIDS2017 network traffic dataset using a fine-tuned Meta-Llama-3.1-70B-Instruct model. The embeddings represent network traffic flows formatted in a structured way to capture key network traffic characteristics.

      Structure of Embeddings Files
    
    
    
    
    
      combined.npy
    

    The combined.npy file contains a NumPy array of shape (N, D) where:

    N is the total number… See the full description on the dataset page: https://huggingface.co/datasets/Hmehdi515/gtl-hids-embeddings.

  20. Composed Encrypted Malicious Traffic Dataset for machine learning based...

    • narcis.nl
    • data.mendeley.com
    Updated Oct 6, 2021
    + more versions
    Cite
    Wang, Z (via Mendeley Data) (2021). Composed Encrypted Malicious Traffic Dataset for machine learning based encrypted malicious traffic analysis. [Dataset]. http://doi.org/10.17632/ztyk4h3v6s.1
    Dataset updated
    Oct 6, 2021
    Dataset provided by
    Data Archiving and Networked Services (DANS)
    Authors
    Wang, Z (via Mendeley Data)
    Description

    This is a traffic dataset that contains balanced amounts of encrypted malicious and legitimate traffic for encrypted malicious traffic detection. It is a secondary CSV feature dataset composed from five public traffic datasets. The dataset was composed according to three criteria. The first criterion is to combine widely used public datasets that contain both encrypted malicious and legitimate traffic in existing works, such as the Malware Capture Facility Project dataset and the CICIDS-2017 dataset. The second criterion is to ensure data balance, i.e., a balance of malicious and legitimate network traffic and a similar amount of network traffic contributed by each individual dataset. Thus, approximate proportions of malicious and legitimate traffic are extracted from each selected public dataset by random sampling. We also ensured that no selected public dataset contributes much more traffic than the others. The third criterion is that our dataset includes encrypted malicious and legitimate traffic from both conventional devices and IoT devices, as these devices are increasingly being deployed and work in the same environments, such as offices, homes, and other smart city settings.

    Based on these criteria, five public datasets were selected. After data pre-processing, details of each selected public dataset and the final composed dataset are shown in the “Dataset Statistic Analysis Document”. The document summarizes the malicious and legitimate traffic size selected from each public dataset, the proportion of traffic selected from each public dataset with respect to the total traffic size of the composed dataset (% w.r.t. the composed dataset), the proportion of selected encrypted traffic from each public dataset (% from selected public dataset), and the total traffic size of the composed dataset. From the table, we can observe that each public dataset contributes approximately 20% of the composed dataset, except for CICDS-2012 (due to its limited amount of encrypted malicious traffic). This achieves a balance across individual datasets and reduces bias towards traffic belonging to any one dataset during learning. We can also observe that the sizes of malicious and legitimate traffic are almost the same, thus achieving class balance. The datasets made available here were prepared for encrypted malicious traffic detection. Since the dataset is used for machine learning model training, sample train and test sets are also provided. The train and test sets are split at a ratio of 1:4, with stratification applied during the split. Such datasets can be used directly for machine learning or deep learning model training based on the selected features.
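The description above mentions a stratified train/test separation at 1:4 but does not say which side is the smaller one. The sketch below assumes the test set is the 1 part (20% of each class) and shows one way a stratified split could be done with the standard library only; `stratified_split` and the toy labels are hypothetical and are not the dataset authors' code.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split sample indices so each class keeps the same test fraction."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Toy, class-balanced labels standing in for the real traffic records.
labels = ["malicious"] * 50 + ["legitimate"] * 50
train_idx, test_idx = stratified_split(labels)

print(len(train_idx), len(test_idx))
print(sum(labels[i] == "malicious" for i in test_idx))
```

Splitting per class rather than over the whole pool is what keeps the malicious-to-legitimate ratio identical in both subsets; if the intended ratio is instead train:test = 1:4, pass `test_fraction=0.8`.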
