14 datasets found
  1. CICIDS2017 Dataset

    • kaggle.com
    zip
    Updated Dec 7, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Naveen Gill (2024). CICIDS2017 Dataset [Dataset]. https://www.kaggle.com/datasets/naveengill/cicids2017-dataset/code
    Explore at:
    zip(180271700 bytes)Available download formats
    Dataset updated
    Dec 7, 2024
    Authors
    Naveen Gill
    Description

    Dataset

    This dataset was created by Naveen Gill

    Released under Other (specified in description)

    Contents

  2. UNSW-NB15 and CIC-IDS2017 Labelled PCAP Data

    • kaggle.com
    Updated Sep 8, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Yasir-Ali (2022). UNSW-NB15 and CIC-IDS2017 Labelled PCAP Data [Dataset]. http://doi.org/10.34740/kaggle/dsv/4170054
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Sep 8, 2022
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Yasir-Ali
    License

    http://opendatacommons.org/licenses/dbcl/1.0/http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    Packet Capture (PCAP) files of UNSW-NB15 and CIC-IDS2017 dataset are processed and labelled utilizing the CSV files. Each packet is labelled by comparing the eight distinct features: Source IP, Destination IP, Source Port, Destination Port, Starting time, Ending time, Protocol and Time to live. The dimensions for the dataset is Nx1504. All column of the dataset are integers, therefore you can directly utilize this dataset in you machine learning models. Moreover, details of the whole processing and transformation is provided in the following GitHub Repo:

    https://github.com/Yasir-ali-farrukh/Payload-Byte

    You can utilize the tool available at the above mentioned GitHub repo to generate labelled dataset from scratch. All of the detail of processing and transformation is provided in the following paper:

    @article{Payload, 
    author = "Yasir Ali Farrukh and Irfan Khan and Syed Wali and David Bierbrauer and Nathaniel Bastian", 
    title = "{Payload-Byte: A Tool for Extracting and Labeling Packet Capture Files of Modern Network Intrusion Detection Datasets}", 
    year = "2022", 
    month = "9", 
    url = "https://www.techrxiv.org/articles/preprint/Payload-Byte_A_Tool_for_Extracting_and_Labeling_Packet_Capture_Files_of_Modern_Network_Intrusion_Detection_Datasets/20714221", 
    doi = "10.36227/techrxiv.20714221.v1" 
    }
    ```
    
    If you are using our tool or dataset, kindly cite our related paper which outlines the details of the tools and its processing.
    
  3. f

    Summary of hyperparameters.

    • plos.figshare.com
    • figshare.com
    xls
    Updated Feb 6, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Fayaz Hassan; Zafi Sherhan Syed; Aftab Ahmed Memon; Saad Said Alqahtany; Nadeem Ahmed; Mana Saleh Al Reshan; Yousef Asiri; Asadullah Shaikh (2025). Summary of hyperparameters. [Dataset]. http://doi.org/10.1371/journal.pone.0312752.t005
    Explore at:
    xlsAvailable download formats
    Dataset updated
    Feb 6, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Fayaz Hassan; Zafi Sherhan Syed; Aftab Ahmed Memon; Saad Said Alqahtany; Nadeem Ahmed; Mana Saleh Al Reshan; Yousef Asiri; Asadullah Shaikh
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Autonomous transportation systems have the potential to greatly impact the way we travel. A vital aspect of these systems is their connectivity, facilitated by intelligent transport applications. However, the safety ensured by the vehicular network can be easily compromised by malicious traffic with the exponential growth of IoT devices. One aspect is malicious traffic identification in Vehicular networks. We proposed a hybrid approach uses automated feature engineering via correlation-based feature selection (CFS) and principal component analysis (PCA)-based dimensionality reduction to reduce feature matrix size before a series of dense layers are used for classification. The intended use of CFS and PCA in the machine learning pipeline serves two folds benefit, first is that the resultant feature matrix contains attributes that are most useful for recognizing malicious traffic, and second that after CFS and PCA, the feature matrix has a smaller dimensionality which in turn means that smaller number of weights need to be trained for the dense layers (connections are required for the dense layers) which resulting in smaller model size. Furthermore, we show the impact of post-training model weight quantization to further reduce the model size. Results demonstrate the effectiveness of feature engineering which improves the classification f1score from 96.48% to 98.43%. It also reduces the model size from 28.09 KB to 20.34 KB thus optimizing the model in terms of both classification performance and model size. Post-training quantization further optimizes the model size to 9 KB. The experimental results using CICIDS2017 dataset demonstrate that proposed hybrid model performs well not only in terms of classification performance but also yields trained models that have a low parameter count and model size. Thus, the proposed low-complexity models can be used for intrusion detection in VANET scenario.

  4. CICIDS2017

    • kaggle.com
    Updated May 23, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Zufar Asyraf (2024). CICIDS2017 [Dataset]. https://www.kaggle.com/zufarasyraf/cicids2017/discussion
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    May 23, 2024
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Zufar Asyraf
    Description

    Dataset

    This dataset was created by Zufar Asyraf

    Contents

  5. SAFE-NID: Self-Attention with Normalizing-Flow Encodings for Network...

    • zenodo.org
    tar
    Updated Mar 20, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Brian Matejek; Brian Matejek; Ashish Gehani; Nathaniel Bastian; Daniel Clouse; Bradford Kline; Susmit Jha; Ashish Gehani; Nathaniel Bastian; Daniel Clouse; Bradford Kline; Susmit Jha (2025). SAFE-NID: Self-Attention with Normalizing-Flow Encodings for Network Intrusion Detection Dataset [Dataset]. http://doi.org/10.5281/zenodo.15046995
    Explore at:
    tarAvailable download formats
    Dataset updated
    Mar 20, 2025
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Brian Matejek; Brian Matejek; Ashish Gehani; Nathaniel Bastian; Daniel Clouse; Bradford Kline; Susmit Jha; Ashish Gehani; Nathaniel Bastian; Daniel Clouse; Bradford Kline; Susmit Jha
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    These datasets provide packet-level labeling of the payloads in the CIC-IDS-2017 and UNSW-NB15 network intrusion detection datasets. A full discussion of the data processing can be found in our Transactions on Machine Learning Research journal paper SAFE-NID: Self-Attention with Normalizing-Flow Encodings for Network Intrusion Detection. Code for additional processing and experimentation can be found here. The UNSW-NB15 dataset contains over 50 million non-empty payloads coming from nine attack classes with benign background traffic. The CIC-IDS-2017 dataset contains over 30 million non-empty payloads coming from fourteen attack classes with benign background traffic. Both datasets are highly imbalanced, with 20-25x more benign packets than malicious ones.

  6. CICIDS-2017 TUE

    • kaggle.com
    zip
    Updated May 14, 2020
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Sweety (2020). CICIDS-2017 TUE [Dataset]. https://www.kaggle.com/sweety18/cicids2017-tue
    Explore at:
    zip(40046227 bytes)Available download formats
    Dataset updated
    May 14, 2020
    Authors
    Sweety
    Description

    Dataset

    This dataset was created by Sweety

    Contents

  7. f

    Performance of testing on original dataset with generated dataset.

    • plos.figshare.com
    xls
    Updated Jun 13, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ebtihaj Alshahrani; Daniyal Alghazzawi; Reem Alotaibi; Osama Rabie (2023). Performance of testing on original dataset with generated dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0275971.t004
    Explore at:
    xlsAvailable download formats
    Dataset updated
    Jun 13, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Ebtihaj Alshahrani; Daniyal Alghazzawi; Reem Alotaibi; Osama Rabie
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Performance of testing on original dataset with generated dataset.

  8. f

    Statistical description of the dataset.

    • figshare.com
    xls
    Updated Jun 13, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Statistical description of the dataset. [Dataset]. https://figshare.com/articles/dataset/Statistical_description_of_the_dataset_/21336560
    Explore at:
    xlsAvailable download formats
    Dataset updated
    Jun 13, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Ebtihaj Alshahrani; Daniyal Alghazzawi; Reem Alotaibi; Osama Rabie
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Statistical description of the dataset.

  9. m

    Composed Encrypted Malicious Traffic Dataset for machine learning based...

    • data.mendeley.com
    Updated Oct 12, 2021
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Zihao Wang (2021). Composed Encrypted Malicious Traffic Dataset for machine learning based encrypted malicious traffic analysis. [Dataset]. http://doi.org/10.17632/ztyk4h3v6s.2
    Explore at:
    Dataset updated
    Oct 12, 2021
    Authors
    Zihao Wang
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is a traffic dataset which contains balance size of encrypted malicious and legitimate traffic for encrypted malicious traffic detection. The dataset is a secondary csv feature data which is composed of five public traffic datasets. Our dataset is composed based on three criteria: The first criterion is to combine widely considered public datasets which contain both encrypted malicious and legitimate traffic in existing works, such as the Malwares Capture Facility Project dataset and the CICIDS-2017 dataset. The second criterion is to ensure the data balance, i.e., balance of malicious and legitimate network traffic and similar size of network traffic contributed by each individual dataset. Thus, approximate proportions of malicious and legitimate traffic from each selected public dataset are extracted by using random sampling. We also ensured that there will be no traffic size from one selected public dataset that is much larger than other selected public datasets. The third criterion is that our dataset includes both conventional devices' and IoT devices' encrypted malicious and legitimate traffic, as these devices are increasingly being deployed and are working in the same environments such as offices, homes, and other smart city settings.

    Based on the criteria, 5 public datasets are selected. After data pre-processing, details of each selected public dataset and the final composed dataset are shown in “Dataset Statistic Analysis Document”. The document summarized the malicious and legitimate traffic size we selected from each selected public dataset, proportions of selected traffic size from each selected public dataset with respect to the total traffic size of the composed dataset (% w.r.t the composed dataset), proportions of selected encrypted traffic size from each selected public dataset (% of selected public dataset), and total traffic size of the composed dataset. From the table, we are able to observe that each public dataset equally contributes to approximately 20% of the composed dataset, except for CICDS-2012 (due to its limited number of encrypted malicious traffic). This achieves a balance across individual datasets and reduces bias towards traffic belonging to any dataset during learning. We can also observe that the size of malicious and legitimate traffic are almost the same, thus achieving class balance. The datasets now made available were prepared aiming at encrypted malicious traffic detection. Since the dataset is used for machine learning model training, a sample of train and test sets are also provided. The train and test datasets are separated based on 1:4 and stratification is applied during data split. Such datasets can be used directly for machine or deep learning model training based on selected features.

  10. cicids 2017 try

    • kaggle.com
    zip
    Updated May 13, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Sweety (2020). cicids 2017 try [Dataset]. https://www.kaggle.com/datasets/sweety18/cicids-2017-try
    Explore at:
    zip(16989882 bytes)Available download formats
    Dataset updated
    May 13, 2020
    Authors
    Sweety
    Description

    Dataset

    This dataset was created by Sweety

    Contents

  11. f

    Hyperparameter settings.

    • plos.figshare.com
    xls
    Updated Jan 16, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Congyuan Xu; Yong Zhan; Guanghui Chen; Zhiqiang Wang; Siqing Liu; Weichen Hu (2025). Hyperparameter settings. [Dataset]. http://doi.org/10.1371/journal.pone.0317713.t002
    Explore at:
    xlsAvailable download formats
    Dataset updated
    Jan 16, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Congyuan Xu; Yong Zhan; Guanghui Chen; Zhiqiang Wang; Siqing Liu; Weichen Hu
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The network intrusion detection system (NIDS) plays a critical role in maintaining network security. However, traditional NIDS relies on a large volume of samples for training, which exhibits insufficient adaptability in rapidly changing network environments and complex attack methods, especially when facing novel and rare attacks. As attack strategies evolve, there is often a lack of sufficient samples to train models, making it difficult for traditional methods to respond quickly and effectively to new threats. Although existing few-shot network intrusion detection systems have begun to address sample scarcity, these systems often fail to effectively capture long-range dependencies within the network environment due to limited observational scope. To overcome these challenges, this paper proposes a novel elevated few-shot network intrusion detection method based on self-attention mechanisms and iterative refinement. This approach leverages the advantages of self-attention to effectively extract key features from network traffic and capture long-range dependencies. Additionally, the introduction of positional encoding ensures the temporal sequence of traffic is preserved during processing, enhancing the model’s ability to capture temporal dynamics. By combining multiple update strategies in meta-learning, the model is initially trained on a general foundation during the training phase, followed by fine-tuning with few-shot data during the testing phase, significantly reducing sample dependency while improving the model’s adaptability and prediction accuracy. Experimental results indicate that this method achieved detection rates of 99.90% and 98.23% on the CICIDS2017 and CICIDS2018 datasets, respectively, using only 10 samples.

  12. f

    Relationship between F1-Score and epoch.

    • plos.figshare.com
    xls
    Updated Jan 16, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Congyuan Xu; Yong Zhan; Guanghui Chen; Zhiqiang Wang; Siqing Liu; Weichen Hu (2025). Relationship between F1-Score and epoch. [Dataset]. http://doi.org/10.1371/journal.pone.0317713.t003
    Explore at:
    xlsAvailable download formats
    Dataset updated
    Jan 16, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Congyuan Xu; Yong Zhan; Guanghui Chen; Zhiqiang Wang; Siqing Liu; Weichen Hu
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The network intrusion detection system (NIDS) plays a critical role in maintaining network security. However, traditional NIDS relies on a large volume of samples for training, which exhibits insufficient adaptability in rapidly changing network environments and complex attack methods, especially when facing novel and rare attacks. As attack strategies evolve, there is often a lack of sufficient samples to train models, making it difficult for traditional methods to respond quickly and effectively to new threats. Although existing few-shot network intrusion detection systems have begun to address sample scarcity, these systems often fail to effectively capture long-range dependencies within the network environment due to limited observational scope. To overcome these challenges, this paper proposes a novel elevated few-shot network intrusion detection method based on self-attention mechanisms and iterative refinement. This approach leverages the advantages of self-attention to effectively extract key features from network traffic and capture long-range dependencies. Additionally, the introduction of positional encoding ensures the temporal sequence of traffic is preserved during processing, enhancing the model’s ability to capture temporal dynamics. By combining multiple update strategies in meta-learning, the model is initially trained on a general foundation during the training phase, followed by fine-tuning with few-shot data during the testing phase, significantly reducing sample dependency while improving the model’s adaptability and prediction accuracy. Experimental results indicate that this method achieved detection rates of 99.90% and 98.23% on the CICIDS2017 and CICIDS2018 datasets, respectively, using only 10 samples.

  13. CICIDS2017_reduced

    • kaggle.com
    zip
    Updated May 2, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Will (2021). CICIDS2017_reduced [Dataset]. https://www.kaggle.com/datasets/wmanka/cicids2017-reduced
    Explore at:
    zip(75795345 bytes)Available download formats
    Dataset updated
    May 2, 2021
    Authors
    Will
    Description

    Dataset

    This dataset was created by Will

    Contents

  14. f

    The results in the CICIDS2018 dataset.

    • plos.figshare.com
    xls
    Updated Jan 16, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Congyuan Xu; Yong Zhan; Guanghui Chen; Zhiqiang Wang; Siqing Liu; Weichen Hu (2025). The results in the CICIDS2018 dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0317713.t005
    Explore at:
    xlsAvailable download formats
    Dataset updated
    Jan 16, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Congyuan Xu; Yong Zhan; Guanghui Chen; Zhiqiang Wang; Siqing Liu; Weichen Hu
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The network intrusion detection system (NIDS) plays a critical role in maintaining network security. However, traditional NIDS relies on a large volume of samples for training, which exhibits insufficient adaptability in rapidly changing network environments and complex attack methods, especially when facing novel and rare attacks. As attack strategies evolve, there is often a lack of sufficient samples to train models, making it difficult for traditional methods to respond quickly and effectively to new threats. Although existing few-shot network intrusion detection systems have begun to address sample scarcity, these systems often fail to effectively capture long-range dependencies within the network environment due to limited observational scope. To overcome these challenges, this paper proposes a novel elevated few-shot network intrusion detection method based on self-attention mechanisms and iterative refinement. This approach leverages the advantages of self-attention to effectively extract key features from network traffic and capture long-range dependencies. Additionally, the introduction of positional encoding ensures the temporal sequence of traffic is preserved during processing, enhancing the model’s ability to capture temporal dynamics. By combining multiple update strategies in meta-learning, the model is initially trained on a general foundation during the training phase, followed by fine-tuning with few-shot data during the testing phase, significantly reducing sample dependency while improving the model’s adaptability and prediction accuracy. Experimental results indicate that this method achieved detection rates of 99.90% and 98.23% on the CICIDS2017 and CICIDS2018 datasets, respectively, using only 10 samples.

  15. Not seeing a result you expected?
    Learn how you can add new datasets to our index.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Naveen Gill (2024). CICIDS2017 Dataset [Dataset]. https://www.kaggle.com/datasets/naveengill/cicids2017-dataset/code
Organization logo

CICIDS2017 Dataset

Explore at:
zip(180271700 bytes)Available download formats
Dataset updated
Dec 7, 2024
Authors
Naveen Gill
Description

Dataset

This dataset was created by Naveen Gill

Released under Other (specified in description)

Contents

Search
Clear search
Close search
Google apps
Main menu