11 datasets found

CICIDS2017 Dataset
kaggle.com
zip
Updated Dec 7, 2024
Naveen Gill (2024). CICIDS2017 Dataset [Dataset]. https://www.kaggle.com/datasets/naveengill/cicids2017-dataset/code
Available download formats: zip (180,271,700 bytes)
Dataset updated
Dec 7, 2024
Authors
Naveen Gill
Description
Dataset

This dataset was created by Naveen Gill

Released under Other (specified in description)

SAFE-NID: Self-Attention with Normalizing-Flow Encodings for Network...
zenodo.org
tar
Updated Mar 20, 2025
Brian Matejek; Ashish Gehani; Nathaniel Bastian; Daniel Clouse; Bradford Kline; Susmit Jha (2025). SAFE-NID: Self-Attention with Normalizing-Flow Encodings for Network Intrusion Detection Dataset [Dataset]. http://doi.org/10.5281/zenodo.15046995
Available download formats: tar
Unique identifier
https://doi.org/10.5281/zenodo.15046995
Dataset updated
Mar 20, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Brian Matejek; Ashish Gehani; Nathaniel Bastian; Daniel Clouse; Bradford Kline; Susmit Jha
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
These datasets provide packet-level labeling of the payloads in the CIC-IDS-2017 and UNSW-NB15 network intrusion detection datasets. A full discussion of the data processing can be found in our Transactions on Machine Learning Research journal paper SAFE-NID: Self-Attention with Normalizing-Flow Encodings for Network Intrusion Detection. Code for additional processing and experimentation can be found here. The UNSW-NB15 dataset contains over 50 million non-empty payloads coming from nine attack classes with benign background traffic. The CIC-IDS-2017 dataset contains over 30 million non-empty payloads coming from fourteen attack classes with benign background traffic. Both datasets are highly imbalanced, with 20-25x more benign packets than malicious ones.
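Because both datasets are 20-25x imbalanced toward benign traffic, a classifier trained on raw counts tends to favour the benign class. One common mitigation, which is not described in this dataset record and is assumed here purely for illustration, is to weight the training loss by inverse class frequency. A minimal sketch, using hypothetical toy labels rather than real counts from either dataset:

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def balanced_class_weights(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Weight class c by n_samples / (n_classes * count(c)).

    This is the common "balanced" heuristic (scikit-learn's
    class_weight="balanced" uses the same formula), so a class that is
    20x rarer receives a 20x larger weight in a weighted loss.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


# Hypothetical toy labels with a 20:1 benign-to-malicious ratio.
labels = ["benign"] * 200 + ["malicious"] * 10
weights = balanced_class_weights(labels)
print(weights)  # {'benign': 0.525, 'malicious': 10.5}
```

The ratio of the two weights equals the imbalance ratio, which is the point of the heuristic; per-attack-class weights follow the same formula when all attack classes are kept separate.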
UNSW-NB15 and CIC-IDS2017 Labelled PCAP Data
kaggle.com
Updated Sep 8, 2022
Yasir-Ali (2022). UNSW-NB15 and CIC-IDS2017 Labelled PCAP Data [Dataset]. http://doi.org/10.34740/kaggle/dsv/4170054
Available formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Unique identifier
https://doi.org/10.34740/kaggle/dsv/4170054
Dataset updated
Sep 8, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Yasir-Ali
License
Open Data Commons Database Contents License (DbCL) v1.0, http://opendatacommons.org/licenses/dbcl/1.0/
Description
The packet capture (PCAP) files of the UNSW-NB15 and CIC-IDS2017 datasets are processed and labelled using the datasets' CSV files. Each packet is labelled by comparing eight features: source IP, destination IP, source port, destination port, start time, end time, protocol, and time to live (TTL). The resulting dataset has dimensions N × 1504. All columns are integers, so the dataset can be used directly in machine learning models. Details of the full processing and transformation are provided in the following GitHub repository:

https://github.com/Yasir-ali-farrukh/Payload-Byte

You can use the tool in the GitHub repository above to generate the labelled dataset from scratch. Full details of the processing and transformation are given in the following paper:

@article{Payload,
  author = "Yasir Ali Farrukh and Irfan Khan and Syed Wali and David Bierbrauer and Nathaniel Bastian",
  title = "{Payload-Byte: A Tool for Extracting and Labeling Packet Capture Files of Modern Network Intrusion Detection Datasets}",
  year = "2022",
  month = "9",
  url = "https://www.techrxiv.org/articles/preprint/Payload-Byte_A_Tool_for_Extracting_and_Labeling_Packet_Capture_Files_of_Modern_Network_Intrusion_Detection_Datasets/20714221",
  doi = "10.36227/techrxiv.20714221.v1"
}

If you use this tool or dataset, please cite the paper above, which describes the tool and its processing in detail.
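The eight-feature matching used to label packets for this dataset can be illustrated with a small sketch. It is not the Payload-Byte implementation: the record and field names are hypothetical, it checks only the forward direction of a flow, and it ignores how the real tool resolves reply packets or overlapping flows. `payload_to_row` is likewise a hypothetical helper showing how raw payload bytes become a fixed-width row of integers; the source only states that the final matrix is N × 1504, not how those columns are composed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class FlowRecord:
    """One row of a flow-level CSV (field names are hypothetical)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    start_time: float  # flow start, seconds since epoch
    end_time: float    # flow end, seconds since epoch
    protocol: int      # IANA protocol number, e.g. 6 = TCP
    ttl: int
    label: str


@dataclass(frozen=True)
class Packet:
    """Header fields read from one PCAP packet (field names are hypothetical)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    timestamp: float
    protocol: int
    ttl: int
    payload: bytes


def label_packet(pkt: Packet, flows: List[FlowRecord]) -> Optional[str]:
    """Return the label of the first flow matching all eight features, or None."""
    for f in flows:
        if (pkt.src_ip == f.src_ip and pkt.dst_ip == f.dst_ip
                and pkt.src_port == f.src_port and pkt.dst_port == f.dst_port
                and pkt.protocol == f.protocol and pkt.ttl == f.ttl
                and f.start_time <= pkt.timestamp <= f.end_time):
            return f.label
    return None


def payload_to_row(payload: bytes, width: int) -> List[int]:
    """Turn payload bytes into a fixed-width row of integers 0-255, zero-padded or truncated."""
    row = list(payload[:width])
    return row + [0] * (width - len(row))


flows = [FlowRecord("10.0.0.1", "10.0.0.2", 40000, 80, 100.0, 105.0, 6, 64, "DoS")]
pkt = Packet("10.0.0.1", "10.0.0.2", 40000, 80, 102.5, 6, 64, b"GET /")
print(label_packet(pkt, flows), payload_to_row(pkt.payload, width=8))
# prints: DoS [71, 69, 84, 32, 47, 0, 0, 0]
```

A linear scan over all flows is only suitable for illustration; at the scale of these captures, indexing flows by their five-tuple first would be the obvious refinement.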
CICIDS-2017 TUE
kaggle.com
zip
Updated May 14, 2020
Sweety (2020). CICIDS-2017 TUE [Dataset]. https://www.kaggle.com/sweety18/cicids2017-tue
Available download formats: zip (40,046,227 bytes)
Dataset updated
May 14, 2020
Authors
Sweety
Description
Dataset

This dataset was created by Sweety

Literature review comprising main studies.
figshare.com
xls
Updated Jun 21, 2024
Nasrullah Khan; Muhammad Ismail Mohmand; Sadaqat ur Rehman; Zia Ullah; Zahid Khan; Wadii Boulila (2024). Literature review comprising main studies. [Dataset]. http://doi.org/10.1371/journal.pone.0299666.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0299666.t001
Dataset updated
Jun 21, 2024
Dataset provided by
PLOS ONE
Authors
Nasrullah Khan; Muhammad Ismail Mohmand; Sadaqat ur Rehman; Zia Ullah; Zahid Khan; Wadii Boulila
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Computer networks are vulnerable to numerous attacks, which pose significant threats to our data security and the freedom of communication. This paper introduces a novel intrusion detection technique that diverges from traditional methods by leveraging Recurrent Neural Networks (RNNs) for both data preprocessing and feature extraction. The proposed process consists of the following steps: (1) training on the data using RNNs, (2) extracting features from their hidden layers, and (3) applying various classification algorithms. This methodology offers significant advantages and differs greatly from existing intrusion detection practices. The effectiveness of our method is demonstrated through trials on the Network Security Laboratory (NSL) and Canadian Institute for Cybersecurity (CIC) 2017 datasets, where the application of RNNs for intrusion detection shows substantial practical implications. Specifically, we achieved accuracy scores of 99.6% with Decision Tree, Random Forest, and CatBoost classifiers on the NSL dataset, and 99.8% and 99.9%, respectively, on the CIC 2017 dataset. By reversing the conventional sequence of training data with RNNs and then extracting features before applying classification algorithms, our approach provides a major shift in intrusion detection methodologies. This modification in the pipeline underscores the benefits of utilizing RNNs for feature extraction and data preprocessing, meeting the critical need to safeguard data security and communication freedom against ever-evolving network threats.
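The three-step pipeline described above (RNN, hidden-layer features, then a separate classifier) can be sketched in plain Python. This sketch skips step (1): instead of training the RNN, it uses small hand-picked placeholder weights, and it swaps in a nearest-centroid classifier as a stand-in for the Decision Tree, Random Forest, and CatBoost models named in the abstract. It shows only the data flow (sequence, final hidden state, classifier), not the authors' architecture or results.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def rnn_hidden_features(seq: Sequence[Sequence[float]], w_x: Matrix, w_h: Matrix, b: Vector) -> Vector:
    """Elman RNN: h_t = tanh(W_x x_t + W_h h_{t-1} + b), h_0 = 0; returns h_T as the feature vector."""
    h = [0.0] * len(b)
    for x in seq:
        pre = [p + q + c for p, q, c in zip(matvec(w_x, x), matvec(w_h, h), b)]
        h = [math.tanh(z) for z in pre]
    return h


def fit_centroids(features: List[Vector], labels: List[str]) -> Dict[str, Vector]:
    """Nearest-centroid 'training': average the feature vectors of each class."""
    groups: Dict[str, List[Vector]] = {}
    for f, y in zip(features, labels):
        groups.setdefault(y, []).append(f)
    return {y: [sum(col) / len(fs) for col in zip(*fs)] for y, fs in groups.items()}


def predict(centroids: Dict[str, Vector], f: Vector) -> str:
    """Assign f to the class whose centroid is closest in squared Euclidean distance."""
    return min(centroids, key=lambda y: sum((a - b) ** 2 for a, b in zip(centroids[y], f)))


# Placeholder weights standing in for a trained RNN (input size 1, hidden size 2).
W_X = [[1.0], [-1.0]]
W_H = [[0.5, 0.0], [0.0, 0.5]]
B = [0.0, 0.0]

train = [([[0.1], [0.1], [0.1]], "benign"), ([[2.0], [2.0], [2.0]], "attack")]
feats = [rnn_hidden_features(s, W_X, W_H, B) for s, _ in train]
centroids = fit_centroids(feats, [y for _, y in train])
print(predict(centroids, rnn_hidden_features([[1.8], [1.9], [2.1]], W_X, W_H, B)))  # attack
```

Keeping feature extraction and classification as separate functions mirrors the decoupling the abstract emphasises: any classifier that accepts fixed-length vectors can replace `predict`.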
Performance of testing on original dataset with generated dataset.
plos.figshare.com
xls
Updated Jun 13, 2023
Ebtihaj Alshahrani; Daniyal Alghazzawi; Reem Alotaibi; Osama Rabie (2023). Performance of testing on original dataset with generated dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0275971.t004
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0275971.t004
Dataset updated
Jun 13, 2023
Dataset provided by
PLOS ONE
Authors
Ebtihaj Alshahrani; Daniyal Alghazzawi; Reem Alotaibi; Osama Rabie
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Performance of testing on original dataset with generated dataset.
Relationship between F1-Score and epoch.
plos.figshare.com
xls
Updated Jan 16, 2025
Congyuan Xu; Yong Zhan; Guanghui Chen; Zhiqiang Wang; Siqing Liu; Weichen Hu (2025). Relationship between F1-Score and epoch. [Dataset]. http://doi.org/10.1371/journal.pone.0317713.t003
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0317713.t003
Dataset updated
Jan 16, 2025
Dataset provided by
PLOS ONE
Authors
Congyuan Xu; Yong Zhan; Guanghui Chen; Zhiqiang Wang; Siqing Liu; Weichen Hu
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The network intrusion detection system (NIDS) plays a critical role in maintaining network security. However, traditional NIDS relies on a large volume of samples for training, which exhibits insufficient adaptability in rapidly changing network environments and complex attack methods, especially when facing novel and rare attacks. As attack strategies evolve, there is often a lack of sufficient samples to train models, making it difficult for traditional methods to respond quickly and effectively to new threats. Although existing few-shot network intrusion detection systems have begun to address sample scarcity, these systems often fail to effectively capture long-range dependencies within the network environment due to limited observational scope. To overcome these challenges, this paper proposes a novel elevated few-shot network intrusion detection method based on self-attention mechanisms and iterative refinement. This approach leverages the advantages of self-attention to effectively extract key features from network traffic and capture long-range dependencies. Additionally, the introduction of positional encoding ensures the temporal sequence of traffic is preserved during processing, enhancing the model’s ability to capture temporal dynamics. By combining multiple update strategies in meta-learning, the model is initially trained on a general foundation during the training phase, followed by fine-tuning with few-shot data during the testing phase, significantly reducing sample dependency while improving the model’s adaptability and prediction accuracy. Experimental results indicate that this method achieved detection rates of 99.90% and 98.23% on the CICIDS2017 and CICIDS2018 datasets, respectively, using only 10 samples.
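The two building blocks the abstract names, positional encoding and self-attention, can be sketched in plain Python. The abstract does not say which positional encoding is used, so this sketch assumes the sinusoidal encoding from the original Transformer, and it simplifies self-attention to a single head with queries, keys, and values all equal to the input (no learned projections). The meta-learning and few-shot fine-tuning steps are not shown, and the four-record input sequence is hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> Matrix:
    """PE[pos][2i] = sin(pos / 10000^(2i/d_model)), PE[pos][2i+1] = cos(same angle)."""
    pe = []
    for pos in range(seq_len):
        row = []
        for j in range(d_model):
            angle = pos / (10000 ** (2 * (j // 2) / d_model))
            row.append(math.sin(angle) if j % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix) -> Tuple[Matrix, Matrix]:
    """Single-head scaled dot-product self-attention with Q = K = V = x (no learned projections)."""
    d = len(x[0])
    outputs, weights = [], []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        w = softmax(scores)
        weights.append(w)
        # Each output step is an attention-weighted mix of every time step.
        outputs.append([sum(w[t] * x[t][j] for t in range(len(x))) for j in range(d)])
    return outputs, weights


# Hypothetical sequence of 4 traffic records, each with 4 normalized features.
records = [[0.2, 0.1, 0.0, 0.5], [0.3, 0.1, 0.1, 0.4], [0.9, 0.8, 0.7, 0.1], [0.2, 0.2, 0.0, 0.5]]
pe = sinusoidal_positional_encoding(len(records), len(records[0]))
x = [[r + p for r, p in zip(rec, pos)] for rec, pos in zip(records, pe)]
out, attn = self_attention(x)
```

Because every output row attends to every input row, the receptive field spans the whole sequence in one step, which is the "long-range dependency" property the abstract contrasts with limited observational scope; adding the encoding before attention is what lets the otherwise order-agnostic attention distinguish positions.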
Hyperparameter settings.
plos.figshare.com
xls
Updated Jan 16, 2025
Congyuan Xu; Yong Zhan; Guanghui Chen; Zhiqiang Wang; Siqing Liu; Weichen Hu (2025). Hyperparameter settings. [Dataset]. http://doi.org/10.1371/journal.pone.0317713.t002
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0317713.t002
Dataset updated
Jan 16, 2025
Dataset provided by
PLOS ONE
Authors
Congyuan Xu; Yong Zhan; Guanghui Chen; Zhiqiang Wang; Siqing Liu; Weichen Hu
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Same description as the "Relationship between F1-Score and epoch." entry above; both tables come from the same PLOS ONE article by Xu et al. (2025).
The results in the CICIDS2018 dataset.
plos.figshare.com
xls
Updated Jan 16, 2025
Congyuan Xu; Yong Zhan; Guanghui Chen; Zhiqiang Wang; Siqing Liu; Weichen Hu (2025). The results in the CICIDS2018 dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0317713.t005
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0317713.t005
Dataset updated
Jan 16, 2025
Dataset provided by
PLOS ONE
Authors
Congyuan Xu; Yong Zhan; Guanghui Chen; Zhiqiang Wang; Siqing Liu; Weichen Hu
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Same description as the "Relationship between F1-Score and epoch." entry above; both tables come from the same PLOS ONE article by Xu et al. (2025).
Statistical description of the dataset.
figshare.com
xls
Updated Jun 13, 2023
Ebtihaj Alshahrani; Daniyal Alghazzawi; Reem Alotaibi; Osama Rabie (2023). Statistical description of the dataset. [Dataset]. https://figshare.com/articles/dataset/Statistical_description_of_the_dataset_/21336560
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0275971.t001
Dataset updated
Jun 13, 2023
Dataset provided by
PLOS ONE
Authors
Ebtihaj Alshahrani; Daniyal Alghazzawi; Reem Alotaibi; Osama Rabie
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Statistical description of the dataset.
Summary of hyperparameters.
plos.figshare.com
figshare.com
xls
Updated Feb 6, 2025
Fayaz Hassan; Zafi Sherhan Syed; Aftab Ahmed Memon; Saad Said Alqahtany; Nadeem Ahmed; Mana Saleh Al Reshan; Yousef Asiri; Asadullah Shaikh (2025). Summary of hyperparameters. [Dataset]. http://doi.org/10.1371/journal.pone.0312752.t005
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0312752.t005
Dataset updated
Feb 6, 2025
Dataset provided by
PLOS ONE
Authors
Fayaz Hassan; Zafi Sherhan Syed; Aftab Ahmed Memon; Saad Said Alqahtany; Nadeem Ahmed; Mana Saleh Al Reshan; Yousef Asiri; Asadullah Shaikh
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Autonomous transportation systems have the potential to greatly change the way we travel. A vital aspect of these systems is their connectivity, facilitated by intelligent transport applications. However, with the exponential growth of IoT devices, the safety provided by the vehicular network can easily be compromised by malicious traffic. One aspect of this problem is identifying malicious traffic in vehicular networks. We propose a hybrid approach that uses automated feature engineering, via correlation-based feature selection (CFS) and principal component analysis (PCA)-based dimensionality reduction, to reduce the size of the feature matrix before a series of dense layers is used for classification. Using CFS and PCA in the machine learning pipeline has a twofold benefit: first, the resulting feature matrix contains the attributes most useful for recognizing malicious traffic; second, after CFS and PCA the feature matrix has lower dimensionality, so fewer weights need to be trained in the dense layers, resulting in a smaller model. Furthermore, we show the impact of post-training weight quantization on further reducing model size. Results demonstrate the effectiveness of feature engineering, which improves the classification F1-score from 96.48% to 98.43% and reduces the model size from 28.09 KB to 20.34 KB, optimizing the model in terms of both classification performance and size. Post-training quantization further reduces the model size to 9 KB. Experimental results on the CICIDS2017 dataset demonstrate that the proposed hybrid model not only performs well in classification but also yields trained models with a low parameter count and small size. Thus, the proposed low-complexity models can be used for intrusion detection in VANET scenarios.
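The CFS step in the pipeline above can be sketched with the standard CFS merit heuristic, which rewards features that correlate with the label and penalises features that correlate with each other: Merit = k·mean|r_cf| / sqrt(k + k(k−1)·mean|r_ff|) for a subset of k features. The abstract does not specify the correlation measure or search strategy, so this sketch assumes Pearson correlation and greedy forward selection; PCA, the dense layers, and quantization are not shown, and the toy data is hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def cfs_merit(subset: List[int], columns: List[List[float]], target: List[float]) -> float:
    """Merit = k * mean|r_cf| / sqrt(k + k(k-1) * mean|r_ff|) for a subset of k features."""
    k = len(subset)
    r_cf = sum(abs(pearson(columns[i], target)) for i in subset) / k
    pairs = list(combinations(subset, 2))
    r_ff = sum(abs(pearson(columns[i], columns[j])) for i, j in pairs) / len(pairs) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def greedy_cfs(columns: List[List[float]], target: List[float], min_gain: float = 1e-9) -> List[int]:
    """Forward selection: add the feature that most increases merit; stop when no candidate helps."""
    selected: List[int] = []
    best = 0.0
    remaining = list(range(len(columns)))
    while remaining:
        cand = max(remaining, key=lambda i: cfs_merit(selected + [i], columns, target))
        merit = cfs_merit(selected + [cand], columns, target)
        if merit <= best + min_gain:
            break
        selected.append(cand)
        best = merit
        remaining.remove(cand)
    return selected


# Hypothetical toy data: column 1 duplicates column 0, column 2 is mostly noise.
y = [0, 0, 0, 1, 1, 1]
cols = [[0.1, 0.0, 0.2, 1.0, 0.9, 1.1], [0.1, 0.0, 0.2, 1.0, 0.9, 1.1], [1, 0, 1, 0, 1, 0]]
print(greedy_cfs(cols, y))  # [0]
```

The redundant copy of column 0 adds no merit because its correlation with the already-selected feature cancels its correlation with the label, which is exactly the redundancy the CFS step is meant to remove before PCA shrinks the matrix further.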