19 datasets found

CIC-IDS-2017 V2
zenodo.org
zip
Updated Nov 26, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Akshayraj Madhubalan; Akshayraj Madhubalan; Amit Gautam; Amit Gautam; Priya Tiwary; Priya Tiwary (2024). CIC-IDS-2017 V2 [Dataset]. http://doi.org/10.5281/zenodo.10141593
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.10141593
Dataset updated
Nov 26, 2024
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Akshayraj Madhubalan; Akshayraj Madhubalan; Amit Gautam; Amit Gautam; Priya Tiwary; Priya Tiwary
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The CIC-IDS-V2 is an extended version of the original CIC-IDS 2017 dataset. The dataset is normalised and 1 new class called "Comb" is added which is a combination of synthesised data of multiple non-benign classes.

To cite the dataset, please reference the original paper with DOI: 10.1109/SmartNets61466.2024.10577645. The paper is published in IEEE SmartNets and can be accessed here.

Citation info:

Madhubalan, Akshayraj & Gautam, Amit & Tiwary, Priya. (2024). Blender-GAN: Multi-Target Conditional Generative Adversarial Network for Novel Class Synthetic Data Generation. 1-7. 10.1109/SmartNets61466.2024.10577645.

This dataset was made by Abluva Inc, a Palo Alto based, research-driven Data Protection firm. Our data protection platform empowers customers to secure data through advanced security mechanisms such as Fine Grained Access control and sophisticated depersonalization algorithms (e.g. Pseudonymization, Anonymization and Randomization). Abluva's Data Protection solutions facilitate data democratization within and outside the organizations, mitigating the concerns related to theft and compliance. The innovative intrusion detection algorithm by Abluva employs patented technologies for an intricately balanced approach that excludes normal access deviations, ensuring intrusion detection without disrupting the business operations. Abluva’s Solution enables organizations to extract further value from their data by enabling secure Knowledge Graphs and deploying Secure Data as a Service among other novel uses of data. Committed to providing a safe and secure environment, Abluva empowers organizations to unlock the full potential of their data.
i
CICIDS2017 and UNBSW-NB15
ieee-dataport.org
Updated Dec 13, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
xinpeng chen (2023). CICIDS2017 and UNBSW-NB15 [Dataset]. https://ieee-dataport.org/documents/cicids2017-and-unbsw-nb15
Explore at:
Dataset updated
Dec 13, 2023
Authors
xinpeng chen
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
DoS
h
cyberbert_dataset
huggingface.co
Updated Apr 10, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Chaitany Agrawal (2025). cyberbert_dataset [Dataset]. https://huggingface.co/datasets/agrawalchaitany/cyberbert_dataset
Explore at:
Dataset updated
Apr 10, 2025
Authors
Chaitany Agrawal
License
https://choosealicense.com/licenses/other/https://choosealicense.com/licenses/other/
Description
Cleaned CICIDS2017 Dataset

This dataset is a cleaned and preprocessed version of the CICIDS2017 dataset created by the Canadian Institute for Cybersecurity, University of New Brunswick.

Modifications

Removed duplicate records Normalized feature names Filtered specific attack types Piviot the different attack data into single dataset

Source

Original dataset: CICIDS2017

License & Citation

This dataset is provided for research purposes. Please refer… See the full description on the dataset page: https://huggingface.co/datasets/agrawalchaitany/cyberbert_dataset.
f
Results of training under poisoned data.
plos.figshare.com
xls
Updated Jun 13, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ebtihaj Alshahrani; Daniyal Alghazzawi; Reem Alotaibi; Osama Rabie (2023). Results of training under poisoned data. [Dataset]. http://doi.org/10.1371/journal.pone.0275971.t007
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0275971.t007
Dataset updated
Jun 13, 2023
Dataset provided by
PLOS ONE
Authors
Ebtihaj Alshahrani; Daniyal Alghazzawi; Reem Alotaibi; Osama Rabie
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Results of training under poisoned data.
m
Composed Encrypted Malicious Traffic Dataset for machine learning based...
data.mendeley.com
Updated Oct 12, 2021
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Zihao Wang (2021). Composed Encrypted Malicious Traffic Dataset for machine learning based encrypted malicious traffic analysis. [Dataset]. http://doi.org/10.17632/ztyk4h3v6s.2
Explore at:
Unique identifier
https://doi.org/10.17632/ztyk4h3v6s.2
Dataset updated
Oct 12, 2021
Authors
Zihao Wang
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is a traffic dataset which contains balance size of encrypted malicious and legitimate traffic for encrypted malicious traffic detection. The dataset is a secondary csv feature data which is composed of five public traffic datasets. Our dataset is composed based on three criteria: The first criterion is to combine widely considered public datasets which contain both encrypted malicious and legitimate traffic in existing works, such as the Malwares Capture Facility Project dataset and the CICIDS-2017 dataset. The second criterion is to ensure the data balance, i.e., balance of malicious and legitimate network traffic and similar size of network traffic contributed by each individual dataset. Thus, approximate proportions of malicious and legitimate traffic from each selected public dataset are extracted by using random sampling. We also ensured that there will be no traffic size from one selected public dataset that is much larger than other selected public datasets. The third criterion is that our dataset includes both conventional devices' and IoT devices' encrypted malicious and legitimate traffic, as these devices are increasingly being deployed and are working in the same environments such as offices, homes, and other smart city settings.

Based on the criteria, 5 public datasets are selected. After data pre-processing, details of each selected public dataset and the final composed dataset are shown in “Dataset Statistic Analysis Document”. The document summarized the malicious and legitimate traffic size we selected from each selected public dataset, proportions of selected traffic size from each selected public dataset with respect to the total traffic size of the composed dataset (% w.r.t the composed dataset), proportions of selected encrypted traffic size from each selected public dataset (% of selected public dataset), and total traffic size of the composed dataset. From the table, we are able to observe that each public dataset equally contributes to approximately 20% of the composed dataset, except for CICDS-2012 (due to its limited number of encrypted malicious traffic). This achieves a balance across individual datasets and reduces bias towards traffic belonging to any dataset during learning. We can also observe that the size of malicious and legitimate traffic are almost the same, thus achieving class balance. The datasets now made available were prepared aiming at encrypted malicious traffic detection. Since the dataset is used for machine learning model training, a sample of train and test sets are also provided. The train and test datasets are separated based on 1:4 and stratification is applied during data split. Such datasets can be used directly for machine or deep learning model training based on selected features.
f
Performance of testing on original dataset.
figshare.com
xls
Updated Jun 13, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ebtihaj Alshahrani; Daniyal Alghazzawi; Reem Alotaibi; Osama Rabie (2023). Performance of testing on original dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0275971.t003
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0275971.t003
Dataset updated
Jun 13, 2023
Dataset provided by
PLOS ONE
Authors
Ebtihaj Alshahrani; Daniyal Alghazzawi; Reem Alotaibi; Osama Rabie
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Performance of testing on original dataset.
f
Statistical description of the dataset.
figshare.com
xls
Updated Jun 13, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ebtihaj Alshahrani; Daniyal Alghazzawi; Reem Alotaibi; Osama Rabie (2023). Statistical description of the dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0275971.t001
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0275971.t001
Dataset updated
Jun 13, 2023
Dataset provided by
PLOS ONE
Authors
Ebtihaj Alshahrani; Daniyal Alghazzawi; Reem Alotaibi; Osama Rabie
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Statistical description of the dataset.
f
Network Intrusion Detection Datasets
figshare.com
txt
Updated May 30, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ogobuchi Daniel Okey; Demostenes Zegarra Rodriguez (2023). Network Intrusion Detection Datasets [Dataset]. http://doi.org/10.6084/m9.figshare.23118164.v1
Explore at:
txtAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.23118164.v1
Dataset updated
May 30, 2023
Dataset provided by
figshare
Authors
Ogobuchi Daniel Okey; Demostenes Zegarra Rodriguez
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
With the continuous expansion of data exchange, the threat of cybercrime and network invasions is also on the rise. This project aims to address these concerns by investigating an innovative approach: an Attentive Transformer Deep Learning Algorithm for Intrusion Detection of IoT Systems using Automatic Xplainable Feature Selection. The primary focus of this project is to develop an effective Intrusion Detection System (IDS) using the aforementioned algorithm. To accomplish this, carefully curated datasets have been utilized, which have been created through a meticulous process involving data extraction from the University of New Brunswick repository. This repository houses the datasets used in this research and can be accessed publically in order to replicate the findings of this research.
f
Results of training on the original dataset.
figshare.com
xls
Updated Jun 13, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ebtihaj Alshahrani; Daniyal Alghazzawi; Reem Alotaibi; Osama Rabie (2023). Results of training on the original dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0275971.t005
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0275971.t005
Dataset updated
Jun 13, 2023
Dataset provided by
PLOS ONE
Authors
Ebtihaj Alshahrani; Daniyal Alghazzawi; Reem Alotaibi; Osama Rabie
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Results of training on the original dataset.
f
CICID2017 dataset information.
plos.figshare.com
xls
Updated Jul 2, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ahmed Muqdad Alnasrallah; Maheyzah Md Siraj; Hanan Ali Alrikabi (2025). CICID2017 dataset information. [Dataset]. http://doi.org/10.1371/journal.pone.0327137.t004
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0327137.t004
Dataset updated
Jul 2, 2025
Dataset provided by
PLOS ONE
Authors
Ahmed Muqdad Alnasrallah; Maheyzah Md Siraj; Hanan Ali Alrikabi
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Information technology has significantly impacted society. IoT and its specialized variant, IoMT, enable remote patient monitoring and improve healthcare. While it contributes to improving healthcare services, it may pose significant security challenges, especially due to the growing interconnectivity of IoMT devices. Hence, a robust IDS is required to handle these issues and prevent future intrusions in a appropriate time. This study proposes an IDS model for the IoMT that integrates advanced feature selection techniques and deep learning to enhance detection performance. The proposed model employs Information Gain (IG) and Recursive Feature Elimination (RFE) in parallel to select the top 50% of features, from which intersection and union subsets are created, followed by a deep autoencoder (DAE) to reduce dimensionality without losing important data. Finally, a deep neural network (DNN) classifies traffic as normal or anomalous. The Experimental results demonstrate superior performance in terms of accuracy, precision, recall, and F1 score. It achieves an accuracy of 99.93% on the WUSTL-EHMS-2020 dataset while reducing training time and attains 99.61% accuracy on the CICIDS2017 dataset. The model performance was validated with an average accuracy of 99.82% ± 0.16% and a statistically significant p-value of 0.0001 on the WUSTL-EHMS-2020 dataset, which refers to stable statistical improvement. This study indicates that the proposed strategy decreases computational complexity and enhances IDS efficiency in resource-constrained IoMT environments.
f
Results of testing under poisoned data.
plos.figshare.com
xls
Updated Jun 13, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ebtihaj Alshahrani; Daniyal Alghazzawi; Reem Alotaibi; Osama Rabie (2023). Results of testing under poisoned data. [Dataset]. http://doi.org/10.1371/journal.pone.0275971.t008
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0275971.t008
Dataset updated
Jun 13, 2023
Dataset provided by
PLOS ONE
Authors
Ebtihaj Alshahrani; Daniyal Alghazzawi; Reem Alotaibi; Osama Rabie
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Results of testing under poisoned data.
f
WUSTL-EHMS-2020 dataset information.
plos.figshare.com
xls
Updated Jul 2, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ahmed Muqdad Alnasrallah; Maheyzah Md Siraj; Hanan Ali Alrikabi (2025). WUSTL-EHMS-2020 dataset information. [Dataset]. http://doi.org/10.1371/journal.pone.0327137.t003
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0327137.t003
Dataset updated
Jul 2, 2025
Dataset provided by
PLOS ONE
Authors
Ahmed Muqdad Alnasrallah; Maheyzah Md Siraj; Hanan Ali Alrikabi
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Information technology has significantly impacted society. IoT and its specialized variant, IoMT, enable remote patient monitoring and improve healthcare. While it contributes to improving healthcare services, it may pose significant security challenges, especially due to the growing interconnectivity of IoMT devices. Hence, a robust IDS is required to handle these issues and prevent future intrusions in a appropriate time. This study proposes an IDS model for the IoMT that integrates advanced feature selection techniques and deep learning to enhance detection performance. The proposed model employs Information Gain (IG) and Recursive Feature Elimination (RFE) in parallel to select the top 50% of features, from which intersection and union subsets are created, followed by a deep autoencoder (DAE) to reduce dimensionality without losing important data. Finally, a deep neural network (DNN) classifies traffic as normal or anomalous. The Experimental results demonstrate superior performance in terms of accuracy, precision, recall, and F1 score. It achieves an accuracy of 99.93% on the WUSTL-EHMS-2020 dataset while reducing training time and attains 99.61% accuracy on the CICIDS2017 dataset. The model performance was validated with an average accuracy of 99.82% ± 0.16% and a statistically significant p-value of 0.0001 on the WUSTL-EHMS-2020 dataset, which refers to stable statistical improvement. This study indicates that the proposed strategy decreases computational complexity and enhances IDS efficiency in resource-constrained IoMT environments.
f
Summary of dataset records for each class.
figshare.com
xls
Updated Feb 6, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Fayaz Hassan; Zafi Sherhan Syed; Aftab Ahmed Memon; Saad Said Alqahtany; Nadeem Ahmed; Mana Saleh Al Reshan; Yousef Asiri; Asadullah Shaikh (2025). Summary of dataset records for each class. [Dataset]. http://doi.org/10.1371/journal.pone.0312752.t002
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0312752.t002
Dataset updated
Feb 6, 2025
Dataset provided by
PLOS ONE
Authors
Fayaz Hassan; Zafi Sherhan Syed; Aftab Ahmed Memon; Saad Said Alqahtany; Nadeem Ahmed; Mana Saleh Al Reshan; Yousef Asiri; Asadullah Shaikh
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Autonomous transportation systems have the potential to greatly impact the way we travel. A vital aspect of these systems is their connectivity, facilitated by intelligent transport applications. However, the safety ensured by the vehicular network can be easily compromised by malicious traffic with the exponential growth of IoT devices. One aspect is malicious traffic identification in Vehicular networks. We proposed a hybrid approach uses automated feature engineering via correlation-based feature selection (CFS) and principal component analysis (PCA)-based dimensionality reduction to reduce feature matrix size before a series of dense layers are used for classification. The intended use of CFS and PCA in the machine learning pipeline serves two folds benefit, first is that the resultant feature matrix contains attributes that are most useful for recognizing malicious traffic, and second that after CFS and PCA, the feature matrix has a smaller dimensionality which in turn means that smaller number of weights need to be trained for the dense layers (connections are required for the dense layers) which resulting in smaller model size. Furthermore, we show the impact of post-training model weight quantization to further reduce the model size. Results demonstrate the effectiveness of feature engineering which improves the classification f1score from 96.48% to 98.43%. It also reduces the model size from 28.09 KB to 20.34 KB thus optimizing the model in terms of both classification performance and model size. Post-training quantization further optimizes the model size to 9 KB. The experimental results using CICIDS2017 dataset demonstrate that proposed hybrid model performs well not only in terms of classification performance but also yields trained models that have a low parameter count and model size. Thus, the proposed low-complexity models can be used for intrusion detection in VANET scenario.
f
Values of hyperparameters.
plos.figshare.com
xls
Updated Jul 2, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Congyuan Xu; Donghui Li; Zihao Liu; Jun Yang; Qinfeng Shen; Ningbing Tong (2025). Values of hyperparameters. [Dataset]. http://doi.org/10.1371/journal.pone.0327161.t002
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0327161.t002
Dataset updated
Jul 2, 2025
Dataset provided by
PLOS ONE
Authors
Congyuan Xu; Donghui Li; Zihao Liu; Jun Yang; Qinfeng Shen; Ningbing Tong
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Deep learning methods have achieved remarkable progress in network intrusion detection. However, their performance often deteriorates significantly in real-world scenarios characterized by limited attack samples and substantial domain shifts. To address this challenge, we propose a novel few-shot intrusion detection method that integrates multi-domain feature fusion with a bidirectional cross-attention mechanism. Specifically, the method adopts a dual-branch feature extractor to jointly capture spatial and frequency domain characteristics of network traffic. The frequency domain features are obtained via two-dimensional discrete cosine transform (2D-DCT), which helps to highlight the spectral structure and improve feature discriminability. To bridge the semantic gap between support and query samples under few-shot conditions, we design a dual-domain bidirectional cross-attention module that enables deep, task-specific alignment across spatial and frequency domains. Additionally, we introduce a hierarchical feature encoding module based on a modified Mamba architecture, which leverages state space modeling to capture long-range dependencies and temporal patterns in traffic sequences. Extensive experiments on two benchmark datasets, CICIDS2017 and CICIDS2018, demonstrate that the proposed method achieves accuracy of 99.03% and 98.64% under the 10-shot setting, outperforming state-of-the-art methods. Moreover, the method exhibits strong cross-domain generalization, achieving over 95.13% accuracy in cross-domain scenarios, thereby proving its robustness and practical applicability in real-world, dynamic network environments.
f
Comparison of models with different layers.
plos.figshare.com
xls
Updated May 15, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Qingfeng Li; Boyu Wang; Xueyan Wen; Yuao Chen (2025). Comparison of models with different layers. [Dataset]. http://doi.org/10.1371/journal.pone.0322000.t002
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0322000.t002
Dataset updated
May 15, 2025
Dataset provided by
PLOS ONE
Authors
Qingfeng Li; Boyu Wang; Xueyan Wen; Yuao Chen
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
In light of the increasing threat posed by cyberattacks, it is imperative for organizations to accurately identify malicious network traffic. However, the imbalance among various attack categories diminishes the accuracy of model predictions. To address this issue, we propose the Maple-IDS dataset as an innovative solution. We utilize DPDK along with its zero-copy (ZC) technology and BPF compiler to compile filtering rules. Additionally, a headless client is employed to generate control traffic, thereby preventing overfitting. Our data collections are sourced from a variety of operating systems and middleware platforms, ensuring broad applicability and relevance. By comparing our dataset with the CIC-IDS-2017 dataset, we achieve a more balanced representation of attack data, which enhances the model’s learning performance. To tackle the challenges of low accuracy and slow convergence speed in existing network security situation predictions, we propose a network situation awareness prediction model that integrates a residual network with an improved attention mechanism. This model leverages the attention mechanism to assign greater weight to abnormal data, thereby facilitating the accurate identification of anomalies within large data streams. Furthermore, the residual network accelerates convergence speed, enhances the model’s expressive capability, and improves the efficiency of rapid response to attacks. Experimental results indicate that the accuracy of predicting attack data flows reaches an impressive 99.83%, which significantly aids in the early detection of network security threats and enables preemptive measures to maintain normal network operations.
f
Ablation results of CICIDS2017 data set.
plos.figshare.com
xls
Updated May 15, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Haizhen Wang; Xiaojing Yang; Na Jia (2025). Ablation results of CICIDS2017 data set. [Dataset]. http://doi.org/10.1371/journal.pone.0322839.t007
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0322839.t007
Dataset updated
May 15, 2025
Dataset provided by
PLOS ONE
Authors
Haizhen Wang; Xiaojing Yang; Na Jia
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Software Defined Networking (SDN) is an emerging network architecture and management method, whose core idea is to separate the network control plane from the data transmission plane. It is precisely because of this characteristic that SDN controllers are susceptible to external malicious attacks, the most common of which are Distributed Denial of Service (DDoS) attacks. This paper suggests a way to find DDoS attacks called ConvLTSM-MHA-TWD. It is based on the Convolutional Long Short-Term Memory Network (ConvLSTM) and three-way decision (TWD). It solves the problem of insufficient feature extraction in SDN environment and improves classification accuracy. This method uses ConvLSTM to extract data features, and uses multi-head attention (MHA) mechanism to learn the long-distance dependence relationship in the input data, and then constructs multi-granularity feature space. ConvLSTM and MHA outputs are added to form a residual connection to further enhance feature extraction and timing modeling capabilities and solve the problem of gradient disappearance during model training. Then the three-way decision theory is used to make decisions on network behaviors immediately. For the network behaviors that cannot be made immediately, the delayed decision is made, and the feature extraction and decision are made on this part of the network behaviors again. Finally, the classification results are output. This paper conducted experiments on data sets CICIDS2017 and DDoS SDN, with accuracy rates of 0.994 and 0.977, respectively, which has better overall performance, and is suitable for training large amounts of data.
f
Number of features selected.
plos.figshare.com
figshare.com
xls
Updated Jul 21, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Islam Zada; Esraa Omran; Salman Jan; Hessa Alfraihi; Seetah Alsalamah; Abdullah Alshahrani; Shaukat Hayat; Nguyen Phi (2025). Number of features selected. [Dataset]. http://doi.org/10.1371/journal.pone.0328050.t003
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0328050.t003
Dataset updated
Jul 21, 2025
Dataset provided by
PLOS ONE
Authors
Islam Zada; Esraa Omran; Salman Jan; Hessa Alfraihi; Seetah Alsalamah; Abdullah Alshahrani; Shaukat Hayat; Nguyen Phi
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The dynamical growth of cyber threats in IoT setting requires smart and scalable intrusion detection systems. In this paper, a Lean-based hybrid Intrusion Detection framework using Particle Swarm Optimization and Genetic Algorithm (PSO-GA) to select the features and Extreme Learning Machine and Bootstrap Aggregation (ELM-BA) to classify the features is introduced. The proposed framework obtains high detection rates on the CICIDS-2017 dataset, with 100 percent accuracy on important attack categories, like PortScan, SQL Injection, and Brute Force. Statistical verification and visual evaluation metrics are used to validate the model, which can be interpreted and proved to be solid. The framework is crafted following Lean ideals; thus, it has minimal computational overhead and optimal detection efficiency. It can be efficiently ported to the real-world usage in smart cities and industrial internet of things systems. The suggested framework can be deployed in smart cities and industrial Internet of Things (IoT) systems in real time, and it provides scalable and effective cyber threat detection. By adopting it, false positives can be greatly minimized, the latency of the decision-making process can be decreased, as well as the IoT critical infrastructure resilience against the ever-changing cyber threats can be increased.
f
Hyperparameter settings.
plos.figshare.com
xls
Updated Jan 16, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Congyuan Xu; Yong Zhan; Guanghui Chen; Zhiqiang Wang; Siqing Liu; Weichen Hu (2025). Hyperparameter settings. [Dataset]. http://doi.org/10.1371/journal.pone.0317713.t002
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0317713.t002
Dataset updated
Jan 16, 2025
Dataset provided by
PLOS ONE
Authors
Congyuan Xu; Yong Zhan; Guanghui Chen; Zhiqiang Wang; Siqing Liu; Weichen Hu
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The network intrusion detection system (NIDS) plays a critical role in maintaining network security. However, traditional NIDS relies on a large volume of samples for training, which exhibits insufficient adaptability in rapidly changing network environments and complex attack methods, especially when facing novel and rare attacks. As attack strategies evolve, there is often a lack of sufficient samples to train models, making it difficult for traditional methods to respond quickly and effectively to new threats. Although existing few-shot network intrusion detection systems have begun to address sample scarcity, these systems often fail to effectively capture long-range dependencies within the network environment due to limited observational scope. To overcome these challenges, this paper proposes a novel elevated few-shot network intrusion detection method based on self-attention mechanisms and iterative refinement. This approach leverages the advantages of self-attention to effectively extract key features from network traffic and capture long-range dependencies. Additionally, the introduction of positional encoding ensures the temporal sequence of traffic is preserved during processing, enhancing the model’s ability to capture temporal dynamics. By combining multiple update strategies in meta-learning, the model is initially trained on a general foundation during the training phase, followed by fine-tuning with few-shot data during the testing phase, significantly reducing sample dependency while improving the model’s adaptability and prediction accuracy. Experimental results indicate that this method achieved detection rates of 99.90% and 98.23% on the CICIDS2017 and CICIDS2018 datasets, respectively, using only 10 samples.
f
Summary of hyperparameters.
plos.figshare.com
xls
Updated Feb 6, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Fayaz Hassan; Zafi Sherhan Syed; Aftab Ahmed Memon; Saad Said Alqahtany; Nadeem Ahmed; Mana Saleh Al Reshan; Yousef Asiri; Asadullah Shaikh (2025). Summary of hyperparameters. [Dataset]. http://doi.org/10.1371/journal.pone.0312752.t005
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0312752.t005
Dataset updated
Feb 6, 2025
Dataset provided by
PLOS ONE
Authors
Fayaz Hassan; Zafi Sherhan Syed; Aftab Ahmed Memon; Saad Said Alqahtany; Nadeem Ahmed; Mana Saleh Al Reshan; Yousef Asiri; Asadullah Shaikh
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Autonomous transportation systems have the potential to greatly impact the way we travel. A vital aspect of these systems is their connectivity, facilitated by intelligent transport applications. However, the safety ensured by the vehicular network can be easily compromised by malicious traffic with the exponential growth of IoT devices. One aspect is malicious traffic identification in Vehicular networks. We proposed a hybrid approach uses automated feature engineering via correlation-based feature selection (CFS) and principal component analysis (PCA)-based dimensionality reduction to reduce feature matrix size before a series of dense layers are used for classification. The intended use of CFS and PCA in the machine learning pipeline serves two folds benefit, first is that the resultant feature matrix contains attributes that are most useful for recognizing malicious traffic, and second that after CFS and PCA, the feature matrix has a smaller dimensionality which in turn means that smaller number of weights need to be trained for the dense layers (connections are required for the dense layers) which resulting in smaller model size. Furthermore, we show the impact of post-training model weight quantization to further reduce the model size. Results demonstrate the effectiveness of feature engineering which improves the classification f1score from 96.48% to 98.43%. It also reduces the model size from 28.09 KB to 20.34 KB thus optimizing the model in terms of both classification performance and model size. Post-training quantization further optimizes the model size to 9 KB. The experimental results using CICIDS2017 dataset demonstrate that proposed hybrid model performs well not only in terms of classification performance but also yields trained models that have a low parameter count and model size. Thus, the proposed low-complexity models can be used for intrusion detection in VANET scenario.
Not seeing a result you expected?
Learn how you can add new datasets to our index.

Facebook

Twitter

Click to copy link

Link copied

Cite

Akshayraj Madhubalan; Akshayraj Madhubalan; Amit Gautam; Amit Gautam; Priya Tiwary; Priya Tiwary (2024). CIC-IDS-2017 V2 [Dataset]. http://doi.org/10.5281/zenodo.10141593

CIC-IDS-2017 V2

Explore at:

zipAvailable download formats

Unique identifier

https://doi.org/10.5281/zenodo.10141593

Dataset updated

Nov 26, 2024

Dataset provided by

Zenodohttp://zenodo.org/

Authors

Akshayraj Madhubalan; Akshayraj Madhubalan; Amit Gautam; Amit Gautam; Priya Tiwary; Priya Tiwary

License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

The CIC-IDS-V2 is an extended version of the original CIC-IDS 2017 dataset. The dataset is normalised and 1 new class called "Comb" is added which is a combination of synthesised data of multiple non-benign classes.

To cite the dataset, please reference the original paper with DOI: 10.1109/SmartNets61466.2024.10577645. The paper is published in IEEE SmartNets and can be accessed here.

Citation info:

Madhubalan, Akshayraj & Gautam, Amit & Tiwary, Priya. (2024). Blender-GAN: Multi-Target Conditional Generative Adversarial Network for Novel Class Synthetic Data Generation. 1-7. 10.1109/SmartNets61466.2024.10577645.

This dataset was made by Abluva Inc, a Palo Alto based, research-driven Data Protection firm. Our data protection platform empowers customers to secure data through advanced security mechanisms such as Fine Grained Access control and sophisticated depersonalization algorithms (e.g. Pseudonymization, Anonymization and Randomization). Abluva's Data Protection solutions facilitate data democratization within and outside the organizations, mitigating the concerns related to theft and compliance. The innovative intrusion detection algorithm by Abluva employs patented technologies for an intricately balanced approach that excludes normal access deviations, ensuring intrusion detection without disrupting the business operations. Abluva’s Solution enables organizations to extract further value from their data by enabling secure Knowledge Graphs and deploying Secure Data as a Service among other novel uses of data. Committed to providing a safe and secure environment, Abluva empowers organizations to unlock the full potential of their data.

Clear search

Close search

Google apps

Main menu

CIC-IDS-2017 V2

CICIDS2017 and UNBSW-NB15

cyberbert_dataset

Results of training under poisoned data.

Composed Encrypted Malicious Traffic Dataset for machine learning based...

Performance of testing on original dataset.

Statistical description of the dataset.

Network Intrusion Detection Datasets

Results of training on the original dataset.

CICID2017 dataset information.

Results of testing under poisoned data.

WUSTL-EHMS-2020 dataset information.

Summary of dataset records for each class.

Values of hyperparameters.

Comparison of models with different layers.

Ablation results of CICIDS2017 data set.

Number of features selected.

Hyperparameter settings.

Summary of hyperparameters.

CIC-IDS-2017 V2See More Versions

CIC-IDS-2017 V2