30 datasets found
  1. CICIDS-2017

    • kaggle.com
    zip
    Updated Jul 17, 2024
    Cite
    bert van keulen (2024). CICIDS-2017 [Dataset]. https://www.kaggle.com/datasets/bertvankeulen/cicids-2017
    Available download formats: zip (533474114 bytes)
    Dataset updated
    Jul 17, 2024
    Authors
    bert van keulen
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Raw network data was collected over a period of 5 days, Monday through Friday, and stored in PCAP files. Monday was used to create most of the Benign data, while the Attack-Network implemented various types of attacks over the next 4 days, such as Brute Force connections (FTP and SSH), several types of DoS attacks, as well as a Botnet attack, Infiltration attacks and subsequent Port-Scanning activity.

    The PCAP data was processed using a tool developed by one of the authors of [1], called CICFlowMeter [3]. This tool produces flow traces: sequences of packets between specific source and destination IP, with corresponding values for source and destination ports. TCP flows are usually terminated by connection teardowns, while UDP flows are terminated by a flow timeout. For each of these flow traces many features were selected, measuring flow characteristics, such as packet size, number of packets, flow duration, etc. For some of these variables, statistics such as their mean and standard deviations are provided as features as well. While several features are categorical (such as IP addresses, Port numbers and TCP flag counts), most of the other features are numerical.
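
    As a rough illustration of the flow concept described above, the following sketch groups packets into flows by source/destination IP and port and closes a flow after an idle timeout. It is not CICFlowMeter's implementation: the `Packet` record, the 120-second timeout, and the chosen statistics are assumptions made for the example, and TCP teardown handling is omitted.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Packet:          # hypothetical packet record, not a CICFlowMeter type
    ts: float          # capture timestamp in seconds
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    size: int          # packet length in bytes


def build_flows(packets, timeout=120.0):
    """Group packets into flows; a gap longer than `timeout` starts a new flow."""
    active = {}      # flow key -> packets of the currently open flow
    finished = []
    for p in sorted(packets, key=lambda p: p.ts):
        key = (p.src_ip, p.dst_ip, p.src_port, p.dst_port)
        current = active.get(key)
        if current and p.ts - current[-1].ts > timeout:
            finished.append(current)   # idle too long: close the old flow
            current = None
        if current is None:
            current = active[key] = []
        current.append(p)
    finished.extend(active.values())   # flows still open at end of capture
    return finished


def flow_features(flow):
    """Summarise one flow with a few of the statistics mentioned in the text."""
    sizes = [p.size for p in flow]
    return {
        "packets": len(flow),
        "duration": flow[-1].ts - flow[0].ts,
        "mean_size": mean(sizes),
        "std_size": pstdev(sizes),
    }
```

    Calling `flow_features` on each list returned by `build_flows` yields one feature row per flow, which is the general shape of the CSV files described here.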

    The result is the CICIDS-2017 dataset, with about 80 features and several attack families which can ultimately be divided in 16 categories: one Benign category and 15 Attack categories. This original dataset is available at [4]. Subsequently, the authors of [2] spent a lot of effort to correct some errors in the dataset, by fixing the CICFlowMeter software (especially regarding TCP flow terminations) and by re-labeling some of the samples accordingly. They posted the corrected dataset on their website [5]; this also has links to their GitHub site, which provides Python code that can be used to efficiently import the data. I used that as a starting point for my notebook, here on Kaggle.

    For each of the 5 days a csv file with network flows was produced.

    These are the files in the dataset, with some changes: I created decimal values for the IP-addresses, and I removed a couple of rows with inf values.
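
    The two changes described above could look roughly like the following sketch. The column names (`Src IP`, `Flow Bytes/s`) and the sample rows are assumptions for illustration; the author's notebook may do this differently.

```python
import ipaddress
import math


def ip_to_decimal(ip: str) -> int:
    """Convert a dotted IPv4 string to its integer (decimal) value."""
    return int(ipaddress.IPv4Address(ip))


def has_inf(row: dict) -> bool:
    """True if any float field in the row is +inf or -inf."""
    return any(isinstance(v, float) and math.isinf(v) for v in row.values())


# Hypothetical sample rows; real rows carry about 80 features.
rows = [
    {"Src IP": "192.168.10.5", "Flow Bytes/s": 1200.0},
    {"Src IP": "172.16.0.1", "Flow Bytes/s": float("inf")},
]

# Drop rows containing inf, then replace the address with its decimal value.
cleaned = [
    {**r, "Src IP": ip_to_decimal(r["Src IP"])}
    for r in rows
    if not has_inf(r)
]
```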

    In addition, I created 5 more files (_plus for each day), with extra features that translate information regarding traffic flows within the local network, or between the local network and external IP addresses. It should be noted that only two attacks have an external IP address, while for most attacks the local network is facing the gateway.
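
    One way to derive a feature of this kind is to classify each flow by whether its endpoints lie inside the local network. The subnet `192.168.10.0/24` and the category names below are assumptions for the sketch, not values taken from the _plus files.

```python
import ipaddress

# Assumed local subnet; adjust to the capture environment.
LOCAL_NET = ipaddress.ip_network("192.168.10.0/24")


def traffic_direction(src_ip: str, dst_ip: str) -> str:
    """Label a flow by where its endpoints sit relative to the local network."""
    src_local = ipaddress.ip_address(src_ip) in LOCAL_NET
    dst_local = ipaddress.ip_address(dst_ip) in LOCAL_NET
    if src_local and dst_local:
        return "internal"
    if src_local:
        return "outbound"
    if dst_local:
        return "inbound"
    return "external"
```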

    [1] Sharafaldin I., Lashkari A.H., and Ghorbani A.A. Toward generating a new intrusion detection dataset and intrusion traffic characterization, Proceedings of the 4th International Conference on Information Systems Security and Privacy ICISSP - Volume 1, 108-116, 2018.
    [2] Engelen G., Rimmer V., and Joosen W. Troubleshooting an intrusion detection dataset: the CICIDS2017 case study, 2021 IEEE Security and Privacy Workshops (SPW), 2021:7-12.
    [3] https://www.unb.ca/cic/research/applications.html
    [4] https://www.unb.ca/cic/datasets/ids-2017.html
    [5] https://intrusion-detection.distrinet-research.be/CNS2022/index.html

  2. Intrusion Detection Datasets (BCCC-CIC-IDS-2017)

    • kaggle.com
    zip
    Updated Apr 18, 2025
    Cite
    Behaviour-Centric Cybersecurity Center (BCCC) (2025). Intrusion Detection Datasets (BCCC-CIC-IDS-2017) [Dataset]. https://www.kaggle.com/datasets/bcccdatasets/intrusion-detection-datasets-bccc-cic-ids-2017/code
    Available download formats: zip (393241701 bytes)
    Dataset updated
    Apr 18, 2025
    Authors
    Behaviour-Centric Cybersecurity Center (BCCC)
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Using NTLFlowLyzer, we generated the “BCCC-CIC-IDS2017” dataset by extracting key flows from the raw network traffic data of CIC-IDS2017, resulting in CSV files that integrate essential network and transport layer features. This new dataset offers a structured approach for analyzing intrusion detection, combining diverse traffic types into multiple sub-categories. The “BCCC-CIC-IDS2017” dataset provides the depth and variety needed to rigorously evaluate our proposed profiling model, advancing research in network security and supporting the development of intrusion detection systems.

    The full research paper outlining the details of the dataset and its underlying principles:

    "NTLFlowLyzer: Toward Generating an Intrusion Detection Dataset and Intruders Behavior Profiling through Network Layer Traffic Analysis and Pattern Extraction, MohammadMoein Shafi, Arash Habibi Lashkari, Arousha Haghighian Roudsari, Computers & Security, 104160, ISSN 0167-4048 (2024)"

  3. Intrusion detection IDS Data cleaned

    • kaggle.com
    zip
    Updated Aug 4, 2024
    Cite
    arar tawil (2024). Intrusion detection IDS Data cleaned [Dataset]. https://www.kaggle.com/datasets/araraltawil/ids-data-cleaned
    Available download formats: zip (219896832 bytes)
    Dataset updated
    Aug 4, 2024
    Authors
    arar tawil
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Intrusion Detection Systems (IDSs) and Intrusion Prevention Systems (IPSs) are the most important defense tools against sophisticated and ever-growing network attacks. Due to the lack of reliable test and validation datasets, anomaly-based intrusion detection approaches suffer from a lack of consistent and accurate performance evaluations.

    Our evaluations of the existing eleven datasets since 1998 show that most are out of date and unreliable. Some of these datasets suffer from the lack of traffic diversity and volumes, some do not cover the variety of known attacks, while others anonymize packet payload data, which cannot reflect the current trends. Some are also lacking feature set and metadata.

    The CICIDS2017 dataset contains benign traffic and the most up-to-date common attacks, and resembles true real-world data (PCAPs). It also includes the results of the network traffic analysis using CICFlowMeter, with flows labeled based on the timestamp, source and destination IPs, source and destination ports, protocols, and attack (CSV files). The definitions of the extracted features are also available.

    Generating realistic background traffic was our top priority in building this dataset. We have used our proposed B-Profile system (Sharafaldin, et al. 2016) to profile the abstract behavior of human interactions and generate naturalistic benign background traffic. For this dataset, we built the abstract behaviour of 25 users based on the HTTP, HTTPS, FTP, SSH, and email protocols.

    The data capturing period started at 9 a.m., Monday, July 3, 2017 and ended at 5 p.m. on Friday July 7, 2017, for a total of 5 days. Monday is the normal day and only includes the benign traffic. The implemented attacks include Brute Force FTP, Brute Force SSH, DoS, Heartbleed, Web Attack, Infiltration, Botnet and DDoS. They have been executed both morning and afternoon on Tuesday, Wednesday, Thursday and Friday.

  4. The results in the CICIDS2017 dataset.

    • plos.figshare.com
    xls
    Updated Jan 16, 2025
    Cite
    Congyuan Xu; Yong Zhan; Guanghui Chen; Zhiqiang Wang; Siqing Liu; Weichen Hu (2025). The results in the CICIDS2017 dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0317713.t004
    Available download formats: xls
    Dataset updated
    Jan 16, 2025
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Congyuan Xu; Yong Zhan; Guanghui Chen; Zhiqiang Wang; Siqing Liu; Weichen Hu
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    The network intrusion detection system (NIDS) plays a critical role in maintaining network security. However, traditional NIDS relies on a large volume of samples for training, which exhibits insufficient adaptability in rapidly changing network environments and complex attack methods, especially when facing novel and rare attacks. As attack strategies evolve, there is often a lack of sufficient samples to train models, making it difficult for traditional methods to respond quickly and effectively to new threats. Although existing few-shot network intrusion detection systems have begun to address sample scarcity, these systems often fail to effectively capture long-range dependencies within the network environment due to limited observational scope. To overcome these challenges, this paper proposes a novel elevated few-shot network intrusion detection method based on self-attention mechanisms and iterative refinement. This approach leverages the advantages of self-attention to effectively extract key features from network traffic and capture long-range dependencies. Additionally, the introduction of positional encoding ensures the temporal sequence of traffic is preserved during processing, enhancing the model’s ability to capture temporal dynamics. By combining multiple update strategies in meta-learning, the model is initially trained on a general foundation during the training phase, followed by fine-tuning with few-shot data during the testing phase, significantly reducing sample dependency while improving the model’s adaptability and prediction accuracy. Experimental results indicate that this method achieved detection rates of 99.90% and 98.23% on the CICIDS2017 and CICIDS2018 datasets, respectively, using only 10 samples.
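
    The abstract does not say which positional encoding is used. As a minimal sketch, assuming the sinusoidal variant common in Transformer models, position information for a sequence of traffic items could be generated as follows; the function name and dimensions are illustrative only.

```python
import math


def positional_encoding(seq_len: int, d_model: int) -> list[list[float]]:
    """Sinusoidal encoding: even dimensions use sin, odd dimensions use cos."""
    pe = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            # Each pair of dimensions (2k, 2k+1) shares one wavelength.
            angle = pos / (10000 ** (2 * (i // 2) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe
```

    The resulting vectors are typically added to the per-item feature embeddings so that a self-attention layer can distinguish the order of items in the sequence.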

  5. DETECTION OF CYBER ATTACK IN NETWORK.

    • kaggle.com
    zip
    Updated Jul 3, 2025
    Cite
    Kamal Acharya (2025). DETECTION OF CYBER ATTACK IN NETWORK. [Dataset]. https://www.kaggle.com/datasets/acharyakamal/detection-of-cyber-attack-in-network
    Available download formats: zip (2102201 bytes)
    Dataset updated
    Jul 3, 2025
    Authors
    Kamal Acharya
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Compared with the past, developments in computer and communication technologies have brought extensive and advanced changes. The use of new technologies gives great benefits to individuals, companies, and governments; however, it also causes some problems for them, for example regarding the privacy of important information, the security of stored data platforms, and the availability of data. Because of these problems, cyber terrorism is one of the most important issues today. Cyber terror, which has caused many problems for individuals and institutions, has reached a level that could threaten public and national security through various groups such as criminal organizations, professional individuals, and cyber activists. Intrusion Detection Systems (IDS) have therefore been developed to guard against cyber attacks. In this work, a learning algorithm and a support vector machine (SVM) algorithm were used to detect port scan attempts on the new CICIDS2017 dataset, achieving accuracy rates of 97.80% and 69.79%, respectively.

  6. Ablation study results on CICIDS2017 dataset.

    • plos.figshare.com
    xls
    Updated Oct 29, 2025
    Cite
    Jun Wang; Ning Huang; Houzhong Zhang; Luyun Liu; Qiang Fu; Kerang Cao; Xiwang Guo; Hoekyung Jung (2025). Ablation study results on CICIDS2017 dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0332502.t007
    Available download formats: xls
    Dataset updated
    Oct 29, 2025
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Jun Wang; Ning Huang; Houzhong Zhang; Luyun Liu; Qiang Fu; Kerang Cao; Xiwang Guo; Hoekyung Jung
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    The rapid evolution of cyber threats poses significant challenges to the adaptability and performance of anomaly detection systems. This study presents an innovative hybrid deep learning framework that integrates Convolutional Neural Networks (CNN), Long Short-Term Memory networks (LSTM), and Transformer models with a novel self-learning mechanism to enhance network traffic anomaly detection. Our key contributions include: (1) a synergistic two-stage model fusion architecture that captures both spatial and temporal traffic patterns; (2) an adaptive learning mechanism with multi-metric drift detection that autonomously responds to evolving threats; and (3) a knowledge preservation strategy that maintains detection capabilities while adapting to new attack patterns. The proposed CNN-LSTM model achieves F1-scores of 0.9778 and 0.9695 on the UNSW-NB15 and CICIDS2017 datasets respectively for binary classification of normal vs. anomalous traffic. The LSTM-Transformer model further classifies specific anomaly types with accuracies of 0.9632 and 0.9528 on these datasets, representing significant improvements over recent methods. Experiments demonstrate the framework’s robustness, maintaining an average accuracy of 0.955 (± 0.005) over a 15-day simulated period with multiple induced concept drifts. The self-learning mechanism, with multi-metric drift detection and an efficient model update strategy, enables the system to detect drifts and recover performance within 23.4 ± 0.20 hours post-drift, while achieving a 92.8% detection rate for zero-day attacks. The proposed framework offers a promising direction for developing efficient and autonomous cybersecurity systems capable of handling dynamic and evolving threat landscapes.

  7. Improved CICIDS2017 and CSECICIDS2018

    • kaggle.com
    zip
    Updated Aug 15, 2023
    Cite
    Ernie Chang (2023). Improved CICIDS2017 and CSECICIDS2018 [Dataset]. https://www.kaggle.com/datasets/ernie55ernie/improved-cicids2017-and-csecicids2018
    Available download formats: zip (10985642855 bytes)
    Dataset updated
    Aug 15, 2023
    Authors
    Ernie Chang
    Description

    This dataset is obtained from Error Prevalence in NIDS datasets: A Case Study on CIC-IDS-2017 and CSE-CIC-IDS-2018.

    It is improved according to the paper [1].

    [1] Liu, Lisa, et al. "Error prevalence in nids datasets: A case study on cic-ids-2017 and cse-cic-ids-2018." 2022 IEEE Conference on Communications and Network Security (CNS). IEEE, 2022.

  8. CIC-IDS-Collection

    • kaggle.com
    • huggingface.co
    zip
    Updated Nov 9, 2022
    Cite
    StrGenIx | Laurens D'hooge (2022). CIC-IDS-Collection [Dataset]. https://www.kaggle.com/datasets/dhoogla/cicidscollection
    Available download formats: zip (864681190 bytes)
    Dataset updated
    Nov 9, 2022
    Authors
    StrGenIx | Laurens D'hooge
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) (https://creativecommons.org/licenses/by-nc-sa/4.0/)
    License information was derived automatically

    Description

    The Canadian Institute for Cybersecurity has published several datasets for network intrusion detection. Four of them: CIC-IDS2017, CIC-DoS2017, CSE-CIC-IDS2018 and CIC-DDoS2019 are collated here into one collection, cleaned up and with harmonized labeling.

    The intent behind this collection is simple: to have a larger, more varied set of NIDS samples for more powerful analyses by researchers. Too often, researchers still rely on the individual datasets even though the full set is compatible out-of-the-box. The parts have been created for the same purpose and they have been processed with the same feature extraction tool chain.

    This collection also takes into account 2 articles in which flawed features were discovered. Those features have been removed from the dataset. See the cleanup notebook for more information.

    If you make use of this combined version, please credit the original authors. The relevant publications are cited here on Kaggle alongside the individual datasets and they are also readily available at the CIC's official dataset distribution page

  9. Labels of normal and attack classes in the CICIDS-2017 dataset.

    • plos.figshare.com
    xls
    Updated Jul 21, 2025
    Cite
    Islam Zada; Esraa Omran; Salman Jan; Hessa Alfraihi; Seetah Alsalamah; Abdullah Alshahrani; Shaukat Hayat; Nguyen Phi (2025). Labels of normal and attack classes in the CICIDS-2017 dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0328050.t001
    Available download formats: xls
    Dataset updated
    Jul 21, 2025
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Islam Zada; Esraa Omran; Salman Jan; Hessa Alfraihi; Seetah Alsalamah; Abdullah Alshahrani; Shaukat Hayat; Nguyen Phi
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Labels of normal and attack classes in the CICIDS-2017 dataset.

  10. CICIDS2017: Cleaned & Preprocessed

    • kaggle.com
    zip
    Updated Jan 12, 2025
    Cite
    Eric Anacleto Ribeiro (2025). CICIDS2017: Cleaned & Preprocessed [Dataset]. https://www.kaggle.com/datasets/ericanacletoribeiro/cicids2017-cleaned-and-preprocessed/data
    Available download formats: zip (210143955 bytes)
    Dataset updated
    Jan 12, 2025
    Authors
    Eric Anacleto Ribeiro
    License

    CC0 1.0 Universal (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    Cleaned and Preprocessed CICIDS2017 Data for Machine Learning

    This dataset provides a cleaned and preprocessed version of the original CICIDS2017 network intrusion detection dataset, prepared for machine learning. It includes the following CSV file:

    1. cicids2017_cleaned.csv: Contains the raw, unscaled feature values after cleaning and preprocessing, ready for further treatment (such as scaling and sampling) after train/test split.

    Original Dataset:

    The CICIDS2017 dataset (available here) is a widely used benchmark dataset in cybersecurity research. It captures network traffic with both benign (normal) activity and various attack scenarios, making it suitable for developing and testing intrusion detection systems. However, the original dataset presents some challenges for direct use in machine learning due to missing values, duplicate entries, inconsistencies, and the need for feature engineering.

    Steps Taken:

    1. File Merging: The original CICIDS2017 dataset is split across multiple CSV files. These files have been merged into a single, unified dataset.
    2. Duplicate Removal: Duplicate rows have been identified and removed to improve data integrity and model performance. The same process was applied to identify and remove duplicate columns.
    3. Infinite Value Handling: Infinite values have been replaced with NaN and then handled along with other missing values.
    4. Missing Value Handling: Rows with missing values have been removed due to their minimal impact on the dataset (less than 1% of total rows). This decision simplifies data handling while minimizing the risk of introducing bias through imputation.
    5. Inconsistent Whitespace: Leading and trailing whitespace in column names have been removed for consistency.
    6. Data-Driven Feature Selection:
      • Columns with only one unique value have been removed, as they do not contribute to the performance of machine learning models.
      • Correlation Analysis: To reduce multicollinearity, one feature from each pair with a near-perfect correlation (>= 0.99) has been removed. This simplifies the dataset and can improve the interpretability of machine learning models.
      • H-Statistics and Tree Feature Selection: The Kruskal-Wallis test has been combined with the built-in feature selection capabilities of Random Forest to eliminate statistically irrelevant columns from the dataset.
    7. Target Feature Handling:
      • The original 'Label' column has been converted into a new column named 'Attack Type', with similar attack labels grouped into broader categories (e.g., DoS Hulk, DoS GoldenEye are grouped as "DoS").
      • Rare attack types ('Infiltration', 'Heartbleed') have been removed to prevent potential overfitting and improve model generalization.
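
    The steps above can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the notebook's code: rows are dictionaries of numeric features plus a `Label`, the label mapping contains only the two examples named above, and the duplicate-column check and the Kruskal-Wallis/Random Forest selection are omitted.

```python
import math

# Grouping taken from the example in the text; other labels pass through unchanged.
LABEL_GROUPS = {"DoS Hulk": "DoS", "DoS GoldenEye": "DoS"}
RARE_LABELS = {"Infiltration", "Heartbleed"}


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either column is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def is_complete(row):
    """False if any value is None, NaN, or infinite (steps 3 and 4)."""
    return all(
        v is not None and not (isinstance(v, float) and not math.isfinite(v))
        for v in row.values()
    )


def clean(rows, label_col="Label", threshold=0.99):
    # Step 5: strip leading/trailing whitespace from column names.
    rows = [{k.strip(): v for k, v in r.items()} for r in rows]
    # Steps 3-4: drop rows with infinite or missing values.
    rows = [r for r in rows if is_complete(r)]
    # Step 2: drop duplicate rows, keeping the first occurrence.
    seen, unique = set(), []
    for r in rows:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            unique.append(r)
    rows = unique
    if not rows:
        return []
    # Step 6: drop single-valued columns, then one of each correlated pair.
    features = [c for c in rows[0] if c != label_col]
    varying = [c for c in features if len({r[c] for r in rows}) > 1]
    selected = []
    for c in varying:
        col = [r[c] for r in rows]
        if all(abs(pearson(col, [r[s] for r in rows])) < threshold for s in selected):
            selected.append(c)
    # Step 7: remove rare classes and group the remaining labels.
    cleaned = []
    for r in rows:
        if r[label_col] in RARE_LABELS:
            continue
        out = {c: r[c] for c in selected}
        out["Attack Type"] = LABEL_GROUPS.get(r[label_col], r[label_col])
        cleaned.append(out)
    return cleaned
```

    On the full dataset a dataframe library would be the more practical choice; the plain-Python version is used here only to keep each step explicit.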

    Source Code and Project:

    • Notebook: The Jupyter Notebook used to generate this dataset is available here.
    • Repository: This dataset is part of a larger project to develop a Raspberry Pi-based Network Intrusion Detection System (NIDS) prototype for Small and Medium Enterprises (SMEs). The complete project repository, including the NIDS prototype code, is available on GitHub.

    Kudos to chethuhn, who, among others, uploaded the original CICIDS2017 to Kaggle.

  11. Data and code associated with the publication: Improving IDS performance...

    • archive.data.jhu.edu
    Updated Nov 18, 2024
    Cite
    Joseph A. Zatika (2024). Data and code associated with the publication: Improving IDS performance with XGBoost: hyperparameter optimization and real-time analysis [Dataset]. http://doi.org/10.7281/T1/WBOOHG
    Available download formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Nov 18, 2024
    Dataset provided by
    Johns Hopkins Research Data Repository
    Authors
    Joseph A. Zatika
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    The RS2024 dataset features logs generated by network load balancers (Layer 4) and application load balancers (Layer 7). It includes identified attack traffic through detailed payload analysis, making it ideal for testing Network Intrusion Detection Systems (NIDS) in real-time enterprise scenarios. The GC2024 dataset comprises Zeek network logs, offering a combination of attack traffic and bot traffic sourced from public repositories. This data was obtained from an AWS Elastic Compute Cloud (EC2) instance, presenting a varied setting for assessing security systems. The network intrusion detection model using the eXtreme Gradient Boosting (XGBoost) algorithm was trained on a combined dataset that includes UNSW-NB15, CICIDS2017, TON_IoT, RS-2024, and GC-2024. The model, saved in joblib format, is ready for deployment and can be used for further analysis.

  12. Results of training under poisoned data.

    • plos.figshare.com
    xls
    Updated Jun 13, 2023
    Cite
    Ebtihaj Alshahrani; Daniyal Alghazzawi; Reem Alotaibi; Osama Rabie (2023). Results of training under poisoned data. [Dataset]. http://doi.org/10.1371/journal.pone.0275971.t007
    Available download formats: xls
    Dataset updated
    Jun 13, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Ebtihaj Alshahrani; Daniyal Alghazzawi; Reem Alotaibi; Osama Rabie
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Results of training under poisoned data.

  13. Attack types distribution in CICIDS2017 and CICIDS2018 datasets.

    • plos.figshare.com
    xls
    Updated Jan 16, 2025
    Cite
    Congyuan Xu; Yong Zhan; Guanghui Chen; Zhiqiang Wang; Siqing Liu; Weichen Hu (2025). Attack types distribution in CICIDS2017 and CICIDS2018 datasets. [Dataset]. http://doi.org/10.1371/journal.pone.0317713.t001
    Available download formats: xls
    Dataset updated
    Jan 16, 2025
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Congyuan Xu; Yong Zhan; Guanghui Chen; Zhiqiang Wang; Siqing Liu; Weichen Hu
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Attack types distribution in CICIDS2017 and CICIDS2018 datasets.

  14. The results in the CICIDS2018 dataset.

    • plos.figshare.com
    xls
    Updated Jan 16, 2025
    Cite
    Congyuan Xu; Yong Zhan; Guanghui Chen; Zhiqiang Wang; Siqing Liu; Weichen Hu (2025). The results in the CICIDS2018 dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0317713.t005
    Available download formats: xls
    Dataset updated
    Jan 16, 2025
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Congyuan Xu; Yong Zhan; Guanghui Chen; Zhiqiang Wang; Siqing Liu; Weichen Hu
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    The network intrusion detection system (NIDS) plays a critical role in maintaining network security. However, traditional NIDS relies on a large volume of samples for training, which exhibits insufficient adaptability in rapidly changing network environments and complex attack methods, especially when facing novel and rare attacks. As attack strategies evolve, there is often a lack of sufficient samples to train models, making it difficult for traditional methods to respond quickly and effectively to new threats. Although existing few-shot network intrusion detection systems have begun to address sample scarcity, these systems often fail to effectively capture long-range dependencies within the network environment due to limited observational scope. To overcome these challenges, this paper proposes a novel elevated few-shot network intrusion detection method based on self-attention mechanisms and iterative refinement. This approach leverages the advantages of self-attention to effectively extract key features from network traffic and capture long-range dependencies. Additionally, the introduction of positional encoding ensures the temporal sequence of traffic is preserved during processing, enhancing the model’s ability to capture temporal dynamics. By combining multiple update strategies in meta-learning, the model is initially trained on a general foundation during the training phase, followed by fine-tuning with few-shot data during the testing phase, significantly reducing sample dependency while improving the model’s adaptability and prediction accuracy. Experimental results indicate that this method achieved detection rates of 99.90% and 98.23% on the CICIDS2017 and CICIDS2018 datasets, respectively, using only 10 samples.

  15. Number of features selected.

    • plos.figshare.com
    xls
    Updated Jul 21, 2025
    Cite
    Islam Zada; Esraa Omran; Salman Jan; Hessa Alfraihi; Seetah Alsalamah; Abdullah Alshahrani; Shaukat Hayat; Nguyen Phi (2025). Number of features selected. [Dataset]. http://doi.org/10.1371/journal.pone.0328050.t003
    Available download formats: xls
    Dataset updated
    Jul 21, 2025
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Islam Zada; Esraa Omran; Salman Jan; Hessa Alfraihi; Seetah Alsalamah; Abdullah Alshahrani; Shaukat Hayat; Nguyen Phi
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    The rapid growth of cyber threats in IoT settings requires smart and scalable intrusion detection systems. This paper introduces a Lean-based hybrid intrusion detection framework that uses Particle Swarm Optimization and Genetic Algorithm (PSO-GA) to select features and Extreme Learning Machine with Bootstrap Aggregation (ELM-BA) to classify them. The proposed framework obtains high detection rates on the CICIDS-2017 dataset, with 100 percent accuracy on important attack categories such as PortScan, SQL Injection, and Brute Force. Statistical verification and visual evaluation metrics are used to validate the model and to show that it is interpretable and robust. The framework follows Lean principles, so it has minimal computational overhead and optimal detection efficiency. It can be deployed in real time in smart cities and industrial Internet of Things (IoT) systems, where it provides scalable and effective cyber threat detection. Adopting it can greatly reduce false positives, decrease decision-making latency, and increase the resilience of critical IoT infrastructure against ever-changing cyber threats.

  16. Hyperparameter settings.

    • plos.figshare.com
    xls
    Updated Jan 16, 2025
    Cite
    Congyuan Xu; Yong Zhan; Guanghui Chen; Zhiqiang Wang; Siqing Liu; Weichen Hu (2025). Hyperparameter settings. [Dataset]. http://doi.org/10.1371/journal.pone.0317713.t002
    Available download formats: xls
    Dataset updated
    Jan 16, 2025
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Congyuan Xu; Yong Zhan; Guanghui Chen; Zhiqiang Wang; Siqing Liu; Weichen Hu
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    The network intrusion detection system (NIDS) plays a critical role in maintaining network security. However, traditional NIDS relies on a large volume of samples for training, which exhibits insufficient adaptability in rapidly changing network environments and complex attack methods, especially when facing novel and rare attacks. As attack strategies evolve, there is often a lack of sufficient samples to train models, making it difficult for traditional methods to respond quickly and effectively to new threats. Although existing few-shot network intrusion detection systems have begun to address sample scarcity, these systems often fail to effectively capture long-range dependencies within the network environment due to limited observational scope. To overcome these challenges, this paper proposes a novel elevated few-shot network intrusion detection method based on self-attention mechanisms and iterative refinement. This approach leverages the advantages of self-attention to effectively extract key features from network traffic and capture long-range dependencies. Additionally, the introduction of positional encoding ensures the temporal sequence of traffic is preserved during processing, enhancing the model’s ability to capture temporal dynamics. By combining multiple update strategies in meta-learning, the model is initially trained on a general foundation during the training phase, followed by fine-tuning with few-shot data during the testing phase, significantly reducing sample dependency while improving the model’s adaptability and prediction accuracy. Experimental results indicate that this method achieved detection rates of 99.90% and 98.23% on the CICIDS2017 and CICIDS2018 datasets, respectively, using only 10 samples.

  17. Relationship between F1-Score and epoch.

    • plos.figshare.com
    xls
    Updated Jan 16, 2025
    Congyuan Xu; Yong Zhan; Guanghui Chen; Zhiqiang Wang; Siqing Liu; Weichen Hu (2025). Relationship between F1-Score and epoch. [Dataset]. http://doi.org/10.1371/journal.pone.0317713.t003
    Available download formats
    xls
    Dataset updated
    Jan 16, 2025
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Congyuan Xu; Yong Zhan; Guanghui Chen; Zhiqiang Wang; Siqing Liu; Weichen Hu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The network intrusion detection system (NIDS) plays a critical role in maintaining network security. However, traditional NIDS relies on a large volume of samples for training, which exhibits insufficient adaptability in rapidly changing network environments and complex attack methods, especially when facing novel and rare attacks. As attack strategies evolve, there is often a lack of sufficient samples to train models, making it difficult for traditional methods to respond quickly and effectively to new threats. Although existing few-shot network intrusion detection systems have begun to address sample scarcity, these systems often fail to effectively capture long-range dependencies within the network environment due to limited observational scope. To overcome these challenges, this paper proposes a novel elevated few-shot network intrusion detection method based on self-attention mechanisms and iterative refinement. This approach leverages the advantages of self-attention to effectively extract key features from network traffic and capture long-range dependencies. Additionally, the introduction of positional encoding ensures the temporal sequence of traffic is preserved during processing, enhancing the model’s ability to capture temporal dynamics. By combining multiple update strategies in meta-learning, the model is initially trained on a general foundation during the training phase, followed by fine-tuning with few-shot data during the testing phase, significantly reducing sample dependency while improving the model’s adaptability and prediction accuracy. Experimental results indicate that this method achieved detection rates of 99.90% and 98.23% on the CICIDS2017 and CICIDS2018 datasets, respectively, using only 10 samples.

  18. Comparative performance analysis on CICIDS2017 datasets.

    • figshare.com
    • plos.figshare.com
    xls
    Updated Oct 29, 2025
    Jun Wang; Ning Huang; Houzhong Zhang; Luyun Liu; Qiang Fu; Kerang Cao; Xiwang Guo; Hoekyung Jung (2025). Comparative performance analysis on CICIDS2017 datasets. [Dataset]. http://doi.org/10.1371/journal.pone.0332502.t005
    Available download formats
    xls
    Dataset updated
    Oct 29, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Jun Wang; Ning Huang; Houzhong Zhang; Luyun Liu; Qiang Fu; Kerang Cao; Xiwang Guo; Hoekyung Jung
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Comparative performance analysis on CICIDS2017 datasets.

  19. Performance of testing on original dataset with generated dataset.

    • plos.figshare.com
    xls
    Updated Jun 13, 2023
    Ebtihaj Alshahrani; Daniyal Alghazzawi; Reem Alotaibi; Osama Rabie (2023). Performance of testing on original dataset with generated dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0275971.t004
    Available download formats
    xls
    Dataset updated
    Jun 13, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Ebtihaj Alshahrani; Daniyal Alghazzawi; Reem Alotaibi; Osama Rabie
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Performance of testing on original dataset with generated dataset.

  20. Results of training on the original dataset.

    • figshare.com
    xls
    Updated Jun 13, 2023
    Ebtihaj Alshahrani; Daniyal Alghazzawi; Reem Alotaibi; Osama Rabie (2023). Results of training on the original dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0275971.t005
    Available download formats
    xls
    Dataset updated
    Jun 13, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Ebtihaj Alshahrani; Daniyal Alghazzawi; Reem Alotaibi; Osama Rabie
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Results of training on the original dataset.

bert van keulen (2024). CICIDS-2017 [Dataset]. https://www.kaggle.com/datasets/bertvankeulen/cicids-2017

CICIDS-2017

Available download formats
zip (533474114 bytes)
Dataset updated
Jul 17, 2024
Authors
bert van keulen
License

Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically

Description

Raw network data was collected over a period of 5 days, Monday through Friday, and stored in PCAP files. Monday was used to create most of the Benign data, while the Attack-Network implemented various types of attacks over the next 4 days, such as Brute Force connections (FTP and SSH), several types of DoS attacks, as well as a Botnet attack, Infiltration attacks and subsequent Port-Scanning activity.

The PCAP data was processed using a tool developed by one of the authors of [1], called CICFlowMeter [3]. This tool produces flow traces: sequences of packets between specific source and destination IP, with corresponding values for source and destination ports. TCP flows are usually terminated by connection teardowns, while UDP flows are terminated by a flow timeout. For each of these flow traces many features were selected, measuring flow characteristics, such as packet size, number of packets, flow duration, etc. For some of these variables, statistics such as their mean and standard deviations are provided as features as well. While several features are categorical (such as IP addresses, Port numbers and TCP flag counts), most of the other features are numerical.
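The flow construction described above can be illustrated with a minimal sketch. It is not CICFlowMeter's actual logic: the packet tuple layout, the 120-second timeout, and the function names are assumptions made for illustration, and TCP connection teardowns are ignored so that every flow ends by timeout.

```python
from statistics import mean, pstdev

FLOW_TIMEOUT = 120.0  # seconds; assumed value, not taken from CICFlowMeter


def flow_key(src_ip, src_port, dst_ip, dst_port, proto):
    """Order the two endpoints so both directions map to the same flow."""
    a, b = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    return (a, b, proto)


def summarize(flow):
    """Reduce one flow to a few illustrative features."""
    sizes = flow["sizes"]
    return {
        "key": flow["key"],
        "packets": len(sizes),
        "duration": flow["last"] - flow["start"],
        "size_mean": mean(sizes),
        "size_std": pstdev(sizes),  # population standard deviation
    }


def build_flows(packets, timeout=FLOW_TIMEOUT):
    """Group packets into flows.

    packets: iterable of
        (timestamp, src_ip, src_port, dst_ip, dst_port, proto, size)
    sorted by timestamp. A gap longer than `timeout` closes a flow.
    """
    open_flows = {}
    finished = []
    for ts, src_ip, src_port, dst_ip, dst_port, proto, size in packets:
        key = flow_key(src_ip, src_port, dst_ip, dst_port, proto)
        flow = open_flows.get(key)
        if flow is not None and ts - flow["last"] > timeout:
            finished.append(flow)  # idle for too long: close the old flow
            flow = None
        if flow is None:
            flow = {"key": key, "start": ts, "last": ts, "sizes": []}
            open_flows[key] = flow
        flow["last"] = ts
        flow["sizes"].append(size)
    finished.extend(open_flows.values())
    return [summarize(f) for f in finished]
```

Calling `build_flows` on a list of packet tuples returns one feature dictionary per flow. A real flow exporter tracks many more statistics and handles TCP FIN/RST flags, which this sketch leaves out.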

The result is the CICIDS-2017 dataset, with about 80 features and several attack families which can ultimately be divided into 16 categories: one Benign category and 15 Attack categories. This original dataset is available at [4]. Subsequently, the authors of [2] spent a lot of effort correcting some errors in the dataset, by fixing the CICFlowMeter software (especially regarding TCP flow terminations) and by re-labeling some of the samples accordingly. They posted the corrected dataset on their website [5]; this also has links to their GitHub site, which provides Python code that can be used to efficiently import the data. I used that as a starting point for my notebook, here on Kaggle.

For each of the 5 days a csv file with network flows was produced.

These are the files in the dataset, with some changes: I converted the IP addresses to decimal values, and I removed a couple of rows with inf values.
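As a rough illustration of these two changes, the sketch below converts dotted IP addresses to integers and drops rows that contain infinite values. It assumes that "decimal values" means the integer form of the address; the field names `src_ip` and `dst_ip` and the function names are hypothetical and not taken from the dataset or the notebook.

```python
import ipaddress
import math


def ip_to_int(ip):
    """Convert a dotted IP address string to its integer value."""
    return int(ipaddress.ip_address(ip))


def clean_rows(rows, ip_fields=("src_ip", "dst_ip")):
    """Drop rows containing +/-inf and replace IP strings with integers.

    rows: list of dicts, one per flow (hypothetical field names).
    """
    cleaned = []
    for row in rows:
        if any(isinstance(v, float) and math.isinf(v) for v in row.values()):
            continue  # skip rows with an infinite feature value
        new_row = dict(row)
        for field in ip_fields:
            new_row[field] = ip_to_int(row[field])
        cleaned.append(new_row)
    return cleaned
```

For example, `ip_to_int("192.168.10.5")` gives `3232238085`, which is 192·2^24 + 168·2^16 + 10·2^8 + 5.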

In addition, I created 5 more files (_plus for each day), with extra features that translate information regarding traffic flows within the local network, or between the local network and external IP addresses. It should be noted that only two attacks have an external IP address, while for most attacks the local network is facing the gateway.
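The exact extra features in the _plus files are not listed here, so the following is only a sketch of how local versus external traffic could be flagged. The subnet `192.168.10.0/24` and all feature names are assumptions for illustration.

```python
import ipaddress

# Assumed local subnet; adjust to the network actually being analysed.
LOCAL_NET = ipaddress.ip_network("192.168.10.0/24")


def is_local(ip, net=LOCAL_NET):
    """Return True if the address lies inside the local network."""
    return ipaddress.ip_address(ip) in net


def locality_features(src_ip, dst_ip, net=LOCAL_NET):
    """Derive hypothetical local/external indicators for one flow."""
    src_local = is_local(src_ip, net)
    dst_local = is_local(dst_ip, net)
    return {
        "src_is_local": src_local,
        "dst_is_local": dst_local,
        "is_internal": src_local and dst_local,      # both ends local
        "crosses_boundary": src_local != dst_local,  # one end external
    }
```

A flow between two hosts on the local subnet sets `is_internal`, while a flow between a local host and an outside address sets `crosses_boundary`.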

[1] Sharafaldin I., Lashkari A.H., and Ghorbani A.A. Toward generating a new intrusion detection dataset and intrusion traffic characterization, Proceedings of the 4th International Conference on Information Systems Security and Privacy ICISSP - Volume 1, 108-116, 2018.
[2] Engelen G., Rimmer V., and Joosen W. Troubleshooting an intrusion detection dataset: the CICIDS2017 case study, 2021 IEEE Security and Privacy Workshops (SPW), 2021:7-12.
[3] https://www.unb.ca/cic/research/applications.html
[4] https://www.unb.ca/cic/datasets/ids-2017.html
[5] https://intrusion-detection.distrinet-research.be/CNS2022/index.html
