Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
It has been found that the dataset has a few major shortcomings. These issues are sufficient to bias the detection engine of any typical IDS.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Using NTLFlowLyzer, we generated the “BCCC-CIC-IDS2017” dataset by extracting key flows from the raw network traffic of CIC-IDS2017, producing CSV files that combine essential network- and transport-layer features. The new dataset offers a structured basis for intrusion detection analysis, organizing diverse traffic types into multiple sub-categories. The “BCCC-CIC-IDS2017” dataset provides the depth and variety needed to rigorously evaluate our proposed profiling model, advancing research in network security and supporting the development of intrusion detection systems.
The full research paper outlining the details of the dataset and its underlying principles:
“NTLFlowLyzer: Toward Generating an Intrusion Detection Dataset and Intruders Behavior Profiling through Network Layer Traffic Analysis and Pattern Extraction, MohammadMoein Shafi, Arash Habibi Lashkari, Arousha Haghighian Roudsari, Computers & Security, 104160, ISSN 0167-4048 (2024)” https://doi.org/10.1016/j.cose.2024.104160
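As a rough illustration of what "extracting flows from raw traffic" means, the sketch below groups packets into bidirectional flows keyed by the 5-tuple and computes three simple features. It is not NTLFlowLyzer's algorithm: the `Packet` record, the keying scheme, and the feature set are assumptions for illustration. Real flow extractors also close flows on idle timeouts and TCP FIN/RST, which this sketch omits.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    # Hypothetical minimal packet record; real tools parse these from PCAPs.
    ts: float       # capture timestamp in seconds
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: int      # IP protocol number (6 = TCP, 17 = UDP)
    length: int     # packet length in bytes


def flow_key(p: Packet) -> tuple:
    """Bidirectional 5-tuple: both directions of a conversation share one key."""
    lo, hi = sorted([(p.src_ip, p.src_port), (p.dst_ip, p.dst_port)])
    return (lo, hi, p.proto)


def aggregate_flows(packets):
    """Group packets into flows and compute a few example flow features."""
    groups = defaultdict(list)
    for p in packets:
        groups[flow_key(p)].append(p)
    flows = []
    for key, pkts in groups.items():
        times = [p.ts for p in pkts]
        flows.append({
            "key": key,
            "packets": len(pkts),
            "bytes": sum(p.length for p in pkts),
            "duration": max(times) - min(times),
        })
    return flows


# Hypothetical capture: one TCP exchange in both directions plus one UDP packet.
example_packets = [
    Packet(0.0, "10.0.0.1", "10.0.0.2", 5000, 80, 6, 60),
    Packet(0.5, "10.0.0.2", "10.0.0.1", 80, 5000, 6, 1500),
    Packet(1.0, "10.0.0.3", "10.0.0.2", 6000, 53, 17, 80),
]
flows = aggregate_flows(example_packets)
```

Each resulting dictionary corresponds to one CSV row in a flow dataset; a production tool would compute many more statistics per flow (inter-arrival times, flag counts, per-direction sizes).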
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Computer networks are vulnerable to numerous attacks, which pose significant threats to data security and freedom of communication. This paper introduces a novel intrusion detection technique that diverges from traditional methods by leveraging Recurrent Neural Networks (RNNs) for both data preprocessing and feature extraction. The proposed process consists of the following steps: (1) training RNNs on the data, (2) extracting features from their hidden layers, and (3) applying various classification algorithms to those features. This methodology differs substantially from existing intrusion detection practices. Its effectiveness is demonstrated through trials on the Network Security Laboratory (NSL) and Canadian Institute for Cybersecurity (CIC) 2017 datasets, where using RNNs for intrusion detection shows substantial practical implications. Specifically, we achieved accuracy scores of 99.6% with Decision Tree, Random Forest, and CatBoost classifiers on the NSL dataset, and 99.8% and 99.9%, respectively, on the CIC 2017 dataset. By first training RNNs on the data and extracting features from them before applying classification algorithms, our approach departs from the conventional intrusion detection pipeline. This modification underscores the benefits of using RNNs for feature extraction and data preprocessing, meeting the critical need to safeguard data security and communication freedom against ever-evolving network threats.
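A minimal sketch of the three-step idea (an RNN, its hidden-state features, then a separate classifier), assuming toy sequences with two features per time step. For brevity the Elman RNN weights are random rather than trained, so step (1) is only stubbed, and a 1-nearest-neighbour rule stands in for the Decision Tree, Random Forest, and CatBoost classifiers named above. The helpers `make_rnn`, `hidden_features`, and `nearest_neighbour` and the data are hypothetical.

```python
import math
import random


def make_rnn(n_in, n_hidden, seed=0):
    """Randomly initialised Elman RNN weights (untrained, illustration only)."""
    rng = random.Random(seed)
    W_x = [[rng.gauss(0.0, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    W_h = [[rng.gauss(0.0, 0.5) for _ in range(n_hidden)] for _ in range(n_hidden)]
    b = [0.0] * n_hidden
    return W_x, W_h, b


def hidden_features(seq, W_x, W_h, b):
    """Final hidden state of h_t = tanh(W_x x_t + W_h h_{t-1} + b)."""
    h = [0.0] * len(b)
    for x in seq:
        # The comprehension reads the previous h before it is reassigned.
        h = [
            math.tanh(
                sum(w * xi for w, xi in zip(W_x[i], x))
                + sum(w * hk for w, hk in zip(W_h[i], h))
                + b[i]
            )
            for i in range(len(b))
        ]
    return h


def nearest_neighbour(train_feats, train_labels, query):
    """1-NN on the extracted features (stand-in for DT / RF / CatBoost)."""
    best = min(
        range(len(train_feats)),
        key=lambda j: sum((a - q) ** 2 for a, q in zip(train_feats[j], query)),
    )
    return train_labels[best]


# Hypothetical flow sequences: each time step is a 2-feature vector.
sequences = [
    [[0.1, 0.2], [0.1, 0.1], [0.2, 0.1]],
    [[0.2, 0.1], [0.1, 0.2], [0.1, 0.1]],
    [[3.0, 2.5], [2.8, 3.1], [3.2, 2.9]],
    [[2.5, 3.0], [3.1, 2.7], [2.9, 3.3]],
]
labels = ["BENIGN", "BENIGN", "DDoS", "DDoS"]

W_x, W_h, b = make_rnn(n_in=2, n_hidden=4)
features = [hidden_features(s, W_x, W_h, b) for s in sequences]
```

In a full implementation the RNN would first be trained, and its hidden states would then be passed to an off-the-shelf classifier; separating the two stages lets the classifier be swapped without retraining the recurrent model.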
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This is an academic intrusion detection dataset. All credit goes to the original authors, Dr. Iman Sharafaldin, Dr. Arash Habibi Lashkari, and Dr. Ali Ghorbani. Please cite their original paper.
It was published by the Canadian Institute for Cybersecurity and is the successor to CIC-IDS2017. The biggest difference is the move from on-premises infrastructure to AWS to generate the dataset. It also greatly increases the representation of 'Infiltration' traffic compared to CIC-IDS2017.
V1: Base dataset in CSV format, as downloaded from here.
V2: Cleaned data saved as parquet files.
V3: Reorganized to save storage; the original CSVs are kept only in V1/V2.
In the parquet files, all data types are already set correctly, and this clean version contains 0 records with missing information and 0 duplicate records. Baseline classification scores with simple models will be available shortly.
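The cleaning described for V2 (typed columns, no missing values, no duplicates) might look roughly like the sketch below. The actual cleaning script is not given, so the column names, the sample rows, and the choice to treat `inf`/`NaN` as missing are assumptions. Writing parquet requires a third-party library such as pyarrow, so this stdlib-only sketch stops at cleaned Python records.

```python
import csv
import io
import math


def clean_rows(csv_text, col_types):
    """Cast typed columns, drop rows with missing or non-finite values, and
    drop exact duplicates (keeping the first occurrence)."""
    seen = set()
    cleaned = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            typed = {col: cast(row[col]) for col, cast in col_types.items()}
        except (ValueError, TypeError):
            continue  # empty or non-numeric cell: treat the row as missing data
        if not all(math.isfinite(v) for v in typed.values()):
            continue  # drop rows containing inf or NaN
        record = {**row, **typed}
        key = tuple(sorted(record.items()))
        if key in seen:
            continue  # exact duplicate of an earlier row
        seen.add(key)
        cleaned.append(record)
    return cleaned


# Hypothetical sample rows using CICFlowMeter-style column names.
sample = (
    "Dst Port,Flow Byts/s,Label\n"
    "80,1000.0,Benign\n"
    "80,1000.0,Benign\n"
    "443,,Benign\n"
    "22,inf,FTP-BruteForce\n"
    "21,500.5,FTP-BruteForce\n"
)
rows = clean_rows(sample, {"Dst Port": int, "Flow Byts/s": float})
print(len(rows))  # 2
```

Of the five sample rows, one is a duplicate, one has an empty cell, and one has an infinite rate, so two records survive.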
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
makekali/CIC-IDS-2017 dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Packet Capture (PCAP) files of the UNSW-NB15 and CIC-IDS2017 datasets are processed and labelled using their CSV files. Each packet is labelled by matching eight distinct features: *Source IP, Destination IP, Source Port, Destination Port, Starting time, Ending time, Protocol and Time to live*. The dataset has dimensions N×1504. All columns are integers, so the dataset can be used directly in machine learning models. Details of the whole processing and transformation are provided in the following GitHub repo:
https://github.com/Yasir-ali-farrukh/Payload-Byte
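The labelling rule can be pictured as a lookup from each packet to the flow record it falls inside. The sketch below is a hypothetical simplification, not Payload-Byte's code: it matches one direction of the 5-tuple, checks that the packet timestamp lies between the flow's start and end times, and compares TTL only when the flow record provides one. The `FlowRecord` and `PacketHeader` types and `label_packet` are illustrative names.

```python
from typing import NamedTuple, Optional


class FlowRecord(NamedTuple):
    # Hypothetical subset of one labelled row from a flow CSV.
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: int
    start: float
    end: float
    ttl: Optional[int]  # None when the CSV has no TTL column
    label: str


class PacketHeader(NamedTuple):
    # Hypothetical header fields parsed from one PCAP packet.
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: int
    ts: float
    ttl: int


def label_packet(pkt, flows, default="unlabelled"):
    """Return the label of the first flow whose fields all agree with `pkt`."""
    for f in flows:
        same_tuple = (pkt.src_ip, pkt.dst_ip, pkt.src_port, pkt.dst_port, pkt.proto) == (
            f.src_ip, f.dst_ip, f.src_port, f.dst_port, f.proto
        )
        in_window = f.start <= pkt.ts <= f.end
        ttl_ok = f.ttl is None or f.ttl == pkt.ttl
        if same_tuple and in_window and ttl_ok:
            return f.label
    return default


flows = [FlowRecord("10.0.0.1", "10.0.0.2", 4000, 80, 6, 10.0, 20.0, 64, "DoS Hulk")]
print(label_packet(PacketHeader("10.0.0.1", "10.0.0.2", 4000, 80, 6, 12.5, 64), flows))  # DoS Hulk
print(label_packet(PacketHeader("10.0.0.1", "10.0.0.2", 4000, 80, 6, 25.0, 64), flows))  # unlabelled
```

A linear scan is fine for a toy example; for millions of packets one would index the flow records by 5-tuple before matching timestamps.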
You can use the tool in the above-mentioned GitHub repo to generate a labelled dataset from scratch. Full details of the processing and transformation are provided in the following paper:
```bibtex
@article{Payload,
author = "Yasir Ali Farrukh and Irfan Khan and Syed Wali and David Bierbrauer and Nathaniel Bastian",
title = "{Payload-Byte: A Tool for Extracting and Labeling Packet Capture Files of Modern Network Intrusion Detection Datasets}",
year = "2022",
month = "9",
url = "https://www.techrxiv.org/articles/preprint/Payload-Byte_A_Tool_for_Extracting_and_Labeling_Packet_Capture_Files_of_Modern_Network_Intrusion_Detection_Datasets/20714221",
doi = "10.36227/techrxiv.20714221.v1"
}
```
https://choosealicense.com/licenses/other/
This directory consists of 24x24 images.
The train folder has a total of 1,548,421 images from 10 classes; the test folder has 663,609 images from 10 classes.
This dataset consists of spectrogram images converted using the method explained in our research article XYZ.
Dataset used: Intrusion detection evaluation dataset (CIC-IDS2017)
Image size: 28x28
Classes: BENIGN, Bot, DDoS, DoS GoldenEye, DoS Hulk, DoS Slowhttptest, DoS slowloris, Heartbleed, Infiltration, PortScan
License: https://www.unb.ca/cic/datasets/ids-2017.html… See the full description on the dataset page: https://huggingface.co/datasets/rashid-rao/CICIDS2017-Images-spectrograms.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Distribution of stream records in the CICIDS2017 dataset.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The network intrusion detection system (NIDS) plays a critical role in maintaining network security. However, traditional NIDSs rely on a large volume of training samples, which limits their adaptability to rapidly changing network environments and complex attack methods, especially novel and rare attacks. As attack strategies evolve, there are often too few samples to train models, making it difficult for traditional methods to respond quickly and effectively to new threats. Although existing few-shot network intrusion detection systems have begun to address sample scarcity, they often fail to capture long-range dependencies within the network environment because of their limited observational scope. To overcome these challenges, this paper proposes a novel elevated few-shot network intrusion detection method based on self-attention mechanisms and iterative refinement. The approach uses self-attention to extract key features from network traffic and capture long-range dependencies. In addition, positional encoding preserves the temporal order of traffic during processing, enhancing the model’s ability to capture temporal dynamics. By combining multiple update strategies in meta-learning, the model is first trained on general data during the training phase and then fine-tuned with few-shot data during the testing phase, significantly reducing its dependence on samples while improving its adaptability and prediction accuracy. Experimental results indicate that this method achieved detection rates of 99.90% and 98.23% on the CICIDS2017 and CICIDS2018 datasets, respectively, using only 10 samples.
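The two ingredients named above, positional encoding and self-attention, can be sketched as follows. This assumes the standard sinusoidal encoding from the original Transformer and single-head scaled dot-product attention with Q = K = V = X (no learned projections); the paper's actual layers, the meta-learning loop, and the iterative refinement are not reproduced here, and the traffic window is made up.

```python
import math


def positional_encoding(seq_len, d_model):
    """Sinusoidal encoding: sin on even dimensions, cos on odd dimensions."""
    pe = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            # Dimensions 2k and 2k+1 share the frequency 1 / 10000^(2k / d_model).
            angle = pos / (10000 ** ((i - i % 2) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(X):
    """Single-head scaled dot-product self-attention with Q = K = V = X."""
    d = len(X[0])
    out = []
    for q in X:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in X]
        weights = softmax(scores)
        # Each output row is a weighted average of all rows, so every time
        # step can draw on every other one regardless of distance.
        out.append([sum(w * row[c] for w, row in zip(weights, X)) for c in range(d)])
    return out


# Hypothetical traffic window: 3 time steps, 4 features per step.
window = [[0.2, 0.0, 1.0, 0.5], [0.1, 0.3, 0.9, 0.4], [0.8, 0.7, 0.1, 0.0]]
pe = positional_encoding(len(window), len(window[0]))
encoded = [[x + p for x, p in zip(row, pe_row)] for row, pe_row in zip(window, pe)]
attended = self_attention(encoded)
```

Adding the encoding before attention is what lets an otherwise order-agnostic attention layer distinguish the first packet of a window from the last.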
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The CoAt-Set dataset is a transformed, specialized dataset designed to support collaborative anomaly detection in Collaborative Intrusion Detection Systems (CIDS). It extracts coordinated attack behaviors from established sources such as CIC-ToN-IoT, CIC-IDS2017, CIC-UNSW-NB15, CSE-CIC-IDS2018, CIC-BoT-IoT, Distrinet-CIC-IDS2017, and NF-UQ-NIDS, refining these raw datasets into a format tailored to collaborative anomaly detection. CoAt-Set highlights coordinated attack scenarios, such as widespread stealthy scanning, worm outbreaks, and distributed denial-of-service (DDoS) attacks, all of which mirror high-impact, realistic threats frequently found in today’s complex networks.
To create CoAt-Set, the data was restructured and relabeled to better represent collaborative security environments for CIDS. Attack behaviors were organized to provide clear annotations and informative traffic features, which are valuable for training and testing systems that identify anomalies across multiple networks. Because of its focus on collaborative contexts, CoAt-Set suits researchers and developers who need a dataset that goes beyond isolated attack patterns and instead captures threats that may span different segments of a network. CoAt-Set can be fed directly into common neural network models and algorithms. The dataset also has broader uses: it supports the development of collaborative machine learning algorithms and can be used to simulate and test attacks across networks with varying configurations.
A usage example of CoAt-Set in practice is in multi-agent CIDS setups where each agent analyzes non-IID data for distributed learning. For instance, the NF-UQ-NIDS dataset, created from multiple data sources, allows extraction of coordinated attack patterns, saved as "CoAt_NF-UQ-NIDS-V2.parquet." This version is useful for creating non-IID scenarios within a single dataset, then distributing it across agents for collaborative learning. Each agent can independently learn from its unique data, making this a powerful tool for distributed security research.
To further increase heterogeneity, each CIDS agent can be assigned a different CoAt-Set version, such as "CoAt_CIC-BoT-IoT-V2.parquet," "CoAt_CIC-IDS2017-V2.parquet," "CoAt_CIC-ToN-IoT-V2.parquet," "CoAt_CIC-UNSW-NB15_Feeded-V2.parquet," and "CoAt_CSE-CIC-IDS2018_Feeded.parquet." Each dataset represents a unique network environment, with each agent positioned in a distinct network segment to detect coordinated attacks. This diversity enables CIDS agents to capture a range of attack patterns, supporting the development of robust, flexible intrusion detection strategies across heterogeneous networks.
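The source does not prescribe how to create non-IID splits within one CoAt-Set file. One common choice in federated-learning experiments is Dirichlet label skew, sketched below as an assumption: for each class, the agents' shares are drawn from Dirichlet(α), so a smaller α yields more heterogeneous agents. The function name `dirichlet_partition`, the default α = 0.5, and the demo labels are hypothetical.

```python
import random
from collections import defaultdict


def dirichlet_partition(labels, n_agents, alpha=0.5, seed=0):
    """Split sample indices across agents with per-class Dirichlet label skew."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    parts = [[] for _ in range(n_agents)]
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        # Normalised Gamma(alpha, 1) draws form one Dirichlet(alpha) sample.
        g = [rng.gammavariate(alpha, 1.0) for _ in range(n_agents)]
        total = sum(g)
        shares = [x / total for x in g]
        start = 0
        for a in range(n_agents):
            if a == n_agents - 1:
                end = len(idxs)  # last agent takes the remainder
            else:
                end = min(start + round(shares[a] * len(idxs)), len(idxs))
            parts[a].extend(idxs[start:end])
            start = end
    return parts


# Hypothetical label column from a CoAt-Set file.
demo_labels = ["BENIGN"] * 50 + ["DDoS"] * 30 + ["PortScan"] * 20
agent_parts = dirichlet_partition(demo_labels, n_agents=3)
```

Because every class is cut into contiguous slices, each sample lands with exactly one agent, while the per-class proportions differ from agent to agent.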
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The CIC-IDS-V2 is an extended version of the original CIC-IDS 2017 dataset. The dataset is normalised, and one new class, "Comb", is added; it combines synthesised data from multiple non-benign classes. To cite the dataset, please reference the original paper with DOI: 10.1109/SmartNets61466.2024.10577645. The paper is published in IEEE SmartNets and can be accessed here:… See the full description on the dataset page: https://huggingface.co/datasets/abluva/CIC-IDS-2017-V2.
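The description does not say which normalisation was applied. Assuming per-feature min-max scaling to [0, 1], a stdlib-only sketch follows; the function name and the sample feature values are hypothetical.

```python
def min_max_normalise(rows):
    """Scale each column of `rows` (equal-length numeric lists) into [0, 1].

    Constant columns (max == min) are mapped to 0.0 to avoid division by zero.
    """
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


# Hypothetical flow features: [Flow Duration, Total Fwd Packets]
scaled = min_max_normalise([[0, 10], [5, 10], [10, 10]])
print(scaled)  # [[0.0, 0.0], [0.5, 0.0], [1.0, 0.0]]
```

In practice the minimum and maximum should be computed on the training split only and reused for the test split, so that test statistics do not leak into training.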
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Detection results on the CICIDS2017 dataset (K = 10).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Recent years have witnessed a widespread proliferation of Internet of Things (IoT) and Industrial Internet of Things (IIoT) systems linked to Industry 4.0 technology. The growing use of IoT devices brings rising security risks from malicious network flows during data exchange between connected devices. Among the security threats that degrade the availability, functionality, and usability of these devices, denial of service (DoS) and distributed denial of service (DDoS) attacks have been particularly pronounced: they attempt to exhaust the capacity of the IoT network gateway, causing the system to fail. Various machine learning and deep learning algorithms have been used to build intelligent intrusion detection systems (IDS) that mitigate these threats. One concern is that, although some deep learning algorithms have achieved good accuracy on tabular data, not all of them perform well on tabular datasets, which are the most commonly available format for machine learning tasks. Model explainability and feature selection, both of which affect model performance, pose further challenges. To address these issues, we propose TabNet-IDS, an IDS model that uses attention mechanisms to automatically select salient features from a dataset for training and to provide explainable results. We implement the proposed model using the TabNet algorithm in PyTorch, a deep learning framework. The results show that the TabNet architecture can achieve good results on tabular IoT security datasets, comparable to those of neural networks, reaching an accuracy of 97% on CIC-IDS2017, 95% on CSE-CICIDS2018, and 98% on CIC-DDoS2019.
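TabNet's attentive transformer produces its feature-selection masks with sparsemax rather than softmax, which lets it assign exactly zero weight to some features. The sketch below implements sparsemax alone, following Martins & Astudillo (2016); TabNet's prior scales, relaxation factor, and the rest of the architecture are omitted, and the feature scores in the example are made up.

```python
def sparsemax(z):
    """Euclidean projection of the score vector z onto the probability simplex.

    Finds the largest k with 1 + k * z_(k) > sum of the k largest scores,
    sets tau = (that sum - 1) / k, and returns max(z_i - tau, 0).
    """
    zs = sorted(z, reverse=True)
    cumsum = 0.0
    k = 0
    support_sum = 0.0
    for i, v in enumerate(zs, start=1):
        cumsum += v
        if 1 + i * v > cumsum:  # v is still inside the support
            k = i
            support_sum = cumsum
    tau = (support_sum - 1) / k
    return [max(v - tau, 0.0) for v in z]


# Hypothetical attention scores for four flow features.
mask = sparsemax([2.0, 1.0, 0.1, -1.0])
print(mask)  # [1.0, 0.0, 0.0, 0.0]
```

A softmax over the same scores would give every feature some positive weight; sparsemax instead zeroes out the three weaker features, which is what makes the selected-feature masks directly interpretable.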
This dataset was created by lengxingxin
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Ben11304/Cic-Ids2017-DiFL dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Model performance results based on the CICIDS2017 dataset.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Features selected from CICIDS2017 via a proposed approach.
gyawalishiva/cic-ids-2017-textual dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Cross-domain detection results on the CICIDS2017 dataset (K = 5).