Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
it has been found that the dataset has few major shortcomings. These issues are sufficient enough to biased the detection engine of any typical IDS.
https://choosealicense.com/licenses/other/https://choosealicense.com/licenses/other/
This directory consists on 24x24 images
train folder have total 1548421 images from 10 classes test folder have 663609 images from 10 classes
This Dataset is Spectrogram converted images using method explained in our research article XYZ. Dataset Used: Intrusion detection evaluation dataset (CIC-IDS2017) Image Size: 28x28 Classes:
BENIGN Bot DDoS DoS GoldenEye DoS Hulk DoS Slowhttptest DoS slowloris Heartbleed Infiltration PortScan
License: https://www.unb.ca/cic/datasets/ids-2017.html… See the full description on the dataset page: https://huggingface.co/datasets/rashid-rao/CICIDS2017-Images-spectrograms.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Distribution of stream records in CICIDS2017 dataset.
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
makekali/CIC-IDS-2017 dataset hosted on Hugging Face and contributed by the HF Datasets community
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Using NLFlowLyzer, we successfully generated the “BCCC-CIC-IDS2017” dataset by extracting key flows from raw network traffic data of CIC-IDS2017, resulting in CSV files integrating essential network and transport layer features. This new dataset offers a structured approach for analyzing intrusion detection, combining diverse traffic types into multiple sub-categories. The “BCCC-CIC-IDS2017” dataset enriches the depth and variety needed to rigorously evaluate our proposed profiling model, advancing research in network security and enhancing the development of intrusion detection systems.
The full research paper outlining the details of the dataset and its underlying principles:
“NTLFlowLyzer: Toward Generating an Intrusion Detection Dataset and Intruders Behavior Profiling through Network Layer Traffic Analysis and Pattern Extraction, MohammadMoein Shafi, Arash Habibi Lashkari, Arousha Haghighian Roudsari, Computer & Security, Computers & Security, 104160, ISSN 0167-4048 (2024)” https://doi.org/10.1016/j.cose.2024.104160
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The network intrusion detection system (NIDS) plays a critical role in maintaining network security. However, traditional NIDS relies on a large volume of samples for training, which exhibits insufficient adaptability in rapidly changing network environments and complex attack methods, especially when facing novel and rare attacks. As attack strategies evolve, there is often a lack of sufficient samples to train models, making it difficult for traditional methods to respond quickly and effectively to new threats. Although existing few-shot network intrusion detection systems have begun to address sample scarcity, these systems often fail to effectively capture long-range dependencies within the network environment due to limited observational scope. To overcome these challenges, this paper proposes a novel elevated few-shot network intrusion detection method based on self-attention mechanisms and iterative refinement. This approach leverages the advantages of self-attention to effectively extract key features from network traffic and capture long-range dependencies. Additionally, the introduction of positional encoding ensures the temporal sequence of traffic is preserved during processing, enhancing the model’s ability to capture temporal dynamics. By combining multiple update strategies in meta-learning, the model is initially trained on a general foundation during the training phase, followed by fine-tuning with few-shot data during the testing phase, significantly reducing sample dependency while improving the model’s adaptability and prediction accuracy. Experimental results indicate that this method achieved detection rates of 99.90% and 98.23% on the CICIDS2017 and CICIDS2018 datasets, respectively, using only 10 samples.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Model performance results based on the CICIDS2017 dataset.
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The CIC-IDS-V2 is an extended version of the original CIC-IDS 2017 dataset. The dataset is normalised and 1 new class called "Comb" is added which is a combination of synthesised data of multiple non-benign classes. To cite the dataset, please reference the original paper with DOI: 10.1109/SmartNets61466.2024.10577645. The paper is published in IEEE SmartNets and can be accessed here:… See the full description on the dataset page: https://huggingface.co/datasets/abluva/CIC-IDS-2017-V2.
This dataset was created by lengxingxin
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Detection results on the CICIDS2017 dataset (K = 10).
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comparison with recent methods on the CICIDS2017 dataset.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Cross-domain detection results on the CICIDS2017 dataset (K = 5).
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Packet Capture (PCAP) files of UNSW-NB15 and CIC-IDS2017 dataset are processed and labelled utilizing the CSV files. Each packet is labelled by comparing the eight distinct features: *Source IP, Destination IP, Source Port, Destination Port, Starting time, Ending time, Protocol and Time to live*. The dimensions for the dataset is Nx1504. All column of the dataset are integers, therefore you can directly utilize this dataset in you machine learning models. Moreover, details of the whole processing and transformation is provided in the following GitHub Repo:
https://github.com/Yasir-ali-farrukh/Payload-Byte
You can utilize the tool available at the above mentioned GitHub repo to generate labelled dataset from scratch. All of the detail of processing and transformation is provided in the following paper:
```yaml
@article{Payload,
author = "Yasir Ali Farrukh and Irfan Khan and Syed Wali and David Bierbrauer and Nathaniel Bastian",
title = "{Payload-Byte: A Tool for Extracting and Labeling Packet Capture Files of Modern Network Intrusion Detection Datasets}",
year = "2022",
month = "9",
url = "https://www.techrxiv.org/articles/preprint/Payload-Byte_A_Tool_for_Extracting_and_Labeling_Packet_Capture_Files_of_Modern_Network_Intrusion_Detection_Datasets/20714221",
doi = "10.36227/techrxiv.20714221.v1"
}
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Hyperparameter experiments of CICIDS2017 data set.
bvk/CICIDS-2017-plus dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Cross-domain detection results on the CICIDS2017 dataset (K = 10).
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by Pankaj Gupta
Released under MIT
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Software Defined Networking (SDN) is an emerging network architecture and management method, whose core idea is to separate the network control plane from the data transmission plane. It is precisely because of this characteristic that SDN controllers are susceptible to external malicious attacks, the most common of which are Distributed Denial of Service (DDoS) attacks. This paper suggests a way to find DDoS attacks called ConvLTSM-MHA-TWD. It is based on the Convolutional Long Short-Term Memory Network (ConvLSTM) and three-way decision (TWD). It solves the problem of insufficient feature extraction in SDN environment and improves classification accuracy. This method uses ConvLSTM to extract data features, and uses multi-head attention (MHA) mechanism to learn the long-distance dependence relationship in the input data, and then constructs multi-granularity feature space. ConvLSTM and MHA outputs are added to form a residual connection to further enhance feature extraction and timing modeling capabilities and solve the problem of gradient disappearance during model training. Then the three-way decision theory is used to make decisions on network behaviors immediately. For the network behaviors that cannot be made immediately, the delayed decision is made, and the feature extraction and decision are made on this part of the network behaviors again. Finally, the classification results are output. This paper conducted experiments on data sets CICIDS2017 and DDoS SDN, with accuracy rates of 0.994 and 0.977, respectively, which has better overall performance, and is suitable for training large amounts of data.
This is a traffic dataset which contains balance size of encrypted malicious and legitimate traffic for encrypted malicious traffic detection. The dataset is a secondary csv feature data which is composed of five public traffic datasets. Our dataset is composed based on three criteria: The first criterion is to combine widely considered public datasets which contain both encrypted malicious and legitimate traffic in existing works, such as the Malwares Capture Facility Project dataset and the CICIDS-2017 dataset. The second criterion is to ensure the data balance, i.e., balance of malicious and legitimate network traffic and similar size of network traffic contributed by each individual dataset. Thus, approximate proportions of malicious and legitimate traffic from each selected public dataset are extracted by using random sampling. We also ensured that there will be no traffic size from one selected public dataset that is much larger than other selected public datasets. The third criterion is that our dataset includes both conventional devices' and IoT devices' encrypted malicious and legitimate traffic, as these devices are increasingly being deployed and are working in the same environments such as offices, homes, and other smart city settings.
Based on the criteria, 5 public datasets are selected. After data pre-processing, details of each selected public dataset and the final composed dataset are shown in “Dataset Statistic Analysis Document”. The document summarized the malicious and legitimate traffic size we selected from each selected public dataset, proportions of selected traffic size from each selected public dataset with respect to the total traffic size of the composed dataset (% w.r.t the composed dataset), proportions of selected encrypted traffic size from each selected public dataset (% from selected public dataset), and total traffic size of the composed dataset. From the table, we are able to observe that each public dataset equally contributes to approximately 20% of the composed dataset, except for CICDS-2012 (due to its limited number of encrypted malicious traffic). This achieves a balance across individual datasets and reduces bias towards traffic belonging to any dataset during learning. We can also observe that the size of malicious and legitimate traffic are almost the same, thus achieving class balance. The datasets now made available were prepared aiming at encrypted malicious traffic detection. Since the dataset is used for machine learning model training, a sample of train and test sets are also provided. The train and test datasets are separated based on 1:4 and stratification is applied during data split. Such datasets can be used directly for machine or deep learning model training based on selected features.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
it has been found that the dataset has few major shortcomings. These issues are sufficient enough to biased the detection engine of any typical IDS.