MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
The Canadian Institute for Cybersecurity Intrusion Detection System (CICIDS) dataset is a modern and comprehensive benchmark dataset for network intrusion detection research.
It was created by the Canadian Institute for Cybersecurity (CIC) in collaboration with industry partners to address the limitations of older datasets (such as KDD99 and NSL-KDD) by providing realistic traffic patterns, up-to-date attack types, and a balanced mix of normal and malicious activities.
The CICIDS dataset has become a widely adopted benchmark for evaluating Intrusion Detection Systems (IDS) due to its:
- Rich feature set
- Real-world attack scenarios
- Balanced structure for training and testing models
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
An Intrusion Detection System (IDS) dataset is a collection of network traffic data, often labeled to distinguish between normal and malicious activities (intrusions or attacks). These datasets are crucial for developing, training, and evaluating Intrusion Detection Systems, which are security tools designed to monitor network traffic for suspicious behavior and alert administrators to potential threats.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Although ubiquitous in modern vehicles, Controller Area Networks (CANs) lack basic security properties and are easily exploitable. A rapidly growing field of CAN security research has emerged that seeks to detect intrusions or anomalies on CANs. Producing vehicular CAN data with a variety of intrusions is a difficult task for most researchers as it requires expensive assets and deep expertise. To illuminate this task, we introduce the first comprehensive guide to the existing open CAN intrusion detection system (IDS) datasets. We categorize attacks on CANs including fabrication (adding frames, e.g., flooding or targeting an ID), suspension (removing an ID’s frames), and masquerade attacks (spoofed frames sent in lieu of suspended ones). We provide a quality analysis of each dataset; an enumeration of each dataset’s attacks, benefits, and drawbacks; categorization as real vs. simulated CAN data and real vs. simulated attacks; whether the data is raw CAN data or signal-translated; number of vehicles/CANs; quantity in terms of time; and finally a suggested use case of each dataset. State-of-the-art public CAN IDS datasets are limited to real fabrication (simple message injection) attacks and simulated attacks often in synthetic data, lacking fidelity. In general, the physical effects of attacks on the vehicle are not verified in the available datasets. Only one dataset provides signal-translated data but is missing a corresponding “raw” binary version. This issue pigeonholes CAN IDS research into testing on limited and often inappropriate data (usually with attacks that are too easily detectable to truly test the method). The scarcity of appropriate data has stymied comparability and reproducibility of results for researchers. As our primary contribution, we present the Real ORNL Automotive Dynamometer (ROAD) CAN IDS dataset, consisting of over 3.5 hours of one vehicle’s CAN data.
ROAD contains ambient data recorded during a diverse set of activities, and attacks of increasing stealth with multiple variants and instances of real (i.e. non-simulated) fuzzing, fabrication, unique advanced attacks, and simulated masquerade attacks. To facilitate a benchmark for CAN IDS methods that require signal-translated inputs, we also provide the signal time series format for many of the CAN captures. Our contributions aim to facilitate appropriate benchmarking and needed comparability in the CAN IDS research field.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
The Canadian Institute for Cybersecurity has published several datasets for network intrusion detection. Four of them (CIC-IDS2017, CIC-DoS2017, CSE-CIC-IDS2018 and CIC-DDoS2019) are collated here into one collection, cleaned up and given harmonized labels.
The intent behind this collection is simple: to have a larger, more varied set of NIDS samples for more powerful analyses by researchers. Too often, researchers still rely on the individual datasets even though the full set is compatible out-of-the-box. The parts have been created for the same purpose and they have been processed with the same feature extraction tool chain.
This collection also takes into account two articles in which flawed features were discovered. Those features have been removed from the dataset. See the cleanup notebook for more information.
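The collation described above (combining the parts, harmonizing labels, and dropping flawed features) can be sketched roughly as follows. This is only an illustration using the Python standard library: the label spellings in `LABEL_MAP`, the column name `flawed_feature`, and the sample rows are hypothetical placeholders, not taken from the cleanup notebook.

```python
from typing import Dict, Iterable, List

# Hypothetical mapping from per-dataset label spellings to one shared label.
LABEL_MAP: Dict[str, str] = {
    "BENIGN": "Benign",
    "Benign": "Benign",
    "DoS Hulk": "DoS",
    "DoS attacks-Hulk": "DoS",
}

# Hypothetical names of features reported as flawed and therefore dropped.
FLAWED_FEATURES = {"flawed_feature"}


def harmonize(rows: Iterable[Dict[str, str]], source: str) -> List[Dict[str, str]]:
    """Return cleaned copies of the rows with a shared label vocabulary."""
    cleaned = []
    for row in rows:
        out = {k: v for k, v in row.items() if k not in FLAWED_FEATURES}
        # Unknown labels are kept unchanged rather than guessed.
        out["Label"] = LABEL_MAP.get(row["Label"], row["Label"])
        out["source"] = source  # remember which dataset the row came from
        cleaned.append(out)
    return cleaned


# Made-up rows standing in for two of the individual datasets.
ids2017 = [{"Flow Duration": "12", "flawed_feature": "1", "Label": "BENIGN"}]
ids2018 = [{"Flow Duration": "40", "flawed_feature": "0", "Label": "DoS attacks-Hulk"}]

combined = harmonize(ids2017, "CIC-IDS2017") + harmonize(ids2018, "CSE-CIC-IDS2018")
```

Keeping a `source` column is a design choice of this sketch; it makes it possible to evaluate a model on one dataset after training on the others.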
If you make use of this combined version, please credit the original authors. The relevant publications are cited here on Kaggle alongside the individual datasets, and they are also readily available at the CIC's official dataset distribution page.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This Cybersecurity Intrusion Detection Dataset is designed for detecting cyber intrusions based on network traffic and user behavior. Below, I’ll explain each aspect in detail, including the dataset structure, feature importance, possible analysis approaches, and how it can be used for machine learning.
The dataset consists of network-based and user behavior-based features. Each feature provides valuable information about potential cyber threats.
These features describe network-level information such as packet size, protocol type, and encryption methods.
network_packet_size (Packet Size in Bytes)
protocol_type (Communication Protocol)
encryption_used (Encryption Protocol)
These features track user activities, such as login attempts and session duration.
login_attempts (Number of Logins)
session_duration (Session Length in Seconds)
failed_logins (Failed Login Attempts)
unusual_time_access (Login Time Anomaly): a binary flag (0 or 1) indicating whether access happened at an unusual time.
ip_reputation_score (Trustworthiness of IP Address)
browser_type (User’s Browser)
Target variable (attack_detected): 1 means an attack was detected, 0 means normal activity.
This dataset can be used for intrusion detection systems (IDS) and cybersecurity research. Some key applications include:
Supervised Learning Approaches (with attack_detected as the target)
Deep Learning Approaches
If attack labels are missing, anomaly detection can be used:
- Autoencoders: learn normal traffic and flag anomalies.
- Isolation Forest: detects outliers based on feature isolation.
- One-Class SVM: learns normal behavior and detects deviations.
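Before any of these approaches can be applied, the records have to be turned into numeric features and a target. A minimal sketch, assuming the data is available as CSV with the column names listed above; the sample rows and category values (`TCP`, `AES`, `Chrome`, and so on) are made up for illustration, and the one-hot encoding is one possible choice rather than a prescribed step.

```python
import csv
import io
from typing import Dict, List, Tuple

NUMERIC = ["network_packet_size", "login_attempts", "session_duration",
           "failed_logins", "unusual_time_access", "ip_reputation_score"]
CATEGORICAL = ["protocol_type", "encryption_used", "browser_type"]
TARGET = "attack_detected"


def load(text: str) -> Tuple[List[Dict[str, float]], List[int]]:
    """Parse CSV text into numeric feature dicts and a list of 0/1 targets."""
    rows = list(csv.DictReader(io.StringIO(text)))
    # Collect the observed categories per column so one-hot keys are stable.
    levels = {c: sorted({r[c] for r in rows}) for c in CATEGORICAL}
    features, targets = [], []
    for r in rows:
        x = {name: float(r[name]) for name in NUMERIC}
        for c in CATEGORICAL:
            for level in levels[c]:
                x[f"{c}={level}"] = 1.0 if r[c] == level else 0.0
        features.append(x)
        targets.append(int(r[TARGET]))
    return features, targets


# Made-up sample rows for illustration only.
SAMPLE = """network_packet_size,protocol_type,encryption_used,login_attempts,session_duration,failed_logins,unusual_time_access,ip_reputation_score,browser_type,attack_detected
599,TCP,AES,4,492.9,1,0,0.61,Chrome,1
472,UDP,DES,3,1557.9,0,0,0.30,Firefox,0
"""

X, y = load(SAMPLE)
```

With labels present, `X` and `y` can feed a supervised classifier; when labels are missing, only `X` would be passed to an anomaly detector.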
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
With the continuous expansion of data exchange, the threat of cybercrime and network intrusions is also on the rise. This project addresses these concerns by investigating an Attentive Transformer Deep Learning Algorithm for Intrusion Detection of IoT Systems using Automatic Xplainable Feature Selection. Its primary focus is to develop an effective Intrusion Detection System (IDS) using this algorithm. To that end, curated datasets were created by extracting data from the University of New Brunswick repository. This repository houses the datasets used in this research and can be accessed publicly in order to replicate its findings.
Security is the main challenge in Supervisory Control and Data Acquisition (SCADA) systems, since they must be connected to heterogeneous networks to save costs. SCADA devices such as RTUs have limited resources, so even a small-scale cyber attack on a computer network can have a major impact on the SCADA system. This study examines a SCADA system using the IEC 60870-5-104 protocol, which is widely used in the power plant industry. A physical testbed was built to simulate the electrical distribution process. The SCADA system in the distribution section is more vulnerable than other parts because it is located directly in the community environment, leaving many openings for attackers. The purpose of this study is to obtain relevant datasets for the SCADA system. The simulation consists of normal communication between the HMI and the RTU, which is then attacked to disrupt the communication. The attacks carried out are port scan, brute force, and DoS. The DoS attacks are ICMP flood, SYN flood, and IEC 104 flood. The IEC 104 flood is a modified attack against the RTU in which the RTU is flooded with ASDUs (Application Service Data Units) of an unknown type ID. Attacks are carried out using the Kali Linux operating system. All scenarios are recorded and saved in pcap format. To confirm that attack traffic is present in the IDS dataset, Snort and Suricata are used to detect it. The study also reports intrusion detection performance results from Snort and Suricata.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
For academic purposes
Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
The dataset presented aims to support research in developing robust Intrusion Detection Systems (IDS) for modern networks. It simulates a network environment of a fictitious organization with multiple vulnerable hosts and strategic IDS deployments. The experimental setup uses virtual machines to emulate an attacker machine, vulnerable hosts, and IDS devices, connected via Open vSwitches (OVS) with port mirroring to capture traffic. Attack scenarios include multi-hop attacks targeting internal hosts by exploiting vulnerabilities and bypassing traffic restrictions. The raw PcapNG files are complemented with extracted features in CSV format, supporting Machine Learning (ML) analysis. The dataset is designed for training and evaluating IDS models capable of detecting complex, multi-stage attacks in realistic network environments.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Intrusion Detection Systems (IDSs) and Intrusion Prevention Systems (IPSs) are the most important defense tools against sophisticated and ever-growing network attacks. Due to the lack of reliable test and validation datasets, anomaly-based intrusion detection approaches suffer from a lack of consistent and accurate performance evaluations.
Our evaluations of the existing eleven datasets since 1998 show that most are out of date and unreliable. Some of these datasets suffer from the lack of traffic diversity and volumes, some do not cover the variety of known attacks, while others anonymize packet payload data, which cannot reflect the current trends. Some are also lacking feature set and metadata.
The CICIDS2017 dataset contains benign traffic and the most up-to-date common attacks, and so resembles true real-world data (PCAPs). It also includes the results of the network traffic analysis using CICFlowMeter, with flows labeled based on the timestamp, source and destination IPs, source and destination ports, protocols, and attack (CSV files). The definitions of the extracted features are also available.
Generating realistic background traffic was our top priority in building this dataset. We used our proposed B-Profile system (Sharafaldin, et al. 2016) to profile the abstract behavior of human interactions and generate naturalistic benign background traffic. For this dataset, we built the abstract behaviour of 25 users based on the HTTP, HTTPS, FTP, SSH, and email protocols.
The data capturing period started at 9 a.m. on Monday, July 3, 2017 and ended at 5 p.m. on Friday, July 7, 2017, for a total of 5 days. Monday is the normal day and includes only benign traffic. The implemented attacks include Brute Force FTP, Brute Force SSH, DoS, Heartbleed, Web Attack, Infiltration, Botnet and DDoS. They were executed both morning and afternoon on Tuesday, Wednesday, Thursday and Friday.
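The capture schedule above can be encoded as a small helper, for example to sanity-check flow timestamps before training. This is a sketch under stated assumptions: the function names are hypothetical, timestamps are treated as naive local times, and only the overall start and end of the capture period are checked, not per-day working hours.

```python
from datetime import datetime

CAPTURE_START = datetime(2017, 7, 3, 9, 0)   # Monday, 9 a.m.
CAPTURE_END = datetime(2017, 7, 7, 17, 0)    # Friday, 5 p.m.
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]


def capture_day(ts: datetime) -> str:
    """Return the weekday name of a timestamp inside the capture period."""
    if not (CAPTURE_START <= ts <= CAPTURE_END):
        raise ValueError("timestamp outside the capture period")
    return DAYS[ts.weekday()]


def may_contain_attacks(ts: datetime) -> bool:
    """Monday is benign-only; attacks ran Tuesday through Friday."""
    return capture_day(ts) != "Monday"


monday_flow = datetime(2017, 7, 3, 10, 30)
wednesday_flow = datetime(2017, 7, 5, 14, 0)
```

A flow stamped on Monday but labeled as an attack would then stand out as a likely parsing or time zone error.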
https://www.technavio.com/content/privacy-notice
Intrusion Detection System Market Size 2024-2028
The intrusion detection system market size is forecast to increase by USD 4.65 billion at a CAGR of 14% between 2023 and 2028.
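As a rough reading aid, the two figures in that sentence imply a base market size if one assumes the 14% rate compounds annually on the 2023 value over five years. The sketch below makes that assumption explicit; the resulting numbers are an inference for illustration and are not figures stated in the report.

```python
def implied_base(increment: float, cagr: float, years: int) -> float:
    """Base size B such that B * ((1 + cagr) ** years - 1) == increment."""
    return increment / ((1.0 + cagr) ** years - 1.0)


# Assumption: USD 4.65 billion is the growth from 2023 to 2028 at 14% per year.
base_2023 = implied_base(4.65, 0.14, 5)
size_2028 = base_2023 + 4.65
```

If the report defines its base year or compounding differently, the implied values change accordingly.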
The market is witnessing significant growth due to the escalating number of cyberattacks and the need to secure IT service infrastructure, particularly in the banking and financial services industry (BFSI). IDS solutions employ two primary identification techniques: signature-based and anomaly detection. Signature-based identification relies on known attack patterns, while anomaly detection identifies deviations from normal behavior.
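The difference between the two techniques can be illustrated with a toy example. This is a deliberately simplified sketch, not how any particular product works: the signature strings are hypothetical, and a z-score over a short baseline stands in for what would normally be a far richer model of normal behavior.

```python
from statistics import mean, stdev
from typing import List

# Hypothetical known-bad patterns; real rule sets are far larger.
SIGNATURES = ["/etc/passwd", "' OR '1'='1"]


def signature_match(payload: str) -> bool:
    """Signature-based: flag payloads containing a known attack pattern."""
    return any(sig in payload for sig in SIGNATURES)


def is_anomalous(value: float, baseline: List[float], threshold: float = 3.0) -> bool:
    """Anomaly-based: flag values far from the baseline mean (z-score)."""
    mu, sigma = mean(baseline), stdev(baseline)
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold


normal_request_sizes = [100.0, 102.0, 98.0, 101.0, 99.0]
```

The signature check cannot flag a pattern it has never seen, while the anomaly check can flag unseen behavior but depends on how well the baseline represents normal traffic.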
Additionally, with the rise in digital transactions, there is a growing emphasis on securing security architecture through traffic monitoring and intrusion detection. The market is driven by the increasing demand for BFSI applications and the subsequent need to protect against cyber threats. However, the high cost of maintaining IDS solutions remains a challenge. In conclusion, the IDS market is expected to continue growing as organizations prioritize securing their IT infrastructure against cyber threats.
What will be the Size of the Market During the Forecast Period?
The Intrusion Detection System (IDS) market is a significant segment of the cybersecurity industry, playing a crucial role in safeguarding IT infrastructure against various cyber threats. IDS solutions help identify and prevent unauthorized access, malicious activities, and potential security breaches. These systems can be categorized into Network Intrusion Detection Systems (NIDS) and Host-based Intrusion Detection Systems (HIDS). IDS and Intrusion Prevention Systems (IPS) are essential components of an organization's cybersecurity strategy. IPS goes beyond simple identification and provides real-time prevention of attacks. Both IDS and IPS are instrumental in mitigating risks from phishing incidents, cyberattacks, and other malicious threats.
Cybersecurity is a major concern for various sectors, including BFSI applications, telecom, defense, and cloud computing. With the increasing reliance on IT infrastructure and work from home arrangements, cybersecurity expenditure has seen a significant rise. IDS and IPS solutions are integral to securing data and maintaining information security. Cybercrimes are on the rise, with malicious threat actors constantly evolving their tactics. Traditional signature-based identification methods may not be sufficient to detect advanced threats. Anomaly detection, a key feature of modern IDS and IPS solutions, can help identify unusual patterns and potential threats. IDS and IPS solutions are not limited to protecting traditional IT infrastructure.
They also play a vital role in securing cloud computing environments. IDS and IPS as part of IDP (Intrusion Detection and Prevention) systems offer advanced threat detection and prevention capabilities, ensuring comprehensive protection against cyberattacks. Ransomware attacks have emerged as a major concern, with their disruptive impact on business operations. IDS and IPS solutions can help prevent ransomware attacks by identifying and blocking malicious traffic before it can cause damage. In conclusion, IDS and IPS solutions are essential components of an effective cybersecurity strategy. They help organizations protect their IT infrastructure, data security, and information security against various cyber threats, including phishing incidents, cyberattacks, and malicious threat actors. The market for IDS and IPS solutions is expected to grow as organizations continue to invest in advanced cybersecurity solutions to mitigate risks and maintain business continuity.
How is this market segmented and which is the largest segment?
The market research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD billion' for the period 2024-2028, as well as historical data from 2018-2022 for the following segments.
Deployment
- On-premises
- Cloud-based
Geography
- North America
  - US
- APAC
  - China
  - Japan
- Europe
  - Germany
  - UK
- Middle East and Africa
- South America
By Deployment Insights
The on-premises segment is estimated to witness significant growth during the forecast period.
The on-premises segment is projected to dominate the market in the US, with substantial growth in terms of revenue. Large enterprises, particularly those with a global footprint, are the primary consumers of on-premises intrusion detection systems. The primary reason for this preference is the control it offers over managing software assets, including data generated and stored within business applications. This deployment model enables organizations to ensure compliance with licensing agreements and automate tasks, making it an attractive choice for many busine
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset has been meticulously prepared and utilized as a validation set during the evaluation phase of "Meta IDS" to assess the performance of various machine learning models. It is now made available for interested users and researchers who seek a reliable and diverse dataset for training and testing their own custom models.
The validation dataset comprises a comprehensive collection of labeled entries that indicate whether the packet type is "malicious" or "benign." It covers complex design patterns that are commonly encountered in real-world applications. The dataset is designed to be representative, encompassing edge and fog layers that are in contact with the cloud layer, thereby enabling thorough testing and evaluation of different models. Each sample in the dataset is labeled with the corresponding ground truth, providing a reliable reference for model performance evaluation.
To ensure convenient distribution and storage, the dataset has been broken down into three separate batches, each containing a portion of the dataset. This allows for convenient downloading and management of the dataset. The three batches are provided as individual compressed files.
To extract the data, follow these instructions:
Once uncompressed, you will have access to the dataset in its original format for further exploration, analysis, and model training. Extraction requires approximately 800 GB of storage in total: approximately 302 GB for the first batch, 203 GB for the second, and 297 GB for the third.
The first batch contains 1,049,527,992 entries, the second 711,043,331 entries, and the third and last 1,029,303,062 entries. The following table lists the feature names with their explanations and example values once the dataset is extracted.
| Feature | Description | Example Value |
|---|---|---|
| ip.src | Source IP address in the packet | a05d4ecc38da01406c9635ec694917e969622160e728495e3169f62822444e17 |
| ip.dst | Destination IP address in the packet | a52db0d87623d8a25d0db324d74f0900deb5ca4ec8ad9f346114db134e040ec5 |
| frame.time_epoch | Epoch time of the frame | 1676165569.930869 |
| arp.hw.type | Hardware type | 1 |
| arp.hw.size | Hardware size | 6 |
| arp.proto.size | Protocol size | 4 |
| arp.opcode | Opcode | 2 |
| data.len | Length | 2713 |
| eth.dst.lg | Destination LG bit | 1 |
| eth.dst.ig | Destination IG bit | 1 |
| eth.src.lg | Source LG bit | 1 |
| eth.src.ig | Source IG bit | 1 |
| frame.offset_shift | Time shift for this packet | 0 |
| frame.len | Frame length on the wire | 1208 |
| frame.cap_len | Frame length stored into the capture file | 215 |
| frame.marked | Frame is marked | 0 |
| frame.ignored | Frame is ignored | 0 |
| frame.encap_type | Encapsulation type | 1 |
| gre | Generic Routing Encapsulation | 'Generic Routing Encapsulation (IP)' |
| ip.version | Version | 6 |
| ip.hdr_len | Header length | 24 |
| ip.dsfield.dscp | Differentiated Services Codepoint | 56 |
| ip.dsfield.ecn | Explicit Congestion Notification | 2 |
| ip.len | Total length | 614 |
| ip.flags.rb | Reserved bit | 0 |
| ip.flags.df | Don't fragment | 1 |
| ip.flags.mf | More fragments | 0 |
| ip.frag_offset | Fragment offset | 0 |
| ip.ttl | Time to live | 31 |
| ip.proto | Protocol | 47 |
| ip.checksum.status | Header checksum status | 2 |
| tcp.srcport | TCP source port | 53425 |
| tcp.flags | Flags | 0x00000098 |
| tcp.flags.ns | Nonce | 0 |
| tcp.flags.cwr | Congestion Window Reduced (CWR) | 1 |
| udp.srcport | UDP source port | 64413 |
| udp.dstport | UDP destination port | 54087 |
| udp.stream | Stream index | 1345 |
| udp.length | Length | 225 |
| udp.checksum.status | Checksum status | 3 |
| packet_type | Type of the packet which is either "benign" or "malicious" | 0 |
Furthermore, in compliance with the GDPR and to ensure the privacy of individuals, all IP addresses present in the dataset have been anonymized through hashing. This anonymization process helps protect the identity of individuals while preserving the integrity and utility of the dataset for research and model development purposes.
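The description does not say which hash function was used. The 64-character hexadecimal example values in the table are consistent with SHA-256, so the sketch below uses it as an assumption; the function name and the optional salt are likewise hypothetical and not taken from the dataset's tooling.

```python
import hashlib


def anonymize_ip(ip: str, salt: bytes = b"") -> str:
    """Replace an IP address with a fixed-length hexadecimal digest.

    Assumption: SHA-256, chosen because it yields 64 hex characters
    like the example values in the feature table.
    """
    return hashlib.sha256(salt + ip.encode("ascii")).hexdigest()


token = anonymize_ip("192.0.2.10")  # address from a documentation range
```

The same address always maps to the same token, so flows from one host can still be grouped. A secret salt is a common addition, since the IPv4 address space is small enough to enumerate against an unsalted hash.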
Please note that while the dataset provides valuable insights and a solid foundation for machine learning tasks, it is not a substitute for extensive real-world data collection. However, it serves as a valuable resource for researchers, practitioners, and enthusiasts in the machine learning community, offering a compliant and anonymized dataset for developing and validating custom models in a specific problem domain.
By leveraging the validation dataset for machine learning model evaluation and custom model training, users can accelerate their research and development efforts, building upon the knowledge gained from my thesis while contributing to the advancement of the field.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset constructed to trigger IDS rules based on the community data set of the Snort Intrusion Detection System
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Overview: This dataset presents a subset of network traffic data collected from 20 captures of malicious traffic and 3 captures of live benign traffic on Internet of Things (IoT) devices. It is primarily designed for the development and evaluation of Intrusion Detection Systems (IDS) targeted at IoT devices. The dataset, although not balanced, provides valuable insights into the detection of malicious activities within IoT networks. It contains a total of 23,000+ rows, with duplicates removed for clarity and efficiency.
Data Features: The dataset includes six key features extracted from the Zeek processing performed by the dataset creators. Each feature serves as a crucial input for building IDS models:
Responder's Port (id.resp_p): This feature denotes the port number of the responder in the network connection. It is represented as an integer.
Transport Layer Protocol (proto): Indicates the transport layer protocol used in the connection, with possible values being TCP, UDP, or ICMP (although only TCP and UDP are present in this subset). This feature is stored as a string.
Connection State (conn_state): Describes the state of the connection, using various indicators such as S0, S1, SF, REJ, among others. This feature is optional and stored as a string.
Number of Packets Sent by Originator (orig_pkts): Represents the count of packets transmitted by the originator in the connection. It is stored as an optional integer.
Number of IP Level Bytes Sent by Originator (orig_ip_bytes): Indicates the number of IP level bytes transmitted by the originator. It is stored as an optional integer.
Number of IP Level Bytes Sent by Responder (resp_ip_bytes): Denotes the number of IP level bytes sent by the responder in the connection. This feature is stored as an optional integer.
Target Label: The dataset is suited for binary classification tasks, particularly for distinguishing between malicious and benign traffic. The target label, represented by the 'label' feature, specifies whether a data point corresponds to malicious or benign activity. It is stored as a string with enumerated values: 'Malicious' or 'Benign'.
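The six features and the target label described above map naturally onto a typed record. A minimal sketch, assuming each row arrives as a dictionary of strings keyed by the field names given in the description; treating an empty string or `-` as a missing value is an assumption about how optional fields are encoded, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional

MISSING = ("", "-")  # assumed markers for unset optional fields


def _opt_int(value: str) -> Optional[int]:
    return None if value in MISSING else int(value)


@dataclass
class ConnRecord:
    resp_port: int
    proto: str
    conn_state: Optional[str]
    orig_pkts: Optional[int]
    orig_ip_bytes: Optional[int]
    resp_ip_bytes: Optional[int]
    malicious: bool


def parse(row: Dict[str, str]) -> ConnRecord:
    """Convert one row of strings into a typed connection record."""
    state = row.get("conn_state", "")
    return ConnRecord(
        resp_port=int(row["id.resp_p"]),
        proto=row["proto"],
        conn_state=None if state in MISSING else state,
        orig_pkts=_opt_int(row.get("orig_pkts", "")),
        orig_ip_bytes=_opt_int(row.get("orig_ip_bytes", "")),
        resp_ip_bytes=_opt_int(row.get("resp_ip_bytes", "")),
        malicious=row["label"] == "Malicious",
    )


# Made-up row for illustration only.
record = parse({"id.resp_p": "23", "proto": "tcp", "conn_state": "S0",
                "orig_pkts": "2", "orig_ip_bytes": "80",
                "resp_ip_bytes": "-", "label": "Malicious"})
```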
Data Preprocessing Recommendations: Given that the dataset lacks balanced representation and detailed criteria for sample selection, it's essential to preprocess the data before constructing models. To ensure best practices and model generalization, steps such as data balancing, feature scaling, and potentially feature engineering should be considered. A mock-up processing of this dataset into a model can serve as a preliminary step before utilizing the full dataset for training IDS models aimed at IoT devices.
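Two of the recommended steps, balancing and feature scaling, can be sketched with the standard library alone. Random undersampling and min-max scaling are used here as simple stand-ins; the description does not prescribe particular methods, and the function names and sample rows are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple

Row = Tuple[List[float], str]  # (feature vector, label)


def undersample(rows: Sequence[Row], seed: int = 0) -> List[Row]:
    """Randomly keep as many rows per class as the rarest class has."""
    rng = random.Random(seed)
    by_label: Dict[str, List[Row]] = {}
    for row in rows:
        by_label.setdefault(row[1], []).append(row)
    n = min(len(group) for group in by_label.values())
    balanced: List[Row] = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, n))
    rng.shuffle(balanced)
    return balanced


def min_max_scale(values: List[float]) -> List[float]:
    """Scale one feature column into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Made-up, imbalanced sample: six malicious rows and two benign rows.
rows = [([10.0], "Malicious")] * 6 + [([1.0], "Benign")] * 2
balanced = undersample(rows)
scaled = min_max_scale([2.0, 4.0, 6.0])
```

In practice, balancing is usually applied to the training split only, and the scaling bounds are taken from the training data so that the test data stays unseen.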
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
IEC 60870-5-104
Intrusion Detection Dataset
Readme File
ITHACA – University of Western Macedonia - https://ithaca.ece.uowm.gr/
Authors: Panagiotis Radoglou-Grammatikis, Thomas Lagkas, Vasileios Argyriou, Panagiotis Sarigiannidis
Publication Date: September 23, 2022
1. Introduction
The evolution of the Industrial Internet of Things (IIoT) introduces several benefits, such as real-time monitoring, pervasive control and self-healing. However, despite the valuable services, security and privacy issues still remain given the presence of legacy and insecure communication protocols like IEC 60870-5-104. IEC 60870-5-104 is an industrial protocol widely applied in critical infrastructures, such as the smart electrical grid and industrial healthcare systems. The IEC 60870-5-104 Intrusion Detection Dataset was implemented in the context of the research paper entitled "Modeling, Detecting, and Mitigating Threats Against Industrial Healthcare Systems: A Combined Software Defined Networking and Reinforcement Learning Approach" [1], in the context of two H2020 projects: ELECTRON: rEsilient and seLf-healed EleCTRical pOwer Nanogrid (101021936) and SDN-microSENSE: SDN - microgrid reSilient Electrical eNergy SystEm (833955). This dataset includes labelled Transmission Control Protocol (TCP)/Internet Protocol (IP) network flow statistics (Comma-Separated Values (CSV) format) and IEC 60870-5-104 flow statistics (CSV format) related to twelve IEC 60870-5-104 cyberattacks. In particular, the cyberattacks are related to unauthorised commands and Denial of Service (DoS) activities against IEC 60870-5-104. Moreover, the relevant Packet Capture (PCAP) files are available. The dataset can be utilised for Artificial Intelligence (AI)-based Intrusion Detection Systems (IDS), taking full advantage of Machine Learning (ML) and Deep Learning (DL).
2. Instructions
The IEC 60870-5-104 dataset was implemented following the methodology of A. Gharib et al. in [2], including eleven features: (a) Complete Network Configuration, (b) Complete Traffic, (c) Labelled Dataset, (d) Complete Interaction, (e) Complete Capture, (f) Available Protocols, (g) Attack Diversity, (h) Heterogeneity, (i) Feature Set and (j) Metadata.
A network topology consisting of (a) seven industrial entities, (b) one Human Machine Interface (HMI) and (c) three cyberattackers was used to construct the IEC 60870-5-104 Intrusion Detection Dataset. The industrial entities use IEC TestServer[1], while the HMI uses Qtester104[2]. On the other hand, the cyberattackers use Kali Linux[3] equipped with Metasploit[4], OpenMUC j60870[5] and Ettercap[6]. The cyberattacks were performed during the following days.
For each attack, a 7zip file is provided, including the network traffic and the network flow statistics for each entity. Moreover, a relevant diagram is provided, illustrating the corresponding cyberattack. In particular, for each entity, a folder is given, including (a) the relevant pcap file, (b) Transmission Control Protocol (TCP) / Internet Protocol (IP) network flow statistics in Comma-Separated Values (CSV) format and (c) IEC 60870-5-104 flow statistics in CSV format. The TCP/IP network flow statistics were generated by CICFlowMeter[7], while the IEC 60870-5-104 flow statistics were generated based on a Custom IEC 60870-5-104 Python Parser[8], taking full advantage of Scapy[9].
3. Dataset Structure
The dataset consists of the following files:
Each 7zip file includes respective folders related to the entities/devices (described in the following section) participating in each attack. In particular, for each entity/device, there is a folder including (a) the overall network traffic (pcap file) related to this entity/device during each attack, (b) the TCP/IP network flow statistics (CSV file) from CICFlowMeter for the overall network traffic, (c) the IEC 60870-5-104 network traffic (pcap file) related to this entity/device during each attack, (d) the TCP/IP network flow statistics (CSV file) from CICFlowMeter for the IEC 60870-5-104 network traffic, (e) the IEC 60870-5-104 flow statistics (CSV file) from the Custom IEC 60870-5-104 Python Parser for the IEC 60870-5-104 network traffic and finally, (f) an image showing how the attack was executed. Finally, it is noteworthy that the network flows from both CICFlowMeter and the Custom IEC 60870-5-104 Python Parser in each CSV file are labelled based on the IEC 60870-5-104 cyberattacks executed for the generation of this dataset. The description of these attacks is given in the following section, while the various features from CICFlowMeter and the Custom IEC 60870-5-104 Python Parser are presented in Section 5.
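Since every flow in the CSV files carries a label, a quick first check after extraction is to count flows per label. A minimal sketch, assuming the label column is named `Label` (possibly padded with spaces in the header); the column name, the label values, and the sample rows are hypothetical and should be checked against the actual files.

```python
import csv
import io
from collections import Counter
from typing import TextIO


def label_counts(handle: TextIO, label_column: str = "Label") -> Counter:
    """Count flows per label in a flow-statistics CSV file."""
    counts: Counter = Counter()
    for row in csv.DictReader(handle):
        # Header names are stripped because some exports may pad them.
        cleaned = {key.strip(): value for key, value in row.items()}
        counts[cleaned[label_column].strip()] += 1
    return counts


# Made-up content standing in for one entity's flow-statistics file.
sample = io.StringIO("Flow Duration, Label\n120,NORMAL\n95,MITM_DROP\n80,MITM_DROP\n")
counts = label_counts(sample)
```

For a real file, the `io.StringIO` object would be replaced by an open file handle, for example `open(path, newline="")`.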
4. Testbed & IEC 60870-5-104 Attacks
The testbed created for generating this dataset is composed of five virtual Remote Terminal Unit (RTU) devices emulated by IEC TestServer and two real RTU devices. A further workstation plays the role of Master Terminal Unit (MTU) and HMI, sending legitimate IEC 60870-5-104 commands to the corresponding RTUs. For this purpose, the workstation uses QTester104. In addition, there are three attackers that act as malicious insiders executing the following cyberattacks against the aforementioned RTUs. Finally, the network traffic data of each entity/device was captured through tshark.
Table 1: IEC 60870-5-104 Cyberattacks Description

| IEC 60870-5-104 Cyberattack | Description | Dataset Files |
|---|---|---|
| MITM Drop | During this attack, the cyberattacker is placed between two endpoints, thus monitoring and dropping the network traffic | |
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Card for resampled_IDS_datasets
Intrusion Detection Systems (IDS) play a crucial role in securing computer networks against malicious activities. However, their efficacy is consistently hindered by the persistent challenge of class imbalance in real-world datasets. While various methods, such as resampling techniques, ensemble methods, cost-sensitive learning, and data augmentation, have individually addressed imbalanced classification issues, there exists a notable… See the full description on the dataset page: https://huggingface.co/datasets/Thi-Thu-Huong/resampled_IDS_datasets.
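To make the notion of resampling concrete, here is a minimal sketch of random oversampling, one of the simplest resampling techniques: minority-class samples are duplicated at random until every class is as large as the biggest one. This is an illustration only and is not necessarily the procedure used to build this dataset; the function name `random_oversample` and its parameters are hypothetical.

```python
import random
from collections import defaultdict


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until all classes
    have as many samples as the largest class."""
    rng = random.Random(seed)

    # Group the samples by their class label.
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)
    if not by_class:
        return [], []

    target = max(len(group) for group in by_class.values())
    out_samples, out_labels = [], []
    for label, group in by_class.items():
        # Draw (with replacement) the samples needed to reach the target size.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for sample in group + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels
```

Oversampling should be applied to the training split only; duplicating samples before the train/test split leaks copies of the same flow into the test set and inflates the reported scores.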
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Although ubiquitous in modern vehicles, Controller Area Networks (CANs) lack basic security properties and are easily exploitable. A rapidly growing field of CAN security research has emerged that seeks to detect intrusions or anomalies on CANs. Producing vehicular CAN data with a variety of intrusions is a difficult task for most researchers as it requires expensive assets and deep expertise. To illuminate this task, we introduce the first comprehensive guide to the existing open CAN intrusion detection system (IDS) datasets. We categorize attacks on CANs including fabrication (adding frames, e.g., flooding or targeting an ID), suspension (removing an ID’s frames), and masquerade attacks (spoofed frames sent in lieu of suspended ones). We provide a quality analysis of each dataset; an enumeration of each dataset’s attacks, benefits, and drawbacks; categorization as real vs. simulated CAN data and real vs. simulated attacks; whether the data is raw CAN data or signal-translated; number of vehicles/CANs; quantity in terms of time; and finally a suggested use case of each dataset. State-of-the-art public CAN IDS datasets are limited to real fabrication (simple message injection) attacks and simulated attacks often in synthetic data, lacking fidelity. In general, the physical effects of attacks on the vehicle are not verified in the available datasets. Only one dataset provides signal-translated data but is missing a corresponding “raw” binary version. This issue pigeon-holes CAN IDS research into testing on limited and often inappropriate data (usually with attacks that are too easily detectable to truly test the method). The scarcity of appropriate data has stymied comparability and reproducibility of results for researchers. As our primary contribution, we present the Real ORNL Automotive Dynamometer (ROAD) CAN IDS dataset, consisting of over 3.5 hours of one vehicle’s CAN data.
ROAD contains ambient data recorded during a diverse set of activities, and attacks of increasing stealth with multiple variants and instances of real (i.e. non-simulated) fuzzing, fabrication, unique advanced attacks, and simulated masquerade attacks. To facilitate a benchmark for CAN IDS methods that require signal-translated inputs, we also provide the signal time series format for many of the CAN captures. Our contributions aim to facilitate appropriate benchmarking and needed comparability in the CAN IDS research field.
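The three attack categories named in this description (fabrication, suspension, masquerade) can be illustrated on a toy trace. The sketch below is only a conceptual illustration under simplifying assumptions: the `Frame` type and the three function names are hypothetical, frames are reduced to a timestamp, an ID and a payload, and nothing here reflects how the ROAD captures or attacks were actually produced.

```python
from typing import NamedTuple


class Frame(NamedTuple):
    timestamp: float  # seconds since the start of the capture
    can_id: int       # arbitration ID
    data: bytes       # payload


def fabrication(trace, can_id, data, times):
    """Fabrication: add injected frames with the given ID to the trace."""
    injected = [Frame(t, can_id, data) for t in times]
    return sorted(list(trace) + injected, key=lambda f: f.timestamp)


def suspension(trace, can_id):
    """Suspension: remove every frame that carries the targeted ID."""
    return [f for f in trace if f.can_id != can_id]


def masquerade(trace, can_id, data):
    """Masquerade: the legitimate frames of an ID are suspended and spoofed
    frames are sent in their place, so timing stays the same and only the
    payload changes."""
    return [f._replace(data=data) if f.can_id == can_id else f for f in trace]
```

The sketch also shows why masquerade attacks are the stealthiest of the three: fabrication changes the frame rate of an ID and suspension removes it, whereas a masquerade leaves the timing of the trace untouched.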
These are ADFA IDS datasets that contain network IDS datasets and host IDS datasets. These datasets were generated by former UNSW Ph.D. students, postdocs, and academic visitors under the supervision of Prof. Jiankun Hu, who acts as the communication contact. Please read through the file "How to use ADFA-IDS-Datasets", Gideon's Ph.D. thesis, and the web page file for details. NGIDS-DS dataset: It was created by former Ph.D. student Mr. Waqas Haider. This dataset contains the network IDS dataset, which was generated at the next-generation cyber range infrastructure of the Australian Centre for Cyber Security (ACCS) at the University of New South Wales (UNSW) at the Australian Defence Force Academy (ADFA), Canberra. It is part of the ongoing projects at ADFA related to cyber security. ADFA-LD, ADFA-WD-SAA, and ADFA-WD datasets: They were created by former Ph.D. student Mr. Gideon Creech. They contain Windows host IDS datasets and stealthy attack IDS datasets. netflow_ids_label dataset: It was created by the academic visitor Dr. Quang Anh Tran and UNSW postdoc Dr. Frank Jiang, and provides network flow labels for the 1999 DARPA IDS dataset created by MIT. Please read the attached publication on real-time network flows. TSE-DS dataset: It was created by former Ph.D. students/postdocs Dr. Nam Tran and Dr. Xuefei Yin. It is a labeled false data injection attack detection dataset.
The following dataset was collected from a set of cybersecurity experiments conducted in an Electricity and Natural Gas environment. The architecture was instantiated within the powerNET testbed at Pacific Northwest National Laboratory and comprises both simulated components and hardware-in-the-loop devices. The test environment consisted of a substation and control center network representative of electrical systems. It also contained a compressor station and an odorizer and pressure regulation station representative of oil and natural gas systems. The various devices on the electrical and gas systems were organized into multiple networks to mimic real-world deployments. There were 14 testing scenarios overall that covered a wide variety of cybersecurity and infrastructure events.