100+ datasets found

CIC-IDS 2018 Dataset
kaggle.com
zip
Updated Aug 13, 2025
Cite
nagi (2025). CIC-IDS 2018 Dataset [Dataset]. https://www.kaggle.com/datasets/primus11/cic-ids-2018-dataset/data
Explore at:
Available download formats: zip (80066040 bytes)
Dataset updated
Aug 13, 2025
Authors
nagi
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
CICIDS Dataset

The Canadian Institute for Cybersecurity Intrusion Detection System (CICIDS) dataset is a modern and comprehensive benchmark dataset for network intrusion detection research.
It was created by the Canadian Institute for Cybersecurity (CIC) in collaboration with industry partners to address the limitations of older datasets (such as KDD99 and NSL-KDD) by providing realistic traffic patterns, up-to-date attack types, and a balanced mix of normal and malicious activities.

Key Characteristics

Realistic Traffic Generation: Traffic was captured in a controlled but realistic enterprise-like network, including servers, clients, switches, and routers.

Diverse Attack Scenarios:

Denial-of-Service (DoS) and Distributed Denial-of-Service (DDoS)

Brute force (SSH, FTP)

Web-based attacks (XSS, SQL Injection, Command Injection)

Infiltration from inside the network

Botnet activities

Port scanning and reconnaissance

Data Capture: Raw traffic was recorded in PCAP format.

Feature Extraction: Processed with CICFlowMeter to generate over 80 features, including:

Flow-based: Duration, total forward/backward packets, packet length statistics

Time-based: Inter-arrival times, active and idle times

Content-based: HTTP methods, DNS queries, and more

Labeling: Each network flow is annotated as either benign or belonging to a specific attack type.

Balance: Designed to include both normal and attack traffic with realistic distribution patterns.

Advantages

Reflects modern threats not covered in older datasets.

Provides detailed labels for fine-grained attack classification.

Suitable for both binary classification (normal vs. attack) and multi-class classification (attack type detection).

Enables research in machine learning, deep learning, and feature selection for IDS.

Usage

The CICIDS dataset has become a widely adopted benchmark for evaluating Intrusion Detection Systems (IDS) due to its:

Rich feature set

Real-world attack scenarios

Balanced structure for training and testing models
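
The binary and multi-class views mentioned above can be derived from the same label column. The following is a minimal sketch, assuming that normal flows carry a label equal to "benign" (case-insensitive); the attack names in the sample are illustrative placeholders, not the exact label strings used in the files.

```python
from collections import Counter

# Assumption: normal traffic is labeled "benign" (any letter case).
BENIGN_LABEL = "benign"


def to_binary(label):
    """Return 0 for a benign flow and 1 for any attack type."""
    return 0 if label.strip().lower() == BENIGN_LABEL else 1


def summarize(labels):
    """Count flows per class in the multi-class view and in the binary view."""
    multi = Counter(label.strip() for label in labels)
    binary = Counter(to_binary(label) for label in labels)
    return multi, binary


if __name__ == "__main__":
    # Hypothetical labels for illustration only.
    sample = ["Benign", "DoS", "Benign", "Brute Force", "DoS", "Botnet"]
    multi, binary = summarize(sample)
    print("multi-class counts:", dict(multi))
    print("binary counts:", dict(binary))
```

Checking both counts before training is a quick way to see how imbalanced the classes are in a given file.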
IDS Dataset 2025
kaggle.com
zip
Updated May 9, 2025
Cite
Pranto Kumar (2025). IDS Dataset 2025 [Dataset]. https://www.kaggle.com/datasets/prantokumar/ids-dataset-2025
Explore at:
Available download formats: zip (775182589 bytes)
Dataset updated
May 9, 2025
Authors
Pranto Kumar
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
An Intrusion Detection System (IDS) dataset is a collection of network traffic data, often labeled to distinguish between normal and malicious activities (intrusions or attacks). These datasets are crucial for developing, training, and evaluating Intrusion Detection Systems, which are security tools designed to monitor network traffic for suspicious behavior and alert administrators to potential threats.
Open CAN IDS datasets’ attack metadata.
plos.figshare.com
xls
Updated Jan 22, 2024
+ more versions
Cite
Miki E. Verma; Robert A. Bridges; Michael D. Iannacone; Samuel C. Hollifield; Pablo Moriano; Steven C. Hespeler; Bill Kay; Frank L. Combs (2024). Open CAN IDS datasets’ attack metadata. [Dataset]. http://doi.org/10.1371/journal.pone.0296879.t003
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0296879.t003
Dataset updated
Jan 22, 2024
Dataset provided by
PLOS (http://plos.org/)
Authors
Miki E. Verma; Robert A. Bridges; Michael D. Iannacone; Samuel C. Hollifield; Pablo Moriano; Steven C. Hespeler; Bill Kay; Frank L. Combs
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Although ubiquitous in modern vehicles, Controller Area Networks (CANs) lack basic security properties and are easily exploitable. A rapidly growing field of CAN security research has emerged that seeks to detect intrusions or anomalies on CANs. Producing vehicular CAN data with a variety of intrusions is a difficult task for most researchers as it requires expensive assets and deep expertise. To illuminate this task, we introduce the first comprehensive guide to the existing open CAN intrusion detection system (IDS) datasets. We categorize attacks on CANs including fabrication (adding frames, e.g., flooding or targeting an ID), suspension (removing an ID’s frames), and masquerade attacks (spoofed frames sent in lieu of suspended ones). We provide a quality analysis of each dataset; an enumeration of each dataset’s attacks, benefits, and drawbacks; categorization as real vs. simulated CAN data and real vs. simulated attacks; whether the data is raw CAN data or signal-translated; number of vehicles/CANs; quantity in terms of time; and finally a suggested use case of each dataset. State-of-the-art public CAN IDS datasets are limited to real fabrication (simple message injection) attacks and simulated attacks often in synthetic data, lacking fidelity. In general, the physical effects of attacks on the vehicle are not verified in the available datasets. Only one dataset provides signal-translated data but is missing a corresponding “raw” binary version. This issue pigeon-holes CAN IDS research into testing on limited and often inappropriate data (usually with attacks that are too easily detectable to truly test the method). The scarcity of appropriate data has stymied comparability and reproducibility of results for researchers. As our primary contribution, we present the Real ORNL Automotive Dynamometer (ROAD) CAN IDS dataset, consisting of over 3.5 hours of one vehicle’s CAN data.
ROAD contains ambient data recorded during a diverse set of activities, and attacks of increasing stealth with multiple variants and instances of real (i.e. non-simulated) fuzzing, fabrication, unique advanced attacks, and simulated masquerade attacks. To facilitate a benchmark for CAN IDS methods that require signal-translated inputs, we also provide the signal time series format for many of the CAN captures. Our contributions aim to facilitate appropriate benchmarking and needed comparability in the CAN IDS research field.
CIC-IDS-Collection
kaggle.com
huggingface.co
zip
Updated Nov 9, 2022
Cite
StrGenIx | Laurens D'hooge (2022). CIC-IDS-Collection [Dataset]. https://www.kaggle.com/datasets/dhoogla/cicidscollection
Explore at:
Available download formats: zip (864681190 bytes)
Dataset updated
Nov 9, 2022
Authors
StrGenIx | Laurens D'hooge
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) (https://creativecommons.org/licenses/by-nc-sa/4.0/)
License information was derived automatically
Description
The Canadian Institute for Cybersecurity has published several datasets for network intrusion detection. Four of them: CIC-IDS2017, CIC-DoS2017, CSE-CIC-IDS2018 and CIC-DDoS2019 are collated here into one collection, cleaned up and with harmonized labeling.

The intent behind this collection is simple: to have a larger, more varied set of NIDS samples for more powerful analyses by researchers. Too often, researchers still rely on the individual datasets even though the full set is compatible out-of-the-box. The parts have been created for the same purpose and they have been processed with the same feature extraction tool chain.

This collection also takes into account 2 articles in which flawed features were discovered. Those features have been removed from the dataset. See the cleanup notebook for more information.

If you make use of this combined version, please credit the original authors. The relevant publications are cited here on Kaggle alongside the individual datasets and they are also readily available at the CIC's official dataset distribution page
Cybersecurity 🪪 Intrusion 🦠 Detection Dataset
kaggle.com
Updated Feb 10, 2025
Cite
Dinesh Naveen Kumar Samudrala (2025). Cybersecurity 🪪 Intrusion 🦠 Detection Dataset [Dataset]. https://www.kaggle.com/datasets/dnkumars/cybersecurity-intrusion-detection-dataset
Explore at:
Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Feb 10, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Dinesh Naveen Kumar Samudrala
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
This Cybersecurity Intrusion Detection Dataset is designed for detecting cyber intrusions based on network traffic and user behavior. Below, I’ll explain each aspect in detail, including the dataset structure, feature importance, possible analysis approaches, and how it can be used for machine learning.

1. Understanding the Features

The dataset consists of network-based and user behavior-based features. Each feature provides valuable information about potential cyber threats.

A. Network-Based Features

These features describe network-level information such as packet size, protocol type, and encryption methods.

network_packet_size (Packet Size in Bytes)

Represents the size of network packets, ranging from 64 to 1500 bytes.

Packets on the lower end (~64 bytes) may indicate control messages, while larger packets (~1500 bytes) often carry bulk data.

Attackers may use abnormally small or large packets for reconnaissance or exploitation attempts.

protocol_type (Communication Protocol)

The protocol used in the session: TCP, UDP, or ICMP.

TCP (Transmission Control Protocol): Reliable, connection-oriented (common for HTTP, HTTPS, SSH).

UDP (User Datagram Protocol): Faster but less reliable (used for VoIP, streaming).

ICMP (Internet Control Message Protocol): Used for network diagnostics (ping); often abused in Denial-of-Service (DoS) attacks.

encryption_used (Encryption Protocol)

Values: AES, DES, None.

AES (Advanced Encryption Standard): Strong encryption, commonly used.

DES (Data Encryption Standard): Older encryption, weaker security.

None: Indicates unencrypted communication, which can be risky.

Attackers might use no encryption to avoid detection or weak encryption to exploit vulnerabilities.

B. User Behavior-Based Features

These features track user activities, such as login attempts and session duration.

login_attempts (Number of Logins)

High values might indicate brute-force attacks (repeated login attempts).

Typical users have 1–3 login attempts, while an attack may have hundreds or thousands.

session_duration (Session Length in Seconds)

A very long session might indicate unauthorized access or persistence by an attacker.

Attackers may try to stay connected to maintain access.

failed_logins (Failed Login Attempts)

High failed login counts indicate credential stuffing or dictionary attacks.

Many failed attempts followed by a successful login could suggest an account was compromised.

unusual_time_access (Login Time Anomaly)

A binary flag (0 or 1) indicating whether access happened at an unusual time.

Attackers often operate outside normal business hours to evade detection.

ip_reputation_score (Trustworthiness of IP Address)

A score from 0 to 1, where higher values indicate suspicious activity.

IP addresses associated with botnets, spam, or previous attacks tend to have higher scores.

browser_type (User’s Browser)

Common browsers: Chrome, Firefox, Edge, Safari.

Unknown: Could be an indicator of automated scripts or bots.
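
Most models need the categorical features above converted to numbers before training. The sketch below one-hot encodes them using the category values listed in this description. It assumes each record is a plain dict keyed by the feature names, with "None" stored as a string; how the actual file stores these values is not confirmed here.

```python
# Category values as listed in the dataset description.
CATEGORIES = {
    "protocol_type": ["TCP", "UDP", "ICMP"],
    "encryption_used": ["AES", "DES", "None"],
    "browser_type": ["Chrome", "Firefox", "Edge", "Safari", "Unknown"],
}

# Numeric features, kept in a fixed order so every vector has the same layout.
NUMERIC = [
    "network_packet_size",
    "login_attempts",
    "session_duration",
    "failed_logins",
    "unusual_time_access",
    "ip_reputation_score",
]


def encode(record):
    """Turn one record (a dict) into a flat list of floats."""
    vector = [float(record[name]) for name in NUMERIC]
    for name, values in CATEGORIES.items():
        value = record[name]
        if value not in values:
            raise ValueError(f"unexpected {name} value: {value!r}")
        # One-hot: 1.0 at the position of the matching category, else 0.0.
        vector.extend(1.0 if value == candidate else 0.0 for candidate in values)
    return vector


if __name__ == "__main__":
    # Hypothetical record for illustration only.
    example = {
        "network_packet_size": 512, "protocol_type": "UDP",
        "encryption_used": "AES", "login_attempts": 3,
        "session_duration": 120.5, "failed_logins": 1,
        "unusual_time_access": 0, "ip_reputation_score": 0.3,
        "browser_type": "Chrome",
    }
    print(encode(example))
```

Raising on an unexpected category is a deliberate choice in this sketch: it surfaces values the description does not mention rather than silently encoding them as all zeros.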

2. Target Variable (attack_detected)

Binary classification: 1 means an attack was detected, 0 means normal activity.

The dataset is useful for supervised machine learning, where a model learns from labeled attack patterns.

3. Possible Use Cases

This dataset can be used for intrusion detection systems (IDS) and cybersecurity research. Some key applications include:

A. Machine Learning-Based Intrusion Detection

Supervised Learning Approaches

Classification Models (Logistic Regression, Decision Trees, Random Forest, XGBoost, SVM)

Train the model using labeled data (attack_detected as the target).

Evaluate using accuracy, precision, recall, F1-score.

Deep Learning Approaches

Use Neural Networks (DNN, LSTM, CNN) for pattern recognition.

LSTMs work well for time-series-based network traffic analysis.
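
The evaluation metrics named under the supervised approaches (accuracy, precision, recall, F1-score) can be computed directly from predictions on the binary attack_detected target. This is a minimal sketch in plain Python; the function name and the sample predictions are illustrative, and libraries such as scikit-learn provide equivalent functions.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary labels (1 = attack)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # attacks caught
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed attacks

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical ground truth and predictions for illustration only.
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
    print(classification_metrics(y_true, y_pred))
```

For intrusion detection, recall (the share of attacks caught) and precision (the share of alerts that are real) are usually more informative than accuracy when attacks are rare.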

B. Anomaly Detection (Unsupervised Learning)

If attack labels are missing, anomaly detection can be used:

Autoencoders: Learn normal traffic and flag anomalies.

Isolation Forest: Detects outliers based on feature isolation.

One-Class SVM: Learns normal behavior and detects deviations.

C. Rule-Based Detection

If certain thresholds are met (e.g., failed_logins > 10 & ip_reputation_score > 0.8), an alert is triggered.
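
The example rule above can be written as a small function. This sketch reads the "&" in the rule as a logical AND and uses the thresholds from the example (10 and 0.8); the function name and the record layout (a dict keyed by feature name) are assumptions made for illustration.

```python
# Thresholds taken from the example rule in the description.
FAILED_LOGINS_THRESHOLD = 10
IP_REPUTATION_THRESHOLD = 0.8


def should_alert(record,
                 failed_logins_threshold=FAILED_LOGINS_THRESHOLD,
                 ip_reputation_threshold=IP_REPUTATION_THRESHOLD):
    """Return True when both conditions of the rule are strictly exceeded."""
    return (record["failed_logins"] > failed_logins_threshold
            and record["ip_reputation_score"] > ip_reputation_threshold)


if __name__ == "__main__":
    # Hypothetical sessions for illustration only.
    sessions = [
        {"failed_logins": 12, "ip_reputation_score": 0.9},
        {"failed_logins": 2, "ip_reputation_score": 0.95},
    ]
    for session in sessions:
        print(session, "->", "ALERT" if should_alert(session) else "ok")
```

Keeping the thresholds as parameters makes it easy to tune the rule against labeled data instead of hard-coding one operating point.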

4. Challenges & Considerations

Adversarial Attacks: Attackers may modify traffic to evade detection.

Concept Drift: Cyber threats...
Logs in ROAD CAN intrusion detection dataset.
plos.figshare.com
xls
Updated Jan 22, 2024
Cite
Miki E. Verma; Robert A. Bridges; Michael D. Iannacone; Samuel C. Hollifield; Pablo Moriano; Steven C. Hespeler; Bill Kay; Frank L. Combs (2024). Logs in ROAD CAN intrusion detection dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0296879.t005
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0296879.t005
Dataset updated
Jan 22, 2024
Dataset provided by
PLOS (http://plos.org/)
Authors
Miki E. Verma; Robert A. Bridges; Michael D. Iannacone; Samuel C. Hollifield; Pablo Moriano; Steven C. Hespeler; Bill Kay; Frank L. Combs
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Although ubiquitous in modern vehicles, Controller Area Networks (CANs) lack basic security properties and are easily exploitable. A rapidly growing field of CAN security research has emerged that seeks to detect intrusions or anomalies on CANs. Producing vehicular CAN data with a variety of intrusions is a difficult task for most researchers as it requires expensive assets and deep expertise. To illuminate this task, we introduce the first comprehensive guide to the existing open CAN intrusion detection system (IDS) datasets. We categorize attacks on CANs including fabrication (adding frames, e.g., flooding or targeting an ID), suspension (removing an ID’s frames), and masquerade attacks (spoofed frames sent in lieu of suspended ones). We provide a quality analysis of each dataset; an enumeration of each dataset’s attacks, benefits, and drawbacks; categorization as real vs. simulated CAN data and real vs. simulated attacks; whether the data is raw CAN data or signal-translated; number of vehicles/CANs; quantity in terms of time; and finally a suggested use case of each dataset. State-of-the-art public CAN IDS datasets are limited to real fabrication (simple message injection) attacks and simulated attacks often in synthetic data, lacking fidelity. In general, the physical effects of attacks on the vehicle are not verified in the available datasets. Only one dataset provides signal-translated data but is missing a corresponding “raw” binary version. This issue pigeon-holes CAN IDS research into testing on limited and often inappropriate data (usually with attacks that are too easily detectable to truly test the method). The scarcity of appropriate data has stymied comparability and reproducibility of results for researchers. As our primary contribution, we present the Real ORNL Automotive Dynamometer (ROAD) CAN IDS dataset, consisting of over 3.5 hours of one vehicle’s CAN data.
ROAD contains ambient data recorded during a diverse set of activities, and attacks of increasing stealth with multiple variants and instances of real (i.e. non-simulated) fuzzing, fabrication, unique advanced attacks, and simulated masquerade attacks. To facilitate a benchmark for CAN IDS methods that require signal-translated inputs, we also provide the signal time series format for many of the CAN captures. Our contributions aim to facilitate appropriate benchmarking and needed comparability in the CAN IDS research field.
Network Intrusion Detection Datasets
figshare.com
txt
Updated May 30, 2023
Cite
Ogobuchi Daniel Okey; Demostenes Zegarra Rodriguez (2023). Network Intrusion Detection Datasets [Dataset]. http://doi.org/10.6084/m9.figshare.23118164.v1
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.23118164.v1
Dataset updated
May 30, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Ogobuchi Daniel Okey; Demostenes Zegarra Rodriguez
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
With the continuous expansion of data exchange, the threat of cybercrime and network invasions is also on the rise. This project aims to address these concerns by investigating an innovative approach: an Attentive Transformer Deep Learning Algorithm for Intrusion Detection of IoT Systems using Automatic Xplainable Feature Selection. The primary focus of this project is to develop an effective Intrusion Detection System (IDS) using this algorithm. To accomplish this, carefully curated datasets were created by extracting data from the University of New Brunswick repository. This repository houses the datasets used in this research and can be accessed publicly in order to replicate its findings.
Dataset for Network Intrusion Detection System on SCADA IEC 60870-5-104
zenodo.org
data.niaid.nih.gov
Updated Aug 31, 2022
Cite
M. Agus Syamsul Arifin; Deris Stiawan; Susanto; Rahmat Budiarto; Mohd Yazid Idris (2022). Dataset for Network Intrusion Detection System on SCADA IEC 60870-5-104 [Dataset]. http://doi.org/10.5281/zenodo.7034534
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.7034534
Dataset updated
Aug 31, 2022
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
M. Agus Syamsul Arifin; Deris Stiawan; Susanto; Rahmat Budiarto; Mohd Yazid Idris
Description
Security is the main challenge in Supervisory Control and Data Acquisition (SCADA) systems, since SCADA systems must be connected to heterogeneous networks to save costs. SCADA devices such as RTUs have limited resources, so a small-scale cyber attack on a computer network will have a major impact on the SCADA system. This study discusses a SCADA system using the IEC 60870-5-104 protocol, which is widely used in the power plant industry. A physical testbed is built to simulate the electrical distribution process. The SCADA system in the distribution section is more vulnerable than other parts because it is located directly in the community environment, which gives attackers many points of entry. The purpose of this study is to obtain relevant datasets for the SCADA system. The scenario simulated in this study is normal communication between the HMI and the RTU, which is then attacked to disrupt the communication. The attack activities carried out are port scan, brute force, and DoS. The DoS attacks carried out are ICMP flood, SYN flood, and IEC 104 flood. The IEC 104 flood is a modified attack on the RTU in which the RTU is flooded with ASDUs (Application Service Data Units) of an unknown type ID. Attacks are carried out using the Kali Linux operating system. All scenarios are recorded and saved in pcap format. To confirm that attack traffic is present in the dataset, Snort and Suricata are used to detect it. The study also includes intrusion detection performance results from Snort and Suricata.
TOW-IDS: Automotive Ethernet Intrusion Dataset
ieee-dataport.org
Updated Nov 1, 2022
Cite
MEE LAN HAN (2022). TOW-IDS: Automotive Ethernet Intrusion Dataset [Dataset]. https://ieee-dataport.org/documents/tow-ids-automotive-ethernet-intrusion-dataset
Explore at:
Dataset updated
Nov 1, 2022
Authors
MEE LAN HAN
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
For academic purposes
Dataset for Detection in Multi-IDS Environment
kaggle.com
zip
Updated Jan 29, 2025
Cite
Arka Ghosh (2025). Dataset for Detection in Multi-IDS Environment [Dataset]. https://www.kaggle.com/datasets/arkaghoshcs/dataset-for-multi-ids-environment
Explore at:
Available download formats: zip (23900605 bytes)
Dataset updated
Jan 29, 2025
Authors
Arka Ghosh
License
Open Database License (ODbL) v1.0 (https://www.opendatacommons.org/licenses/odbl/1.0/)
License information was derived automatically
Description
The dataset presented aims to support research in developing robust Intrusion Detection Systems (IDS) for modern networks. It simulates a network environment of a fictitious organization with multiple vulnerable hosts and strategic IDS deployments. The experimental setup uses virtual machines to emulate an attacker machine, vulnerable hosts, and IDS devices, connected via Open vSwitches (OVS) with port mirroring to capture traffic. Attack scenarios include multi-hop attacks targeting internal hosts by exploiting vulnerabilities and bypassing traffic restrictions. The raw PcapNG files are complemented with extracted features in CSV format, supporting Machine Learning (ML) analysis. The dataset is designed for training and evaluating IDS models capable of detecting complex, multi-stage attacks in realistic network environments.
Intrusion detection IDS Data cleaned
kaggle.com
zip
Updated Aug 4, 2024
Cite
arar tawil (2024). Intrusion detection IDS Data cleaned [Dataset]. https://www.kaggle.com/datasets/araraltawil/ids-data-cleaned
Explore at:
Available download formats: zip (219896832 bytes)
Dataset updated
Aug 4, 2024
Authors
arar tawil
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Intrusion Detection Systems (IDSs) and Intrusion Prevention Systems (IPSs) are the most important defense tools against sophisticated and ever-growing network attacks. Due to the lack of reliable test and validation datasets, anomaly-based intrusion detection approaches struggle to obtain consistent and accurate performance evaluations.

Our evaluations of the existing eleven datasets since 1998 show that most are out of date and unreliable. Some of these datasets suffer from the lack of traffic diversity and volumes, some do not cover the variety of known attacks, while others anonymize packet payload data, which cannot reflect the current trends. Some are also lacking feature set and metadata.

The CICIDS2017 dataset contains benign traffic and the most up-to-date common attacks, and resembles true real-world data (PCAPs). It also includes the results of the network traffic analysis using CICFlowMeter, with flows labeled based on the timestamp, source and destination IPs, source and destination ports, protocols, and attack (CSV files). The definitions of the extracted features are also available.

Generating realistic background traffic was our top priority in building this dataset. We have used our proposed B-Profile system (Sharafaldin, et al. 2016) to profile the abstract behavior of human interactions and generate naturalistic benign background traffic. For this dataset, we built the abstract behaviour of 25 users based on the HTTP, HTTPS, FTP, SSH, and email protocols.

The data capturing period started at 9 a.m. on Monday, July 3, 2017 and ended at 5 p.m. on Friday, July 7, 2017, for a total of 5 days. Monday is the normal day and only includes benign traffic. The implemented attacks include Brute Force FTP, Brute Force SSH, DoS, Heartbleed, Web Attack, Infiltration, Botnet, and DDoS. They were executed in both the morning and the afternoon on Tuesday, Wednesday, Thursday, and Friday.

Intrusion Detection System Market Analysis North America, APAC, Europe,...

technavio.com

pdf

Updated Oct 23, 2024

Cite

Technavio (2024). Intrusion Detection System Market Analysis North America, APAC, Europe, Middle East and Africa, South America - US, China, UK, Germany, Japan - Size and Forecast 2024-2028 [Dataset]. https://www.technavio.com/report/intrusion-detection-system-market-industry-analysis

Explore at:

Available download formats: pdf

Dataset updated

Oct 23, 2024

Dataset provided by

TechNavio

Authors

Technavio

License

https://www.technavio.com/content/privacy-notice

Time period covered

2024 - 2028

Area covered

United Kingdom, United States

Description

Intrusion Detection System Market Size 2024-2028

The intrusion detection system market size is forecast to increase by USD 4.65 billion at a CAGR of 14% between 2023 and 2028.

The market is witnessing significant growth due to the escalating number of cyberattacks and the need to secure IT service infrastructure, particularly in the banking and financial services industry (BFSI). IDS solutions employ two primary identification techniques: signature-based and anomaly detection. Signature-based identification relies on known attack patterns, while anomaly detection identifies deviations from normal behavior.
Additionally, with the rise in digital transactions, there is a growing emphasis on strengthening security architecture through traffic monitoring and intrusion detection. The market is driven by the increasing demand for BFSI applications and the subsequent need to protect against cyber threats. However, the high cost of maintaining IDS solutions remains a challenge. In conclusion, the IDS market is expected to continue growing as organizations prioritize securing their IT infrastructure against cyber threats.

What will be the Size of the Market During the Forecast Period?

The Intrusion Detection System (IDS) market is a significant segment of the cybersecurity industry, playing a crucial role in safeguarding IT infrastructure against various cyber threats. IDS solutions help identify and prevent unauthorized access, malicious activities, and potential security breaches. These systems can be categorized into Network Intrusion Detection Systems (NIDS) and Host-based Intrusion Detection Systems (HIDS). IDS and Intrusion Prevention Systems (IPS) are essential components of an organization's cybersecurity strategy. IPS goes beyond simple identification and provides real-time prevention of attacks. Both IDS and IPS are instrumental in mitigating risks from phishing incidents, cyberattacks, and other malicious threats.
Additionally, cybersecurity is a major concern for various sectors, including BFSI applications, telecom, defense, and cloud computing. With the increasing reliance on IT infrastructure and work from home arrangements, cybersecurity expenditure has seen a significant rise. IDS and IPS solutions are integral to securing data and maintaining information security. Cybercrimes are on the rise, with malicious threat actors constantly evolving their tactics. Traditional signature-based identification methods may not be sufficient to detect advanced threats. Anomaly detection, a key feature of modern IDS and IPS solutions, can help identify unusual patterns and potential threats. IDS and IPS solutions are not limited to protecting traditional IT infrastructure.
They also play a vital role in securing cloud computing environments. IDS and IPS, as part of IDP (Intrusion Detection and Prevention) systems, offer advanced threat detection and prevention capabilities, ensuring comprehensive protection against cyberattacks. Ransomware attacks have emerged as a major concern because of their disruptive impact on business operations. IDS and IPS solutions can help prevent ransomware attacks by identifying and blocking malicious traffic before it can cause damage. In conclusion, IDS and IPS solutions are essential components of an effective cybersecurity strategy. They help organizations protect their IT infrastructure, data security, and information security against various cyber threats, including phishing incidents, cyberattacks, and malicious threat actors. The market for IDS and IPS solutions is expected to grow as organizations continue to invest in advanced cybersecurity solutions to mitigate risks and maintain business continuity.

How is this market segmented and which is the largest segment?

The market research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD billion' for the period 2024-2028, as well as historical data from 2018-2022 for the following segments.

Deployment

  On-premises
  Cloud-based


Geography

  North America

    US


  APAC

    China
    Japan


  Europe

    Germany
    UK


  Middle East and Africa



  South America

By Deployment Insights

The on-premises segment is estimated to witness significant growth during the forecast period.

The on-premises segment is projected to dominate the market in the US, with substantial growth in terms of revenue. Large enterprises, particularly those with a global footprint, are the primary consumers of on-premises intrusion detection systems. The primary reason for this preference is the control it offers over managing software assets, including data generated and stored within business applications. This deployment model enables organizations to ensure compliance with licensing agreements and automate tasks, making it an attractive choice for many busine

Federated Learning for Distributed Intrusion Detection Systems in Public...

zenodo.org
data.europa.eu

bz2

Updated May 23, 2023

+ more versions

Cite

Alireza Bakhshi Zadi Mahmoodi; Panos Kostakos (2023). Federated Learning for Distributed Intrusion Detection Systems in Public Networks - Validation Dataset [Dataset]. http://doi.org/10.5281/zenodo.7956304

Explore at:

Available download formats: bz2

Unique identifier

https://doi.org/10.5281/zenodo.7956304

Dataset updated

May 23, 2023

Dataset provided by

Zenodo (http://zenodo.org/)

Authors

Alireza Bakhshi Zadi Mahmoodi; Panos Kostakos

License

Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically

Description

This dataset has been meticulously prepared and utilized as a validation set during the evaluation phase of "Meta IDS" to assess the performance of various machine learning models. It is now made available for interested users and researchers who seek a reliable and diverse dataset for training and testing their own custom models.

The validation dataset comprises a comprehensive collection of labeled entries, each indicating whether the packet is "malicious" or "benign." It covers complex design patterns that are commonly encountered in real-world applications. The dataset is designed to be representative, encompassing edge and fog layers that are in contact with the cloud layer, thereby enabling thorough testing and evaluation of different models. Each sample in the dataset is labeled with the corresponding ground truth, providing a reliable reference for model performance evaluation.

To ensure convenient distribution and storage, the dataset has been broken down into three separate batches, each containing a portion of the dataset. This allows for convenient downloading and management of the dataset. The three batches are provided as individual compressed files.

To extract the data, follow these steps:

Download and install bzip2 (if not already installed) from the official website or your package manager.
Place the compressed dataset file in a directory of your choice.
Open a terminal or command prompt and navigate to the directory where the compressed dataset file is located.
Execute the following command to uncompress the dataset:
- bzip2 -d filename.bz2
Replace "filename.bz2" with the actual name of the compressed dataset file.
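
The steps above use the bzip2 command-line tool. For readers who would rather stay in Python, the following is a minimal sketch using the standard-library bz2 module; the function name decompress_bz2 and the tiny synthetic file in the demo are assumptions for illustration and are not part of the dataset. It reads in fixed-size chunks so that a multi-hundred-gigabyte batch never has to fit in memory.

```python
import bz2
import tempfile
from pathlib import Path


def decompress_bz2(src: Path, dst: Path, chunk_size: int = 1 << 20) -> int:
    """Stream-decompress `src` into `dst`; return the number of bytes written."""
    written = 0
    with bz2.open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)  # at most chunk_size decompressed bytes
            if not chunk:
                break
            fout.write(chunk)
            written += len(chunk)
    return written


# Self-contained demo on a small synthetic file (not the real dataset).
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "sample.bz2"
    dst = Path(tmp) / "sample"
    payload = b"ip.src,ip.dst,packet_type\n" * 1000
    src.write_bytes(bz2.compress(payload))
    n_bytes = decompress_bz2(src, dst)
    roundtrip_ok = dst.read_bytes() == payload

print(n_bytes, roundtrip_ok)
```

Unlike `bzip2 -d`, which deletes the compressed input by default, this sketch leaves the .bz2 file in place, so both copies occupy disk space until the archive is removed.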

Once uncompressed, you will have access to the dataset in its original format for further exploration, analysis, and model training. Extraction requires approximately 800 GB of storage in total: approximately 302 GB for the first batch, 203 GB for the second batch, and 297 GB for the third batch.
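
Given these sizes, it may be worth checking free disk space before extracting. The sketch below is one hedged way to do that with the standard library; the helper names are hypothetical, and it assumes "GB" means decimal gigabytes, which the description does not state.

```python
import shutil

# Approximate per-batch extraction sizes quoted in the description (GB).
BATCH_STORAGE_GB = {"batch_1": 302, "batch_2": 203, "batch_3": 297}


def free_space_gb(path: str = ".") -> float:
    """Free space on the filesystem holding `path`, in decimal gigabytes."""
    return shutil.disk_usage(path).free / 1e9


def batches_that_fit(available_gb: float) -> list:
    """Return batch names, in order, whose cumulative size fits in `available_gb`."""
    chosen, used = [], 0
    for name, size in BATCH_STORAGE_GB.items():
        if used + size > available_gb:
            break
        chosen.append(name)
        used += size
    return chosen


total_gb = sum(BATCH_STORAGE_GB.values())
print(f"total required: ~{total_gb} GB, free here: {free_space_gb():.0f} GB")
print("batches that fit:", batches_that_fit(free_space_gb()))
```

The three quoted batch sizes sum to 802 GB, which is consistent with the "approximately 800 GB" total.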

The first batch contains 1,049,527,992 entries, the second batch contains 711,043,331 entries, and the third and last batch contains 1,029,303,062 entries. The following table provides the feature names along with their explanation and an example value once the dataset is extracted.

Feature	Description	Example Value
ip.src	Source IP address in the packet	a05d4ecc38da01406c9635ec694917e969622160e728495e3169f62822444e17
ip.dst	Destination IP address in the packet	a52db0d87623d8a25d0db324d74f0900deb5ca4ec8ad9f346114db134e040ec5
frame.time_epoch	Epoch time of the frame	1676165569.930869
arp.hw.type	Hardware type	1
arp.hw.size	Hardware size	6
arp.proto.size	Protocol size	4
arp.opcode	Opcode	2
data.len	Length	2713
eth.dst.lg	Destination LG bit	1
eth.dst.ig	Destination IG bit	1
eth.src.lg	Source LG bit	1
eth.src.ig	Source IG bit	1
frame.offset_shift	Time shift for this packet	0
frame.len	Frame length on the wire	1208
frame.cap_len	Frame length stored into the capture file	215
frame.marked	Frame is marked	0
frame.ignored	Frame is ignored	0
frame.encap_type	Encapsulation type	1
gre	Generic Routing Encapsulation	'Generic Routing Encapsulation (IP)'
ip.version	Version	6
ip.hdr_len	Header length	24
ip.dsfield.dscp	Differentiated Services Codepoint	56
ip.dsfield.ecn	Explicit Congestion Notification	2
ip.len	Total length	614
ip.flags.rb	Reserved bit	0
ip.flags.df	Don't fragment	1
ip.flags.mf	More fragments	0
ip.frag_offset	Fragment offset	0
ip.ttl	Time to live	31
ip.proto	Protocol	47
ip.checksum.status	Header checksum status	2
tcp.srcport	TCP source port	53425
tcp.flags	Flags	0x00000098
tcp.flags.ns	Nonce	0
tcp.flags.cwr	Congestion Window Reduced (CWR)	1
udp.srcport	UDP source port	64413
udp.dstport	UDP destination port	54087
udp.stream	Stream index	1345
udp.length	Length	225
udp.checksum.status	Checksum status	3
packet_type	Type of the packet which is either "benign" or "malicious"	0
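
Two of the example values in the table are encoded rather than directly readable: frame.time_epoch is a Unix timestamp and tcp.flags is a hexadecimal bitmask. The sketch below decodes both, assuming the values arrive as strings exactly as shown in the table; the function names are hypothetical, and the bit positions are the standard TCP flag layout rather than anything specific to this dataset.

```python
from datetime import datetime, timezone

# Standard TCP flag bit positions (low nine bits of the flags field).
TCP_FLAG_BITS = {
    "fin": 0x001, "syn": 0x002, "rst": 0x004, "psh": 0x008,
    "ack": 0x010, "urg": 0x020, "ece": 0x040, "cwr": 0x080, "ns": 0x100,
}


def decode_tcp_flags(raw: str) -> dict:
    """Expand a hex flags string such as '0x00000098' into per-flag 0/1 values."""
    value = int(raw, 16)
    return {name: int(bool(value & bit)) for name, bit in TCP_FLAG_BITS.items()}


def epoch_to_utc(raw: str) -> datetime:
    """Convert a frame.time_epoch string into a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(float(raw), tz=timezone.utc)


flags = decode_tcp_flags("0x00000098")
captured_at = epoch_to_utc("1676165569.930869")
print(flags)
print(captured_at.isoformat())
```

For the example row, 0x98 sets the PSH, ACK and CWR bits and leaves NS clear, which agrees with the tcp.flags.cwr and tcp.flags.ns example values in the table; the timestamp falls on 12 February 2023 (UTC).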

Furthermore, in compliance with the GDPR and to ensure the privacy of individuals, all IP addresses present in the dataset have been anonymized through hashing. This anonymization process helps protect the identity of individuals while preserving the integrity and utility of the dataset for research and model development purposes.
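
The description does not say which hash function or salt was used. The example ip.src and ip.dst values are 64 hexadecimal characters long, which matches the length of a SHA-256 digest, so the sketch below uses SHA-256 purely as an assumption to illustrate the general idea. The function name and the optional salt parameter are hypothetical.

```python
import hashlib


def pseudonymize_ip(ip: str, salt: bytes = b"") -> str:
    """Map an IP address string to a fixed-length hex digest (assumed SHA-256)."""
    return hashlib.sha256(salt + ip.encode("ascii")).hexdigest()


# Documentation-range addresses (RFC 5737), not values from the dataset.
a = pseudonymize_ip("192.0.2.1")
b = pseudonymize_ip("192.0.2.2")
print(len(a), a == pseudonymize_ip("192.0.2.1"), a != b)
```

Because the mapping is deterministic, the same address always yields the same digest, so flows between the same hosts remain linkable after anonymization. Note that an unsalted hash over the small IPv4 space can be reversed by enumeration; whether the dataset authors used a salt is not stated.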

Please note that while the dataset provides valuable insights and a solid foundation for machine learning tasks, it is not a substitute for extensive real-world data collection. However, it serves as a valuable resource for researchers, practitioners, and enthusiasts in the machine learning community, offering a compliant and anonymized dataset for developing and validating custom models in a specific problem domain.

By leveraging the validation dataset for machine learning model evaluation and custom model training, users can accelerate their research and development efforts, building upon the knowledge gained from my thesis while contributing to the advancement of the field.

Data from: Dataset for IDS testing
data.niaid.nih.gov
data-staging.niaid.nih.gov
Updated Jun 14, 2020
Cite
Lukaseder, Thomas; Wagner, Mathias (2020). Dataset for IDS testing [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3892998
Explore at:
Dataset updated
Jun 14, 2020
Dataset provided by
Ulm University
Authors
Lukaseder, Thomas; Wagner, Mathias
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset constructed to trigger IDS rules based on the community data set of the Snort Intrusion Detection System
IoTNet24 Dataset for IDS
kaggle.com
zip
Updated Mar 27, 2024
Cite
wittigenZ (2024). IoTNet24 Dataset for IDS [Dataset]. https://www.kaggle.com/datasets/wittigenz/hydras
Explore at:
Available download formats: zip (123042 bytes)
Dataset updated
Mar 27, 2024
Authors
wittigenZ
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
Dataset Overview: This dataset presents a subset of network traffic data collected from 20 captures of malicious traffic and 3 captures of live benign traffic on Internet of Things (IoT) devices. It is primarily designed for the development and evaluation of Intrusion Detection Systems (IDS) targeted at IoT devices. The dataset, although not balanced, provides valuable insights into the detection of malicious activities within IoT networks. It contains a total of 23,000+ rows, with duplicates removed for clarity and efficiency.

Data Features: The dataset includes six key features extracted from the Zeek processing performed by the dataset creators. Each feature serves as a crucial input for building IDS models:

Responder's Port (id.resp_p): This feature denotes the port number of the responder in the network connection. It is represented as an integer.

Transport Layer Protocol (proto): Indicates the transport layer protocol used in the connection, with possible values being TCP, UDP, or ICMP (although only TCP and UDP are present in this subset). This feature is stored as a string.

Connection State (conn_state): Describes the state of the connection, using various indicators such as S0, S1, SF, REJ, among others. This feature is optional and stored as a string.

Number of Packets Sent by Originator (orig_pkts): Represents the count of packets transmitted by the originator in the connection. It is stored as an optional integer.

Number of IP Level Bytes Sent by Originator (orig_ip_bytes): Indicates the number of IP level bytes transmitted by the originator. It is stored as an optional integer.

Number of IP Level Bytes Sent by Responder (resp_ip_bytes): Denotes the number of IP level bytes sent by the responder in the connection. This feature is stored as an optional integer.

Target Label: The dataset is suited for binary classification tasks, particularly for distinguishing between malicious and benign traffic. The target label, represented by the 'label' feature, specifies whether a data point corresponds to malicious or benign activity. It is stored as a string with enumerated values: 'Malicious' or 'Benign'.

Data Preprocessing Recommendations: Given that the dataset lacks balanced representation and detailed criteria for sample selection, it's essential to preprocess the data before constructing models. To ensure best practices and model generalization, steps such as data balancing, feature scaling, and potentially feature engineering should be considered. A mock-up processing of this dataset into a model can serve as a preliminary step before utilizing the full dataset for training IDS models aimed at IoT devices.
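
The recommendations above can be sketched with the standard library alone. The rows below are hypothetical examples built from the six features and the label described earlier, not records from the dataset. The sketch fills missing optional integers with zero, min-max scales a numeric column, and computes inverse-frequency class weights as one simple alternative to resampling; all three choices are assumptions, and other balancing or scaling methods may suit a given model better.

```python
from collections import Counter

# Hypothetical rows using the six features and the label described above.
rows = [
    {"id.resp_p": 23, "proto": "tcp", "conn_state": "S0", "orig_pkts": 1,
     "orig_ip_bytes": 40, "resp_ip_bytes": None, "label": "Malicious"},
    {"id.resp_p": 8080, "proto": "tcp", "conn_state": "REJ", "orig_pkts": 2,
     "orig_ip_bytes": 80, "resp_ip_bytes": 40, "label": "Malicious"},
    {"id.resp_p": 53, "proto": "udp", "conn_state": "SF", "orig_pkts": 1,
     "orig_ip_bytes": 60, "resp_ip_bytes": 120, "label": "Benign"},
    {"id.resp_p": 23, "proto": "tcp", "conn_state": "S0", "orig_pkts": 3,
     "orig_ip_bytes": 120, "resp_ip_bytes": 0, "label": "Malicious"},
]


def fill_missing(values, default=0):
    """Replace None in optional integer columns with a default value."""
    return [default if v is None else v for v in values]


def min_max_scale(values):
    """Scale values linearly to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


scaled_resp = min_max_scale(fill_missing([r["resp_ip_bytes"] for r in rows]))
weights = class_weights([r["label"] for r in rows])
targets = [1 if r["label"] == "Malicious" else 0 for r in rows]
print(scaled_resp, weights, targets)
```

With three malicious rows and one benign row, the benign class receives the larger weight, so a weighted loss would count each benign sample more heavily than each malicious one.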

IEC 60870-5-104 Intrusion Detection Dataset

zenodo.org
data.europa.eu

bin, pdf

Updated Jul 16, 2024

Cite

Panagiotis; Panagiotis; Konstantinos; Thomas; Thomas; Vasileios; Vasileios; Panagiotis; Panagiotis; Konstantinos (2024). IEC 60870-5-104 Intrusion Detection Dataset [Dataset]. http://doi.org/10.21227/fj7s-f281

Explore at:

Available download formats: bin, pdf

Unique identifier

https://doi.org/10.21227/fj7s-f281

Dataset updated

Jul 16, 2024

Dataset provided by

Zenodo

Authors

Panagiotis; Panagiotis; Konstantinos; Thomas; Thomas; Vasileios; Vasileios; Panagiotis; Panagiotis; Konstantinos

License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

IEC 60870-5-104 Intrusion Detection Dataset

Readme File

ITHACA – University of Western Macedonia - https://ithaca.ece.uowm.gr/

Authors: Panagiotis Radoglou-Grammatikis, Thomas Lagkas, Vasileios Argyriou, Panagiotis Sarigiannidis

Publication Date: September 23, 2022

1. Introduction

The evolution of the Industrial Internet of Things (IIoT) introduces several benefits, such as real-time monitoring, pervasive control and self-healing. However, despite the valuable services, security and privacy issues still remain given the presence of legacy and insecure communication protocols like IEC 60870-5-104. IEC 60870-5-104 is an industrial protocol widely applied in critical infrastructures, such as the smart electrical grid and industrial healthcare systems. The IEC 60870-5-104 Intrusion Detection Dataset was implemented in the context of the research paper entitled "Modeling, Detecting, and Mitigating Threats Against Industrial Healthcare Systems: A Combined Software Defined Networking and Reinforcement Learning Approach" [1], in the context of two H2020 projects: ELECTRON: rEsilient and seLf-healed EleCTRical pOwer Nanogrid (101021936) and SDN-microSENSE: SDN - microgrid reSilient Electrical eNergy SystEm (833955). This dataset includes labelled Transmission Control Protocol (TCP)/Internet Protocol (IP) network flow statistics (Comma-Separated Values (CSV) format) and IEC 60870-5-104 flow statistics (CSV format) related to twelve IEC 60870-5-104 cyberattacks. In particular, the cyberattacks are related to unauthorised commands and Denial of Service (DoS) activities against IEC 60870-5-104. Moreover, the relevant Packet Capture (PCAP) files are available. The dataset can be utilised for Artificial Intelligence (AI)-based Intrusion Detection Systems (IDS), taking full advantage of Machine Learning (ML) and Deep Learning (DL).

2. Instructions

The IEC 60870-5-104 dataset was implemented following the methodology of A. Gharib et al. in [2], including eleven features: (a) Complete Network Configuration, (b) Complete Traffic, (c) Labelled Dataset, (d) Complete Interaction, (e) Complete Capture, (f) Available Protocols, (g) Attack Diversity, (h) Heterogeneity, (i) Feature Set and (j) Metadata.

A network topology consisting of (a) seven industrial entities, (b) one Human Machine Interface (HMI) and (c) three cyberattackers was used to construct the IEC 60870-5-104 Intrusion Detection Dataset. The industrial entities use IEC TestServer[1], while the HMI uses Qtester104[2]. On the other hand, the cyberattackers use Kali Linux[3] equipped with Metasploit[4], OpenMUC j60870[5] and Ettercap[6]. The cyberattacks were performed during the following days.

On Saturday, April 25, 2020, a DoS cyberattack (M_SP_NA_1_DoS) was executed for 2 hours, using the M_SP_NA_1 command.
On Sunday, April 26, 2020, two cyberattacks were executed, namely (a) DoS (C_CI_NA_1_DoS) and (b) unauthorised injection (C_CI_NA_1), using the C_CI_NA_1 command for 2 hours.
On Monday, April 27, 2020, one unauthorised injection attack (C_SE_NA_1) was executed for 4 hours, using the C_SE_NA_1 command.
On Tuesday, April 28, 2020, two cyberattacks were executed, namely (a) unauthorised injection (C_SC_NA_1) and (b) DoS (C_SE_NA_1_DoS), using the C_SC_NA_1 and C_SE_NA_1 commands for 2 hours and 4 hours, respectively.
On Wednesday, April 29, 2020, one DoS (C_SC_NA_1_DoS) cyberattack was performed for 2 hours, using the C_SC_NA_1 command.
On Friday, June 05, 2020, two cyberattacks were executed, namely (a) DoS (C_RD_NA_1_DoS) and (b) unauthorised injection (C_RD_NA_1), using the C_RD_NA_1 command for 2 and 4 hours, respectively.
On Saturday, June 06, 2020, two cyberattacks were executed, namely (a) DoS (C_RP_NA_1_DoS) and (b) unauthorised injection (C_RP_NA_1), using the C_RP_NA_1 command for 2 and 4 hours, respectively.
On Monday, June 08, 2020, a Man In The Middle (MITM) cyberattack was executed for 2 hours, filtering and dropping the IEC 60870-5-104 packets.

For each attack, a 7zip file is provided, including the network traffic and the network flow statistics for each entity. Moreover, a relevant diagram is provided, illustrating the corresponding cyberattack. In particular, for each entity, a folder is given, including (a) the relevant pcap file, (b) Transmission Control Protocol (TCP) / Internet Protocol (IP) network flow statistics in a Comma-Separated Values (CSV) format and (c) IEC 60870-5-104 flow statistics in a CSV format. The TCP/IP network flow statistics were generated by CICFlowMeter[7], while the IEC 60870-5-104 flow statistics were generated based on a Custom IEC 60870-5-104 Python Parser[8], taking full advantage of Scapy[9].

3. Dataset Structure

The dataset consists of the following files:

20200425_UOWM_IEC104_Dataset_m_sp_na_1_DoS.7z: A 7zip file including the pcap and CSV files related to the M_SP_NA_1_DoS attack.
20200426_UOWM_IEC104_Dataset_c_ci_na_1_DoS.7z: A 7zip file including the pcap and CSV files related to the C_CI_NA_1_DoS attack.
20200426_UOWM_IEC104_Dataset_c_ci_na_1.7z: A 7zip file including the pcap and CSV files related to the C_CI_NA_1 attack.
20200427_UOWM_IEC104_Dataset_c_se_na_1.7z: A 7zip file including the pcap and CSV files related to the C_SE_NA_1 attack.
20200428_UOWM_IEC104_Dataset_c_sc_na_1.7z: A 7zip file including the pcap and CSV files related to the C_SC_NA_1 attack.
20200428_UOWM_IEC104_Dataset_c_se_na_1_DoS.7z: A 7zip file including the pcap and CSV files related to the C_SE_NA_1_DoS attack.
20200429_UOWM_IEC104_Dataset_c_sc_na_1_DoS.7z: A 7zip file including the pcap and CSV files related to the C_SC_NA_1_DoS attack.
20200605_UOWM_IEC104_Dataset_c_rd_na_1_DoS.7z: A 7zip file including the pcap and CSV files related to the C_RD_NA_1_DoS attack.
20200605_UOWM_IEC104_Dataset_c_rd_na_1.7z: A 7zip file including the pcap and CSV files related to the C_RD_NA_1 attack.
20200606_UOWM_IEC104_Dataset_c_rp_na_1_DoS.7z: A 7zip file including the pcap and CSV files related to the C_RP_NA_1_DoS attack.
20200606_UOWM_IEC104_Dataset_c_rp_na_1.7z: A 7zip file including the pcap and CSV files related to the C_RP_NA_1 attack.
20200608_UOWM_IEC104_Dataset_mitm_drop.7z: A 7zip file including the pcap and CSV files related to the MITM attack.
Balanced_IEC104_Train_Test_CSV_Files.zip: This zip file includes balanced CSV files from CICFlowMeter and the Custom IEC 60870-5-104 Python Parser that could be utilised for training ML and DL methods. The zip file includes different folders for the corresponding flow timeout values used for CICFlowMeter and IEC 60870-5-104 Python Parser, respectively.
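
The per-attack archive names listed above follow a regular pattern (capture date, a fixed prefix, then the attack name), so they can be turned into metadata programmatically. The sketch below is a hedged illustration: the regular expression is inferred from the twelve names listed here, and the function name and the returned dictionary keys are hypothetical.

```python
import re
from datetime import datetime

# Pattern inferred from the archive names listed above.
ARCHIVE_RE = re.compile(r"^(?P<date>\d{8})_UOWM_IEC104_Dataset_(?P<attack>.+)\.7z$")


def parse_archive_name(name: str) -> dict:
    """Extract capture date, attack label, and DoS flag from an archive name."""
    match = ARCHIVE_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected archive name: {name}")
    day = datetime.strptime(match.group("date"), "%Y%m%d").date()
    token = match.group("attack")
    is_dos = token.lower().endswith("_dos")
    # Upper-case the command part, keeping the 'DoS' suffix as written in the text.
    label = token[:-4].upper() + "_DoS" if is_dos else token.upper()
    return {"date": day, "attack": label, "is_dos": is_dos}


first = parse_archive_name("20200425_UOWM_IEC104_Dataset_m_sp_na_1_DoS.7z")
last = parse_archive_name("20200608_UOWM_IEC104_Dataset_mitm_drop.7z")
print(first)
print(last)
```

The Balanced_IEC104_Train_Test_CSV_Files.zip archive does not follow this pattern and would raise ValueError here, which is intentional: it holds aggregated training data rather than a single attack capture.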

Each 7zip file includes respective folders related to the entities/devices (described in the following section) participating in each attack. In particular, for each entity/device, there is a folder including (a) the overall network traffic (pcap file) related to this entity/device during each attack, (b) the TCP/IP network flow statistics (CSV file) from CICFlowMeter for the overall network traffic, (c) the IEC 60870-5-104 network traffic (pcap file) related to this entity/device during each attack, (d) the TCP/IP network flow statistics (CSV file) from CICFlowMeter for the IEC 60870-5-104 network traffic, (e) the IEC 60870-5-104 flow statistics (CSV file) from the Custom IEC 60870-5-104 Python Parser for the IEC 60870-5-104 network traffic and finally, (f) an image showing how the attack was executed. Finally, it is noteworthy that the network flows from both CICFlowMeter and the Custom IEC 60870-5-104 Python Parser in each CSV file are labelled based on the IEC 60870-5-104 cyberattacks executed for the generation of this dataset. The description of these attacks is given in the following section, while the various features from CICFlowMeter and the Custom IEC 60870-5-104 Python Parser are presented in Section 5.

4. Testbed & IEC 60870-5-104 Attacks

The testbed created for generating this dataset is composed of five virtual RTU devices emulated by IEC TestServer and two real RTU devices. Moreover, there is another workstation which plays the role of Master Terminal Unit (MTU) and HMI, sending legitimate IEC 60870-5-104 commands to the corresponding RTUs. For this purpose, the workstation uses QTester104. In addition, there are three attackers that act as malicious insiders executing the following cyberattacks against the aforementioned RTUs. Finally, the network traffic data of each entity/device was captured through tshark.

Table 1: IEC 60870-5-104 Cyberattacks Description

IEC 60870-5-104 Cyberattack Description	Description	Dataset Files
MITM Drop	During this attack, the cyberattacker is placed between two endpoints, thus monitoring and dropping the network traffic

resampled_IDS_datasets
huggingface.co
Updated Jul 17, 2025
Cite
Le (2025). resampled_IDS_datasets [Dataset]. http://doi.org/10.57967/hf/4961
Explore at:
Unique identifier
https://doi.org/10.57967/hf/4961
Dataset updated
Jul 17, 2025
Authors
Le
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Dataset Card for resampled_IDS_datasets

Intrusion Detection Systems (IDS) play a crucial role in securing computer networks against malicious activities. However, their efficacy is consistently hindered by the persistent challenge of class imbalance in real-world datasets. While various methods, such as resampling techniques, ensemble methods, cost-sensitive learning, data augmentation, and so on, have individually addressed imbalance classification issues, there exists a notable… See the full description on the dataset page: https://huggingface.co/datasets/Thi-Thu-Huong/resampled_IDS_datasets.
Automotive CAN signal reverse engineering works.
plos.figshare.com
xls
Updated Jan 22, 2024
Cite
Miki E. Verma; Robert A. Bridges; Michael D. Iannacone; Samuel C. Hollifield; Pablo Moriano; Steven C. Hespeler; Bill Kay; Frank L. Combs (2024). Automotive CAN signal reverse engineering works. [Dataset]. http://doi.org/10.1371/journal.pone.0296879.t004
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0296879.t004
Dataset updated
Jan 22, 2024
Dataset provided by
PLOS: http://plos.org/
Authors
Miki E. Verma; Robert A. Bridges; Michael D. Iannacone; Samuel C. Hollifield; Pablo Moriano; Steven C. Hespeler; Bill Kay; Frank L. Combs
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Although ubiquitous in modern vehicles, Controller Area Networks (CANs) lack basic security properties and are easily exploitable. A rapidly growing field of CAN security research has emerged that seeks to detect intrusions or anomalies on CANs. Producing vehicular CAN data with a variety of intrusions is a difficult task for most researchers as it requires expensive assets and deep expertise. To illuminate this task, we introduce the first comprehensive guide to the existing open CAN intrusion detection system (IDS) datasets. We categorize attacks on CANs including fabrication (adding frames, e.g., flooding or targeting an ID), suspension (removing an ID’s frames), and masquerade attacks (spoofed frames sent in lieu of suspended ones). We provide a quality analysis of each dataset; an enumeration of each dataset’s attacks, benefits, and drawbacks; categorization as real vs. simulated CAN data and real vs. simulated attacks; whether the data is raw CAN data or signal-translated; number of vehicles/CANs; quantity in terms of time; and finally a suggested use case of each dataset. State-of-the-art public CAN IDS datasets are limited to real fabrication (simple message injection) attacks and simulated attacks often in synthetic data, lacking fidelity. In general, the physical effects of attacks on the vehicle are not verified in the available datasets. Only one dataset provides signal-translated data but is missing a corresponding “raw” binary version. This issue pigeon-holes CAN IDS research into testing on limited and often inappropriate data (usually with attacks that are too easily detectable to truly test the method). The scarcity of appropriate data has stymied comparability and reproducibility of results for researchers. As our primary contribution, we present the Real ORNL Automotive Dynamometer (ROAD) CAN IDS dataset, consisting of over 3.5 hours of one vehicle’s CAN data.
ROAD contains ambient data recorded during a diverse set of activities, and attacks of increasing stealth with multiple variants and instances of real (i.e. non-simulated) fuzzing, fabrication, unique advanced attacks, and simulated masquerade attacks. To facilitate a benchmark for CAN IDS methods that require signal-translated inputs, we also provide the signal time series format for many of the CAN captures. Our contributions aim to facilitate appropriate benchmarking and needed comparability in the CAN IDS research field.
ADFA IDS (Intrusion detection systems) datasets comprising labeled host,...
dataone.org
dataverse.harvard.edu
Updated Mar 6, 2024
Cite
UNSW Canberra (2024). ADFA IDS (Intrusion detection systems) datasets comprising labeled host, network and windows stealthy attacks settings [Dataset]. http://doi.org/10.7910/DVN/IFTZPF
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/IFTZPF
Dataset updated
Mar 6, 2024
Dataset provided by
Harvard Dataverse
Authors
UNSW Canberra
Description
These are ADFA IDS datasets that contain network IDS datasets and host IDS datasets. These datasets were generated by former UNSW Ph.D. students, postdocs, and academic visitors under the supervision of Prof. Jiankun Hu, who acts as the communication contact. Please read through the file "How to use ADFA-IDS-Datasets", Gideon's Ph.D. thesis, and the web page file for details. NGIDS-DS dataset: It was created by former Ph.D. student Mr. Waqas Haider. This dataset contains the network IDS dataset, which was generated at the next-generation cyber range infrastructure of the Australian Centre for Cyber Security (ACCS) at the University of New South Wales (UNSW) @ Australian Defence Force Academy (ADFA), Canberra. It is part of the ongoing projects in the ADFA related to cyber security. ADFA-LD, ADFA-WD-SAA, and ADFA-WD datasets: They were created by former Ph.D. student Mr. Gideon Creech. They contain Windows host IDS datasets and stealthy attack IDS datasets. netflow_ids_label dataset: It was created by the academic visitor Dr. Quang Anh Tran and UNSW postdoc Dr. Frank Jiang, and provides network flow labels for the 1999 DARPA IDS dataset created by MIT. Please read the relevant real-time network flow publication paper attached. TSE-DS dataset: It was created by former Ph.D. students/postdocs Dr. Nam Tran and Dr. Xuefei Yin. It is a labeled false data injection attack detection dataset.
Electricity and Gas IDS Dataset
osti.gov
Updated Nov 1, 2021
Cite
DOE (2021). Electricity and Gas IDS Dataset [Dataset]. http://doi.org/10.25584/PNNLDH/1839095
Explore at:
Unique identifier
https://doi.org/10.25584/PNNLDH/1839095
Dataset updated
Nov 1, 2021
Dataset provided by
DOE
Pacific Northwest National Laboratory 2
Description
The following dataset was collected from a set of cybersecurity experiments conducted in an Electricity and Natural Gas environment. The architecture was instantiated within the powerNET testbed at Pacific Northwest National Laboratory, and is comprised of both simulated components and hardware-in-the-loop devices. The test environment consisted of a substation and control center network representative of electrical systems. In addition, it also contained a compressor station, and an odorizer and pressure regulation station representative of oil and natural gas systems. The various devices on the electrical and gas systems were organized into multiple networks to mimic real-world deployments. There were 14 testing scenarios overall that covered a wide variety of cybersecurity and infrastructure events.

Cite

nagi (2025). CIC-IDS 2018 Dataset [Dataset]. https://www.kaggle.com/datasets/primus11/cic-ids-2018-dataset/data

CIC-IDS 2018 Dataset

CIC-IDS data for Intrusion detection system

Explore at:

Available download formats: zip (80066040 bytes)

Dataset updated

Aug 13, 2025

Authors

nagi

License

MIT License: https://opensource.org/licenses/MIT
License information was derived automatically

Description

CICIDS Dataset

The Canadian Institute for Cybersecurity Intrusion Detection System (CICIDS) dataset is a modern and comprehensive benchmark dataset for network intrusion detection research.
It was created by the Canadian Institute for Cybersecurity (CIC) in collaboration with industry partners to address the limitations of older datasets (such as KDD99 and NSL-KDD) by providing realistic traffic patterns, up-to-date attack types, and a balanced mix of normal and malicious activities.

Key Characteristics

Realistic Traffic Generation: Traffic was captured in a controlled but realistic enterprise-like network, including servers, clients, switches, and routers.
Diverse Attack Scenarios:
- Denial-of-Service (DoS) and Distributed Denial-of-Service (DDoS)
- Brute force (SSH, FTP)
- Web-based attacks (XSS, SQL Injection, Command Injection)
- Infiltration from inside the network
- Botnet activities
- Port scanning and reconnaissance
Data Capture: Raw traffic was recorded in PCAP format.
Feature Extraction: Processed with CICFlowMeter to generate over 80 features, including:
- Flow-based: Duration, total forward/backward packets, packet length statistics
- Time-based: Inter-arrival times, active and idle times
- Content-based: HTTP methods, DNS queries, and more
Labeling: Each network flow is annotated as either benign or belonging to a specific attack type.
Balance: Designed to include both normal and attack traffic with realistic distribution patterns.

Advantages

Reflects modern threats not covered in older datasets.
Provides detailed labels for fine-grained attack classification.
Suitable for both binary classification (normal vs. attack) and multi-class classification (attack type detection).
Enables research in machine learning, deep learning, and feature selection for IDS.
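
The difference between the binary and multi-class settings mentioned above comes down to how the label column is encoded. The sketch below illustrates both encodings; the label strings are hypothetical placeholders, since the exact spellings used in this upload's CSV files are not given in the description.

```python
def to_binary(label: str, benign_name: str = "Benign") -> int:
    """Binary target: 0 for benign traffic, 1 for any attack type."""
    return 0 if label.strip().lower() == benign_name.lower() else 1


def build_class_index(labels) -> dict:
    """Multi-class target: assign each distinct label a stable integer id."""
    return {name: idx for idx, name in enumerate(sorted(set(labels)))}


# Hypothetical label values; check the actual CSV files for the real spellings.
labels = ["Benign", "DoS", "Brute Force", "Benign", "Botnet"]

binary_targets = [to_binary(name) for name in labels]
class_index = build_class_index(labels)
multiclass_targets = [class_index[name] for name in labels]
print(binary_targets, class_index, multiclass_targets)
```

Sorting the label names before numbering keeps the class ids reproducible across runs and across files that list the attacks in a different order.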

Usage

The CICIDS dataset has become a widely adopted benchmark for evaluating Intrusion Detection Systems (IDS) due to its:
- Rich feature set
- Real-world attack scenarios
- Balanced structure for training and testing models
