BoTNeTIoT-L01 is a dataset that integrates the data files of all the IoT devices from the detection_of_IoT_botnet_attacks_N_BaIoT (BoTNeTIoT) dataset. This new version reduces the redundancy of the original dataset by keeping only the features computed over the 10-second time window. In the class label, 0 stands for attacks and 1 stands for normal samples.
BoTNeTIoT-L01, the most recent version of the dataset, contains the traffic of nine IoT devices, sniffed using Wireshark in a local network with a central switch. It includes two botnet attacks (Mirai and Gafgyt). The dataset contains twenty-three statistically engineered features extracted from the .pcap files. Seven statistical measures were computed (mean, variance, count, magnitude, radius, covariance, and correlation coefficient) over a time window of 10 seconds with a decay factor of 0.1. The decay factor value is used in the dataset, as well as in our papers [2], [3], [4], and [5] below, to refer to the corresponding time window as L0.1. Four features were extracted from the .pcap files: packet count, jitter, size of outbound packets only, and size of outbound and inbound packets together. For each of these four features, three or more statistical measures were computed, resulting in twenty-three features.
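To make the idea of a time window governed by a decay factor more concrete, here is a minimal sketch of an exponentially decayed running mean and variance. It assumes a damped incremental statistic in which older samples are down-weighted by 2^(-decay * elapsed seconds); the description above does not spell out the exact update rule, so the class name `DampedStat` and this formula are illustrative assumptions, not the dataset authors' extraction code.

```python
class DampedStat:
    """Exponentially decayed running statistics over timestamped values.

    Assumption: history is down-weighted by 2 ** (-decay * elapsed_seconds).
    """

    def __init__(self, decay):
        self.decay = decay        # e.g. 0.1 for the window labelled L0.1
        self.weight = 0.0         # decayed sample count
        self.linear_sum = 0.0     # decayed sum of values
        self.squared_sum = 0.0    # decayed sum of squared values
        self.last_time = None

    def update(self, value, timestamp):
        # Shrink the accumulated history according to the elapsed time.
        if self.last_time is not None:
            factor = 2.0 ** (-self.decay * (timestamp - self.last_time))
            self.weight *= factor
            self.linear_sum *= factor
            self.squared_sum *= factor
        self.last_time = timestamp
        # Add the new observation with full weight.
        self.weight += 1.0
        self.linear_sum += value
        self.squared_sum += value * value

    @property
    def mean(self):
        return self.linear_sum / self.weight if self.weight else 0.0

    @property
    def variance(self):
        if not self.weight:
            return 0.0
        return abs(self.squared_sum / self.weight - self.mean ** 2)


# Hypothetical packet sizes (bytes) observed at the given times (seconds).
stat = DampedStat(decay=0.1)
for size, t in [(60, 0.0), (1500, 2.5), (60, 9.0)]:
    stat.update(size, t)
print(stat.weight, stat.mean, stat.variance)
```

With a decay of 0.1 under this assumed rule, a sample that is 10 seconds old carries half the weight of a fresh one, which is one way to read the pairing of the 10-second window with the L0.1 label.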
-- References to the articles where the dataset was initially described and used. Please cite all the papers below:
[1] A. Alhowaide, I. Alsmadi, J. Tang. “Towards the design of real-time autonomous IoT NIDS”, Cluster Computing (2021), pages 1-14, Jan 2021.
[2] A. Alhowaide, I. Alsmadi, J. Tang, “Features Quality Impact on Cyber Physical Security Systems”, 2019 IEEE 10th Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON), Oct. 2019.
-- References to the articles where the dataset was used:
[3] A. Alhowaide, I. Alsmadi, J. Tang. “PCA, Random-Forest and Pearson Correlation for Dimensionality Reduction in IoT IDS”, 2020 IEEE International IOT, Electronics and Mechatronics Conference (IEMTRONICS), pages 1-6, Vancouver, BC, Canada, Sept. 2020.
[4] A. Alhowaide, I. Alsmadi, J. Tang. “An Ensemble Feature Selection Method for IoT IDS”, 2020 IEEE 6th International Conference on Dependability in Sensor, Cloud and Big Data Systems and Application (DependSys), Fiji, Dec. 2020.
In this project, we used the IoT Network Intrusion Dataset from the following site:
https://sites.google.com/view/iot-network-intrusion-dataset/home?pli=1
We improved the dataset with the following:
1- Preprocessing the dataset
2- Undersampling
Coming soon: we are working on oversampling the dataset using the SMOTE technique.
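The undersampling step mentioned above can be illustrated with a short sketch. The source does not say which undersampling method was applied, so this example assumes plain random undersampling, where every class is reduced to the size of the smallest class. The function name `undersample` and the toy rows are hypothetical.

```python
import random
from collections import Counter, defaultdict


def undersample(rows, label_of, seed=0):
    """Randomly reduce every class to the size of the smallest class."""
    by_class = defaultdict(list)
    for row in rows:
        by_class[label_of(row)].append(row)

    target = min(len(items) for items in by_class.values())
    rng = random.Random(seed)  # fixed seed so the result is reproducible

    balanced = []
    for items in by_class.values():
        balanced.extend(rng.sample(items, target))
    rng.shuffle(balanced)
    return balanced


# Toy data: (feature, label) pairs with a 9:1 class imbalance.
rows = [(i, "attack") for i in range(90)] + [(i, "normal") for i in range(10)]
balanced = undersample(rows, label_of=lambda row: row[1])
print(Counter(label for _, label in balanced))
```

Random undersampling discards majority-class rows, so it trades information for balance; oversampling methods such as SMOTE take the opposite approach and synthesize minority-class rows instead.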
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Building an IoT IDS requires the availability of datasets to process.
http://www.gnu.org/licenses/lgpl-3.0.html
The dataset was introduced in the following work: E. C. P. Neto, S. Dadkhah, R. Ferreira, A. Zohourian, R. Lu, A. A. Ghorbani. "CICIoT2023: A real-time dataset and benchmark for large-scale attacks in IoT environment," Sensor (2023) – (submitted to Journal of Sensors). The data covers different kinds of IoT intrusions, in the following categories: DDoS, Brute Force, Spoofing, DoS, Recon, Web-based, and Mirai.
Each kind of intrusion has several subcategories in the data. The dataset contains 1,191,264 instances of network intrusions, with 47 features for each instance. It can be used to build predictive models that detect different kinds of intrusive attacks, and it is also suitable for designing an IDS.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dragon_Pi
For a more in-depth description of the Dragon_Pi dataset, please consult the journal article of the same name:
Lightbody et al., Future Internet, 2024, https://doi.org/10.3390/fi16030088 - specifically Section 3.2: Dataset Overview.
Dragon_Pi is an intrusion detection dataset for IoT devices. In the field of IoT security there are few datasets, and those which do exist tend to focus solely on network traffic. The Dragon_Pi dataset seeks to provide not only more data for the field of IoT security, but also data of a somewhat under-published type: linear time series power consumption data.
Dragon_Pi is a fully labelled intrusion detection dataset for IoT devices. It is composed of both normal and under-attack power consumption data obtained from two separate testbeds, one using a DragonBoard 410c and the other a Raspberry Pi Model 3, hence the moniker Dragon_Pi.
These testbeds were set up with predefined normal behaviour as described in the attached publications. The normal linear time series power consumption was sampled from the testbeds under these normal conditions. Both testbeds were then attacked using some common attacks on IoT, and the linear time series power consumption was captured under these conditions as well.
Specifically, the testbeds were subjected to Port Scan (using Nmap), SSH Brute Force (using Hydra), and SYNFlood Denial of Service (using Hping3) attacks. These attacks were repeated to gain insight into what their signatures looked like and how varying the tool settings affected the resultant signature. A fourth type of scenario was also conducted on the testbeds: the "Capture the Flag" scenarios. In these files, multiple attack types were used with a more specific target, namely to exfiltrate a hidden file from the testbeds.
Each file has three hierarchical levels of annotation for each sample within:
A simple "Normal or Anomaly" label for the specific sample
A specific attack type label e.g. "SSH Bruteforce", for the specific sample
A specific tool setting for that attack e.g. "Hydra_T16", for the specific sample
Users can decide for themselves what level of annotation they require for their specific task.
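As an illustration of how the three annotation levels relate, the sketch below collapses a tool-setting label into an attack-type label and then into a binary label. The label strings follow the specific labels listed for this dataset (for example "Hydra_T16" and "Nmap_T4"), but the prefix-based mapping and the function names are assumptions made for this example; the legend files remain the authoritative description of each .csv file.

```python
def attack_type(specific_label):
    """Map a tool-setting label to an attack-type label (assumed mapping)."""
    if specific_label == "Normal":
        return "Normal"
    if specific_label.startswith("Nmap_"):
        return "Port Scan Attack"
    if specific_label.startswith("Hydra_"):
        return "SSH Brute Force"
    # Labels without a tool-setting prefix are passed through unchanged.
    return specific_label


def binary_label(specific_label):
    """Map any label to the coarsest level: Normal or Anomaly."""
    return "Normal" if specific_label == "Normal" else "Anomaly"


samples = ["Normal", "Hydra_T16", "Nmap_T4", "SYNFlood DOS"]
for label in samples:
    print(label, "->", attack_type(label), "->", binary_label(label))
```

A binary detector would train on the output of `binary_label`, while a classifier that distinguishes attack families would use `attack_type` and ignore the tool setting.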
Each file in the Dragon_Pi dataset is accompanied by its own legend file. This file explains the contents of the specific .csv file and the specific indexes of the events within.
The Dragon_Pi dataset consists of approximately 67 files, as shown in Table 1. Compressed, the dataset totals approximately 13 GB. Completely decompressed, the dataset is approximately 80 GB (30 GB Pi data, 50 GB Dragon data).
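| Label Type | Specific Label | Number of Files (DragonBoard 410c) | Number of Files (Raspberry Pi) |
|---|---|---|---|
| Normal | Normal | 3 | 2 |
| Port Scan Attack | Nmap_T5 | 2 | 1 |
| Port Scan Attack | Nmap_T4 | 1 | 1 |
| Port Scan Attack | Nmap_T3 | 1 | 1 |
| Port Scan Attack | Nmap_T2 | 1 | 1 |
| SSH Brute Force | Hydra_T32 | 4 | 2 |
| SSH Brute Force | Hydra_T16 | 16 | 2 |
| SSH Brute Force | Hydra_T3 | 8 | 2 |
| SSH Brute Force | Hydra_T1 | 5 | 2 |
| SYNFlood DOS | SYNFlood DOS | 1 | 1 |
| Capture the Flag | Misc Attacks | 3 | 5 |
Table 1. Enumeration of the files in the Dragon_Pi dataset.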
Publication of this dataset:
This dataset was published in Lightbody et al., Future Internet, 2024, https://doi.org/10.3390/fi16030088. Consult and cite this article for a more in-depth dataset description, as well as an in-depth review of the first AI intrusion detection model trained on this dataset.
See the article Lightbody et al., Future Internet, 2023, https://doi.org/10.3390/fi15050187 for a detailed investigation of the attack signatures discovered while creating this dataset. This work was an initial investigation of the dataset and can serve as a part 1 to the Dragon_Pi paper.
How to cite this dataset in your work:
Please cite these two DOIs when publishing using this dataset:
Dragon_Pi release publication: https://doi.org/10.3390/fi16030088 (most important)
Zenodo Dataset DOI: https://doi.org/10.5281/zenodo.10784947
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The exponential growth of Internet of Things (IoT) devices provides a large attack surface for intruders to launch more destructive cyber-attacks. The intruder aims to exhaust the resources of the target IoT network with malicious activity. New techniques and detection algorithms require a well-designed dataset for IoT networks. We propose a new dataset, IoTID20, generated from the dataset in [1]. The new IoT botnet dataset has more comprehensive network and flow-based features. The flow-based features can be used to analyze and evaluate flow-based intrusion detection systems. The proposed IoT botnet dataset provides a reference point for identifying anomalous activity across IoT networks, and it can be accessed from [2]. The new IoTID20 dataset will provide a foundation for the development of new intrusion detection techniques in IoT networks.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
With the continuous expansion of data exchange, the threat of cybercrime and network invasions is also on the rise. This project aims to address these concerns by investigating an innovative approach: an Attentive Transformer Deep Learning Algorithm for Intrusion Detection of IoT Systems using Automatic Xplainable Feature Selection. The primary focus of this project is to develop an effective Intrusion Detection System (IDS) using the aforementioned algorithm. To accomplish this, we used carefully curated datasets created by extracting data from the University of New Brunswick repository. This repository houses the datasets used in this research and can be accessed publicly in order to replicate the findings.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
namely
The DS2OS Dataset is a crucial resource designed for researchers and developers focused on Intrusion Detection Systems (IDS) and security solutions for smart home environments. This dataset was generated within a smart home setting and includes traces from a wide range of Internet of Things (IoT) devices, such as:
The dataset captures the communication between these devices and provides detailed information on network activity. It includes attributes such as:
The dataset consists of 7 malicious classes and one normal class, which are:
This dataset is invaluable for developing and testing intrusion detection techniques tailored for smart home environments and IoT networks.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains 1,300 records of IoT network traffic tailored for Arduino board–based intrusion detection systems. It includes features such as flow duration, packet counts, packet sizes, and network protocols, along with a target label identifying traffic as Normal, DoS, or Probe.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Article Information
The work involved in developing the dataset and benchmarking its use of machine learning is set out in the article ‘IoMT-TrafficData: Dataset and Tools for Benchmarking Intrusion Detection in Internet of Medical Things’. DOI: 10.1109/ACCESS.2024.3437214.
Please do cite the aforementioned article when using this dataset.
Abstract
The increasing importance of securing the Internet of Medical Things (IoMT) due to its vulnerabilities to cyber-attacks highlights the need for an effective intrusion detection system (IDS). In this study, our main objective was to develop a machine learning model for the IoMT to enhance the security of medical devices and protect patients’ private data. To address this issue, we built a scenario that utilised Internet of Things (IoT) and IoMT devices to simulate real-world attacks. We collected and cleaned data, pre-processed it, and fed it into our machine learning model to detect intrusions in the network. Our results revealed significant improvements in all performance metrics, indicating robustness and reproducibility in real-world scenarios. This research has implications in the context of IoMT and cybersecurity, as it helps mitigate vulnerabilities and lowers the number of breaches occurring with the rapid growth of IoMT devices. The use of machine learning algorithms for intrusion detection systems is essential, and our study provides valuable insights and a road map for future research and the deployment of such systems in live environments. By implementing our findings, we can contribute to a safer and more secure IoMT ecosystem, safeguarding patient privacy and ensuring the integrity of medical data.
ZIP Folder Content
The ZIP folder comprises two main components: Captures and Datasets. Within the captures folder, we have included all the captures used in this project. These captures are organized into separate folders corresponding to the type of network analysis: BLE or IP-Based. Similarly, the datasets folder follows a similar organizational approach. It contains datasets categorized by type: BLE, IP-Based Packet, and IP-Based Flows.
To cater to diverse analytical needs, the datasets are provided in two formats: CSV (Comma-Separated Values) and pickle. The CSV format facilitates seamless integration with various data analysis tools, while the pickle format preserves the intricate structures and relationships within the dataset.
This organization enables researchers to easily locate and utilize the specific captures and datasets they require, based on their preferred network analysis type or dataset type. The availability of different formats further enhances the flexibility and usability of the provided data.
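The practical difference between the two formats can be seen in a small round trip. The sketch below writes the same records to a CSV file and a pickle file, then reads both back using only the Python standard library. The file names and sample values are hypothetical; the column names are borrowed from the Bluetooth feature list only for flavour, and the actual files in the ZIP folder may be organized differently.

```python
import csv
import os
import pickle
import tempfile


def round_trip(records):
    """Write records to CSV and pickle files, then read both back."""
    with tempfile.TemporaryDirectory() as tmp:
        csv_path = os.path.join(tmp, "sample.csv")  # hypothetical file name
        pkl_path = os.path.join(tmp, "sample.pkl")  # hypothetical file name

        with open(csv_path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(records[0]))
            writer.writeheader()
            writer.writerows(records)
        with open(pkl_path, "wb") as f:
            pickle.dump(records, f)

        with open(csv_path, newline="") as f:
            from_csv = list(csv.DictReader(f))
        with open(pkl_path, "rb") as f:
            from_pickle = pickle.load(f)
    return from_csv, from_pickle


sample = [{"frame.len": 60, "nordic_ble.crcok": True}]
from_csv, from_pickle = round_trip(sample)
print(from_csv[0])     # every value comes back as a string
print(from_pickle[0])  # the original Python types are preserved
```

CSV is portable across tools but loses type information, so numeric and boolean columns need to be converted after loading. Pickle keeps the types but is Python-specific, and a pickle file should only be loaded when it comes from a trusted source, because unpickling can execute arbitrary code.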
Datasets' Content
Within this dataset, three sub-datasets are available, namely BLE, IP-Based Packet, and IP-Based Flows. Below is a table of the features selected for each dataset and consequently used in the evaluation model within the provided work.
Identified Key Features Within Bluetooth Dataset
| Feature | Meaning |
|---|---|
| btle.advertising_header | BLE Advertising Packet Header |
| btle.advertising_header.ch_sel | BLE Advertising Channel Selection Algorithm |
| btle.advertising_header.length | BLE Advertising Length |
| btle.advertising_header.pdu_type | BLE Advertising PDU Type |
| btle.advertising_header.randomized_rx | BLE Advertising Rx Address |
| btle.advertising_header.randomized_tx | BLE Advertising Tx Address |
| btle.advertising_header.rfu.1 | Reserved For Future 1 |
| btle.advertising_header.rfu.2 | Reserved For Future 2 |
| btle.advertising_header.rfu.3 | Reserved For Future 3 |
| btle.advertising_header.rfu.4 | Reserved For Future 4 |
| btle.control.instant | Instant Value Within a BLE Control Packet |
| btle.crc.incorrect | Incorrect CRC |
| btle.extended_advertising | Advertiser Data Information |
| btle.extended_advertising.did | Advertiser Data Identifier |
| btle.extended_advertising.sid | Advertiser Set Identifier |
| btle.length | BLE Length |
| frame.cap_len | Frame Length Stored Into the Capture File |
| frame.interface_id | Interface ID |
| frame.len | Frame Length Wire |
| nordic_ble.board_id | Board ID |
| nordic_ble.channel | Channel Index |
| nordic_ble.crcok | Indicates if CRC is Correct |
| nordic_ble.flags | Flags |
| nordic_ble.packet_counter | Packet Counter |
| nordic_ble.packet_time | Packet time (start to end) |
| nordic_ble.phy | PHY |
| nordic_ble.protover | Protocol Version |
Identified Key Features Within IP-Based Packets Dataset
| Feature | Meaning |
|---|---|
| http.content_length | Length of content in an HTTP response |
| http.request | HTTP request being made |
| http.response.code | Status code of an HTTP response |
| http.response_number | Sequential number of an HTTP response |
| http.time | Time taken for an HTTP transaction |
| tcp.analysis.initial_rtt | Initial round-trip time for TCP connection |
| tcp.connection.fin | TCP connection termination with a FIN flag |
| tcp.connection.syn | TCP connection initiation with SYN flag |
| tcp.connection.synack | TCP connection establishment with SYN-ACK flags |
| tcp.flags.cwr | Congestion Window Reduced flag in TCP |
| tcp.flags.ecn | Explicit Congestion Notification flag in TCP |
| tcp.flags.fin | FIN flag in TCP |
| tcp.flags.ns | Nonce Sum flag in TCP |
| tcp.flags.res | Reserved flags in TCP |
| tcp.flags.syn | SYN flag in TCP |
| tcp.flags.urg | Urgent flag in TCP |
| tcp.urgent_pointer | Pointer to urgent data in TCP |
| ip.frag_offset | Fragment offset in IP packets |
| eth.dst.ig | IG (individual/group) bit of the Ethernet destination address |
| eth.src.ig | IG (individual/group) bit of the Ethernet source address |
| eth.src.lg | LG (locally/globally administered) bit of the Ethernet source address |
| eth.src_not_group | Flags an Ethernet source address that is a group address, which is not permitted |
| arp.isannouncement | Indicates if an ARP message is an announcement |
Identified Key Features Within IP-Based Flows Dataset
| Feature | Meaning |
|---|---|
| proto | Transport layer protocol of the connection |
| service | Identification of an application protocol |
| orig_bytes | Originator payload bytes |
| resp_bytes | Responder payload bytes |
| history | Connection state history |
| orig_pkts | Originator sent packets |
| resp_pkts | Responder sent packets |
| flow_duration | Length of the flow in seconds |
| fwd_pkts_tot | Forward packets total |
| bwd_pkts_tot | Backward packets total |
| fwd_data_pkts_tot | Forward data packets total |
| bwd_data_pkts_tot | Backward data packets total |
| fwd_pkts_per_sec | Forward packets per second |
| bwd_pkts_per_sec | Backward packets per second |
| flow_pkts_per_sec | Flow packets per second |
| fwd_header_size | Forward header bytes |
| bwd_header_size | Backward header bytes |
| fwd_pkts_payload | Forward payload bytes |
| bwd_pkts_payload | Backward payload bytes |
| flow_pkts_payload | Flow payload bytes |
| fwd_iat | Forward inter-arrival time |
| bwd_iat | Backward inter-arrival time |
| flow_iat | Flow inter-arrival time |
| active | Flow active duration |
https://www.archivemarketresearch.com/privacy-policy
The global intrusion detection system (IDS) market, valued at $284 million in 2025, is projected to experience robust growth, driven by the increasing sophistication of cyber threats and the rising adoption of cloud computing and IoT devices. A Compound Annual Growth Rate (CAGR) of 4.6% from 2025 to 2033 suggests a significant market expansion over the forecast period. Key market drivers include the escalating need for robust cybersecurity measures across diverse sectors, including finance, government, and healthcare, which are increasingly reliant on digital infrastructure and sensitive data. The growing prevalence of advanced persistent threats (APTs) and ransomware attacks necessitates advanced intrusion detection capabilities, fueling market demand. Furthermore, the transition towards cloud-based security solutions is creating opportunities for IDS vendors offering scalable and flexible deployments. While data privacy regulations and the rising cost of implementation pose certain challenges, the overall market outlook remains positive, particularly considering the increasing awareness among organizations regarding potential cybersecurity risks. The market segmentation reveals significant opportunities within specific application areas. The finance sector, with its stringent regulatory compliance requirements and high-value assets, represents a substantial market segment. Government agencies, facing ever-evolving cyber threats, are also investing heavily in advanced IDS solutions. The IT and telecom sector's extensive network infrastructure makes it another key target market. Furthermore, the healthcare industry's increasing reliance on electronic health records and connected medical devices creates a growing demand for robust intrusion detection and prevention capabilities. Competition among established players such as Cisco, IBM, Check Point, and others, coupled with the emergence of innovative startups, ensures a dynamic and evolving market landscape. 
This competitive pressure is expected to drive innovation, resulting in more sophisticated and cost-effective IDS solutions tailored to specific organizational needs.
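For readers who want to see what the stated growth rate implies, the compound-growth arithmetic can be sketched as follows. It assumes the 4.6% CAGR compounds annually over the eight years from 2025 to 2033, starting from the $284 million figure quoted above; the function name `project_value` is hypothetical, and the result is plain arithmetic on the quoted numbers rather than a figure taken from the report.

```python
def project_value(start_value, cagr, years):
    """Compound a starting value at a constant annual growth rate."""
    return start_value * (1.0 + cagr) ** years


# $284 million in 2025, growing at 4.6% per year for 8 years (to 2033).
projected_2033 = project_value(284.0, 0.046, 8)
print(f"Implied 2033 market size: ${projected_2033:.0f} million")
```

Under these assumptions the market would be roughly $407 million in 2033, an increase of about 43% over the 2025 value.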
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is registered at "Registro Oficial de Propiedad Intelectual, Evidencias y Secretos Empresariales de la Universidad de León" under registry number: 2024 - 000004.
The CoAP_UAD dataset is designed to evaluate security in networks using the CoAP protocol. This dataset includes three files containing normal and malicious traffic, focusing on specific vulnerabilities of the CoAP protocol as described in its RFC, such as Denial of Service, Man-in-the-Middle, and Cross-Protocol attacks. It is intended for use in the development and testing of Machine Learning models for Intrusion Detection Systems in IoT environments, both domestic and industrial.
According to a survey conducted in December 2022, almost ** percent of computer users in Japan neither knew the term nor the meaning of an Internet of Things (IoT) device intrusion. IoT devices are nonstandard computing devices that are able to connect with other devices and exchange data via wireless technology. Their intrusion presents a security risk, which can be mitigated by an intrusion detection system (IDS).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
🧠 Project Title
Trustworthy and Ethical AI for Intrusion Detection in Healthcare IoT (IoMT) Systems: An Agentic Decision Loop Framework

📋 Overview
This repository contains the official code, datasets, and configuration setup for the paper submitted to Springer’s Journal of Healthcare Informatics Research (JHIR).
The study presents a multi-agent intrusion detection architecture that integrates:
- A supervised flow-based detector
- A Deep Q-Network (DQN) triage agent
- A NIST AI RMF–aligned ethical rule engine
The framework enables trustworthy, safe, and context-aware intrusion detection in healthcare IoT environments (IoMT).

🏗️ Repository Structure
agentic-ethical-ids-healthcare/
│
├── src/ # Source code for model, rule engine, and agent
│ ├── train_agent.py
│ ├── ethical_engine.py
│ ├── detector_model.py
│ └── utils/
│
├── data/ # Links or sample data subsets
│ ├── CIC-IoMT-2024/
│ └── CSE-CIC-IDS2018/
│
├── notebooks/ # Jupyter notebooks for training and analysis
│
├── models/ # Pretrained model checkpoints (.pth, .pkl)
│
├── results/ # Evaluation outputs and figures
│
├── requirements.txt # Python dependencies
├── LICENSE # MIT License for open research use
└── README.md # Project documentation

⚙️ Setup and Installation
Clone the repository and set up your environment:
git clone https://github.com/ibrahimadabara01/agentic-ethical-ids-healthcare.git
cd agentic-ethical-ids-healthcare
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt

📊 Datasets
This project uses three datasets:
| Dataset | Purpose | Source |
|---|---|---|
| CIC-IoMT 2024 | Primary IoMT intrusion detection dataset | Canadian Institute for Cybersecurity |
| CSE-CIC-IDS2018 | Domain-shift evaluation | CIC Dataset Portal |
| MIMIC-IV (Demo) | Clinical context signals | PhysioNet |
⚠️ Note: All datasets are publicly available. The MIMIC-IV Demo contains only de-identified data.

🚀 How to Reproduce Results
Run the full pipeline (training + evaluation):
python src/train_agent.py --config configs/agentic_ids.yaml
This script:
- Trains the supervised flow-based detector on CIC-IoMT 2024
- Fine-tunes the DQN triage agent
- Evaluates under domain-shift using CSE-CIC-IDS2018
- Computes Ethical Compliance Rate (ECR), False Escalation Rate (FER), and CAS metrics

📈 Key Metrics
| Metric | Description |
|---|---|
| Accuracy | Correct classification rate across all flows |
| F1-Score (Weighted) | Balanced measure of precision and recall |
| Ethical Compliance Rate (ECR) | Percentage of actions consistent with governance rules |
| False Escalation Rate (FER) | Proportion of overreactions (false alarms) |
| Contextual Adaptation Score (CAS) | Robustness under domain-shift |

📘 Citation
If you use this repository, please cite:
Adabara, I. M., et al. (2025). Trustworthy and Ethical AI for Intrusion Detection in Healthcare IoT (IoMT) Systems: An Agentic Decision Loop Framework. Journal of Healthcare Informatics Research, Springer.

🔒 Ethical Compliance
All experiments comply with PhysioNet and HIPAA de-identification standards.
The MIMIC-IV Demo dataset was used under credentialed access and contains no PHI.

🧾 License
This project is released under the MIT License, allowing free use for research and educational purposes.
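Of the key metrics listed for this repository, the weighted F1-score has a standard definition that can be shown in a few lines: the per-class F1 scores are averaged, with each class weighted by its number of true samples. The sketch below is a generic pure-Python illustration with hypothetical labels; it is not taken from the repository's code, and the repository-specific metrics (ECR, FER, CAS) are not reproduced here because their exact formulas are not given.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Average per-class F1 scores, weighted by each class's support."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for cls, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = n_true - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        score += (n_true / total) * f1
    return score


# Hypothetical flow labels: three benign flows and one attack flow.
y_true = ["benign", "benign", "benign", "attack"]
y_pred = ["benign", "benign", "attack", "attack"]
print(round(weighted_f1(y_true, y_pred), 4))
```

Weighting by support keeps the score from being dominated by rare classes, which also means that poor performance on a small attack class can be hidden; per-class scores are worth inspecting alongside it.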
http://guides.library.uq.edu.au/deposit_your_data/terms_and_conditions
NetFlow Version 2 of the datasets is made up of 43 extended NetFlow features. The details of the datasets are published in: Mohanad Sarhan, Siamak Layeghy, and Marius Portmann, Towards a Standard Feature Set for Network Intrusion Detection System Datasets, Mobile Networks and Applications, 103, 108379, 2022. The use of the datasets for academic research purposes is granted in perpetuity after citing the above paper. For commercial purposes, use should be agreed upon with the authors. Please get in touch with the author Mohanad Sarhan for more details.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Overview
The RT-IoT2022, a proprietary dataset derived from a real-time IoT infrastructure, is introduced as a comprehensive resource integrating a diverse range of IoT devices and sophisticated network attack methodologies. This dataset encompasses both normal and adversarial network behaviours, providing a general representation of real-world scenarios. Incorporating data from IoT devices such as ThingSpeak-LED, Wipro-Bulb, and MQTT-Temp, as well as simulated attack scenarios involving Brute-Force SSH attacks, DDoS attacks using Hping and Slowloris, and Nmap patterns, RT-IoT2022 offers a detailed perspective on the complex nature of network traffic. The bidirectional attributes of network traffic are meticulously captured using the Zeek network monitoring tool and the Flowmeter plugin. Researchers can leverage the RT-IoT2022 dataset to advance the capabilities of Intrusion Detection Systems (IDS), fostering the development of robust and adaptive security solutions for real-time IoT networks.
Introductory Paper: Quantized autoencoder (QAE) intrusion detection system for anomaly detection in resource-constrained IoT devices using RT-IoT2022 dataset. By B. S. Sharmila, Rohini Nagapadma. 2023. Published in Cybersecurity.
Variable Table available here: https://archive.ics.uci.edu/dataset/942/rt-iot2022
Column Details: id.orig_p id.resp_p proto service flow_duration fwd_pkts_tot bwd_pkts_tot fwd_data_pkts_tot bwd_data_pkts_tot fwd_pkts_per_sec bwd_pkts_per_sec flow_pkts_per_sec down_up_ratio fwd_header_size_tot fwd_header_size_min fwd_header_size_max bwd_header_size_tot bwd_header_size_min bwd_header_size_max flow_FIN_flag_count flow_SYN_flag_count flow_RST_flag_count fwd_PSH_flag_count bwd_PSH_flag_count flow_ACK_flag_count fwd_URG_flag_count bwd_URG_flag_count flow_CWR_flag_count flow_ECE_flag_count fwd_pkts_payload.min fwd_pkts_payload.max fwd_pkts_payload.tot fwd_pkts_payload.avg fwd_pkts_payload.std bwd_pkts_payload.min bwd_pkts_payload.max bwd_pkts_payload.tot bwd_pkts_payload.avg bwd_pkts_payload.std flow_pkts_payload.min flow_pkts_payload.max flow_pkts_payload.tot flow_pkts_payload.avg flow_pkts_payload.std fwd_iat.min fwd_iat.max fwd_iat.tot fwd_iat.avg fwd_iat.std bwd_iat.min bwd_iat.max bwd_iat.tot bwd_iat.avg bwd_iat.std flow_iat.min flow_iat.max flow_iat.tot flow_iat.avg flow_iat.std payload_bytes_per_second fwd_subflow_pkts bwd_subflow_pkts fwd_subflow_bytes bwd_subflow_bytes fwd_bulk_bytes bwd_bulk_bytes fwd_bulk_packets bwd_bulk_packets fwd_bulk_rate bwd_bulk_rate active.min active.max active.tot active.avg active.std idle.min idle.max idle.tot idle.avg idle.std fwd_init_window_size bwd_init_window_size fwd_last_window_size Attack_type
Class Labels
The dataset contains both attack patterns and normal patterns.
Attack pattern details:
1. DOS_SYN_Hping: 94659
2. ARP_poisioning: 7750
3. NMAP_UDP_SCAN: 2590
4. NMAP_XMAS_TREE_SCAN: 2010
5. NMAP_OS_DETECTION: 2000
6. NMAP_TCP_scan: 1002
7. DDOS_Slowloris: 534
8. Metasploit_Brute_Force_SSH: 37
9. NMAP_FIN_SCAN: 28
Normal pattern details:
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset was prepared and used as a validation set during the evaluation phase of "Meta IDS" to assess the performance of various machine learning models. It is now made available for interested users and researchers who seek a reliable and diverse dataset for training and testing their own custom models.
The validation dataset comprises a comprehensive collection of labeled entries that indicate whether the packet type is "malicious" or "benign." It covers complex design patterns that are commonly encountered in real-world applications. The dataset is designed to be representative, encompassing edge and fog layers that are in contact with the cloud layer, thereby enabling thorough testing and evaluation of different models. Each sample in the dataset is labeled with the corresponding ground truth, providing a reliable reference for model performance evaluation.
To ensure convenient distribution and storage, the dataset has been broken down into three separate batches, each containing a portion of the dataset. This allows for convenient downloading and management of the dataset. The three batches are provided as individual compressed files.
To extract the data, follow these instructions:
Once uncompressed, you will have access to the dataset in its original format for further exploration, analysis, and model training. The storage required for extraction is approximately 800 GB in total, with the first batch requiring approximately 302 GB, the second batch approximately 203 GB, and the third batch approximately 297 GB.
The first batch contains 1,049,527,992 entries, the second batch contains 711,043,331 entries, and the third and last batch contains 1,029,303,062 entries. The following table provides the feature names along with their explanation and an example value once the dataset is extracted.
| Feature | Description | Example Value |
|---|---|---|
| ip.src | Source IP address in the packet | a05d4ecc38da01406c9635ec694917e969622160e728495e3169f62822444e17 |
| ip.dst | Destination IP address in the packet | a52db0d87623d8a25d0db324d74f0900deb5ca4ec8ad9f346114db134e040ec5 |
| frame.time_epoch | Epoch time of the frame | 1676165569.930869 |
| arp.hw.type | Hardware type | 1 |
| arp.hw.size | Hardware size | 6 |
| arp.proto.size | Protocol size | 4 |
| arp.opcode | Opcode | 2 |
| data.len | Length | 2713 |
| eth.dst.lg | Destination LG bit | 1 |
| eth.dst.ig | Destination IG bit | 1 |
| eth.src.lg | Source LG bit | 1 |
| eth.src.ig | Source IG bit | 1 |
| frame.offset_shift | Time shift for this packet | 0 |
| frame.len | Frame length on the wire | 1208 |
| frame.cap_len | Frame length stored into the capture file | 215 |
| frame.marked | Frame is marked | 0 |
| frame.ignored | Frame is ignored | 0 |
| frame.encap_type | Encapsulation type | 1 |
| gre | Generic Routing Encapsulation | 'Generic Routing Encapsulation (IP)' |
| ip.version | Version | 6 |
| ip.hdr_len | Header length | 24 |
| ip.dsfield.dscp | Differentiated Services Codepoint | 56 |
| ip.dsfield.ecn | Explicit Congestion Notification | 2 |
| ip.len | Total length | 614 |
| ip.flags.rb | Reserved bit | 0 |
| ip.flags.df | Don't fragment | 1 |
| ip.flags.mf | More fragments | 0 |
| ip.frag_offset | Fragment offset | 0 |
| ip.ttl | Time to live | 31 |
| ip.proto | Protocol | 47 |
| ip.checksum.status | Header checksum status | 2 |
| tcp.srcport | TCP source port | 53425 |
| tcp.flags | Flags | 0x00000098 |
| tcp.flags.ns | Nonce | 0 |
| tcp.flags.cwr | Congestion Window Reduced (CWR) | 1 |
| udp.srcport | UDP source port | 64413 |
| udp.dstport | UDP destination port | 54087 |
| udp.stream | Stream index | 1345 |
| udp.length | Length | 225 |
| udp.checksum.status | Checksum status | 3 |
| packet_type | Type of the packet which is either "benign" or "malicious" | 0 |
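Some fields in the table pack several values into one number. As an illustration, the `tcp.flags` example value `0x00000098` can be unpacked into individual flag names using the standard TCP flag bit positions. The sketch below is not part of the dataset tooling: the helper name `decode_tcp_flags` and the assumption that the field arrives as a hexadecimal string are hypothetical.

```python
# Standard TCP flag bit masks (lowest bit first). The names mirror the
# tcp.flags.* columns in the table, e.g. tcp.flags.ns and tcp.flags.cwr.
TCP_FLAG_BITS = {
    "fin": 0x001,
    "syn": 0x002,
    "rst": 0x004,
    "psh": 0x008,
    "ack": 0x010,
    "urg": 0x020,
    "ece": 0x040,
    "cwr": 0x080,
    "ns": 0x100,
}


def decode_tcp_flags(value):
    """Return the names of the flags set in a tcp.flags value.

    Accepts either an int or a hex string such as "0x00000098"
    (assumed representation; adjust if the extracted files differ).
    """
    if isinstance(value, str):
        value = int(value, 16)
    return [name for name, bit in TCP_FLAG_BITS.items() if value & bit]


print(decode_tcp_flags("0x00000098"))  # ['psh', 'ack', 'cwr']
```

The decoded result agrees with the separate example values in the table: `tcp.flags.cwr` is 1 and `tcp.flags.ns` is 0.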
Furthermore, in compliance with the GDPR and to ensure the privacy of individuals, all IP addresses present in the dataset have been anonymized through hashing. This anonymization process helps protect the identity of individuals while preserving the integrity and utility of the dataset for research and model development purposes.
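The description does not name the hash function used. The example `ip.src` and `ip.dst` values are 64 hexadecimal characters long, which is consistent with a 256-bit digest, so the following minimal sketch assumes SHA-256. The function name `anonymize_ip` and the optional `salt` parameter are hypothetical and only show how such anonymization could be done.

```python
import hashlib


def anonymize_ip(ip: str, salt: str = "") -> str:
    """Map an IP address string to a fixed-length hexadecimal digest.

    Assumption: SHA-256 over the UTF-8 encoded address, optionally
    prefixed with a secret salt. The dataset's actual scheme may differ.
    """
    return hashlib.sha256((salt + ip).encode("utf-8")).hexdigest()


# 192.0.2.1 is an address reserved for documentation (RFC 5737).
digest = anonymize_ip("192.0.2.1")
print(len(digest))  # 64 hexadecimal characters
```

Because the mapping is deterministic, the same address always yields the same digest, so flows between hosts can still be grouped and counted without exposing the original addresses.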
Please note that while the dataset provides valuable insights and a solid foundation for machine learning tasks, it is not a substitute for extensive real-world data collection. However, it serves as a valuable resource for researchers, practitioners, and enthusiasts in the machine learning community, offering a compliant and anonymized dataset for developing and validating custom models in a specific problem domain.
By leveraging the validation dataset for machine learning model evaluation and custom model training, users can accelerate their research and development efforts, building upon the knowledge gained from my thesis while contributing to the advancement of the field.
According to a survey conducted in December 2022, more than ** percent of smartphone users in Japan knew neither the term nor the meaning of an Internet of Things (IoT) device intrusion. IoT devices are nonstandard computing devices that can connect to other devices and exchange data via wireless technology. Their intrusion presents a security risk, which can be mitigated by an intrusion detection system (IDS).
The Wireless Intrusion Detection System (WIDS) market is booming, projected to reach $202.7 million in 2025 with a 10.2% CAGR. Discover key drivers, trends, and regional insights for this rapidly expanding sector, dominated by Cisco, IBM, and Check Point. Learn about market segmentation, growth forecasts, and competitive landscape analysis.