http://www.gnu.org/licenses/lgpl-3.0.html
The dataset was introduced by the following researchers: E. C. P. Neto, S. Dadkhah, R. Ferreira, A. Zohourian, R. Lu, A. A. Ghorbani. "CICIoT2023: A real-time dataset and benchmark for large-scale attacks in IoT environment," Sensor (2023) – (submitted to Journal of Sensors). The data contains different kinds of IoT intrusions, in the following categories: DDoS, Brute Force, Spoofing, DoS, Recon, Web-based, Mirai.
Each intrusion category has several subcategories. The dataset contains 1191264 network instances, each described by 47 features. It can be used to build predictive models that detect different kinds of intrusive attacks, and it is also suitable for designing intrusion detection systems (IDS).
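The relationship between subcategories and the seven categories listed above can be sketched as a simple lookup. The subcategory strings below are placeholders invented for illustration, not confirmed label values from the dataset; the actual label column should be inspected before reusing this idea.

```python
from collections import Counter

# Hypothetical subcategory labels mapped to the seven categories listed above.
# The keys are placeholders for illustration only.
CATEGORY_OF = {
    "ddos_example_flood": "DDoS",
    "dos_example_flood": "DoS",
    "recon_example_scan": "Recon",
    "web_example_injection": "Web-based",
    "bruteforce_example": "Brute Force",
    "spoofing_example": "Spoofing",
    "mirai_example_flood": "Mirai",
}


def to_category(subcategory: str) -> str:
    """Return the coarse category for a subcategory, or 'Unknown' if unmapped."""
    return CATEGORY_OF.get(subcategory, "Unknown")


labels = ["ddos_example_flood", "mirai_example_flood", "ddos_example_flood", "other"]
category_counts = Counter(to_category(label) for label in labels)
print(category_counts)
```

Collapsing labels this way lets the same data serve either a fine-grained or a seven-class classification task.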
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
building IoT IDS requires the availability of datasets to process
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
The CIC IoT Dataset 2023 is a comprehensive benchmark developed by the Canadian Institute for Cybersecurity (CIC) to advance intrusion detection research in real-world Internet of Things (IoT) environments. This dataset was created using a network of 105 actual IoT devices, encompassing smart home gadgets, sensors, and cameras, to simulate authentic IoT traffic and attack scenarios.
Key Features:
Diverse Attack Scenarios: The dataset includes 33 distinct attacks categorized into seven classes: DDoS, DoS, Reconnaissance, Web-based, Brute Force, Spoofing, and Mirai. These attacks were executed by compromised IoT devices targeting other IoT devices, reflecting realistic threat vectors. (University of New Brunswick)
Extensive Data Collection: Network traffic was captured in real-time, resulting in over 46 million records. The data is available in various formats, including raw PCAP files and pre-extracted CSV features, facilitating different research needs.
Realistic IoT Topology: Unlike many datasets that rely on simulations, this dataset was generated using a large-scale IoT testbed with devices from multiple vendors, providing a heterogeneous and realistic network environment.
Benchmarking and Evaluation: The dataset has been utilized to evaluate the performance of machine learning and deep learning algorithms in classifying and detecting malicious versus benign IoT network traffic. (University of New Brunswick)
This dataset serves as a valuable resource for researchers and practitioners aiming to develop and test security analytics applications, intrusion detection systems, and other cybersecurity solutions tailored for IoT ecosystems. (University of New Brunswick)
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
including some laptops or smart phones
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The exponential growth of Internet of Things (IoT) devices provides a large attack surface for intruders to launch more destructive cyber-attacks. Intruders aim to exhaust the target IoT network's resources with malicious activity. New techniques and detection algorithms require a well-designed dataset for IoT networks. We propose a new dataset, IoTID20, generated from the dataset in [1]. The new IoT botnet dataset has more comprehensive network and flow-based features. The flow-based features can be used to analyze and evaluate a flow-based intrusion detection system. Our proposed IoT botnet dataset will provide a reference point for identifying anomalous activity across IoT networks. The IoT Botnet dataset can be accessed from [2]. The new IoTID20 dataset will provide a foundation for the development of new intrusion detection techniques in IoT networks.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The work involved in developing the dataset and benchmarking its use of machine learning is set out in the article ‘IoMT-TrafficData: Dataset and Tools for Benchmarking Intrusion Detection in Internet of Medical Things’. DOI: 10.1109/ACCESS.2024.3437214.
Please do cite the aforementioned article when using this dataset.
The increasing importance of securing the Internet of Medical Things (IoMT) due to its vulnerabilities to cyber-attacks highlights the need for an effective intrusion detection system (IDS). In this study, our main objective was to develop a machine learning model for the IoMT to enhance the security of medical devices and protect patients’ private data. To address this issue, we built a scenario that used Internet of Things (IoT) and IoMT devices to simulate real-world attacks. We collected, cleaned, and pre-processed the data, then fed it into our machine learning model to detect intrusions in the network. Our results revealed significant improvements in all performance metrics, indicating robustness and reproducibility in real-world scenarios. This research has implications in the context of IoMT and cybersecurity, as it helps mitigate vulnerabilities and lowers the number of breaches occurring with the rapid growth of IoMT devices. The use of machine learning algorithms for intrusion detection systems is essential, and our study provides valuable insights and a road map for future research and the deployment of such systems in live environments. By implementing our findings, we can contribute to a safer and more secure IoMT ecosystem, safeguarding patient privacy and ensuring the integrity of medical data.
The ZIP folder comprises two main components: Captures and Datasets. Within the captures folder, we have included all the captures used in this project. These captures are organized into separate folders corresponding to the type of network analysis: BLE or IP-Based. Similarly, the datasets folder follows a similar organizational approach. It contains datasets categorized by type: BLE, IP-Based Packet, and IP-Based Flows.
To cater to diverse analytical needs, the datasets are provided in two formats: CSV (Comma-Separated Values) and pickle. The CSV format facilitates seamless integration with various data analysis tools, while the pickle format preserves the intricate structures and relationships within the dataset.
This organization enables researchers to easily locate and utilize the specific captures and datasets they require, based on their preferred network analysis type or dataset type. The availability of different formats further enhances the flexibility and usability of the provided data.
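The practical difference between the two formats can be sketched with a small round trip. The column names and values below are made up for illustration and are not taken from the dataset files.

```python
import os
import tempfile

import pandas as pd

# Toy frame with a categorical column and a timestamp column (made-up values).
df = pd.DataFrame({
    "proto": pd.Categorical(["tcp", "udp", "tcp"]),
    "ts": pd.to_datetime(["2023-01-01", "2023-01-02", "2023-01-03"]),
    "orig_bytes": [10, 20, 30],
})

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "sample.csv")
    pkl_path = os.path.join(tmp, "sample.pkl")
    df.to_csv(csv_path, index=False)
    df.to_pickle(pkl_path)
    from_csv = pd.read_csv(csv_path)
    from_pkl = pd.read_pickle(pkl_path)

# The pickle round trip restores the original dtypes; a plain read_csv does
# not know that "proto" was categorical or that "ts" held timestamps.
print(from_csv.dtypes)
print(from_pkl.dtypes)
```

Pickle files can execute arbitrary code when loaded, so they should only be read from sources that are trusted.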
Within this dataset, three sub-datasets are available, namely BLE, IP-Based Packet, and IP-Based Flows. Below is a table of the features selected for each dataset and consequently used in the evaluation model within the provided work.
Identified Key Features Within Bluetooth Dataset
Feature | Meaning |
btle.advertising_header | BLE Advertising Packet Header |
btle.advertising_header.ch_sel | BLE Advertising Channel Selection Algorithm |
btle.advertising_header.length | BLE Advertising Length |
btle.advertising_header.pdu_type | BLE Advertising PDU Type |
btle.advertising_header.randomized_rx | BLE Advertising Rx Address |
btle.advertising_header.randomized_tx | BLE Advertising Tx Address |
btle.advertising_header.rfu.1 | Reserved For Future 1 |
btle.advertising_header.rfu.2 | Reserved For Future 2 |
btle.advertising_header.rfu.3 | Reserved For Future 3 |
btle.advertising_header.rfu.4 | Reserved For Future 4 |
btle.control.instant | Instant Value Within a BLE Control Packet |
btle.crc.incorrect | Incorrect CRC |
btle.extended_advertising | Advertiser Data Information |
btle.extended_advertising.did | Advertiser Data Identifier |
btle.extended_advertising.sid | Advertiser Set Identifier |
btle.length | BLE Length |
frame.cap_len | Frame Length Stored Into the Capture File |
frame.interface_id | Interface ID |
frame.len | Frame Length Wire |
nordic_ble.board_id | Board ID |
nordic_ble.channel | Channel Index |
nordic_ble.crcok | Indicates if CRC is Correct |
nordic_ble.flags | Flags |
nordic_ble.packet_counter | Packet Counter |
nordic_ble.packet_time | Packet time (start to end) |
nordic_ble.phy | PHY |
nordic_ble.protover | Protocol Version |
Identified Key Features Within IP-Based Packets Dataset
Feature | Meaning |
http.content_length | Length of content in an HTTP response |
http.request | HTTP request being made |
http.response.code | HTTP response status code |
http.response_number | Sequential number of an HTTP response |
http.time | Time taken for an HTTP transaction |
tcp.analysis.initial_rtt | Initial round-trip time for TCP connection |
tcp.connection.fin | TCP connection termination with a FIN flag |
tcp.connection.syn | TCP connection initiation with SYN flag |
tcp.connection.synack | TCP connection establishment with SYN-ACK flags |
tcp.flags.cwr | Congestion Window Reduced flag in TCP |
tcp.flags.ecn | Explicit Congestion Notification flag in TCP |
tcp.flags.fin | FIN flag in TCP |
tcp.flags.ns | Nonce Sum flag in TCP |
tcp.flags.res | Reserved flags in TCP |
tcp.flags.syn | SYN flag in TCP |
tcp.flags.urg | Urgent flag in TCP |
tcp.urgent_pointer | Pointer to urgent data in TCP |
ip.frag_offset | Fragment offset in IP packets |
eth.dst.ig | IG bit of the Ethernet destination address (individual vs. group address) |
eth.src.ig | IG bit of the Ethernet source address (individual vs. group address) |
eth.src.lg | LG bit of the Ethernet source address (globally unique vs. locally administered) |
eth.src_not_group | Flags a source MAC address that is a group address, which is not expected |
arp.isannouncement | Indicates if an ARP message is an announcement |
Identified Key Features Within IP-Based Flows Dataset
Feature | Meaning |
proto | Transport layer protocol of the connection |
service | Identification of an application protocol |
orig_bytes | Originator payload bytes |
resp_bytes | Responder payload bytes |
history | Connection state history |
orig_pkts | Originator sent packets |
resp_pkts | Responder sent packets |
flow_duration | Length of the flow in seconds |
fwd_pkts_tot | Forward packets total |
bwd_pkts_tot | Backward packets total |
fwd_data_pkts_tot | Forward data packets total |
bwd_data_pkts_tot | Backward data packets total |
fwd_pkts_per_sec | Forward packets per second |
bwd_pkts_per_sec | Backward packets per second |
flow_pkts_per_sec | Flow packets per second |
fwd_header_size | Forward header bytes |
bwd_header_size | Backward header bytes |
fwd_pkts_payload | Forward payload bytes |
bwd_pkts_payload | Backward payload bytes |
flow_pkts_payload | Flow payload bytes |
fwd_iat | Forward inter-arrival time |
bwd_iat | Backward inter-arrival time |
flow_iat | Flow inter-arrival time |
active | Flow active duration |
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
With the continuous expansion of data exchange, the threat of cybercrime and network invasions is also on the rise. This project aims to address these concerns by investigating an innovative approach: an Attentive Transformer Deep Learning Algorithm for Intrusion Detection of IoT Systems using Automatic Xplainable Feature Selection. The primary focus of this project is to develop an effective Intrusion Detection System (IDS) using this algorithm. To accomplish this, the datasets were curated from data extracted from the University of New Brunswick repository. This repository houses the datasets used in this research and can be accessed publicly in order to replicate its findings.
ABSTRACT In this project, we propose a new comprehensive realistic cyber security dataset of IoT and IIoT applications, called Edge-IIoTset, which can be used by machine learning-based intrusion detection systems in two different modes, namely, centralized and federated learning. Specifically, the proposed testbed is organized into seven layers: Cloud Computing Layer, Network Functions Virtualization Layer, Blockchain Network Layer, Fog Computing Layer, Software-Defined Networking Layer, Edge Computing Layer, and IoT and IIoT Perception Layer. In each layer, we propose new emerging technologies that satisfy the key requirements of IoT and IIoT applications, such as the ThingsBoard IoT platform, OPNFV platform, Hyperledger Sawtooth, Digital twin, ONOS SDN controller, Mosquitto MQTT brokers, Modbus TCP/IP, etc. The IoT data are generated from various IoT devices (more than 10 types), such as low-cost digital sensors for sensing temperature and humidity, Ultrasonic sensor, Water level detection sensor, pH Sensor Meter, Soil Moisture sensor, Heart Rate Sensor, Flame Sensor, etc. We identify and analyze fourteen attacks related to IoT and IIoT connectivity protocols, which are categorized into five threats: DoS/DDoS attacks, Information gathering, Man in the middle attacks, Injection attacks, and Malware attacks. In addition, we extract features obtained from different sources, including alerts, system resources, logs, and network traffic, and propose 61 new features with high correlations from 1176 found features. After processing and analyzing the proposed realistic cyber security dataset, we provide a primary exploratory data analysis and evaluate the performance of machine learning approaches (i.e., traditional machine learning as well as deep learning) in both centralized and federated learning modes.
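To make the difference between the two modes concrete: in centralized learning one model is trained on pooled data, whereas in federated learning each client trains locally and only model parameters are combined. A minimal sketch of one common aggregation rule, sample-weighted averaging in the style of FedAvg, is shown below. The abstract does not state which aggregation rule the authors used, so this is an assumption, and the parameter values are made up.

```python
from typing import List, Sequence


def federated_average(client_weights: Sequence[Sequence[float]],
                      client_sizes: Sequence[int]) -> List[float]:
    """Average client parameter vectors, weighted by local sample counts."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    averaged = [0.0] * n_params
    for weights, size in zip(client_weights, client_sizes):
        for i, w in enumerate(weights):
            averaged[i] += w * size / total
    return averaged


# Two hypothetical edge clients with made-up parameters and sample counts.
global_weights = federated_average([[1.0, 1.0], [3.0, 3.0]], [1, 3])
print(global_weights)
```

Weighting by sample count means a client holding more traffic records pulls the global model further towards its local parameters.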
Instructions:
Great news! The Edge-IIoT dataset has been featured as a "Document in the top 1% of Web of Science." This indicates that it is ranked within the top 1% of all publications indexed by the Web of Science (WoS) in terms of citations and impact.
Please kindly visit kaggle link for the updates: https://www.kaggle.com/datasets/mohamedamineferrag/edgeiiotset-cyber-sec...
Free use of the Edge-IIoTset dataset for academic research purposes is hereby granted in perpetuity. Use for commercial purposes is allowable after asking the lead author, Dr Mohamed Amine Ferrag, who has asserted his right under the Copyright.
The details of the Edge-IIoT dataset were published in the following paper. For academic/public use of these datasets, users must cite the following paper:
Mohamed Amine Ferrag, Othmane Friha, Djallel Hamouda, Leandros Maglaras, Helge Janicke, "Edge-IIoTset: A New Comprehensive Realistic Cyber Security Dataset of IoT and IIoT Applications for Centralized and Federated Learning", IEEE Access, April 2022 (IF: 3.37), DOI: 10.1109/ACCESS.2022.3165809
Link to paper : https://ieeexplore.ieee.org/document/9751703
The directories of the Edge-IIoTset dataset include the following:
•File 1 (Normal traffic)
-File 1.1 (Distance): This file includes two documents, namely, Distance.csv and Distance.pcap. The IoT sensor (Ultrasonic sensor) is used to capture the IoT data.
-File 1.2 (Flame_Sensor): This file includes two documents, namely, Flame_Sensor.csv and Flame_Sensor.pcap. The IoT sensor (Flame Sensor) is used to capture the IoT data.
-File 1.3 (Heart_Rate): This file includes two documents, namely, Heart_Rate.csv and Heart_Rate.pcap. The IoT sensor (Heart Rate Sensor) is used to capture the IoT data.
-File 1.4 (IR_Receiver): This file includes two documents, namely, IR_Receiver.csv and IR_Receiver.pcap. The IoT sensor (IR (Infrared) Receiver Sensor) is used to capture the IoT data.
-File 1.5 (Modbus): This file includes two documents, namely, Modbus.csv and Modbus.pcap. The IoT sensor (Modbus Sensor) is used to capture the IoT data.
-File 1.6 (phValue): This file includes two documents, namely, phValue.csv and phValue.pcap. The IoT sensor (pH-sensor PH-4502C) is used to capture the IoT data.
-File 1.7 (Soil_Moisture): This file includes two documents, namely, Soil_Moisture.csv and Soil_Moisture.pcap. The IoT sensor (Soil Moisture Sensor v1.2) is used to capture the IoT data.
-File 1.8 (Sound_Sensor): This file includes two documents, namely, Sound_Sensor.csv and Sound_Sensor.pcap. The IoT sensor (LM393 Sound Detection Sensor) is used to capture the IoT data.
-File 1.9 (Temperature_and_Humidity): This file includes two documents, namely, Temperature_and_Humidity.csv and Temperature_and_Humidity.pcap. The IoT sensor (DHT11 Sensor) is used to capture the IoT data.
-File 1.10 (Water_Level): This file includes two documents, namely, Water_Level.csv and Water_Level.pcap. The IoT sensor (Water sensor) is used to capture the IoT data.
•File 2 (Attack traffic):
-File 2.1 (Attack traffic (CSV files)): This file includes 14 documents, namely, Backdoor_attack.csv, DDoS_HTTP_Flood_attack.csv, DDoS_ICMP_Flood_attack.csv, DDoS_TCP_SYN_Flood_attack.csv, DDoS_UDP_Flood_attack.csv, MITM_attack.csv, OS_Fingerprinting_attack.csv, Password_attack.csv, Port_Scanning_attack.csv, Ransomware_attack.csv, SQL_injection_attack.csv, Uploading_attack.csv, Vulnerability_scanner_attack.csv, XSS_attack.csv. Each document is specific to one attack.
-File 2.2 (Attack traffic (PCAP files)): This file includes 14 documents, namely, Backdoor_attack.pcap, DDoS_HTTP_Flood_attack.pcap, DDoS_ICMP_Flood_attack.pcap, DDoS_TCP_SYN_Flood_attack.pcap, DDoS_UDP_Flood_attack.pcap, MITM_attack.pcap, OS_Fingerprinting_attack.pcap, Password_attack.pcap, Port_Scanning_attack.pcap, Ransomware_attack.pcap, SQL_injection_attack.pcap, Uploading_attack.pcap, Vulnerability_scanner_attack.pcap, XSS_attack.pcap. Each document is specific to one attack.
•File 3 (Selected dataset for ML and DL):
-File 3.1 (DNN-EdgeIIoT-dataset): This file contains a selected dataset for the use of evaluating deep learning-based intrusion detection systems.
-File 3.2 (ML-EdgeIIoT-dataset): This file contains a selected dataset for the use of evaluating traditional machine learning-based intrusion detection systems.
Step 1: Downloading the Edge-IIoTset dataset from the Kaggle platform:
from google.colab import files
!pip install -q kaggle
files.upload()
!mkdir ~/.kaggle
!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json
!kaggle datasets download -d mohamedamineferrag/edgeiiotset-cyber-security-dataset-of-iot-iiot -f "Edge-IIoTset dataset/Selected dataset for ML and DL/DNN-EdgeIIoT-dataset.csv"
!unzip DNN-EdgeIIoT-dataset.csv.zip
!rm DNN-EdgeIIoT-dataset.csv.zip
Step 2: Reading the dataset's CSV file into a Pandas DataFrame:
import pandas as pd
import numpy as np
df = pd.read_csv('DNN-EdgeIIoT-dataset.csv', low_memory=False)
Step 3: Exploring some of the DataFrame's contents:
df.head(5)
print(df['Attack_type'].value_counts())
Step 4: Dropping data (columns, duplicated rows, NaN, null values):
from sklearn.utils import shuffle
drop_columns = ["frame.time", "ip.src_host", "ip.dst_host", "arp.src.proto_ipv4", "arp.dst.proto_ipv4",
                "http.file_data", "http.request.full_uri", "icmp.transmit_timestamp",
                "http.request.uri.query", "tcp.options", "tcp.payload", "tcp.srcport",
                "tcp.dstport", "udp.port", "mqtt.msg"]
df.drop(drop_columns, axis=1, inplace=True)
df.dropna(axis=0, how='any', inplace=True)
df.drop_duplicates(subset=None, keep="first", inplace=True)
df = shuffle(df)
df.isna().sum()
print(df['Attack_type'].value_counts())
Step 5: Categorical data encoding (Dummy Encoding):
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn import preprocessing
def encode_text_dummy(df, name):
    # Replace column `name` with one indicator column per distinct value
    dummies = pd.get_dummies(df[name])
    for x in dummies.columns:
        dummy_name = f"{name}-{x}"
        df[dummy_name] = dummies[x]
    df.drop(name, axis=1, inplace=True)
encode_text_dummy(df,'http.request.method')
encode_text_dummy(df,'http.referer')
encode_text_dummy(df,"http.request.version")
encode_text_dummy(df,"dns.qry.name.len")
encode_text_dummy(df,"mqtt.conack.flags")
encode_text_dummy(df,"mqtt.protoname")
encode_text_dummy(df,"mqtt.topic")
Step 6: Creation of the preprocessed dataset:
df.to_csv('preprocessed_DNN.csv', encoding='utf-8')
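Step 5 imports train_test_split and StandardScaler, but the listed steps stop before using them. One possible continuation is sketched below on a toy frame. Apart from Attack_type, the column names and label values are made up, and the 80/20 ratio is an assumption rather than the authors' setting.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Toy stand-in for the preprocessed frame; "f1" and "f2" are made-up features.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "f1": rng.normal(size=100),
    "f2": rng.normal(size=100),
    "Attack_type": ["Normal"] * 60 + ["DDoS_UDP"] * 40,
})

X = toy.drop(columns=["Attack_type"]).to_numpy(dtype=float)
y = LabelEncoder().fit_transform(toy["Attack_type"])

# Stratify so both splits keep the same class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Fit the scaler on the training split only to avoid leaking test statistics.
scaler = StandardScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
print(X_train.shape, X_test.shape)
```

Fitting the scaler before splitting would let test-set statistics influence training, which tends to inflate reported detection scores.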
For more information about the dataset, please contact the lead author of this project, Dr Mohamed Amine Ferrag, on his email: mohamed.amine.ferrag@gmail.com
More information about Dr. Mohamed Amine Ferrag is available at:
https://www.linkedin.com/in/Mohamed-Amine-Ferrag
https://dblp.uni-trier.de/pid/142/9937.html
https://www.researchgate.net/profile/Mohamed_Amine_Ferrag
https://scholar.google.fr/citations?user=IkPeqxMAAAAJ&hl=fr&oi=ao
https://www.scopus.com/authid/detail.uri?authorId=56115001200
https://publons.com/researcher/1322865/mohamed-amine-ferrag/
https://orcid.org/0000-0002-0632-3172
Last Updated: 27 Mar. 2023
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
The CICIoT2023 dataset is a comprehensive and modern dataset designed for research in Internet of Things (IoT) security, particularly for intrusion detection and anomaly detection systems. Released by the Canadian Institute for Cybersecurity (CIC), this dataset reflects real-world IoT network traffic and attack scenarios, providing a valuable resource for machine learning and cybersecurity research.
The dataset was generated using a realistic testbed that simulates various IoT devices communicating over a network, including smart TVs, webcams, smart thermostats, and wearable devices. It captures both benign traffic and a wide variety of attack types such as Denial of Service (DoS), Distributed Denial of Service (DDoS), brute-force attacks, botnets, reconnaissance, and more advanced threats.
Key Features of CICIoT2023:
Contains a mix of normal and malicious IoT network traffic.
Includes 34 distinct attack types, covering modern and advanced cyber threat scenarios.
Provides labeled data suitable for supervised machine learning models.
Offers extracted network flow features (e.g., packet size, duration, flags, statistical summaries) which can be used for traffic classification and anomaly detection.
Supports research in intrusion detection, anomaly detection, and IoT security strategy development.
This dataset helps bridge the gap between traditional network security datasets and the unique, evolving patterns of IoT device communication, making it an excellent benchmark for evaluating the performance of AI-based security solutions.
I have further broken the data down into three parts: Train: (5491971, 47), Validation: (1176851, 47), Test: (1176851, 47).
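The shapes above correspond to roughly a 70/15/15 split. The source does not say how the split was produced; the sketch below shows one way to obtain those proportions with two successive calls to train_test_split on a toy array.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy array standing in for the full table (1000 rows, 47 columns).
data = np.arange(1000 * 47).reshape(1000, 47)

n_holdout = round(0.15 * len(data))  # rows for validation and for test

# First carve off the test set, then the validation set from the remainder.
rest, test = train_test_split(data, test_size=n_holdout, random_state=0)
train, val = train_test_split(rest, test_size=n_holdout, random_state=0)
print(train.shape, val.shape, test.shape)
```

Passing an integer test_size fixes the exact row count of each held-out part, which avoids rounding surprises when the second split is taken from the reduced remainder.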
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data set includes attack implementations in an Internet of Things (IoT) context. The IoT nodes use Contiki-NG as their operating system, and the data is collected from the Cooja simulation environment, where a large number of network topologies are created. Blackhole and DIS-flooding attacks are implemented to attack the RPL routing protocol. The dataset includes log file output from the Cooja simulator and a pre-processed feature set as input to an intrusion detection model.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dragon_Pi is an intrusion detection dataset for IoT devices. In the field of IoT security there are few datasets, and those which do exist tend to focus solely on network traffic. The Dragon_Pi dataset seeks to provide not only more data for the field of IoT security, but also, data of a somewhat under-published type: linear time series power consumption data.
Dragon_Pi is a fully labelled Intrusion Detection dataset for IoT devices. It is composed of both normal and under-attack power consumption data obtained from two separate testbeds - one using a DragonBoard 410c and the other a Raspberry Pi Model 3 - Hence the moniker Dragon_Pi.
These testbeds were set up with predefined normal behaviour as described in the attached publications. The normal linear time series power consumption was sampled from the testbed under these normal conditions. Both testbeds were then attacked using some common attacks on IoT, and the linear time series power consumption was captured under these conditions as well.
Specifically, the testbeds were subjected to Port Scan (using Nmap), SSH Brute Force (using Hydra) and SYNFlood Denial of Service (using Hping3) attacks. These attacks were repeated to gain insight into what their signatures looked like and how varying the tool settings affected the resultant signature. A fourth type of scenario was also conducted on the testbeds: the "Capture the Flag" scenarios. In these files multiple attack types were used with a more specific target, namely to exfiltrate a hidden file from the testbeds.
Each file has three hierarchical levels of annotation for each sample within:
Users can decide for themselves what level of annotation they require for their specific task.
Each file in the Dragon_Pi dataset is accompanied by its own legend file. This file explains the contents of the specific .csv file and the specific indexes of the events within.
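As an illustration of how event indexes from a legend file could be turned into per-sample labels, the sketch below expands index ranges into a label list. The (start, end, label) layout and the label names are assumptions made for this example; the actual legend files may be organised differently.

```python
from typing import List, Tuple


def labels_from_events(n_samples: int,
                       events: List[Tuple[int, int, str]],
                       default: str = "normal") -> List[str]:
    """Expand (start, end, label) index ranges into one label per sample.

    Ranges are treated as half-open: start is included, end is excluded.
    """
    labels = [default] * n_samples
    for start, end, label in events:
        for i in range(max(start, 0), min(end, n_samples)):
            labels[i] = label
    return labels


# Made-up event list for a 10-sample trace.
sample_labels = labels_from_events(10, [(2, 5, "port_scan"), (7, 9, "ssh_brute_force")])
print(sample_labels)
```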
The Dragon_Pi dataset consists of approximately 67 files, as shown in Table 1. Compressed, the dataset totals approximately 13 GB. Completely decompressed, the dataset is approximately 80 GB (30 GB Pi data, 50 GB Dragon data).
Label Type | Specific Label | Number of Files DragonBoard 410c | Number of Files Raspberry Pi |
Normal | Normal | 3 | 2 |
Port Scan Attack | Nmap_T5 | 2 | 1 |
Port Scan Attack | Nmap_T4 | 1 | 1 |
Port Scan Attack | Nmap_T3 | 1 | 1 |
Port Scan Attack | Nmap_T2 | 1 | 1 |
SSH Brute Force | Hydra_T32 | 4 | 2 |
SSH Brute Force | Hydra_T16 | 16 | 2 |
SSH Brute Force | Hydra_T3 | 8 | 2 |
SSH Brute Force | Hydra_T1 | 5 | 2 |
SYNFlood DOS | SYNFlood DOS | 1 | 1 |
Capture the Flag | Misc Attacks | 3 | 5 |
Datasets as described in the research paper "Intrusion Detection using Network Traffic Profiling and Machine Learning for IoT Applications". Two main datasets are provided here. The first is the data used for the initial training of the machine learning module on both normal and malicious traffic; it is in binary visualisation format, compressed into the archive traffic-dataset.zip. The remaining data is provided in attackScenario.zip and attackSenarioImages.zip: these are the images generated from each of the five attack scenario packet captures, together with their associated PCAP files.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset accompanies the research article on MQTTEEB-D and is intended for public use in cybersecurity research. The MQTTEEB-D dataset is a practical real-world dataset for improving intrusion detection in Message Queuing Telemetry Transport (MQTT)-based Internet of Things (IoT) networks. In contrast to existing datasets that are constructed from simulated network traffic, MQTTEEB-D is obtained from a real-time IoT deployment at the International University of Rabat (UIR), Morocco. Using MySignals IoT health sensors, Raspberry Pi 4, and an MQTT broker server, this dataset represents the actual complexity of the active IoT communication process, which synthetic data fails to offer. To narrow the gap between simulated and real-world attack scenarios, various cyberattacks, including Denial of Service (DoS), Slow DoS against Internet of Things Environments (SlowITe), Malformed Data Injection, Brute Force, and MQTT publish flooding, were carried out in real time, permitting close monitoring of network traffic anomalies. The data was captured using the Python wrapper for tshark (PyShark) and organized into multiple Comma-Separated Values (CSV) files. To ensure high data quality, we performed pre-processing steps such as outlier removal, normalization, standardization, and class balancing. Several processed versions of the dataset (raw, cleaned, normalized, standardized, and balanced with the Synthetic Minority Over-sampling Technique (SMOTE)) are provided, along with detailed metadata to facilitate ease of use in cybersecurity research. This dataset provides an opportunity for researchers to develop and validate intrusion detection models in a real-world MQTT environment, a critical ingredient in Artificial Intelligence (AI)-driven cybersecurity solutions for IoT networks. The dataset will support future research in the IoT security and anomaly detection domains.
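Because both a normalized and a standardized version are provided, the difference between the two is worth spelling out. The sketch below assumes the usual meanings, min-max scaling to [0, 1] and z-score scaling; the description does not give the exact formulas used, and the feature values are made up.

```python
import numpy as np


def min_max_normalize(x: np.ndarray) -> np.ndarray:
    """Rescale each column to the [0, 1] range."""
    lo, hi = x.min(axis=0), x.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero
    return (x - lo) / span


def standardize(x: np.ndarray) -> np.ndarray:
    """Shift each column to zero mean and scale it to unit variance."""
    std = x.std(axis=0)
    return (x - x.mean(axis=0)) / np.where(std > 0, std, 1.0)


features = np.array([[1.0, 100.0], [2.0, 300.0], [3.0, 500.0]])
normalized = min_max_normalize(features)
standardized = standardize(features)
print(normalized)
print(standardized)
```

Min-max scaling is sensitive to extreme values, which is one reason outlier removal is commonly performed before it.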
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The CICIoT2023 dataset is a large-scale, realistic intrusion detection dataset designed to support security analytics and machine learning research in the Internet of Things (IoT) domain. Created by the Canadian Institute for Cybersecurity (CIC), the dataset captures 33 different types of attacks (including DDoS, DoS, Recon, Web-based, Brute Force, Spoofing, and Mirai) executed by malicious IoT devices against other IoT targets.
The testbed consists of 105 real IoT devices of different types and manufacturers, including smart home devices and industrial equipment, configured in a complex network topology to emulate real-world conditions. The dataset includes benign and malicious traffic in various formats and supports feature extraction for both traditional ML and deep learning models.
This dataset aims to address the lack of diversity and scale in previous IoT security datasets, offering a robust benchmark for evaluating intrusion detection systems (IDS) and enabling research in IoT cybersecurity, anomaly detection, and network forensics.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Even while the IoHT provides many benefits to the medical industry, there are cybersecurity threats that could jeopardize patient data and health. Hackers' ability to change biometric data from biosensors or disrupt the IoHT system is one of the primary worries. To address this problem, Intrusion Detection Systems (IDS) have been developed to protect IoHT. However, the high dimensionality of the data makes it challenging to design IDS for IoHT, which results in model overfitting and decreased detection accuracy.
The Internet of Things (IoT) is omnipresent, exposing a large number of devices that often lack security controls to the public Internet. In the modern world, many everyday processes depend on these devices, and their service outage could lead to catastrophic consequences. There are many Deep Packet Inspection (DPI) based intrusion detection systems (IDS). However, their linear computational complexity induced by the event-driven nature poses a power-demanding obstacle in resource-constrained IoT environments. In this paper, we shift away from the traditional IDS as we introduce a novel and lightweight framework, relying on a time-driven algorithm to detect Distributed Denial of Service (DDoS) attacks by employing Machine Learning (ML) algorithms leveraging the newly engineered features containing system and network utilization information. These features are periodically generated, and there are only ten of them, resulting in a low and constant algorithmic complexity. Moreover, we leverage IoT-specific patterns to detect malicious traffic as we argue that each Denial of Service (DoS) attack leaves a unique fingerprint in the proposed set of features. We construct a dataset by launching some of the most prevalent DoS attacks against an IoT device, and we demonstrate the effectiveness of our approach with high accuracy. The results show that standalone IoT devices can detect and classify DoS and, therefore, arguably, DDoS attacks against them at a low computational cost with a deterministic delay.
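The time-driven idea can be sketched as follows: instead of inspecting every packet, cumulative utilization counters are read at a fixed period and converted into per-interval differences. The counter names and values below are hypothetical and are not the ten features from the paper.

```python
from typing import Dict, List


def interval_features(snapshots: List[Dict[str, int]]) -> List[Dict[str, int]]:
    """Turn periodic snapshots of cumulative counters into per-interval deltas.

    Each output vector has one entry per counter, so the cost per interval
    depends on the number of counters, not on the number of packets seen.
    """
    return [
        {name: later[name] - earlier[name] for name in earlier}
        for earlier, later in zip(snapshots, snapshots[1:])
    ]


# Made-up counter readings taken at a fixed period (names are hypothetical).
readings = [
    {"rx_packets": 100, "tx_packets": 90, "cpu_ticks": 1000},
    {"rx_packets": 150, "tx_packets": 130, "cpu_ticks": 1100},
    {"rx_packets": 900, "tx_packets": 140, "cpu_ticks": 1900},
]
vectors = interval_features(readings)
print(vectors)
```

In this toy trace the second interval shows a sharp rise in received packets without a matching rise in sent packets, the kind of pattern a classifier could learn to associate with a flood.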
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Reduced version of the no periodicity dataset applying the same methodology reported for the ROSPaCe reduced dataset.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Development of an Internet of Things (IoT) Network Traffic Dataset with Simulated Attack Data. Abstract: This research focuses on the requirements for and the creation of an intrusion detection system (IDS) dataset for an Internet of Things (IoT) network domain.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
etc.
https://www.archivemarketresearch.com/privacy-policy
The Wireless Intrusion Detection System (WIDS) market is experiencing robust growth, projected to reach $202.7 million in 2025 and exhibiting a Compound Annual Growth Rate (CAGR) of 10.2% from 2025 to 2033. This expansion is driven by the increasing prevalence of wireless networks across various sectors, heightened cybersecurity concerns, and the rising adoption of cloud-based solutions. The demand for secure wireless infrastructure is particularly strong in industries like finance, government, IT and telecom, and healthcare, where sensitive data necessitates robust protection against unauthorized access and cyber threats. The market's segmentation reflects this diverse application landscape, with both on-premises and cloud-based WIDS solutions witnessing significant adoption. Key players like Cisco, IBM, and Check Point are actively shaping market dynamics through technological innovations and strategic partnerships, further fueling market growth. While regulatory compliance requirements contribute to market expansion, potential restraints include the complexity of integrating WIDS into existing network infrastructure and the ongoing evolution of sophisticated cyberattacks that demand constant adaptation of security measures. The projected growth trajectory suggests a substantial market expansion in the coming years. Factors such as the increasing adoption of IoT devices, the expansion of 5G networks, and the growing need for advanced threat detection capabilities are expected to further accelerate the demand for WIDS. The competitive landscape is characterized by both established players and emerging vendors, creating a dynamic environment marked by innovation and strategic acquisitions. Regional variations in market growth are expected, with North America and Europe initially leading the way, followed by increasing adoption in the Asia-Pacific region fueled by economic growth and digital transformation initiatives. 
Future market success will hinge on vendors' ability to deliver scalable, cost-effective, and easily deployable solutions that address the evolving security needs of diverse industries.
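The quoted growth figures can be checked with the standard compound-growth formula. The sketch below assumes the 10.2% rate applies uniformly to each of the eight years from 2025 to 2033; under that assumption it implies a 2033 market size of about $441 million.

```python
def project_value(base: float, cagr: float, years: int) -> float:
    """Compound a base value at a constant annual growth rate."""
    return base * (1.0 + cagr) ** years


# Figures quoted above: $202.7 million in 2025, 10.2% CAGR, through 2033.
value_2033 = project_value(202.7, 0.102, 2033 - 2025)
print(f"Implied 2033 market size: ${value_2033:.1f} million")
```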