100+ datasets found
  1. IoT Intrusion Detection

    • kaggle.com
    Updated Jul 16, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Cyber Cop (2023). IoT Intrusion Detection [Dataset]. http://doi.org/10.34740/kaggle/dsv/6142327
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jul 16, 2023
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Cyber Cop
    License

    http://www.gnu.org/licenses/lgpl-3.0.htmlhttp://www.gnu.org/licenses/lgpl-3.0.html

    Description

    The dataset has been introduced by the below-mentioned researches: E. C. P. Neto, S. Dadkhah, R. Ferreira, A. Zohourian, R. Lu, A. A. Ghorbani. "CICIoT2023: A real-time dataset and benchmark for large-scale attacks in IoT environment," Sensor (2023) – (submitted to Journal of Sensors). The present data contains different kinds of IoT intrusions. The categories of the IoT intrusions enlisted in the data are as follows: DDoS Brute Force Spoofing DoS Recon Web-based Mirai

    There are several subcategories are present in the data for each kind of intrusion types in the IoT. The dataset contains 1191264 instances of network for intrusions and 47 features of each of the intrusions. The dataset can be used to prepare the predictive model through which different kind of intrusive attacks can be detected. The data is also suitable for designing the IDS system.

  2. i

    MQTT-IoT-IDS2020: MQTT Internet of Things Intrusion Detection Dataset

    • ieee-dataport.org
    Updated Aug 31, 2020
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Hanan Hindy (2020). MQTT-IoT-IDS2020: MQTT Internet of Things Intrusion Detection Dataset [Dataset]. https://ieee-dataport.org/open-access/mqtt-iot-ids2020-mqtt-internet-things-intrusion-detection-dataset
    Explore at:
    Dataset updated
    Aug 31, 2020
    Authors
    Hanan Hindy
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    building IoT IDS requires the availability of datasets to process

  3. UNB CIC IOT Dataset 2023 (Updated 2024-10-08)

    • kaggle.com
    zip
    Updated May 24, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Md. Abdul Al Emon (2025). UNB CIC IOT Dataset 2023 (Updated 2024-10-08) [Dataset]. https://www.kaggle.com/datasets/mdabdulalemo/cic-iot-dataset2023-updated-2024-10-08
    Explore at:
    zip(3264262523 bytes)Available download formats
    Dataset updated
    May 24, 2025
    Authors
    Md. Abdul Al Emon
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    The CIC IoT Dataset 2023 is a comprehensive benchmark developed by the Canadian Institute for Cybersecurity (CIC) to advance intrusion detection research in real-world Internet of Things (IoT) environments. This dataset was created using a network of 105 actual IoT devices, encompassing smart home gadgets, sensors, and cameras, to simulate authentic IoT traffic and attack scenarios.

    Key Features:

    • Diverse Attack Scenarios: The dataset includes 33 distinct attacks categorized into seven classes: DDoS, DoS, Reconnaissance, Web-based, Brute Force, Spoofing, and Mirai. These attacks were executed by compromised IoT devices targeting other IoT devices, reflecting realistic threat vectors.(University of New Brunswick)

    • Extensive Data Collection: Network traffic was captured in real-time, resulting in over 46 million records. The data is available in various formats, including raw PCAP files and pre-extracted CSV features, facilitating different research needs.

    • Realistic IoT Topology: Unlike many datasets that rely on simulations, this dataset was generated using a large-scale IoT testbed with devices from multiple vendors, providing a heterogeneous and realistic network environment.

    • Benchmarking and Evaluation: The dataset has been utilized to evaluate the performance of machine learning and deep learning algorithms in classifying and detecting malicious versus benign IoT network traffic.(University of New Brunswick)

    This dataset serves as a valuable resource for researchers and practitioners aiming to develop and test security analytics applications, intrusion detection systems, and other cybersecurity solutions tailored for IoT ecosystems.(University of New Brunswick)

  4. i

    IoT network intrusion dataset

    • ieee-dataport.org
    Updated Sep 27, 2019
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Huy Kang Kim (2019). IoT network intrusion dataset [Dataset]. https://ieee-dataport.org/open-access/iot-network-intrusion-dataset
    Explore at:
    Dataset updated
    Sep 27, 2019
    Authors
    Huy Kang Kim
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    including some laptops or smart phones

  5. f

    IoT Network Intrusion Dataset

    • figshare.com
    csv
    Updated May 24, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Qusay H. Mahmoud; Imtiaz Ullah (2025). IoT Network Intrusion Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.29143829.v2
    Explore at:
    csvAvailable download formats
    Dataset updated
    May 24, 2025
    Dataset provided by
    figshare
    Authors
    Qusay H. Mahmoud; Imtiaz Ullah
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The exponential growth of the Internet of Things (IoT) devices provides a large attack surface for intruders to launch more destructive cyber-attacks. The intruder aimed to exhaust the target IoT network resources with malicious activity. New techniques and detection algorithms required a well-designed dataset for IoT networks. We proposed a new dataset, namely IoTID20, generated dataset from [1]. The new IoT botnet dataset has a more comprehensive network and flow-based features. The flow-based feature can be used to analyze and evaluate a flow-based intrusion detection system. Our proposed IoT botnet dataset will provide a reference point to identify anomalous activity across the IoT networks. The IoT Botnet dataset can be accessed from [2]. The new IoTID20 dataset will provide a foundation for the development of new intrusion detection techniques in IoT networks.

  6. IoMT-TrafficData: A Dataset for Benchmarking Intrusion Detection in IoMT

    • zenodo.org
    • data.niaid.nih.gov
    Updated Aug 30, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    José Areia; José Areia; Ivo Afonso Bispo; Ivo Afonso Bispo; Leonel Santos; Leonel Santos; Rogério Luís Costa; Rogério Luís Costa (2024). IoMT-TrafficData: A Dataset for Benchmarking Intrusion Detection in IoMT [Dataset]. http://doi.org/10.5281/zenodo.8116338
    Explore at:
    Dataset updated
    Aug 30, 2024
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    José Areia; José Areia; Ivo Afonso Bispo; Ivo Afonso Bispo; Leonel Santos; Leonel Santos; Rogério Luís Costa; Rogério Luís Costa
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Article Information

    The work involved in developing the dataset and benchmarking its use of machine learning is set out in the article ‘IoMT-TrafficData: Dataset and Tools for Benchmarking Intrusion Detection in Internet of Medical Things’. DOI: 10.1109/ACCESS.2024.3437214.

    Please do cite the aforementioned article when using this dataset.

    Abstract

    The increasing importance of securing the Internet of Medical Things (IoMT) due to its vulnerabilities to cyber-attacks highlights the need for an effective intrusion detection system (IDS). In this study, our main objective was to develop a Machine Learning Model for the IoMT to enhance the security of medical devices and protect patients’ private data. To address this issue, we built a scenario that utilised the Internet of Things (IoT) and IoMT devices to simulate real-world attacks. We collected and cleaned data, pre-processed it, and provided it into our machine-learning model to detect intrusions in the network. Our results revealed significant improvements in all performance metrics, indicating robustness and reproducibility in real-world scenarios. This research has implications in the context of IoMT and cybersecurity, as it helps mitigate vulnerabilities and lowers the number of breaches occurring with the rapid growth of IoMT devices. The use of machine learning algorithms for intrusion detection systems is essential, and our study provides valuable insights and a road map for future research and the deployment of such systems in live environments. By implementing our findings, we can contribute to a safer and more secure IoMT ecosystem, safeguarding patient privacy and ensuring the integrity of medical data.

    ZIP Folder Content

    The ZIP folder comprises two main components: Captures and Datasets. Within the captures folder, we have included all the captures used in this project. These captures are organized into separate folders corresponding to the type of network analysis: BLE or IP-Based. Similarly, the datasets folder follows a similar organizational approach. It contains datasets categorized by type: BLE, IP-Based Packet, and IP-Based Flows.

    To cater to diverse analytical needs, the datasets are provided in two formats: CSV (Comma-Separated Values) and pickle. The CSV format facilitates seamless integration with various data analysis tools, while the pickle format preserves the intricate structures and relationships within the dataset.

    This organization enables researchers to easily locate and utilize the specific captures and datasets they require, based on their preferred network analysis type or dataset type. The availability of different formats further enhances the flexibility and usability of the provided data.

    Datasets' Content

    Within this dataset, three sub-datasets are available, namely BLE, IP-Based Packet, and IP-Based Flows. Below is a table of the features selected for each dataset and consequently used in the evaluation model within the provided work.

    Identified Key Features Within Bluetooth Dataset

    FeatureMeaning
    btle.advertising_headerBLE Advertising Packet Header
    btle.advertising_header.ch_selBLE Advertising Channel Selection Algorithm
    btle.advertising_header.lengthBLE Advertising Length
    btle.advertising_header.pdu_typeBLE Advertising PDU Type
    btle.advertising_header.randomized_rxBLE Advertising Rx Address
    btle.advertising_header.randomized_txBLE Advertising Tx Address
    btle.advertising_header.rfu.1Reserved For Future 1
    btle.advertising_header.rfu.2Reserved For Future 2
    btle.advertising_header.rfu.3Reserved For Future 3
    btle.advertising_header.rfu.4Reserved For Future 4
    btle.control.instantInstant Value Within a BLE Control Packet
    btle.crc.incorrectIncorrect CRC
    btle.extended_advertisingAdvertiser Data Information
    btle.extended_advertising.didAdvertiser Data Identifier
    btle.extended_advertising.sidAdvertiser Set Identifier
    btle.lengthBLE Length
    frame.cap_lenFrame Length Stored Into the Capture File
    frame.interface_idInterface ID
    frame.lenFrame Length Wire
    nordic_ble.board_idBoard ID
    nordic_ble.channelChannel Index
    nordic_ble.crcokIndicates if CRC is Correct
    nordic_ble.flagsFlags
    nordic_ble.packet_counterPacket Counter
    nordic_ble.packet_timePacket time (start to end)
    nordic_ble.phyPHY
    nordic_ble.protoverProtocol Version

    Identified Key Features Within IP-Based Packets Dataset

    FeatureMeaning
    http.content_lengthLength of content in an HTTP response
    http.requestHTTP request being made
    http.response.codeSequential number of an HTTP response
    http.response_numberSequential number of an HTTP response
    http.timeTime taken for an HTTP transaction
    tcp.analysis.initial_rttInitial round-trip time for TCP connection
    tcp.connection.finTCP connection termination with a FIN flag
    tcp.connection.synTCP connection initiation with SYN flag
    tcp.connection.synackTCP connection establishment with SYN-ACK flags
    tcp.flags.cwrCongestion Window Reduced flag in TCP
    tcp.flags.ecnExplicit Congestion Notification flag in TCP
    tcp.flags.finFIN flag in TCP
    tcp.flags.nsNonce Sum flag in TCP
    tcp.flags.resReserved flags in TCP
    tcp.flags.synSYN flag in TCP
    tcp.flags.urgUrgent flag in TCP
    tcp.urgent_pointerPointer to urgent data in TCP
    ip.frag_offsetFragment offset in IP packets
    eth.dst.igEthernet destination is in the internal network group
    eth.src.igEthernet source is in the internal network group
    eth.src.lgEthernet source is in the local network group
    eth.src_not_groupEthernet source is not in any network group
    arp.isannouncementIndicates if an ARP message is an announcement

    Identified Key Features Within IP-Based Flows Dataset

    FeatureMeaning
    protoTransport layer protocol of the connection
    serviceIdentification of an application protocol
    orig_bytesOriginator payload bytes
    resp_bytesResponder payload bytes
    historyConnection state history
    orig_pktsOriginator sent packets
    resp_pktsResponder sent packets
    flow_durationLength of the flow in seconds
    fwd_pkts_totForward packets total
    bwd_pkts_totBackward packets total
    fwd_data_pkts_totForward data packets total
    bwd_data_pkts_totBackward data packets total
    fwd_pkts_per_secForward packets per second
    bwd_pkts_per_secBackward packets per second
    flow_pkts_per_secFlow packets per second
    fwd_header_sizeForward header bytes
    bwd_header_sizeBackward header bytes
    fwd_pkts_payloadForward payload bytes
    bwd_pkts_payloadBackward payload bytes
    flow_pkts_payloadFlow payload bytes
    fwd_iatForward inter-arrival time
    bwd_iatBackward inter-arrival time
    flow_iatFlow inter-arrival time
    activeFlow active duration
  7. f

    Network Intrusion Detection Datasets

    • figshare.com
    txt
    Updated May 30, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ogobuchi Daniel Okey; Demostenes Zegarra Rodriguez (2023). Network Intrusion Detection Datasets [Dataset]. http://doi.org/10.6084/m9.figshare.23118164.v1
    Explore at:
    txtAvailable download formats
    Dataset updated
    May 30, 2023
    Dataset provided by
    figshare
    Authors
    Ogobuchi Daniel Okey; Demostenes Zegarra Rodriguez
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    With the continuous expansion of data exchange, the threat of cybercrime and network invasions is also on the rise. This project aims to address these concerns by investigating an innovative approach: an Attentive Transformer Deep Learning Algorithm for Intrusion Detection of IoT Systems using Automatic Xplainable Feature Selection. The primary focus of this project is to develop an effective Intrusion Detection System (IDS) using the aforementioned algorithm. To accomplish this, carefully curated datasets have been utilized, which have been created through a meticulous process involving data extraction from the University of New Brunswick repository. This repository houses the datasets used in this research and can be accessed publically in order to replicate the findings of this research.

  8. P

    EDGE-IIOTSET Dataset

    • paperswithcode.com
    Updated Oct 16, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2023). EDGE-IIOTSET Dataset [Dataset]. https://paperswithcode.com/dataset/edge-iiotset
    Explore at:
    Dataset updated
    Oct 16, 2023
    Description

    ABSTRACT In this project, we propose a new comprehensive realistic cyber security dataset of IoT and IIoT applications, called Edge-IIoTset, which can be used by machine learning-based intrusion detection systems in two different modes, namely, centralized and federated learning. Specifically, the proposed testbed is organized into seven layers, including, Cloud Computing Layer, Network Functions Virtualization Layer, Blockchain Network Layer, Fog Computing Layer, Software-Defined Networking Layer, Edge Computing Layer, and IoT and IIoT Perception Layer. In each layer, we propose new emerging technologies that satisfy the key requirements of IoT and IIoT applications, such as, ThingsBoard IoT platform, OPNFV platform, Hyperledger Sawtooth, Digital twin, ONOS SDN controller, Mosquitto MQTT brokers, Modbus TCP/IP, ...etc. The IoT data are generated from various IoT devices (more than 10 types) such as Low-cost digital sensors for sensing temperature and humidity, Ultrasonic sensor, Water level detection sensor, pH Sensor Meter, Soil Moisture sensor, Heart Rate Sensor, Flame Sensor, ...etc.). However, we identify and analyze fourteen attacks related to IoT and IIoT connectivity protocols, which are categorized into five threats, including, DoS/DDoS attacks, Information gathering, Man in the middle attacks, Injection attacks, and Malware attacks. In addition, we extract features obtained from different sources, including alerts, system resources, logs, network traffic, and propose new 61 features with high correlations from 1176 found features. After processing and analyzing the proposed realistic cyber security dataset, we provide a primary exploratory data analysis and evaluate the performance of machine learning approaches (i.e., traditional machine learning as well as deep learning) in both centralized and federated learning modes.

    Instructions:

    Great news! The Edge-IIoT dataset has been featured as a "Document in the top 1% of Web of Science." This indicates that it is ranked within the top 1% of all publications indexed by the Web of Science (WoS) in terms of citations and impact.

    Please kindly visit kaggle link for the updates: https://www.kaggle.com/datasets/mohamedamineferrag/edgeiiotset-cyber-sec...

    Free use of the Edge-IIoTset dataset for academic research purposes is hereby granted in perpetuity. Use for commercial purposes is allowable after asking the leader author, Dr Mohamed Amine Ferrag, who has asserted his right under the Copyright.

    The details of the Edge-IIoT dataset were published in following the paper. For the academic/public use of these datasets, the authors have to cities the following paper:

    Mohamed Amine Ferrag, Othmane Friha, Djallel Hamouda, Leandros Maglaras, Helge Janicke, "Edge-IIoTset: A New Comprehensive Realistic Cyber Security Dataset of IoT and IIoT Applications for Centralized and Federated Learning", IEEE Access, April 2022 (IF: 3.37), DOI: 10.1109/ACCESS.2022.3165809

    Link to paper : https://ieeexplore.ieee.org/document/9751703

    The directories of the Edge-IIoTset dataset include the following:

    •File 1 (Normal traffic)

    -File 1.1 (Distance): This file includes two documents, namely, Distance.csv and Distance.pcap. The IoT sensor (Ultrasonic sensor) is used to capture the IoT data.

    -File 1.2 (Flame_Sensor): This file includes two documents, namely, Flame_Sensor.csv and Flame_Sensor.pcap. The IoT sensor (Flame Sensor) is used to capture the IoT data.

    -File 1.3 (Heart_Rate): This file includes two documents, namely, Flame_Sensor.csv and Flame_Sensor.pcap. The IoT sensor (Flame Sensor) is used to capture the IoT data.

    -File 1.4 (IR_Receiver): This file includes two documents, namely, IR_Receiver.csv and IR_Receiver.pcap. The IoT sensor (IR (Infrared) Receiver Sensor) is used to capture the IoT data.

    -File 1.5 (Modbus): This file includes two documents, namely, Modbus.csv and Modbus.pcap. The IoT sensor (Modbus Sensor) is used to capture the IoT data.

    -File 1.6 (phValue): This file includes two documents, namely, phValue.csv and phValue.pcap. The IoT sensor (pH-sensor PH-4502C) is used to capture the IoT data.

    -File 1.7 (Soil_Moisture): This file includes two documents, namely, Soil_Moisture.csv and Soil_Moisture.pcap. The IoT sensor (Soil Moisture Sensor v1.2) is used to capture the IoT data.

    -File 1.8 (Sound_Sensor): This file includes two documents, namely, Sound_Sensor.csv and Sound_Sensor.pcap. The IoT sensor (LM393 Sound Detection Sensor) is used to capture the IoT data.

    -File 1.9 (Temperature_and_Humidity): This file includes two documents, namely, Temperature_and_Humidity.csv and Temperature_and_Humidity.pcap. The IoT sensor (DHT11 Sensor) is used to capture the IoT data.

    -File 1.10 (Water_Level): This file includes two documents, namely, Water_Level.csv and Water_Level.pcap. The IoT sensor (Water sensor) is used to capture the IoT data.

    •File 2 (Attack traffic):

    -File 2.1 (Attack traffic (CSV files)): This file includes 13 documents, namely, Backdoor_attack.csv, DDoS_HTTP_Flood_attack.csv, DDoS_ICMP_Flood_attack.csv, DDoS_TCP_SYN_Flood_attack.csv, DDoS_UDP_Flood_attack.csv, MITM_attack.csv, OS_Fingerprinting_attack.csv, Password_attack.csv, Port_Scanning_attack.csv, Ransomware_attack.csv, SQL_injection_attack.csv, Uploading_attack.csv, Vulnerability_scanner_attack.csv, XSS_attack.csv. Each document is specific for each attack.

    -File 2.2 (Attack traffic (PCAP files)): This file includes 13 documents, namely, Backdoor_attack.pcap, DDoS_HTTP_Flood_attack.pcap, DDoS_ICMP_Flood_attack.pcap, DDoS_TCP_SYN_Flood_attack.pcap, DDoS_UDP_Flood_attack.pcap, MITM_attack.pcap, OS_Fingerprinting_attack.pcap, Password_attack.pcap, Port_Scanning_attack.pcap, Ransomware_attack.pcap, SQL_injection_attack.pcap, Uploading_attack.pcap, Vulnerability_scanner_attack.pcap, XSS_attack.pcap. Each document is specific for each attack.

    •File 3 (Selected dataset for ML and DL):

    -File 3.1 (DNN-EdgeIIoT-dataset): This file contains a selected dataset for the use of evaluating deep learning-based intrusion detection systems.

    -File 3.2 (ML-EdgeIIoT-dataset): This file contains a selected dataset for the use of evaluating traditional machine learning-based intrusion detection systems.

    Step 1: Downloading The Edge-IIoTset dataset From the Kaggle platform from google.colab import files

    !pip install -q kaggle

    files.upload()

    !mkdir ~/.kaggle

    !cp kaggle.json ~/.kaggle/

    !chmod 600 ~/.kaggle/kaggle.json

    !kaggle datasets download -d mohamedamineferrag/edgeiiotset-cyber-security-dataset-of-iot-iiot -f "Edge-IIoTset dataset/Selected dataset for ML and DL/DNN-EdgeIIoT-dataset.csv"

    !unzip DNN-EdgeIIoT-dataset.csv.zip

    !rm DNN-EdgeIIoT-dataset.csv.zip

    Step 2: Reading the Datasets' CSV file to a Pandas DataFrame: import pandas as pd

    import numpy as np

    df = pd.read_csv('DNN-EdgeIIoT-dataset.csv', low_memory=False)

    Step 3 : Exploring some of the DataFrame's contents: df.head(5)

    print(df['Attack_type'].value_counts())

    Step 4: Dropping data (Columns, duplicated rows, NAN, Null..): from sklearn.utils import shuffle

    drop_columns = ["frame.time", "ip.src_host", "ip.dst_host", "arp.src.proto_ipv4","arp.dst.proto_ipv4",

     "http.file_data","http.request.full_uri","icmp.transmit_timestamp",
    
     "http.request.uri.query", "tcp.options","tcp.payload","tcp.srcport",
    
     "tcp.dstport", "udp.port", "mqtt.msg"]
    

    df.drop(drop_columns, axis=1, inplace=True)

    df.dropna(axis=0, how='any', inplace=True)

    df.drop_duplicates(subset=None, keep="first", inplace=True)

    df = shuffle(df)

    df.isna().sum()

    print(df['Attack_type'].value_counts())

    Step 5: Categorical data encoding (Dummy Encoding): import numpy as np

    from sklearn.model_selection import train_test_split

    from sklearn.preprocessing import StandardScaler

    from sklearn import preprocessing

    def encode_text_dummy(df, name):

    dummies = pd.get_dummies(df[name])

    for x in dummies.columns:

    dummy_name = f"{name}-{x}"
    
    df[dummy_name] = dummies[x]
    

    df.drop(name, axis=1, inplace=True)

    encode_text_dummy(df,'http.request.method')

    encode_text_dummy(df,'http.referer')

    encode_text_dummy(df,"http.request.version")

    encode_text_dummy(df,"dns.qry.name.len")

    encode_text_dummy(df,"mqtt.conack.flags")

    encode_text_dummy(df,"mqtt.protoname")

    encode_text_dummy(df,"mqtt.topic")

    Step 6: Creation of the preprocessed dataset df.to_csv('preprocessed_DNN.csv', encoding='utf-8')

    For more information about the dataset, please contact the lead author of this project, Dr Mohamed Amine Ferrag, on his email: mohamed.amine.ferrag@gmail.com

    More information about Dr. Mohamed Amine Ferrag is available at:

    https://www.linkedin.com/in/Mohamed-Amine-Ferrag

    https://dblp.uni-trier.de/pid/142/9937.html

    https://www.researchgate.net/profile/Mohamed_Amine_Ferrag

    https://scholar.google.fr/citations?user=IkPeqxMAAAAJ&hl=fr&oi=ao

    https://www.scopus.com/authid/detail.uri?authorId=56115001200

    https://publons.com/researcher/1322865/mohamed-amine-ferrag/

    https://orcid.org/0000-0002-0632-3172

    Last Updated: 27 Mar. 2023

  9. CICIoT2023

    • kaggle.com
    Updated Apr 21, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    HIMADRI07 (2025). CICIoT2023 [Dataset]. https://www.kaggle.com/datasets/himadri07/ciciot2023
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Apr 21, 2025
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    HIMADRI07
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    The CICIoT2023 dataset is a comprehensive and modern dataset designed for research in Internet of Things (IoT) security, particularly for intrusion detection and anomaly detection systems. Released by the Canadian Institute for Cybersecurity (CIC), this dataset reflects real-world IoT network traffic and attack scenarios, providing a valuable resource for machine learning and cybersecurity research.

    The dataset was generated using a realistic testbed that simulates various IoT devices communicating over a network, including smart TVs, webcams, smart thermostats, and wearable devices. It captures both benign traffic and a wide variety of attack types such as Denial of Service (DoS), Distributed Denial of Service (DDoS), brute-force attacks, botnets, reconnaissance, and more advanced threats.

    Key Features of CICIoT2023:

    Contains a mix of normal and malicious IoT network traffic.

    Includes 34 distinct attack types, covering modern and advanced cyber threat scenarios.

    Provides labeled data suitable for supervised machine learning models.

    Offers extracted network flow features (e.g., packet size, duration, flags, statistical summaries) which can be used for traffic classification and anomaly detection.

    Supports research in intrusion detection, anomaly detection, and IoT security strategy development.

    This dataset helps bridge the gap between traditional network security datasets and the unique, evolving patterns of IoT device communication, making it an excellent benchmark for evaluating the performance of AI-based security solutions.

    I have further broken downed the data into these 3 parts Train: (5491971, 47) Validation: (1176851, 47) Test: (1176851, 47)

  10. i

    Data generation and knowledge sharing for robust intrusion detection in IoT...

    • ieee-dataport.org
    Updated Dec 11, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Christian Rohner (2024). Data generation and knowledge sharing for robust intrusion detection in IoT systems [Dataset]. https://ieee-dataport.org/open-access/data-generation-and-knowledge-sharing-robust-intrusion-detection-iot-systems
    Explore at:
    Dataset updated
    Dec 11, 2024
    Authors
    Christian Rohner
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The data set includes attack implementations in an Internet of Things (IoT) context. The IoT nodes use Contiki-NG as their operating system and the data is collected from the Cooja simulation environment where a large number of network topologies are created. Blackhole and DIS-flooding attacks are implemented to attack the RPL routing protocol. The datasets includes log file output from the Cooja simulator and a pre-processed feature set as input to an intrusion detection model.

  11. Data from: Dragon_Pi: IoT Side-Channel Power Data Intrusion Detection...

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Mar 13, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Dominic Lightbody; Dominic Lightbody; Duc-Minh NGO; Duc-Minh NGO; Andriy Temko; Andriy Temko; Colin C. Murphy; Emanuel Popovici; Emanuel Popovici; Colin C. Murphy (2024). Dragon_Pi: IoT Side-Channel Power Data Intrusion Detection Dataset and Unsupervised Convolutional Autoencoder for Intrusion Detection [Dataset]. http://doi.org/10.5281/zenodo.10784947
    Explore at:
    zipAvailable download formats
    Dataset updated
    Mar 13, 2024
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Dominic Lightbody; Dominic Lightbody; Duc-Minh NGO; Duc-Minh NGO; Andriy Temko; Andriy Temko; Colin C. Murphy; Emanuel Popovici; Emanuel Popovici; Colin C. Murphy
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dragon_Pi

    For a more in depth description of the Dragon_Pi dataset, please consult the journal article of the same name:
    Lightbody et al., Future Internet, 2024, https://doi.org/10.3390/fi16030088 - specifically Section 3.2: Dataset Overview.

    Dragon_Pi is an intrusion detection dataset for IoT devices. In the field of IoT security there are few datasets, and those which do exist tend to focus solely on network traffic. The Dragon_Pi dataset seeks to provide not only more data for the field of IoT security, but also, data of a somewhat under-published type: linear time series power consumption data.

    Dragon_Pi is a fully labelled Intrusion Detection dataset for IoT devices. It is composed of both normal and under-attack power consumption data obtained from two separate testbeds - one using a DragonBoard 410c and the other a Raspberry Pi Model 3 - Hence the moniker Dragon_Pi.

    These testbeds were set up with predefined normal behavour as described in the attached publications. The normal linear time series power consumption was sampled from the testbed under these normal conditions. Both testbeds were then attacked using some common attacks on IoT - the linear time series power consumption captured under these condtions as well.

    Specifically, the testbeds were subjected to the Port Scan (using Nmap), SSH Brute Force (using Hydra) and SYNFlood Denial of Service (using Hping3) attacks. These attacks were repeated to gain insight to what their signatures looked like and also how varying the tool settings effected the resultant signature. A fourth type of scenario was also conducted on the testbeds - the "Capture the Flag" scenarios. In these files multiple attack types were used with a more specific target - to exfiltrate a hidden file from the testbeds.

    Each file has three hierarchical levels of annotation for each sample within:

    1. A simple "Normal or Anomaly" label for the specific sample
    2. A specifc attack type label e.g. "SSH Bruteforce", for the specific sample
    3. A specific tool setting for that attack e.g. "Hydra_T16", for the specific sample

    Users can decide for themselves what level of annotation they require for their specific task.

    Each file in the Dragon_Pi dataset is accompanied by its own legend file. This file explains the contents of the specific .csv file and the specific indexes of the events within.

    The Dragon_Pi dataset consists of approximately 67 files, as shown in Table 1. Compressed, the datset totals approximately 13GB. Completely decompressed the dataset is approximately 80GB ( 30GB Pi data, 50 GB Dragon data).

    Label TypeSpecific Label Number of Files DragonBoard 410cNumber of Files Raspberry Pi
    Normal Normal 3 2
    Port Scan Attack Nmap_T521
    Nmap_T411
    Nmap_T311
    Nmap_T211
    SSH Brute ForceHydra_T3242
    Hydra_T16162
    Hydra_T382
    Hydra_T152
    SYNFlood DOSSYNFlood DOS11
    Capture the FlagMisc Attacks35
    Table 1. Enumeration of the in the Dragon_Pi dataset.
    For a more in depth description of the Dragon_Pi dataset, please consult the journal article of the same name:
    Lightbody et al., Future Internet, 2024, https://doi.org/10.3390/fi16030088 - specifically Section 3.2: Dataset Overview.
    Publication of this dataset:
    This dataset was published in Lightbody et al., Future Internet, 2024, https://doi.org/10.3390/fi16030088. Consult and cite this article for a more in depth dataset description, as well as an in depth review of first AI Intrusion Detection model trained on this dataset.
    See article Lightbody et al., Future Internet, 2023, https://doi.org/10.3390/fi15050187 for a detailed investigation on the attack signatures discovered while creating this dataset. This work was an inital investigation of the dataset and can serve as a part 1 to the Dragon_Pi paper.
    How to cite this dataset in your work:
    Please cite these two DOIs when publishing using this dataset:
    1. Dragon_Pi release publication: https://doi.org/10.3390/fi16030088 (most important)
    2. Zenodo Dataset DOI: https://doi.org/10.5281/zenodo.10784947

  12. o

    Intrusion Detection using Network Traffic Profiling and Machine Learning for...

    • explore.openaire.eu
    Updated Mar 6, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Joseph Rose; Matthew Swann; Keltoum Bendiab; Stavros Shiaeles (2021). Intrusion Detection using Network Traffic Profiling and Machine Learning for IoT Applications [Dataset]. http://doi.org/10.21227/pda3-zy88
    Explore at:
    Dataset updated
    Mar 6, 2021
    Authors
    Joseph Rose; Matthew Swann; Keltoum Bendiab; Stavros Shiaeles
    Description

    Datasets as described in the research paper "Intrusion Detection using Network Traffic Profiling and Machine Learning for IoT Applications".There are two main dataset provided here, firstly is the data relating to the initial training of the machine learning module for both normal and malicious traffic, these are in binary visulisation format, compresed into the document traffic-dataset.zip.The remainin data is provided by this repository in attackScenario.zip and attackSenarioImages.zip, thee are the images generated from each of the five attack scenario packet captures, as well as their associated PCAP files.

  13. m

    Data from: MQTTEEB-D: A Real-World IoT Cybersecurity Dataset for AI-Powered...

    • data.mendeley.com
    Updated Mar 20, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    ABDERRAHMANE AQACHTOUL (2025). MQTTEEB-D: A Real-World IoT Cybersecurity Dataset for AI-Powered Threat Detection in MQTT Networks [Dataset]. http://doi.org/10.17632/jfttfjn6tr.1
    Explore at:
    Dataset updated
    Mar 20, 2025
    Authors
    ABDERRAHMANE AQACHTOUL
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset accompanies the research article on MQTTEEB-D and is intended for public use in cybersecurity research. The MQTTEEB-D dataset is a practical real-world data set for intrusion detection improvement in Message Queuing Telemetry Transport (MQTT)-based Internet of Things (IoT) networks. In contrast to already existing datasets that are constructed on simulated network traffic, MQTTEEB-D is obtained from a real-time IoT deployment at the International University of Rabat (UIR), Morocco. Using MySignals IoT health sensors, Raspberry Pi 4, and an MQTT broker server, this dataset represents the actual complexity of the active IoT communication process, which synthetic data fails to offer. To narrow the gap between simulated and real-world attack scenarios, various cyberattacks including Denial of Service (DoS), Slow DoS against Internet of Things Environments (SlowITe), Malformed Data Injection, Brute Force, and MQTT publish flooding were carried out in real-time, permitting close monitoring of network traffic anomalies. The data was captured using Python wrapper for tshark (PyShark) and organized into multiple Comma-Separated Values (CSV) files. To ensure high data quality, we performed pre-processing steps, such as outlier removal, normalization, standardization, and class balance. Several processed forms (raw, cleaned, normalized, standardized, Synthetic Minority Over-sampling Technique (SMOTE)) applied for this dataset are provided, along with detailed metadata to facilitate ease of use in cybersecurity research. This dataset provides an opportunity for researchers to develop and validate intrusion detection models in a real-world MQTT environment - a critical ingredient in Artificial Intelligence (AI)-driven cybersecurity solutions for IoT networks. The dataset will support future research IoT security and anomaly detection domains.

  14. Large-Scale Attacks in IoT Environment

    • kaggle.com
    zip
    Updated May 7, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Nikita Manaenkov (2025). Large-Scale Attacks in IoT Environment [Dataset]. https://www.kaggle.com/datasets/nikitamanaenkov/large-scale-attacks-in-iot-environment
    Explore at:
    zip(1474647877 bytes)Available download formats
    Dataset updated
    May 7, 2025
    Authors
    Nikita Manaenkov
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The CICIoT2023 dataset is a large-scale, realistic intrusion detection dataset designed to support security analytics and machine learning research in the Internet of Things (IoT) domain. Created by the Canadian Institute for Cybersecurity (CIC), the dataset captures 33 different types of attacks (including DDoS, DoS, Recon, Web-based, Brute Force, Spoofing, and Mirai) executed by malicious IoT devices against other IoT targets.

    The testbed consists of 105 real IoT devices of different types and manufacturers, including smart home devices and industrial equipment, configured in a complex network topology to emulate real-world conditions. The dataset includes benign and malicious traffic in various formats and supports feature extraction for both traditional ML and deep learning models.

    This dataset aims to address the lack of diversity and scale in previous IoT security datasets, offering a robust benchmark for evaluating intrusion detection systems (IDS) and enabling research in IoT cybersecurity, anomaly detection, and network forensics.

    Source https://www.mdpi.com/1424-8220/23/13/5941

  15. f

    Intrusion Detection in IoHT

    • figshare.com
    zip
    Updated Apr 10, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    G. Sai Chaitanya Kumar (2025). Intrusion Detection in IoHT [Dataset]. http://doi.org/10.6084/m9.figshare.28767974.v1
    Explore at:
    zipAvailable download formats
    Dataset updated
    Apr 10, 2025
    Dataset provided by
    figshare
    Authors
    G. Sai Chaitanya Kumar
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Even while the IoHT provides many benefits to the medical industry, there are cybersecurity threats that could jeopardize patient data and health. Hackers' ability to change biometric data from biosensors or disrupt the IoHT system is one of the primary worries. To address this problem, Intrusion Detection Systems (IDS) have been developed to protect IoHT. However, the high dimensionality of the data makes it challenging to design IDS for IoHT, which results in model overfitting and decreased detection accuracy.

  16. P

    DDoS-detect: Towards Resource-Efficient DDoS Detection in IoT Dataset

    • paperswithcode.com
    Updated Apr 19, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2024). DDoS-detect: Towards Resource-Efficient DDoS Detection in IoT Dataset [Dataset]. https://paperswithcode.com/dataset/ddos-detect-towards-resource-efficient-ddos
    Explore at:
    Dataset updated
    Apr 19, 2024
    Description

    The Internet of Things (IoT) is omnipresent, exposing a large number of devices that often lack security controls to the public Internet. In the modern world, many everyday processes depend on these devices, and their service outage could lead to catastrophic consequences. There are many Deep Packet Inspection (DPI) based intrusion detection systems (IDS). However, their linear computational complexity induced by the event-driven nature poses a power-demanding obstacle in resource-constrained IoT environments. In this paper, we shift away from the traditional IDS as we introduce a novel and lightweight framework, relying on a time-driven algorithm to detect Distributed Denial of Service (DDoS) attacks by employing Machine Learning (ML) algorithms leveraging the newly engineered features containing system and network utilization information. These features are periodically generated, and there are only ten of them, resulting in a low and constant algorithmic complexity. Moreover, we leverage IoT-specific patterns to detect malicious traffic as we argue that each Denial of Service (DoS) attack leaves a unique fingerprint in the proposed set of features. We construct a dataset by launching some of the most prevalent DoS attacks against an IoT device, and we demonstrate the effectiveness of our approach with high accuracy. The results show that standalone IoT devices can detect and classify DoS and, therefore, arguably, DDoS attacks against them at a low computational cost with a deterministic delay.

  17. ROSPaCe_reduced_noperiodicity

    • springernature.figshare.com
    application/x-gzip
    Updated May 3, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Tommaso Puccetti; Simone Nardi; Cosimo Cinquilli; Tommaso Zoppi; Andrea Ceccarelli (2024). ROSPaCe_reduced_noperiodicity [Dataset]. http://doi.org/10.6084/m9.figshare.25594854.v1
    Explore at:
    application/x-gzipAvailable download formats
    Dataset updated
    May 3, 2024
    Dataset provided by
    Figsharehttp://figshare.com/
    Authors
    Tommaso Puccetti; Simone Nardi; Cosimo Cinquilli; Tommaso Zoppi; Andrea Ceccarelli
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Reduced version of the no periodicity dataset applying the same methodology reported for the ROSPaCe reduced dataset.

  18. i

    TCP FIN Flood and Zbassocflood Dataset

    • ieee-dataport.org
    • zenodo.org
    Updated Apr 5, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Deris Stiawan (2021). TCP FIN Flood and Zbassocflood Dataset [Dataset]. https://ieee-dataport.org/documents/tcp-fin-flood-and-zbassocflood-dataset
    Explore at:
    Dataset updated
    Apr 5, 2021
    Authors
    Deris Stiawan
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Development of an Internet of Things (IoT) Network Traffic Dataset with Simulated Attack Data.Abstract— This research focuses on the requirements for and the creation of an intrusion detection system (IDS) dataset for an Internet of Things (IoT) network domain.

  19. i

    IoT DoS and DDoS Attack Dataset

    • ieee-dataport.org
    Updated May 10, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Faisal Hussain (2025). IoT DoS and DDoS Attack Dataset [Dataset]. https://ieee-dataport.org/documents/iot-dos-and-ddos-attack-dataset
    Explore at:
    Dataset updated
    May 10, 2025
    Authors
    Faisal Hussain
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    etc.

  20. W

    Wireless Intrusion Detection System Report

    • archivemarketresearch.com
    doc, pdf, ppt
    Updated Mar 6, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Archive Market Research (2025). Wireless Intrusion Detection System Report [Dataset]. https://www.archivemarketresearch.com/reports/wireless-intrusion-detection-system-52156
    Explore at:
    pdf, ppt, docAvailable download formats
    Dataset updated
    Mar 6, 2025
    Dataset authored and provided by
    Archive Market Research
    License

    https://www.archivemarketresearch.com/privacy-policyhttps://www.archivemarketresearch.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Wireless Intrusion Detection System (WIDS) market is experiencing robust growth, projected to reach $202.7 million in 2025 and exhibiting a Compound Annual Growth Rate (CAGR) of 10.2% from 2025 to 2033. This expansion is driven by the increasing prevalence of wireless networks across various sectors, heightened cybersecurity concerns, and the rising adoption of cloud-based solutions. The demand for secure wireless infrastructure is particularly strong in industries like finance, government, IT and telecom, and healthcare, where sensitive data necessitates robust protection against unauthorized access and cyber threats. The market's segmentation reflects this diverse application landscape, with both on-premises and cloud-based WIDS solutions witnessing significant adoption. Key players like Cisco, IBM, and Check Point are actively shaping market dynamics through technological innovations and strategic partnerships, further fueling market growth. While regulatory compliance requirements contribute to market expansion, potential restraints include the complexity of integrating WIDS into existing network infrastructure and the ongoing evolution of sophisticated cyberattacks that demand constant adaptation of security measures. The projected growth trajectory suggests a substantial market expansion in the coming years. Factors such as the increasing adoption of IoT devices, the expansion of 5G networks, and the growing need for advanced threat detection capabilities are expected to further accelerate the demand for WIDS. The competitive landscape is characterized by both established players and emerging vendors, creating a dynamic environment marked by innovation and strategic acquisitions. Regional variations in market growth are expected, with North America and Europe initially leading the way, followed by increasing adoption in the Asia-Pacific region fueled by economic growth and digital transformation initiatives. Future market success will hinge on vendors' ability to deliver scalable, cost-effective, and easily deployable solutions that address the evolving security needs of diverse industries.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Cyber Cop (2023). IoT Intrusion Detection [Dataset]. http://doi.org/10.34740/kaggle/dsv/6142327
Organization logo

IoT Intrusion Detection

Intrusion Detection in Internet of Things Network

Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jul 16, 2023
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Cyber Cop
License

http://www.gnu.org/licenses/lgpl-3.0.htmlhttp://www.gnu.org/licenses/lgpl-3.0.html

Description

The dataset has been introduced by the below-mentioned researches: E. C. P. Neto, S. Dadkhah, R. Ferreira, A. Zohourian, R. Lu, A. A. Ghorbani. "CICIoT2023: A real-time dataset and benchmark for large-scale attacks in IoT environment," Sensor (2023) – (submitted to Journal of Sensors). The present data contains different kinds of IoT intrusions. The categories of the IoT intrusions enlisted in the data are as follows: DDoS Brute Force Spoofing DoS Recon Web-based Mirai

There are several subcategories are present in the data for each kind of intrusion types in the IoT. The dataset contains 1191264 instances of network for intrusions and 47 features of each of the intrusions. The dataset can be used to prepare the predictive model through which different kind of intrusive attacks can be detected. The data is also suitable for designing the IDS system.

Search
Clear search
Close search
Google apps
Main menu