18 datasets found
  1. IoT Intrusion Detection

    • kaggle.com
    Updated Jul 16, 2023
    Cite
    Cyber Cop (2023). IoT Intrusion Detection [Dataset]. http://doi.org/10.34740/kaggle/dsv/6142327
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Cyber Cop
    License

    http://www.gnu.org/licenses/lgpl-3.0.html

    Description

    The dataset was introduced by the following researchers: E. C. P. Neto, S. Dadkhah, R. Ferreira, A. Zohourian, R. Lu, A. A. Ghorbani, "CICIoT2023: A real-time dataset and benchmark for large-scale attacks in IoT environment," Sensor (2023) (submitted to Journal of Sensors). The data contains different kinds of IoT intrusions, in the following categories: DDoS, Brute Force, Spoofing, DoS, Recon, Web-based, and Mirai.

    Each kind of intrusion has several subcategories. The dataset contains 1191264 network instances and 47 features per instance. It can be used to build predictive models that detect different kinds of intrusive attacks, and it is also suitable for designing an IDS.
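
    As a rough illustration of how subcategory labels could be rolled up into the seven categories named in this description, the sketch below maps fine-grained labels to a coarse category and counts them. The label strings and the `CATEGORY_OF` table are hypothetical placeholders, not values read from the dataset files; replace them with whatever actually appears in the label column.

```python
from collections import Counter

# Hypothetical subcategory -> category table (label names are assumptions).
CATEGORY_OF = {
    "DDoS-ICMP_Flood": "DDoS",
    "DDoS-UDP_Flood": "DDoS",
    "DoS-SYN_Flood": "DoS",
    "Recon-PortScan": "Recon",
    "DictionaryBruteForce": "Brute Force",
    "DNS_Spoofing": "Spoofing",
    "SqlInjection": "Web-based",
    "Mirai-greeth_flood": "Mirai",
}


def to_category(label: str) -> str:
    """Return the coarse category for a label, or 'Unknown' if unmapped."""
    return CATEGORY_OF.get(label, "Unknown")


def category_counts(labels):
    """Count how many labels fall into each coarse category."""
    return Counter(to_category(label) for label in labels)


sample_labels = [
    "DDoS-ICMP_Flood",
    "DDoS-UDP_Flood",
    "Recon-PortScan",
    "Mirai-greeth_flood",
    "SomethingElse",
]
counts = category_counts(sample_labels)
print(counts)
```

    Keeping an explicit "Unknown" bucket makes unmapped labels visible instead of silently dropping them.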

  2. Network Intrusion Detection

    • kaggle.com
    Updated Apr 3, 2025
    Cite
    Şahide ŞEKER (2025). Network Intrusion Detection [Dataset]. https://www.kaggle.com/datasets/sahideseker/network-intrusion-detection/versions/1
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Şahide ŞEKER
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    🇺🇸 English:

    This dataset simulates network traffic to help build intrusion detection models. It includes source/destination IPs, protocols, connection durations, and labels for different types of attacks.

    Use this dataset to:

    • Train anomaly detection or classification models
    • Experiment with imbalanced cybersecurity data
    • Build intrusion detection systems with ML algorithms like XGBoost or Isolation Forest

    Features:

    • src_ip: Source IP address
    • dst_ip: Destination IP address
    • protocol: Network protocol (TCP, UDP, ICMP)
    • duration: Duration of the connection
    • attack: Attack type label (e.g., normal, dos, probe, etc.)
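
    Because the description stresses imbalanced data, a first step is often to look at the label distribution. The sketch below does that with the standard library on a handful of made-up rows that use the five documented column names; the sample values and the helper names `label_distribution` and `mean_duration_by_label` are illustrative assumptions, not part of the dataset.

```python
import csv
import io
from collections import Counter, defaultdict

# Made-up rows that follow the documented column names.
SAMPLE_CSV = """src_ip,dst_ip,protocol,duration,attack
10.0.0.1,10.0.0.9,TCP,0.42,normal
10.0.0.2,10.0.0.9,UDP,0.05,normal
10.0.0.3,10.0.0.9,TCP,12.70,dos
10.0.0.4,10.0.0.9,ICMP,0.01,probe
10.0.0.5,10.0.0.9,TCP,0.38,normal
"""


def label_distribution(csv_text):
    """Return {label: fraction of rows} for the 'attack' column."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    counts = Counter(row["attack"] for row in rows)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def mean_duration_by_label(csv_text):
    """Return {label: mean connection duration} as a quick sanity check."""
    durations = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        durations[row["attack"]].append(float(row["duration"]))
    return {label: sum(v) / len(v) for label, v in durations.items()}


dist = label_distribution(SAMPLE_CSV)
means = mean_duration_by_label(SAMPLE_CSV)
print(dist)
print(means)
```

    A heavily skewed distribution is a hint to use stratified splits and metrics other than plain accuracy.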

    🇹🇷 Turkish (translated to English):

    This dataset was created to enable attack detection from network traffic in the cybersecurity domain. It includes source/destination IPs, protocol, connection duration, and attack-type labels.

    With this dataset you can:

    • Perform anomaly detection on imbalanced data
    • Develop attack classification algorithms
    • Test algorithms such as XGBoost and Isolation Forest

    Features:

    • src_ip: Source IP address
    • dst_ip: Destination IP address
    • protocol: Network protocol (TCP, UDP, ICMP)
    • duration: Duration of the connection
    • attack: Attack type label (e.g., normal, dos, probe, etc.)
  3. EDGE-IIOTSET Dataset

    • paperswithcode.com
    Updated Oct 16, 2023
    Cite
    (2023). EDGE-IIOTSET Dataset [Dataset]. https://paperswithcode.com/dataset/edge-iiotset
    Description

    ABSTRACT: In this project, we propose a new comprehensive, realistic cyber security dataset of IoT and IIoT applications, called Edge-IIoTset, which can be used by machine learning-based intrusion detection systems in two different modes, namely centralized and federated learning. Specifically, the proposed testbed is organized into seven layers: Cloud Computing Layer, Network Functions Virtualization Layer, Blockchain Network Layer, Fog Computing Layer, Software-Defined Networking Layer, Edge Computing Layer, and IoT and IIoT Perception Layer. In each layer, we propose new emerging technologies that satisfy the key requirements of IoT and IIoT applications, such as the ThingsBoard IoT platform, OPNFV platform, Hyperledger Sawtooth, Digital twin, ONOS SDN controller, Mosquitto MQTT brokers, and Modbus TCP/IP. The IoT data are generated from various IoT devices (more than 10 types), such as low-cost digital sensors for sensing temperature and humidity, an ultrasonic sensor, a water level detection sensor, a pH sensor meter, a soil moisture sensor, a heart rate sensor, and a flame sensor. We identify and analyze fourteen attacks related to IoT and IIoT connectivity protocols, which are categorized into five threats: DoS/DDoS attacks, information gathering, man-in-the-middle attacks, injection attacks, and malware attacks. In addition, we extract features obtained from different sources, including alerts, system resources, logs, and network traffic, and propose 61 new features with high correlations from 1176 found features. After processing and analyzing the proposed realistic cyber security dataset, we provide a primary exploratory data analysis and evaluate the performance of machine learning approaches (i.e., traditional machine learning as well as deep learning) in both centralized and federated learning modes.

    Instructions:

    Great news! The Edge-IIoT dataset has been featured as a "Document in the top 1% of Web of Science." This indicates that it is ranked within the top 1% of all publications indexed by the Web of Science (WoS) in terms of citations and impact.

    Please visit the Kaggle link for updates: https://www.kaggle.com/datasets/mohamedamineferrag/edgeiiotset-cyber-sec...

    Free use of the Edge-IIoTset dataset for academic research purposes is hereby granted in perpetuity. Use for commercial purposes is allowed after asking the lead author, Dr Mohamed Amine Ferrag, who has asserted his right under the Copyright.

    The details of the Edge-IIoT dataset were published in the following paper. For academic/public use of these datasets, users must cite the following paper:

    Mohamed Amine Ferrag, Othmane Friha, Djallel Hamouda, Leandros Maglaras, Helge Janicke, "Edge-IIoTset: A New Comprehensive Realistic Cyber Security Dataset of IoT and IIoT Applications for Centralized and Federated Learning", IEEE Access, April 2022 (IF: 3.37), DOI: 10.1109/ACCESS.2022.3165809

    Link to paper : https://ieeexplore.ieee.org/document/9751703

    The directories of the Edge-IIoTset dataset include the following:

    •File 1 (Normal traffic)

    -File 1.1 (Distance): This file includes two documents, namely, Distance.csv and Distance.pcap. The IoT sensor (Ultrasonic sensor) is used to capture the IoT data.

    -File 1.2 (Flame_Sensor): This file includes two documents, namely, Flame_Sensor.csv and Flame_Sensor.pcap. The IoT sensor (Flame Sensor) is used to capture the IoT data.

    -File 1.3 (Heart_Rate): This file includes two documents, namely, Heart_Rate.csv and Heart_Rate.pcap. The IoT sensor (Heart Rate Sensor) is used to capture the IoT data.

    -File 1.4 (IR_Receiver): This file includes two documents, namely, IR_Receiver.csv and IR_Receiver.pcap. The IoT sensor (IR (Infrared) Receiver Sensor) is used to capture the IoT data.

    -File 1.5 (Modbus): This file includes two documents, namely, Modbus.csv and Modbus.pcap. The IoT sensor (Modbus Sensor) is used to capture the IoT data.

    -File 1.6 (phValue): This file includes two documents, namely, phValue.csv and phValue.pcap. The IoT sensor (pH-sensor PH-4502C) is used to capture the IoT data.

    -File 1.7 (Soil_Moisture): This file includes two documents, namely, Soil_Moisture.csv and Soil_Moisture.pcap. The IoT sensor (Soil Moisture Sensor v1.2) is used to capture the IoT data.

    -File 1.8 (Sound_Sensor): This file includes two documents, namely, Sound_Sensor.csv and Sound_Sensor.pcap. The IoT sensor (LM393 Sound Detection Sensor) is used to capture the IoT data.

    -File 1.9 (Temperature_and_Humidity): This file includes two documents, namely, Temperature_and_Humidity.csv and Temperature_and_Humidity.pcap. The IoT sensor (DHT11 Sensor) is used to capture the IoT data.

    -File 1.10 (Water_Level): This file includes two documents, namely, Water_Level.csv and Water_Level.pcap. The IoT sensor (Water sensor) is used to capture the IoT data.

    •File 2 (Attack traffic):

    -File 2.1 (Attack traffic (CSV files)): This file includes 14 documents, namely, Backdoor_attack.csv, DDoS_HTTP_Flood_attack.csv, DDoS_ICMP_Flood_attack.csv, DDoS_TCP_SYN_Flood_attack.csv, DDoS_UDP_Flood_attack.csv, MITM_attack.csv, OS_Fingerprinting_attack.csv, Password_attack.csv, Port_Scanning_attack.csv, Ransomware_attack.csv, SQL_injection_attack.csv, Uploading_attack.csv, Vulnerability_scanner_attack.csv, XSS_attack.csv. Each document is specific to one attack.

    -File 2.2 (Attack traffic (PCAP files)): This file includes 14 documents, namely, Backdoor_attack.pcap, DDoS_HTTP_Flood_attack.pcap, DDoS_ICMP_Flood_attack.pcap, DDoS_TCP_SYN_Flood_attack.pcap, DDoS_UDP_Flood_attack.pcap, MITM_attack.pcap, OS_Fingerprinting_attack.pcap, Password_attack.pcap, Port_Scanning_attack.pcap, Ransomware_attack.pcap, SQL_injection_attack.pcap, Uploading_attack.pcap, Vulnerability_scanner_attack.pcap, XSS_attack.pcap. Each document is specific to one attack.

    •File 3 (Selected dataset for ML and DL):

    -File 3.1 (DNN-EdgeIIoT-dataset): This file contains a selected dataset for the use of evaluating deep learning-based intrusion detection systems.

    -File 3.2 (ML-EdgeIIoT-dataset): This file contains a selected dataset for the use of evaluating traditional machine learning-based intrusion detection systems.

    Step 1: Download the Edge-IIoTset dataset from the Kaggle platform (Google Colab):

    from google.colab import files

    !pip install -q kaggle
    # Upload your kaggle.json API token when prompted
    files.upload()
    # -p avoids an error if the directory already exists
    !mkdir -p ~/.kaggle
    !cp kaggle.json ~/.kaggle/
    !chmod 600 ~/.kaggle/kaggle.json
    !kaggle datasets download -d mohamedamineferrag/edgeiiotset-cyber-security-dataset-of-iot-iiot -f "Edge-IIoTset dataset/Selected dataset for ML and DL/DNN-EdgeIIoT-dataset.csv"
    !unzip DNN-EdgeIIoT-dataset.csv.zip
    !rm DNN-EdgeIIoT-dataset.csv.zip

    Step 2: Read the dataset's CSV file into a Pandas DataFrame:

    import pandas as pd
    import numpy as np

    df = pd.read_csv('DNN-EdgeIIoT-dataset.csv', low_memory=False)

    Step 3: Explore some of the DataFrame's contents:

    df.head(5)
    print(df['Attack_type'].value_counts())

    Step 4: Drop data (columns, duplicated rows, NaN, null):

    from sklearn.utils import shuffle

    drop_columns = ["frame.time", "ip.src_host", "ip.dst_host", "arp.src.proto_ipv4", "arp.dst.proto_ipv4",
                    "http.file_data", "http.request.full_uri", "icmp.transmit_timestamp",
                    "http.request.uri.query", "tcp.options", "tcp.payload", "tcp.srcport",
                    "tcp.dstport", "udp.port", "mqtt.msg"]

    df.drop(drop_columns, axis=1, inplace=True)
    df.dropna(axis=0, how='any', inplace=True)
    df.drop_duplicates(subset=None, keep="first", inplace=True)
    df = shuffle(df)
    df.isna().sum()
    print(df['Attack_type'].value_counts())

    Step 5: Categorical data encoding (dummy encoding):

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler
    from sklearn import preprocessing

    # Replace column `name` with one dummy column per distinct value
    def encode_text_dummy(df, name):
        dummies = pd.get_dummies(df[name])
        for x in dummies.columns:
            dummy_name = f"{name}-{x}"
            df[dummy_name] = dummies[x]
        df.drop(name, axis=1, inplace=True)

    encode_text_dummy(df, 'http.request.method')
    encode_text_dummy(df, 'http.referer')
    encode_text_dummy(df, "http.request.version")
    encode_text_dummy(df, "dns.qry.name.len")
    encode_text_dummy(df, "mqtt.conack.flags")
    encode_text_dummy(df, "mqtt.protoname")
    encode_text_dummy(df, "mqtt.topic")

    Step 6: Create the preprocessed dataset:

    df.to_csv('preprocessed_DNN.csv', encoding='utf-8')
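
    To make the effect of the dummy encoding in Step 5 concrete, here is a dependency-free sketch that mimics what `encode_text_dummy` does, operating on a list of dictionaries instead of a DataFrame. It is not the authors' code: `encode_dummy` and the sample rows (including the `tcp.len` column) are illustrative assumptions, and `pd.get_dummies` may differ in details such as the dtype of the new columns and the handling of missing values.

```python
def encode_dummy(rows, name):
    """Replace column `name` with one 0/1 column per distinct value.

    New columns are named "<name>-<value>", mirroring the naming in Step 5.
    """
    values = sorted({row[name] for row in rows})
    encoded = []
    for row in rows:
        # Copy every column except the one being encoded.
        new_row = {key: val for key, val in row.items() if key != name}
        for value in values:
            new_row[f"{name}-{value}"] = 1 if row[name] == value else 0
        encoded.append(new_row)
    return encoded


# Hypothetical rows for illustration only.
rows = [
    {"http.request.method": "GET", "tcp.len": 120},
    {"http.request.method": "POST", "tcp.len": 512},
    {"http.request.method": "GET", "tcp.len": 64},
]
encoded = encode_dummy(rows, "http.request.method")
for row in encoded:
    print(row)
```

    Note that a column with many distinct values (for example free-text topics) produces one new column per value, so the feature count can grow quickly.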

    For more information about the dataset, please contact the lead author of this project, Dr Mohamed Amine Ferrag, on his email: mohamed.amine.ferrag@gmail.com

    More information about Dr. Mohamed Amine Ferrag is available at:

    https://www.linkedin.com/in/Mohamed-Amine-Ferrag

    https://dblp.uni-trier.de/pid/142/9937.html

    https://www.researchgate.net/profile/Mohamed_Amine_Ferrag

    https://scholar.google.fr/citations?user=IkPeqxMAAAAJ&hl=fr&oi=ao

    https://www.scopus.com/authid/detail.uri?authorId=56115001200

    https://publons.com/researcher/1322865/mohamed-amine-ferrag/

    https://orcid.org/0000-0002-0632-3172

    Last Updated: 27 Mar. 2023

  4. Kitsune Network Attack Dataset

    • paperswithcode.com
    Updated Oct 16, 2023
    Cite
    Yisroel Mirsky; Tomer Doitshman; Yuval Elovici; Asaf Shabtai (2023). Kitsune Network Attack Dataset Dataset [Dataset]. https://paperswithcode.com/dataset/kitsune-network-attack-dataset
    Authors
    Yisroel Mirsky; Tomer Doitshman; Yuval Elovici; Asaf Shabtai
    Description

    Kitsune Network Attack Dataset

    This is a collection of nine network attack datasets captured from either an IP-based commercial surveillance system or a network full of IoT devices. Each dataset contains millions of network packets and a different cyber attack within it.

    For each attack, you are supplied with:

    • A preprocessed dataset in csv format (ready for machine learning)
    • The corresponding label vector in csv format
    • The original network capture in pcap format (in case you want to engineer your own features)

    We will now describe in detail what's in these datasets and how they were collected.

    The Network Attacks

    We have collected a wide variety of attacks which you would find in a real network intrusion. The following is a list of the cyber attack datasets available:

    [Image: table listing the available attack datasets]

    For more details on the attacks themselves, please refer to our NDSS paper (citation below).

    The Data Collection

    The following figure presents the network topologies which we used to collect the data, and the corresponding attack vectors at which the attacks were performed.

    [Figure: network topologies and attack vectors]

    The network capture took place at point 1 and point X at the router (where a network intrusion detection system could feasibly be placed). For each dataset, clean network traffic was captured for the first 1 million packets, then the cyber attack was performed.

    The Dataset Format

    Each preprocessed dataset csv has m rows (packets) and 115 columns (features) with no header. The 115 features were extracted using our AfterImage feature extractor, described in our NDSS paper (see below) and available in Python here. In summary, the 115 features provide a statistical snapshot of the network (hosts and behaviors) in the context of the current packet traversing the network. The AfterImage feature extractor is unique in that it can efficiently process millions of streams (network channels) in real-time, incrementally, making it suitable for handling network traffic.
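
    A minimal sketch of reading such a headerless file is shown below. It checks that every row has 115 values and then separates an initial attack-free prefix from the rest, following the statement that clean traffic comes first. The helper names, the tiny synthetic data, and the prefix length of 3 are assumptions for illustration; the layout of the real label file is not specified here, so labels are passed in as a plain list.

```python
import csv
import io

N_FEATURES = 115  # per the description: 115 columns and no header row


def load_features(text, n_features=N_FEATURES):
    """Parse headerless CSV text into rows of floats, checking row width."""
    rows = []
    for line_no, record in enumerate(csv.reader(io.StringIO(text)), start=1):
        if len(record) != n_features:
            raise ValueError(
                f"row {line_no}: expected {n_features} values, got {len(record)}"
            )
        rows.append([float(value) for value in record])
    return rows


def split_clean_prefix(rows, labels, n_clean):
    """Split into an assumed attack-free prefix and the remaining packets."""
    if len(rows) != len(labels):
        raise ValueError("features and labels differ in length")
    return (rows[:n_clean], labels[:n_clean]), (rows[n_clean:], labels[n_clean:])


# Synthetic stand-in: 5 packets x 115 features, the first 3 labelled benign.
demo_text = "\n".join(
    ",".join(str(i + j * 0.5) for j in range(N_FEATURES)) for i in range(5)
)
demo_labels = [0, 0, 0, 1, 1]

features = load_features(demo_text)
(train_x, train_y), (eval_x, eval_y) = split_clean_prefix(
    features, demo_labels, n_clean=3
)
print(len(train_x), len(eval_x), len(features[0]))
```

    For the real files, which hold millions of packets, a streaming or chunked reader would be more appropriate than loading everything into a list.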

    Citation

    If you use these datasets, please cite:

    @inproceedings{mirsky2018kitsune,
      title={Kitsune: An Ensemble of Autoencoders for Online Network Intrusion Detection},
      author={Mirsky, Yisroel and Doitshman, Tomer and Elovici, Yuval and Shabtai, Asaf},
      booktitle={The Network and Distributed System Security Symposium (NDSS) 2018},
      year={2018}
    }

  5. BATADAL: Cyber Attacks Detection in Water Systems

    • kaggle.com
    Updated Feb 1, 2023
    Cite
    Minh T. Nguyen (2023). BATADAL: Cyber Attacks Detection in Water Systems [Dataset]. https://www.kaggle.com/datasets/minhbtnguyen/batadal-a-dataset-for-cyber-attack-detection
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Minh T. Nguyen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Insights On Cyberbiosecurity Field

    According to Cyberbiosecurity: A New Perspective on Protecting U.S. Food and Agricultural System: "The US's national data and infrastructure security issues affecting the “bioeconomy” are evolving rapidly. Simultaneously, the conversation about cyber security of the U.S. food and agricultural system (cyber biosecurity) is incomplete and disjointed. The food and agricultural production sectors influence over 20% of the nation's economy ($6.7T) and 15% of U.S. employment (43.3M jobs). The food and agricultural sectors are immensely diverse and they require advanced technologies and efficiencies that rely on computer technologies, big data, cloud-based data storage, and internet accessibility. There is a critical need to safeguard the cyber biosecurity of our bio economy, but currently protections are minimal and do not broadly exist across the food and agricultural system."

    Cyberbiosecurity is an emerging discipline for protecting life sciences data, functions, operation, and the bio-economy.

    Insights On The Dataset

    The BATtle of the Attack Detection ALgorithms (BATADAL) will objectively compare the performance of algorithms for the detection of cyber attacks in water distribution systems.

    C-Town Public Utility (CPU) is the main water distribution system operator of C-Town (Figure 1). For many years, CPU has operated a static distribution topology. In the last year, CPU introduced novel smart technology to enable remote data collection from sensors in the field and remote control of actuators. Shortly after the new technology was introduced, anomalously low levels in Tank T5 and high levels in Tank T1 were observed. A month later, a water overflow in Tank T1 occurred. While CPU personnel at the control center were able to see the anomalous readings for the first two episodes, the Tank T1 overflow took place unexpectedly, while the water level readings were always below the alarm thresholds and pumping operations appeared to be normal. Searching for the causes, CPU engineers suspect potential cyber-attacks for all these episodes. In particular, they are considering adversaries that are able to activate and deactivate the actuators in C-Town, alter the readings of the sensors deployed in the network and the reported status of actuators, and interfere with the connections established between networked components. The participants' task is thus to develop an online alert system for cyber-physical attacks.

    Additional Information About The Dataset From Other Papers:

    - SCADA data are real-time, field-based network measurements (tank water level, pump flow, etc.) transmitted to the central system by programmable logic controllers (PLCs).
    - C-Town consists of 388 nodes linked with 429 pipes and is divided into 5 district metered areas (DMAs).
    - More specifically, the SCADA data include the water level at all 7 tanks of the network (T1–T7), status and flow of all 11 pumps (PU1–PU11) and the one actuated valve (V2) of the network, and pressure at 24 pipes of the network that correspond to the inlet and outlet pressure of the pumps and the actuated valve.

    [Figure: image not reproduced here]

    Graph Annotation:

    - L_T #: water level of a tank # [meter].
    - S_PU # or S_V #: status of a pump # or a valve # [dmnl]. Binary signal.
    - F_PU # or F_V #: flowrate of a pump # or a valve # [L/s].
    - P_J #: inlet and outlet pressure for a junction # [meter].

    Dataset Details (TL;DR):

    - There are 43 columns and a 1/0 label column, with 1 meaning that the system is under attack and 0 meaning that the system is in normal operation.
    - Training Dataset 1: This dataset was released on November 20 2016, and it was generated from a one-year-long simulation. The dataset does not contain any attacks, i.e. all the data pertains to C-Town normal operations.
    - Training Dataset 2: This dataset with partially labeled data was released on November 28 2016. The dataset is around 6 months long and contains several attacks, some of which are approximately labeled.
    - Test Dataset: This 3-month-long dataset contains several attacks but no labels. The dataset was released on February 20 2017, and it is used to compare the performance of the algorithms (see rules document for details).
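
    Since Training Dataset 1 is attack-free, one simple starting point is to learn a baseline from it and flag later readings that deviate strongly. The sketch below is a deliberately naive single-sensor detector using a mean and standard deviation threshold; the readings are synthetic, and the function names and the choice of k = 3 are assumptions for illustration, not the competition's method.

```python
from statistics import mean, stdev


def fit_baseline(normal_values):
    """Estimate mean and sample standard deviation from attack-free readings."""
    return mean(normal_values), stdev(normal_values)


def flag_anomalies(values, baseline, k=3.0):
    """Return 1 where a reading is more than k standard deviations from the mean, else 0."""
    mu, sigma = baseline
    return [1 if abs(value - mu) > k * sigma else 0 for value in values]


# Synthetic tank-level readings in metres (made up for illustration).
normal_l_t1 = [2.9, 3.1, 3.0, 3.2, 2.8, 3.0, 3.1, 2.9]
incoming_l_t1 = [3.0, 3.1, 4.6, 2.9, 1.2]

baseline = fit_baseline(normal_l_t1)
flags = flag_anomalies(incoming_l_t1, baseline)
print(flags)
```

    A per-sensor threshold like this would miss the concealed Tank T1 overflow described above, where reported readings stayed below alarm levels, so a practical detector would likely need to check consistency across tanks, pumps, and pressures.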

    Notes

  6. Unified Multimodal Network Intrusion Detection Systems Dataset

    • ieee-dataport.org
    Updated Oct 19, 2024
    Cite
    Syed Wali Rizvi (2024). Unified Multimodal Network Intrusion Detection Systems Dataset [Dataset]. https://ieee-dataport.org/documents/unified-multimodal-network-intrusion-detection-systems-dataset
    Authors
    Syed Wali Rizvi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    and contextual features

  7. UNSW-NB15 Dataset

    • paperswithcode.com
    • library.toponeai.link
    Updated Feb 20, 2021
    Cite
    Nour Moustafa; Jill Slay (2021). UNSW-NB15 Dataset [Dataset]. https://paperswithcode.com/dataset/unsw-nb15
    Authors
    Nour Moustafa; Jill Slay
    Description

    UNSW-NB15 is a network intrusion dataset. It contains nine different attack types, including DoS, Worms, Backdoors, and Fuzzers. The dataset contains raw network packets. The training set has 175,341 records and the testing set has 82,332 records, covering both attack and normal traffic.

    Paper: UNSW-NB15: a comprehensive data set for network intrusion detection systems

  8. CICIoT2023

    • kaggle.com
    Updated Apr 21, 2025
    Cite
    HIMADRI07 (2025). CICIoT2023 [Dataset]. https://www.kaggle.com/datasets/himadri07/ciciot2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    HIMADRI07
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    The CICIoT2023 dataset is a comprehensive and modern dataset designed for research in Internet of Things (IoT) security, particularly for intrusion detection and anomaly detection systems. Released by the Canadian Institute for Cybersecurity (CIC), this dataset reflects real-world IoT network traffic and attack scenarios, providing a valuable resource for machine learning and cybersecurity research.

    The dataset was generated using a realistic testbed that simulates various IoT devices communicating over a network, including smart TVs, webcams, smart thermostats, and wearable devices. It captures both benign traffic and a wide variety of attack types such as Denial of Service (DoS), Distributed Denial of Service (DDoS), brute-force attacks, botnets, reconnaissance, and more advanced threats.

    Key Features of CICIoT2023:

    Contains a mix of normal and malicious IoT network traffic.

    Includes 34 distinct attack types, covering modern and advanced cyber threat scenarios.

    Provides labeled data suitable for supervised machine learning models.

    Offers extracted network flow features (e.g., packet size, duration, flags, statistical summaries) which can be used for traffic classification and anomaly detection.

    Supports research in intrusion detection, anomaly detection, and IoT security strategy development.

    This dataset helps bridge the gap between traditional network security datasets and the unique, evolving patterns of IoT device communication, making it an excellent benchmark for evaluating the performance of AI-based security solutions.

    I have further broken the data down into three parts: Train: (5491971, 47), Validation: (1176851, 47), Test: (1176851, 47).
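
    The row counts above imply a split ratio that is easy to verify. The short sketch below only does arithmetic on the reported shapes; the dictionary name `splits` is chosen here for illustration.

```python
# Row counts reported for the three parts (each has 47 columns).
splits = {"train": 5491971, "validation": 1176851, "test": 1176851}

total_rows = sum(splits.values())
fractions = {name: rows / total_rows for name, rows in splits.items()}

print(total_rows)
for name, fraction in fractions.items():
    print(f"{name}: {fraction:.2%}")
```

    The fractions work out to roughly 70% / 15% / 15% of the combined rows.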

  9. TII-SSRC-23 Dataset

    • paperswithcode.com
    Cite
    Dania Herzalla; Willian T. Lunardi; Martin Andreoni Lopez, TII-SSRC-23 Dataset [Dataset]. https://paperswithcode.com/dataset/tii-ssrc-23
    Authors
    Dania Herzalla; Willian T. Lunardi; Martin Andreoni Lopez
    Description

    The TII-SSRC-23 dataset offers a comprehensive collection of network traffic patterns, meticulously compiled to support the development and research of Intrusion Detection Systems (IDS). It presents a dual structure: one part provides a tabular representation of extracted features in CSV format, while the other offers raw network traffic data for each type of traffic in PCAP files. This rich dataset captures both benign and malicious network scenarios, serving as an invaluable resource for researchers in the machine learning field.

    URL: https://www.kaggle.com/datasets/daniaherzalla/tii-ssrc-23

  10. Large-Scale Attacks in IoT Environment

    • kaggle.com
    zip
    Updated May 7, 2025
    Cite
    Nikita Manaenkov (2025). Large-Scale Attacks in IoT Environment [Dataset]. https://www.kaggle.com/datasets/nikitamanaenkov/large-scale-attacks-in-iot-environment
    Available download formats: zip (1474647877 bytes)
    Authors
    Nikita Manaenkov
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The CICIoT2023 dataset is a large-scale, realistic intrusion detection dataset designed to support security analytics and machine learning research in the Internet of Things (IoT) domain. Created by the Canadian Institute for Cybersecurity (CIC), the dataset captures 33 different types of attacks (including DDoS, DoS, Recon, Web-based, Brute Force, Spoofing, and Mirai) executed by malicious IoT devices against other IoT targets.

    The testbed consists of 105 real IoT devices of different types and manufacturers, including smart home devices and industrial equipment, configured in a complex network topology to emulate real-world conditions. The dataset includes benign and malicious traffic in various formats and supports feature extraction for both traditional ML and deep learning models.

    This dataset aims to address the lack of diversity and scale in previous IoT security datasets, offering a robust benchmark for evaluating intrusion detection systems (IDS) and enabling research in IoT cybersecurity, anomaly detection, and network forensics.

    Source: https://www.mdpi.com/1424-8220/23/13/5941

  11. MQTT-IoT-IDS2020 Dataset

    • paperswithcode.com
    • opendatalab.com
    Cite
    MQTT-IoT-IDS2020 Dataset [Dataset]. https://paperswithcode.com/dataset/mqtt-iot-ids2020
    Description

    The Message Queuing Telemetry Transport (MQTT) protocol is one of the most widely used standards in Internet of Things (IoT) machine-to-machine communication. The increase in the number of available IoT devices and protocols in use reinforces the need for new and robust Intrusion Detection Systems (IDS). However, building an IoT IDS requires datasets to process, train, and evaluate these models.

    MQTT-IoT-IDS2020 is the first dataset to simulate an MQTT-based network. The dataset is generated using a simulated MQTT network architecture. The network comprises twelve sensors, a broker, a simulated camera, and an attacker. Five scenarios are recorded: (1) normal operation, (2) aggressive scan, (3) UDP scan, (4) Sparta SSH brute-force, and (5) MQTT brute-force attack. The raw pcap files are saved, then features are extracted. Three abstraction levels of features are extracted from the raw pcap files: (a) packet features, (b) Unidirectional flow features and (c) Bidirectional flow features. The csv feature files in the dataset are suited for Machine Learning (ML) usage. Also, the raw pcap files are suitable for the deeper analysis of MQTT IoT networks communication and the associated attacks.
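
    The difference between the unidirectional and bidirectional abstraction levels can be sketched by grouping the same packets with two different flow keys. The example below is an illustration only: the packet tuples are made up, the helper names are hypothetical, and the per-flow statistics (packet count and byte total) are stand-ins rather than the features actually shipped in the dataset's csv files.

```python
from collections import defaultdict

# Each packet: (src_ip, src_port, dst_ip, dst_port, protocol, length_bytes).
packets = [
    ("10.0.0.2", 50000, "10.0.0.1", 1883, "TCP", 90),
    ("10.0.0.1", 1883, "10.0.0.2", 50000, "TCP", 60),
    ("10.0.0.2", 50000, "10.0.0.1", 1883, "TCP", 120),
    ("10.0.0.7", 40000, "10.0.0.1", 22, "TCP", 200),
]


def uni_key(pkt):
    """Direction-sensitive flow key: A->B and B->A are different flows."""
    src, sport, dst, dport, proto, _ = pkt
    return (src, sport, dst, dport, proto)


def bi_key(pkt):
    """Direction-agnostic flow key: the two endpoints are put in sorted order."""
    src, sport, dst, dport, proto, _ = pkt
    first, second = sorted([(src, sport), (dst, dport)])
    return (first, second, proto)


def aggregate(pkts, key_fn):
    """Group packets by flow key and sum packet counts and bytes per flow."""
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for pkt in pkts:
        flow = flows[key_fn(pkt)]
        flow["packets"] += 1
        flow["bytes"] += pkt[5]
    return dict(flows)


uni_flows = aggregate(packets, uni_key)
bi_flows = aggregate(packets, bi_key)
print(len(uni_flows), len(bi_flows))
```

    The request and response packets of the MQTT exchange form two unidirectional flows but collapse into a single bidirectional flow, which is why the two levels yield different numbers of records from the same pcap.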

  12. UNB CIC IOT Dataset 2023 (Updated 2024-10-08)

    • kaggle.com
    zip
    Updated May 24, 2025
    Cite
    Md. Abdul Al Emon (2025). UNB CIC IOT Dataset 2023 (Updated 2024-10-08) [Dataset]. https://www.kaggle.com/datasets/mdabdulalemo/cic-iot-dataset2023-updated-2024-10-08
    Available download formats: zip (3264262523 bytes)
    Authors
    Md. Abdul Al Emon
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    The CIC IoT Dataset 2023 is a comprehensive benchmark developed by the Canadian Institute for Cybersecurity (CIC) to advance intrusion detection research in real-world Internet of Things (IoT) environments. This dataset was created using a network of 105 actual IoT devices, encompassing smart home gadgets, sensors, and cameras, to simulate authentic IoT traffic and attack scenarios.

    Key Features:

    • Diverse Attack Scenarios: The dataset includes 33 distinct attacks categorized into seven classes: DDoS, DoS, Reconnaissance, Web-based, Brute Force, Spoofing, and Mirai. These attacks were executed by compromised IoT devices targeting other IoT devices, reflecting realistic threat vectors. (University of New Brunswick)

    • Extensive Data Collection: Network traffic was captured in real-time, resulting in over 46 million records. The data is available in various formats, including raw PCAP files and pre-extracted CSV features, facilitating different research needs.

    • Realistic IoT Topology: Unlike many datasets that rely on simulations, this dataset was generated using a large-scale IoT testbed with devices from multiple vendors, providing a heterogeneous and realistic network environment.

    • Benchmarking and Evaluation: The dataset has been utilized to evaluate the performance of machine learning and deep learning algorithms in classifying and detecting malicious versus benign IoT network traffic. (University of New Brunswick)

    This dataset serves as a valuable resource for researchers and practitioners aiming to develop and test security analytics applications, intrusion detection systems, and other cybersecurity solutions tailored for IoT ecosystems. (University of New Brunswick)

  13. Cybersecurity Threat and Awareness Program Dataset

    • kaggle.com
    Updated Oct 19, 2024
    Cite
    DatasetEngineer (2024). Cybersecurity Threat and Awareness Program Dataset [Dataset]. http://doi.org/10.34740/kaggle/dsv/9665651
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Oct 19, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    DatasetEngineer
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset Title: Cybersecurity Threat Detection and Awareness Program Dataset (2018-2024)

    Description: This dataset provides a comprehensive collection of cybersecurity events and network traffic data, spanning from January 2018 to March 2024, collected from real-world corporate environments in Texas, USA. The data includes a diverse range of cybersecurity incidents, covering normal activity as well as various types of threats. It was gathered from multiple sources, such as network traffic logs, system logs, and external threat intelligence feeds, making it suitable for developing machine learning models aimed at threat detection, incident response, and cybersecurity awareness improvement.

    The dataset is well-suited for research and experimentation in threat intelligence, intrusion detection, cybersecurity awareness training, and anomaly detection. The included features allow for the modeling of various threat scenarios and multi-class classification tasks. The labeled data provides information on the severity and type of threats detected, supporting both supervised and unsupervised learning techniques.

    Features Overview:

    Date_Time: The timestamp of the event (e.g., 2022-05-01 14:30:00), indicating when the activity or incident occurred.

    Source_IP: IP address of the originating device involved in the event (e.g., 192.168.1.1).

    Destination_IP: IP address of the target device involved in the event (e.g., 10.0.0.5).

    Source_Port: Port number on the originating device (e.g., 443).

    Destination_Port: Port number on the target device (e.g., 80).

    Protocol_Type: The protocol used for the communication, such as TCP, UDP, ICMP.

    Flow_Duration: Duration of the network flow in milliseconds.

    Packet_Size: The size of the packet in bytes.

    Flow_Bytes/s: The number of bytes transmitted per second during the flow.

    Flow_Packets/s: The number of packets transmitted per second during the flow.

    Total_Forward_Packets: Total number of packets sent in the forward direction.

    Total_Backward_Packets: Total number of packets sent in the reverse direction.

    Packet_Length_Mean: Average packet length for the flow.

    IAT_Forward: Inter-arrival time for packets in the forward direction.

    IAT_Backward: Inter-arrival time for packets in the reverse direction.

    Active_Duration: Duration of active time for the connection.

    Idle_Duration: Duration of idle time for the connection.

    IDS_Alert_Count: Number of intrusion detection system alerts triggered during the event.

    Anomaly_Score: A score indicating the anomaly level of the event, derived from anomaly detection algorithms.

    Attack_Vector: Type of attack vector used (e.g., Phishing, DDoS, Brute Force).

    Attack_Severity: Severity of the detected threat, categorized as Low, Medium, High, or Critical.

    Compromised_Hosts_Count: Number of hosts compromised during the event.

    Botnet_Family: Family of botnet detected (if applicable), such as Mirai, Zeus.

    Malware_Type: Type of malware detected, such as Ransomware, Trojan.

    User_Login_Attempts: Number of login attempts during the event.

    Geolocation: Geographic location of the originating IP (Country, City).

    Device_Type: Type of device involved (e.g., Server, Router, Mobile).

    Firewall_Logs: Binary indicator (0 or 1) showing whether firewall logs flagged the activity.

    Antivirus_Alerts: Binary indicator (0 or 1) showing whether antivirus software detected a threat.

    Open_Ports_Count: Number of open ports on the target device.

    Reputation_Score: A score indicating the reputation of the IP/domain based on threat intelligence sources.

    Blacklisted_IP: Binary indicator (0 or 1) indicating if the IP is listed on a blacklist.

    Known_Vulnerability: Binary indicator (0 or 1) showing if the target system has known vulnerabilities (based on CVE).

    Threat_Intelligence_Source: Source from which the threat intelligence information was gathered.

    System_Patch_Status: Indicates whether the system is patched (Up-to-date, Outdated).

    CPU_Utilization: CPU usage percentage during the event.

    Memory_Utilization: Memory usage percentage during the event.

    Employee_Training_Completion: Completion rate of cybersecurity awareness training for the employee involved.

    Phishing_Simulation_Success: Result of phishing simulation attempts (Success, Failure).

    Reported_Incidents: Number of cybersecurity incidents reported by the user.

    Incident_Response_Time: Time taken to respond to the incident in minutes.

    Label (Target Variable):

    Threat_Severity: The severity level of the threat, categorized as:

    0: No Threat
    1: Low-Level Threat
    2: Medium-Level Threat
    3: High-Level Threat
    4: Critical Threat

    Usage: This dataset is ideal for training and testing machine learning models for tasks such as:

    Multi-class classification for threat detection.
    Anomaly detection.
    Predictive modeling for incident response prioritization.
    Cybersecurity awareness program improvement.

    Researchers and...
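    As a first step toward the multi-class classification task, the class balance of the Threat_Severity label can be inspected. The sketch below uses only the Python standard library; the label names come from the description above, while the function name and the sample rows are made up for illustration and the real file layout may differ.

```python
import csv
import io
from collections import Counter

# Label names as listed in the dataset description.
THREAT_SEVERITY = {
    0: "No Threat",
    1: "Low-Level Threat",
    2: "Medium-Level Threat",
    3: "High-Level Threat",
    4: "Critical Threat",
}


def severity_distribution(csv_file) -> Counter:
    """Count rows per Threat_Severity class, keyed by the readable label name."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        counts[THREAT_SEVERITY[int(row["Threat_Severity"])]] += 1
    return counts


# Made-up rows for illustration; only the column names come from the description.
sample = io.StringIO(
    "Protocol_Type,IDS_Alert_Count,Threat_Severity\n"
    "TCP,0,0\n"
    "UDP,3,2\n"
    "TCP,7,4\n"
    "ICMP,0,0\n"
)
distribution = severity_distribution(sample)
```

    To run this against the real data, pass an open file handle for the downloaded CSV instead of the in-memory sample.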

  14. HAI Security Dataset

    • kaggle.com
    zip
    Updated Apr 27, 2022
    + more versions
    Cite
    ICS Security Dataset (2022). HAI Security Dataset [Dataset]. https://www.kaggle.com/icsdataset/hai-security-dataset
    Explore at:
    Available download formats: zip (487855254 bytes)
    Dataset updated
    Apr 27, 2022
    Authors
    ICS Security Dataset
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    HIL-based Augmented ICS (HAI) Security Dataset

    The HAI dataset was collected from a realistic industrial control system (ICS) testbed augmented with a Hardware-In-the-Loop (HIL) simulator that emulates steam-turbine power generation and pumped-storage hydropower generation.

    Click here to find out more about the HAI dataset.

    Please e-mail us here if you have any questions about the dataset.

    Background

    • In 2017, three laboratory-scale CPS testbeds were initially launched, namely GE’s turbine testbed, Emerson’s boiler testbed, and FESTO’s modular production system (MPS) water-treatment testbed. These testbeds are related to relatively simple processes, and were operated independently of each other.

    • In 2018, a complex process system was built to combine the three systems using a HIL simulator, where generation of thermal power and pumped-storage hydropower was simulated. This ensured that the variables were highly coupled and correlated for a richer dataset. In addition, an open platform communications unified architecture (OPC-UA) gateway was installed to facilitate data collection from heterogeneous devices.

    • The first version of HAI dataset, HAI 1.0, was made available on GitHub and Kaggle in February 2020. This dataset included ICS operational data from normal and anomalous situations for 38 attacks. Subsequently, a debugged version of HAI 1.0, namely HAI 20.07, was released for the HAICon 2020 competition in August 2020.

    • HAI 21.03 was released in 2021, and was based on a more tightly coupled HIL simulator to produce clearer attack effects with additional attacks. This version provides more quantitative information and covers a variety of operational situations, and provides better insights into the dynamic changes of the physical system.

    • HAI 22.04 contained more sophisticated attacks that are significantly more difficult to detect than those in the previous versions. Comparing only the baseline TaPRs of HAICon 2020 and HAICon 2021, detection difficulty in HAI 22.04 is approximately four times higher than HAI 21.03.

    HAI Testbed

    The testbed consists of four different processes: boiler process, turbine process, water-treatment process, and HIL simulation:

    • Boiler Process (P1): This includes water-to-water heat transfer at a low pressure and a moderate temperature. This process is controlled using Emerson Ovation DCS.
    • Turbine Process (P2): A rotor kit process that closely simulates the behavior of an actual rotating machine. It is controlled by GE's Mark VIe DCS.
    • Water-treatment Process (P3): This process includes pumping water to the upper reservoir and releasing it back into the lower reservoir. It is controlled by Siemens's S7-300 PLC.
    • HIL Simulation (P4): Both the boiler and turbine processes are interconnected to synchronize with the rotating speed of the virtual steam-turbine power generation model. The pump and valve in the water-treatment process are controlled by the pumped-storage hydropower generation model. The dSPACE SCALEXIO system is used for the HIL simulations and is interconnected with the real-world processes through a Siemens S7-1500 PLC and ET200 remote IO devices for the data-acquisition system based on the OPC gateway.

    HAI Datasets

    Two major versions of HAI datasets have been released thus far. Each dataset consists of several CSV files, and each file satisfies time continuity. The quantitative summary of each version is as follows:

    Note: The version numbering follows a date-based scheme, where the version number indicates the released date of the HAI dataset. HAI 20.07 is the bug-fixed version of HAI v1.0 released in February 2020.

    | Version | Data points (points/sec) | Normal dataset files (interval, size) | Attack dataset files (interval, size, attack count) |
    |---|---|---|---|
    | HAI 22.04 | 86 | train1.csv (26 hours, 51 MB); train2.csv (56 hours, 109 MB); train3.csv (35 hours, 67 MB); train4.csv (24 hours, 46 MB); train5.csv (66 hours, 125 MB); train6.csv (72 hours, 137 MB) | test1.csv (24 hours, 48 MB, 07 attacks); test2.csv (23 hours, 45 MB, 17 attacks); test3.csv (17 hours, 33 MB, 10 attacks); test4.csv (36 hours, 70 MB, 24 attacks) |
    | HAI 21.03 | 78 | train1.csv (60 hours, 100 MB); train2.csv (63 hours, 116 MB); train3.csv (229 hours, 246 MB) | test1.csv (12 hours, 22 MB, 05 attacks); test2.csv (33 hours, 62 MB, 20 attacks); test3.csv (30 hours, 56 MB, 08 attacks); test4.csv (11 hours, 20 MB, 05 attacks); test5.csv (26 hours, 48 MB, 12 attacks) |
    | HAI 20.07 (HAI 1.0) | 59 | train1.csv (86 hours, 127 MB); train2.csv (91 hours, 98 MB) | test1.csv (81 hours, 119 MB); test2.csv (42 hours, 62 MB) |

    Data fields

    The time-series data in each CSV file satisfies time continuity. The first column represents the observed time as “yyyy-MM-dd hh:mm:ss,” while the remaining columns provide the recorded SCADA data points. The last four columns provide data labels indicating whether an attack occurred: the attack column applies to all processes, and the other three columns apply to the corresponding control processes.

    Refer to the latest technical manual for the details for each column.

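    | time | P1_B2004 | P2_B2016 | ... | P4_HT_LD | attack | attack_P1 | ... | attack_P3 |
    |---|---|---|---|---|---|---|---|---|
    | 20190926 13:00:00 | 0.09830 | 1.07370 | ... | 0 | 0 | 0 | ... | 0 |
    | 20190926 13:00:01 | 0.09830 | 1.07410 | ... | 0 | 1 | 0 | ... | 1 |
    | 20190926 13:00:02 | 0.09830 | 1.07380 | ... | 0 | 1 | 0 | ... | 1 |
    | 20190926 13:00:03 | 0.09830 | 1.07360 | ... | 0 | 1 | 1 | ... | 1 |
    | 20190926 13:00:04 | 0.09830 | 1.07430 | ... | 0 | 1 | 1 | ... | 1 |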
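    The column layout described above (time first, SCADA data points in the middle, four label columns last) can be read with a short sketch like the following. The function name is hypothetical, the sample keeps only three of the data-point columns, and the attack_P2 values are placeholders because that column is elided in the table; column names and counts differ between HAI versions, so check the technical manual before relying on this split.

```python
import csv
import io

# attack, attack_P1, attack_P2, attack_P3, per the data-field description.
LABEL_COLUMNS = 4


def read_hai_rows(csv_file):
    """Yield (time, sensor_values, labels) for each row of a HAI-style CSV."""
    reader = csv.reader(csv_file)
    header = next(reader)
    sensor_names = header[1:-LABEL_COLUMNS]
    label_names = header[-LABEL_COLUMNS:]
    for row in reader:
        time = row[0]  # kept as a string; the timestamp format varies by version
        sensors = dict(zip(sensor_names, map(float, row[1:-LABEL_COLUMNS])))
        labels = dict(zip(label_names, map(int, row[-LABEL_COLUMNS:])))
        yield time, sensors, labels


# Abbreviated sample based on the table above; attack_P2 values are placeholders.
sample = io.StringIO(
    "time,P1_B2004,P2_B2016,P4_HT_LD,attack,attack_P1,attack_P2,attack_P3\n"
    "20190926 13:00:00,0.09830,1.07370,0,0,0,0,0\n"
    "20190926 13:00:01,0.09830,1.07410,0,1,0,0,1\n"
)
rows = list(read_hai_rows(sample))
attack_times = [t for t, _, labels in rows if labels["attack"] == 1]
```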

    Getting the dataset

    Type git clone, and then paste the URL below:

    $ git clone https://github.com/icsdataset/hai

    To unzip multiple gzip files, you can use:

    $ gunzip *.gz

    Performance Evaluation

    Use of eTaPR (Enhanced Time-series Aware Precision and Recall) metric is strongly recommended to evaluate your anomaly detection model, which provides fairness to performance comparisons with other studies. Got something to suggest? Let us know!

    Projects using the dataset

    Here are some projects and experiments that are using or featuring the dataset in interesting ways. Got something to add? Let us know!

    The related projects so far are as follows.

    Anomaly Detection

    Year 2022

    1. Benchmarking machine learning based detection of cyber attacks for critical infrastructure
    2. A Hybrid Algorithm Incorporating Vector Quantization and One-Class Support Vector Machine for industrial Anomaly Detection
    3. Variational restricted Boltzmann machines to automated anomaly detection

    Year 2021

    1. Research on improvement of anomaly detection performance in industrial control systems
    2. E-sfd: Explainable sensor fault detection in the ics anomaly detection system
    3. Stacked-autoencoder based anomaly detection with industrial control system
    4. Improved mitigation of cyber threats in iiot for smart cities: A new-era approach and scheme
    5. Towards building intrusion detection systems for multivariate time-series data
    6. Revitalizing self-organizing map: Anomaly detection using forecasting error patterns
    7. Cluster-based deep one-class classification model for anomaly detection
    8. Measurement data intrusion detection in industrial control systems based on unsupervised learning
    9. A machine learning approach for anomaly detection in industrial control systems based on measurement data

    Year 2020

    1. Anomaly detection in time-series data environment
    2. Detecting anomalies in time-series data using unsupervised learning and analysis on infrequent signatures

    Testbed/Dataset

    Year 2021

    1. Probabilistic attack sequence generation and execution based on mitre att&ck for ics datasets

    Year 2020

    1. Expansion of ICS testbed for security validation based on MITRE ATT&CK techniques
    2. Expanding a programmable cps testbed for network attack analysis
    3. Co-occurrence based security event analysis and visualization for cyber physical systems

  15. Web page phishing detection

    • data.mendeley.com
    Updated Jun 25, 2021
    + more versions
    Cite
    Abdelhakim Hannousse (2021). Web page phishing detection [Dataset]. http://doi.org/10.17632/c2gw7fy2j4.3
    Explore at:
    Dataset updated
    Jun 25, 2021
    Authors
    Abdelhakim Hannousse
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The provided dataset includes 11430 URLs with 87 extracted features. The dataset is designed to be used as a benchmark for machine-learning-based phishing detection systems. Features are from three different classes: 56 extracted from the structure and syntax of URLs, 24 extracted from the content of their corresponding pages, and 7 extracted by querying external services. The dataset is balanced: it contains exactly 50% phishing and 50% legitimate URLs. Along with the dataset, we provide the Python scripts used for the extraction of the features, for potential replication or extension. The datasets were constructed in May 2020.

    dataset_A: contains a list of URLs together with their DOM tree objects, which can be used for replication and for experimenting with new URL and content-based features, overcoming the short lifetime of phishing web pages.

    dataset_B: contains the extracted feature values, which can be used directly as input to classifiers for examination. Note that the data in this dataset are indexed with URLs, so the index needs to be removed before experimentation.
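    Removing the URL index before experimentation can be sketched as follows. The column names `url` and `status`, the feature names, and the sample rows are assumptions made for illustration; check the actual header of dataset_B before using them.

```python
import csv
import io


def split_features(csv_file, index_column="url", label_column="status"):
    """Return (feature_names, X, y) with the URL index column dropped."""
    reader = csv.DictReader(csv_file)
    feature_names = [
        c for c in reader.fieldnames if c not in (index_column, label_column)
    ]
    X, y = [], []
    for row in reader:
        X.append([float(row[c]) for c in feature_names])
        y.append(row[label_column])
    return feature_names, X, y


# Made-up rows; the real file has 87 feature columns and 11430 rows.
sample = io.StringIO(
    "url,length_url,nb_dots,status\n"
    "http://example.com/a,20,1,legitimate\n"
    "http://example.com/b,57,4,phishing\n"
)
feature_names, X, y = split_features(sample)
```

    Dropping the index matters because a classifier given the raw URL string as a feature could memorize individual entries instead of learning from the extracted features.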

  16. Spam Email Detection Model

    • kaggle.com
    Updated Aug 10, 2024
    Cite
    Usamakhanswati (2024). Spam Email Detection Model [Dataset]. http://doi.org/10.34740/kaggle/ds/5524456
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 10, 2024
    Dataset provided by
    Kaggle
    Authors
    Usamakhanswati
    Description

    In a bustling digital landscape where businesses and individuals alike rely on email communication, a hidden threat lurks—spam emails. These unsolicited and often malicious messages clog inboxes, steal valuable time, and even endanger sensitive data. The need for a powerful shield against this growing menace has never been more urgent.

    Enter the Spam Email Detection Model—a cutting-edge creation designed to bring order to the chaos of modern email communication. Imagine a business owner named Sarah, whose company relies heavily on email for client communication, order processing, and customer support. Every day, her inbox is flooded with hundreds of emails, many of which are nothing but spam. These emails not only waste her time but also pose a risk to her company's security. The Spam Email Detection Model is a state-of-the-art solution designed to combat the ever-growing threat of spam emails with unparalleled accuracy and efficiency. Leveraging advanced machine learning algorithms, this model achieves a remarkable 99.9% accuracy rate, far surpassing the industry standard of 50%. It intelligently distinguishes between legitimate emails and spam, learning and adapting to new patterns to ensure ongoing protection.

    Designed for seamless integration, the model can be easily implemented into any existing email system, providing businesses with a robust defense against unsolicited messages and potential security threats. Its user-friendly interface allows for effortless control and customization, making it a versatile tool for businesses of all sizes.

    By dramatically reducing the time wasted on managing spam and enhancing email security, the Spam Email Detection Model empowers businesses to focus on what truly matters, offering peace of mind in a world where digital communication is vital.

  17. Location Intelligence for Cybersecurity 2025

    • kaggle.com
    Updated Feb 8, 2025
    Cite
    Wisam Abdullah (2025). Location Intelligence for Cybersecurity 2025 [Dataset]. http://doi.org/10.34740/kaggle/dsv/10694937
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 8, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Wisam Abdullah
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Overview

    This dataset is designed to analyze the relationship between cyber attacks, geographic locations, and Internet of Things (IoT) device types. The data has been collected from multiple sources, including cybersecurity incident reports, infrastructure data, environmental conditions, and transportation networks. With a total of 65,450 records, this dataset provides valuable insights for cybersecurity research, smart cities, and artificial intelligence applications.

    Dataset Columns and Their Descriptions

    1. ID: This is a unique identifier assigned to each record in the dataset. It is stored as an integer and serves as the primary key for tracking individual entries.

    2. Latitude: This column represents the latitude coordinate of the geographic location where the cyber attack or IoT device activity occurred. It is stored as a floating-point number, with values ranging from -90 to 90.

    3. Longitude: Similar to latitude, this column stores the longitude coordinate of the location. It is also a floating-point number, with values ranging from -180 to 180.

    4. Location Type: This field describes the type of location where the cyber attack or IoT device was recorded. It is stored as a categorical string and includes values such as "Railway," "Gas Station," "Hospital," "City Boundary," and "River".

    5. Elevation (m): This column contains the elevation of the recorded location, measured in meters above sea level. It is a floating-point number, typically ranging from 0 to 5000 meters.

    6. Population Density (people/km²): This field provides the population density of the given location, measured in people per square kilometer. It is stored as an integer, with values ranging from 50 to 10,000.

    7. Temperature (°C): This column records the temperature at the time of the cyber attack or IoT device activity, measured in degrees Celsius. It is stored as a floating-point number, with values between -30 and 50°C.

    8. Humidity (%): This field represents the relative humidity of the location at the time of the event. It is stored as a floating-point number, with values ranging from 10% to 100%.

    9. Rainfall (mm): This column captures the amount of rainfall in the location at the given time, measured in millimeters. It is stored as a floating-point number, with values ranging from 0 to 300 mm.

    10. Infrastructure Type: This field indicates the type of infrastructure present at the location of the event. It is stored as a categorical string and includes values like "Bridge," "Sewage System," "Dam," and "Power Line."

    11. Air Quality Index (AQI): This column records the Air Quality Index (AQI) at the time of the attack or device activity. It is stored as an integer, with values ranging from 0 (clean air) to 500 (hazardous air quality).

    12. Traffic Flow (vehicles/hour): This field provides the number of vehicles passing through the location per hour. It is stored as an integer, typically ranging from 100 to 5000 vehicles per hour.

    13. Public Transport Station: This column describes the nearest public transport station to the event location. It is stored as a categorical string and includes values such as "Bus Stop," "Metro Station," and "None."

    14. Cyber Attack Type: This field identifies the type of cyber attack recorded at the given location. It is stored as a categorical string and includes values like "Phishing," "DDoS," "Malware," "Zero-Day Exploit," and "SQL Injection."

    15. IoT Device Category: This column categorizes the type of IoT device involved in the event. It is stored as a categorical string and includes categories like "Smart Home," "Industrial IoT," "Smart City," "Wearable," and "Healthcare IoT."

    16. IoT Device Type: This field provides a more detailed classification of the specific IoT device within the assigned category. It is stored as a categorical string and includes values such as "Smartwatch," "Security Camera," "IoT Sensor," "Smart Meter," and "Remote Patient Monitor."

    This dataset is structured to enable comprehensive cybersecurity, geospatial, and AI-driven analyses, making it valuable for research in cyber attack prevention, IoT security, and smart city planning.
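    The value ranges stated in the column descriptions can serve as a sanity check when loading the data. The sketch below covers only the columns whose ranges are given as firm bounds (not the "typically" ranges); the exact header strings, the function name, and the sample records are assumptions for illustration.

```python
# (min, max) bounds taken from the column descriptions; header strings are assumed.
RANGES = {
    "Latitude": (-90.0, 90.0),
    "Longitude": (-180.0, 180.0),
    "Population Density (people/km²)": (50, 10000),
    "Temperature (°C)": (-30.0, 50.0),
    "Humidity (%)": (10.0, 100.0),
    "Rainfall (mm)": (0.0, 300.0),
    "Air Quality Index (AQI)": (0, 500),
}


def out_of_range(record: dict) -> list:
    """Return the columns whose value is missing or outside its documented range."""
    bad = []
    for column, (low, high) in RANGES.items():
        value = record.get(column)
        if value is None or not (low <= value <= high):
            bad.append(column)
    return bad


# Made-up records for illustration only.
valid_record = {
    "Latitude": 33.3,
    "Longitude": 44.4,
    "Population Density (people/km²)": 1200,
    "Temperature (°C)": 21.5,
    "Humidity (%)": 40.0,
    "Rainfall (mm)": 0.0,
    "Air Quality Index (AQI)": 85,
}
invalid_record = dict(valid_record, Latitude=123.0)

valid_issues = out_of_range(valid_record)
invalid_issues = out_of_range(invalid_record)
```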

    Dataset Contents

    The dataset includes geographic information such as latitude, longitude, and elevation, which help identify the most targeted areas for cyber attacks. It also contains population data, allowing for an analysis of how population density influences cyber threats. Environmental factors like temperature, humidity, and rainfall are included, providing insights into the impact of weather conditions on IoT security.

    Additionally, the dataset contains infrastructure-related data such as the type of facilities present at the attack locations, including bridges, sewage systems, and power lines. It also includes information on air quality (AQI index) and traffic flow data, helping analyze how congestion levels might be linked to cyber threats...

  18. Blockchain Finance Systems Using 6G

    • kaggle.com
    Updated Feb 1, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Programmer3 (2025). Blockchain Finance Systems Using 6G [Dataset]. https://www.kaggle.com/datasets/programmer3/blockchain-finance-systems-using-6g/discussion
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 1, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Programmer3
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset is crafted to support research on optimizing blockchain finance systems within a 6G network virtualization environment, with cloud computing integration. It contains 1,000 simulated records representing various transactions in a decentralized finance system, with each entry capturing detailed parameters such as transaction type, size, latency, throughput, network congestion, cloud resource allocations, and more. The dataset includes both feature data and the target column, Transaction Success/Failure, representing whether a transaction was successfully processed or failed.

    Key features in the dataset include transaction size, network conditions, cloud CPU and RAM allocations, machine-type communication levels, latency and throughput performance, network bandwidth, and security/privacy metrics, all critical factors in optimizing blockchain performance. It aims to simulate real-world challenges such as latency reduction, transaction speed enhancement, and network congestion alleviation in blockchain financial applications.

    Ideal for developing deep learning models, this dataset can be used to explore blockchain performance optimization algorithms, test resource allocation strategies, and investigate the impact of 6G network features on blockchain-based financial systems. It provides a rich environment for experimentation, allowing researchers to focus on improving transaction throughput, reducing latency, and ensuring system security and privacy in next-generation financial networks.
