16 datasets found
  1. M-CAN Intrusion Detection Dataset

    • ieee-dataport.org
    Updated Nov 26, 2025
    Cite
    Saehoon Oh (2025). M-CAN Intrusion Detection Dataset [Dataset]. https://ieee-dataport.org/documents/m-can-intrusion-detection-dataset
    Dataset updated
    Nov 26, 2025
    Authors
    Saehoon Oh
    Description

    DLC values

  2. UNSW-NB15 and CIC-IDS2017 Labelled PCAP Data

    • zenodo.org
    • kaggle.com
    csv
    Updated Oct 28, 2022
    Cite
    Yasir Ali Farrukh; Irfan Khan; Syed Wali; David Bierbrauer; John A Pavlik; Nathaniel D. Bastian (2022). UNSW-NB15 and CIC-IDS2017 Labelled PCAP Data [Dataset]. http://doi.org/10.5281/zenodo.7258579
    Available download formats: csv
    Dataset updated
    Oct 28, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Yasir Ali Farrukh; Irfan Khan; Syed Wali; David Bierbrauer; John A Pavlik; Nathaniel D. Bastian
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Packet Capture (PCAP) files of the UNSW-NB15 and CIC-IDS2017 datasets are processed and labelled using the accompanying CSV files. Each packet is labelled by comparing eight distinct features: *Source IP, Destination IP, Source Port, Destination Port, Starting time, Ending time, Protocol and Time to live*. The dataset has dimensions N × 1504. All columns of the dataset are integers, so it can be used directly in machine learning models. Details of the whole processing and transformation are provided in the following GitHub repo:

    https://github.com/Yasir-ali-farrukh/Payload-Byte
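    The labelling rule described above (a packet inherits the label of the CSV flow record it matches on eight features) can be sketched as follows. This is a minimal illustration, not the Payload-Byte implementation: it assumes exact equality on addresses, ports, protocol and TTL and an inclusive start/end time window, and the class and function names are hypothetical. A real labeller over millions of packets would index flows (for example by 5-tuple) instead of scanning a list.

    ```python
    from dataclasses import dataclass
    from typing import List, Optional


    @dataclass(frozen=True)
    class FlowRecord:
        """One labelled row of the original CSV (hypothetical field names)."""
        src_ip: str
        dst_ip: str
        src_port: int
        dst_port: int
        protocol: int      # IP protocol number, e.g. 6 for TCP
        start_time: float  # flow start, seconds since epoch
        end_time: float    # flow end, seconds since epoch
        ttl: int
        label: str


    @dataclass(frozen=True)
    class Packet:
        """Header fields of one packet read from the PCAP (hypothetical names)."""
        src_ip: str
        dst_ip: str
        src_port: int
        dst_port: int
        protocol: int
        timestamp: float
        ttl: int


    def label_packet(pkt: Packet, flows: List[FlowRecord]) -> Optional[str]:
        """Return the label of the first flow matching all eight features, else None."""
        for f in flows:
            same_endpoints = (pkt.src_ip, pkt.dst_ip, pkt.src_port, pkt.dst_port) == (
                f.src_ip, f.dst_ip, f.src_port, f.dst_port)
            in_window = f.start_time <= pkt.timestamp <= f.end_time
            if same_endpoints and in_window and pkt.protocol == f.protocol and pkt.ttl == f.ttl:
                return f.label
        return None
    ```

    Packets that match no flow return `None`; how such packets are treated (dropped or labelled benign) is a policy choice this sketch leaves open.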

    You can use the tool available in the above-mentioned GitHub repo to generate the labelled dataset from scratch. Full details of the processing and transformation are provided in the following paper:

    ```bibtex
    @article{Payload,
    author = "Yasir Ali Farrukh and Irfan Khan and Syed Wali and David Bierbrauer and Nathaniel Bastian",
    title = "{Payload-Byte: A Tool for Extracting and Labeling Packet Capture Files of Modern Network Intrusion Detection Datasets}",
    year = "2022",
    month = "9",
    url = "https://www.techrxiv.org/articles/preprint/Payload-Byte_A_Tool_for_Extracting_and_Labeling_Packet_Capture_Files_of_Modern_Network_Intrusion_Detection_Datasets/20714221",
    doi = "10.36227/techrxiv.20714221.v1"
    }
    ```

  3. IoMT-TrafficData: A Dataset for Benchmarking Intrusion Detection in IoMT

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    • +1more
    Updated Aug 30, 2024
    Cite
    Areia, José; Bispo, Ivo Afonso; Santos, Leonel; Costa, Rogério Luís (2024). IoMT-TrafficData: A Dataset for Benchmarking Intrusion Detection in IoMT [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8116337
    Dataset updated
    Aug 30, 2024
    Dataset provided by
    Politécnico de Leiria
    Authors
    Areia, José; Bispo, Ivo Afonso; Santos, Leonel; Costa, Rogério Luís
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Article Information

    The work involved in developing the dataset and benchmarking its use of machine learning is set out in the article ‘IoMT-TrafficData: Dataset and Tools for Benchmarking Intrusion Detection in Internet of Medical Things’. DOI: 10.1109/ACCESS.2024.3437214.

    Please do cite the aforementioned article when using this dataset.

    Abstract

    The increasing importance of securing the Internet of Medical Things (IoMT) due to its vulnerabilities to cyber-attacks highlights the need for an effective intrusion detection system (IDS). In this study, our main objective was to develop a machine learning model for the IoMT to enhance the security of medical devices and protect patients' private data. To address this issue, we built a scenario that utilised Internet of Things (IoT) and IoMT devices to simulate real-world attacks. We collected and cleaned data, pre-processed it, and fed it into our machine learning model to detect intrusions in the network. Our results revealed significant improvements in all performance metrics, indicating robustness and reproducibility in real-world scenarios. This research has implications in the context of IoMT and cybersecurity, as it helps mitigate vulnerabilities and lowers the number of breaches occurring with the rapid growth of IoMT devices. The use of machine learning algorithms for intrusion detection systems is essential, and our study provides valuable insights and a road map for future research and the deployment of such systems in live environments. By implementing our findings, we can contribute to a safer and more secure IoMT ecosystem, safeguarding patient privacy and ensuring the integrity of medical data.

    ZIP Folder Content

    The ZIP folder comprises two main components: Captures and Datasets. The Captures folder contains all the captures used in this project, organized into separate folders by type of network analysis: BLE or IP-Based. The Datasets folder is organized the same way, with datasets categorized by type: BLE, IP-Based Packet, and IP-Based Flows.

    To cater to diverse analytical needs, the datasets are provided in two formats: CSV (Comma-Separated Values) and pickle. The CSV format integrates easily with most data analysis tools, while the pickle format preserves the saved Python objects, including their data types.

    This organization enables researchers to easily locate and utilize the specific captures and datasets they require, based on their preferred network analysis type or dataset type. The availability of different formats further enhances the flexibility and usability of the provided data.
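    The practical difference between the two formats can be shown with a small round trip. This sketch uses the standard library only and hypothetical rows; the exact object stored in the dataset's pickle files is not described here, so plain dictionaries stand in for it.

    ```python
    import csv
    import io
    import pickle

    # Hypothetical rows; feature names borrowed from the tables in this section.
    rows = [{"frame.len": 60, "tcp.analysis.initial_rtt": 0.0123, "label": "benign"}]

    # CSV round trip: the format is text only, so every value is read back as a string.
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    buf.seek(0)
    csv_rows = list(csv.DictReader(buf))

    # Pickle round trip: the original Python objects and their types come back.
    pickled_rows = pickle.loads(pickle.dumps(rows))

    print(type(csv_rows[0]["frame.len"]).__name__)      # str
    print(type(pickled_rows[0]["frame.len"]).__name__)  # int
    ```

    CSV readers therefore need an explicit type-conversion step. Conversely, only unpickle files from a source you trust, since loading a pickle can execute arbitrary code embedded in it.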

    Datasets' Content

    Within this dataset, three sub-datasets are available, namely BLE, IP-Based Packet, and IP-Based Flows. Below is a table of the features selected for each dataset and consequently used in the evaluation model within the provided work.

    Identified Key Features Within Bluetooth Dataset

    | Feature | Meaning |
    |---|---|
    | btle.advertising_header | BLE Advertising Packet Header |
    | btle.advertising_header.ch_sel | BLE Advertising Channel Selection Algorithm |
    | btle.advertising_header.length | BLE Advertising Length |
    | btle.advertising_header.pdu_type | BLE Advertising PDU Type |
    | btle.advertising_header.randomized_rx | BLE Advertising Rx Address |
    | btle.advertising_header.randomized_tx | BLE Advertising Tx Address |
    | btle.advertising_header.rfu.1 | Reserved for Future Use 1 |
    | btle.advertising_header.rfu.2 | Reserved for Future Use 2 |
    | btle.advertising_header.rfu.3 | Reserved for Future Use 3 |
    | btle.advertising_header.rfu.4 | Reserved for Future Use 4 |
    | btle.control.instant | Instant Value Within a BLE Control Packet |
    | btle.crc.incorrect | Incorrect CRC |
    | btle.extended_advertising | Advertiser Data Information |
    | btle.extended_advertising.did | Advertiser Data Identifier |
    | btle.extended_advertising.sid | Advertiser Set Identifier |
    | btle.length | BLE Length |
    | frame.cap_len | Frame Length Stored Into the Capture File |
    | frame.interface_id | Interface ID |
    | frame.len | Frame Length on the Wire |
    | nordic_ble.board_id | Board ID |
    | nordic_ble.channel | Channel Index |
    | nordic_ble.crcok | Indicates if CRC is Correct |
    | nordic_ble.flags | Flags |
    | nordic_ble.packet_counter | Packet Counter |
    | nordic_ble.packet_time | Packet time (start to end) |
    | nordic_ble.phy | PHY |
    | nordic_ble.protover | Protocol Version |
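    Several of the btle.advertising_header.* features are bit fields of one 16-bit header. A minimal sketch of how they unpack, assuming the Bluetooth Core 5.x advertising PDU header layout (PDU Type, RFU, ChSel, TxAdd, RxAdd in the first byte, Length in the second); the function name is hypothetical, the mapping to Wireshark field names in the comments is approximate, and Wireshark's rfu.1 to rfu.4 fields are not reproduced one-to-one.

    ```python
    from typing import Dict


    def decode_adv_header(header: bytes) -> Dict[str, int]:
        """Split the 2-byte BLE advertising PDU header into its bit fields.

        Assumed layout (Bluetooth Core 5.x, least significant bit first):
        byte 0 = PDU Type (bits 0-3), RFU (bit 4), ChSel (bit 5), TxAdd (bit 6), RxAdd (bit 7)
        byte 1 = Length of the advertising payload in bytes
        """
        if len(header) < 2:
            raise ValueError("the advertising header is 2 bytes long")
        b0, length = header[0], header[1]
        return {
            "pdu_type": b0 & 0x0F,        # ~ btle.advertising_header.pdu_type
            "rfu": (b0 >> 4) & 0x1,       # reserved bit
            "ch_sel": (b0 >> 5) & 0x1,    # ~ btle.advertising_header.ch_sel
            "tx_add": (b0 >> 6) & 0x1,    # ~ btle.advertising_header.randomized_tx
            "rx_add": (b0 >> 7) & 0x1,    # ~ btle.advertising_header.randomized_rx
            "length": length,             # ~ btle.advertising_header.length
        }
    ```

    For example, a header of `40 25` (hex) decodes to PDU type 0 with a random Tx address and a 37-byte payload.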

    Identified Key Features Within IP-Based Packets Dataset

    | Feature | Meaning |
    |---|---|
    | http.content_length | Length of content in an HTTP response |
    | http.request | Indicates that the packet is an HTTP request |
    | http.response.code | HTTP response status code |
    | http.response_number | Sequential number of an HTTP response |
    | http.time | Time taken for an HTTP transaction |
    | tcp.analysis.initial_rtt | Initial round-trip time for TCP connection |
    | tcp.connection.fin | TCP connection termination with a FIN flag |
    | tcp.connection.syn | TCP connection initiation with SYN flag |
    | tcp.connection.synack | TCP connection establishment with SYN-ACK flags |
    | tcp.flags.cwr | Congestion Window Reduced flag in TCP |
    | tcp.flags.ecn | ECN-Echo (Explicit Congestion Notification) flag in TCP |
    | tcp.flags.fin | FIN flag in TCP |
    | tcp.flags.ns | Nonce Sum flag in TCP |
    | tcp.flags.res | Reserved flags in TCP |
    | tcp.flags.syn | SYN flag in TCP |
    | tcp.flags.urg | Urgent flag in TCP |
    | tcp.urgent_pointer | Pointer to urgent data in TCP |
    | ip.frag_offset | Fragment offset in IP packets |
    | eth.dst.ig | Individual/group (I/G) bit of the Ethernet destination address (set for multicast or broadcast) |
    | eth.src.ig | Individual/group (I/G) bit of the Ethernet source address |
    | eth.src.lg | Local/global (L/G) bit of the Ethernet source address (set for locally administered addresses) |
    | eth.src_not_group | Warning that the Ethernet source address is a group address, which it must not be |
    | arp.isannouncement | Indicates if an ARP message is an announcement |
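    The tcp.flags.* features are single bits of the 12-bit flags field in the TCP header. A minimal sketch of extracting them from a raw TCP header; the masks follow the TCP header layout (flags in the low 12 bits of bytes 12-13), while the function name and the 0/1 dictionary output are hypothetical choices, since Wireshark computes these fields itself.

    ```python
    from typing import Dict

    # Bit masks of the 12-bit TCP flags field, keyed by Wireshark display-filter names.
    TCP_FLAG_MASKS = {
        "tcp.flags.res": 0x0E00,    # three reserved bits
        "tcp.flags.ns": 0x0100,     # ECN nonce sum
        "tcp.flags.cwr": 0x0080,    # Congestion Window Reduced
        "tcp.flags.ecn": 0x0040,    # ECN-Echo
        "tcp.flags.urg": 0x0020,
        "tcp.flags.ack": 0x0010,
        "tcp.flags.push": 0x0008,
        "tcp.flags.reset": 0x0004,
        "tcp.flags.syn": 0x0002,
        "tcp.flags.fin": 0x0001,
    }


    def decode_tcp_flags(tcp_header: bytes) -> Dict[str, int]:
        """Return 0/1 for each flag of a raw TCP header."""
        if len(tcp_header) < 14:
            raise ValueError("TCP header too short")
        # Bytes 12-13 hold the 4-bit data offset followed by the 12 flag bits.
        flags = int.from_bytes(tcp_header[12:14], "big") & 0x0FFF
        return {name: int((flags & mask) != 0) for name, mask in TCP_FLAG_MASKS.items()}
    ```

    A SYN-ACK segment, for instance, yields 1 for tcp.flags.syn and tcp.flags.ack and 0 for every other key.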

    Identified Key Features Within IP-Based Flows Dataset

    | Feature | Meaning |
    |---|---|
    | proto | Transport layer protocol of the connection |
    | service | Identification of an application protocol |
    | orig_bytes | Originator payload bytes |
    | resp_bytes | Responder payload bytes |
    | history | Connection state history |
    | orig_pkts | Originator sent packets |
    | resp_pkts | Responder sent packets |
    | flow_duration | Length of the flow in seconds |
    | fwd_pkts_tot | Forward packets total |
    | bwd_pkts_tot | Backward packets total |
    | fwd_data_pkts_tot | Forward data packets total |
    | bwd_data_pkts_tot | Backward data packets total |
    | fwd_pkts_per_sec | Forward packets per second |
    | bwd_pkts_per_sec | Backward packets per second |
    | flow_pkts_per_sec | Flow packets per second |
    | fwd_header_size | Forward header bytes |
    | bwd_header_size | Backward header bytes |
    | fwd_pkts_payload | Forward payload bytes |
    | bwd_pkts_payload | Backward payload bytes |
    | flow_pkts_payload | Flow payload bytes |
    | fwd_iat | Forward inter-arrival time |
    | bwd_iat | Backward inter-arrival time |
    | flow_iat | Flow inter-arrival time |
    | active | Flow active duration |
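    The per-direction counts and rates in the flow table can be derived from a list of packets. A minimal sketch, assuming that rates are counts divided by the flow duration, that a data packet is one with a non-zero payload, that the byte features are totals, and that flow_iat is the mean inter-arrival time. The dataset's flow meter may instead report min/max/average statistics for some of these, and the `Pkt` type is hypothetical.

    ```python
    from dataclasses import dataclass
    from typing import Dict, List


    @dataclass(frozen=True)
    class Pkt:
        """One packet of a bidirectional flow (hypothetical structure)."""
        ts: float          # capture timestamp, seconds
        forward: bool      # True if sent by the originator
        payload_len: int   # application payload bytes
        header_len: int    # header bytes counted toward *_header_size


    def flow_features(pkts: List[Pkt]) -> Dict[str, float]:
        """Compute a subset of the flow features listed in the table."""
        if not pkts:
            raise ValueError("empty flow")
        pkts = sorted(pkts, key=lambda p: p.ts)
        fwd = [p for p in pkts if p.forward]
        bwd = [p for p in pkts if not p.forward]
        duration = pkts[-1].ts - pkts[0].ts
        iats = [b.ts - a.ts for a, b in zip(pkts, pkts[1:])]

        def per_sec(count: int) -> float:
            # Single-packet flows have zero duration; report a rate of 0 for them.
            return count / duration if duration > 0 else 0.0

        return {
            "flow_duration": duration,
            "fwd_pkts_tot": len(fwd),
            "bwd_pkts_tot": len(bwd),
            "fwd_data_pkts_tot": sum(1 for p in fwd if p.payload_len > 0),
            "bwd_data_pkts_tot": sum(1 for p in bwd if p.payload_len > 0),
            "fwd_pkts_per_sec": per_sec(len(fwd)),
            "bwd_pkts_per_sec": per_sec(len(bwd)),
            "flow_pkts_per_sec": per_sec(len(pkts)),
            "fwd_header_size": sum(p.header_len for p in fwd),
            "bwd_header_size": sum(p.header_len for p in bwd),
            "fwd_pkts_payload": sum(p.payload_len for p in fwd),
            "bwd_pkts_payload": sum(p.payload_len for p in bwd),
            "flow_pkts_payload": sum(p.payload_len for p in pkts),
            "flow_iat": sum(iats) / len(iats) if iats else 0.0,
        }
    ```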

  4. Network traffic datasets with novel extended IP flow called NetTiSA flow

    • data.niaid.nih.gov
    Updated Apr 18, 2024
    Cite
    Josef Koumar; Karel Hynek; Jaroslav Pešek; Tomáš Čejka (2024). Network traffic datasets with novel extended IP flow called NetTiSA flow [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8301042
    Dataset updated
    Apr 18, 2024
    Dataset provided by
    CESNET (http://www.cesnet.cz/)
    Czech Technical University in Prague
    Authors
    Josef Koumar; Karel Hynek; Jaroslav Pešek; Tomáš Čejka
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The datasets were created for the paper NetTiSA: Extended IP Flow with Time-series Features for Universal Bandwidth-constrained High-speed Network Traffic Classification (Josef Koumar, Karel Hynek, Jaroslav Pešek, Tomáš Čejka), published in Computer Networks: The International Journal of Computer and Telecommunications Networking, https://doi.org/10.1016/j.comnet.2023.110147. Please cite the usage of our datasets as:

    Josef Koumar, Karel Hynek, Jaroslav Pešek, Tomáš Čejka, "NetTiSA: Extended IP flow with time-series features for universal bandwidth-constrained high-speed network traffic classification", Computer Networks, Volume 240, 2024, 110147, ISSN 1389-1286

    @article{KOUMAR2024110147, title = {NetTiSA: Extended IP flow with time-series features for universal bandwidth-constrained high-speed network traffic classification}, journal = {Computer Networks}, volume = {240}, pages = {110147}, year = {2024}, issn = {1389-1286}, doi = {https://doi.org/10.1016/j.comnet.2023.110147}, url = {https://www.sciencedirect.com/science/article/pii/S1389128623005923}, author = {Josef Koumar and Karel Hynek and Jaroslav Pešek and Tomáš Čejka} }

    This Zenodo repository contains 23 datasets created from 15 well-known published datasets, which are cited in the table below. Each dataset contains the NetTiSA flow feature vector.

    NetTiSA flow feature vector

    The novel extended IP flow called NetTiSA (Network Time Series Analysed) flow contains a universal bandwidth-constrained feature vector consisting of 20 features. We divide the NetTiSA flow classification features into three groups by how they are computed. The first group is based on classical bidirectional flow information: the numbers of transferred bytes and packets. The second group contains statistical and time-based features calculated using time-series analysis of the packet sequences. The third group can be computed from the previous groups (i.e., on the flow collector) and improves classification performance without any impact on the telemetry bandwidth.

    Flow features

    The flow features are:

    Packets is the number of packets in the direction from the source to the destination IP address.

    Packets in reverse order is the number of packets in the direction from the destination to the source IP address.

    Bytes is the size of the payload in bytes transferred in the direction from the source to the destination IP address.

    Bytes in reverse order is the size of the payload in bytes transferred in the direction from the destination to the source IP address.

    Statistical and Time-based features

    These features are exported in the extended part of the flow. All of them can be computed, exactly or approximately, in a stream-wise manner, which is necessary for keeping memory requirements low. The second feature group contains the following features:

    Mean represents mean of the payload lengths of packets

    Min is the minimal value from payload lengths of all packets in a flow

    Max is the maximum value from payload lengths of all packets in a flow

    Standard deviation is a measure of the variation of payload lengths from the mean payload length

    Root mean square is the measure of the magnitude of payload lengths of packets

    Average dispersion is the average absolute difference between each payload length of the packet and the mean value

    Kurtosis is the measure describing the extent to which the tails of a distribution differ from the tails of a normal distribution

    Mean of relative times is the mean of the relative times, a sequence defined as \( st = \{t_1 - t_1, t_2 - t_1, \dots, t_n - t_1\} \).

    Mean of time differences is the mean of the time differences, a sequence defined as \( dt = \{ t_j - t_i \mid j = i + 1,\ i \in \{1, 2, \dots, n - 1\} \} \).

    Min from time differences is the minimal value from all time differences, i.e., min space between packets.

    Max from time differences is the maximum value from all time differences, i.e., max space between packets.

    Time distribution describes the deviation of time differences between individual packets within the time series. The feature is computed by the following equation: \( tdist = \frac{\frac{1}{n-1} \sum_{i=1}^{n-1} \left| \mu_{dt_{n-1}} - dt_i \right|}{\frac{1}{2} \left( \max(dt_{n-1}) - \min(dt_{n-1}) \right)} \)

    Switching ratio represents a value change ratio (switching) between payload lengths. The switching ratio is computed by the equation \( sr = \frac{s_n}{\frac{1}{2}(n - 1)} \), where \(s_n\) is the number of switches.

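    The statistical and time-based definitions above can be checked with a small batch reference implementation. This is a sketch, not the ipfixprobe code: it works on complete lists rather than stream-wise, assumes population (divide-by-n) standard deviation, returns 0 for tdist when all time differences are equal (where the formula's denominator is zero), and omits kurtosis and the switching ratio because their exact conventions (excess kurtosis or not, what counts as a switch) are not given here. The function name and output keys are hypothetical.

    ```python
    import math
    from typing import Dict, List


    def nettisa_features(lengths: List[int], times: List[float]) -> Dict[str, float]:
        """Batch computation of a subset of the NetTiSA statistical/time features."""
        n = len(lengths)
        if n < 2 or len(times) != n:
            raise ValueError("need at least two packets and one timestamp per packet")
        mean = sum(lengths) / n
        st = [t - times[0] for t in times]                    # relative times
        dt = [times[i + 1] - times[i] for i in range(n - 1)]  # time differences
        mu_dt = sum(dt) / (n - 1)
        half_range = 0.5 * (max(dt) - min(dt))
        mean_abs_dev_dt = sum(abs(mu_dt - d) for d in dt) / (n - 1)
        return {
            "mean": mean,
            "min": min(lengths),
            "max": max(lengths),
            "std": math.sqrt(sum((x - mean) ** 2 for x in lengths) / n),
            "rms": math.sqrt(sum(x * x for x in lengths) / n),
            "avg_dispersion": sum(abs(x - mean) for x in lengths) / n,
            "mean_relative_time": sum(st) / n,
            "mean_time_diff": mu_dt,
            "min_time_diff": min(dt),
            "max_time_diff": max(dt),
            "tdist": mean_abs_dev_dt / half_range if half_range > 0 else 0.0,
        }
    ```

    For payload lengths 100, 200, 300 sent at 0 s, 1 s and 3 s, the time differences are 1 and 2, so the numerator and denominator of tdist are both 0.5 and tdist equals 1.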
    Features computed at the collector

    The third set contains features that are computed from the previous two groups prior to classification. Therefore, they do not influence the network telemetry size, and their computation does not put additional load on resource-constrained flow monitoring probes. The NetTiSA flow combined with this feature set is called the Enhanced NetTiSA flow and contains the following features:

    Max minus min is the difference between the maximum and minimum payload lengths

    Percent deviation is the dispersion of the average absolute difference to the mean value

    Variance is the spread measure of the data from its mean

    Burstiness is the degree of peakedness in the central part of the distribution

    Coefficient of variation is a dimensionless quantity that compares the dispersion of a time series to its mean value and is often used to compare the variability of different time series that have different units of measurement

    Directions describes the percentage ratio of packet directions, computed as \( \frac{d_1}{d_1 + d_0} \), where \(d_1\) is the number of packets in the direction from source to destination IP address and \(d_0\) the number in the opposite direction. Both \(d_1\) and \(d_0\) are taken from the classical bidirectional flow.

    Duration is the duration of the flow
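    Several of these collector-side features follow directly from values the probe already exports. A minimal sketch, assuming the collector receives min, max, mean and standard deviation of payload lengths, the packet counts in both directions, and the flow start and end times; the dictionary keys and the function name are hypothetical. Percent deviation and burstiness are left out because their exact formulas are not given here.

    ```python
    from typing import Dict


    def collector_features(exported: Dict[str, float]) -> Dict[str, float]:
        """Derive Enhanced NetTiSA features from values already exported by the probe.

        The keys of `exported` are hypothetical names for the exported fields.
        """
        d1, d0 = exported["packets"], exported["packets_rev"]
        mean, std = exported["mean"], exported["std"]
        return {
            "max_minus_min": exported["max"] - exported["min"],
            "variance": std ** 2,
            "coefficient_of_variation": std / mean if mean else 0.0,
            "directions": d1 / (d1 + d0) if d1 + d0 else 0.0,
            "duration": exported["time_last"] - exported["time_first"],
        }
    ```

    Because every output is a cheap arithmetic combination of exported fields, none of them adds to the telemetry sent by the probe, which is the point of this third group.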

    The NetTiSA flow is implemented in the IP flow exporter ipfixprobe.

    Description of dataset files

    The following table describes each dataset file:

    | File name | Detection problem | Citation of the original raw dataset |
    |---|---|---|
    | botnet_binary.csv | Binary detection of botnet | S. García et al. An Empirical Comparison of Botnet Detection Methods. Computers & Security, 45:100–123, 2014. |
    | botnet_multiclass.csv | Multi-class classification of botnet | S. García et al. An Empirical Comparison of Botnet Detection Methods. Computers & Security, 45:100–123, 2014. |
    | cryptomining_design.csv | Binary detection of cryptomining; the design part | Richard Plný et al. Datasets of Cryptomining Communication. Zenodo, October 2022. |
    | cryptomining_evaluation.csv | Binary detection of cryptomining; the evaluation part | Richard Plný et al. Datasets of Cryptomining Communication. Zenodo, October 2022. |
    | dns_malware.csv | Binary detection of malware DNS | Samaneh Mahdavifar et al. Classifying Malicious Domains using DNS Traffic Analysis. In DASC/PiCom/CBDCom/CyberSciTech 2021, pages 60–67. IEEE, 2021. |
    | doh_cic.csv | Binary detection of DoH | Mohammadreza MontazeriShatoori et al. Detection of DoH tunnels using time-series classification of encrypted traffic. In DASC/PiCom/CBDCom/CyberSciTech 2020, pages 63–70. IEEE, 2020. |
    | doh_real_world.csv | Binary detection of DoH | Kamil Jeřábek et al. Collection of datasets with DNS over HTTPS traffic. Data in Brief, 42:108310, 2022. |
    | dos.csv | Binary detection of DoS | Nickolaos Koroniotis et al. Towards the development of realistic botnet dataset in the Internet of Things for network forensic analytics: Bot-IoT dataset. Future Gener. Comput. Syst., 100:779–796, 2019. |
    | edge_iiot_binary.csv | Binary detection of IoT malware | Mohamed Amine Ferrag et al. Edge-IIoTset: A new comprehensive realistic cyber security dataset of IoT and IIoT applications: Centralized and federated learning, 2022. |
    | edge_iiot_multiclass.csv | Multi-class classification of IoT malware | Mohamed Amine Ferrag et al. Edge-IIoTset: A new comprehensive realistic cyber security dataset of IoT and IIoT applications: Centralized and federated learning, 2022. |
    | https_brute_force.csv | Binary detection of HTTPS Brute Force | Jan Luxemburk et al. HTTPS Brute-force dataset with extended network flows, November 2020. |
    | ids_cic_binary.csv | Binary detection of intrusion in IDS | Iman Sharafaldin et al. Toward generating a new intrusion detection dataset and intrusion traffic characterization. ICISSp, 1:108–116, 2018. |
    | ids_cic_multiclass.csv | Multi-class classification of intrusion in IDS | Iman Sharafaldin et al. Toward generating a new intrusion detection dataset and intrusion traffic characterization. ICISSp, 1:108–116, 2018. |
    | unsw_binary.csv | Binary detection of intrusion in IDS | Nour Moustafa and Jill Slay. UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set). In 2015 Military Communications and Information Systems Conference (MilCIS), pages 1–6. IEEE, 2015. |
    | unsw_multiclass.csv | Multi-class classification of intrusion in IDS | Nour Moustafa and Jill Slay. UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set). In 2015 Military Communications and Information Systems Conference (MilCIS), pages 1–6. IEEE, 2015. |
    | iot_23.csv | Binary detection of IoT malware | Sebastian Garcia et al. IoT-23: A labeled dataset with malicious and benign IoT network traffic, January 2020. More details: https://www.stratosphereips.org/datasets-iot23 |
    | ton_iot_binary.csv | Binary detection of IoT malware | Nour Moustafa. A new distributed architecture for evaluating AI-based security systems at the edge: Network TON_IoT datasets. Sustainable Cities and Society, 72:102994, 2021. |
    | ton_iot_multiclass.csv | Multi-class classification of IoT malware | Nour Moustafa. A new distributed architecture for evaluating AI-based security systems at the edge: Network TON_IoT datasets. Sustainable Cities and Society, 72:102994, 2021. |

  5. NeSt-VR: Adaptive Bitrate Algorithm for Virtual Reality Streaming over Wi-Fi...

    • nde-dev.biothings.io
    • zenodo.org
    Updated Feb 7, 2025
    Cite
    Maura Rivero, Ferran (2025). NeSt-VR: Adaptive Bitrate Algorithm for Virtual Reality Streaming over Wi-Fi [Dataset]. https://nde-dev.biothings.io/resources?id=zenodo_14832267
    Dataset updated
    Feb 7, 2025
    Dataset provided by
    Maura Rivero, Ferran
    Casasnovas, Miguel
    Bellalta, Boris
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains results from streaming VR content over Wi-Fi 6 using our Air Light VR (ALVR) v20.6.0 fork. In particular, it comprises ALVR session logs with statistics in JSON format for each test in Sections VI and VII of our published paper, NeSt-VR: Adaptive Bitrate Algorithm for Virtual Reality Streaming over Wi-Fi. Additionally, for each test in Section VI, it includes tshark-processed traffic traces in space-separated CSV format, collected using Wireshark v4.0.3 at both the server and the network emulator’s Ethernet interface to the access point. Moreover, for each test in Section VI, validation result figures are included. For each test in Section VII, temporal evolution and/or boxplot figures for several Quality of Service metrics—such as delivery frame rate, bitrate, video frame round-trip time, and packet loss—are also included.

    Section VI tests use a Constant BitRate (CBR) of 100 Mbps with several emulated network effects, including limited bandwidth (100 Mbps, 95 Mbps, 90 Mbps), packet loss (0.5%, 1%, 2%), duplicated packets (0.5%, 1%, 2%), and packet jitter (0–6 ms, 0–10 ms, 0–20 ms).

    The dataset structure for Section VII includes a folder for each subsection (VII A: 7.1, VII B: 7.2, VII C: 7.3, VII D: 7.4). Section 7.1 folder includes tests on emulated limited network bandwidth (100 Mbps, 95 Mbps, 90 Mbps) using either CBR, ALVR's native Adaptive BitRate (ABR) algorithm, or our VR-tailored ABR, NeSt-VR (Network-aware Step-wise ABR algorithm for VR streaming). Section 7.2 folder contains a single-user (user A) mobility test using either CBR or NeSt-VR. Section 7.3 folder includes a multi-user test with two users (user A and user B) using either CBR or NeSt-VR, with results for both users streaming in isolation or concurrently. Section 7.4 folder contains tests with Overlapping Basic Service Set (OBSS) activity, where two access points operate on the same frequency channel with overlapping coverage areas, using either a fully overlapping channel bandwidth of 40 MHz or 80 MHz.

    ALVR session logs contain several built-in ALVR statistics (`event_type: {"id": "GraphStatistics"}`, which includes total pipeline latency and its components) and additional statistics incorporated in our ALVR fork (`event_type: {"id": "GraphNetworkStatistics"}`, which records metrics such as frame span, frame interarrival, video frame round-trip time, packet loss, instantaneous video network throughput, peak network throughput, video frame jitter, video packet jitter, and filtered one-way delay; and `event_type: {"id": "HeuristicStats"}`, which includes the decision-making statistics involved in each NeSt-VR bitrate adjustment interval). Please refer to our published paper or our ALVR fork for more details.

    Tshark-processed traffic traces contain several packet-level details: the relative timestamp (frame.time_relative), source and destination IP addresses (ip.src, ip.dst), total packet length including headers and payload (frame.len), and the raw packet payload (data.data). The first 22 bytes of each packet’s payload contain ALVR’s application-specific prefix, which includes the associated frame’s payload size in bytes (4 bytes), a stream identifier (2 bytes), the frame index (4 bytes), the number of packets composing the frame (4 bytes), the packet index within the frame (4 bytes), and the packet’s relative departure time (4 bytes).
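    The 22-byte prefix described above can be decoded from a tshark data.data value. A minimal sketch: the field order and sizes (4 + 2 + 4 + 4 + 4 + 4 bytes) come from the description, while little-endian byte order, unsigned integers, and tolerance of colon-separated hex are assumptions, and all names are hypothetical. Check the ALVR fork's source before relying on it.

    ```python
    import struct
    from typing import NamedTuple


    class AlvrPrefix(NamedTuple):
        frame_payload_size: int       # bytes in the associated frame's payload
        stream_id: int
        frame_index: int
        packets_in_frame: int
        packet_index: int             # position of this packet within the frame
        relative_departure_time: int  # units not stated in the dataset description


    # Assumed layout: little-endian, no padding; 4 + 2 + 4 + 4 + 4 + 4 = 22 bytes.
    PREFIX = struct.Struct("<IHIIII")


    def parse_alvr_prefix(data_hex: str) -> AlvrPrefix:
        """Decode the application prefix from a hex payload string as printed by tshark."""
        raw = bytes.fromhex(data_hex.replace(":", ""))
        if len(raw) < PREFIX.size:
            raise ValueError(f"payload shorter than {PREFIX.size} bytes")
        return AlvrPrefix(*PREFIX.unpack_from(raw))
    ```

    `unpack_from` reads only the first 22 bytes, so the rest of the payload (the video data) is ignored.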

  6. IoMT Traffic Data: Benchmarking for IoMT IDS

    • kaggle.com
    zip
    Updated Nov 12, 2025
    Cite
    Abhinav Mangalore (2025). IoMT Traffic Data: Benchmarking for IoMT IDS [Dataset]. https://www.kaggle.com/datasets/abhinavmangalore/iomt-traffic-data
    Available download formats: zip (197,674,393 bytes)
    Dataset updated
    Nov 12, 2025
    Authors
    Abhinav Mangalore
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The IoMT-TrafficData dataset has been developed to benchmark Machine Learning models for Intrusion Detection Systems (IDS) in the Internet of Medical Things (IoMT). The dataset simulates real-world attacks and normal network behavior in IoT and IoMT environments to enhance medical device security and patient data protection.

    The dataset and its benchmarking methodology are detailed in the research article.

    If you use this dataset, please credit the original authors:

    Areia, J., Bispo, I. A., Santos, L., & Costa, R. L. (2023). IoMT-TrafficData: Dataset and Tools for Benchmarking Intrusion Detection in Internet of Medical Things.
    IEEE Access. DOI: 10.1109/ACCESS.2024.3437214

    Zenodo DOI: 10.5281/zenodo.8116338
    Original Source: Zenodo (Creative Commons Attribution 4.0 International License)

    Dataset Overview

    BLE Dataset Features

    | Feature | Meaning |
    |---|---|
    | btle.advertising_header | BLE Advertising Packet Header |
    | btle.advertising_header.ch_sel | Channel Selection Algorithm |
    | btle.advertising_header.length | Advertising Length |
    | btle.advertising_header.pdu_type | Advertising PDU Type |
    | nordic_ble.crcok | Indicates if CRC is Correct |
    | nordic_ble.packet_time | Packet time (start to end) |
    | nordic_ble.phy | PHY |

    ...(see Zenodo for full feature list)

    IP-Based Packet Dataset Features

    | Feature | Meaning |
    |---|---|
    | http.content_length | Length of HTTP response content |
    | tcp.analysis.initial_rtt | Initial round-trip time for TCP |
    | tcp.flags.syn | SYN flag in TCP |
    | arp.isannouncement | Indicates ARP announcement |

    ...(see Zenodo for full list)

    IP-Based Flows Dataset Features

    | Feature | Meaning |
    |---|---|
    | proto | Transport layer protocol |
    | service | Application protocol |
    | orig_bytes | Originator payload bytes |
    | resp_bytes | Responder payload bytes |
    | flow_duration | Duration of the flow |
    | fwd_pkts_per_sec | Forward packets per second |
    | flow_iat | Flow inter-arrival time |

    ...(see Zenodo for full list)

  7. CESNET-USTS23: a benchmark dataset of Unevenly spaced time series from...

    • data.niaid.nih.gov
    Updated Mar 21, 2024
    Cite
    Koumar, Josef; Čejka, Tomáš (2024). CESNET-USTS23: a benchmark dataset of Unevenly spaced time series from network traffic [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7923744
    Dataset updated
    Mar 21, 2024
    Dataset provided by
    CESNET (http://www.cesnet.cz/)
    Czech Technical University in Prague
    Authors
    Koumar, Josef; Čejka, Tomáš
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset was created to evaluate characteristics of unevenly spaced time series from network traffic (USTS) for the paper Unevenly Spaced Time Series from Network Traffic.

    The file time_series.tar.gz contains a folder with the time series CSV files that form the raw data of the experiment. The folder contains the following files:

    fts.csv -- contains 2.6 million Flow time series (FTS) created from 259 million IP flows,

    pts.csv -- contains 19 million Packet time series (PTS) created from 110 million network packets,

    sfts.csv -- contains 15 million Single flow time series (SFTS) created from 160 million network packets.

    Traffic was captured on the national CESNET2 network from February 2023 to April 2023. All IP addresses in the dataset were anonymized.

    The fts.csv has the following format:

    ID_DEPENDENCY -- Identification of a network dependency observed as a Flow time series (the real IP address was anonymized by replacing it with a random IP address).

    N_FLOWS -- Number of flows in time series, i.e., number of data points.

    N_PACKETS -- Number of packets in time series, i.e., the sum of metric PACKETS.

    N_BYTES -- Number of bytes in time series, i.e., the sum of metric BYTES.

    PACKETS -- The array containing the time series metric number of packets in the IP flow.

    BYTES -- The array containing the time series metric number of bytes in the IP flow.

    START_TIMES -- The array containing the time series time axis of the flows starts.

    END_TIMES -- The array containing the time series time axis of the flows ends.
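    The summary columns of fts.csv can be cross-checked against the per-flow arrays, and the gaps between flow starts show the uneven spacing the dataset is about. A minimal sketch, assuming the arrays are serialized as bracketed lists such as `[2, 3]` with numeric timestamps, and assuming (as the column name suggests) that N_BYTES is the sum of BYTES; the helper names are hypothetical.

    ```python
    import ast
    import csv
    import io
    from typing import Dict, List


    def read_fts(text: str) -> List[Dict[str, str]]:
        """Parse fts.csv content (use open(...) instead of io.StringIO for the real file)."""
        return list(csv.DictReader(io.StringIO(text)))


    def _array(cell: str) -> list:
        # Assumption: arrays are serialized as bracketed lists such as "[2, 3]".
        return list(ast.literal_eval(cell))


    def check_fts_row(row: Dict[str, str]) -> bool:
        """Check N_FLOWS, N_PACKETS and N_BYTES against the per-flow arrays."""
        packets, byte_counts = _array(row["PACKETS"]), _array(row["BYTES"])
        starts, ends = _array(row["START_TIMES"]), _array(row["END_TIMES"])
        n = int(row["N_FLOWS"])
        return (len(packets) == len(byte_counts) == len(starts) == len(ends) == n
                and sum(packets) == int(row["N_PACKETS"])
                and sum(byte_counts) == int(row["N_BYTES"]))


    def start_gaps(row: Dict[str, str]) -> List[float]:
        """Spacing between consecutive flow starts; these gaps are generally unequal."""
        starts = _array(row["START_TIMES"])
        return [b - a for a, b in zip(starts, starts[1:])]
    ```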

    The pts.csv has the following format:

    ID_DEPENDENCY -- Identification of a network dependency observed as a Packet time series. (real IP address was anonymized by replacing with a random IP address)

    BYTES -- The array containing the time series metric payload length of the network packet.

    TIMES -- The array containing the time series time axis of the transmission of network packets.

    The sfts.csv has the following format:

    SRC_IP -- Source IP address (the real IP address was anonymized by replacing it with a random IP address).

    SRC_PORT -- Source port.

    DST_IP -- Destination IP address (real IP address was anonymized by replacing with a random IP address)

    DST_PORT -- Destination port.

    bytes -- The array containing the time series metric payload length of the network packet.

    time -- The array containing the time series time axis of the transmission of network packets.

    The file characteristics.tar.gz contains a folder with the characteristics obtained by the experiments from the time series files. The folder contains the following files:

    fts.characteristics.csv -- Characteristics about Flow time series from the fts.csv.

    pts.characteristics.csv -- Characteristics about Packet time series from the pts.csv.

    sfts.characteristics.csv -- Characteristics about Single flow time series from the sfts.csv.

    The fts.characteristics.csv has the following format:

    LENGTH -- Number of data points in the source time series.

    DURATION -- Duration of the source time series.

    H_BYTES -- Hurst exponent of the source time series metric BYTES.

    STATIONARITY_PACKETS -- Stationarity of the source time series metric PACKETS.

    STATIONARITY_BYTES -- Stationarity of the source time series metric BYTES.

    OVERALL_STATIONARITY -- Overall stationarity created by merging STATIONARITY_PACKETS and STATIONARITY_BYTES.

    The pts.characteristics.csv and sfts.characteristics.csv have the following format:

    LENGTH -- Number of data points in the source time series.

    DURATION -- Duration of the source time series.

    H -- Hurst exponent of the source time series.

    STATIONARITY -- Stationarity of the source time series.
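    The description does not say which estimator produced the H and H_BYTES columns. As an illustration only, the classic rescaled-range (R/S) method estimates the Hurst exponent as the slope of log(R/S) against log(window size): values near 0.5 indicate uncorrelated increments, while values approaching 1 indicate persistent, long-range-dependent behavior. The function name is hypothetical, and its results may differ from the published characteristics.

```python
import math
import random


def hurst_rs(series, min_window=8):
    """Estimate the Hurst exponent of `series` with rescaled-range (R/S) analysis."""
    n_total = len(series)
    log_sizes, log_rs = [], []
    size = min_window
    while size <= n_total // 2:
        rs_values = []
        # Split the series into non-overlapping windows of `size` points.
        for start in range(0, n_total - size + 1, size):
            chunk = series[start:start + size]
            mean = sum(chunk) / size
            # Range of the cumulative deviation from the window mean.
            cum, lo, hi = 0.0, 0.0, 0.0
            for x in chunk:
                cum += x - mean
                lo, hi = min(lo, cum), max(hi, cum)
            std = math.sqrt(sum((x - mean) ** 2 for x in chunk) / size)
            if std > 0:  # constant windows carry no R/S information
                rs_values.append((hi - lo) / std)
        if rs_values:
            log_sizes.append(math.log(size))
            log_rs.append(math.log(sum(rs_values) / len(rs_values)))
        size *= 2
    if len(log_sizes) < 2:
        raise ValueError("series too short or constant for R/S analysis")
    # Least-squares slope of log(R/S) versus log(window size).
    mx = sum(log_sizes) / len(log_sizes)
    my = sum(log_rs) / len(log_rs)
    cov = sum((a - mx) * (b - my) for a, b in zip(log_sizes, log_rs))
    var = sum((a - mx) ** 2 for a in log_sizes)
    return cov / var


if __name__ == "__main__":
    rng = random.Random(0)
    noise = [rng.gauss(0.0, 1.0) for _ in range(4096)]
    # R/S is biased slightly upward on short series, so expect a value a little above 0.5.
    print(f"white noise H = {hurst_rs(noise):.2f}")
```

    Other estimators (for example detrended fluctuation analysis) are also common; the choice mainly trades bias on short series against robustness to trends.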

    Samples of all zipped files are provided for quick inspection: fts.characteristics.sample.csv, fts.sample.csv, pts.characteristics.sample.csv, pts.sample.csv, sfts.characteristics.sample.csv, sfts.sample.csv

  8. Data from: Implementation of a Multi-Channel DASH7 IoT Communication System...

    • zenodo.org
    zip
    Updated Sep 10, 2024
    Dennis Joosens; Noori BniLam; Maarten Weyn; Rafael Berkvens (2024). Implementation of a Multi-Channel DASH7 IoT Communication System for Packet Investigation and Validation [Dataset]. http://doi.org/10.5281/zenodo.13734533
    Available download formats: zip
    Dataset updated
    Sep 10, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Dennis Joosens; Noori BniLam; Maarten Weyn; Rafael Berkvens
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains three cabled DASH7 data sets. All data sets are formatted as sigmf-data and sigmf-meta pairs, which can be investigated using IQEngine, GNU Radio, or MATLAB. A more detailed description of the data sets follows.
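    Each sigmf-data file stores interleaved little-endian 16-bit I and Q values (ci16_le), so a one-second recording at 7.68 MHz holds 7,680,000 complex samples, or 30,720,000 bytes. The following is a minimal standard-library sketch for loading one file pair. The function names are hypothetical, and it assumes the metadata uses the SigMF core keys core:datatype and core:sample_rate in the global object and core:frequency in the first capture segment.

```python
import json
import struct


def decode_ci16_le(raw: bytes) -> list:
    """Decode interleaved little-endian int16 I/Q pairs into complex samples."""
    usable = len(raw) - len(raw) % 4  # ignore a trailing partial sample, if any
    return [complex(i, q) for i, q in struct.iter_unpack("<hh", raw[:usable])]


def recording_parameters(meta: dict):
    """Return (sample_rate, center_frequency) from a parsed sigmf-meta document."""
    glob = meta["global"]
    if glob.get("core:datatype") != "ci16_le":
        raise ValueError(f"expected ci16_le, got {glob.get('core:datatype')!r}")
    captures = meta.get("captures") or [{}]
    return glob.get("core:sample_rate"), captures[0].get("core:frequency")


def load_recording(base_path: str):
    """Load <base_path>.sigmf-meta and <base_path>.sigmf-data."""
    with open(base_path + ".sigmf-meta", encoding="utf-8") as f:
        sample_rate, center_freq = recording_parameters(json.load(f))
    with open(base_path + ".sigmf-data", "rb") as f:
        samples = decode_ci16_le(f.read())
    return samples, sample_rate, center_freq
```

    A Python list of 7.68 million complex numbers is slow and memory-hungry; for full recordings, numpy.fromfile(path, dtype="<i2") followed by pairing even and odd elements is a faster alternative.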

    CH0.zip, CH93.zip, CH186.zip:

    • Cabled data sets of 3 channels
    • 10 recordings per channel
    • 1 DASH7 packet per file pair (SigMF)
    • Fc: 866.5 MHz
    • Sample rate: 7.68 MHz
    • Data type: ci16_le
    • Length: 1 second
    • Channel class: Lo-Rate
    • Sync word: 0x0B67
    • 3 Lo-Rate channel recordings
      • channel 0 (Fc: 863.0125 MHz),
      • channel 93 (Fc: 865.3375 MHz),
      • channel 186 (Fc: 867.6625 MHz)
    • Payload: 3 bytes [counter_byte 0xAB 0xCD]
      • counter byte is always [0x00]
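    The three listed channel centers are consistent with a 25 kHz raster starting at 863.0125 MHz (channel 93 lies 93 × 25 kHz = 2.325 MHz above channel 0). The sketch below assumes that raster holds for every Lo-Rate channel index; the base and spacing are inferred from the three channels above, not taken from the DASH7 specification, and the function names are hypothetical. It computes each channel's offset from the 866.5 MHz recording center and shifts that channel down to 0 Hz. All three offsets (-3.4875, -1.1625 and +1.1625 MHz) fall inside the ±3.84 MHz band covered by the 7.68 MHz sample rate.

```python
import cmath
import math

RECORDING_FC_HZ = 866_500_000  # Fc of the recordings
SAMPLE_RATE_HZ = 7_680_000     # sample rate of the recordings
# ASSUMPTION: channel center = base + index * spacing, inferred from channels 0, 93 and 186.
LO_RATE_BASE_HZ = 863_012_500
LO_RATE_SPACING_HZ = 25_000


def channel_frequency_hz(index: int) -> int:
    """Center frequency of a Lo-Rate channel under the assumed 25 kHz raster."""
    return LO_RATE_BASE_HZ + index * LO_RATE_SPACING_HZ


def baseband_offset_hz(index: int) -> int:
    """Offset of the channel center from the recording center frequency."""
    offset = channel_frequency_hz(index) - RECORDING_FC_HZ
    if abs(offset) >= SAMPLE_RATE_HZ / 2:
        raise ValueError("channel lies outside the recorded bandwidth")
    return offset


def mix_to_baseband(samples, offset_hz, fs=SAMPLE_RATE_HZ):
    """Shift the signal at offset_hz to 0 Hz by multiplying with a complex exponential."""
    step = -2j * math.pi * offset_hz / fs
    return [s * cmath.exp(step * n) for n, s in enumerate(samples)]
```

    After mixing, a low-pass filter and decimation would normally isolate the channel before demodulation; those steps are not shown.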

    logs.zip:

    • Contains all the DASH7 gateway logs per measured channel.

  9. Drone-Based Malware Detection (DBMD)

    • kaggle.com
    zip
    Updated Jul 27, 2024
    DatasetEngineer (2024). Drone-Based Malware Detection (DBMD) [Dataset]. https://www.kaggle.com/datasets/nasirayub2/drone-based-malware-detection-dbmd/suggestions?status=pending
    Available download formats: zip (67,433,750 bytes)
    Dataset updated
    Jul 27, 2024
    Authors
    DatasetEngineer
    License

    CC0 1.0 Universal (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Welcome to the Drone-Based Malware Detection dataset! This dataset is designed to aid researchers and practitioners in exploring innovative cybersecurity solutions using drone-collected data. The dataset contains detailed information on network traffic, drone sensor readings, malware detection indicators, and environmental conditions. It offers a unique perspective by integrating data from drones with traditional network security metrics to enhance malware detection capabilities.

    Dataset Overview

    The dataset comprises four main categories:

    • Network Traffic Data: Captures network traffic attributes including IP addresses, ports, protocols, packet sizes, and various derived metrics.
    • Drone Sensor Data: Includes GPS coordinates, altitude, speed, heading, battery level, and other sensor readings from drones.
    • Malware Detection Data: Contains indicators and scores relevant to detecting malware, such as anomaly scores, suspicious IP counts, reputation scores, and attack types.
    • Environmental Data: Provides context through environmental conditions like location type, noise level, weather conditions, and more.

    Files and Features

    The dataset is divided into four separate CSV files:

    network_traffic_data.csv

    • timestamp: Date and time of the traffic event.
    • source_ip: Source IP address.
    • destination_ip: Destination IP address.
    • source_port: Source port number.
    • destination_port: Destination port number.
    • protocol: Network protocol (TCP, UDP, ICMP).
    • packet_length: Length of the network packet.
    • payload_data: Content of the packet payload.
    • flag: Network flag (SYN, ACK, FIN, RST).
    • traffic_volume: Volume of traffic in bytes.
    • flow_duration: Duration of the network flow.
    • flow_bytes_per_s: Bytes per second for the flow.
    • flow_packets_per_s: Packets per second for the flow.
    • packet_count: Number of packets in the flow.
    • average_packet_size: Average size of packets.
    • min_packet_size: Minimum packet size.
    • max_packet_size: Maximum packet size.
    • packet_size_variance: Variance in packet sizes.
    • header_length: Length of the packet header.
    • payload_length: Length of the packet payload.
    • ip_ttl: Time to live for the IP packet.
    • tcp_window_size: TCP window size.
    • icmp_type: ICMP type (echo_request, echo_reply, destination_unreachable).
    • dns_query_count: Number of DNS queries.
    • dns_response_count: Number of DNS responses.
    • http_method: HTTP method (GET, POST, PUT, DELETE).
    • http_status_code: HTTP status code (200, 404, 500, 301).
    • content_type: Content type (text/html, application/json, image/png).
    • ssl_tls_version: SSL/TLS version.
    • ssl_tls_cipher_suite: SSL/TLS cipher suite.

    drone_data.csv

    • latitude: Latitude of the drone.
    • longitude: Longitude of the drone.
    • altitude: Altitude of the drone.
    • speed: Speed of the drone.
    • heading: Heading of the drone.
    • battery_level: Battery level of the drone.
    • drone_id: Unique identifier for the drone.
    • flight_time: Total flight time.
    • signal_strength: Strength of the drone's signal.
    • temperature: Temperature at the drone's location.
    • humidity: Humidity at the drone's location.
    • pressure: Atmospheric pressure at the drone's location.
    • wind_speed: Wind speed at the drone's location.
    • wind_direction: Wind direction at the drone's location.
    • gps_accuracy: Accuracy of the GPS signal.

    malware_detection_data.csv

    • anomaly_score: Score indicating the level of anomaly detected.
    • suspicious_ip_count: Number of suspicious IP addresses detected.
    • malicious_payload_indicator: Indicator for malicious payload (0 or 1).
    • reputation_score: Reputation score for the network entity.
    • behavioral_score: Behavioral score indicating potential malicious activity.
    • attack_type: Type of attack (DDoS, phishing, malware).
    • signature_match: Indicator for signature match (0 or 1).
    • sandbox_result: Result from sandbox analysis (clean, infected).
    • heuristic_score: Heuristic score for potential threats.
    • traffic_pattern: Pattern of the traffic (burst, steady).

    environmental_data.csv

    • location_type: Type of location (urban, rural).
    • nearby_devices: Number of nearby devices.
    • signal_interference: Level of signal interference.
    • noise_level: Noise level in the environment.
    • time_of_day: Time of day (morning, afternoon, evening, night).
    • day_of_week: Day of the week.
    • weather_conditions: Weather conditions (sunny, rainy, cloudy, stormy).

    Usage and Applications

    This dataset can be used for:

    • Cybersecurity Research: Developing and testing algorithms for malware detection using drone data.
    • Machine Learning: Training models to identify malicious activity based on network traffic and drone sensor readings.
    • Data Analysis: Exploring the relationships between environmental conditions, drone sensor data, and network traffic anomalies.
    • Educational Purposes: Teaching data science, machine learning, and cybersecurity concepts using a comprehensive and multi-faceted dataset.
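    As a sanity check on network_traffic_data.csv, flow_bytes_per_s can be compared with traffic_volume divided by flow_duration. This sketch assumes flow_duration is in seconds and that flow_bytes_per_s is derived from traffic_volume; neither is stated above, so mismatches may reflect a different definition rather than bad data. The function name is hypothetical.

```python
import csv
import math


def flow_rate_mismatches(lines, rel_tol=0.01):
    """Yield rows whose flow_bytes_per_s differs from traffic_volume / flow_duration.

    `lines` is any iterable of CSV text lines, e.g. an open file.
    """
    for row in csv.DictReader(lines):
        try:
            volume = float(row["traffic_volume"])
            duration = float(row["flow_duration"])
            reported = float(row["flow_bytes_per_s"])
        except (KeyError, ValueError):
            continue  # skip rows with missing or non-numeric values
        if duration <= 0:
            continue  # a rate is undefined for zero-length flows
        if not math.isclose(volume / duration, reported, rel_tol=rel_tol):
            yield row
```

    Usage: open the file with newline="" and pass the file object, e.g. list(flow_rate_mismatches(f)).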

    Acknowledgements

    This dataset is based on real-world data collected from drone sensors and network traffic monitoring s...

  10. MSSF-MalNet-2024

    • zenodo.org
    Updated Sep 13, 2025
    Mohammed Saadoon Saadoon; Suhad Faisal Behadili (2025). MSSF-MalNet-2024 [Dataset]. http://doi.org/10.5281/zenodo.15453468
    Dataset updated
    Sep 13, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Mohammed Saadoon Saadoon; Suhad Faisal Behadili
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    2024
    Description

    The dataset was collected using honeypots deployed with the Honeytrap agent. The honeypots captured both benign and malicious network traffic, providing valuable insights into different attack behaviors. The dataset consists of 9 features that represent various aspects of network traffic, including both structural and payload data. These features are as follows:

    1. Protocol: The communication protocol used in the network traffic, such as HTTP, FTP, or SSH.
    2. remote_ip: The IP address of the remote (attacker) system that initiated the connection.
    3. remote_port: The port number on the remote system from which the connection was made.
    4. local_ip: The IP address of the local (honeypot) system that received the connection.
    5. local_port: The port number on the local system that accepted the connection.
    6. md5_hash: The MD5 hash of the data payload (if applicable), used for identifying and comparing files or data.
    7. sha512_hash: The SHA-512 hash of the data payload (if applicable), providing a more secure representation for identifying files or data.
    8. Length: The length of the data payload (in bytes), representing the size of the network traffic.
    9. data_hex: The hexadecimal representation of the raw data payload, which can include commands or other information related to the communication.
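    Since each record carries the raw payload as hex alongside its length and hashes, these fields can be cross-validated. A minimal sketch with hypothetical function names, assuming the hashes and Length are computed over the decoded payload bytes (the description does not state this explicitly) and that an empty hash field means "not applicable":

```python
import hashlib


def _matches(expected, actual: str):
    """Compare hex digests case-insensitively; return None when no hash is recorded."""
    expected = (expected or "").strip().lower()
    return None if not expected else expected == actual


def verify_payload(row: dict) -> dict:
    """Check Length, md5_hash and sha512_hash of one record against its data_hex payload."""
    payload = bytes.fromhex(row["data_hex"])
    return {
        "length_ok": len(payload) == int(row["Length"]),
        "md5_ok": _matches(row.get("md5_hash"), hashlib.md5(payload).hexdigest()),
        "sha512_ok": _matches(row.get("sha512_hash"), hashlib.sha512(payload).hexdigest()),
    }
```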

    This dataset was used to train machine learning models to classify the network traffic as either benign or malicious. The features provide valuable information to differentiate between normal communication and suspicious activities, such as potential cyber-attacks.

  11. Cyberattacks Detection

    • kaggle.com
    zip
    Updated Jul 28, 2024
    lastman0800 (2024). Cyberattacks Detection [Dataset]. https://www.kaggle.com/datasets/lastman0800/cyberattacks-detection
    Available download formats: zip (4,076,139 bytes)
    Dataset updated
    Jul 28, 2024
    Authors
    lastman0800
    Description

    This dataset was assembled for the analysis and detection of cyberattacks using machine learning techniques. It comprises 100,000 rows, each representing a unique cyberattack event. The dataset includes a diverse range of attack types, protocols, and affected systems, making it a valuable resource for developing and testing detection models.

    Columns and Attributes

    • Attack ID: A unique identifier assigned to each attack instance, ranging from 1 to 100,000. This column ensures each row is distinct and can be referenced individually.
    • Timestamp: The exact date and time when the attack was detected, formatted as YYYY-MM-DD HH:MM:SS. This column helps in analyzing the temporal patterns of attacks and identifying trends over time.
    • Source IP: The IP address of the machine from which the attack originated. Each IP address in the dataset is unique, simulating a diverse set of attackers and adding realism to the dataset.
    • Destination IP: The IP address of the target machine under attack. Similar to the source IPs, destination IPs are also unique, representing a wide range of potential targets and ensuring a comprehensive dataset.
    • Source Country: The country associated with the source IP address, randomly assigned from a set of major countries (e.g., USA, China, Russia). This attribute is crucial for geographic analysis of attack origins and understanding global threat landscapes.
    • Destination Country: The country associated with the destination IP address, providing context about the target locations and enabling analysis of international attack patterns.
    • Protocol: The network protocol used during the attack, such as TCP, UDP, or ICMP. This column is essential for understanding the type of communication involved in the attack and for protocol-specific analysis.
    • Source Port: The port number on the source machine used for the attack. This can be useful in identifying common ports used by attackers and understanding the methods of attack.
    • Destination Port: The port number on the destination machine targeted by the attack. This attribute, combined with the port type, helps in understanding the specific services under attack and identifying vulnerable entry points.
    • Port Type: A derived column that categorizes the destination port into common service types (e.g., HTTP, HTTPS, FTP). This simplifies the analysis of which services are frequently targeted and aids in focusing defensive measures.
    • Attack Type: A descriptive label for the type of cyberattack, including a variety of attack methods such as Distributed Denial of Service (DDoS), SQL Injection, and Phishing. The dataset includes a broad spectrum of attack types to cover different threat scenarios and provide comprehensive analysis opportunities.
    • Payload Size (bytes): The size of the data payload involved in the attack, measured in bytes. This helps in understanding the scale and potential impact of each attack, with larger payloads often indicating more significant or complex attacks.
    • Detection Label: Indicates whether the attack was detected by the system (Detected) or not (Not Detected). This binary label is crucial for evaluating the effectiveness of detection models and understanding detection rates.
    • Confidence Score: A probability score ranging from 0 to 1, representing the confidence level of the detection model for each attack instance. For detected attacks, the score is between 0.50 and 1.00, while for undetected attacks, it is between 0.00 and 0.49. This score is essential for assessing the reliability of the detection model.
    • ML Model: The type of machine learning model used to identify the attack, randomly chosen from popular models such as Random Forest, Support Vector Machine, and Neural Network. This column enables comparative analysis across the different models.
    • Affected System: The type of system targeted by the attack, such as a Database Server, Web Server, or IoT Device. This helps in understanding the potential impact on different infrastructure components and focusing security efforts on the most critical systems.
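    Because the Confidence Score ranges are documented per Detection Label, the two columns can be cross-checked. The sketch below uses a hypothetical function name, assumes the CSV headers match the attribute names above, and treats 0.50 as the decision threshold; rows with null values (which the dataset deliberately includes) are skipped.

```python
def label_mismatches(rows, threshold=0.5):
    """Yield rows whose Detection Label disagrees with the documented score ranges."""
    for row in rows:
        score_text = (row.get("Confidence Score") or "").strip()
        label = (row.get("Detection Label") or "").strip()
        if not score_text or label not in ("Detected", "Not Detected"):
            continue  # null score or label
        # Detected: 0.50-1.00, Not Detected: 0.00-0.49.
        predicted = "Detected" if float(score_text) >= threshold else "Not Detected"
        if predicted != label:
            yield row
```

    The rows can come from csv.DictReader over the downloaded CSV file.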

    Realism and Practicality

    The dataset introduces a realistic element by including null values in various columns. This simulates real-world data imperfections and prepares the dataset for more robust handling and preprocessing techniques during analysis. The inclusion of unique IP addresses for both source and destination adds to the authenticity, reflecting the diverse nature of cyberattacks in the real world.

    Overall, this dataset is a valuable resource for researchers, analysts, and developers working on cybersecurity solutions. It provides a rich, varied, and realistic foundation for developing and testing machine learning models aimed at detecting and mitigating cybe...

  12. Data from: CESNET-TLS-Year22: A year-spanning TLS network traffic dataset...

    • data.niaid.nih.gov
    Updated Mar 24, 2025
    Hynek, Karel; Luxemburk, Jan; Pešek, Jaroslav; Čejka, Tomáš; Šiška, Pavel (2025). CESNET-TLS-Year22: A year-spanning TLS network traffic dataset from backbone lines [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10608606
    Dataset updated
    Mar 24, 2025
    Dataset provided by
    Czech Education and Scientific Network
    Authors
    Hynek, Karel; Luxemburk, Jan; Pešek, Jaroslav; Čejka, Tomáš; Šiška, Pavel
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We recommend using the CESNET DataZoo Python library, which facilitates working with large network traffic datasets. More information about the DataZoo project can be found in the GitHub repository https://github.com/CESNET/cesnet-datazoo.

    The modern approach for network traffic classification (TC), which is an important part of operating and securing networks, is to use machine learning (ML) models that are able to learn intricate relationships between traffic characteristics and communicating applications. A crucial prerequisite is having representative datasets. However, datasets collected from real production networks are not being published in sufficient numbers. Thus, this paper presents a novel dataset, CESNET-TLS-Year22, that captures the evolution of TLS traffic in an ISP network over a year. The dataset contains 180 web service labels and standard TC features, such as packet sequences. The unique year-long time span enables comprehensive evaluation of TC models and assessment of their robustness in the face of the ever-changing environment of production networks.

    Data description

    The dataset consists of network flows describing encrypted TLS communications. Flows are extended with packet sequences, histograms, and fields extracted from the TLS ClientHello message, which is transmitted in the first packet of the TLS connection handshake. The most important extracted handshake field is the SNI domain, which is used for ground-truth labeling.

    Packet Sequences

    Sequences of packet sizes, directions, and inter-packet times are standard data input for traffic analysis. For packet sizes, we consider the payload size after transport headers (TCP headers for the TLS case). We omit packets with no TCP payload, for example ACKs, because zero-payload packets are related to the transport layer internals rather than services’ behavior. Packet directions are encoded as ±1, where +1 means a packet sent from client to server, and -1 is a packet from server to client. Inter-packet times depend on the location of communicating hosts, their distance, and on the network conditions on the path. However, it is still possible to extract relevant information that correlates with user interactions and, for example, with the time required for an API/server/database to process the received data and generate a response. Packet sequences have a maximum length of 30, which is the default setting of the used flow exporter. We also derive three fields from each packet sequence: its length, time duration, and the number of roundtrips. The roundtrips are counted as the number of changes in the communication direction; in other words, each client request and server response pair counts as one roundtrip.
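    The PPI field stores these per-packet sequences as four parallel lists. The sketch below uses hypothetical function names and assumes each PPI cell is serialized as a JSON-style nested list and that inter-packet times are in milliseconds (the unit used by the histograms); both should be checked against the data. It counts a roundtrip whenever a client-to-server packet (+1) is immediately followed by a server-to-client packet (-1), which matches the "request and response pair" reading above; counting every direction change instead would give a larger number (3 rather than 2 for +1, -1, +1, -1).

```python
import json


def parse_ppi(cell: str):
    """Split a PPI cell into (inter-packet times, directions, sizes, push flags).

    ASSUMPTION: the cell is a JSON-style nested list with four inner lists.
    """
    ipt, directions, sizes, push_flags = json.loads(cell)
    return ipt, directions, sizes, push_flags


def count_roundtrips(directions) -> int:
    """Count client->server packets immediately followed by a server->client packet."""
    return sum(1 for a, b in zip(directions, directions[1:]) if a == 1 and b == -1)


def ppi_duration_s(ipt_ms) -> float:
    """Sequence duration in seconds, ASSUMING millisecond inter-packet times starting at 0."""
    return sum(ipt_ms) / 1000.0
```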

    Flow statistics

    Each data record also includes standard flow statistics, representing aggregated information about the entire bidirectional connection. The fields are the number of transmitted bytes and packets in both directions, the duration of the flow, and packet histograms. The packet histograms include binned counts (not limited to the first 30 packets) of packet sizes and inter-packet times in both directions. There are eight bins on a logarithmic scale; the intervals are 0-15, 16-31, 32-63, 64-127, 128-255, 256-511, 512-1024, and >1024 [ms or B]. The units are milliseconds for inter-packet times and bytes for packet sizes (see the PHISTS plugin documentation for details). Moreover, each flow records its end reason: it ended with the TCP connection termination (FIN packets), was idle, reached the active timeout, or ended for other reasons. These correspond to the flowEndReason values specified by IANA for IPFIX. The FLOW_ENDREASON_OTHER field covers the forced-end and lack-of-resources reasons.
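    The histogram binning can be sketched directly from the intervals listed above, using inclusive upper bounds 15, 31, 63, 127, 255, 511 and 1024. The function names are hypothetical, and the exporter's exact edge handling (for example, whether a value of exactly 1024 lands in the seventh or eighth bin) is an assumption to verify against the PHISTS documentation.

```python
from bisect import bisect_left

# Inclusive upper bounds of bins 0..6 as listed above; bin 7 holds everything larger.
PHIST_UPPER_BOUNDS = [15, 31, 63, 127, 255, 511, 1024]


def phist_bin(value: int) -> int:
    """Index (0-7) of the logarithmic bin that `value` (ms or bytes) falls into."""
    return bisect_left(PHIST_UPPER_BOUNDS, value)


def phist(values) -> list:
    """Eight-bin histogram of packet sizes or inter-packet times."""
    counts = [0] * 8
    for v in values:
        counts[phist_bin(v)] += 1
    return counts
```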

    Dataset structure

    The dataset is organized by week and by individual day. The flows are delivered in compressed CSV files with one flow per row; the data columns are summarized in the list below. For each flow data file, there is a JSON file with the total number of saved flows and the number of flows per service. There are also files aggregating flow counts for each week (stats-week.json) and for the entire dataset (stats-dataset.json). The following list describes the flow data fields in the CSV files:

    ID: Unique identifier

    SRC_IP: Source IP address

    DST_IP: Destination IP address

    DST_ASN: Destination Autonomous System number

    SRC_PORT: Source port

    DST_PORT: Destination port

    PROTOCOL: Transport protocol

    FLAG_CWR: Presence of the CWR flag

    FLAG_CWR_REV: Presence of the CWR flag in the reverse direction

    FLAG_ECE: Presence of the ECE flag

    FLAG_ECE_REV: Presence of the ECE flag in the reverse direction

    FLAG_URG: Presence of the URG flag

    FLAG_URG_REV: Presence of the URG flag in the reverse direction

    FLAG_ACK: Presence of the ACK flag

    FLAG_ACK_REV: Presence of the ACK flag in the reverse direction

    FLAG_PSH: Presence of the PSH flag

    FLAG_PSH_REV: Presence of the PSH flag in the reverse direction

    FLAG_RST: Presence of the RST flag

    FLAG_RST_REV: Presence of the RST flag in the reverse direction

    FLAG_SYN: Presence of the SYN flag

    FLAG_SYN_REV: Presence of the SYN flag in the reverse direction

    FLAG_FIN: Presence of the FIN flag

    FLAG_FIN_REV: Presence of the FIN flag in the reverse direction

    TLS_SNI: Server Name Indication domain

    TLS_JA3: JA3 fingerprint of TLS client

    TIME_FIRST: Timestamp of the first packet in format YYYY-MM-DDTHH-MM-SS.ffffff

    TIME_LAST: Timestamp of the last packet in format YYYY-MM-DDTHH-MM-SS.ffffff

    DURATION: Duration of the flow in seconds

    BYTES: Number of transmitted bytes from client to server

    BYTES_REV: Number of transmitted bytes from server to client

    PACKETS: Number of packets transmitted from client to server

    PACKETS_REV: Number of packets transmitted from server to client

    PPI: Packet sequence in the format: [[inter-packet times], [packet directions], [packet sizes], [push flags]]

    PPI_LEN: Number of packets in the PPI sequence

    PPI_DURATION: Duration of the PPI sequence in seconds

    PPI_ROUNDTRIPS: Number of roundtrips in the PPI sequence

    PHIST_SRC_SIZES: Histogram of packet sizes from client to server

    PHIST_DST_SIZES: Histogram of packet sizes from server to client

    PHIST_SRC_IPT: Histogram of inter-packet times from client to server

    PHIST_DST_IPT: Histogram of inter-packet times from server to client

    APP: Web service label

    CATEGORY: Service category

    FLOW_ENDREASON_IDLE: Flow was terminated because it was idle

    FLOW_ENDREASON_ACTIVE: Flow was terminated because it reached the active timeout

    FLOW_ENDREASON_END: Flow ended with the TCP connection termination

    FLOW_ENDREASON_OTHER: Flow was terminated for other reasons
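    The four FLOW_ENDREASON_* columns act as a one-hot encoding of a single end reason. A minimal sketch with a hypothetical function name, assuming each column holds 0/1 or True/False values, collapses them into one label; the comments give the IANA IPFIX flowEndReason codes each column corresponds to.

```python
END_REASON_COLUMNS = {
    "FLOW_ENDREASON_IDLE": "idle timeout",      # IPFIX flowEndReason 1
    "FLOW_ENDREASON_ACTIVE": "active timeout",  # IPFIX flowEndReason 2
    "FLOW_ENDREASON_END": "end of flow",        # IPFIX flowEndReason 3
    "FLOW_ENDREASON_OTHER": "other",            # IPFIX flowEndReason 4 (forced end) or 5 (lack of resources)
}


def end_reason(row: dict) -> str:
    """Return the single end-reason label set in a flow record."""
    active = [label for col, label in END_REASON_COLUMNS.items()
              if str(row.get(col, "")).strip().lower() in ("1", "true")]
    if len(active) != 1:
        raise ValueError(f"expected exactly one end-reason flag, got {active}")
    return active[0]
```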

  13. Data from: Experimenting with Adaptive Bitrate Algorithms for Virtual...

    • data.niaid.nih.gov
    Updated Jul 11, 2024
    Maura, Ferran; Casasnovas, Miguel; Bellalta, Boris (2024). Experimenting with Adaptive Bitrate Algorithms for Virtual Reality Streaming over Wi-Fi [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_12723989
    Dataset updated
    Jul 11, 2024
    Dataset provided by
    Universitat Pompeu Fabra
    Pompeu Fabra University
    Authors
    Maura, Ferran; Casasnovas, Miguel; Bellalta, Boris
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains the files resulting from capturing the Wi-Fi 6 VR traffic of a fork of the Air Light Virtual Reality (ALVR) software, which streams games from a PC to a VR head-mounted display (HMD) in real time. The dataset includes:

    Parsed Wireshark captures in CSV format, captured at both the server and the network emulator, together with the corresponding ALVR session log for each experiment. Each folder holds the netem, server, and ALVR files, named after the emulated network effect, which is applied via the netem computer. Every test uses Constant BitRate (CBR) at 100 Mbps. The folder for each effect also contains the plots and a metric comparison between Wireshark (WS) and ALVR.

    ALVR session logs comparing the logged metrics in mobility tests under different bitrate adaptation strategies: CBR, ABR, and our own contribution.

    ALVR session logs comparing the logged metrics in tests with emulated capacity drops under different bitrate adaptation strategies: CBR, ABR, and our own contribution.

    The Wireshark captures were parsed (via tshark) from the UDP packets in pcapng files into CSV files containing the principal fields of each packet, separated by spaces. Because each pcap capture exceeded 1 GB, only the main fields and the first bytes of the payload were kept; the rest was discarded. Additional CSV files contain the TCP uplink (UL) packets, parsed separately from the same captures to validate the RTT measured by ALVR.

    The ALVR session logs contain raw JSON strings in .txt format, logged by the server using our fork of ALVR. The fork logs some events in addition to those ALVR originally used, so that our metrics can be recorded at arbitrary points in the code.

    The first 22 bytes of each packet's payload are parsed into the StreamSocket fields that ALVR uses and are used to record timestamps for manually validating ALVR's metrics; they can also be used to reproduce our results. Each CSV row (frame.time_relative, ip.src, ip.dst, frame.len, data.data) contains the packet's timestamp, source IP address, destination IP address, length, and the first 22 bytes of the payload as a hexadecimal string.
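    Given that row layout, a rough throughput figure can be computed from frame.len and frame.time_relative and compared with the 100 Mbps CBR setting. The sketch below uses hypothetical function names; it assumes fields are separated by whitespace and that rows are in chronological order, and it accepts the payload hex with or without colons (tshark may print data.data either way). Because frame.len includes protocol headers, the result will only approximate the video bitrate.

```python
def parse_row(line: str):
    """Parse one row: frame.time_relative ip.src ip.dst frame.len data.data."""
    time_s, src, dst, length, payload_hex = line.split()
    payload = bytes.fromhex(payload_hex.replace(":", ""))
    return float(time_s), src, dst, int(length), payload


def throughput_mbps(rows, src=None) -> float:
    """Average throughput in Mbit/s over the captured interval, optionally for one source."""
    selected = [(t, n) for t, s, _, n, _ in rows if src is None or s == src]
    if len(selected) < 2:
        raise ValueError("need at least two packets to measure a duration")
    duration = selected[-1][0] - selected[0][0]
    if duration <= 0:
        raise ValueError("packets must span a positive time interval")
    bits = 8 * sum(n for _, n in selected)
    return bits / duration / 1e6
```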

  14. brain-tumor-single-slice-MRI-scan-with-synthetic-ehr-africa

    • huggingface.co
    Electric Sheep, brain-tumor-single-slice-MRI-scan-with-synthetic-ehr-africa [Dataset]. https://huggingface.co/datasets/electricsheepafrica/brain-tumor-single-slice-MRI-scan-with-synthetic-ehr-africa
    Dataset authored and provided by
    Electric Sheep
    License

    https://choosealicense.com/licenses/gpl/

    Description

    Dataset Card: Africa Brain Tumor Scans with Synthetic EHR (Bundled Parquet)

    This dataset bundles single-slice brain MRI scans and richly structured, synthetic EHR data into a single Parquet file suitable for multimodal ML research. Each row contains an image struct (bytes + path), a source label column, and an EHR payload with both a full JSON record and convenient summary columns. The synthetic EHRs are Africa-focused: they encode country, urban/rural, facility level, insurance… See the full description on the dataset page: https://huggingface.co/datasets/electricsheepafrica/brain-tumor-single-slice-MRI-scan-with-synthetic-ehr-africa.

  15. INDDOS24 Dataset

    • kaggle.com
    zip
    Updated Dec 7, 2024
    DatasetEngineer (2024). INDDOS24 Dataset [Dataset]. https://www.kaggle.com/datasets/datasetengineer/inddos24-dataset
    Available download formats: zip (4,040,730 bytes)
    Dataset updated
    Dec 7, 2024
    Authors
    DatasetEngineer
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    The INDDOS24 Dataset is a comprehensive and synthetic dataset designed for analyzing Distributed Denial of Service (DDoS) attacks in Internet of Things (IoT) networks. The dataset spans a period from January 1, 2019, to July 1, 2024, capturing hourly network traffic from various IoT devices, including cameras, sensors, and smart appliances. This dataset simulates realistic traffic dynamics, including both normal operations and attack scenarios, providing researchers and practitioners with a rich resource to develop and evaluate machine learning and deep learning-based DDoS detection models.

    Key Features of INDDOS24 Dataset

    Timestamp: The date and time of each network event, recorded hourly, covering more than five years of traffic data.

    Source IP: The IP address from which the network traffic originates, representing the source device in the network.

    Destination IP: The IP address to which the network traffic is directed, representing the target device.

    Source Port: The port number used by the source device for communication.

    Destination Port: The port number used by the target device for receiving traffic.

    Protocol: The communication protocol used, including TCP, UDP, and ICMP.

    Packet Size: The size of each network packet in bytes, ranging from small control packets to large data transmissions.

    Payload Length: The length of the payload in the network packets, representing actual data being transmitted.

    Flow Duration: The duration of the network flow in seconds, capturing the session length between devices.

    Bytes in Flow: The total number of bytes transmitted during the flow.

    Packets in Flow: The total number of packets transmitted during the flow.

    Average Packet Size: The average size of packets within a flow, useful for distinguishing attack patterns from normal traffic.

    Inter-Arrival Time: The time interval between successive packets in a flow, capturing traffic burstiness.

    Rate of Packets: The rate of packets per second, highlighting high-rate traffic scenarios typical of DDoS attacks.

    Unique Source Count: The number of unique source IP addresses observed in the flow.

    Unique Destination Count: The number of unique destination IP addresses in the flow.

    Anomaly Score: A computed score indicating the likelihood of anomalous or malicious activity within the traffic.

    Device Type: The type of IoT device generating the traffic, such as Camera, Sensor, or Smart Appliance.

    Operating System: The operating system of the IoT device, including Linux, Windows, or RTOS (Real-Time Operating System).

    Firmware Version: The firmware version running on the device, reflecting device configuration.

    Attack Type: The type of attack, if detected, including "SYN Flood," "UDP Flood," "Application Layer Attack," or "No Attack."

    Attack Duration: The duration of the detected attack in seconds, where applicable.

    Target Device: The specific device targeted by the attack, if applicable, or "None" for normal traffic.

    Labels: Multi-label annotations for each record, indicating attack types or normal traffic. Labels are unbalanced to simulate real-world distributions, with "Normal" traffic dominating.
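    Two of the features above, Average Packet Size and Rate of Packets, can plausibly be derived from Bytes in Flow, Packets in Flow, and Flow Duration. The dataset description does not state its exact formulas, so the following Python sketch assumes the straightforward ratios. The `FlowRecord` class and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FlowRecord:
    """Hypothetical container for a few INDDOS24-style flow fields."""
    bytes_in_flow: int
    packets_in_flow: int
    flow_duration_s: float


def average_packet_size(flow: FlowRecord) -> float:
    """Mean bytes per packet (assumed formula); 0.0 for a flow with no packets."""
    if flow.packets_in_flow == 0:
        return 0.0
    return flow.bytes_in_flow / flow.packets_in_flow


def packet_rate(flow: FlowRecord) -> float:
    """Packets per second (assumed formula); 0.0 for zero-duration flows."""
    if flow.flow_duration_s <= 0:
        return 0.0
    return flow.packets_in_flow / flow.flow_duration_s


if __name__ == "__main__":
    flow = FlowRecord(bytes_in_flow=150_000, packets_in_flow=1_000, flow_duration_s=2.0)
    print(average_packet_size(flow))  # 150.0
    print(packet_rate(flow))          # 500.0
```

    Returning 0.0 for zero-packet or zero-duration flows is only a way to avoid division by zero; a model may be better served by a sentinel or missing-value marker, since a single-packet flow is not the same as an idle one.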

    Key Highlights

    Multi-Label Annotations: Each record can have multiple labels to capture complex scenarios in which different attack types may occur simultaneously.

    Realistic Traffic Simulation: The dataset reflects both the prevalence of normal traffic and the intermittent nature of DDoS attacks in IoT environments.

    Diverse Features: With over 20 features, the dataset supports detailed traffic analysis and the development of robust anomaly detection systems.

    Unbalanced Distribution: Mimics real-world IoT networks where normal traffic significantly outweighs malicious activities.
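    Because a record can carry several labels and normal traffic dominates, a common preparation step is to encode the labels as multi-hot vectors and weight the rare classes. The sketch below assumes each record's labels arrive as a list of strings; the label vocabulary, the example records, and the function names are illustrative and are not taken from the dataset files.

```python
# Hypothetical label vocabulary: the attack types listed above plus "Normal".
LABELS = ["Normal", "SYN Flood", "UDP Flood", "Application Layer Attack"]


def to_multi_hot(record_labels: list[str], vocab: list[str] = LABELS) -> list[int]:
    """Encode one record's set of labels as a 0/1 vector aligned with vocab."""
    present = set(record_labels)
    unknown = present - set(vocab)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if label in present else 0 for label in vocab]


def positive_weights(encoded: list[list[int]]) -> list[float]:
    """Per-label negatives/positives ratio, a common weight for imbalanced binary targets."""
    n = len(encoded)
    weights = []
    for column in zip(*encoded):
        positives = sum(column)
        weights.append((n - positives) / positives if positives else 0.0)
    return weights


if __name__ == "__main__":
    # Illustrative records: three normal ones and one with two simultaneous attacks.
    records = [["Normal"], ["Normal"], ["Normal"], ["SYN Flood", "UDP Flood"]]
    encoded = [to_multi_hot(r) for r in records]
    print(encoded)  # [[1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0], [0, 1, 1, 0]]
    print([round(w, 3) for w in positive_weights(encoded)])  # [0.333, 3.0, 3.0, 0.0]
```

    Labels with no positive examples get a weight of 0.0 here, which effectively ignores them; raising an error instead may be safer when every label is expected to appear in the training split.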

    The INDDOS24 dataset serves as a valuable resource for advancing IoT network security, particularly in detecting and mitigating DDoS attacks. It is suitable for researchers, data scientists, and engineers developing machine learning and deep learning-based models for intrusion detection and network anomaly analysis.

  16. Data from: CESNET-QUIC22: A large one-month QUIC network traffic dataset...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Feb 29, 2024
    Luxemburk, Jan; Hynek, Karel; Čejka, Tomáš; Lukačovič, Andrej; Šiška, Pavel (2024). CESNET-QUIC22: A large one-month QUIC network traffic dataset from backbone lines [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7409923
    Dataset updated
    Feb 29, 2024
    Dataset provided by
    CESNET (http://www.cesnet.cz/)
    FIT Czech Technical University in Prague
    Authors
    Luxemburk, Jan; Hynek, Karel; Čejka, Tomáš; Lukačovič, Andrej; Šiška, Pavel
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Please refer to the original data article for a further description of the data: Jan Luxemburk et al. CESNET-QUIC22: A large one-month QUIC network traffic dataset from backbone lines, Data in Brief, 2023, 108888, ISSN 2352-3409, https://doi.org/10.1016/j.dib.2023.108888. We recommend using the CESNET DataZoo Python library, which facilitates working with large network traffic datasets. More information about the DataZoo project can be found in the GitHub repository https://github.com/CESNET/cesnet-datazoo.

    The QUIC (Quick UDP Internet Connection) protocol has the potential to replace TLS over TCP, which is the standard choice for reliable and secure Internet communication. Because QUIC's design makes inspecting its handshakes challenging, and because it is used in HTTP/3, there is increasing demand for research in QUIC traffic analysis. This dataset contains one month of QUIC traffic collected in an ISP backbone network that connects 500 large institutions and serves around half a million people. The data are delivered as enriched flows that can be useful for various network monitoring tasks. The provided server names and packet-level information enable research in encrypted traffic classification. Moreover, the included QUIC versions and user agents (smartphone, web browser, and operating system identifiers) provide information for large-scale QUIC deployment studies.

    Data capture

    The data were captured in the flow monitoring infrastructure of the CESNET2 network over four weeks, between 31.10.2022 and 27.11.2022. The following list provides the per-week flow count, capture period, and uncompressed size:

    W-2022-44: Uncompressed Size: 19 GB; Capture Period: 31.10.2022 - 6.11.2022; Number of flows: 32.6M

    W-2022-45: Uncompressed Size: 25 GB; Capture Period: 7.11.2022 - 13.11.2022; Number of flows: 42.6M

    W-2022-46: Uncompressed Size: 20 GB; Capture Period: 14.11.2022 - 20.11.2022; Number of flows: 33.7M

    W-2022-47: Uncompressed Size: 25 GB; Capture Period: 21.11.2022 - 27.11.2022; Number of flows: 44.1M

    CESNET-QUIC22 (entire dataset): Uncompressed Size: 89 GB; Capture Period: 31.10.2022 - 27.11.2022; Number of flows: 153M

    Data description

    The dataset consists of network flows describing encrypted QUIC communications. Flows were created using the ipfixprobe flow exporter and are extended with packet metadata sequences, packet histograms, and fields extracted from the QUIC Initial Packet, which is the first packet of the QUIC connection handshake. The extracted handshake fields are the Server Name Indication (SNI) domain, the QUIC protocol version in use, and the user agent string, which is available in a subset of QUIC communications.

    Packet Sequences

    Flows in the dataset are extended with sequences of packet sizes, directions, and inter-packet times. For packet sizes, we consider the payload size after transport headers (UDP headers in the case of QUIC). Packet directions are encoded as ±1: +1 means a packet sent from client to server, and -1 a packet sent from server to client. Inter-packet times depend on the location of the communicating hosts, their distance, and the network conditions on the path. However, it is still possible to extract relevant information that correlates with user interactions and, for example, with the time required for an API/server/database to process the received data and generate the response sent in the next packet. Packet metadata sequences have a length of 30, which is the default setting of the flow exporter used. We also derive three fields from each packet sequence: its length, its time duration, and the number of roundtrips. Roundtrips are counted as the number of changes in the communication direction (from the packet directions data); in other words, each client request and server response pair counts as one roundtrip.

    Flow statistics

    Flows also include standard flow statistics, which represent aggregated information about the entire bidirectional flow. The fields are the number of transmitted bytes and packets in both directions, the flow duration, and packet histograms. Packet histograms include binned counts of packet sizes and inter-packet times of the entire flow in both directions (more information in the PHISTS plugin documentation). There are eight bins on a logarithmic scale; the intervals are 0-15, 16-31, 32-63, 64-127, 128-255, 256-511, 512-1024, and >1024 [ms or B]. The units are milliseconds for inter-packet times and bytes for packet sizes. Moreover, each flow has an end reason: it was idle, it reached the active timeout, or it ended for other reasons. This corresponds to the official IANA IPFIX-specified values. The FLOW_ENDREASON_OTHER field represents the forced-end and lack-of-resources reasons. The end-of-flow-detected reason is not considered because it is not relevant for UDP connections.

    Dataset structure

    The dataset flows are delivered in compressed CSV files with one flow per row; the data columns are summarized in the list below. For each flow data file, there is a JSON file with the number of saved and seen (before sampling) flows per service, and the total counts of all received (observed on the CESNET2 network), service (belonging to one of the dataset's services), and saved (provided in the dataset) flows. There is also the stats-week.json file aggregating flow counts for a whole week and the stats-dataset.json file aggregating flow counts for the entire dataset. Flow counts before sampling can be used to compute the sampling ratios of individual services and to resample the dataset back to the original service distribution. Moreover, various dataset statistics, such as feature distributions and value counts of QUIC versions and user agents, are provided in the dataset-statistics folder. The mapping between services and service providers is provided in the servicemap.csv file, which also includes the SNI domains used for ground-truth labeling. The following list describes the flow data fields in the CSV files:
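    The eight logarithmic histogram bins described above can be reproduced in a few lines of Python, which may help when sanity-checking the PHIST_* columns or building the same feature from other captures. This sketch follows the listed intervals literally (so a value of 1024 falls in the 512-1024 bin) and assumes integer inputs in bytes or milliseconds; the ipfixprobe PHISTS plugin may treat boundary values differently, and `phist_bin` and `phist` are hypothetical helper names.

```python
from bisect import bisect_left

# Inclusive upper bounds of the first seven bins as listed in the description
# (0-15, 16-31, 32-63, 64-127, 128-255, 256-511, 512-1024); anything larger
# falls into the eighth bin (>1024). Units: bytes for sizes, ms for inter-packet times.
UPPER_BOUNDS = [15, 31, 63, 127, 255, 511, 1024]


def phist_bin(value: int) -> int:
    """Return the 0-based histogram bin index for a non-negative integer value."""
    if value < 0:
        raise ValueError("histogram values must be non-negative")
    # bisect_left returns the first bound that is >= value, i.e. the bin whose
    # inclusive upper limit covers it; values above 1024 map to index 7.
    return bisect_left(UPPER_BOUNDS, value)


def phist(values: list[int]) -> list[int]:
    """Count values into the eight logarithmic bins."""
    counts = [0] * (len(UPPER_BOUNDS) + 1)
    for v in values:
        counts[phist_bin(v)] += 1
    return counts


if __name__ == "__main__":
    print(phist([0, 15, 16, 100, 1024, 1500]))  # [2, 1, 0, 1, 0, 0, 1, 1]
```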

    ID: Unique identifier
    SRC_IP: Source IP address
    DST_IP: Destination IP address
    DST_ASN: Destination Autonomous System number
    SRC_PORT: Source port
    DST_PORT: Destination port
    PROTOCOL: Transport protocol
    QUIC_VERSION: QUIC protocol version
    QUIC_SNI: Server Name Indication domain
    QUIC_USER_AGENT: User agent string, if available in the QUIC Initial Packet
    TIME_FIRST: Timestamp of the first packet in format YYYY-MM-DDTHH-MM-SS.ffffff
    TIME_LAST: Timestamp of the last packet in format YYYY-MM-DDTHH-MM-SS.ffffff
    DURATION: Duration of the flow in seconds
    BYTES: Number of transmitted bytes from client to server
    BYTES_REV: Number of transmitted bytes from server to client
    PACKETS: Number of packets transmitted from client to server
    PACKETS_REV: Number of packets transmitted from server to client
    PPI: Packet metadata sequence in the format [[inter-packet times], [packet directions], [packet sizes]]
    PPI_LEN: Number of packets in the PPI sequence
    PPI_DURATION: Duration of the PPI sequence in seconds
    PPI_ROUNDTRIPS: Number of roundtrips in the PPI sequence
    PHIST_SRC_SIZES: Histogram of packet sizes from client to server
    PHIST_DST_SIZES: Histogram of packet sizes from server to client
    PHIST_SRC_IPT: Histogram of inter-packet times from client to server
    PHIST_DST_IPT: Histogram of inter-packet times from server to client
    APP: Web service label
    CATEGORY: Service category
    FLOW_ENDREASON_IDLE: Flow was terminated because it was idle
    FLOW_ENDREASON_ACTIVE: Flow was terminated because it reached the active timeout
    FLOW_ENDREASON_OTHER: Flow was terminated for other reasons
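    The PPI column packs three aligned sequences into one cell. The sketch below assumes the cell is serialized as a JSON-compatible nested list in the documented order and derives PPI_LEN and a roundtrip count from it. The description defines roundtrips both as direction changes and as request/response pairs; this sketch follows the pair reading (a client-to-server packet immediately followed by a server-to-client packet), so the dataset's own PPI_ROUNDTRIPS column remains the authoritative value. `parse_ppi` and `count_roundtrips` are hypothetical names.

```python
import json


def parse_ppi(ppi: str) -> tuple[list[float], list[int], list[int]]:
    """Split a PPI cell into (inter-packet times, directions, sizes).

    Assumes a JSON-compatible nested list in the documented order
    [[inter-packet times], [packet directions], [packet sizes]].
    """
    ipt, directions, sizes = json.loads(ppi)
    if not (len(ipt) == len(directions) == len(sizes)):
        raise ValueError("PPI sub-sequences must have equal length")
    return ipt, directions, sizes


def count_roundtrips(directions: list[int]) -> int:
    """Count client->server (+1) packets immediately followed by a server->client (-1) packet."""
    return sum(
        1 for prev, cur in zip(directions, directions[1:]) if prev == 1 and cur == -1
    )


if __name__ == "__main__":
    # Illustrative cell, not taken from the dataset.
    cell = "[[0, 12, 30, 5], [1, -1, 1, -1], [120, 1200, 80, 900]]"
    ipt, directions, sizes = parse_ppi(cell)
    print(len(directions))                # 4  (PPI_LEN)
    print(count_roundtrips(directions))   # 2
```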

    Link to other CESNET datasets

    https://www.liberouter.org/technology-v2/tools-services-datasets/datasets/
    https://github.com/CESNET/cesnet-datazoo

    Please cite the original data article:

    @article{CESNETQUIC22,
      author = {Jan Luxemburk and Karel Hynek and Tomáš Čejka and Andrej Lukačovič and Pavel Šiška},
      title = {CESNET-QUIC22: a large one-month QUIC network traffic dataset from backbone lines},
      journal = {Data in Brief},
      pages = {108888},
      year = {2023},
      issn = {2352-3409},
      doi = {https://doi.org/10.1016/j.dib.2023.108888},
      url = {https://www.sciencedirect.com/science/article/pii/S2352340923000069}
    }
