16 datasets found

M-CAN Intrusion Detection Dataset
ieee-dataport.org
Updated Nov 26, 2025
Cite
Saehoon Oh (2025). M-CAN Intrusion Detection Dataset [Dataset]. https://ieee-dataport.org/documents/m-can-intrusion-detection-dataset
Explore at:
Dataset updated
Nov 26, 2025
Authors
Saehoon Oh
Description
DLC values
UNSW-NB15 and CIC-IDS2017 Labelled PCAP Data
zenodo.org
kaggle.com
csv
Updated Oct 28, 2022
Cite
Yasir Ali Farrukh; Irfan Khan; Syed Wali; David Bierbrauer; John A Pavlik; Nathaniel D. Bastian (2022). UNSW-NB15 and CIC-IDS2017 Labelled PCAP Data [Dataset]. http://doi.org/10.5281/zenodo.7258579
Explore at:
Available download formats: csv
Unique identifier
https://doi.org/10.5281/zenodo.7258579
Dataset updated
Oct 28, 2022
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Yasir Ali Farrukh; Irfan Khan; Syed Wali; David Bierbrauer; John A Pavlik; Nathaniel D. Bastian
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Packet Capture (PCAP) files of the UNSW-NB15 and CIC-IDS2017 datasets are processed and labelled using the accompanying CSV files. Each packet is labelled by comparing eight distinct features: *Source IP, Destination IP, Source Port, Destination Port, Starting time, Ending time, Protocol and Time to live*. The dataset has dimensions N x 1504. All columns of the dataset are integers, so you can use it directly in your machine learning models. Details of the whole processing and transformation pipeline are provided in the following GitHub repository:

https://github.com/Yasir-ali-farrukh/Payload-Byte
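
The eight-feature matching described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the Payload-Byte implementation: the `FlowRecord` and `Packet` types, their field names, the `label_packet` helper, and the `"unlabelled"` fallback are all hypothetical, and the real tool may apply different comparison rules (for example, a time tolerance).

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class FlowRecord:
    """One labelled row of the flow-level CSV (hypothetical field names)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    start: float      # flow starting time, seconds
    end: float        # flow ending time, seconds
    protocol: int
    ttl: int
    label: str


@dataclass(frozen=True)
class Packet:
    """One packet extracted from the PCAP (hypothetical field names)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    timestamp: float
    protocol: int
    ttl: int


def label_packet(pkt: Packet, flows: Iterable[FlowRecord],
                 default: str = "unlabelled") -> str:
    """Return the label of the first flow whose eight features match the packet."""
    for flow in flows:
        same_endpoints = (
            (pkt.src_ip, pkt.dst_ip, pkt.src_port, pkt.dst_port)
            == (flow.src_ip, flow.dst_ip, flow.src_port, flow.dst_port)
        )
        # Assumption: a packet belongs to a flow if its timestamp lies
        # within the flow's [start, end] interval, bounds included.
        in_window = flow.start <= pkt.timestamp <= flow.end
        same_meta = pkt.protocol == flow.protocol and pkt.ttl == flow.ttl
        if same_endpoints and in_window and same_meta:
            return flow.label
    return default
```

A linear scan is used here for clarity; with millions of packets, an index keyed on the endpoint tuple would be the more practical choice.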

You can use the tool available at the above-mentioned GitHub repository to generate the labelled dataset from scratch. All details of the processing and transformation are provided in the following paper:

```bibtex
@article{Payload,
author = "Yasir Ali Farrukh and Irfan Khan and Syed Wali and David Bierbrauer and Nathaniel Bastian",
title = "{Payload-Byte: A Tool for Extracting and Labeling Packet Capture Files of Modern Network Intrusion Detection Datasets}",
year = "2022",
month = "9",
url = "https://www.techrxiv.org/articles/preprint/Payload-Byte_A_Tool_for_Extracting_and_Labeling_Packet_Capture_Files_of_Modern_Network_Intrusion_Detection_Datasets/20714221",
doi = "10.36227/techrxiv.20714221.v1"
}
```
IoMT-TrafficData: A Dataset for Benchmarking Intrusion Detection in IoMT
data.niaid.nih.gov
data-staging.niaid.nih.gov
Updated Aug 30, 2024
Cite
Areia, José; Bispo, Ivo Afonso; Santos, Leonel; Costa, Rogério Luís (2024). IoMT-TrafficData: A Dataset for Benchmarking Intrusion Detection in IoMT [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8116337
Explore at:
Dataset updated
Aug 30, 2024
Dataset provided by
Politécnico de Leiria
Authors
Areia, José; Bispo, Ivo Afonso; Santos, Leonel; Costa, Rogério Luís
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Article Information

The work involved in developing the dataset and benchmarking its use of machine learning is set out in the article ‘IoMT-TrafficData: Dataset and Tools for Benchmarking Intrusion Detection in Internet of Medical Things’. DOI: 10.1109/ACCESS.2024.3437214.

Please do cite the aforementioned article when using this dataset.

Abstract

The increasing importance of securing the Internet of Medical Things (IoMT) due to its vulnerabilities to cyber-attacks highlights the need for an effective intrusion detection system (IDS). In this study, our main objective was to develop a Machine Learning Model for the IoMT to enhance the security of medical devices and protect patients’ private data. To address this issue, we built a scenario that utilised the Internet of Things (IoT) and IoMT devices to simulate real-world attacks. We collected and cleaned data, pre-processed it, and provided it into our machine-learning model to detect intrusions in the network. Our results revealed significant improvements in all performance metrics, indicating robustness and reproducibility in real-world scenarios. This research has implications in the context of IoMT and cybersecurity, as it helps mitigate vulnerabilities and lowers the number of breaches occurring with the rapid growth of IoMT devices. The use of machine learning algorithms for intrusion detection systems is essential, and our study provides valuable insights and a road map for future research and the deployment of such systems in live environments. By implementing our findings, we can contribute to a safer and more secure IoMT ecosystem, safeguarding patient privacy and ensuring the integrity of medical data.

ZIP Folder Content

The ZIP folder comprises two main components: Captures and Datasets. Within the captures folder, we have included all the captures used in this project. These captures are organized into separate folders corresponding to the type of network analysis: BLE or IP-Based. Similarly, the datasets folder follows a similar organizational approach. It contains datasets categorized by type: BLE, IP-Based Packet, and IP-Based Flows.

To cater to diverse analytical needs, the datasets are provided in two formats: CSV (Comma-Separated Values) and pickle. The CSV format facilitates seamless integration with various data analysis tools, while the pickle format preserves the intricate structures and relationships within the dataset.

This organization enables researchers to easily locate and utilize the specific captures and datasets they require, based on their preferred network analysis type or dataset type. The availability of different formats further enhances the flexibility and usability of the provided data.
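
The practical difference between the two formats can be illustrated with a small round trip. This is only a sketch using the Python standard library and a single made-up row (the values are not taken from the dataset; the column names are borrowed from the feature tables): CSV returns every value as text, while pickle returns the original Python types.

```python
import csv
import io
import pickle

# Toy row with made-up values; only the column names come from the dataset.
rows = [{"frame.len": 60, "tcp.flags.syn": True, "http.time": 0.25}]

# CSV round trip: every value comes back as a string.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
buffer.seek(0)
csv_rows = list(csv.DictReader(buffer))

# Pickle round trip: the original types (int, bool, float) are preserved.
pickled_rows = pickle.loads(pickle.dumps(rows))

print(type(csv_rows[0]["frame.len"]).__name__)      # str
print(type(pickled_rows[0]["frame.len"]).__name__)  # int
```

Pickle files can execute arbitrary code when loaded, so they should only be opened when the source is trusted; the CSV files are the safer starting point otherwise.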

Datasets' Content

Within this dataset, three sub-datasets are available, namely BLE, IP-Based Packet, and IP-Based Flows. Below is a table of the features selected for each dataset and consequently used in the evaluation model within the provided work.

Identified Key Features Within Bluetooth Dataset

Feature Meaning

btle.advertising_header BLE Advertising Packet Header

btle.advertising_header.ch_sel BLE Advertising Channel Selection Algorithm

btle.advertising_header.length BLE Advertising Length

btle.advertising_header.pdu_type BLE Advertising PDU Type

btle.advertising_header.randomized_rx BLE Advertising Rx Address

btle.advertising_header.randomized_tx BLE Advertising Tx Address

btle.advertising_header.rfu.1 Reserved For Future 1

btle.advertising_header.rfu.2 Reserved For Future 2

btle.advertising_header.rfu.3 Reserved For Future 3

btle.advertising_header.rfu.4 Reserved For Future 4

btle.control.instant Instant Value Within a BLE Control Packet

btle.crc.incorrect Incorrect CRC

btle.extended_advertising Advertiser Data Information

btle.extended_advertising.did Advertiser Data Identifier

btle.extended_advertising.sid Advertiser Set Identifier

btle.length BLE Length

frame.cap_len Frame Length Stored Into the Capture File

frame.interface_id Interface ID

frame.len Frame Length Wire

nordic_ble.board_id Board ID

nordic_ble.channel Channel Index

nordic_ble.crcok Indicates if CRC is Correct

nordic_ble.flags Flags

nordic_ble.packet_counter Packet Counter

nordic_ble.packet_time Packet time (start to end)

nordic_ble.phy PHY

nordic_ble.protover Protocol Version

Identified Key Features Within IP-Based Packets Dataset

Feature Meaning

http.content_length Length of content in an HTTP response

http.request HTTP request being made

http.response.code Status code of an HTTP response

http.response_number Sequential number of an HTTP response

http.time Time taken for an HTTP transaction

tcp.analysis.initial_rtt Initial round-trip time for TCP connection

tcp.connection.fin TCP connection termination with a FIN flag

tcp.connection.syn TCP connection initiation with SYN flag

tcp.connection.synack TCP connection establishment with SYN-ACK flags

tcp.flags.cwr Congestion Window Reduced flag in TCP

tcp.flags.ecn Explicit Congestion Notification flag in TCP

tcp.flags.fin FIN flag in TCP

tcp.flags.ns Nonce Sum flag in TCP

tcp.flags.res Reserved flags in TCP

tcp.flags.syn SYN flag in TCP

tcp.flags.urg Urgent flag in TCP

tcp.urgent_pointer Pointer to urgent data in TCP

ip.frag_offset Fragment offset in IP packets

eth.dst.ig Ethernet destination is in the internal network group

eth.src.ig Ethernet source is in the internal network group

eth.src.lg Ethernet source is in the local network group

eth.src_not_group Ethernet source is not in any network group

arp.isannouncement Indicates if an ARP message is an announcement

Identified Key Features Within IP-Based Flows Dataset

Feature Meaning

proto Transport layer protocol of the connection

service Identification of an application protocol

orig_bytes Originator payload bytes

resp_bytes Responder payload bytes

history Connection state history

orig_pkts Originator sent packets

resp_pkts Responder sent packets

flow_duration Length of the flow in seconds

fwd_pkts_tot Forward packets total

bwd_pkts_tot Backward packets total

fwd_data_pkts_tot Forward data packets total

bwd_data_pkts_tot Backward data packets total

fwd_pkts_per_sec Forward packets per second

bwd_pkts_per_sec Backward packets per second

flow_pkts_per_sec Flow packets per second

fwd_header_size Forward header bytes

bwd_header_size Backward header bytes

fwd_pkts_payload Forward payload bytes

bwd_pkts_payload Backward payload bytes

flow_pkts_payload Flow payload bytes

fwd_iat Forward inter-arrival time

bwd_iat Backward inter-arrival time

flow_iat Flow inter-arrival time

active Flow active duration
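
To make a few of the flow-level features above concrete, here is a minimal sketch of how they could be derived from a list of packets. It assumes each packet is a `(timestamp, direction)` pair sorted by time, and it assumes the inter-arrival feature is summarised by its mean; the `flow_features` helper and the `flow_iat_mean` key are hypothetical, and the definitions used by the dataset's own tooling may differ.

```python
from typing import Dict, List, Tuple


def flow_features(packets: List[Tuple[float, str]]) -> Dict[str, float]:
    """Compute a handful of flow features.

    packets: (timestamp_seconds, direction) pairs sorted by timestamp,
    where direction is "fwd" or "bwd" (an assumed encoding).
    """
    times = [t for t, _ in packets]
    fwd = sum(1 for _, d in packets if d == "fwd")
    bwd = len(packets) - fwd
    duration = times[-1] - times[0] if len(times) > 1 else 0.0
    # Gaps between consecutive packets, regardless of direction.
    gaps = [b - a for a, b in zip(times, times[1:])]

    def rate(count: int) -> float:
        # Assumption: rates are reported as 0 for zero-length flows.
        return count / duration if duration > 0 else 0.0

    return {
        "flow_duration": duration,
        "fwd_pkts_tot": fwd,
        "bwd_pkts_tot": bwd,
        "fwd_pkts_per_sec": rate(fwd),
        "bwd_pkts_per_sec": rate(bwd),
        "flow_pkts_per_sec": rate(len(packets)),
        "flow_iat_mean": sum(gaps) / len(gaps) if gaps else 0.0,
    }
```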
Network traffic datasets with novel extended IP flow called NetTiSA flow
data.niaid.nih.gov
Updated Apr 18, 2024
Cite
Josef Koumar; Karel Hynek; Jaroslav Pešek; Tomáš Čejka (2024). Network traffic datasets with novel extended IP flow called NetTiSA flow [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8301042
Explore at:
Dataset updated
Apr 18, 2024
Dataset provided by
CESNET (http://www.cesnet.cz/)
Czech Technical University in Prague
Authors
Josef Koumar; Karel Hynek; Jaroslav Pešek; Tomáš Čejka
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Network traffic datasets with novel extended IP flow called NetTiSA flow

Datasets were created for the paper: NetTiSA: Extended IP Flow with Time-series Features for Universal Bandwidth-constrained High-speed Network Traffic Classification -- Josef Koumar, Karel Hynek, Jaroslav Pešek, Tomáš Čejka -- which is published in The International Journal of Computer and Telecommunications Networking (https://doi.org/10.1016/j.comnet.2023.110147). Please cite the usage of our datasets as:

Josef Koumar, Karel Hynek, Jaroslav Pešek, Tomáš Čejka, "NetTiSA: Extended IP flow with time-series features for universal bandwidth-constrained high-speed network traffic classification", Computer Networks, Volume 240, 2024, 110147, ISSN 1389-1286

@article{KOUMAR2024110147, title = {NetTiSA: Extended IP flow with time-series features for universal bandwidth-constrained high-speed network traffic classification}, journal = {Computer Networks}, volume = {240}, pages = {110147}, year = {2024}, issn = {1389-1286}, doi = {https://doi.org/10.1016/j.comnet.2023.110147}, url = {https://www.sciencedirect.com/science/article/pii/S1389128623005923}, author = {Josef Koumar and Karel Hynek and Jaroslav Pešek and Tomáš Čejka} }

This Zenodo repository contains 23 datasets created from 15 well-known published datasets, which are cited in the table below. Each dataset contains the NetTiSA flow feature vector.

NetTiSA flow feature vector

The novel extended IP flow called NetTiSA (Network Time Series Analysed) flow contains a universal bandwidth-constrained feature vector consisting of 20 features. We divide the NetTiSA flow classification features into three groups by computation. The first group of features is based on classical bidirectional flow information---a number of transferred bytes, and packets. The second group contains statistical and time-based features calculated using the time-series analysis of the packet sequences. The third type of features can be computed from the previous groups (i.e., on the flow collector) and improve the classification performance without any impact on the telemetry bandwidth.

Flow features

The flow features are:

Packets is the number of packets in the direction from the source to the destination IP address.

Packets in reverse order is the number of packets in the direction from the destination to the source IP address.

Bytes is the size of the payload in bytes transferred in the direction from the source to the destination IP address.

Bytes in reverse order is the size of the payload in bytes transferred in the direction from the destination to the source IP address.

Statistical and Time-based features

These features are exported in the extended part of the flow. All of them can be computed (exactly or approximately) in a stream-wise manner, which is necessary to keep memory requirements low. This second feature set contains the following features:

Mean represents mean of the payload lengths of packets

Min is the minimal value from payload lengths of all packets in a flow

Max is the maximum value from payload lengths of all packets in a flow

Standard deviation is a measure of the variation of payload lengths from the mean payload length

Root mean square is the measure of the magnitude of payload lengths of packets

Average dispersion is the average absolute difference between each payload length of the packet and the mean value

Kurtosis is the measure describing the extent to which the tails of a distribution differ from the tails of a normal distribution

Mean of relative times is the mean of the relative times, a sequence defined as \(st = \{t_1 - t_1, t_2 - t_1, \dots, t_n - t_1\}\)

Mean of time differences is the mean of the time differences, a sequence defined as \(dt = \{t_j - t_i \mid j = i + 1,\ i \in \{1, 2, \dots, n - 1\}\}\)

Min from time differences is the minimal value from all time differences, i.e., min space between packets.

Max from time differences is the maximum value from all time differences, i.e., max space between packets.

Time distribution describes the deviation of time differences between individual packets within the time series. The feature is computed by the following equation: \(tdist = \frac{\frac{1}{n-1} \sum_{i=1}^{n-1} \left| \mu_{dt_{n-1}} - dt_i \right|}{\frac{1}{2} \left( \max\left(dt_{n-1}\right) - \min\left(dt_{n-1}\right) \right)}\)

Switching ratio represents a value change ratio (switching) between payload lengths. The switching ratio is computed by the equation: \(sr = \frac{s_n}{\frac{1}{2} (n - 1)}\)

where \(s_n\) is the number of switches.
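
The time-based features and the switching ratio defined above can be sketched in a few lines. This is a batch illustration for readability, not the stream-wise implementation used by the exporter. Two points are assumptions rather than statements from the text: a "switch" is counted whenever two consecutive payload lengths differ, and the time distribution is returned as 0 when all time differences are equal (the formula would otherwise divide by zero).

```python
from statistics import fmean
from typing import List, Sequence


def time_differences(timestamps: Sequence[float]) -> List[float]:
    """dt: differences between consecutive packet timestamps."""
    return [b - a for a, b in zip(timestamps, timestamps[1:])]


def mean_relative_time(timestamps: Sequence[float]) -> float:
    """Mean of st = {t_1 - t_1, t_2 - t_1, ..., t_n - t_1}."""
    first = timestamps[0]
    return fmean([t - first for t in timestamps])


def time_distribution(timestamps: Sequence[float]) -> float:
    """tdist: mean absolute deviation of dt divided by half its range."""
    dt = time_differences(timestamps)
    if not dt:
        return 0.0
    half_range = 0.5 * (max(dt) - min(dt))
    if half_range == 0:
        return 0.0  # assumption: evenly spaced packets yield 0
    mu = fmean(dt)
    return fmean([abs(mu - d) for d in dt]) / half_range


def switching_ratio(payload_lengths: Sequence[int]) -> float:
    """sr = s_n / (0.5 * (n - 1)), with s_n the number of switches."""
    n = len(payload_lengths)
    if n < 2:
        return 0.0
    # Assumption: a switch is any change between consecutive payload lengths.
    switches = sum(
        1 for a, b in zip(payload_lengths, payload_lengths[1:]) if a != b
    )
    return switches / (0.5 * (n - 1))
```

For timestamps `[0.0, 1.0, 3.0, 6.0]` the time differences are `[1.0, 2.0, 3.0]`, so the mean absolute deviation is 2/3 and half the range is 1, giving a time distribution of about 0.667.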

Features computed at the collector

The third set contains features that are computed from the previous two groups prior to classification. Therefore, they do not influence the network telemetry size, and their computation does not put additional load on resource-constrained flow monitoring probes. The NetTiSA flow combined with this feature set is called the Enhanced NetTiSA flow and contains the following features:

Max minus min is the difference between the maximum and minimum payload lengths

Percent deviation is the dispersion of the average absolute difference to the mean value

Variance is the spread measure of the data from its mean

Burstiness is the degree of peakedness in the central part of the distribution

Coefficient of variation is a dimensionless quantity that compares the dispersion of a time series to its mean value and is often used to compare the variability of different time series that have different units of measurement

Directions describe a percentage ratio of packet direction computed as \(\frac{d_1}{d_1 + d_0}\), where \(d_1\) is the number of packets in the direction from the source to the destination IP address and \(d_0\) is the number in the opposite direction. Both \(d_1\) and \(d_0\) are part of the classical bidirectional flow.

Duration is the duration of the flow
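
Because the collector-side features depend only on values already exported by the probe, they can be derived after the fact. The sketch below shows the idea for the features whose definition is unambiguous from the text. The `NetTiSAStats` container, its field names, and the `enhance` helper are hypothetical; percent deviation is assumed to be the plain ratio of average dispersion to the mean (not multiplied by 100); burstiness and duration are left out because the description does not give enough detail to compute them.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class NetTiSAStats:
    """Values assumed to be available from the exported flow (hypothetical names)."""
    packets: int          # source -> destination packets (d_1)
    packets_rev: int      # destination -> source packets (d_0)
    mean: float           # mean payload length
    minimum: float        # minimum payload length
    maximum: float        # maximum payload length
    std: float            # standard deviation of payload lengths
    avg_dispersion: float # average absolute difference from the mean


def enhance(s: NetTiSAStats) -> Dict[str, float]:
    """Derive collector-side features from already exported statistics."""
    total = s.packets + s.packets_rev
    return {
        "max_minus_min": s.maximum - s.minimum,
        # Assumption: ratio of average dispersion to the mean.
        "percent_deviation": s.avg_dispersion / s.mean if s.mean else 0.0,
        "variance": s.std ** 2,
        "coefficient_of_variation": s.std / s.mean if s.mean else 0.0,
        "directions": s.packets / total if total else 0.0,
    }
```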

The NetTiSA flow is implemented into IP flow exporter ipfixprobe.

Description of dataset files

In the following table is a description of each dataset file:

File name	Detection problem	Citation of the original raw dataset

botnet_binary.csv Binary detection of botnet S. García et al. An Empirical Comparison of Botnet Detection Methods. Computers & Security, 45:100–123, 2014.

botnet_multiclass.csv Multi-class classification of botnet S. García et al. An Empirical Comparison of Botnet Detection Methods. Computers & Security, 45:100–123, 2014.

cryptomining_design.csv Binary detection of cryptomining; the design part Richard Plný et al. Datasets of Cryptomining Communication. Zenodo, October 2022

cryptomining_evaluation.csv Binary detection of cryptomining; the evaluation part Richard Plný et al. Datasets of Cryptomining Communication. Zenodo, October 2022

dns_malware.csv Binary detection of malware DNS Samaneh Mahdavifar et al. Classifying Malicious Domains using DNS Traffic Analysis. In DASC/PiCom/CBDCom/CyberSciTech 2021, pages 60–67. IEEE, 2021.

doh_cic.csv Binary detection of DoH Mohammadreza MontazeriShatoori et al. Detection of doh tunnels using time-series classification of encrypted traffic. In DASC/PiCom/CBDCom/CyberSciTech 2020, pages 63–70. IEEE, 2020

doh_real_world.csv Binary detection of DoH Kamil Jeřábek et al. Collection of datasets with DNS over HTTPS traffic. Data in Brief, 42:108310, 2022

dos.csv Binary detection of DoS Nickolaos Koroniotis et al. Towards the development of realistic botnet dataset in the Internet of Things for network forensic analytics: Bot-IoT dataset. Future Gener. Comput. Syst., 100:779–796, 2019.

edge_iiot_binary.csv Binary detection of IoT malware Mohamed Amine Ferrag et al. Edge-iiotset: A new comprehensive realistic cyber security dataset of iot and iiot applications: Centralized and federated learning, 2022.

edge_iiot_multiclass.csv Multi-class classification of IoT malware Mohamed Amine Ferrag et al. Edge-iiotset: A new comprehensive realistic cyber security dataset of iot and iiot applications: Centralized and federated learning, 2022.

https_brute_force.csv Binary detection of HTTPS Brute Force Jan Luxemburk et al. HTTPS Brute-force dataset with extended network flows, November 2020

ids_cic_binary.csv Binary detection of intrusion in IDS Iman Sharafaldin et al. Toward generating a new intrusion detection dataset and intrusion traffic characterization. ICISSp, 1:108–116, 2018.

ids_cic_multiclass.csv Multi-class classification of intrusion in IDS Iman Sharafaldin et al. Toward generating a new intrusion detection dataset and intrusion traffic characterization. ICISSp, 1:108–116, 2018.

unsw_binary.csv Binary detection of intrusion in IDS Nour Moustafa and Jill Slay. Unsw-nb15: a comprehensive data set for network intrusion detection systems (unsw-nb15 network data set). In 2015 military communications and information systems conference (MilCIS), pages 1–6. IEEE, 2015.

unsw_multiclass.csv Multi-class classification of intrusion in IDS Nour Moustafa and Jill Slay. Unsw-nb15: a comprehensive data set for network intrusion detection systems (unsw-nb15 network data set). In 2015 military communications and information systems conference (MilCIS), pages 1–6. IEEE, 2015.

iot_23.csv Binary detection of IoT malware Sebastian Garcia et al. IoT-23: A labeled dataset with malicious and benign IoT network traffic, January 2020. More details here https://www.stratosphereips.org/datasets-iot23

ton_iot_binary.csv Binary detection of IoT malware Nour Moustafa. A new distributed architecture for evaluating ai-based security systems at the edge: Network ton iot datasets. Sustainable Cities and Society, 72:102994, 2021

ton_iot_multiclass.csv Multi-class classification of IoT malware Nour Moustafa. A new distributed architecture for evaluating ai-based security systems at the edge: Network ton iot datasets.
NeSt-VR: Adaptive Bitrate Algorithm for Virtual Reality Streaming over Wi-Fi...
nde-dev.biothings.io
zenodo.org
Updated Feb 7, 2025
Cite
Maura Rivero, Ferran (2025). NeSt-VR: Adaptive Bitrate Algorithm for Virtual Reality Streaming over Wi-Fi [Dataset]. https://nde-dev.biothings.io/resources?id=zenodo_14832267
Explore at:
Dataset updated
Feb 7, 2025
Dataset provided by
Maura Rivero, Ferran
Casasnovas, Miguel
Bellalta, Boris
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains results from streaming VR content over Wi-Fi 6 using our Air Light VR (ALVR) v20.6.0 fork. In particular, it comprises ALVR session logs with statistics in JSON format for each test in Sections VI and VII of our published paper, NeSt-VR: Adaptive Bitrate Algorithm for Virtual Reality Streaming over Wi-Fi. Additionally, for each test in Section VI, it includes tshark-processed traffic traces in space-separated CSV format, collected using Wireshark v4.0.3 at both the server and the network emulator’s Ethernet interface to the access point. Moreover, for each test in Section VI, validation result figures are included. For each test in Section VII, temporal evolution and/or boxplot figures for several Quality of Service metrics—such as delivery frame rate, bitrate, video frame round-trip time, and packet loss—are also included.

Section VI tests use a Constant BitRate (CBR) of 100 Mbps with several emulated network effects, including limited bandwidth (100 Mbps, 95 Mbps, 90 Mbps), packet loss (0.5%, 1%, 2%), duplicated packets (0.5%, 1%, 2%), and packet jitter (0–6 ms, 0–10 ms, 0–20 ms).

The dataset structure for Section VII includes a folder for each subsection (VII A: 7.1, VII B: 7.2, VII C: 7.3, VII D: 7.4). Section 7.1 folder includes tests on emulated limited network bandwidth (100 Mbps, 95 Mbps, 90 Mbps) using either CBR, ALVR's native Adaptive BitRate (ABR) algorithm, or our VR-tailored ABR, NeSt-VR (Network-aware Step-wise ABR algorithm for VR streaming). Section 7.2 folder contains a single-user (user A) mobility test using either CBR or NeSt-VR. Section 7.3 folder includes a multi-user test with two users (user A and user B) using either CBR or NeSt-VR, with results for both users streaming in isolation or concurrently. Section 7.4 folder contains tests with Overlapping Basic Service Set (OBSS) activity, where two access points operate on the same frequency channel with overlapping coverage areas, using either a fully overlapping channel bandwidth of 40 MHz or 80 MHz.

ALVR session logs contain several built-in ALVR statistics (event_type:{"id":"GraphStatistics", which includes total pipeline latency and its components) and additional statistics incorporated in our ALVR fork (event_type:{"id":"GraphNetworkStatistics", which records metrics such as frame span, frame interarrival, video frame round-trip time, packet loss, instantaneous video network throughput, peak network throughput, video frame jitter, video packet jitter, and filtered one-way delay; event_type:{"id":"HeuristicStats", which includes the decision-making statistics involved in each NeSt-VR bitrate adjustment interval). Please refer to our published paper or our ALVR fork for more details.
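
As a rough illustration of how the three statistics types could be separated, the sketch below groups log entries by their `event_type` identifier. It assumes the log holds one JSON object per line, each with an `event_type` object carrying an `id` field, which is a guess based on the fragments quoted above; the `group_events` helper is hypothetical and the actual log layout should be checked against the files.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, List


def group_events(lines: Iterable[str]) -> Dict[str, List[dict]]:
    """Group parsed event_type objects by their "id" field."""
    grouped: Dict[str, List[dict]] = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        event = record.get("event_type") if isinstance(record, dict) else None
        if isinstance(event, dict) and "id" in event:
            grouped[event["id"]].append(event)
    return dict(grouped)
```

Calling `group_events(open(path, encoding="utf-8"))` on a session log would then give separate lists under keys such as `GraphStatistics`, `GraphNetworkStatistics`, and `HeuristicStats`, provided the one-object-per-line assumption holds.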

Tshark-processed traffic traces contain several packet-level details: the relative timestamp (frame.time_relative), source and destination IP addresses (ip.src, ip.dst), total packet length including headers and payload (frame.len), and the raw packet payload (data.data). The first 22 bytes of each packet’s payload contain ALVR’s application-specific prefix, which includes the associated frame’s payload size in bytes (4 bytes), a stream identifier (2 bytes), the frame index (4 bytes), the number of packets composing the frame (4 bytes), the packet index within the frame (4 bytes), and the packet’s relative departure time (4 bytes).
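
The 22-byte prefix described above can be decoded with a fixed-layout unpack. The sketch below makes several assumptions that the description does not settle: the fields appear in the order listed, they are unsigned integers, and they are big-endian (if decoded values look implausible, try `"<IHIIII"` instead). The `AlvrPrefix` type and `parse_prefix` helper are hypothetical names.

```python
import struct
from typing import NamedTuple

# 4 + 2 + 4 + 4 + 4 + 4 = 22 bytes; ">" means big-endian without padding.
# Byte order and signedness are assumptions, not taken from the description.
PREFIX = struct.Struct(">IHIIII")


class AlvrPrefix(NamedTuple):
    frame_payload_size: int   # payload size of the associated frame, bytes
    stream_id: int
    frame_index: int
    packets_in_frame: int
    packet_index: int
    departure_time: int       # relative departure time, unit not stated


def parse_prefix(data_hex: str) -> AlvrPrefix:
    """Decode the prefix from a tshark data.data hex string."""
    # tshark may print payload bytes with or without ':' separators.
    raw = bytes.fromhex(data_hex.replace(":", ""))
    if len(raw) < PREFIX.size:
        raise ValueError(f"payload shorter than {PREFIX.size} bytes")
    return AlvrPrefix(*PREFIX.unpack_from(raw))
```

Grouping packets by `frame_index` and comparing the count with `packets_in_frame` is one way the prefix could be used to spot frames that lost packets.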

IoMT Traffic Data: Benchmarking for IoMT IDS

kaggle.com

zip

Updated Nov 12, 2025

Cite

Abhinav Mangalore (2025). IoMT Traffic Data: Benchmarking for IoMT IDS [Dataset]. https://www.kaggle.com/datasets/abhinavmangalore/iomt-traffic-data

Explore at:

Available download formats: zip (197674393 bytes)

Dataset updated

Nov 12, 2025

Authors

Abhinav Mangalore

License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

The IoMT-TrafficData dataset has been developed to benchmark Machine Learning models for Intrusion Detection Systems (IDS) in the Internet of Medical Things (IoMT). The dataset simulates real-world attacks and normal network behavior in IoT and IoMT environments to enhance medical device security and patient data protection.

The dataset and its benchmarking methodology are detailed in the research article.

If you use this dataset, please credit the original authors:

Areia, J., Bispo, I. A., Santos, L., & Costa, R. L. (2023). IoMT-TrafficData: Dataset and Tools for Benchmarking Intrusion Detection in Internet of Medical Things.
IEEE Access. DOI: 10.1109/ACCESS.2024.3437214

Zenodo DOI: 10.5281/zenodo.8116338
Original Source: Zenodo (Creative Commons Attribution 4.0 International License)

Dataset Overview

BLE Dataset Features

Feature	Meaning
btle.advertising_header	BLE Advertising Packet Header
btle.advertising_header.ch_sel	Channel Selection Algorithm
btle.advertising_header.length	Advertising Length
btle.advertising_header.pdu_type	Advertising PDU Type
nordic_ble.crcok	Indicates if CRC is Correct
nordic_ble.packet_time	Packet time (start to end)
nordic_ble.phy	PHY
...	(see Zenodo for full feature list)

IP-Based Packet Dataset Features

Feature	Meaning
http.content_length	Length of HTTP response content
tcp.analysis.initial_rtt	Initial round-trip time for TCP
tcp.flags.syn	SYN flag in TCP
arp.isannouncement	Indicates ARP announcement
...	(see Zenodo for full list)

IP-Based Flows Dataset Features

Feature	Meaning
proto	Transport layer protocol
service	Application protocol
orig_bytes	Originator payload bytes
resp_bytes	Responder payload bytes
flow_duration	Duration of the flow
fwd_pkts_per_sec	Forward packets per second
flow_iat	Flow inter-arrival time
...	(see Zenodo for full list)

CESNET-USTS23: a benchmark dataset of Unevenly spaced time series from...
data.niaid.nih.gov
Updated Mar 21, 2024
Cite
Koumar, Josef; Čejka, Tomáš (2024). CESNET-USTS23: a benchmark dataset of Unevenly spaced time series from network traffic [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7923744
Explore at:
Dataset updated
Mar 21, 2024
Dataset provided by
CESNET (http://www.cesnet.cz/)
Czech Technical University in Prague
Authors
Koumar, Josef; Čejka, Tomáš
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset was created to evaluate characteristics of Unevenly sampled time series from network traffic (USTS) for the paper Unevenly Spaced Time Series from Network Traffic.

The file named time_series.tar.gz contains a folder with time series CSV files as raw data of the experiment. In the folder are the following files:

fts.csv -- contains 2.6 million Flow time series (FTS) created from 259 million IP flows,

pts.csv -- contains 19 million Packet time series (PTS) created from 110 million network packets,

sfts.csv -- contains 15 million Single flow time series (SFTS) created from 160 million network packets.

Traffic was captured on the national CESNET2 network from February 2023 to April 2023. All IP addresses in the dataset were anonymized.

The fts.csv has the following format:

ID_DEPENDENCY -- Identification of a network dependency observed as a Flow time series. (real IP address was anonymized by replacing with a random IP address)

N_FLOWS -- Number of flows in time series, i.e., number of data points.

N_PACKETS -- Number of packets in time series, i.e., the sum of metric PACKETS.

N_BYTES -- Number of bytes in time series, i.e., the sum of metric BYTES.

PACKETS -- The array containing the time series metric number of packets in the IP flow.

BYTES -- The array containing the time series metric number of bytes in the IP flow.

START_TIMES -- The array containing the time series time axis of the flows starts.

END_TIMES -- The array containing the time series time axis of the flows ends.
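
The relationship between the scalar columns and the array columns of fts.csv can be checked with a short consistency test. The sketch assumes that each array is stored in its CSV cell as a bracketed list such as `[1, 2, 3]`, which is not stated in the description, and that N_BYTES is the sum of the BYTES array; the `check_fts_row` helper is hypothetical.

```python
import ast
from typing import Mapping


def check_fts_row(row: Mapping[str, str]) -> bool:
    """Return True if the counts in a row agree with its arrays."""
    # Assumption: arrays are serialised as Python/JSON-style lists.
    packets = ast.literal_eval(row["PACKETS"])
    byte_counts = ast.literal_eval(row["BYTES"])
    starts = ast.literal_eval(row["START_TIMES"])
    ends = ast.literal_eval(row["END_TIMES"])
    n_flows = int(row["N_FLOWS"])
    same_length = (
        len(packets) == len(byte_counts) == len(starts) == len(ends) == n_flows
    )
    return (
        same_length
        and sum(packets) == int(row["N_PACKETS"])
        and sum(byte_counts) == int(row["N_BYTES"])
    )
```

Rows would typically come from `csv.DictReader` over fts.sample.csv; if the arrays turn out to use a different encoding, only the four `literal_eval` lines need to change.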

The pts.csv has the following format:

ID_DEPENDENCY -- Identification of a network dependency observed as a Packet time series. (real IP address was anonymized by replacing with a random IP address)

BYTES -- The array containing the time series metric payload length of the network packet.

TIMES -- The array containing the time series time axis of the transmission of network packets.

The sfts.csv has the following format:

SRC_IP -- Source IP address. (real IP address was anonymized by replacing with a random IP address)

SRC_PORT -- Source port.

DST_IP -- Destination IP address (real IP address was anonymized by replacing with a random IP address)

DST_PORT -- Destination port.

bytes -- The array containing the time series metric payload length of the network packet.

time -- The array containing the time series time axis of the transmission of network packets.

The file named characteristics.tar.gz contains a folder with characteristics gained by experiments from time series files. In the folder are the following files:

fts.characteristics.csv -- Characteristics about Flow time series from the fts.csv.

pts.characteristics.csv -- Characteristics about Packet time series from the pts.csv.

sfts.characteristics.csv -- Characteristics about Single flow time series from the sfts.csv.

The fts.characteristics.csv has the following format:

LENGTH -- Number of data points in the source time series.

DURATION -- Duration of the source time series.

H_BYTES -- Hurst exponent of the source time series metric BYTES.

STATIONARITY_PACKETS -- Stationarity of the source time series metric PACKETS.

STATIONARITY_BYTES -- Stationarity of the source time series metric BYTES.

OVERALL_STATIONARITY -- Overall stationarity created by merging STATIONARITY_PACKETS and STATIONARITY_BYTES.

The pts.characteristics.csv and sfts.characteristics.csv have the following format:

LENGTH -- Number of data points in the source time series.

DURATION -- Duration of the source time series.

H -- Hurst exponent of the source time series.

STATIONARITY -- Stationarity of the source time series.

We provide the samples of all zipped files for a quick lookup: fts.characteristics.sample.csv, fts.sample.csv, pts.characteristics.sample.csv, pts.sample.csv, sfts.characteristics.sample.csv, sfts.sample.csv
Data from: Implementation of a Multi-Channel DASH7 IoT Communication System...
zenodo.org
zip
Updated Sep 10, 2024
Cite
Dennis Joosens; Noori BniLam; Maarten Weyn; Rafael Berkvens (2024). Implementation of a Multi-Channel DASH7 IoT Communication System for Packet Investigation and Validation [Dataset]. http://doi.org/10.5281/zenodo.13734533
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.13734533
Dataset updated
Sep 10, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Dennis Joosens; Noori BniLam; Maarten Weyn; Rafael Berkvens
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This repository contains three cabled DASH7 data sets. All data sets are formatted as sigmf-data and sigmf-meta pairs, which can be investigated using IQEngine, GNU Radio, or MATLAB. Below you can find a more extended description of the data sets.

CH0.zip, CH93.zip, CH186.zip:

Cabled data sets of 3 channels

10 recordings per channel

1 DASH7 packet per file pair (SigMF)

Fc: 866.5 MHz

Sample rate: 7.68 MHz

Data type: ci16_le

Length: 1 second

Channel class: Lo-Rate

Sync word: 0x0B67

3 Lo-Rate channel recordings

channel 0 (Fc: 863.0125 MHz),

channel 93 (Fc: 865.3375 MHz),

channel 186 (Fc: 867.6625 MHz)

Payload: 3 bytes [counter_byte 0xAB 0xCD]

counter byte is always [0x00]

logs.zip:

Contains all the DASH7 gateway logs per measured channel.
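
As a rough illustration of the recording parameters listed above, the following Python sketch decodes `ci16_le` samples (interleaved little-endian 16-bit I and Q values, following the SigMF datatype naming) and computes how far each Lo-Rate channel centre sits from the 866.5 MHz recording centre. The function names are hypothetical and chosen for this example; only the frequencies, sample rate, and data type come from the description.

```python
import struct

SAMPLE_RATE_HZ = 7_680_000      # "Sample rate: 7.68 MHz"
RECORDING_FC_HZ = 866_500_000   # "Fc: 866.5 MHz"
CHANNEL_FC_HZ = {0: 863_012_500, 93: 865_337_500, 186: 867_662_500}


def decode_ci16_le(raw: bytes) -> list[complex]:
    """Turn interleaved little-endian int16 I/Q pairs into complex samples."""
    usable = len(raw) - len(raw) % 4  # ignore a trailing partial sample, if any
    return [complex(i, q) for i, q in struct.iter_unpack("<hh", raw[:usable])]


def channel_offsets_hz() -> dict[int, int]:
    """Offset of each channel centre from the recording centre frequency."""
    return {ch: fc - RECORDING_FC_HZ for ch, fc in CHANNEL_FC_HZ.items()}


def expected_data_bytes(seconds: float = 1.0) -> int:
    """Approximate size of one data file: 4 bytes per complex sample."""
    return int(SAMPLE_RATE_HZ * seconds) * 4


if __name__ == "__main__":
    print(channel_offsets_hz())
    print(expected_data_bytes())
```

With these numbers all three channel offsets stay within half the sample rate (3.84 MHz) of the recording centre, so each channel lies inside the sampled bandwidth.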
Drone-Based Malware Detection (DBMD)
kaggle.com
zip
Updated Jul 27, 2024
DatasetEngineer (2024). Drone-Based Malware Detection (DBMD) [Dataset]. https://www.kaggle.com/datasets/nasirayub2/drone-based-malware-detection-dbmd/suggestions?status=pending
Explore at:
Available download formats: zip (67433750 bytes)
Dataset updated
Jul 27, 2024
Authors
DatasetEngineer
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Welcome to the Drone-Based Malware Detection dataset! This dataset is designed to aid researchers and practitioners in exploring innovative cybersecurity solutions using drone-collected data. The dataset contains detailed information on network traffic, drone sensor readings, malware detection indicators, and environmental conditions. It offers a unique perspective by integrating data from drones with traditional network security metrics to enhance malware detection capabilities.

Dataset Overview The dataset comprises four main categories:

Network Traffic Data: Captures network traffic attributes including IP addresses, ports, protocols, packet sizes, and various derived metrics.

Drone Sensor Data: Includes GPS coordinates, altitude, speed, heading, battery level, and other sensor readings from drones.

Malware Detection Data: Contains indicators and scores relevant to detecting malware, such as anomaly scores, suspicious IP counts, reputation scores, and attack types.

Environmental Data: Provides context through environmental conditions like location type, noise level, weather conditions, and more.

Files and Features The dataset is divided into four separate CSV files:

network_traffic_data.csv

timestamp: Date and time of the traffic event.

source_ip: Source IP address.

destination_ip: Destination IP address.

source_port: Source port number.

destination_port: Destination port number.

protocol: Network protocol (TCP, UDP, ICMP).

packet_length: Length of the network packet.

payload_data: Content of the packet payload.

flag: Network flag (SYN, ACK, FIN, RST).

traffic_volume: Volume of traffic in bytes.

flow_duration: Duration of the network flow.

flow_bytes_per_s: Bytes per second for the flow.

flow_packets_per_s: Packets per second for the flow.

packet_count: Number of packets in the flow.

average_packet_size: Average size of packets.

min_packet_size: Minimum packet size.

max_packet_size: Maximum packet size.

packet_size_variance: Variance in packet sizes.

header_length: Length of the packet header.

payload_length: Length of the packet payload.

ip_ttl: Time to live for the IP packet.

tcp_window_size: TCP window size.

icmp_type: ICMP type (echo_request, echo_reply, destination_unreachable).

dns_query_count: Number of DNS queries.

dns_response_count: Number of DNS responses.

http_method: HTTP method (GET, POST, PUT, DELETE).

http_status_code: HTTP status code (200, 404, 500, 301).

content_type: Content type (text/html, application/json, image/png).

ssl_tls_version: SSL/TLS version.

ssl_tls_cipher_suite: SSL/TLS cipher suite.

drone_data.csv

latitude: Latitude of the drone.

longitude: Longitude of the drone.

altitude: Altitude of the drone.

speed: Speed of the drone.

heading: Heading of the drone.

battery_level: Battery level of the drone.

drone_id: Unique identifier for the drone.

flight_time: Total flight time.

signal_strength: Strength of the drone's signal.

temperature: Temperature at the drone's location.

humidity: Humidity at the drone's location.

pressure: Atmospheric pressure at the drone's location.

wind_speed: Wind speed at the drone's location.

wind_direction: Wind direction at the drone's location.

gps_accuracy: Accuracy of the GPS signal.

malware_detection_data.csv

anomaly_score: Score indicating the level of anomaly detected.

suspicious_ip_count: Number of suspicious IP addresses detected.

malicious_payload_indicator: Indicator for malicious payload (0 or 1).

reputation_score: Reputation score for the network entity.

behavioral_score: Behavioral score indicating potential malicious activity.

attack_type: Type of attack (DDoS, phishing, malware).

signature_match: Indicator for signature match (0 or 1).

sandbox_result: Result from sandbox analysis (clean, infected).

heuristic_score: Heuristic score for potential threats.

traffic_pattern: Pattern of the traffic (burst, steady).

environmental_data.csv

location_type: Type of location (urban, rural).

nearby_devices: Number of nearby devices.

signal_interference: Level of signal interference.

noise_level: Noise level in the environment.

time_of_day: Time of day (morning, afternoon, evening, night).

day_of_week: Day of the week.

weather_conditions: Weather conditions (sunny, rainy, cloudy, stormy).

Usage and Applications This dataset can be used for:

Cybersecurity Research: Developing and testing algorithms for malware detection using drone data.

Machine Learning: Training models to identify malicious activity based on network traffic and drone sensor readings.

Data Analysis: Exploring the relationships between environmental conditions, drone sensor data, and network traffic anomalies.

Educational Purposes: Teaching data science, machine learning, and cybersecurity concepts using a comprehensive and multi-faceted dataset.

Acknowledgements This dataset is based on real-world data collected from drone sensors and network traffic monitoring s...
MSSF-MalNet-2024
zenodo.org
Updated Sep 13, 2025
Mohammed Saadoon Saadoon; Suhad Faisal Behadili (2025). MSSF-MalNet-2024 [Dataset]. http://doi.org/10.5281/zenodo.15453468
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.15453468
Dataset updated
Sep 13, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Mohammed Saadoon Saadoon; Suhad Faisal Behadili
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
2024
Description
The dataset was collected using honeypots deployed with the Honeytrap agent. The honeypots captured both benign and malicious network traffic, providing valuable insights into different attack behaviors. The dataset consists of 9 features that represent various aspects of network traffic, including both structural and payload data. These features are as follows:

Protocol: The communication protocol used in the network traffic, such as HTTP, FTP, or SSH.

remote_ip: The IP address of the remote (attacker) system that initiated the connection.

remote_port: The port number on the remote system that the connection was made to.

local_ip: The IP address of the local (honeypot) system that received the connection.

local_port: The port number on the local system that accepted the connection.

md5_hash: The MD5 hash of the data payload (if applicable), used for identifying and comparing files or data.

sha512_hash: The SHA-512 hash of the data payload (if applicable), providing a more secure representation for identifying files or data.

Length: The length of the data payload (in bytes), representing the size of the network traffic.

data_hex: The hexadecimal representation of the raw data payload, which can include commands or other information related to the communication.

This dataset was used to train machine learning models to classify the network traffic as either benign or malicious. The features provide valuable information to differentiate between normal communication and suspicious activities, such as potential cyber-attacks.
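
The relationship between the payload-related features (data_hex, Length, md5_hash, sha512_hash) can be sketched in Python as follows. This is a minimal example under two assumptions that the description does not confirm: that both hashes are computed over the raw payload bytes encoded in data_hex, and that Length counts those bytes. The function names are hypothetical.

```python
import hashlib


def describe_payload(data_hex: str) -> dict[str, object]:
    """Derive Length and both hashes from the hexadecimal payload string."""
    payload = bytes.fromhex(data_hex)
    return {
        "Length": len(payload),
        "md5_hash": hashlib.md5(payload).hexdigest(),
        "sha512_hash": hashlib.sha512(payload).hexdigest(),
    }


def row_is_consistent(row: dict[str, str]) -> bool:
    """Check a record against the values derived from its own data_hex.

    Assumes the hashes cover the raw payload bytes; if the dataset hashes
    something else (for example a downloaded file), this returns False.
    """
    derived = describe_payload(row["data_hex"])
    return (
        int(row["Length"]) == derived["Length"]
        and row["md5_hash"].lower() == derived["md5_hash"]
        and row["sha512_hash"].lower() == derived["sha512_hash"]
    )
```

A check like this is mainly useful for spotting truncated or re-encoded payloads before the data_hex column is used as a model input.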
Cyberattacks Detection
kaggle.com
zip
Updated Jul 28, 2024
lastman0800 (2024). Cyberattacks Detection [Dataset]. https://www.kaggle.com/datasets/lastman0800/cyberattacks-detection
Explore at:
Available download formats: zip (4076139 bytes)
Dataset updated
Jul 28, 2024
Authors
lastman0800
Description
This dataset was meticulously captured for the analysis and detection of cyberattacks using machine learning techniques. It comprises 100,000 rows, each representing a unique cyberattack event. The dataset includes a diverse range of attack types, protocols, and affected systems, making it an invaluable resource for developing and testing detection models.

Columns and Attributes

Attack ID: A unique identifier assigned to each attack instance, ranging from 1 to 100,000. This column ensures each row is distinct and can be referenced individually.

Timestamp: The exact date and time when the attack was detected, formatted as YYYY-MM-DD HH:MM:SS. This column helps in analyzing the temporal patterns of attacks and identifying trends over time.

Source IP: The IP address of the machine from which the attack originated. Each IP address in the dataset is unique, simulating a diverse set of attackers and adding realism to the dataset.

Destination IP: The IP address of the target machine under attack. Similar to the source IPs, destination IPs are also unique, representing a wide range of potential targets and ensuring a comprehensive dataset.

Source Country: The country associated with the source IP address, randomly assigned from a set of major countries (e.g., USA, China, Russia). This attribute is crucial for geographic analysis of attack origins and understanding global threat landscapes.

Destination Country: The country associated with the destination IP address, providing context about the target locations and enabling analysis of international attack patterns.

Protocol: The network protocol used during the attack, such as TCP, UDP, or ICMP. This column is essential for understanding the type of communication involved in the attack and for protocol-specific analysis.

Source Port: The port number on the source machine used for the attack. This can be useful in identifying common ports used by attackers and understanding the methods of attack.

Destination Port: The port number on the destination machine targeted by the attack. This attribute, combined with the port type, helps in understanding the specific services under attack and identifying vulnerable entry points.

Port Type: A derived column that categorizes the destination port into common service types (e.g., HTTP, HTTPS, FTP). This simplifies the analysis of which services are frequently targeted and aids in focusing defensive measures.

Attack Type: A descriptive label for the type of cyberattack, including a variety of attack methods such as Distributed Denial of Service (DDoS), SQL Injection, and Phishing. The dataset includes a broad spectrum of attack types to cover different threat scenarios and provide comprehensive analysis opportunities.

Payload Size (bytes): The size of the data payload involved in the attack, measured in bytes. This helps in understanding the scale and potential impact of each attack, with larger payloads often indicating more significant or complex attacks.

Detection Label: Indicates whether the attack was detected by the system (Detected) or not (Not Detected). This binary label is crucial for evaluating the effectiveness of detection models and understanding detection rates.

Confidence Score: A probability score ranging from 0 to 1, representing the confidence level of the detection model for each attack instance. For detected attacks, the score is between 0.50 and 1.00, while for undetected attacks, it is between 0.00 and 0.49. This score is essential for assessing the reliability of the detection model.

ML Model: The type of machine learning model used to identify the attack, randomly chosen from popular models such as Random Forest, Support Vector Machine, and Neural Network. This provides insight into the model's performance and preferences, enabling comparative analysis of different models.

Affected System: The type of system targeted by the attack, such as a Database Server, Web Server, or IoT Device. This helps in understanding the potential impact on different infrastructure components and focusing security efforts on the most critical systems.
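
The stated relationship between Detection Label and Confidence Score can be checked with a small Python sketch. The function name is hypothetical, and treating missing values as "cannot be checked" is an assumption made here because the dataset is described as containing nulls.

```python
from typing import Optional


def label_matches_score(label: Optional[str], score: Optional[float]) -> Optional[bool]:
    """Return True if the score lies in the range described for the label.

    Detected     -> 0.50 to 1.00
    Not Detected -> 0.00 to 0.49
    Returns None when either value is missing, since the description
    notes that null values occur in various columns.
    """
    if label is None or score is None:
        return None
    if label == "Detected":
        return 0.50 <= score <= 1.00
    if label == "Not Detected":
        return 0.00 <= score <= 0.49
    return False  # unexpected label text


if __name__ == "__main__":
    for label, score in [("Detected", 0.73), ("Not Detected", 0.73), (None, 0.2)]:
        print(label, score, label_matches_score(label, score))
```

Because the score range is fully determined by the label, a model given both columns could recover the label from the score alone, which is worth keeping in mind when selecting features.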

Realism and Practicality

The dataset introduces a realistic element by including null values in various columns. This simulates real-world data imperfections and prepares the dataset for more robust handling and preprocessing techniques during analysis. The inclusion of unique IP addresses for both source and destination adds to the authenticity, reflecting the diverse nature of cyberattacks in the real world.

Overall, this dataset is a valuable resource for researchers, analysts, and developers working on cybersecurity solutions. It provides a rich, varied, and realistic foundation for developing and testing machine learning models aimed at detecting and mitigating cybe...
Data from: CESNET-TLS-Year22: A year-spanning TLS network traffic dataset...
data.niaid.nih.gov
Updated Mar 24, 2025
Hynek, Karel; Luxemburk, Jan; Pešek, Jaroslav; Čejka, Tomáš; Pavel, Šiška (2025). CESNET-TLS-Year22: A year-spanning TLS network traffic dataset from backbone lines [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10608606
Explore at:
Dataset updated
Mar 24, 2025
Dataset provided by
Czech Education and Scientific Network
Authors
Hynek, Karel; Luxemburk, Jan; Pešek, Jaroslav; Čejka, Tomáš; Pavel, Šiška
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We recommend using the CESNET DataZoo python library, which facilitates the work with large network traffic datasets. More information about the DataZoo project can be found in the GitHub repository https://github.com/CESNET/cesnet-datazoo.

The modern approach for network traffic classification (TC), which is an important part of operating and securing networks, is to use machine learning (ML) models that are able to learn intricate relationships between traffic characteristics and communicating applications. A crucial prerequisite is having representative datasets. However, datasets collected from real production networks are not being published in sufficient numbers. Thus, this paper presents a novel dataset, CESNET-TLS-Year22, that captures the evolution of TLS traffic in an ISP network over a year. The dataset contains 180 web service labels and standard TC features, such as packet sequences. The unique year-long time span enables comprehensive evaluation of TC models and assessment of their robustness in the face of the ever-changing environment of production networks.

Data description The dataset consists of network flows describing encrypted TLS communications. Flows are extended with packet sequences, histograms, and fields extracted from the TLS ClientHello message, which is transmitted in the first packet of the TLS connection handshake. The most important extracted handshake field is the SNI domain, which is used for ground-truth labeling.

Packet Sequences Sequences of packet sizes, directions, and inter-packet times are standard data input for traffic analysis. For packet sizes, we consider the payload size after transport headers (TCP headers for the TLS case). We omit packets with no TCP payload, for example ACKs, because zero-payload packets are related to the transport layer internals rather than services’ behavior. Packet directions are encoded as ±1, where +1 means a packet sent from client to server, and -1 is a packet from server to client. Inter-packet times depend on the location of communicating hosts, their distance, and on the network conditions on the path. However, it is still possible to extract relevant information that correlates with user interactions and, for example, with the time required for an API/server/database to process the received data and generate a response. Packet sequences have a maximum length of 30, which is the default setting of the used flow exporter. We also derive three fields from each packet sequence: its length, time duration, and the number of roundtrips. The roundtrips are counted as the number of changes in the communication direction; in other words, each client request and server response pair counts as one roundtrip.

Flow statistics Each data record also includes standard flow statistics, representing aggregated information about the entire bidirectional connection. The fields are the number of transmitted bytes and packets in both directions, the duration of the flow, and packet histograms. The packet histograms include binned counts (not limited to the first 30 packets) of packet sizes and inter-packet times in both directions. There are eight bins with a logarithmic scale; the intervals are 0-15, 16-31, 32-63, 64-127, 128-255, 256-511, 512-1024, >1024 [ms or B]. The units are milliseconds for inter-packet times and bytes for packet sizes (More information in the PHISTS plugin documentation). Moreover, each flow has its end reason---either it ended with the TCP connection termination (FIN packets), was idle, reached the active timeout, or ended due to other reasons. This corresponds with the official IANA IPFIX-specified values. The FLOW_ENDREASON_OTHER field represents the forced end and lack of resources reasons.
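
The eight logarithmic bins listed above can be reproduced with a short Python sketch. It assumes integer inputs (bytes or whole milliseconds) and follows the intervals exactly as written, so 1024 falls in the seventh bin; the function names are hypothetical and this is not the flow exporter's own code.

```python
from bisect import bisect_left

# Inclusive upper bounds of the first seven bins; anything larger is ">1024".
BIN_UPPER_BOUNDS = [15, 31, 63, 127, 255, 511, 1024]
BIN_LABELS = ["0-15", "16-31", "32-63", "64-127",
              "128-255", "256-511", "512-1024", ">1024"]


def bin_index(value: int) -> int:
    """Index (0..7) of the histogram bin that holds a size or time value."""
    if value < 0:
        raise ValueError("sizes and inter-packet times cannot be negative")
    return bisect_left(BIN_UPPER_BOUNDS, value)


def histogram(values: list[int]) -> list[int]:
    """Binned counts in the same eight-bin layout as the PHIST_* fields."""
    counts = [0] * len(BIN_LABELS)
    for value in values:
        counts[bin_index(value)] += 1
    return counts


if __name__ == "__main__":
    sizes = [0, 15, 16, 517, 1024, 1025, 1400]
    print(dict(zip(BIN_LABELS, histogram(sizes))))
```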

Dataset structure The dataset is organized by week and by individual day. The flows are delivered in compressed CSV files. CSV files contain one flow per row; data columns are summarized in the provided list below. For each flow data file, there is a JSON file with the total number of saved flows and the number of flows per service. There are also files aggregating flow counts for each week (stats-week.json) and for the entire dataset (stats-dataset.json). The following list describes flow data fields in CSV files:

ID: Unique identifier

SRC_IP: Source IP address

DST_IP: Destination IP address

DST_ASN: Destination Autonomous System number

SRC_PORT: Source port

DST_PORT: Destination port

PROTOCOL: Transport protocol

FLAG_CWR: Presence of the CWR flag

FLAG_CWR_REV: Presence of the CWR flag in the reverse direction

FLAG_ECE: Presence of the ECE flag

FLAG_ECE_REV: Presence of the ECE flag in the reverse direction

FLAG_URG: Presence of the URG flag

FLAG_URG_REV: Presence of the URG flag in the reverse direction

FLAG_ACK: Presence of the ACK flag

FLAG_ACK_REV: Presence of the ACK flag in the reverse direction

FLAG_PSH: Presence of the PSH flag

FLAG_PSH_REV: Presence of the PSH flag in the reverse direction

FLAG_RST: Presence of the RST flag

FLAG_RST_REV: Presence of the RST flag in the reverse direction

FLAG_SYN: Presence of the SYN flag

FLAG_SYN_REV: Presence of the SYN flag in the reverse direction

FLAG_FIN: Presence of the FIN flag

FLAG_FIN_REV: Presence of the FIN flag in the reverse direction

TLS_SNI: Server Name Indication domain

TLS_JA3: JA3 fingerprint of TLS client

TIME_FIRST: Timestamp of the first packet in format YYYY-MM-DDTHH-MM-SS.ffffff

TIME_LAST: Timestamp of the last packet in format YYYY-MM-DDTHH-MM-SS.ffffff

DURATION: Duration of the flow in seconds

BYTES: Number of transmitted bytes from client to server

BYTES_REV: Number of transmitted bytes from server to client

PACKETS: Number of packets transmitted from client to server

PACKETS_REV: Number of packets transmitted from server to client

PPI: Packet sequence in the format: [[inter-packet times], [packet directions], [packet sizes], [push flags]]

PPI_LEN: Number of packets in the PPI sequence

PPI_DURATION: Duration of the PPI sequence in seconds

PPI_ROUNDTRIPS: Number of roundtrips in the PPI sequence

PHIST_SRC_SIZES: Histogram of packet sizes from client to server

PHIST_DST_SIZES: Histogram of packet sizes from server to client

PHIST_SRC_IPT: Histogram of inter-packet times from client to server

PHIST_DST_IPT: Histogram of inter-packet times from server to client

APP: Web service label

CATEGORY: Service category

FLOW_ENDREASON_IDLE: Flow was terminated because it was idle

FLOW_ENDREASON_ACTIVE: Flow was terminated because it reached the active timeout

FLOW_ENDREASON_END: Flow ended with the TCP connection termination

FLOW_ENDREASON_OTHER: Flow was terminated for other reasons
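
To illustrate the PPI field layout, the Python sketch below splits a PPI cell into its four sub-lists and counts direction changes in the way the packet-sequence paragraph describes. It assumes the cell is stored as JSON-compatible nested lists, which the description does not state, and the sample values are invented for the example. The dataset's own PPI_ROUNDTRIPS column remains the reference; whether it matches this literal count should be checked against the data.

```python
import json


def parse_ppi(ppi_text: str) -> dict[str, list]:
    """Split a PPI cell into its four parallel sequences.

    Field order follows the description:
    [[inter-packet times], [packet directions], [packet sizes], [push flags]]
    """
    ipt, directions, sizes, push_flags = json.loads(ppi_text)
    if not (len(ipt) == len(directions) == len(sizes) == len(push_flags)):
        raise ValueError("PPI sub-lists must have equal length")
    return {"ipt": ipt, "directions": directions,
            "sizes": sizes, "push_flags": push_flags}


def count_direction_changes(directions: list[int]) -> int:
    """Number of times consecutive packets switch between +1 and -1."""
    return sum(1 for prev, cur in zip(directions, directions[1:]) if prev != cur)


if __name__ == "__main__":
    # Hypothetical cell: client, server, server, client.
    sample = "[[0, 12, 30, 1], [1, -1, -1, 1], [517, 1400, 900, 80], [1, 0, 1, 1]]"
    ppi = parse_ppi(sample)
    print(len(ppi["sizes"]), count_direction_changes(ppi["directions"]))
```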
Data from: Experimenting with Adaptive Bitrate Algorithms for Virtual...
data.niaid.nih.gov
Updated Jul 11, 2024
Maura, Ferran; Casasnovas, Miguel; Bellalta, Boris (2024). Experimenting with Adaptive Bitrate Algorithms for Virtual Reality Streaming over Wi-Fi [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_12723989
Explore at:
Dataset updated
Jul 11, 2024
Dataset provided by
Universitat Pompeu Fabra
Pompeu Fabra University
Authors
Maura, Ferran; Casasnovas, Miguel; Bellalta, Boris
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset of files captured from VR traffic over Wi-Fi 6, generated by a fork of the Air Light Virtual Reality (ALVR) software, which streams games from a PC to a VR HMD in real time. The dataset includes:

Parsed Wireshark captures in CSV format, captured at both the server and the network emulator, together with the corresponding ALVR session log for each experiment. Each folder holds all netem, server, or ALVR files, named after the emulated network effect (which is applied via the netem computer). Every test uses Constant BitRate (CBR) at 100 Mbps. The plots are included in the corresponding folder for each effect, along with a metric comparison between WS and ALVR.

ALVR session logs for a comparison on the logged metrics under tests of Mobility, using different strategies for bitrate adaptation: CBR, ABR and our own contribution.

ALVR session logs for a comparison on the logged metrics under tests of emulated capacity drops, using different strategies for bitrate adaptation: CBR, ABR and our own contribution.

The Wireshark captures were parsed (via tshark) from UDP packets in pcapng files into CSV files containing the principal fields of each packet, separated by a space. Because the pcap captures were over 1 GB each, we keep only the main fields and the first bytes of the payload, and discard the rest. There are additional CSV files for TCP UL packets, which we parsed separately from the same captures in order to validate the RTT measured by ALVR.

The ALVR session logs contain raw JSON strings in .txt format, logged from the server using our fork of ALVR. We added some events to those ALVR originally used, in order to log our metrics at arbitrary points in the code.

The first 22 bytes of each packet's payload are parsed into the StreamSocket fields that ALVR uses, and the recorded timestamps allow the ALVR metrics to be validated manually, which can be used to reproduce our results. Each row of the CSV (frame.time_relative, ip.src, ip.dst, frame.len, data.data) contains the packet's timestamp, source IP, destination IP, length, and the first 22 bytes of the payload as a hexadecimal string.
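
A row in this space-separated layout could be read with a Python sketch like the one below. The class and function names are hypothetical, the sample addresses and values are invented, and the optional colon removal is an assumption about how tshark may have printed data.data. The internal layout of the 22 StreamSocket bytes is not described here, so the sketch leaves them as raw bytes.

```python
from typing import NamedTuple


class PacketRow(NamedTuple):
    time_relative: float   # frame.time_relative, seconds since capture start
    ip_src: str            # ip.src
    ip_dst: str            # ip.dst
    frame_len: int         # frame.len, bytes
    payload_head: bytes    # data.data, first bytes of the UDP payload


def parse_row(line: str) -> PacketRow:
    """Parse one space-separated row of the parsed Wireshark CSV."""
    time_s, src, dst, length, data_hex = line.split()
    payload = bytes.fromhex(data_hex.replace(":", ""))
    return PacketRow(float(time_s), src, dst, int(length), payload)


if __name__ == "__main__":
    # Hypothetical row with a 22-byte payload prefix.
    sample = "0.001234 192.168.1.10 192.168.1.20 1442 " + "00" * 22
    row = parse_row(sample)
    print(row.time_relative, row.frame_len, len(row.payload_head))
```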
brain-tumor-single-slice-MRI-scan-with-synthetic-ehr-africa
huggingface.co
Electric Sheep, brain-tumor-single-slice-MRI-scan-with-synthetic-ehr-africa [Dataset]. https://huggingface.co/datasets/electricsheepafrica/brain-tumor-single-slice-MRI-scan-with-synthetic-ehr-africa
Explore at:
Dataset authored and provided by
Electric Sheep
License
https://choosealicense.com/licenses/gpl/
Description
Dataset Card: Africa Brain Tumor Scans with Synthetic EHR (Bundled Parquet)

This dataset bundles single-slice brain MRI scans and richly structured, synthetic EHR data into a single Parquet file suitable for multimodal ML research. Each row contains an image struct (bytes + path), a source label column, and an EHR payload with both a full JSON record and convenient summary columns. The synthetic EHRs are Africa-focused: they encode country, urban/rural, facility level, insurance… See the full description on the dataset page: https://huggingface.co/datasets/electricsheepafrica/brain-tumor-single-slice-MRI-scan-with-synthetic-ehr-africa.
INDDOS24 Dataset
kaggle.com
zip
Updated Dec 7, 2024
DatasetEngineer (2024). INDDOS24 Dataset [Dataset]. https://www.kaggle.com/datasets/datasetengineer/inddos24-dataset
Explore at:
Available download formats: zip (4040730 bytes)
Dataset updated
Dec 7, 2024
Authors
DatasetEngineer
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
The INDDOS24 Dataset is a comprehensive and synthetic dataset designed for analyzing Distributed Denial of Service (DDoS) attacks in Internet of Things (IoT) networks. The dataset spans a period from January 1, 2019, to July 1, 2024, capturing hourly network traffic from various IoT devices, including cameras, sensors, and smart appliances. This dataset simulates realistic traffic dynamics, including both normal operations and attack scenarios, providing researchers and practitioners with a rich resource to develop and evaluate machine learning and deep learning-based DDoS detection models.

Key Features of INDDOS24 Dataset

Timestamp: The date and time of each network event, recorded hourly, covering more than five years of traffic data.

Source IP: The IP address from which the network traffic originates, representing the source device in the network.

Destination IP: The IP address to which the network traffic is directed, representing the target device.

Source Port: The port number used by the source device for communication.

Destination Port: The port number used by the target device for receiving traffic.

Protocol: The communication protocol used, including TCP, UDP, and ICMP.

Packet Size: The size of each network packet in bytes, ranging from small control packets to large data transmissions.

Payload Length: The length of the payload in the network packets, representing actual data being transmitted.

Flow Duration: The duration of the network flow in seconds, capturing the session length between devices.

Bytes in Flow: The total number of bytes transmitted during the flow.

Packets in Flow: The total number of packets transmitted during the flow.

Average Packet Size: The average size of packets within a flow, useful for distinguishing attack patterns from normal traffic.

Inter-Arrival Time: The time interval between successive packets in a flow, capturing traffic burstiness.

Rate of Packets: The rate of packets per second, highlighting high-rate traffic scenarios typical of DDoS attacks.

Unique Source Count: The number of unique source IP addresses observed in the flow.

Unique Destination Count: The number of unique destination IP addresses in the flow.

Anomaly Score: A computed score indicating the likelihood of anomalous or malicious activity within the traffic.

Device Type: The type of IoT device generating the traffic, such as Camera, Sensor, or Smart Appliance.

Operating System: The operating system of the IoT device, including Linux, Windows, or RTOS (Real-Time Operating System).

Firmware Version: The firmware version running on the device, reflecting device configuration.

Attack Type: The type of attack, if detected, including "SYN Flood," "UDP Flood," "Application Layer Attack," or "No Attack."

Attack Duration: The duration of the detected attack in seconds, where applicable.

Target Device: The specific device targeted by the attack, if applicable, or "None" for normal traffic.

Labels: Multi-label annotations for each record, indicating attack types or normal traffic. Labels are unbalanced to simulate real-world distributions, with "Normal" traffic dominating.

Key Highlights

Multi-Label Annotations: Each record can have multiple labels to capture complex scenarios where different attack types may occur simultaneously.

Realistic Traffic Simulation: The dataset reflects both the prevalence of normal traffic and the intermittent nature of DDoS attacks in IoT environments.

Diverse Features: With over 20 features, the dataset supports detailed traffic analysis and the development of robust anomaly detection systems.

Unbalanced Distribution: Mimics real-world IoT networks where normal traffic significantly outweighs malicious activities.

The INDDOS24 dataset serves as a valuable resource for advancing IoT network security, particularly in detecting and mitigating DDoS attacks. It is suitable for researchers, data scientists, and engineers developing machine learning and deep learning-based models for intrusion detection and network anomaly analysis.
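
As a small worked example of the stated time span, the Python sketch below counts the hourly timestamps between January 1, 2019 and July 1, 2024. It assumes one timestamp per hour with the end date excluded; the description does not say whether the end point is included or whether each hour holds exactly one record, so the actual row count may differ.

```python
from datetime import datetime, timedelta
from typing import Iterator

START = datetime(2019, 1, 1)
END = datetime(2024, 7, 1)   # treated as exclusive in this sketch


def hourly_timestamps(start: datetime, end: datetime) -> Iterator[datetime]:
    """Yield one timestamp per hour in the half-open interval [start, end)."""
    current = start
    while current < end:
        yield current
        current += timedelta(hours=1)


def expected_hourly_count(start: datetime, end: datetime) -> int:
    """Number of whole hours between start and end."""
    return int((end - start) / timedelta(hours=1))


if __name__ == "__main__":
    print((END - START).days, expected_hourly_count(START, END))
```

The span works out to 2008 days, or 48192 hourly slots, which agrees with the "more than five years" wording in the Timestamp feature.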
Data from: CESNET-QUIC22: A large one-month QUIC network traffic dataset...
data.niaid.nih.gov
zenodo.org
Updated Feb 29, 2024
Luxemburk, Jan; Hynek, Karel; Čejka, Tomáš; Lukačovič, Andrej; Šiška, Pavel (2024). CESNET-QUIC22: A large one-month QUIC network traffic dataset from backbone lines [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7409923
Explore at:
Dataset updated
Feb 29, 2024
Dataset provided by
CESNET (http://www.cesnet.cz/)
FIT Czech Technical University in Prague
Authors
Luxemburk, Jan; Hynek, Karel; Čejka, Tomáš; Lukačovič, Andrej; Šiška, Pavel
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Please refer to the original data article for further data description: Jan Luxemburk et al. CESNET-QUIC22: A large one-month QUIC network traffic dataset from backbone lines, Data in Brief, 2023, 108888, ISSN 2352-3409, https://doi.org/10.1016/j.dib.2023.108888.

We recommend using the CESNET DataZoo python library, which facilitates the work with large network traffic datasets. More information about the DataZoo project can be found in the GitHub repository https://github.com/CESNET/cesnet-datazoo.

The QUIC (Quick UDP Internet Connection) protocol has the potential to replace TLS over TCP, which is the standard choice for reliable and secure Internet communication. Due to its design that makes the inspection of QUIC handshakes challenging and its usage in HTTP/3, there is an increasing demand for research in QUIC traffic analysis. This dataset contains one month of QUIC traffic collected in an ISP backbone network, which connects 500 large institutions and serves around half a million people. The data are delivered as enriched flows that can be useful for various network monitoring tasks. The provided server names and packet-level information allow research in the encrypted traffic classification area. Moreover, included QUIC versions and user agents (smartphone, web browser, and operating system identifiers) provide information for large-scale QUIC deployment studies.

Data capture The data was captured in the flow monitoring infrastructure of the CESNET2 network. The capturing was done for four weeks between 31.10.2022 and 27.11.2022. The following list provides per-week flow count, capture period, and uncompressed size:

W-2022-44
Uncompressed Size: 19 GB
Capture Period: 31.10.2022 - 6.11.2022
Number of flows: 32.6M

W-2022-45
Uncompressed Size: 25 GB
Capture Period: 7.11.2022 - 13.11.2022
Number of flows: 42.6M

W-2022-46
Uncompressed Size: 20 GB
Capture Period: 14.11.2022 - 20.11.2022
Number of flows: 33.7M

W-2022-47
Uncompressed Size: 25 GB
Capture Period: 21.11.2022 - 27.11.2022
Number of flows: 44.1M

CESNET-QUIC22 (entire dataset)
Uncompressed Size: 89 GB
Capture Period: 31.10.2022 - 27.11.2022
Number of flows: 153M

Data description
The dataset consists of network flows describing encrypted QUIC communications. Flows were created using ipfixprobe flow exporter and are extended with packet metadata sequences, packet histograms, and with fields extracted from the QUIC Initial Packet, which is the first packet of the QUIC connection handshake. The extracted handshake fields are the Server Name Indication (SNI) domain, the used version of the QUIC protocol, and the user agent string that is available in a subset of QUIC communications.

Packet sequences
Flows in the dataset are extended with sequences of packet sizes, directions, and inter-packet times. For the packet sizes, we consider payload size after transport headers (UDP headers for the QUIC case). Packet directions are encoded as ±1, +1 meaning a packet sent from client to server, and -1 a packet from server to client. Inter-packet times depend on the location of communicating hosts, their distance, and on the network conditions on the path. However, it is still possible to extract relevant information that correlates with user interactions and, for example, with the time required for an API/server/database to process the received data and generate the response to be sent in the next packet. Packet metadata sequences have a length of 30, which is the default setting of the used flow exporter. We also derive three fields from each packet sequence: its length, time duration, and the number of roundtrips. The roundtrips are counted as the number of changes in the communication direction (from packet directions data); in other words, each client request and server response pair counts as one roundtrip.

Flow statistics
Flows also include standard flow statistics, which represent aggregated information about the entire bidirectional flow. The fields are: the number of transmitted bytes and packets in both directions, the duration of flow, and packet histograms.
Packet histograms include binned counts of packet sizes and inter-packet times of the entire flow in both directions (more information in the PHISTS plugin documentation). There are eight bins with a logarithmic scale; the intervals are 0-15, 16-31, 32-63, 64-127, 128-255, 256-511, 512-1024, >1024 [ms or B]. The units are milliseconds for inter-packet times and bytes for packet sizes. Moreover, each flow has its end reason - either it was idle, reached the active timeout, or ended due to other reasons. This corresponds with the official IANA IPFIX-specified values. The FLOW_ENDREASON_OTHER field represents the forced end and lack of resources reasons. The end of flow detected reason is not considered because it is not relevant for UDP connections.

Dataset structure
The dataset flows are delivered in compressed CSV files. CSV files contain one flow per row; data columns are summarized in the provided list below. For each flow data file, there is a JSON file with the number of saved and seen (before sampling) flows per service and total counts of all received (observed on the CESNET2 network), service (belonging to one of the dataset's services), and saved (provided in the dataset) flows. There is also the stats-week.json file aggregating flow counts of a whole week and the stats-dataset.json file aggregating flow counts for the entire dataset. Flow counts before sampling can be used to compute sampling ratios of individual services and to resample the dataset back to the original service distribution. Moreover, various dataset statistics, such as feature distributions and value counts of QUIC versions and user agents, are provided in the dataset-statistics folder. The mapping between services and service providers is provided in the servicemap.csv file, which also includes SNI domains used for ground truth labeling.

The following list describes flow data fields in CSV files:

ID: Unique identifier
SRC_IP: Source IP address
DST_IP: Destination IP address
DST_ASN: Destination Autonomous System number
SRC_PORT: Source port
DST_PORT: Destination port
PROTOCOL: Transport protocol
QUIC_VERSION: QUIC protocol version
QUIC_SNI: Server Name Indication domain
QUIC_USER_AGENT: User agent string, if available in the QUIC Initial Packet
TIME_FIRST: Timestamp of the first packet in format YYYY-MM-DDTHH-MM-SS.ffffff
TIME_LAST: Timestamp of the last packet in format YYYY-MM-DDTHH-MM-SS.ffffff
DURATION: Duration of the flow in seconds
BYTES: Number of transmitted bytes from client to server
BYTES_REV: Number of transmitted bytes from server to client
PACKETS: Number of packets transmitted from client to server
PACKETS_REV: Number of packets transmitted from server to client
PPI: Packet metadata sequence in the format: [[inter-packet times], [packet directions], [packet sizes]]
PPI_LEN: Number of packets in the PPI sequence
PPI_DURATION: Duration of the PPI sequence in seconds
PPI_ROUNDTRIPS: Number of roundtrips in the PPI sequence
PHIST_SRC_SIZES: Histogram of packet sizes from client to server
PHIST_DST_SIZES: Histogram of packet sizes from server to client
PHIST_SRC_IPT: Histogram of inter-packet times from client to server
PHIST_DST_IPT: Histogram of inter-packet times from server to client
APP: Web service label
CATEGORY: Service category
FLOW_ENDREASON_IDLE: Flow was terminated because it was idle
FLOW_ENDREASON_ACTIVE: Flow was terminated because it reached the active timeout
FLOW_ENDREASON_OTHER: Flow was terminated for other reasons
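
As an illustration of how the packet-level fields might be interpreted, the following minimal Python sketch parses a PPI value, counts changes in packet direction, and maps a packet size or inter-packet time to one of the eight histogram bins described above. It assumes that the PPI cell is stored as a JSON-compatible nested list; the function names and the example row are hypothetical and are not taken from the dataset or from the CESNET DataZoo library. The direction-change count follows the wording of the description and is not guaranteed to reproduce the published PPI_ROUNDTRIPS values exactly.

```python
import json
from bisect import bisect_left

# Inclusive upper bounds of the first seven bins listed in the description
# (0-15, 16-31, ..., 512-1024); values above 1024 fall into the eighth bin.
BIN_UPPER_BOUNDS = [15, 31, 63, 127, 255, 511, 1024]


def parse_ppi(ppi_text):
    """Split a PPI string into inter-packet times, directions, and sizes.

    Assumption: the CSV cell is a JSON-compatible nested list in the order
    [[inter-packet times], [packet directions], [packet sizes]].
    """
    ipt, directions, sizes = json.loads(ppi_text)
    if not (len(ipt) == len(directions) == len(sizes)):
        raise ValueError("PPI sub-sequences differ in length")
    return ipt, directions, sizes


def count_direction_changes(directions):
    """Count how often two consecutive packets travel in opposite directions."""
    return sum(1 for prev, cur in zip(directions, directions[1:]) if prev != cur)


def histogram_bin(value):
    """Return the 0-based bin index for a packet size (B) or inter-packet time (ms)."""
    return bisect_left(BIN_UPPER_BOUNDS, value)


# Made-up example value, not a real row from the dataset.
example_ppi = "[[0, 12, 3, 40], [1, -1, -1, 1], [1200, 1200, 350, 80]]"
ipt, directions, sizes = parse_ppi(example_ppi)

print("packets in sequence:", len(sizes))
print("direction changes:", count_direction_changes(directions))
print("size bins:", [histogram_bin(s) for s in sizes])
```

Keeping the bin boundaries in a single sorted list makes the lookup a binary search and keeps the inclusive upper bounds (for example, 1024 still belonging to the seventh bin) easy to check against the interval list.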

Link to other CESNET datasets

https://www.liberouter.org/technology-v2/tools-services-datasets/datasets/
https://github.com/CESNET/cesnet-datazoo

Please cite the original data article:

@article{CESNETQUIC22,
  author = {Jan Luxemburk and Karel Hynek and Tomáš Čejka and Andrej Lukačovič and Pavel Šiška},
  title = {CESNET-QUIC22: a large one-month QUIC network traffic dataset from backbone lines},
  journal = {Data in Brief},
  pages = {108888},
  year = {2023},
  issn = {2352-3409},
  doi = {https://doi.org/10.1016/j.dib.2023.108888},
  url = {https://www.sciencedirect.com/science/article/pii/S2352340923000069}
}