Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Packet Capture (PCAP) files of the UNSW-NB15 and CIC-IDS2017 datasets are processed and labelled using the corresponding CSV files. Each packet is labelled by comparing eight distinct features: *Source IP, Destination IP, Source Port, Destination Port, Starting time, Ending time, Protocol and Time to live*. The dataset has dimensions N×1504. All columns of the dataset are integers, so it can be used directly in machine learning models. Details of the whole processing and transformation pipeline are provided in the following GitHub repo:
https://github.com/Yasir-ali-farrukh/Payload-Byte
You can use the tool available at the above-mentioned GitHub repo to generate the labelled dataset from scratch. All details of the processing and transformation are provided in the following paper:
```bibtex
@article{Payload,
author = "Yasir Ali Farrukh and Irfan Khan and Syed Wali and David Bierbrauer and Nathaniel Bastian",
title = "{Payload-Byte: A Tool for Extracting and Labeling Packet Capture Files of Modern Network Intrusion Detection Datasets}",
year = "2022",
month = "9",
url = "https://www.techrxiv.org/articles/preprint/Payload-Byte_A_Tool_for_Extracting_and_Labeling_Packet_Capture_Files_of_Modern_Network_Intrusion_Detection_Datasets/20714221",
doi = "10.36227/techrxiv.20714221.v1"
}
```
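Because every column is an integer, the labelled CSVs can be dropped straight into a standard ML pipeline. The following Python sketch is a minimal illustration; the file name and the label column name are placeholders, not part of the dataset description, so adjust them to the files you actually download.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Placeholder file/column names; adapt to the downloaded files.
df = pd.read_csv("payload_byte_cicids2017.csv")
X = df.drop(columns=["label"])          # remaining integer feature columns
y = df["label"]                         # attack class or benign

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

clf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0)
clf.fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))
```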
This data set was originally downloaded from: https://www.unb.ca/cic/datasets/ids-2018.html
The data set is approximately 466 GB in size.
When the download is complete, it contains two folders: "Processed Traffic Data for ML Algorithms" and "Original Network Traffic and Log data".
The "Processed Traffic Data for ML Algorithms" folder contains 10 CSV files.
The "Original Network Traffic and Log data" folder contains 10 folders, each named after the corresponding CSV file; each of these folders in turn contains two subfolders, logs and pcap.
Here is the PCAP for Friday-02-03-2018.
This dataset was generated on a small-scale process automation scenario using MODBUS/TCP equipment, for research on the application of ML techniques to cybersecurity in Industrial Control Systems. The testbed emulates a CPS process controlled by a SCADA system using the MODBUS/TCP protocol. It consists of a liquid pump simulated by an electric motor controlled by a variable frequency drive (allowing for multiple rotor speeds), which is in turn controlled by a Programmable Logic Controller (PLC). The motor speed is determined by a set of predefined liquid temperature thresholds, whose measurement is provided by a MODBUS Remote Terminal Unit (RTU) with a temperature gauge, simulated by a potentiometer connected to an Arduino. The PLC communicates horizontally with the RTU, providing insight into how this type of communication may affect the overall system. The PLC also communicates with the Human-Machine Interface (HMI) controlling the system.
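Since the control traffic in this scenario runs over MODBUS/TCP, a minimal Python/scapy sketch such as the one below can isolate it from a capture by filtering on the standard Modbus TCP port 502. The file name is a placeholder.

```python
from scapy.all import rdpcap, TCP

packets = rdpcap("scada_capture.pcap")   # placeholder file name
modbus = [p for p in packets
          if TCP in p and 502 in (p[TCP].sport, p[TCP].dport)]
print(f"{len(modbus)} MODBUS/TCP packets out of {len(packets)} total")
```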
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Overview
This dataset is the result of a version-sensitive network traffic classification framework. The framework's goal is to distinguish between different application versions based on the observed network traffic. The framework is part of the author's master thesis. The source code and thesis paper are available on the author's GitHub.
Dataset
The dataset consists of PCAP files from different Kubernetes application versions. The dataset also contains fingerprint comparison files in PCAP and CSV format. Finally, the dataset contains classification results between different application versions. More details can be found in the thesis paper and source code.
Structure
The data is stored in the following format and hierarchy:
- Root folder
- There are multiple csv files that contain aggregated statistical information about the data, applications and results.
- There are subfolders for each recorded application. The folders are named `
- Each subfolder contains all the recorded data for each application version deployment in PCAP format. Each recording is named `
- Each subfolder also contains a config file that can be used to recapture the recorded data.
- Each subfolder also has a pod metadata file and an output csv file that contains a summary of the recorded PCAP files.
- Each subfolder also contains a subfolder named `fingerprint_comparison` that contains the fingerprint comparison results and the final classification results.
- The comparison results are stored in CSV and PCAP format. The PCAP files are mainly used for debugging, while the CSV files are used to generate the final `aggregated_results.csv` file, which is fed to the machine learning model.
- The final classification results are stored in the `prediction_results.csv` file.
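As an illustration of working with this layout, the following Python sketch lists the PCAP recordings per application subfolder and checks for the `fingerprint_comparison` directory. The root path is a placeholder and the layout is assumed to match the description above.

```python
from pathlib import Path

root = Path("dataset_root")              # placeholder: path to the extracted root folder
for app_dir in sorted(p for p in root.iterdir() if p.is_dir()):
    pcaps = list(app_dir.glob("*.pcap"))
    has_fp = (app_dir / "fingerprint_comparison").is_dir()
    print(f"{app_dir.name}: {len(pcaps)} recordings, "
          f"fingerprint_comparison: {'yes' if has_fp else 'no'}")
```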
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset includes PCAP files with DDoS and background traffic related to a host machine. The data can be used to test host-based DDoS detection solutions. The DDoS traffic has been generated using the following tool: https://github.com/ricardojoserf/ddos simulation/tree/master
Each archive includes a different number of malicious IP addresses, specified in the .zip file name.
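As a minimal starting point for host-based detection experiments on these captures, the following Python/scapy sketch counts packets per source IP; the file name is a placeholder.

```python
from collections import Counter
from scapy.all import IP, rdpcap

packets = rdpcap("ddos_capture.pcap")    # placeholder file name
per_source = Counter(p[IP].src for p in packets if IP in p)
for src, count in per_source.most_common(10):
    print(f"{src}: {count} packets")
```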
If you use this dataset, please credit us by citing our paper:
M. Zang, F. De Iaco, J. Wu, M. Savi, In-Kernel Traffic Sketching for Volumetric DDoS Detection, in IEEE International Conference on Communications (ICC), Jun. 2025
If using LaTeX, you can use the following BibTeX:

```bibtex
@inproceedings{zang2025ebpfsketching,
  title={In-Kernel Traffic Sketching for Volumetric DDoS Detection},
  author={Zang, Mingyuan and De-Iaco, Federico and Wu, Jie and Savi, Marco},
  booktitle={IEEE International Conference on Communications (ICC)},
  year={2025},
}
```
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This package contains anonymized packets of 802.11 probe requests captured throughout March 2023 at Universitat Jaume I. The packet capture file is in the standardized *.pcap binary format and can be opened with any packet analysis tool such as Wireshark or scapy (a Python packet analysis and manipulation package).
The dataset can be used for the analysis of Wi-Fi probe requests, presence detection, occupancy estimation, or signal stability analysis.
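As the description notes, the capture can be opened with scapy; the following minimal sketch counts probe requests per (anonymized) transmitter MAC. The file name is a placeholder.

```python
from collections import Counter
from scapy.all import Dot11, Dot11ProbeReq, rdpcap

packets = rdpcap("probe_requests.pcap")  # placeholder file name
macs = Counter(p[Dot11].addr2 for p in packets if p.haslayer(Dot11ProbeReq))
print(f"{sum(macs.values())} probe requests from {len(macs)} distinct MACs")
```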
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This archive contains the datasets used for the experiments in the paper "Non-cooperative 802.11 MAC layer fingerprinting and tracking of mobile devices", namely:
Glimps 2015 dataset (mac_info collection): A collection of 122,989 Probe Request frames captured by 8 monitoring stations at the Glimps music festival in Ghent, Belgium (10 - 12 December 2015). To minimize overhead, each monitoring station individually stored only one Probe Request per unique MAC. The dataset was used to show that the high entropy in Probe Requests can be used to deanonymize devices that use MAC address randomization. Only the source MAC and Information Elements (IEs) were captured for this purpose.
Research center 2016 dataset (mac_research collection): A complete collection of all management and control frames (including Radiotap headers) observed at our research lab from 28 January to 8 February 2016. This dataset was used to calculate the "stability" and "variability" of Probe Request IEs (see our paper for more details on these metrics).
Transmission rate datasets (mac_research_0 - mac_research_4 collections): Observations of mobile devices when actively instigated for extra transmissions. These observations were used in the paper to calculate the effectiveness of the various stimulus frame techniques. This dataset should only be used to verify the results in the paper. The other datasets could be used for related experiments.
All datasets were anonymized by applying the following rules:
The 3 least significant bytes of each MAC address were uniquely and consistently mapped to a different value, with exception of "ff:ff:ff" and "00:00:00".
The SSID IE has its SSID field replaced with the string "Hidden", with exception of the wildcard (empty) SSID.
The Vendor Specific WPS IE was replaced with a hash of its payload given the amount of sensitive information (device serial / model number, UUID, etc.) contained within it, and the length of the IE was updated accordingly. Unfortunately, Wireshark stops parsing the remainder of Probes containing this anonymized IE, so it should be noted that further parsing beyond the WPS IE must be done manually (e.g. by using Scapy or by changing the Wireshark dissector).
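A minimal Python/scapy sketch of the manual parsing mentioned in the last rule, walking the Information Element chain of each Probe Request rather than relying on Wireshark's dissector, could look like this (the file name is a placeholder):

```python
from scapy.all import Dot11Elt, Dot11ProbeReq, rdpcap

packets = rdpcap("mac_info.pcap")        # placeholder file name
for pkt in packets:
    if not pkt.haslayer(Dot11ProbeReq):
        continue
    ie_ids = []
    elt = pkt.getlayer(Dot11Elt)
    while isinstance(elt, Dot11Elt):     # walk the IE chain manually
        ie_ids.append(elt.ID)            # e.g. 0 = SSID, 221 = Vendor Specific (WPS)
        elt = elt.payload
    print(ie_ids)
```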
The datasets are provided as MongoDB collections with the following document format:
_id: ObjectID of the document
info_length: Length of the binary blob
info: Binary blob of the Radiotap frame (mac_research) or only the Information Elements (mac_info)
mac_addr: Transmitter of the frame
To install the dataset, execute the command "mongorestore --gzip -d anonymized ./anonymized" after extracting the .tar.xz file.
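After restoring, the collections can be queried with pymongo following the document format above; the sketch below assumes a local MongoDB instance and uses the mac_info collection.

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")   # assumes a local MongoDB instance
coll = client["anonymized"]["mac_info"]

print("documents:", coll.count_documents({}))
print("distinct MACs:", len(coll.distinct("mac_addr")))

doc = coll.find_one()
print("first document: info_length =", doc["info_length"],
      ", blob bytes =", len(doc["info"]))           # IEs (mac_info) or Radiotap frame (mac_research)
```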
A .pcap format of the mac_info (wrapped in a dummy Radiotap frame) and mac_research datasets is additionally provided at crawdad.org.
The mac_info dataset can be visually explored on https://wicability.net/datasets (Glimps 2015 dataset).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
NOTICE: The data file 1_chrome_ffmuc.pcap is corrupted and is not used in our articles that work with this dataset.
Dataset of DNS over HTTPS traffic from Chrome (CloudFlare, FFMuc, Google, Hostux, OpenDNS, Quad9, Switch)
The dataset contains DoH and HTTPS traffic that was captured in a controlled environment and generated automatically by the Chrome browser with DoH enabled towards 7 different DoH servers (CloudFlare, FFMuc, Google, Hostux, OpenDNS, Quad9, Switch), together with web page loads towards a sample of web pages taken from the Majestic Million dataset. The data are provided in the form of PCAP files. We also provide TLS-enriched flow data generated with the open-source ipfixprobe flow exporter. Information other than TLS-related fields is not relevant, since the dataset comprises only encrypted TLS traffic. The TLS-enriched flow data are provided in the form of CSV files with the following columns:
Column Name | Column Description |
---|---|
DST_IP | Destination IP address |
SRC_IP | Source IP address |
BYTES | The number of transmitted bytes from Source to Destination |
BYTES_REV | The number of transmitted bytes from Destination to Source |
TIME_FIRST | Timestamp of the first packet in the flow in format YYYY-MM-DDTHH-MM-SS |
TIME_LAST | Timestamp of the last packet in the flow in format YYYY-MM-DDTHH-MM-SS |
PACKETS | The number of packets transmitted from Source to Destination |
PACKETS_REV | The number of packets transmitted from Destination to Source |
DST_PORT | Destination port |
SRC_PORT | Source port |
PROTOCOL | The transport protocol number |
TCP_FLAGS | Logical OR across all TCP flags in the packets transmitted from Source to Destination |
TCP_FLAGS_REV | Logical OR across all TCP flags in the packets transmitted from Destination to Source |
TLS_ALPN | The value of the Application-Layer Protocol Negotiation extension sent by the server |
TLS_JA3 | The JA3 fingerprint |
TLS_SNI | The value of the Server Name Indication extension sent by the client |
The DoH resolvers in the dataset can be identified by the IP addresses listed in the doh_resolver_ip.csv file.
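As a minimal illustration, the following pandas sketch labels flows as DoH by matching DST_IP against that resolver list and compares traffic volumes. The flow CSV file name is a placeholder, and the assumption that resolver IPs sit in the first column of doh_resolver_ip.csv should be checked against the actual file.

```python
import pandas as pd

flows = pd.read_csv("chrome_cloudflare.csv")        # placeholder: one CSV from tls-flow-csv/chrome
resolvers = pd.read_csv("doh_resolver_ip.csv")

doh_ips = set(resolvers.iloc[:, 0].astype(str))     # assumes resolver IPs are in the first column
flows["is_doh"] = flows["DST_IP"].isin(doh_ips)

print(flows.groupby("is_doh")[["BYTES", "BYTES_REV", "PACKETS", "PACKETS_REV"]].sum())
```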
The main part of the dataset is located in DoH-Gen-C-CFGHOQS.tar.gz and has the following structure:
.
└─── data | - Main directory with data
└── generated | - Directory with generated captures
├── pcap | - Generated PCAPs
│ └── chrome
└── tls-flow-csv | - Generated CSV flow data
└── chrome
Total stats of generated data:
Name | Value |
---|---|
Total Data Size | 41.5 GB |
Total files | 14 |
DoH extracted tls flows | ~41 K |
Non-DoH extracted tls flows | ~284 K |
DoH Server information
EPA's Priority Climate Action Plan (PCAP) Directory organizes data collected from 211 PCAPs submitted by states, Metropolitan Statistical Areas (MSAs), Tribes, and territories under EPA's Climate Pollution Reduction Grants (CPRG) program. PCAPs are a compilation of each jurisdiction's identified priority actions (or measures) to reduce greenhouse gas (GHG) emissions. The directory presents information from more than 30 data categories related to GHG inventories, GHG reduction measures, benefits for low-income and disadvantaged communities (LIDACs), and other PCAP elements. Archived from https://www.epa.gov/inflation-reduction-act/priority-climate-action-plan-directory

This archive contains raw input data for the Public Utility Data Liberation (PUDL) software developed by Catalyst Cooperative. It is organized into Frictionless Data Packages. For additional information about this data and PUDL, see the following resources: the PUDL Repository on GitHub, the PUDL Documentation, and other Catalyst Cooperative data archives.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Note: If you use this dataset, please cite the following paper:
Brenner, B., Fabini, J., Offermanns, M., Semper, S., & Zseby, T. (2024). Malware communication in smart factories: A network traffic data set. Computer Networks, 255, 110804.
or in BibTeX:
```bibtex
@article{brenner2024malware,
title={Malware communication in smart factories: A network traffic data set},
author={Brenner, Bernhard and Fabini, Joachim and Offermanns, Magnus and Semper, Sabrina and Zseby, Tanja},
journal={Computer Networks},
volume={255},
pages={110804},
year={2024},
publisher={Elsevier}
}
```
Machine learning-based intrusion detection requires suitable and realistic data sets for training and testing. However, data sets that originate from real networks are rare. Network data is considered privacy-sensitive, and the purposeful introduction of malicious traffic is usually not possible.
In this paper, we introduce a labeled data set captured at a smart factory located in Vienna, Austria, during normal operation and during penetration tests with different attack types. The data set contains 173 GB of PCAP files, representing 16 days (395 hours) of factory operation. It includes MQTT, OPC UA, and Modbus/TCP traffic.
The captured malicious traffic originated from a professional penetration tester who performed two types of attacks:
(a) Aggressive attacks that are easier to detect.
(b) Stealthy attacks that are harder to detect.
Our data set includes the raw PCAP files and extracted flow data. Labels for packets and flows indicate whether they originated from a specific attack or from benign communication.
We describe the methodology for creating the dataset, conduct an analysis of the data, and provide detailed information about the recorded traffic itself. The dataset is freely available to support reproducible research and the comparability of results in the area of intrusion detection in industrial networks.
This dataset contains data from the UCSD Network Telescope for three days between November 2008
and January 2009, exactly one month apart. The first day (2008-11-21) covers the onset of the
Conficker A infection. On the second day, 2008-12-21, only Conficker A was active; and during
the third and final day (2009-01-21) both Conficker A and B were active.
The dataset consists of 68 compressed pcap files each containing one
hour of traffic observed by the Network Telescope.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset comprises network traffic collected from 24 Internet of Things (IoT) devices over a span of 119 days, capturing a total of over 110 million packets. The devices represent 19 distinct types and were monitored in a controlled environment under normal operating conditions, reflecting a variety of functions and behaviors typical of consumer IoT products (pcapIoT). The packet capture (pcap) files preserve complete packet information across all protocol layers, including ARP, TCP, HTTP, and various application-layer protocols. Raw pcap files (pcapFull) are also provided, which contain traffic from 36 non-IoT devices present in the network. To facilitate device-specific analysis, a CSV file is included that maps each IoT device to its unique MAC address. This mapping simplifies the identification and filtering of packets belonging to each device within the pcap files. Three additional CSV files (CSVs) provide metadata about the states that the devices were in at different times. Additionally, Python scripts (Scripts) are provided to assist in extracting and processing packets. These scripts include functionalities such as packet filtering based on MAC addresses and protocol-specific data extraction, serving as practical examples for data manipulation and analysis techniques. This dataset is valuable for researchers interested in network behavior analysis, anomaly detection, and the development of IoT-specific network policies. It enables the study and differentiation of network behaviors based on device functions and supports behavior-based profiling to identify irregular activities or potential security threats.
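As a minimal illustration of the kind of MAC-based filtering the included Scripts perform, the following Python/scapy sketch keeps only packets to or from the mapped IoT devices. The file names and the CSV column name ("mac") are assumptions, not the dataset's actual names.

```python
import pandas as pd
from scapy.all import Ether, rdpcap, wrpcap

mapping = pd.read_csv("device_mac_mapping.csv")     # placeholder file name; "mac" column assumed
target_macs = set(mapping["mac"].str.lower())

packets = rdpcap("iot_capture.pcap")                # placeholder capture file
iot_only = [p for p in packets if Ether in p and
            (p[Ether].src.lower() in target_macs or p[Ether].dst.lower() in target_macs)]
wrpcap("iot_only.pcap", iot_only)
print(f"kept {len(iot_only)} of {len(packets)} packets")
```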
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The raw traffic data contains 4 files, including 2 compressed files and 2 CSV files:
1. Anonymized_bras_dataset.rar contains raw traffic data (PCAP format) captured from BRAS network units, covering the 7 business application categories described in the article, with a total of 23 data files and a size of 5.36 GB.
2. Anonymized_onu_dataset.rar contains raw traffic data (PCAP format) captured from ONU network units, covering the 7 business application categories described in the article, with a total of 41 data files and a size of 5.05 GB.
3. Bras_features.csv is a feature file containing features extracted from the PCAP files obtained from BRAS network units using the methods introduced in the article.
4. Onu_features.csv is a feature file containing features extracted from the PCAP files obtained from ONU network units using the methods introduced in the article.
Campus DNS network traffic from more than 4,000 active users (at peak load hours), captured on 10 random days in April-May 2016, is available as hourly PCAP files in the dataset. (At present, only traffic for Day0 (full) and Day1 (partial) could be uploaded due to the 10 GB data limit.)
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
1. Introduction
In the digital era of the Industrial Internet of Things (IIoT), conventional Critical Infrastructures (CIs) are transformed into smart environments with multiple benefits, such as pervasive control, self-monitoring and self-healing. However, this evolution is characterised by several cyberthreats due to the necessary presence of insecure technologies. DNP3 is an industrial communication protocol which is widely adopted in the CIs of the US. In particular, DNP3 allows remote communication between Industrial Control Systems (ICS) and Supervisory Control and Data Acquisition (SCADA) systems. It can support various topologies, such as Master-Slave, Multi-Drop, Hierarchical and Multiple-Server. Originally, the architectural model of DNP3 consisted of three layers: (a) Application Layer, (b) Transport Layer and (c) Data Link Layer. DNP3 can now also be incorporated into the Transmission Control Protocol/Internet Protocol (TCP/IP) stack as an application-layer protocol. However, similarly to other industrial protocols (e.g., Modbus and IEC 60870-5-104), DNP3 is characterised by severe security issues since it does not include any authentication or authorisation mechanisms. More information about the DNP3 security issues is provided in [1-3].

This dataset contains labelled Transmission Control Protocol (TCP) / Internet Protocol (IP) network flow statistics (Comma-Separated Values - CSV format) and DNP3 flow statistics (CSV format) related to 9 DNP3 cyberattacks. These cyberattacks focus on DNP3 unauthorised commands and Denial of Service (DoS). The network traffic data are provided through Packet Capture (PCAP) files. Consequently, this dataset can be used to implement Artificial Intelligence (AI)-powered Intrusion Detection and Prevention Systems (IDPS) that rely on Machine Learning (ML) and Deep Learning (DL) techniques.
2. Instructions
This DNP3 Intrusion Detection Dataset was implemented following the methodological frameworks of A. Gharib et al. in [4] and S. Dadkhah et al in [5], including eleven features: (a) Complete Network Configuration, (b) Complete Traffic, (c) Labelled Dataset, (d) Complete Interaction, (e) Complete Capture, (f) Available Protocols, (g) Attack Diversity, (h) Heterogeneity, (i) Feature Set and (j) Metadata.
A network topology consisting of (a) eight industrial entities, (b) one Human-Machine Interface (HMI) and (c) three cyberattackers was used to implement this DNP3 Intrusion Detection Dataset. The implemented cyberattacks are listed in Table 1 (Section 4).
The aforementioned DNP3 cyberattacks were executed utilising penetration testing tools such as Nmap and Scapy. For each cyberattack, a folder is provided containing (a) the pcap files for each entity, (b) the Transmission Control Protocol (TCP)/Internet Protocol (IP) network flow statistics for a 120-second timeout in CSV format and (c) the DNP3 flow statistics for each entity (using different timeout values of 45, 60, 75, 90, 120 and 240 seconds). The TCP/IP network flow statistics were produced using CICFlowMeter, while the DNP3 flow statistics were generated by a Custom DNP3 Python Parser, taking full advantage of Scapy.
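As a minimal illustration of inspecting the per-entity captures, the following Python/scapy sketch filters DNP3 traffic by its standard TCP port 20000. The file name is a placeholder, and the dataset's Custom DNP3 Python Parser goes much further by decoding the DNP3 layers themselves.

```python
from scapy.all import TCP, rdpcap

packets = rdpcap("outstation1.pcap")                # placeholder file name
dnp3 = [p for p in packets
        if TCP in p and 20000 in (p[TCP].sport, p[TCP].dport)]
print(f"{len(dnp3)} DNP3-over-TCP packets out of {len(packets)} total")
```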
3. Dataset Structure
The dataset consists of the following folders:
Each folder includes respective subfolders related to the entities/devices (described in the following section) participating in each attack. In particular, for each entity/device, there is a folder including (a) the DNP3 network traffic (pcap file) related to this entity/device during each attack, (b) the TCP/IP network flow statistics (CSV file) generated by CICFlowMeter for the timeout value of 120 seconds and finally (c) the DNP3 flow statistics (CSV file) from the Custom DNP3 Python Parser. Finally, it is noteworthy that the network flows from both CICFlowMeter and Custom DNP3 Python Parser in each CSV file are labelled based on the DNP3 cyberattacks executed for the generation of this dataset. The description of these attacks is provided in the following section, while the various features from CICFlowMeter and Custom DNP3 Python Parser are presented in Section 5.
4. Testbed & DNP3 Attacks
The following figure shows the testbed utilised for the generation of this dataset. It is composed of eight industrial entities that play the role of the DNP3 outstations/slaves, such as Remote Terminal Units (RTUs) and Intelligent Electronic Devices (IEDs). Moreover, there is another workstation which plays the role of the master station, such as a Master Terminal Unit (MTU). For the communication between the DNP3 outstations/slaves and the master station, opendnp3 was used.
Table 1: DNP3 Attacks Description

| DNP3 Attack | Description | Dataset Folder |
|---|---|---|
| DNP3 Disable Unsolicited Message Attack | This attack targets a DNP3 outstation/slave, establishing a connection with it while acting as a master station. The false master then transmits a packet with DNP3 Function Code 21, which requests disabling all unsolicited messages on the target. | 20200514_DNP3_Disable_Unsolicited_Messages_Attack |
| DNP3 Cold Restart Attack | The malicious entity acts as a master station and sends a DNP3 packet that includes the "Cold Restart" function code. When the target receives this message, it initiates a complete restart and sends back a reply with the time window before the restart process. | 20200515_DNP3_Cold_Restart_Attack |
| DNP3 Warm Restart Attack | This attack is quite similar to the "Cold Restart" attack but aims to trigger a partial restart, re-initiating a DNP3 service on the target outstation. | 20200515_DNP3_Warm_Restart_Attack |
| DNP3 Enumerate Attack | This reconnaissance attack aims to discover which DNP3 services and function codes are used by the target system. | 20200516_DNP3_Enumerate |
| DNP3 Info Attack | This attack constitutes another reconnaissance attempt, aggregating various DNP3 diagnostic information related to DNP3 usage. | 20200516_DNP3_Ιnfo |
| Data Initialisation Attack | This cyberattack is related to Function Code 15 (Initialize Data). It is an unauthorised access attack which demands that the slave re-initialise possible configurations to their initial values, thus changing potential values defined by legitimate masters. | 20200518_Initialize_Data_Attack |
| MITM-DoS Attack | In | |
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A recorded GOOSE data set as described in
Sven Zemanek, Immanuel Hacker, Konrad Wolsing, Eric Wagner, Martin Henze, and Martin Serror. 2022. PowerDuck: A GOOSE Data Set of Cyberattacks in Substations. In Cyber Security Experimentation and Test Workshop (CSET ’22), August 8, 2022, Virtual, CA, USA. ACM, New York, NY, USA, 5 pages. https://doi.org/10.1145/3546096.3546102
The data set contains network traces of GOOSE communication recorded in a physical substation testbed. Further, it includes recordings of various scenarios with and without the presence of attacks. All network packets originating from the attacker are clearly labeled as such to facilitate their identification using the Industrial Protocol Abstraction Layer (IPAL) format. We thus envision PowerDuck improving and complementing existing data sets of substations, which are often generated synthetically, and thus aim to enhance the security of power grids.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The numbers are the average of 10-fold cross-validation. By optimizing PCAP90, we can generally achieve better PCAP90 values (and/or better precision) compared to optimizing F1-scores. In all cases, by optimizing PCAP90, we can achieve the desired precision of 90%. However, by optimizing F1-scores, this cannot always be achieved (colored red).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset is a set of network traffic traces in pcap/csv format captured from a single user. The traffic is classified into 5 different activities (Video, Bulk, Idle, Web, and Interactive) and the label is shown in the filename. There is also a file (mapping.csv) with the mapping of the host's IP address, the csv/pcap filename and the activity label.
Activities:
- Interactive: applications that perform real-time interactions in order to provide a suitable user experience, such as editing a file in Google Docs and remote CLI sessions over SSH.
- Bulk data transfer: applications that transfer large data volumes over the network. Some examples are SCP/FTP applications and direct downloads of large files from web servers such as Mediafire, Dropbox or the university repository, among others.
- Web browsing: contains all the traffic generated while searching and consuming different web pages. Examples of those pages are several blogs, news sites and the university's Moodle.
- Video playback: contains traffic from applications that consume video in streaming or pseudo-streaming. The best-known services used are Twitch and YouTube, but the university online classroom has also been used.
- Idle behaviour: composed of the background traffic generated by the user's computer when the user is idle. This traffic has been captured with every application closed and with some opened pages like Google Docs, YouTube and several web pages, but always without user interaction.
The capture is performed by a network probe attached, via a SPAN port, to the router that forwards the user's network traffic. The traffic is stored in pcap format with the full packet payload. In the CSV files, every non-TCP/UDP packet is filtered out, as well as every packet with no payload. The fields in the CSV files are the following (one line per packet): timestamp, protocol, payload size, source and destination IP address, and source and destination UDP/TCP port. The fields are also included as a header in every CSV file.
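As a minimal illustration of working with these per-packet CSVs, the following pandas sketch computes simple volume statistics for one trace. The file name and the exact header spellings are assumptions, since the description lists the fields but not their headers.

```python
import pandas as pd

df = pd.read_csv("web_trace_01.csv")                # placeholder file name
payload = df["payload_size"]                        # assumed header for the payload size field

print("packets:", len(df))
print("total payload bytes:", payload.sum())
print("mean payload bytes:", round(payload.mean(), 1))
# assumes the timestamp field is a numeric epoch value
print("trace duration (s):", df["timestamp"].max() - df["timestamp"].min())
```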
The amount of data is stated as follows:
- Bulk: 19 traces, 3599 s of total duration, 8704 MBytes of pcap files
- Video: 23 traces, 4496 s, 1405 MBytes
- Web: 23 traces, 4203 s, 148 MBytes
- Interactive: 42 traces, 8934 s, 30.5 MBytes
- Idle: 52 traces, 6341 s, 0.69 MBytes
The code of our machine learning approach is also included. There is a README.txt file with the documentation of how to use the code.
This dataset was created as supplementary material for the research article "Influence of Measured Radio Environment Map Interpolation on Indoor Positioning Algorithms". This package contains packet capture files of 802.11 probe requests captured at the Geotec office at University Jaume I, Spain by 5 ESP32 microcontrollers. The packet capture files are in the standardized *.pcap binary format and can be opened with any packet analysis tool such as Wireshark or scapy (a Python packet analysis and manipulation package). The data are split between radio map data captured at all accessible reference positions in our office, spread on a 1 m grid, and evaluation data gathered aligned to a 0.5 m grid as well as in hard-to-access locations. The locations at which the data were collected are all within the office. The dataset has 4 parts, and all subsets of the dataset can be generated from the captured pcap files:

Data
This folder contains pcap files from all 5 ESP32 stations representing the whole radio environment map. The folder name stands for each of the 5 ESP32 sniffer stations, and the name of each file points to the reference location the data were captured in. Examples of the coordinates matching the reference location grid names are in the following table:

| Data Point | X | Y | Data Point | X | Y | ... |
|---|---|---|---|---|---|---|
| A1 | 0.85 | 0.1 | B1 | 1.85 | 0.1 | ... |
| A2 | 0.85 | 1.1 | B2 | 1.85 | 1.1 | ... |
| A3 | 0.85 | 2.1 | B3 | 1.85 | 2.1 | ... |
| ... | ... | ... | ... | ... | ... | ... |
| A11 | 0.85 | 10.1 | B11 | 1.85 | 10.1 | ... |

Data_Eval
This folder contains pcap files from all 5 ESP32 stations with data captured at 31 locations not found in the original reference location grid. The naming corresponds to the X and Y location at which the data were collected.

Processed_Data
Additionally, there are 3 folders with processed CSV files: one folder that combines all radio map values, a second folder containing combined evaluation values, and a third with linearly interpolated radio map values. The CSV files are in the format: X, Y, RSSI_1, RSSI_2, RSSI_3, RSSI_4, RSSI_5
Data_Scenarios
For ease of use, this folder contains data for exact reproducibility of our results in the paper. There are 14 scenarios, described in the following table:

Scenario Descriptions
| Data Name | Scenario Description |
|---|---|
| GPR00 | Only measured data, 50 samples per reference position |
| GPR01 | Measured data with empty spots filled using linear interpolation, 50 samples per reference position |
| GPR02 | Gaussian Regression trained only on measured data - 1 m output grid, 50 samples per reference position |
| GPR03 | Gaussian Regression trained only on measured data - 0.5 m output grid, 50 samples per reference position |
| GPR04 | Gaussian Regression trained on linearly interpolated data - 1 m output grid, 50 samples per reference position |
| GPR05 | Gaussian Regression trained on linearly interpolated data - 0.5 m output grid, 50 samples per reference position |
| GPR06 | Gaussian Regression trained on a selection of linearly interpolated data - 1 m output grid, 50 samples per reference position |
| GPR07 | Gaussian Regression trained on a selection of linearly interpolated data - 0.5 m output grid, 50 samples per reference position |
| GPR08 | Gaussian Regression trained only on measured data - 1 m output grid, 1 sample per reference position |
| GPR09 | Gaussian Regression trained only on measured data - 0.5 m output grid, 1 sample per reference position |
| GPR10 | Gaussian Regression trained on linearly interpolated data - 1 m output grid, 1 sample per reference position |
| GPR11 | Gaussian Regression trained on linearly interpolated data - 0.5 m output grid, 1 sample per reference position |
| GPR12 | Gaussian Regression trained on a selection of linearly interpolated data - 1 m output grid, 1 sample per reference position |
| GPR13 | Gaussian Regression trained on a selection of linearly interpolated data - 0.5 m output grid, 1 sample per reference position |

The folder contains 4 files for each scenario. The beginning of the filename corresponds to the data name, with a suffix describing what data are in the file. The descriptions of the used suffixes are in the following table:

File Suffix Descriptions
| Suffix | Suffix Description |
|---|---|
| _trncrd | Training Labels |
| _trnrss | Training RSSI Values |
| _tstcrd | Evaluation Labels |
| _tstrss | Evaluation RSSI Values |

These data are in a format compatible with systems that, apart from X and Y coordinates, also detect building, floor, etc. The RSSI data are in the format: RSSI_1, RSSI_2, RSSI_3, RSSI_4, RSSI_5. The labels are in the format: X, Y, 0, 0, 0 (since we only use positioning in one office, the coordinates apart from X and Y are set to 0).
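As a minimal illustration of how the scenario files can be consumed, the following Python sketch fits a k-nearest-neighbour position estimator on one scenario. The exact file names (data name plus suffix with a .csv extension) and the absence of a header row are assumptions.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Placeholder file names: data name "GPR00" plus the documented suffixes.
trn_rss = np.loadtxt("GPR00_trnrss.csv", delimiter=",")
trn_crd = np.loadtxt("GPR00_trncrd.csv", delimiter=",")[:, :2]   # keep only X, Y
tst_rss = np.loadtxt("GPR00_tstrss.csv", delimiter=",")
tst_crd = np.loadtxt("GPR00_tstcrd.csv", delimiter=",")[:, :2]

knn = KNeighborsRegressor(n_neighbors=3).fit(trn_rss, trn_crd)
errors = np.linalg.norm(knn.predict(tst_rss) - tst_crd, axis=1)
print("mean 2D positioning error (m):", errors.mean())
```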
https://www.datainsightsmarket.com/privacy-policy
The global PCAP multi-touch industrial monitor market is experiencing robust growth, driven by increasing automation across various industries and the rising demand for intuitive human-machine interfaces (HMIs). The market, estimated at $500 million in 2025, is projected to exhibit a Compound Annual Growth Rate (CAGR) of 12% from 2025 to 2033, reaching approximately $1.5 billion by 2033. This expansion is fueled by several key factors. The burgeoning adoption of Industry 4.0 technologies, including smart factories and digital twins, necessitates advanced HMI solutions offering seamless interaction and data visualization. Furthermore, the growing preference for touch-enabled interfaces over traditional button-based controls is contributing to market growth. The automotive, manufacturing, and healthcare sectors are significant drivers, with their increasing reliance on sophisticated control systems and data monitoring. However, the market faces challenges such as high initial investment costs for implementing PCAP technology and concerns about durability and maintenance in harsh industrial environments. Nevertheless, advancements in ruggedized display technologies and the development of cost-effective solutions are expected to mitigate these restraints and further propel market expansion.

The market is segmented based on screen size, resolution, and application. Large-format monitors are gaining popularity for their enhanced visualization capabilities. Higher resolutions are also in demand to improve data clarity and usability. Key players in this competitive landscape include STX Technology, Beckhoff Automation, Siemens, Cincoze, Winmate, Axiomtek, Teguar Computers, Advantech, AAEON, B&R Industrial Automation, Contec, ADLINK Technology, DFI, Kontron, and TRU-Vu. These companies are focusing on product innovation, strategic partnerships, and geographical expansion to maintain a competitive edge.

Regional analysis indicates strong growth potential in North America and Asia-Pacific, driven by robust industrial automation adoption and technological advancements. Europe and other regions are also experiencing steady growth, although at a slightly slower pace compared to the leading regions. The forecast period (2025-2033) presents promising opportunities for market players to capitalize on the expanding demand for sophisticated and user-friendly industrial HMI solutions.