42 datasets found

Network traffic datasets created by Single Flow Time Series Analysis
data.niaid.nih.gov
zenodo.org
Updated Jul 11, 2024
Cite
Josef Koumar (2024). Network traffic datasets created by Single Flow Time Series Analysis [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8035723
Dataset updated
Jul 11, 2024
Dataset provided by
Tomáš Čejka
Karel Hynek
Josef Koumar
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Network traffic datasets created by Single Flow Time Series Analysis

Datasets were created for the paper: Network Traffic Classification based on Single Flow Time Series Analysis -- Josef Koumar, Karel Hynek, Tomáš Čejka -- which was published at The 19th International Conference on Network and Service Management (CNSM) 2023. Please cite usage of our datasets as:

J. Koumar, K. Hynek and T. Čejka, "Network Traffic Classification Based on Single Flow Time Series Analysis," 2023 19th International Conference on Network and Service Management (CNSM), Niagara Falls, ON, Canada, 2023, pp. 1-7, doi: 10.23919/CNSM59352.2023.10327876.

This Zenodo repository contains 23 datasets created from 15 well-known published datasets, which are cited in the table below. Each dataset contains 69 features created by Time Series Analysis of Single Flow Time Series. A detailed description of the features is in the file feature_description.pdf.

The following table describes each dataset file:

File name | Detection problem | Citation of original raw dataset
botnet_binary.csv | Binary detection of botnet | S. García et al. An Empirical Comparison of Botnet Detection Methods. Computers & Security, 45:100–123, 2014.
botnet_multiclass.csv | Multi-class classification of botnet | S. García et al. An Empirical Comparison of Botnet Detection Methods. Computers & Security, 45:100–123, 2014.
cryptomining_design.csv | Binary detection of cryptomining; the design part | Richard Plný et al. Datasets of Cryptomining Communication. Zenodo, October 2022
cryptomining_evaluation.csv | Binary detection of cryptomining; the evaluation part | Richard Plný et al. Datasets of Cryptomining Communication. Zenodo, October 2022
dns_malware.csv | Binary detection of malware DNS | Samaneh Mahdavifar et al. Classifying Malicious Domains using DNS Traffic Analysis. In DASC/PiCom/CBDCom/CyberSciTech 2021, pages 60–67. IEEE, 2021.
doh_cic.csv | Binary detection of DoH | Mohammadreza MontazeriShatoori et al. Detection of doh tunnels using time-series classification of encrypted traffic. In DASC/PiCom/CBDCom/CyberSciTech 2020, pages 63–70. IEEE, 2020
doh_real_world.csv | Binary detection of DoH | Kamil Jeřábek et al. Collection of datasets with DNS over HTTPS traffic. Data in Brief, 42:108310, 2022
dos.csv | Binary detection of DoS | Nickolaos Koroniotis et al. Towards the development of realistic botnet dataset in the Internet of Things for network forensic analytics: Bot-IoT dataset. Future Gener. Comput. Syst., 100:779–796, 2019.
edge_iiot_binary.csv | Binary detection of IoT malware | Mohamed Amine Ferrag et al. Edge-iiotset: A new comprehensive realistic cyber security dataset of iot and iiot applications: Centralized and federated learning, 2022.
edge_iiot_multiclass.csv | Multi-class classification of IoT malware | Mohamed Amine Ferrag et al. Edge-iiotset: A new comprehensive realistic cyber security dataset of iot and iiot applications: Centralized and federated learning, 2022.
https_brute_force.csv | Binary detection of HTTPS Brute Force | Jan Luxemburk et al. HTTPS Brute-force dataset with extended network flows, November 2020
ids_cic_binary.csv | Binary detection of intrusion in IDS | Iman Sharafaldin et al. Toward generating a new intrusion detection dataset and intrusion traffic characterization. ICISSp, 1:108–116, 2018.
ids_cic_multiclass.csv | Multi-class classification of intrusion in IDS | Iman Sharafaldin et al. Toward generating a new intrusion detection dataset and intrusion traffic characterization. ICISSp, 1:108–116, 2018.
ids_unsw_nb_15_binary.csv | Binary detection of intrusion in IDS | Nour Moustafa and Jill Slay. Unsw-nb15: a comprehensive data set for network intrusion detection systems (unsw-nb15 network data set). In 2015 military communications and information systems conference (MilCIS), pages 1–6. IEEE, 2015.
ids_unsw_nb_15_multiclass.csv | Multi-class classification of intrusion in IDS | Nour Moustafa and Jill Slay. Unsw-nb15: a comprehensive data set for network intrusion detection systems (unsw-nb15 network data set). In 2015 military communications and information systems conference (MilCIS), pages 1–6. IEEE, 2015.
iot_23.csv | Binary detection of IoT malware | Sebastian Garcia et al. IoT-23: A labeled dataset with malicious and benign IoT network traffic, January 2020. More details here https://www.stratosphereips.org/datasets-iot23
ton_iot_binary.csv | Binary detection of IoT malware | Nour Moustafa. A new distributed architecture for evaluating ai-based security systems at the edge: Network ton iot datasets. Sustainable Cities and Society, 72:102994, 2021
ton_iot_multiclass.csv | Multi-class classification of IoT malware | Nour Moustafa. A new distributed architecture for evaluating ai-based security systems at the edge: Network ton iot datasets. Sustainable Cities and Society, 72:102994, 2021
tor_binary.csv | Binary detection of TOR | Arash Habibi Lashkari et al. Characterization of Tor Traffic using Time based Features. In ICISSP 2017, pages 253–262. SciTePress, 2017.
tor_multiclass.csv | Multi-class classification of TOR | Arash Habibi Lashkari et al. Characterization of Tor Traffic using Time based Features. In ICISSP 2017, pages 253–262. SciTePress, 2017.
vpn_iscx_binary.csv | Binary detection of VPN | Gerard Draper-Gil et al. Characterization of Encrypted and VPN Traffic Using Time-related. In ICISSP, pages 407–414, 2016.
vpn_iscx_multiclass.csv | Multi-class classification of VPN | Gerard Draper-Gil et al. Characterization of Encrypted and VPN Traffic Using Time-related. In ICISSP, pages 407–414, 2016.
vpn_vnat_binary.csv | Binary detection of VPN | Steven Jorgensen et al. Extensible Machine Learning for Encrypted Network Traffic Application Labeling via Uncertainty Quantification. CoRR, abs/2205.05628, 2022
vpn_vnat_multiclass.csv | Multi-class classification of VPN | Steven Jorgensen et al. Extensible Machine Learning for Encrypted Network Traffic Application Labeling via Uncertainty Quantification. CoRR, abs/2205.05628, 2022
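
The file names listed for this repository follow a visible pattern: a source-dataset stem, optionally followed by `_binary` or `_multiclass`. A minimal Python sketch of how that pattern could be used to group files by task is shown here. It assumes the pattern holds for every file and that files without a suffix are binary detection problems (as the listing suggests); `SAMPLE_FILES` and `split_name` are hypothetical names for illustration, not part of the repository.

```python
from typing import Tuple

# A few file names copied from the listing; not the full set.
SAMPLE_FILES = [
    "botnet_binary.csv",
    "botnet_multiclass.csv",
    "dos.csv",
    "iot_23.csv",
    "vpn_vnat_multiclass.csv",
]


def split_name(filename: str) -> Tuple[str, str]:
    """Return (stem, task), where task is 'binary' or 'multiclass'.

    Assumption: names look like <stem>[_binary|_multiclass].csv and
    unsuffixed files are binary detection problems.
    """
    stem = filename[: -len(".csv")] if filename.endswith(".csv") else filename
    for task in ("binary", "multiclass"):
        suffix = "_" + task
        if stem.endswith(suffix):
            return stem[: -len(suffix)], task
    return stem, "binary"


# Map each sample file to its inferred (stem, task) pair.
tasks = {name: split_name(name) for name in SAMPLE_FILES}
```

Grouping by stem this way makes it easy to pair the binary and multi-class variants derived from the same raw dataset.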
Behaviour Biometrics Dataset
data.mendeley.com
Updated Jun 20, 2022
Cite
Nonso Nnamoko (2022). Behaviour Biometrics Dataset [Dataset]. http://doi.org/10.17632/fnf8b85kr6.1
Unique identifier
https://doi.org/10.17632/fnf8b85kr6.1
Dataset updated
Jun 20, 2022
Authors
Nonso Nnamoko
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The dataset provides a collection of behaviour biometrics data (commonly known as Keyboard, Mouse and Touchscreen (KMT) dynamics). The data was collected for use in a FinTech research project undertaken by academics and researchers at the Computer Science Department, Edge Hill University, United Kingdom. The project, called CyberSIgnature, uses KMT dynamics data to distinguish between legitimate card owners and fraudsters. An application was developed with a graphical user interface (GUI) similar to a standard online card payment form, including fields for card type, name, card number, card verification code (CVC) and expiry date. User KMT dynamics were then captured while users entered fictitious card information in the GUI application.

The dataset consists of 1,760 KMT dynamic instances collected over 88 user sessions on the GUI application. Each user session involves 20 iterations of data entry: the user is assigned one set of fictitious card details (drawn at random from a pool) to enter 10 times, and is then presented with 10 additional sets of card details, each to be entered once. The 10 additional sets are drawn from a pool that has been, or will be, assigned to other users. A KMT data instance is collected during each data entry iteration. Thus, a total of 20 KMT data instances (i.e., 10 legitimate and 10 illegitimate) were collected during each user session on the GUI application.

The raw dataset is stored in .json format within 88 separate files. The root folder, named `behaviour_biometrics_dataset', consists of two sub-folders, `raw_kmt_dataset' and `feature_kmt_dataset', and a Jupyter notebook file (`kmt_feature_classification.ipynb'). Their content is described below:

-- `raw_kmt_dataset': this folder contains 88 files, each named `raw_kmt_user_n.json', where n is a number from 0001 to 0088. Each file contains 20 instances of KMT dynamics data corresponding to a given fictitious card, and the data instances are equally split between legitimate (n = 10) and illegitimate (n = 10) classes. The legitimate class corresponds to KMT dynamics captured from the user who is assigned the card details, while the illegitimate class corresponds to KMT dynamics data collected from other users entering the same card details.

-- `feature_kmt_dataset': this folder contains two sub-folders, namely `feature_kmt_json' and `feature_kmt_xlsx'. Each sub-folder contains 88 files (of the relevant format: .json or .xlsx), each named `feature_kmt_user_n', where n is a number from 0001 to 0088. Each file contains 20 instances of features extracted from the corresponding `raw_kmt_user_n' file, including the class labels (legitimate = 1 or illegitimate = 0).

-- `kmt_feature_classification.ipynb': this file contains the Python code necessary to generate features from the raw KMT files and to run a simple machine learning classification task on them. The code is designed to run with minimal effort from the user.
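
The folder layout and file-naming scheme described for this dataset can be sketched in Python as follows. This is only an illustration: it assumes the zero-padded user number is written exactly as `0001` to `0088` and that feature files carry the `.json` or `.xlsx` extension after the stated name. The helper names `raw_file` and `feature_file` are hypothetical and do not come from the dataset's notebook.

```python
from pathlib import Path

ROOT = Path("behaviour_biometrics_dataset")  # root folder name from the description
N_USERS = 88             # number of user sessions / files per folder
INSTANCES_PER_FILE = 20  # 10 legitimate + 10 illegitimate per file


def raw_file(n: int) -> Path:
    """Path of the raw KMT file for user n (1..88)."""
    return ROOT / "raw_kmt_dataset" / f"raw_kmt_user_{n:04d}.json"


def feature_file(n: int, fmt: str = "json") -> Path:
    """Path of the feature file for user n in the given format.

    Assumption: the file extension matches the sub-folder format.
    """
    if fmt not in ("json", "xlsx"):
        raise ValueError("fmt must be 'json' or 'xlsx'")
    folder = ROOT / "feature_kmt_dataset" / f"feature_kmt_{fmt}"
    return folder / f"feature_kmt_user_{n:04d}.{fmt}"


# Enumerate all expected raw files and the total instance count.
all_raw_files = [raw_file(n) for n in range(1, N_USERS + 1)]
total_instances = N_USERS * INSTANCES_PER_FILE
```

The total of 88 files times 20 instances matches the 1,760 instances stated in the description.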
Cybersecurity Threat and Awareness Program Dataset
kaggle.com
Updated Oct 19, 2024
Cite
DatasetEngineer (2024). Cybersecurity Threat and Awareness Program Dataset [Dataset]. http://doi.org/10.34740/kaggle/dsv/9665651
Unique identifier
https://doi.org/10.34740/kaggle/dsv/9665651
Dataset updated
Oct 19, 2024
Dataset provided by
Kaggle, http://kaggle.com/
Authors
DatasetEngineer
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Dataset Title: Cybersecurity Threat Detection and Awareness Program Dataset (2018-2024)

Description: This dataset provides a comprehensive collection of cybersecurity events and network traffic data, spanning from January 2018 to March 2024, collected from real-world corporate environments in Texas, USA. The data includes a diverse range of cybersecurity incidents, covering normal activity as well as various types of threats. It was gathered from multiple sources, such as network traffic logs, system logs, and external threat intelligence feeds, making it suitable for developing machine learning models aimed at threat detection, incident response, and cybersecurity awareness improvement.

The dataset is well-suited for research and experimentation in threat intelligence, intrusion detection, cybersecurity awareness training, and anomaly detection. The included features allow for the modeling of various threat scenarios and multi-class classification tasks. The labeled data provides information on the severity and type of threats detected, supporting both supervised and unsupervised learning techniques.

Features Overview:

Date_Time: The timestamp of the event (e.g., 2022-05-01 14:30:00), indicating when the activity or incident occurred.

Source_IP: IP address of the originating device involved in the event (e.g., 192.168.1.1).

Destination_IP: IP address of the target device involved in the event (e.g., 10.0.0.5).

Source_Port: Port number on the originating device (e.g., 443).

Destination_Port: Port number on the target device (e.g., 80).

Protocol_Type: The protocol used for the communication, such as TCP, UDP, ICMP.

Flow_Duration: Duration of the network flow in milliseconds.

Packet_Size: The size of the packet in bytes.

Flow_Bytes/s: The number of bytes transmitted per second during the flow.

Flow_Packets/s: The number of packets transmitted per second during the flow.

Total_Forward_Packets: Total number of packets sent in the forward direction.

Total_Backward_Packets: Total number of packets sent in the reverse direction.

Packet_Length_Mean: Average packet length for the flow.

IAT_Forward: Inter-arrival time for packets in the forward direction.

IAT_Backward: Inter-arrival time for packets in the reverse direction.

Active_Duration: Duration of active time for the connection.

Idle_Duration: Duration of idle time for the connection.

IDS_Alert_Count: Number of intrusion detection system alerts triggered during the event.

Anomaly_Score: A score indicating the anomaly level of the event, derived from anomaly detection algorithms.

Attack_Vector: Type of attack vector used (e.g., Phishing, DDoS, Brute Force).

Attack_Severity: Severity of the detected threat, categorized as Low, Medium, High, or Critical.

Compromised_Hosts_Count: Number of hosts compromised during the event.

Botnet_Family: Family of botnet detected (if applicable), such as Mirai, Zeus.

Malware_Type: Type of malware detected, such as Ransomware, Trojan.

User_Login_Attempts: Number of login attempts during the event.

Geolocation: Geographic location of the originating IP (Country, City).

Device_Type: Type of device involved (e.g., Server, Router, Mobile).

Firewall_Logs: Binary indicator (0 or 1) showing whether firewall logs flagged the activity.

Antivirus_Alerts: Binary indicator (0 or 1) showing whether antivirus software detected a threat.

Open_Ports_Count: Number of open ports on the target device.

Reputation_Score: A score indicating the reputation of the IP/domain based on threat intelligence sources.

Blacklisted_IP: Binary indicator (0 or 1) indicating if the IP is listed on a blacklist.

Known_Vulnerability: Binary indicator (0 or 1) showing if the target system has known vulnerabilities (based on CVE).

Threat_Intelligence_Source: Source from which the threat intelligence information was gathered.

System_Patch_Status: Indicates whether the system is patched (Up-to-date, Outdated).

CPU_Utilization: CPU usage percentage during the event.

Memory_Utilization: Memory usage percentage during the event.

Employee_Training_Completion: Completion rate of cybersecurity awareness training for the employee involved.

Phishing_Simulation_Success: Result of phishing simulation attempts (Success, Failure).

Reported_Incidents: Number of cybersecurity incidents reported by the user.

Incident_Response_Time: Time taken to respond to the incident in minutes.

Label (Target Variable):

Threat_Severity: The severity level of the threat, categorized as:

0: No Threat

1: Low-Level Threat

2: Medium-Level Threat

3: High-Level Threat

4: Critical Threat

Usage: This dataset is ideal for training and testing machine learning models for tasks such as:

Multi-class classification for threat detection.

Anomaly detection.

Predictive modeling for incident response prioritization.

Cybersecurity awareness program improvement.

Researchers and...
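
The integer coding of the Threat_Severity label can be written down as a small lookup table. The following Python sketch only restates the five codes given in the description; the names `THREAT_SEVERITY`, `decode_severity`, and `severity_counts` are hypothetical, and the sample codes are invented rather than taken from the dataset.

```python
from collections import Counter

# Label codes exactly as listed in the dataset description.
THREAT_SEVERITY = {
    0: "No Threat",
    1: "Low-Level Threat",
    2: "Medium-Level Threat",
    3: "High-Level Threat",
    4: "Critical Threat",
}


def decode_severity(code) -> str:
    """Translate a Threat_Severity code (int or numeric string) to its name."""
    return THREAT_SEVERITY[int(code)]


def severity_counts(codes) -> Counter:
    """Count how often each severity name occurs in a sequence of codes."""
    return Counter(decode_severity(c) for c in codes)


# Invented sample codes, for illustration only.
sample_counts = severity_counts([0, 0, 3, "4", 1, 0])
```

Checking the class counts first is a cheap way to see how imbalanced the five classes are before training a multi-class model.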
Data from: The VNF cybersecurity dataset for research (VNFCYBERDATA)
drum.um.edu.mt
bin
Updated Nov 11, 2024
Cite
BELIEVE SEGUN AYODELE; VICTOR BUTTIGIEG (2024). The VNF cybersecurity dataset for research (VNFCYBERDATA) [Dataset]. http://doi.org/10.60809/drum.24998543.v1
Unique identifier
https://doi.org/10.60809/drum.24998543.v1
Dataset updated
Nov 11, 2024
Dataset provided by
University of Malta
Authors
BELIEVE SEGUN AYODELE; VICTOR BUTTIGIEG
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Virtualisation has received widespread adoption and deployment across a wide range of enterprises and industries throughout the years. Network Function Virtualisation (NFV) is a technical concept that presents a method for dynamically delivering virtualised network functions as virtualised or software components. Virtualised Network Function (VNF) has distinct advantages, but it also faces serious security challenges. Cyberattacks such as Denial of Service (DoS), malware/rootkit injection, port scanning, and so on can target VNF appliances just like any other network infrastructure. To create exceptional training exercises for machine or deep learning (ML/DL) models to combat cyberattacks in VNF, a suitable dataset (VNFCYBERDATA) exhibiting an actual reflection, or one that is reasonably close to an actual reflection, of the problem that the ML/DL model could address is required. This article describes a real VNF dataset that contains over seven million data points and twenty-five cyberattacks generated from five VNF appliances. To facilitate a realistic examination of VNF traffic, the dataset includes both benign and malicious traffic.

Citation

If you are using this dataset for your research, please reference it as "Ayodele, B.; Buttigieg, V. The VNF Cybersecurity Dataset for Research (VNFCYBERDATA). Data 2024, 9, 132. https://doi.org/10.3390/data9110132"

Documentation

Dataset documentation is available at: https://www.mdpi.com/2306-5729/9/11/132
CTU-SME-11: a labeled dataset with real benign and malicious network traffic...
zenodo.org
data.niaid.nih.gov
bin, bz2, csv, html
Updated May 26, 2023
Cite
Štěpán Bendl; Veronica Valeros; Sebastian Garcia (2023). CTU-SME-11: a labeled dataset with real benign and malicious network traffic mimicking a small medium-size enterprise environment [Dataset]. http://doi.org/10.5281/zenodo.7958259
Unique identifier
https://doi.org/10.5281/zenodo.7958259
Dataset updated
May 26, 2023
Dataset provided by
Zenodo, http://zenodo.org/
Authors
Štěpán Bendl; Veronica Valeros; Sebastian Garcia
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
As technology advances, the number and complexity of cyber-attacks increase, forcing defense techniques to be updated and improved. To help develop effective tools for detecting security threats it is essential to have reliable and representative security datasets. Many existing security datasets have limitations that make them unsuitable for research, including lack of labels, unbalanced traffic, and outdated threats.

CTU-SME-11 is a labeled network dataset designed to address the limitations of previous datasets. The dataset was captured in a real network that mimics a small-medium enterprise setting. Raw network traffic (packets) was captured from 11 devices using tcpdump for a duration of 7 days, from the 20th to the 26th of February 2023, in Prague, Czech Republic. The devices were chosen based on the enterprise setting and consist of IoT, desktop and mobile devices, both bare metal and virtualized. The devices were infected with malware or exposed to Internet attacks, and factory reset to restore benign behavior.

The raw data was processed to generate network flows (Zeek logs), which were analyzed and labeled. The dataset contains two types of labels, a high-level label and a descriptive label, both assigned by experts. The former can take three values: benign, malicious or background. The latter contains detailed information about the specific behavior observed in the network flows. The dataset contains 99 million labeled network flows. The overall compressed size of the dataset is 80GB and the uncompressed size is 170GB.
Real-Time DDoS Traffic Dataset for ML
kaggle.com
Updated Nov 1, 2024
Cite
kalireadhat (2024). Real-Time DDoS Traffic Dataset for ML [Dataset]. https://www.kaggle.com/datasets/kalireadhat/realtime-ddos-traffic-dataset
Dataset updated
Nov 1, 2024
Dataset provided by
Kaggle, http://kaggle.com/
Authors
kalireadhat
License
MIT License, https://opensource.org/licenses/MIT
License information was derived automatically
Description
The Real-Time DDoS Traffic Dataset for ML is designed to support the development, testing, and validation of machine learning models focused on detecting Distributed Denial of Service (DDoS) attacks in real-time. As cybersecurity threats evolve, particularly in the realm of network traffic anomalies like DDoS, having access to labeled data that mirrors real-world attack scenarios is essential. This dataset aims to bridge this gap by providing comprehensive, structured network traffic data that includes both normal and DDoS attack instances, facilitating machine learning research and experimentation in DDoS detection and prevention.

The dataset is compiled from network traffic that either replicates real-time conditions or is simulated under carefully controlled network configurations to generate authentic DDoS attack traffic. This data encompasses variations in packet transmission and byte flow, which are key indicators in distinguishing between typical network behavior and DDoS attack patterns. The primary motivation behind this dataset is to aid machine learning practitioners and cybersecurity experts in training models that can effectively differentiate between benign and malicious traffic, even under high-stress network conditions.

Data Source and Collection: Include information on how the data was collected, whether it was simulated or recorded from real systems, and any specific tools or configurations used.

Dataset Structure: List and explain the features or columns in the dataset. For instance, you might describe columns such as:

traffic_type: Indicates whether the traffic is normal or DDoS.

packet_count: The number of packets in a session.

packet_count_per_second: The rate of packets over time.

byte_count: Total data in bytes for a session.

byte_count_per_second: The data transfer rate over time.
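
The five columns named in this listing can be modelled as a small record type, from which a derived feature such as the average packet size follows directly. This Python sketch is illustrative only: the `Session` class, the `mean_packet_size` helper, and the example values are hypothetical, and the literal values stored in `traffic_type` are an assumption, since the description does not state them.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """One row, using the column names given in the description."""
    traffic_type: str               # assumed to hold a label such as "normal" or "DDoS"
    packet_count: int               # number of packets in the session
    packet_count_per_second: float  # packet rate
    byte_count: int                 # total bytes in the session
    byte_count_per_second: float    # byte rate

    def mean_packet_size(self) -> float:
        """Average bytes per packet; 0.0 for a session with no packets."""
        if self.packet_count == 0:
            return 0.0
        return self.byte_count / self.packet_count


# Invented example row, not taken from the dataset.
example = Session("DDoS", 100, 50.0, 6000, 3000.0)
avg_size = example.mean_packet_size()
```

A derived ratio like this is one example of a feature that combines the raw count columns; whether it helps a given detection model has to be checked empirically.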

This dataset is ideal for a range of applications in cybersecurity and machine learning:

1. Training DDoS Detection Models: The dataset is specifically structured for use in supervised learning models that aim to identify DDoS attacks in real time. Researchers and developers can train and test models using the labeled data provided.

2. Real-Time Anomaly Detection: Beyond DDoS detection, the dataset can serve as a foundation for models focused on broader anomaly detection tasks in network traffic monitoring.

3. Benchmarking and Comparative Studies: By providing data for both normal and attack traffic, this dataset is suitable for benchmarking various algorithms, allowing comparisons across different detection methods and approaches.

4. Cybersecurity Education: The dataset can also be used in educational contexts, allowing students and professionals to gain hands-on experience with real-world data, fostering deeper understanding of network anomalies and cybersecurity threats.

Limitations and Considerations

While the dataset provides realistic DDoS patterns, it is essential to note a few limitations:

Data Origin: The dataset may contain simulated attack patterns, which could differ from real-world DDoS attack traffic in more complex network environments.

Sampling Bias: Certain features or types of attacks may be overrepresented due to the specific network setup used during data collection. Users should consider this when generalizing their models to other environments.

Ethical Considerations: This dataset is intended for educational and research purposes only and should be used responsibly to enhance network security.

Acknowledgments

This dataset is an open-source contribution to the cybersecurity and machine learning communities, and it is designed to empower researchers, educators, and industry professionals in developing stronger defenses against DDoS attacks.
Network Digital Twin-Generated Dataset for Machine Learning-based Detection...
zenodo.org
zip
Updated Jun 23, 2025
Cite
Amit Karamchandani Batra; Javier Nuñez Fuente; Luis de la Cal García; Yenny Moreno Meneses; Alberto Mozo Velasco; Antonio Pastor Perales; Diego R. López (2025). Network Digital Twin-Generated Dataset for Machine Learning-based Detection of Benign and Malicious Heavy Hitter Flows [Dataset]. http://doi.org/10.5281/zenodo.14841650
Unique identifier
https://doi.org/10.5281/zenodo.14841650
Dataset updated
Jun 23, 2025
Dataset provided by
Zenodo, http://zenodo.org/
Authors
Amit Karamchandani Batra; Javier Nuñez Fuente; Luis de la Cal García; Yenny Moreno Meneses; Alberto Mozo Velasco; Antonio Pastor Perales; Diego R. López
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jul 11, 2024
Description
Overview

This record provides a dataset created as part of the study presented in the following publication and is made publicly available for research purposes. The associated article provides a comprehensive description of the dataset, its structure, and the methodology used in its creation. If you use this dataset, please cite the following article published in the journal IEEE Communications Magazine:

A. Karamchandani, J. Nunez, L. de-la-Cal, Y. Moreno, A. Mozo, and A. Pastor, “On the Applicability of Network Digital Twins in Generating Synthetic Data for Heavy Hitter Discrimination,” IEEE Communications Magazine, pp. 2–8, 2025, DOI: 10.1109/MCOM.003.2400648.

More specifically, the record contains several synthetic datasets generated to differentiate between benign and malicious heavy hitter flows within a realistic virtualized network environment. Heavy Hitter flows, which include high-volume data transfers, can significantly impact network performance, leading to congestion and degraded quality of service. Distinguishing legitimate heavy hitter activity from malicious Distributed Denial-of-Service traffic is critical for network management and security, yet existing datasets lack the granularity needed for training machine learning models to effectively make this distinction.

To address this, a Network Digital Twin (NDT) approach was used to emulate realistic network conditions and traffic patterns, enabling automated generation of labeled data for both benign and malicious heavy hitter (HH) flows alongside regular traffic.

Feature Set:

The feature set includes the following flow statistics commonly used in the literature on network traffic classification:

The protocol used for the connection, identifying whether it is TCP, UDP, ICMP, or OSPF.

The time (relative to the connection start) of the most recent packet sent from source to destination at the time of each snapshot.

The time (relative to the connection start) of the most recent packet sent from destination to source at the time of each snapshot.

The cumulative count of data packets sent from source to destination at the time of each snapshot.

The cumulative count of data packets sent from destination to source at the time of each snapshot.

The cumulative bytes sent from source to destination at the time of each snapshot.

The cumulative bytes sent from destination to source at the time of each snapshot.

The time difference between the first packet sent from source to destination and the first packet sent from destination to source.
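
Because the packet and byte features above are cumulative at each snapshot, per-interval activity can be recovered by differencing consecutive snapshots of the same flow. The following Python sketch illustrates that idea under stated assumptions: the `Snapshot` type, its field names, and the `increments` helper are hypothetical and are not the column names used in the dataset files.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Snapshot:
    """Cumulative counters of one flow at one snapshot (hypothetical field names)."""
    fwd_packets: int  # packets source -> destination so far
    bwd_packets: int  # packets destination -> source so far
    fwd_bytes: int    # bytes source -> destination so far
    bwd_bytes: int    # bytes destination -> source so far


def increments(snapshots: List[Snapshot]) -> List[Snapshot]:
    """Turn cumulative snapshots into per-interval deltas.

    The first delta is measured from zero, i.e. from the connection start.
    """
    deltas = []
    prev = Snapshot(0, 0, 0, 0)
    for snap in snapshots:
        deltas.append(
            Snapshot(
                snap.fwd_packets - prev.fwd_packets,
                snap.bwd_packets - prev.bwd_packets,
                snap.fwd_bytes - prev.fwd_bytes,
                snap.bwd_bytes - prev.bwd_bytes,
            )
        )
        prev = snap
    return deltas


# Invented example: two snapshots of one flow.
flow = [Snapshot(2, 1, 200, 100), Snapshot(5, 1, 700, 100)]
per_interval = increments(flow)
```

Per-interval deltas expose bursts that a cumulative counter smooths over, which may matter when separating sustained benign transfers from attack traffic.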

Dataset Variations:

To accommodate diverse research needs and scenarios, the dataset is provided in the following variations:

All at Once:

Contains a synthetic dataset where all traffic types, including benign, normal, and malicious DDoS heavy hitter (HH) flows, are combined into a single dataset.

This version represents a holistic view of the traffic environment, simulating real-world scenarios where all traffic occurs simultaneously.

Balanced Traffic Generation:

Represents a balanced traffic dataset with an equal proportion of benign, normal, and malicious DDoS traffic.

Designed for scenarios where a balanced dataset is needed for fair training and evaluation of machine learning models.

DDoS at Intervals:

Contains traffic data where malicious DDoS HH traffic occurs at specific time intervals, mimicking real-world attack patterns.

Useful for studying the impact and detection of intermittent malicious activities.

Only Benign HH Traffic:

Includes only benign HH traffic flows.

Suitable for training and evaluating models to identify and differentiate benign heavy hitter traffic patterns.

Only DDoS Traffic:

Contains only malicious DDoS HH traffic.

Helps in isolating and analyzing attack characteristics for targeted threat detection.

Only Normal Traffic:

Comprises only regular, non-HH traffic flows.

Useful for understanding baseline network behavior in the absence of heavy hitters.

Unbalanced Traffic Generation:

Features an unbalanced dataset with varying proportions of benign, normal, and malicious traffic.

Simulates real-world scenarios where certain types of traffic dominate, providing insights into model performance in unbalanced conditions.

For each variation, the output of each packet aggregator is provided separately in its own folder.

Each variation was generated using the NDT approach to demonstrate its flexibility and ensure the reproducibility of our study's experiments, while also contributing to future research on network traffic patterns and the detection and classification of heavy hitter traffic flows. The dataset is designed to support research in network security, machine learning model development, and applications of digital twin technology.
SDFVD2.0: Extension of Small Scale Deep Fake Video Dataset
data.mendeley.com
Updated Jan 27, 2025
Cite
Shilpa Kaman (2025). SDFVD2.0: Extension of Small Scale Deep Fake Video Dataset [Dataset]. http://doi.org/10.17632/zzb7jyy8w8.1
Explore at:
Unique identifier
https://doi.org/10.17632/zzb7jyy8w8.1
Dataset updated
Jan 27, 2025
Authors
Shilpa Kaman
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The SDFVD 2.0 is an augmented extension of the original SDFVD dataset, which originally contained 53 real and 53 fake videos. This new version has been created to enhance the diversity and robustness of the dataset by applying various augmentation techniques like horizontal flip, rotation, shear, brightness and contrast adjustment, additive gaussian noise, downscaling and upscaling to the original videos. These augmentations help simulate a wider range of conditions and variations, making the dataset more suitable for training and evaluating deep learning models for deepfake detection. This process has significantly expanded the dataset resulting in 461 real and 461 forged videos, providing a richer and more varied collection of video data for deepfake detection research and development.

Dataset Structure

The dataset is organized into two main directories: real and fake, each containing the original and augmented videos. Each augmented video file is named following the pattern: ‘
Network Intrusion Detection
kaggle.com
Updated Apr 3, 2025
Cite
Şahide ŞEKER (2025). Network Intrusion Detection [Dataset]. https://www.kaggle.com/datasets/sahideseker/network-intrusion-detection
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Apr 3, 2025
Dataset provided by
Kaggle http://kaggle.com/
Authors
Şahide ŞEKER
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
🇺🇸 English:

This dataset simulates network traffic to help build intrusion detection models. It includes source/destination IPs, protocols, connection durations, and labels for different types of attacks.

Use this dataset to:

Train anomaly detection or classification models

Experiment with imbalanced cybersecurity data

Build intrusion detection systems with ML algorithms like XGBoost or Isolation Forest

Features:

src_ip: Source IP address

dst_ip: Destination IP address

protocol: Network protocol (TCP, UDP, ICMP)

duration: Duration of the connection

attack: Attack type label (e.g., normal, dos, probe, etc.)
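
As a rough illustration of how the five columns listed above could be turned into model inputs, the sketch below parses a few rows and one-hot encodes the protocol. The sample rows are made up to mimic the column layout and are not taken from the dataset; the helper names `load_rows` and `to_features` are likewise only illustrative.

```python
import csv
import io
from collections import Counter

# Made-up rows that only mimic the column layout described above;
# they are not taken from the actual dataset.
SAMPLE_CSV = """src_ip,dst_ip,protocol,duration,attack
10.0.0.1,10.0.0.9,TCP,1.5,normal
10.0.0.2,10.0.0.9,UDP,0.2,dos
10.0.0.3,10.0.0.9,ICMP,0.1,probe
10.0.0.4,10.0.0.9,TCP,3.0,normal
"""

PROTOCOLS = ["TCP", "UDP", "ICMP"]  # protocol values listed in the description


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def to_features(row):
    """One-hot encode the protocol and append the duration as a float."""
    one_hot = [1.0 if row["protocol"] == p else 0.0 for p in PROTOCOLS]
    return one_hot + [float(row["duration"])]


rows = load_rows(SAMPLE_CSV)
X = [to_features(r) for r in rows]
# Binary target: 0 for normal traffic, 1 for any attack type.
y = [0 if r["attack"] == "normal" else 1 for r in rows]
# Counting labels first gives a quick view of class imbalance.
class_counts = Counter(r["attack"] for r in rows)
```

The resulting `X` and `y` lists can be handed to whichever classifier or anomaly detector is being tried; the IP columns are left out here because raw addresses rarely generalize as features.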

🇹🇷 Turkish (translated):

This dataset was created to enable attack detection from network traffic in the field of cybersecurity. It contains source/destination IPs, protocol, connection duration, and attack type labels.

With this dataset you can:

Perform anomaly detection on imbalanced data

Develop attack classification algorithms

Test algorithms such as XGBoost and Isolation Forest

Features:

src_ip: Source IP address

dst_ip: Destination IP address

protocol: Network protocol (TCP, UDP, ICMP)

duration: Connection duration

attack: Attack type label (e.g., normal, dos, probe, etc.)
UNB CIC IOT Dataset 2023 (Updated 2024-10-08)
kaggle.com
zip
Updated May 24, 2025
Cite
Md. Abdul Al Emon (2025). UNB CIC IOT Dataset 2023 (Updated 2024-10-08) [Dataset]. https://www.kaggle.com/datasets/mdabdulalemo/cic-iot-dataset2023-updated-2024-10-08
Explore at:
zip (3264262523 bytes) Available download formats
Dataset updated
May 24, 2025
Authors
Md. Abdul Al Emon
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
The CIC IoT Dataset 2023 is a comprehensive benchmark developed by the Canadian Institute for Cybersecurity (CIC) to advance intrusion detection research in real-world Internet of Things (IoT) environments. This dataset was created using a network of 105 actual IoT devices, encompassing smart home gadgets, sensors, and cameras, to simulate authentic IoT traffic and attack scenarios.

Key Features:

Diverse Attack Scenarios: The dataset includes 33 distinct attacks categorized into seven classes: DDoS, DoS, Reconnaissance, Web-based, Brute Force, Spoofing, and Mirai. These attacks were executed by compromised IoT devices targeting other IoT devices, reflecting realistic threat vectors (University of New Brunswick).

Extensive Data Collection: Network traffic was captured in real-time, resulting in over 46 million records. The data is available in various formats, including raw PCAP files and pre-extracted CSV features, facilitating different research needs.

Realistic IoT Topology: Unlike many datasets that rely on simulations, this dataset was generated using a large-scale IoT testbed with devices from multiple vendors, providing a heterogeneous and realistic network environment.

Benchmarking and Evaluation: The dataset has been utilized to evaluate the performance of machine learning and deep learning algorithms in classifying and detecting malicious versus benign IoT network traffic (University of New Brunswick).

This dataset serves as a valuable resource for researchers and practitioners aiming to develop and test security analytics applications, intrusion detection systems, and other cybersecurity solutions tailored for IoT ecosystems (University of New Brunswick).
UNSW-NB15 and CIC-IDS2017 Labelled PCAP Data
zenodo.org
csv
Updated Oct 28, 2022
Cite
Yasir Ali Farrukh; Irfan Khan; Syed Wali; David Bierbrauer; John A Pavlik; Nathaniel D. Bastian (2022). UNSW-NB15 and CIC-IDS2017 Labelled PCAP Data [Dataset]. http://doi.org/10.5281/zenodo.7258579
Explore at:
csv Available download formats
Unique identifier
https://doi.org/10.5281/zenodo.7258579
Dataset updated
Oct 28, 2022
Dataset provided by
Zenodo http://zenodo.org/
Authors
Yasir Ali Farrukh; Irfan Khan; Syed Wali; David Bierbrauer; John A Pavlik; Nathaniel D. Bastian
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Packet Capture (PCAP) files of the UNSW-NB15 and CIC-IDS2017 datasets are processed and labelled using the corresponding CSV files. Each packet is labelled by comparing eight distinct features: Source IP, Destination IP, Source Port, Destination Port, Starting time, Ending time, Protocol, and Time to live. The dimensions of the dataset are N×1504. All columns of the dataset are integers, so you can use the dataset directly in your machine learning models. Details of the whole processing and transformation are provided in the following GitHub repo:

https://github.com/Yasir-ali-farrukh/Payload-Byte
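
The matching step described above (comparing a packet's fields against labelled flow records from the CSV files) might look roughly like the sketch below. It is not taken from the Payload-Byte tool: the class names, field names, and the `unlabelled` default are all assumptions made for illustration, and a linear scan is used only to keep the example short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowRecord:
    """One labelled row from a flow-level CSV (hypothetical field names)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str
    ttl: int
    start_time: float
    end_time: float
    label: str


@dataclass(frozen=True)
class Packet:
    """The packet fields needed for matching (hypothetical field names)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str
    ttl: int
    timestamp: float


def label_packet(packet, flows, default="unlabelled"):
    """Return the label of the first flow whose fields match the packet."""
    for flow in flows:
        same_endpoints = (
            packet.src_ip == flow.src_ip
            and packet.dst_ip == flow.dst_ip
            and packet.src_port == flow.src_port
            and packet.dst_port == flow.dst_port
        )
        same_header = packet.protocol == flow.protocol and packet.ttl == flow.ttl
        # The packet must fall between the flow's starting and ending time.
        in_window = flow.start_time <= packet.timestamp <= flow.end_time
        if same_endpoints and same_header and in_window:
            return flow.label
    return default
```

For large captures, indexing the flow records by their address, port, and protocol fields in a dictionary would avoid rescanning every flow for every packet.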

You can use the tool available at the above-mentioned GitHub repo to generate the labelled dataset from scratch. All details of the processing and transformation are provided in the following paper:

```bibtex
@article{Payload,
author = "Yasir Ali Farrukh and Irfan Khan and Syed Wali and David Bierbrauer and Nathaniel Bastian",
title = "{Payload-Byte: A Tool for Extracting and Labeling Packet Capture Files of Modern Network Intrusion Detection Datasets}",
year = "2022",
month = "9",
url = "https://www.techrxiv.org/articles/preprint/Payload-Byte_A_Tool_for_Extracting_and_Labeling_Packet_Capture_Files_of_Modern_Network_Intrusion_Detection_Datasets/20714221",
doi = "10.36227/techrxiv.20714221.v1"
}
```
Summary of LITNET-2020 dataset.
plos.figshare.com
bin
Updated Aug 1, 2023
Cite
Asmaa Ahmed Awad; Ahmed Fouad Ali; Tarek Gaber (2023). Summary of LITNET-2020 dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0284795.t006
Explore at:
bin Available download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0284795.t006
Dataset updated
Aug 1, 2023
Dataset provided by
PLOS ONE
Authors
Asmaa Ahmed Awad; Ahmed Fouad Ali; Tarek Gaber
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Over the years, intrusion detection systems have played a crucial role in network security by discovering attacks in network traffic and generating an alarm signal to be sent to the security team. Machine learning methods, e.g., Support Vector Machine and K Nearest Neighbour, have been used in building intrusion detection systems, but such systems still suffer from low accuracy and a high false alarm rate. Deep learning models (e.g., Long Short-Term Memory, LSTM) have been employed in designing intrusion detection systems to address this issue. However, LSTM needs a high number of iterations to achieve high performance. In this paper, a novel and improved version of the Long Short-Term Memory (ILSTM) algorithm was proposed. The ILSTM is based on the novel integration of the chaotic butterfly optimization algorithm (CBOA) and particle swarm optimization (PSO) to improve the accuracy of the LSTM algorithm. The ILSTM was then used to build an efficient intrusion detection system for binary and multi-class classification cases. The proposed algorithm has two phases: phase one involves training a conventional LSTM network to get initial weights, and phase two involves using the hybrid swarm algorithms, CBOA and PSO, to optimize the weights of the LSTM to improve the accuracy. The performance of ILSTM and the intrusion detection system were evaluated using two public datasets (NSL-KDD and LITNET-2020) under nine performance metrics. The results showed that the proposed ILSTM algorithm outperformed the original LSTM and other related deep-learning algorithms regarding accuracy and precision. The ILSTM achieved an accuracy of 93.09% and a precision of 96.86% while LSTM gave an accuracy of 82.74% and a precision of 76.49%. Also, the ILSTM performed better than LSTM in both datasets. In addition, the statistical analysis showed that ILSTM is more statistically significant than LSTM. Further, the proposed ILSTM gave better results in multi-class classification of intrusion types such as DoS, Probe, and U2R attacks.
Bot_IoT
kaggle.com
Updated Mar 14, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Vignesh Venkateswaran (2023). Bot_IoT [Dataset]. https://www.kaggle.com/datasets/vigneshvenkateswaran/bot-iot
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Mar 14, 2023
Dataset provided by
Kaggle http://kaggle.com/
Authors
Vignesh Venkateswaran
Description
INFO ABOUT THE BOT-IOT DATASET, NOTE: only the csv files stated in the description are used

The BoT-IoT dataset can be downloaded from HERE. You can also use our new datasets: the TON_IoT and UNSW-NB15.

--------------------------------------------------------------------------

The BoT-IoT dataset was created by designing a realistic network environment in the Cyber Range Lab of UNSW Canberra. The network environment incorporated a combination of normal and botnet traffic. The dataset’s source files are provided in different formats, including the original pcap files, the generated argus files and csv files. The files were separated, based on attack category and subcategory, to better assist in labeling process.

The captured pcap files are 69.3 GB in size, with more than 72,000,000 records. The extracted flow traffic, in csv format, is 16.7 GB in size. The dataset includes DDoS, DoS, OS and Service Scan, Keylogging and Data exfiltration attacks, with the DDoS and DoS attacks further organized based on the protocol used.

To ease the handling of the dataset, we extracted 5% of the original dataset using select MySQL queries. The extracted 5% comprises 4 files of approximately 1.07 GB total size and about 3 million records.

--------------------------------------------------------------------------

Free use of the Bot-IoT dataset for academic research purposes is hereby granted in perpetuity. Use for commercial purposes should be agreed with the authors. The authors have asserted their rights under the Copyright. Anyone who intends to use the Bot-IoT dataset must cite the following papers, which contain the dataset's details:

Koroniotis, Nickolaos, Nour Moustafa, Elena Sitnikova, and Benjamin Turnbull. "Towards the development of realistic botnet dataset in the internet of things for network forensic analytics: Bot-iot dataset." Future Generation Computer Systems 100 (2019): 779-796. Public Access Here.

Koroniotis, Nickolaos, Nour Moustafa, Elena Sitnikova, and Jill Slay. "Towards developing network forensic mechanism for botnet activities in the iot based on machine learning techniques." In International Conference on Mobile Networks and Management, pp. 30-44. Springer, Cham, 2017.

Koroniotis, Nickolaos, Nour Moustafa, and Elena Sitnikova. "A new network forensic framework based on deep learning for Internet of Things networks: A particle deep framework." Future Generation Computer Systems 110 (2020): 91-106.

Koroniotis, Nickolaos, and Nour Moustafa. "Enhancing network forensics with particle swarm and deep learning: The particle deep framework." arXiv preprint arXiv:2005.00722 (2020).

Koroniotis, Nickolaos, Nour Moustafa, Francesco Schiliro, Praveen Gauravaram, and Helge Janicke. "A Holistic Review of Cybersecurity and Reliability Perspectives in Smart Airports." IEEE Access (2020).

Koroniotis, Nickolaos. "Designing an effective network forensic framework for the investigation of botnets in the Internet of Things." PhD diss., The University of New South Wales Australia, 2020.

--------------------------------------------------------------------------

Data from: SQL Injection Attack Netflow

zenodo.org
portalcienciaytecnologia.jcyl.es
Updated Sep 28, 2022

Cite

Ignacio Crespo; Adrián Campazas (2022). SQL Injection Attack Netflow [Dataset]. http://doi.org/10.5281/zenodo.6907252

Explore at:

Unique identifier

https://doi.org/10.5281/zenodo.6907252

Dataset updated

Sep 28, 2022

Dataset provided by

Zenodo http://zenodo.org/

Authors

Ignacio Crespo; Adrián Campazas

License

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Introduction

These datasets contain SQL injection attacks (SQLIA) as malicious NetFlow data. The attacks carried out are Union Query SQL injection and Blind SQL injection. The SQLMAP tool was used to perform the attacks.

NetFlow traffic was generated using DOROTHEA (DOcker-based fRamework fOr gaTHering nEtflow trAffic). NetFlow is a network protocol developed by Cisco for collecting and monitoring network traffic flow data. A flow is defined as a unidirectional sequence of packets with some common properties that pass through a network device.

Datasets

The first dataset (D1) was collected to train the detection models, and the second (D2) was collected using different attacks than those used in training, to test the models and ensure their generalization.

The datasets contain both benign and malicious traffic. All collected datasets are balanced.

The version of NetFlow used to build the datasets is 5.

Dataset	Aim	Samples	Benign-malicious traffic ratio
D1	Training	400,003	50%
D2	Test	57,239	50%

Infrastructure and implementation

Two sets of flow data were collected with DOROTHEA. DOROTHEA is a Docker-based framework for NetFlow data collection. It allows you to build interconnected virtual networks to generate and collect flow data using the NetFlow protocol. In DOROTHEA, network traffic packets are sent to a NetFlow generator that has a sensor ipt_netflow installed. The sensor consists of a module for the Linux kernel using Iptables, which processes the packets and converts them to NetFlow flows.

DOROTHEA is configured to use NetFlow v5 and to export a flow after it has been inactive for 15 seconds or active for 1800 seconds (30 minutes).
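
The two export conditions above can be illustrated with a small flow cache. This is a simplified sketch, not the ipt_netflow implementation: the `Flow` and `FlowCache` names are hypothetical, timestamps are plain floats in seconds, and expiry is only checked when a new packet arrives.

```python
from dataclasses import dataclass

INACTIVE_TIMEOUT = 15.0   # seconds without packets before export (from the text)
ACTIVE_TIMEOUT = 1800.0   # maximum flow lifetime before export (from the text)


@dataclass
class Flow:
    key: tuple          # unidirectional key, e.g. (src, dst, sport, dport, proto)
    first_seen: float
    last_seen: float
    packets: int = 0
    octets: int = 0


class FlowCache:
    """Hypothetical flow cache applying inactive and active timeouts."""

    def __init__(self):
        self.active = {}
        self.exported = []

    def expire(self, now):
        """Move every flow that hit either timeout to the exported list."""
        for key in list(self.active):
            flow = self.active[key]
            idle = now - flow.last_seen >= INACTIVE_TIMEOUT
            too_long = now - flow.first_seen >= ACTIVE_TIMEOUT
            if idle or too_long:
                self.exported.append(self.active.pop(key))

    def observe(self, key, timestamp, size):
        """Account one packet, exporting expired flows first."""
        self.expire(timestamp)
        flow = self.active.get(key)
        if flow is None:
            flow = Flow(key, timestamp, timestamp)
            self.active[key] = flow
        flow.last_seen = timestamp
        flow.packets += 1
        flow.octets += size


cache = FlowCache()
key = ("10.0.0.1", "10.0.0.2", 40000, 80, 6)
cache.observe(key, 0.0, 100)
cache.observe(key, 5.0, 200)   # same flow, still active
cache.observe(key, 30.0, 50)   # 25 s gap: old flow is exported, a new one starts
```

Because the key is unidirectional, the reply traffic of a connection would be tracked as a separate flow with source and destination swapped.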

Benign traffic generation nodes simulate network traffic generated by real users, performing tasks such as searching in web browsers, sending emails, or establishing Secure Shell (SSH) connections. Such tasks run as Python scripts. Users may customize them or even incorporate their own. The network traffic is managed by a gateway that performs two main tasks. On the one hand, it routes packets to the Internet. On the other hand, it sends them to a NetFlow data generation node (this process is carried out similarly for packets received from the Internet).

The malicious traffic collected (SQLI attacks) was generated using SQLMAP. SQLMAP is a penetration testing tool used to automate the process of detecting and exploiting SQL injection vulnerabilities.

The attacks were executed on 16 nodes, each launching SQLMAP with the parameters in the following table.

Parameters	Description
'--banner','--current-user','--current-db','--hostname','--is-dba','--users','--passwords','--privileges','--roles','--dbs','--tables','--columns','--schema','--count','--dump','--comments','--schema'	Enumerate users, password hashes, privileges, roles, databases, tables and columns
--level=5	Increase the probability of a false positive identification
--risk=3	Increase the probability of extracting data
--random-agent	Select the User-Agent randomly
--batch	Never ask for user input, use the default behavior
--answers="follow=Y"	Predefined answers to yes

Every node executed SQLIA on 200 victim nodes. The victim nodes had deployed a web form vulnerable to Union-type injection attacks, which was connected to the MySQL or SQL Server database engines (50% of the victim nodes deployed MySQL and the other 50% deployed SQL Server).

The web service was accessible from ports 443 and 80, which are the ports typically used to deploy web services. The IP address space was 182.168.1.1/24 for the benign and malicious traffic-generating nodes. For victim nodes, the address space was 126.52.30.0/24.
The malicious traffic in the test sets was collected under different conditions. For D1, SQLIA was performed using Union attacks on the MySQL and SQLServer databases.

However, for D2, BlindSQL SQLIAs were performed against the web form connected to a PostgreSQL database. The IP address spaces of the networks were also different from those of D1. In D2, the IP address space was 152.148.48.1/24 for benign and malicious traffic generating nodes and 140.30.20.1/24 for victim nodes.
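
When working with the flows, the address spaces quoted above can be used to tell victim nodes from traffic-generating nodes. The sketch below uses Python's standard `ipaddress` module; the `role_of` helper and the dictionary names are illustrative assumptions. Several of the quoted prefixes have host bits set (for example 182.168.1.1/24), so they are parsed with `strict=False`, which normalizes them to the enclosing network.

```python
import ipaddress

# Address spaces as quoted in the description. strict=False accepts
# prefixes written with host bits set, e.g. 182.168.1.1/24 -> 182.168.1.0/24.
VICTIM_NETWORKS = {
    "D1": ipaddress.ip_network("126.52.30.0/24"),
    "D2": ipaddress.ip_network("140.30.20.1/24", strict=False),
}
GENERATOR_NETWORKS = {
    "D1": ipaddress.ip_network("182.168.1.1/24", strict=False),
    "D2": ipaddress.ip_network("152.148.48.1/24", strict=False),
}


def role_of(address, dataset):
    """Classify an address as 'victim', 'generator', or 'other' for D1 or D2."""
    ip = ipaddress.ip_address(address)
    if ip in VICTIM_NETWORKS[dataset]:
        return "victim"
    if ip in GENERATOR_NETWORKS[dataset]:
        return "generator"
    return "other"
```

Because the generator networks hold both benign and malicious nodes, this lookup separates roles but does not by itself label a flow as an attack.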

To run the MySQL server we ran MariaDB version 10.4.12.
Microsoft SQL Server 2017 Express and PostgreSQL version 13 were used.

Cloud Network Security Market Report
marketreportanalytics.com
doc, pdf, ppt
Updated Apr 30, 2025
Cite
Market Report Analytics (2025). Cloud Network Security Market Report [Dataset]. https://www.marketreportanalytics.com/reports/cloud-network-security-market-87773
Explore at:
doc, pdf, ppt Available download formats
Dataset updated
Apr 30, 2025
Dataset authored and provided by
Market Report Analytics
License
https://www.marketreportanalytics.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The cloud network security market is experiencing robust growth, fueled by the increasing adoption of cloud computing and the expanding attack surface associated with it. A compound annual growth rate (CAGR) of 18.10% from 2019 to 2024 suggests a significant market expansion. This growth is driven by several key factors. The heightened need for robust identity and access management (IAM) solutions to protect sensitive data in the cloud is a major catalyst. Data loss prevention (DLP) is also gaining traction as organizations grapple with increasingly sophisticated cyber threats. Furthermore, the demand for comprehensive security information and event management (SIEM) systems is surging as organizations strive to gain better visibility into their cloud environments and respond effectively to security incidents. The market is segmented across various security types, including application, database, network, web, and email security, reflecting the diverse security needs within the cloud ecosystem. Large enterprises are currently the largest consumers of cloud network security solutions, but the small and medium-sized enterprise (SME) segment is showing significant growth potential, driven by increased cloud adoption among SMEs and government initiatives promoting cloud adoption across all sectors. Key regional markets include North America, Europe, and Asia Pacific, with North America currently holding a leading position. However, rapid growth is expected in the Asia-Pacific region due to its increasing digitalization and technological advancement. Leading players like Amazon Web Services, Microsoft, Cisco, and Palo Alto Networks are actively shaping the market through innovation and strategic acquisitions. The future trajectory of the cloud network security market is projected to remain positive, further propelled by the ongoing digital transformation across industries. 
The increasing reliance on cloud-based applications and services, coupled with the growing sophistication of cyberattacks, will necessitate robust security measures. The emergence of new technologies, such as artificial intelligence (AI) and machine learning (ML), in cybersecurity is expected to further enhance the effectiveness of cloud security solutions. However, challenges remain, including the complexity of managing cloud security across diverse environments and the shortage of skilled cybersecurity professionals. Regulatory compliance requirements and data privacy concerns will also continue to influence the market's development. The ongoing expansion of the Internet of Things (IoT) and its integration with cloud platforms will generate additional security needs, further fueling market expansion in the coming years. We project the market to continue its strong growth trajectory, driven by the factors mentioned above, leading to substantial market expansion by 2033.

Recent developments include:

December 2022: The Pentagon signed cloud computing contracts worth $9 billion with Alphabet Inc.'s Google, Amazon Web Services Inc., Microsoft Corp., and Oracle Corp. IBM was kept off this contract list. These contracts will build a bridge between the Defense Department and private sector companies. The Department of Defense will have access to enterprise-wide, globally available cloud services across all security domains and classification levels.

November 2022: IBM announced the release of a technical preview of its Incident Management Software as a Service (SaaS) offering. According to IBM, this new SaaS service is designed to help organizations understand the status of their application and infrastructure resources and provide the ability to address some of the core challenges faced by central IT Operations during incident triage.

October 2022: Intel and Google Cloud launched a new chip to improve data center performance. This co-designed chip can make data centers more secure and efficient. Companies are using progressively bigger data sets, but the performance improvement of chips like CPUs is slowing down. To make the data center itself more productive, cloud companies are therefore looking for innovative ways for data storage and security.

Key drivers for this market are: Rapid adaptation of cloud based services among organisations; Increased Cyber Attacks; Rising trend of BYOD and CYOD to boost cloud security demand.

Potential restraints include: Rapid adaptation of cloud based services among organisations; Increased Cyber Attacks; Rising trend of BYOD and CYOD to boost cloud security demand.

Notable trends are: Application-based Classification and Products to have Significant Demand for Cloud Network Security.
Dataset of legitimate IoT data
data.gouv.fr
csv
Updated Dec 9, 2022
Cite
Télécom SudParis (2022). Dataset of legitimate IoT data [Dataset]. https://www.data.gouv.fr/es/datasets/dataset-of-legitimate-iot-data/
Explore at:
csv(20451059), csv(20957206), csv(20214687), csv(20775395), csv(21768881), csv(20485613), csv(20620148), csv(20567664), csv(20178551), csv(20997417), csv(20106659), csv(20227271), csv(19670453), csv(20263943), csv(20490473), csv(20585580), csv(20366938), csv(21433048), csv(21673237), csv(20584090) Available download formats
Dataset updated
Dec 9, 2022
Dataset authored and provided by
Télécom SudParis
License
http://www.opendefinition.org/licenses/cc-by-sa
Description
This dataset presents the IoT network traffic generated by connected objects. In order to understand and characterise the legitimate behaviour of network traffic, a platform is created to generate IoT traffic under realistic conditions. This platform contains different IoT devices: voice assistants, smart cameras, connected printers, connected light bulbs, motion sensors, etc. Then, a set of interactions with these objects is performed to allow the generation of real traffic. This data is used to identify anomalies and intrusions using machine learning algorithms and to improve existing detection models. Our dataset is available in two formats: pcap and csv and was created as part of the EU CEF VARIoT project https://variot.eu. To download the data in pcap format and for more information, our database is available on this web portal : https://www.variot.telecom-sudparis.eu/.
Description of the NSL-KDD dataset attack categories.
plos.figshare.com
xls
Updated May 23, 2024
Cite
Arshad Hashmi; Omar M. Barukab; Ahmad Hamza Osman (2024). Description of the NSL-KDD dataset attack categories. [Dataset]. http://doi.org/10.1371/journal.pone.0302294.t002
Explore at:
xls Available download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0302294.t002
Dataset updated
May 23, 2024
Dataset provided by
PLOS ONE
Authors
Arshad Hashmi; Omar M. Barukab; Ahmad Hamza Osman
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Description of the NSL-KDD dataset attack categories.
Synthetic Cybersecurity Logs for Anomaly Detection
kaggle.com
Updated Dec 16, 2024
Cite
fcWebDev (2024). Synthetic Cybersecurity Logs for Anomaly Detection [Dataset]. http://doi.org/10.34740/kaggle/dsv/10211131
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Unique identifier
https://doi.org/10.34740/kaggle/dsv/10211131
Dataset updated
Dec 16, 2024
Dataset provided by
Kaggle http://kaggle.com/
Authors
fcWebDev
License
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Description
This dataset contains synthetic HTTP log data designed for cybersecurity analysis, particularly for anomaly detection tasks.

Dataset Features

Timestamp: Simulated time for each log entry.

IP_Address: Randomized IP addresses to simulate network traffic.

Request_Type: Common HTTP methods (GET, POST, PUT, DELETE).

Status_Code: HTTP response status codes (e.g., 200, 404, 403, 500).

Anomaly_Flag: Binary flag indicating anomalies (1 = anomaly, 0 = normal).

User_Agent: Simulated user agents for device and browser identification.

Session_ID: Random session IDs to simulate user activity.

Location: Geographic locations of requests.

Applications

This dataset can be used for:

Anomaly Detection: Identify suspicious network activity or attacks.

Machine Learning: Train models for classification tasks (e.g., detect anomalies).

Cybersecurity Analysis: Analyze HTTP traffic patterns and identify threats.

Example Challenge

Build a machine learning model to predict the Anomaly_Flag based on the features provided.
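
Before training a full model for the challenge above, a very simple baseline can show how much signal a single column carries. The sketch below estimates the anomaly rate per value of one column and predicts 1 when that rate reaches a threshold. The log entries are made up to mimic the listed columns, the helper names are hypothetical, and whether Status_Code is actually predictive in the real dataset is not something the description states.

```python
from collections import defaultdict

# Made-up log entries that only mimic the listed columns.
LOGS = [
    {"Request_Type": "GET", "Status_Code": 200, "Anomaly_Flag": 0},
    {"Request_Type": "GET", "Status_Code": 200, "Anomaly_Flag": 0},
    {"Request_Type": "POST", "Status_Code": 403, "Anomaly_Flag": 1},
    {"Request_Type": "DELETE", "Status_Code": 500, "Anomaly_Flag": 1},
    {"Request_Type": "GET", "Status_Code": 404, "Anomaly_Flag": 0},
    {"Request_Type": "PUT", "Status_Code": 403, "Anomaly_Flag": 1},
]


def fit_rates(rows, column):
    """Estimate the fraction of anomalous entries for each value of a column."""
    totals = defaultdict(int)
    anomalies = defaultdict(int)
    for row in rows:
        totals[row[column]] += 1
        anomalies[row[column]] += row["Anomaly_Flag"]
    return {value: anomalies[value] / totals[value] for value in totals}


def predict(row, rates, column, threshold=0.5):
    """Predict 1 when the observed anomaly rate for the value meets the threshold."""
    return 1 if rates.get(row[column], 0.0) >= threshold else 0


rates = fit_rates(LOGS, "Status_Code")
# Predictions are made on the same rows used for fitting, so this only
# demonstrates the mechanics; a real evaluation needs a held-out split.
predictions = [predict(row, rates, "Status_Code") for row in LOGS]
```

Any model trained on the full feature set should at least beat this kind of single-column baseline on held-out data.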
Ai Based Fraud Detection Tools Market Report | Global Forecast From 2025 To...
dataintelo.com
csv, pdf, pptx
Updated Oct 16, 2024
Cite
Dataintelo (2024). Ai Based Fraud Detection Tools Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/ai-based-fraud-detection-tools-market
Explore at:
pptx, pdf, csv Available download formats
Dataset updated
Oct 16, 2024
Dataset authored and provided by
Dataintelo
License
https://dataintelo.com/privacy-and-policy
Time period covered
2024 - 2032
Area covered
Global
Description
AI-Based Fraud Detection Tools Market Outlook

The global AI-based fraud detection tools market size was valued at approximately USD 6.5 billion in 2023 and is projected to reach USD 22.8 billion by 2032, growing at a robust CAGR of 15.1% during the forecast period. The significant growth factors driving this market include the increasing sophistication of fraudulent activities, the growing adoption of AI and machine learning technologies in various sectors, and the heightened demand for real-time fraud detection solutions.
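
The growth figures above can be sanity-checked with the standard compound annual growth rate formula, CAGR = (end / start)^(1 / years) - 1. The sketch below assumes nine compounding years (2023 to 2032), which the report does not state explicitly; under that assumption the rate comes out at roughly 15%, close to the quoted 15.1%, with the small gap plausibly due to rounding of the "approximately" valued endpoints.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values over a number of years."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# USD 6.5 billion in 2023 growing to USD 22.8 billion by 2032.
# Nine compounding periods is an assumption about the report's base year.
rate = cagr(6.5, 22.8, 2032 - 2023)
print(f"Implied CAGR: {rate:.2%}")
```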

One of the primary growth factors for the AI-based fraud detection tools market is the rising complexity of fraudulent activities. In today's digital age, fraudsters are employing increasingly sophisticated techniques to breach security systems, making traditional detection methods inadequate. AI-based solutions, which leverage advanced algorithms and machine learning, are capable of analyzing large volumes of data to identify patterns and anomalies indicative of fraud. This capability is crucial for organizations seeking to protect their assets and maintain customer trust in an environment where cyber threats are continually evolving.

Another significant growth driver is the widespread adoption of AI and machine learning technologies across various industries. Businesses are recognizing the potential of these technologies to enhance their fraud detection capabilities, leading to increased investments in AI-driven solutions. The banking and financial services sector, in particular, has been at the forefront of adopting AI-based fraud detection tools to combat financial crimes such as identity theft, credit card fraud, and money laundering. Furthermore, the retail and e-commerce sectors are increasingly implementing these tools to safeguard against fraudulent transactions and account takeovers.

The growing demand for real-time fraud detection solutions is also propelling the market forward. Traditional fraud detection systems often rely on rule-based approaches that can be slow and reactive, allowing fraudulent activities to go undetected until significant damage has been done. In contrast, AI-based solutions can process and analyze data in real-time, enabling organizations to identify and respond to threats rapidly. This real-time capability is essential for minimizing losses and mitigating risks, particularly in sectors where the speed of transactions is critical, such as online retail and financial services.

Regionally, North America currently dominates the AI-based fraud detection tools market, owing to the high adoption rate of advanced technologies and the presence of major industry players. However, other regions like Asia Pacific and Europe are also experiencing significant growth. Asia Pacific, in particular, is expected to exhibit the highest CAGR during the forecast period, driven by the increasing digitization of economies, rising internet penetration, and the growing awareness of cybersecurity threats. Europe is also witnessing substantial growth due to stringent regulatory requirements and the increasing focus on data privacy and security.

Component Analysis

The AI-based fraud detection tools market can be segmented by component into software, hardware, and services. The software segment is expected to hold the largest market share during the forecast period. This dominance can be attributed to the continuous advancements in AI algorithms and machine learning models, which enhance the accuracy and efficiency of fraud detection systems. Furthermore, the software solutions are designed to be scalable and easily integrated into existing systems, making them an attractive option for organizations of all sizes.

Hardware components, though not as dominant as software, play a crucial role in the deployment of AI-based fraud detection systems. High-performance computing hardware, including GPUs and specialized AI processors, is essential for handling the large datasets and complex computations required for real-time fraud detection. As the demand for more powerful and efficient hardware grows, this segment is expected to see steady growth, particularly in large enterprises that require robust infrastructure to support their AI initiatives.

The services segment, encompassing consulting, integration, and maintenance services, is also poised for significant growth. Organizations often lack the in-house expertise required to develop and implement AI-based fraud detection systems, leading to an increased reliance on external service providers. These services help organizations to customize and opti
HTTPS Brute-force dataset with extended network flows
zenodo.org
explore.openaire.eu
zip
Updated Apr 11, 2022
Cite
Jan Luxemburk; Karel Hynek; Tomas Cejka (2022). HTTPS Brute-force dataset with extended network flows [Dataset]. http://doi.org/10.5281/zenodo.4275775
Explore at:
zip (available download formats)
Unique identifier
https://doi.org/10.5281/zenodo.4275775
Dataset updated
Apr 11, 2022
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Jan Luxemburk; Karel Hynek; Tomas Cejka
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We are publishing a dataset that we created for designing a detector of brute-force attacks in HTTPS. The dataset consists of extended network flows that we captured with the flow exporter ipfixprobe. Apart from traditional fields like source and destination IP addresses and ports, each flow contains information (size, direction, inter-packet time, TCP flags) about up to the first 100 packets. Packet sizes are taken from the transport layer (TCP, UDP); packets with zero payload (e.g., TCP ACKs) are ignored.
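The per-packet rule described above (skip zero-payload packets, keep at most the first 100) can be illustrated with a minimal sketch. It is not the exporter's implementation: the `Packet` class, its field names, the direction encoding, and the choice to measure inter-packet time between kept packets are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

MAX_PACKETS = 100  # "up to the first 100 packets" per the description


@dataclass
class Packet:
    """Hypothetical packet record; field names are illustrative only."""
    payload_size: int  # transport-layer (TCP/UDP) payload size in bytes
    direction: int     # assumed encoding: 1 = client to server, -1 = server to client
    timestamp: float   # capture time in seconds
    tcp_flags: int     # raw TCP flag bits


def per_packet_info(packets: Iterable[Packet], limit: int = MAX_PACKETS) -> List[dict]:
    """Collect size, direction, inter-packet time, and TCP flags for kept packets."""
    records: List[dict] = []
    previous_ts: Optional[float] = None
    for pkt in packets:
        if pkt.payload_size == 0:
            continue  # zero-payload packets (e.g., bare TCP ACKs) are ignored
        # Assumption: inter-packet time is measured between kept packets.
        ipt = 0.0 if previous_ts is None else pkt.timestamp - previous_ts
        previous_ts = pkt.timestamp
        records.append({
            "size": pkt.payload_size,
            "direction": pkt.direction,
            "inter_packet_time": ipt,
            "tcp_flags": pkt.tcp_flags,
        })
        if len(records) >= limit:
            break  # stop once the per-flow packet limit is reached
    return records


# Made-up flow: SYN, data, bare ACK, data.
example_flow = [
    Packet(0, 1, 0.0, 0x02),
    Packet(517, 1, 0.5, 0x18),
    Packet(0, -1, 0.6, 0x10),
    Packet(1400, -1, 1.0, 0x18),
]
example_records = per_packet_info(example_flow)
```

In this made-up flow only the two data-carrying packets are kept, so `example_records` has two entries.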

We publish three files:

flows.csv, which contains raw flow data.

aggregated_flows.csv, which contains aggregated flows.

samples.csv, which contains samples with extracted features. This data can be used for training a machine-learning classification model.

All IP addresses, source ports, and TLS SNIs are SHA-256-hashed. The CLASS column is 0 for benign samples and 1 for brute-force samples.
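A minimal sketch of this kind of pseudonymization follows. The description does not state the exact input encoding or whether a salt was used, so the plain UTF-8, unsalted hashing here is an assumption, and every column name other than CLASS is hypothetical.

```python
import hashlib


def sha256_hex(value: str) -> str:
    """Return the SHA-256 digest of a string as 64 hexadecimal characters."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def pseudonymize(record: dict, fields: tuple) -> dict:
    """Hash the selected fields and leave all other fields unchanged."""
    return {
        key: sha256_hex(str(value)) if key in fields else value
        for key, value in record.items()
    }


# Hypothetical raw record; only the CLASS column name comes from the description.
raw = {"SRC_IP": "192.0.2.10", "SRC_PORT": 50514, "TLS_SNI": "example.org", "CLASS": 1}
hashed = pseudonymize(raw, fields=("SRC_IP", "SRC_PORT", "TLS_SNI"))
```

Because the hash is deterministic, equal original values map to equal hashed values, so flows can still be grouped by hashed address or SNI without exposing the original identifiers.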

Brute-force data
The brute-force data were generated with three popular attack tools: Ncrack, THC-Hydra, and Patator. Attacks were performed against these applications:

WordPress

Joomla

MediaWiki

Ghost

Grafana

Discourse

PhpBB

OpenCart

Redmine

Nginx

Apache

The SCENARIO column indicates which tool and application were used to generate the sample.

Benign data
Benign data consists of eight captures from a backbone network. The SCENARIO column indicates individual captures.
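As a rough sketch of how the SCENARIO and CLASS columns might be inspected together, the snippet below counts samples per (SCENARIO, CLASS) pair using only the standard library. The scenario values in the inline demo data are invented placeholders; the real encoding of SCENARIO is not specified in the description.

```python
import csv
import io
from collections import Counter


def count_by_scenario(csv_file) -> Counter:
    """Count rows per (SCENARIO, CLASS) pair in a CSV with those two columns."""
    counts: Counter = Counter()
    for row in csv.DictReader(csv_file):
        counts[(row["SCENARIO"], int(row["CLASS"]))] += 1
    return counts


# Placeholder data: the scenario names below are hypothetical.
demo_csv = io.StringIO(
    "SCENARIO,CLASS\n"
    "tool_a_app_x,1\n"
    "tool_a_app_x,1\n"
    "tool_b_app_y,1\n"
    "capture_1,0\n"
)
demo_counts = count_by_scenario(demo_csv)
```

With a real file, the same function could be called as `count_by_scenario(open("samples.csv", newline=""))`, assuming samples.csv carries both columns.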

Cite

Josef Koumar (2024). Network traffic datasets created by Single Flow Time Series Analysis [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8035723

Network traffic datasets created by Single Flow Time Series Analysis

Explore at:

Dataset updated

Jul 11, 2024

Dataset provided by

Tomáš Čejka
Karel Hynek
Josef Koumar

License

Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Network traffic datasets created by Single Flow Time Series Analysis

Datasets were created for the paper "Network Traffic Classification based on Single Flow Time Series Analysis" by Josef Koumar, Karel Hynek, and Tomáš Čejka, which was published at the 19th International Conference on Network and Service Management (CNSM) 2023. Please cite usage of our datasets as:

J. Koumar, K. Hynek and T. Čejka, "Network Traffic Classification Based on Single Flow Time Series Analysis," 2023 19th International Conference on Network and Service Management (CNSM), Niagara Falls, ON, Canada, 2023, pp. 1-7, doi: 10.23919/CNSM59352.2023.10327876.

This Zenodo repository contains 23 datasets created from 15 well-known published datasets, which are cited in the table below. Each dataset contains 69 features created by Time Series Analysis of Single Flow Time Series. A detailed description of the features is in the file feature_description.pdf.

The following table describes each dataset file:

| File name | Detection problem | Citation of original raw dataset |
|---|---|---|
| botnet_binary.csv | Binary detection of botnet | S. García et al. An Empirical Comparison of Botnet Detection Methods. Computers & Security, 45:100–123, 2014. |
| botnet_multiclass.csv | Multi-class classification of botnet | S. García et al. An Empirical Comparison of Botnet Detection Methods. Computers & Security, 45:100–123, 2014. |
| cryptomining_design.csv | Binary detection of cryptomining; the design part | Richard Plný et al. Datasets of Cryptomining Communication. Zenodo, October 2022. |
| cryptomining_evaluation.csv | Binary detection of cryptomining; the evaluation part | Richard Plný et al. Datasets of Cryptomining Communication. Zenodo, October 2022. |
| dns_malware.csv | Binary detection of malware DNS | Samaneh Mahdavifar et al. Classifying Malicious Domains using DNS Traffic Analysis. In DASC/PiCom/CBDCom/CyberSciTech 2021, pages 60–67. IEEE, 2021. |
| doh_cic.csv | Binary detection of DoH | Mohammadreza MontazeriShatoori et al. Detection of DoH tunnels using time-series classification of encrypted traffic. In DASC/PiCom/CBDCom/CyberSciTech 2020, pages 63–70. IEEE, 2020. |
| doh_real_world.csv | Binary detection of DoH | Kamil Jeřábek et al. Collection of datasets with DNS over HTTPS traffic. Data in Brief, 42:108310, 2022. |
| dos.csv | Binary detection of DoS | Nickolaos Koroniotis et al. Towards the development of realistic botnet dataset in the Internet of Things for network forensic analytics: Bot-IoT dataset. Future Gener. Comput. Syst., 100:779–796, 2019. |
| edge_iiot_binary.csv | Binary detection of IoT malware | Mohamed Amine Ferrag et al. Edge-IIoTset: A new comprehensive realistic cyber security dataset of IoT and IIoT applications: Centralized and federated learning, 2022. |
| edge_iiot_multiclass.csv | Multi-class classification of IoT malware | Mohamed Amine Ferrag et al. Edge-IIoTset: A new comprehensive realistic cyber security dataset of IoT and IIoT applications: Centralized and federated learning, 2022. |
| https_brute_force.csv | Binary detection of HTTPS Brute Force | Jan Luxemburk et al. HTTPS Brute-force dataset with extended network flows, November 2020. |
| ids_cic_binary.csv | Binary detection of intrusion in IDS | Iman Sharafaldin et al. Toward generating a new intrusion detection dataset and intrusion traffic characterization. ICISSP, 1:108–116, 2018. |
| ids_cic_multiclass.csv | Multi-class classification of intrusion in IDS | Iman Sharafaldin et al. Toward generating a new intrusion detection dataset and intrusion traffic characterization. ICISSP, 1:108–116, 2018. |
| ids_unsw_nb_15_binary.csv | Binary detection of intrusion in IDS | Nour Moustafa and Jill Slay. UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set). In 2015 Military Communications and Information Systems Conference (MilCIS), pages 1–6. IEEE, 2015. |
| ids_unsw_nb_15_multiclass.csv | Multi-class classification of intrusion in IDS | Nour Moustafa and Jill Slay. UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set). In 2015 Military Communications and Information Systems Conference (MilCIS), pages 1–6. IEEE, 2015. |
| iot_23.csv | Binary detection of IoT malware | Sebastian Garcia et al. IoT-23: A labeled dataset with malicious and benign IoT network traffic, January 2020. More details at https://www.stratosphereips.org/datasets-iot23 |
| ton_iot_binary.csv | Binary detection of IoT malware | Nour Moustafa. A new distributed architecture for evaluating AI-based security systems at the edge: Network TON_IoT datasets. Sustainable Cities and Society, 72:102994, 2021. |
| ton_iot_multiclass.csv | Multi-class classification of IoT malware | Nour Moustafa. A new distributed architecture for evaluating AI-based security systems at the edge: Network TON_IoT datasets. Sustainable Cities and Society, 72:102994, 2021. |
| tor_binary.csv | Binary detection of TOR | Arash Habibi Lashkari et al. Characterization of Tor Traffic using Time based Features. In ICISSP 2017, pages 253–262. SciTePress, 2017. |
| tor_multiclass.csv | Multi-class classification of TOR | Arash Habibi Lashkari et al. Characterization of Tor Traffic using Time based Features. In ICISSP 2017, pages 253–262. SciTePress, 2017. |
| vpn_iscx_binary.csv | Binary detection of VPN | Gerard Draper-Gil et al. Characterization of Encrypted and VPN Traffic Using Time-related. In ICISSP, pages 407–414, 2016. |
| vpn_iscx_multiclass.csv | Multi-class classification of VPN | Gerard Draper-Gil et al. Characterization of Encrypted and VPN Traffic Using Time-related. In ICISSP, pages 407–414, 2016. |
| vpn_vnat_binary.csv | Binary detection of VPN | Steven Jorgensen et al. Extensible Machine Learning for Encrypted Network Traffic Application Labeling via Uncertainty Quantification. CoRR, abs/2205.05628, 2022. |
| vpn_vnat_multiclass.csv | Multi-class classification of VPN | Steven Jorgensen et al. Extensible Machine Learning for Encrypted Network Traffic Application Labeling via Uncertainty Quantification. CoRR, abs/2205.05628, 2022. |
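When scripting experiments over several of these files, it can help to derive the task type from the "Detection problem" text rather than from the file name, since some binary tasks (for example dns_malware.csv or dos.csv) carry no suffix. The sketch below is illustrative only: the `task_type` helper and the `CATALOG` subset are hypothetical names, and the entries are copied from a few rows of the table.

```python
def task_type(detection_problem: str) -> str:
    """Map a 'Detection problem' description to 'binary' or 'multiclass'."""
    text = detection_problem.strip().lower()
    if text.startswith("multi-class"):
        return "multiclass"
    if text.startswith("binary"):
        return "binary"
    raise ValueError(f"Unrecognized detection problem: {detection_problem!r}")


# A small subset of the table, for illustration only.
CATALOG = {
    "botnet_binary.csv": "Binary detection of botnet",
    "botnet_multiclass.csv": "Multi-class classification of botnet",
    "dns_malware.csv": "Binary detection of malware DNS",
    "dos.csv": "Binary detection of DoS",
    "tor_multiclass.csv": "Multi-class classification of TOR",
}

# File names grouped by task type, e.g. to pick a classifier setup per group.
tasks = {name: task_type(problem) for name, problem in CATALOG.items()}
binary_files = sorted(name for name, kind in tasks.items() if kind == "binary")
```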

Bot_IoT

INFO ABOUT THE BOT-IOT DATASET, NOTE: only the csv files stated in the description are used

The BoT-IoT dataset can be downloaded from HERE. You can also use our new datasets: the TON_IoT and UNSW-NB15.

--------------------------------------------------------------------------

To ease the handling of the dataset, we extracted 5% of the original dataset using select MySQL queries. The extracted 5% comprises 4 files with a total size of approximately 1.07 GB and about 3 million records.

--------------------------------------------------------------------------

Koroniotis, Nickolaos, Nour Moustafa, Elena Sitnikova, and Benjamin Turnbull. "Towards the development of realistic botnet dataset in the internet of things for network forensic analytics: Bot-iot dataset." Future Generation Computer Systems 100 (2019): 779-796.

Koroniotis, Nickolaos, Nour Moustafa, Elena Sitnikova, and Jill Slay. "Towards developing network forensic mechanism for botnet activities in the iot based on machine learning techniques." In International Conference on Mobile Networks and Management, pp. 30-44. Springer, Cham, 2017.

Koroniotis, Nickolaos, Nour Moustafa, and Elena Sitnikova. "A new network forensic framework based on deep learning for Internet of Things networks: A particle deep framework." Future Generation Computer Systems 110 (2020): 91-106.

Koroniotis, Nickolaos, and Nour Moustafa. "Enhancing network forensics with particle swarm and deep learning: The particle deep framework." arXiv preprint arXiv:2005.00722 (2020).

Koroniotis, Nickolaos, Nour Moustafa, Francesco Schiliro, Praveen Gauravaram, and Helge Janicke. "A Holistic Review of Cybersecurity and Reliability Perspectives in Smart Airports." IEEE Access (2020).

Koroniotis, Nickolaos. "Designing an effective network forensic framework for the investigation of botnets in the Internet of Things." PhD diss., The University of New South Wales Australia, 2020.

--------------------------------------------------------------------------
