https://creativecommons.org/publicdomain/zero/1.0/
The Global Cybersecurity Threats Dataset (2015-2024) provides extensive data on cyberattacks, malware types, targeted industries, and affected countries. It is designed for threat intelligence analysis, cybersecurity trend forecasting, and machine learning model development to enhance global digital security.
| Column Name | Description |
|---|---|
| Country | Country where the attack occurred |
| Year | Year of the incident |
| Threat Type | Type of cybersecurity threat (e.g., Malware, DDoS) |
| Attack Vector | Method of attack (e.g., Phishing, SQL Injection) |
| Affected Industry | Industry targeted (e.g., Finance, Healthcare) |
| Data Breached (GB) | Volume of data compromised |
| Financial Impact ($M) | Estimated financial loss in millions |
| Severity Level | Low, Medium, High, Critical |
| Response Time (Hours) | Time taken to mitigate the attack |
| Mitigation Strategy | Countermeasures taken |
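As a minimal sketch of the trend analysis the description mentions, the snippet below totals the estimated financial impact per year using only the standard library. The column names come from the table above; the sample rows, including the country names and all values, are hypothetical placeholders rather than records from the dataset.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows that follow the documented column names.
# None of these values come from the real dataset.
SAMPLE_CSV = """Country,Year,Threat Type,Attack Vector,Affected Industry,Data Breached (GB),Financial Impact ($M),Severity Level,Response Time (Hours),Mitigation Strategy
Exampleland,2019,Malware,Phishing,Finance,12.5,40.0,High,6,Patching
Exampleland,2019,DDoS,SQL Injection,Healthcare,3.0,10.5,Medium,12,Firewall
Otherland,2020,Malware,Phishing,Finance,8.0,25.0,Critical,4,Training
"""


def impact_by_year(csv_text):
    """Sum the 'Financial Impact ($M)' column for each value of 'Year'."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[int(row["Year"])] += float(row["Financial Impact ($M)"])
    return dict(totals)


yearly = impact_by_year(SAMPLE_CSV)
print(yearly)
```

For the real file, the same function could be fed the text of the downloaded CSV, assuming its header row matches the column names in the table.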
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This Cybersecurity Intrusion Detection Dataset is designed for detecting cyber intrusions based on network traffic and user behavior. Below, I’ll explain each aspect in detail, including the dataset structure, feature importance, possible analysis approaches, and how it can be used for machine learning.
The dataset consists of network-based and user behavior-based features. Each feature provides valuable information about potential cyber threats.
These features describe network-level information such as packet size, protocol type, and encryption methods.
network_packet_size (Packet Size in Bytes)
protocol_type (Communication Protocol)
encryption_used (Encryption Protocol)
These features track user activities, such as login attempts and session duration.
login_attempts (Number of Logins)
session_duration (Session Length in Seconds)
failed_logins (Failed Login Attempts)
unusual_time_access (Login Time Anomaly): binary flag (0 or 1) indicating whether access happened at an unusual time.
ip_reputation_score (Trustworthiness of IP Address)
browser_type (User’s Browser)
attack_detected (target variable): 1 means an attack was detected, 0 means normal activity.
This dataset can be used for intrusion detection systems (IDS) and cybersecurity research. Some key applications include:
Supervised Learning Approaches (with attack_detected as the target)
Deep Learning Approaches
If attack labels are missing, anomaly detection can be used:
- Autoencoders: Learn normal traffic and flag anomalies.
- Isolation Forest: Detects outliers based on feature isolation.
- One-Class SVM: Learns normal behavior and detects deviations.
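The methods listed above usually rely on a machine learning library. As a much simpler stand-in (explicitly not an autoencoder, Isolation Forest, or One-Class SVM), the sketch below flags unusual values in a single feature with a median-based modified z-score. The `failed_logins` values, the function name, and the 3.5 threshold are illustrative assumptions.

```python
import statistics


def robust_outliers(values, threshold=3.5):
    """Return indices whose modified z-score (median/MAD based) exceeds the threshold."""
    med = statistics.median(values)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        # Degenerate case: every value that differs from the median is flagged.
        return [i for i, v in enumerate(values) if v != med]
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]


# Hypothetical failed_logins values for ten sessions.
failed_logins = [0, 1, 0, 2, 1, 0, 1, 14, 0, 1]
suspicious = robust_outliers(failed_logins)
print(suspicious)
```

A univariate rule like this only serves as a baseline; the listed methods consider several features jointly.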
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset was acquired during Cyber Czech, a hands-on cyber defense exercise (Red Team/Blue Team) held in March 2019 at Masaryk University, Brno, Czech Republic. Network traffic flows and a wide variety of event logs were captured in an exercise network deployed in the KYPO Cyber Range Platform.
Contents
The dataset covers two distinct time intervals, which correspond to the official schedule of the exercise. The timestamps provided below are in the ISO 8601 date format.
The captured and collected data were normalized into three distinct event types and they are stored as structured JSON. The data are sorted by a timestamp, which represents the time they were observed. Each event type includes a raw payload ready for further processing and analysis. The description of the respective event types and the corresponding data files follows.
Each archive listed above includes a directory of the same name with the following four files, ready to be processed.
Finally, the exercise network topology is described in the machine-readable NetJSON format. It is part of the auxiliary files archive (auxiliary-material.tgz), which includes the following.
The National Institute of Standards and Technology (NIST) provides a Cybersecurity Framework (CSF) for benchmarking and measuring the maturity level of cybersecurity programs across all industries. The City uses this framework and toolset to measure and report on its internal cybersecurity program.
The foundation for this measure is the Framework Core, a set of cybersecurity activities, desired outcomes, and applicable references that are common across critical infrastructure and industry sectors. These activities come from the published NIST CSF standard, along with the information security and customer privacy controls it references (NIST 800 Series Special Publications). The Framework Core presents industry standards, guidelines, and practices in a manner that allows for communication of cybersecurity activities and outcomes across the organization, from the executive level to the implementation/operations level.
The Framework Core consists of five concurrent and continuous functions: identify, protect, detect, respond, and recover. When considered together, these functions provide a high-level, strategic view of the lifecycle of an organization’s management of cybersecurity risk. The Framework Core identifies underlying key categories and subcategories for each function, and matches them with example references, such as existing standards, guidelines, and practices for each subcategory.
This page provides data for the Cybersecurity performance measure: Cybersecurity Framework (CSF) scores for each CSF category per fiscal year quarter (Performance Measure 5.12). The performance measure dashboard is available at 5.12 Cybersecurity.
Additional Information
Source: Maturity assessment / https://www.nist.gov/topics/cybersecurity
Contact: Scott Campbell
Contact E-Mail: Scott_Campbell@tempe.gov
Data Source Type: Excel
Preparation Method: The data is a summary of a detailed and confidential analysis of the city's cybersecurity program. Maturity scores of subcategories within the NIST CSF are combined, averaged, and rolled up to a summary score for each major category.
Publish Frequency: Annual
Publish Method: Manual
Data Dictionary
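The preparation method (subcategory scores averaged and rolled up per category) can be sketched as follows. This is an illustration under assumptions, not the City's actual procedure: the underlying assessment is confidential, the scores and their scale are hypothetical, and grouping on the identifier prefix before the hyphen (for example `ID.AM` from `ID.AM-1`) is one plausible reading of "rolled up".

```python
from collections import defaultdict
from statistics import mean

# Hypothetical maturity scores keyed by CSF subcategory identifier.
subcategory_scores = {
    "ID.AM-1": 3.0,
    "ID.AM-2": 2.0,
    "PR.AC-1": 4.0,
    "PR.AC-3": 3.0,
    "DE.CM-1": 2.5,
}


def roll_up(scores):
    """Average subcategory scores into one score per category prefix."""
    by_category = defaultdict(list)
    for subcategory, score in scores.items():
        category = subcategory.split("-")[0]  # "ID.AM-1" -> "ID.AM"
        by_category[category].append(score)
    return {category: mean(values) for category, values in by_category.items()}


category_scores = roll_up(subcategory_scores)
print(category_scores)
```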
The dataset "Cyber Security Indexes" includes four indicators that illustrate the current cyber security situation around the world. The data covers 193 countries and territories, grouped into five geographical regions: Africa, North America, South America, Europe, and Asia-Pacific.
The Cybersecurity Exposure Index (CEI) defines the level of exposure to cybercrime by country from 0 to 1; the higher the score, the higher the exposure (provided by 10guard). The indicator was last updated in 2020.
The Global Cyber Security Index (GCI) is a trusted reference that measures the commitment of countries to cybersecurity at a global level – to raise awareness of the importance and different dimensions of the issue (provided by the International Telecommunication Union - ITU). The indicator was last updated in 2021.
The National Cyber Security Index (NCSI) measures a country's readiness to address cyber threats and manage cyber incidents. It is composed of categories, capacities, and indicators (provided by NCSI). The indicator was last updated in January 2023.
The Digital Development Level (DDL) defines the average percentage the country received from the maximum value of both indices (provided by NCSI). The indicator was last updated in January 2023.
The dataset can be used for practising data cleaning, data visualization (on maps and pie/bar charts), finding correlations between the indexes, and predicting the missing data.
The data was used in the research for the analytical article "The Geography of Cybersecurity: Cyber Threats and Vulnerabilities".
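As a small sketch of the correlation exercise suggested above, the function below computes a Pearson correlation between two index columns while skipping countries where either value is missing. The CEI and GCI values shown are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math


def pearson(xs, ys):
    """Pearson correlation over the pairs where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x, _ in pairs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for _, y in pairs))
    return cov / (spread_x * spread_y)


# Hypothetical index values for five countries; None marks a missing entry.
cei = [0.2, 0.4, None, 0.6, 0.8]
gci = [90.0, 70.0, 60.0, 50.0, 30.0]
r = pearson(cei, gci)
print(round(r, 3))
```

Dropping incomplete pairs is only one way to treat missing data; the description also suggests predicting the missing values instead.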
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Business Context: We are in a time where businesses are more digitally advanced than ever, and as technology improves, organizations’ security postures must be enhanced as well. Failure to do so could result in a costly data breach, as we’ve seen happen with many businesses. The cybercrime landscape has evolved, and threat actors are going after any type of organization, so in order to protect your business’s data, money and reputation, it is critical that you invest in an advanced security system. Cyber security can be described as the collective methods, technologies, and processes that help protect the confidentiality, integrity, and availability of computer systems, networks and data against cyber-attacks or unauthorized access.
Information Security vs. Cyber Security vs. Network Security: Information security (also known as InfoSec) ensures that both physical and digital data is protected from unauthorized access, use, disclosure, disruption, modification, inspection, recording or destruction. Information security differs from cyber security in that InfoSec aims to keep data in any form secure, whereas cyber security protects only digital data. Cyber security, a subset of information security, is the practice of defending your organization’s networks, computers and data from unauthorized digital access, attack or damage by implementing various processes, technologies and practices. With countless sophisticated threat actors targeting all types of organizations, it is critical that your IT infrastructure is secured at all times to prevent a full-scale attack on your network that could expose your company’s data and damage its reputation. Network security, a subset of cyber security, aims to protect any data that is being sent through devices in your network to ensure that the information is not changed or intercepted.
The role of network security is to protect the organization’s IT infrastructure from all types of cyber threats, including:
a. Viruses, worms and Trojan horses
b. Zero-day attacks
c. Hacker attacks
d. Denial of service attacks
e. Spyware and adware
Your network security team implements the hardware and software necessary to guard your security architecture. With the proper network security in place, your system can detect emerging threats before they infiltrate your network and compromise your data. There are many components to a network security system that work together to improve your security posture. The most common network security components include:
a. Firewalls
b. Anti-virus software
c. Intrusion detection and prevention systems (IDS/IPS)
d. Virtual private networks (VPN)
Network Intrusions vs. Computer Intrusions vs. Cyber Attacks
1. Computer Intrusions: Computer intrusions occur when someone tries to gain access to any part of your computer system. Computer intruders or hackers typically use automated computer programs when they try to compromise a computer’s security. There are several ways an intruder can try to gain access to your computer. They can:
a. Access your computer to view, change, or delete information on it
b. Crash or slow down your computer
c. Access your private data by examining the files on your system
d. Use your computer to access other computers on the Internet
2. Network Intrusions: A network intrusion refers to any unauthorized activity on a digital network. Network intrusions often involve stealing valuable network resources and almost always jeopardize the security of networks and/or their data. In order to proactively detect and respond to network intrusions, organizations and their cyber security teams need a thorough understanding of how network intrusions work, and they need to implement network intrusion detection and response systems that are designed with attack techniques and cover-up methods in mind.
Network Intrusion Attack Techniques: Given the amount of normal activity constantly taking place on digital networks, it can be very difficult to pinpoint anomalies that could indicate a network intrusion has occurred. Below are some of the most common network intrusion attack techniques that organizations should continually look for: Living Off the Land: Attackers increasingly use existing tools and processes and stolen credentials when compromising networks. These tools like operating system utilities, business productivity software and scripting languages are clearly not malware and have very legitimate usage as well. In fact, in most cases, the vast majority of the usage is business justified, allowing an attacker to blend in. Multi-Routing: If a network allows for asymmetric routing, attackers will often leverage multiple routes to access the targeted device or network. This allows them to avoid being detected by having a large portion of suspicious packets bypass certain network segments and any relevant network intrusion systems. Buffer Overwrit...
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
model_details: description: A comprehensive database and analysis tool for cyber exploits, vulnerabilities, and related information. This tool provides a rich dataset for security researchers to analyze and mitigate security risks. task_categories:
data_analysis
structure:
data/ exploits.csv vulnerabilities.csv
assets/ favicon.svg
.streamlit/ config.toml
main.py data_processor.py visualizations.py README.md
intended_use: Designed for security researchers, developers, and… See the full description on the dataset page: https://huggingface.co/datasets/Canstralian/CyberExploitDB.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The result of the survey.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data set was acquired using a survey that intends to measure:
• Participants' previous experience of cybersecurity training
• Participants' perception of ideal cybersecurity training
• Participants' perception of a specific cybersecurity training type called ContextBased MicroTraining
• What usability aspects the participants find most important for security features
Data was acquired from Sweden, the UK and Italy to allow for comparative analysis. Demographic data was collected to allow for further analysis based on it. The files included in this data set are:
• Completesurvey: This document includes the full survey presented to the participants.
• Dataset: This file contains the variables and data for the different questions (available as .sav (SPSS) and .csv).
• Var_info: Contains information about the variables in the dataset.
• Overview: Contains frequency tables for the survey questions (for the complete data set).
• Sweden, UK, and Italy: Contain frequency tables for the survey questions divided by national sample groups.
See attached description
https://choosealicense.com/licenses/gpl/
Cybersecurity Incident Logs
Dataset Description
Security events including intrusions, DDoS attacks, and malware on telecom infrastructure
Dataset Information
Category: Emerging and Advanced
Format: CSV, Parquet
Rows: 30,000
Columns: 14
Date Generated: 2025-10-05
Location: data/cybersecurity_incident_logs/
Schema
| Column | Type | Sample Values |
|---|---|---|
| incident_id | String | SEC00000001 |
| detected_at | Datetime | 2025-09-30 08:18:00 |
| incident_type | String | … |
See the full description on the dataset page: https://huggingface.co/datasets/electricsheepafrica/nigerian-telecom-cybersecurity-incident-logs.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is part of the research work titled "A Dataset to Train Intrusion Detection Systems based on Machine Learning Models for Electrical Substations," which is currently awaiting approval for publication. The dataset has been meticulously curated to support the development and evaluation of machine learning models tailored for detecting cyber intrusions in the context of electrical substations. It is intended to facilitate research and advancements in cybersecurity for critical infrastructure, specifically focusing on real-world scenarios within electrical substation environments. We encourage its use for experimentation and benchmarking in related areas of study.
The following sections list the content of the dataset generated.
The outcomes of different test executions are available as follows:
Each test consists of the model results in Python pickle format (with a .pkl extension) and a detailed description of the execution conditions in an output log file (with a .log extension).
A snapshot of the source code used to process these files is included under the filename source-code-cybersecurity-datasets-v2.0.zip. For an updated version, please visit the GitHub repository.
https://www.gnu.org/licenses/gpl-3.0.html
This dataset contains web traffic records collected through AWS CloudWatch, aimed at detecting suspicious activities and potential attack attempts.
The data were generated by monitoring traffic to a production web server, using various detection rules to identify anomalous patterns.
In today's cloud environments, cybersecurity is more crucial than ever. The ability to detect and respond to threats in real time can protect organizations from significant consequences. This dataset provides a view of web traffic that has been labeled as suspicious, offering a valuable resource for developers, data scientists, and security experts to enhance threat detection techniques.
Each entry in the dataset represents a stream of traffic to a web server, including the following columns:
bytes_in: Bytes received by the server.
bytes_out: Bytes sent from the server.
creation_time: Timestamp of when the record was created.
end_time: Timestamp of when the connection ended.
src_ip: Source IP address.
src_ip_country_code: Country code of the source IP.
protocol: Protocol used in the connection.
response.code: HTTP response code.
dst_port: Destination port on the server.
dst_ip: Destination IP address.
rule_names: Name of the rule that identified the traffic as suspicious.
observation_name: Observations associated with the traffic.
source.meta: Metadata related to the source.
source.name: Name of the traffic source.
time: Timestamp of the detected event.
detection_types: Type of detection applied.
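To illustrate how the columns above might be used, the sketch below totals inbound bytes per source country and derives each connection's duration from `creation_time` and `end_time`. The three records are hypothetical, and the ISO 8601 timestamp format with an explicit UTC offset is an assumption about how the timestamps are stored.

```python
from collections import Counter
from datetime import datetime

# Hypothetical records that reuse the documented column names.
records = [
    {"src_ip_country_code": "US", "bytes_in": 500,
     "creation_time": "2024-04-25T23:00:00+00:00",
     "end_time": "2024-04-25T23:10:00+00:00"},
    {"src_ip_country_code": "US", "bytes_in": 1500,
     "creation_time": "2024-04-25T23:00:00+00:00",
     "end_time": "2024-04-25T23:05:00+00:00"},
    {"src_ip_country_code": "NL", "bytes_in": 250,
     "creation_time": "2024-04-25T23:10:00+00:00",
     "end_time": "2024-04-25T23:11:00+00:00"},
]


def summarize(rows):
    """Return (bytes_in per country, connection durations in seconds)."""
    bytes_by_country = Counter()
    durations = []
    for row in rows:
        bytes_by_country[row["src_ip_country_code"]] += row["bytes_in"]
        start = datetime.fromisoformat(row["creation_time"])
        end = datetime.fromisoformat(row["end_time"])
        durations.append((end - start).total_seconds())
    return bytes_by_country, durations


bytes_by_country, durations = summarize(records)
print(bytes_by_country.most_common(), durations)
```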
This dataset is ideal for:
https://whoisdatacenter.com/terms-of-use/
Strengthen your cyber defense with our extensive, daily-updated WHOIS database. Accessible in CSV, JSON, and XML, it's a crucial asset for any security strategy.
Attribution 1.0 (CC BY 1.0): https://creativecommons.org/licenses/by/1.0/
License information was derived automatically
This dataset contains all exploits registered on the exploit-db website from 02 January 2019 to 06 November 2020. In this time range, 2,665 exploits were found and stored in a CSV file. The CSV fields are as follows:
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a traffic dataset that contains balanced amounts of encrypted malicious and legitimate traffic for encrypted malicious traffic detection. It is a secondary CSV feature dataset composed from five public traffic datasets, based on three criteria. The first criterion is to combine widely used public datasets that contain both encrypted malicious and legitimate traffic in existing works, such as the Malware Capture Facility Project dataset and the CICIDS-2017 dataset. The second criterion is to ensure data balance, i.e., a balance of malicious and legitimate network traffic and a similar amount of network traffic contributed by each individual dataset. To that end, approximately equal proportions of malicious and legitimate traffic are extracted from each selected public dataset by random sampling. We also ensured that no selected public dataset contributes much more traffic than the others. The third criterion is that our dataset includes encrypted malicious and legitimate traffic from both conventional devices and IoT devices, as these devices are increasingly being deployed and work in the same environments, such as offices, homes, and other smart city settings.
Based on these criteria, 5 public datasets were selected. After data pre-processing, details of each selected public dataset and the final composed dataset are shown in the “Dataset Statistic Analysis Document”. The document summarizes the malicious and legitimate traffic size we selected from each public dataset, the proportion of selected traffic from each public dataset with respect to the total traffic size of the composed dataset (% w.r.t. the composed dataset), the proportion of selected encrypted traffic from each public dataset (% of selected public dataset), and the total traffic size of the composed dataset. From the table, we can observe that each public dataset contributes approximately 20% of the composed dataset, except for CICIDS-2012 (due to its limited amount of encrypted malicious traffic). This achieves a balance across individual datasets and reduces bias towards traffic belonging to any one dataset during learning. We can also observe that the sizes of malicious and legitimate traffic are almost the same, thus achieving class balance. The datasets made available here were prepared for encrypted malicious traffic detection. Since the dataset is used for machine learning model training, sample train and test sets are also provided. The train and test datasets are split at a ratio of 1:4, with stratification applied during the split. Such datasets can be used directly for machine learning or deep learning model training based on selected features.
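A stratified split of the kind described above can be sketched with the standard library alone. The sketch assumes that "1:4" means one part test to four parts train (a 20% test fraction); the source does not say which side is which. The labels and the function name are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split row indices so each class keeps the same share in train and test."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)  # random sampling within each class
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Hypothetical balanced labels: ten malicious and ten legitimate flows.
labels = ["malicious"] * 10 + ["legitimate"] * 10
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Because sampling happens per class, the near 50/50 class balance of the composed dataset carries over to both subsets.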
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains supplementary materials for the following conference paper:
Valdemar Švábenský, Jan Vykopal, Pavel Čeleda. What Are Cybersecurity Education Papers About? A Systematic Literature Review of SIGCSE and ITiCSE Conferences. In Proceedings of the 51st ACM Technical Symposium on Computer Science Education (SIGCSE 2020). https://doi.org/10.1145/3328778.3366816
Preprint available at: https://arxiv.org/abs/1911.11675
How to cite
If you use or build upon the materials, please use the BibTeX entry below to cite the original paper (not only this web link).
@inproceedings{Svabensky2020what,
  author = {\v{S}v\'{a}bensk\'{y}, Valdemar and Vykopal, Jan and \v{C}eleda, Pavel},
  title = {{What Are Cybersecurity Education Papers About? A Systematic Literature Review of SIGCSE and ITiCSE Conferences}},
  booktitle = {Proceedings of the 51st ACM Technical Symposium on Computer Science Education},
  series = {SIGCSE '20},
  location = {Portland, OR, USA},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  month = {03},
  year = {2020},
  pages = {2--8},
  numpages = {7},
  isbn = {978-1-4503-6793-6},
  url = {https://doi.org/10.1145/3328778.3366816},
  doi = {10.1145/3328778.3366816},
}
Attached content
The file "SIGCSE 2020 Literature Review.xlsx" is an Excel spreadsheet with three sheets corresponding to 1) all papers found by automated search, 2) manually excluded papers, and 3) papers included in the literature review. There are also three CSV files that correspond to the three individual sheets.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
For the home environment we have: 01 Wifi Modem Router, 03 Smartphones, 01 server, 01 desktop, 01 Multifunction Printer, 01 network extender, 01 SmartTV, 01 Cable TV decoder and 01 firewall. This environment is a local network. The server has the Monitoring Environment and a network card, which provides connectivity and receives all network traffic for analysis.
The results were obtained from Suricata and from Telegraf collections in the TICK stack. All evidence was gathered through queries in EveBox, which received data from Suricata, and through Grafana charts, with information extracted from the InfluxDB (Grafana) and PostgreSQL (EveBox) databases.
events.csv.gz - Suricata / Evebox collections
net.csv.gz - Telegraf collections from the TICK stack
netstat.csv.gz - Telegraf collections from the TICK stack
For correlation purposes, use the events.csv.gz file as a basis. The correlation key is the 'timestamp' column in events.csv.gz, matched against the 'time' column in the net.csv.gz and netstat.csv.gz files.
Data were collected over non-consecutive intervals between 2018-09-15 and 2019-02-04.
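The correlation described above amounts to a join on the time columns. The sketch below shows one way to do it, using small in-memory CSV strings in place of the real files. Only the `timestamp` and `time` column names come from the description; the other columns, the sample values, and the assumption that matching timestamps are written identically are hypothetical.

```python
import csv
import io

# Hypothetical stand-ins for events.csv.gz and net.csv.gz.
EVENTS_CSV = """timestamp,event_type
2018-09-15T10:00:00,alert
2018-09-15T10:00:10,flow
2018-09-15T10:00:20,alert
"""
NET_CSV = """time,bytes_recv
2018-09-15T10:00:00,1200
2018-09-15T10:00:20,5400
"""


def join_on_time(events_text, net_text):
    """Attach the net row whose 'time' equals each event's 'timestamp'."""
    net_by_time = {
        row["time"]: row for row in csv.DictReader(io.StringIO(net_text))
    }
    joined = []
    for event in csv.DictReader(io.StringIO(events_text)):
        match = net_by_time.get(event["timestamp"])
        if match is not None:
            joined.append({**event, **match})
    return joined


joined = join_on_time(EVENTS_CSV, NET_CSV)
print(joined)
```

The real files are gzip-compressed, so they would first need to be opened as text, for example with `gzip.open(path, "rt")`. If the two sources record time at different resolutions, the timestamps would also need rounding before the exact-match lookup can succeed.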
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
The AdDDoSDN dataset is a comprehensive network traffic corpus built for defensive SDN research, capturing coordinated DDoS attacks and benign enterprise activity through controlled Mininet experiments driven by a remote Ryu L3 controller to deliver high-quality labeled data for real-time detection development. The environment emulates a segmented four-subnet enterprise: h1 (192.168.10.10/24) acts as the external attacker, h2–h5 (192.168.20.10–13/24) form the corporate client subnet with h2 handling ICMP exchanges and h3/h5 generating rich TCP and UDP application sessions, h6 (192.168.30.10/24) resides in the server/DMZ subnet as the primary victim, and controller services operate on 192.168.0.0/24, providing realistic inter-subnet attack paths while preserving centralized SDN visibility.
The dataset follows a structured, configurable timeline sourced from config.json, with the default cycle spanning roughly 35 minutes per run: a 5-second initialization period, 1,600 seconds of benign traffic mixing ICMP, Telnet, SSH, FTP, HTTP/S, and DNS exchanges, enhanced traditional attacks from h1 including an 88-second SYN flood and 176-second UDP flood against h6, plus an 88-second ICMP flood toward h4, and adversarial attacks from h1 to h6 comprising a 72-second TCP state-exhaustion phase with human-like timing patterns, a 24-second application-layer mimicry burst combining heavy HTTP range/post requests with legitimate queries, and a 72-second slow-read phase sustaining long-lived connections. Traditional phases operate around 20–30 packets per second with protocol-compliant options, while adversarial scripts emphasize mimicry and timing jitter.
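As a quick check of the "roughly 35 minutes" figure, the phase durations quoted above can be summed. The sketch assumes the phases run back to back with no gaps, which the description does not state explicitly; the dictionary keys are informal labels, not names taken from config.json.

```python
# Phase durations in seconds, as quoted in the description above.
phases = {
    "initialization": 5,
    "benign_traffic": 1600,
    "syn_flood": 88,
    "udp_flood": 176,
    "icmp_flood": 88,
    "tcp_state_exhaustion": 72,
    "application_layer_mimicry": 24,
    "slow_read": 72,
}

total_seconds = sum(phases.values())
total_minutes = total_seconds / 60
print(f"{total_seconds} s = {total_minutes:.1f} min")  # 2125 s = 35.4 min
```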
The dataset provides three synchronized data products derived from each capture cycle:
1. Packet-level data (adddosdn_packet_dataset.csv): 30 header fields + 2 labels extracted directly from PCAP phases.
2. SDN flow-level data (adddosdn_flow_dataset.csv): Controller statistics with derived rates and labels collected via the Ryu REST API.
3. CICFlow aggregated data (adddosdn_cicflow_dataset.csv): 85 bidirectional behavioral features generated with CICFlowMeter.
The dataset contains 3.5 million total records across dataset instances, each representing a different temporal scenario. Labels span normal, syn_flood, udp_flood, icmp_flood, ad_syn, ad_udp, and ad_slow, with Label_binary collapsing them into benign (0) versus malicious (1) classes to maintain consistency across packet, controller-flow, and behavioral representations.
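The collapse from the multi-class labels to Label_binary can be written as a small mapping. The label strings are the ones listed above; the function name is hypothetical, and raising an error on unknown labels is a design choice of this sketch rather than documented dataset behavior.

```python
ATTACK_LABELS = {
    "syn_flood", "udp_flood", "icmp_flood",  # traditional attacks
    "ad_syn", "ad_udp", "ad_slow",           # adversarial attacks
}


def to_label_binary(label):
    """Map a multi-class label to 0 (benign) or 1 (malicious)."""
    if label == "normal":
        return 0
    if label in ATTACK_LABELS:
        return 1
    raise ValueError(f"unexpected label: {label!r}")


sample = ["normal", "syn_flood", "ad_slow", "normal"]
binary = [to_label_binary(label) for label in sample]
print(binary)
```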
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is related to the operation of the second KIOS CoE sandboxing use case (SUC2), which includes three scenarios (S1-S3) that examine the behaviour of a wide area protection (WAP) scheme for power grids in case of a short-circuit fault and in case of two types of cyber attacks. The architecture of the University of Cyprus/KIOS CoE sandboxing environment used for extracting these datasets, along with the full list of scenarios and their detailed implementation, is described in the supporting documents. A brief description of each of the three SUC2 scenarios is provided below.
The dataset for the first scenario (S1) of SUC2 examines the operation of a wide area protection scheme on a transmission line, which receives data sent from PMUs at the two ends of the line, when a short-circuit fault occurs on the transmission line between buses 7 and 8 of the system. More details about scenario SUC2/S1 can be found in Section 1.3.1 of the SUC2 supporting document. The dataset includes electrical measurements of the current flow in line 7-8 (of the IEEE 9-bus system), in both magnitude and sinusoidal form. It is provided as time-series measurements in MATLAB (.mat) and CSV files, which were recorded with a 30-second and 40-second time resolution, respectively. The RMS values were recorded by the Typhoon controller as they were sent by the two PMUs, while the sine wave measurements were recorded through the OPAL-RT.
The dataset for the second scenario (S2) of SUC2 investigates the operation of a wide area protection scheme that receives data sent from PMUs when a MITM FDI cyber-attack is conducted on the measurements of bus 7. The attack is virtually implemented within the sandbox and introduces a multiplicative change to the current measurements before they are received by the Typhoon controller via the IEEE C37.118 protocol. Section 1.3.2 of the SUC2 supporting document provides more details about this scenario. The dataset includes electrical measurements of the current flow, in magnitude and sinusoidal format, of the transmission line between buses 7 and 8 of the digital twin of the IEEE 9-bus system. It is provided as time-series measurements in MATLAB (.mat) and CSV files, which were recorded with a 30-second and 40-second time resolution, respectively. The magnitude values were recorded by the Typhoon controller, while the sinusoidal waveform data were recorded by the OPAL-RT.
The dataset for the third scenario (SUC2/S3) examines the operation of a wide area protection scheme that receives data sent from PMUs when a combined MITM and DoS cyber-attack is conducted, as an actual attack, in the isolated communication network of the sandboxing environment. The attack disrupts the C37.118 UDP communication exchanged between the OPAL-RT 5707, where the digital twin of the IEEE 9-bus system was implemented, and the Typhoon controller. More details about this scenario can be found in Section 1.3.3 of the SUC2 supporting document. The dataset includes electrical measurements of the current flow magnitude of the transmission line between buses 7 and 8 of the digital twin of the IEEE 9-bus system. It was recorded by the Typhoon controller and is provided as time-series measurements in MATLAB (.mat) and CSV files, which were recorded with a 30-second and 40-second time resolution, respectively. In addition, the dataset includes network traffic packets captured as .pcapng and .csv files.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
MedSec-25 is a comprehensive, labeled network traffic dataset designed specifically for the Internet of Medical Things (IoMT) in healthcare environments. It addresses the limitations of existing generic IoT datasets by capturing realistic traffic from a custom-built healthcare IoT lab that mimics real-world hospital operations. The dataset includes both benign (normal) traffic and malicious traffic from multi-staged attack campaigns inspired by the MITRE ATT&CK framework. This allows for the development and evaluation of machine learning-based intrusion detection systems (IDS) tailored to IoMT scenarios, where patient safety and data privacy are critical. The dataset was generated using a variety of medical sensors (e.g., ECG, EEG, HHI, Respiration, SpO2) and environmental sensors (e.g., thermistor, ultrasonic, PIR, flame) connected via Raspberry Pi nodes and an IoT server. Traffic was captured over 7.5 hours using tools like Wireshark and tcpdump, resulting in PCAPNG files. These were processed with CICFlowMeter to extract flow-based features, producing a cleaned CSV dataset with 554,534 bidirectional network flows and 84 features.
Realistic Setup: Built in a physical lab at Rochester Institute of Technology, Dubai, incorporating diverse IoMT devices, protocols (e.g., MQTT, SSH, Telnet, FTP, HTTP, DNS), and real-time patient interactions (anonymized to comply with privacy regulations like HIPAA).
Multi-Staged Attacks: Unlike datasets focusing on isolated attacks, MedSec-25 simulates full attack chains: Reconnaissance (e.g., SYN/TCP scans, OS fingerprinting), Initial Access (e.g., brute-force, malformed MQTT packets), Lateral Movement (e.g., exploiting vulnerabilities to pivot between devices), and Exfiltration (e.g., data theft via MQTT).
Imbalanced Nature: This is the cleaned (imbalanced) version of the dataset. Users may need to apply balancing techniques (e.g., SMOTE oversampling + random undersampling) for model training, as demonstrated in the associated paper.
Size and Quality: 554,534 rows, no duplicates, no missing values (except 111 NaNs in Flow Byts/s, ~0.02%, which can be handled via imputation). Data types include float64 (45 columns), int64 (34 columns), and object (5 columns: Flow ID, Src IP, Dst IP, Timestamp, Label).
Utility: Preliminary models trained on this dataset (e.g., KNN: 98.09% accuracy, Decision Tree: 98.35% accuracy) show excellent performance for detecting attack stages.
This dataset is ideal for researchers in cybersecurity, machine learning, and healthcare IoT, enabling the creation of an IDS that can detect attacks at different phases to prevent escalation.
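The size and quality figures quoted above (row count, duplicates, NaN counts, dtype breakdown) can be re-checked after download with a few pandas calls. The following is a minimal sketch, not the authors' tooling: the helper name `summarize_quality` and the file name in the comment are hypothetical, and the small `sample` frame is synthetic stand-in data rather than real MedSec-25 rows.

```python
import numpy as np
import pandas as pd


def summarize_quality(df: pd.DataFrame) -> dict:
    """Collect row count, duplicate rows, NaNs per column, and dtype counts."""
    nan_counts = df.isna().sum()
    return {
        "rows": len(df),
        "duplicates": int(df.duplicated().sum()),
        # Keep only the columns that actually contain missing values
        "nan_by_column": {col: int(n) for col, n in nan_counts.items() if n > 0},
        "dtype_counts": {str(k): int(v) for k, v in df.dtypes.value_counts().items()},
    }


# Synthetic stand-in; with the real data one would load the CSV instead,
# e.g. df = pd.read_csv("medsec25_cleaned.csv")  # hypothetical file name
sample = pd.DataFrame({
    "Flow Duration": [120, 45, 300, 45],
    "Flow Byts/s": [1500.0, np.nan, 320.5, 88.0],
    "Label": ["Benign", "Reconnaissance", "Exfiltration", "Reconnaissance"],
})

report = summarize_quality(sample)
print(report)
```

Run against the full CSV, the same report should line up with the figures stated in the description if the file matches what is described.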
Benign Traffic: Generated over two days with active sensors, services (HTTP dashboard for patient monitoring, SSH/Telnet for remote access, FTP for file transfers), and real users (students/faculty) interacting with medical devices. No personally identifiable information was stored.
Malicious Traffic: Two Kali Linux attacker machines simulated MITRE ATT&CK-inspired campaigns using tools like Nmap, Scapy, Metasploit, and custom Python scripts.
Capture Tools: Wireshark and tcpdump for PCAPNG files (total ~1GB: 600MB benign, 400MB malicious).
Processing: Combined PCAP files per label, extracted features with CICFlowMeter, labeled flows manually based on attack phases, and cleaned for ML readiness. The final cleaned CSV is ~350MB.
The dataset includes 84 features extracted by CICFlowMeter, categorized as:
Identifiers: Flow ID, Src IP, Src Port, Dst IP, Dst Port, Protocol, Timestamp.
Time-Series Metrics: Flow Duration, Flow IAT Mean/Std/Max/Min, Fwd/Bwd IAT Tot/Mean/Std/Max/Min.
Size/Count Statistics: Tot Fwd/Bwd Pkts, TotLen Fwd/Bwd Pkts, Fwd/Bwd Pkt Len Max/Min/Mean/Std, Pkt Len Min/Max/Mean/Std/Var, Pkt Size Avg.
Flag Counts: Fwd/Bwd PSH/URG Flags, FIN/SYN/RST/PSH/ACK/URG/CWE/ECE Flag Cnt.
Rates and Ratios: Flow Byts/s, Flow Pkts/s, Fwd/Bwd Pkts/s, Down/Up Ratio, Active/Idle Mean/Std/Max/Min.
Segmentation and Others: Fwd/Bwd Seg Size Avg/Min, Subflow Fwd/Bwd Pkts/Byts, Init Fwd/Bwd Win Byts, Fwd Act Data Pkts, Fwd/Bwd Byts/b Avg, Fwd/Bwd Pkts/b Avg, Fwd/Bwd Blk Rate Avg.
The dataset is labeled with 5 classes representing benign behavior and attack stages:
Reconnaissance: 401,683 flows
Initial Access: 102,090 flows
Exfiltration: 25,915 flows
Lateral Movement: 12,498 flows
Benign: 12,348 flows
Note: The dataset is imbalanced, with Reconnaissance dominating. Apply balancing techniques for optimal ML performance.
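To make the imbalance concrete, the class counts listed above can be turned into shares and inverse-frequency class weights. This is a sketch of class weighting, a simpler alternative to the SMOTE plus undersampling approach mentioned in the description, and not the procedure from the associated paper. The weighting formula `total / (n_classes * count)` is the common "balanced" heuristic and is an assumption here.

```python
# Class counts as listed in the dataset description
counts = {
    "Reconnaissance": 401_683,
    "Initial Access": 102_090,
    "Exfiltration": 25_915,
    "Lateral Movement": 12_498,
    "Benign": 12_348,
}

total = sum(counts.values())  # 554,534 flows in total
n_classes = len(counts)

# Fraction of all flows that belongs to each class
shares = {label: n / total for label, n in counts.items()}

# "Balanced" heuristic: rarer classes receive proportionally larger weights
weights = {label: total / (n_classes * n) for label, n in counts.items()}

for label in counts:
    print(f"{label:<17} share={shares[label]:7.2%}  weight={weights[label]:.3f}")
```

Many classifiers accept a per-class weight mapping of this kind, which avoids resampling the data at all; whether that works as well as resampling for this dataset is not something the description states.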
Preprocessing Suggestions: Encode categorical features (e.g., Protocol, Label) using LabelEncoder. Normalize numerical features with Min-Max Scaler or StandardScaler. Handle the minor NaNs in Flow Byts/s via mean imputation.
Model Training: Split into train/test (e.g., 80/20). Suitable for classification tasks w...
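The preprocessing and splitting suggestions above can be sketched with pandas and scikit-learn as follows. This is one possible arrangement under stated assumptions, not the pipeline used by the dataset authors: the function name `prepare` is hypothetical, dropping the identifier columns is an added assumption, the split is stratified by label, and the imputation means and scaler are fitted on the training portion only so that no test statistics leak into training. The `demo` frame is synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler


def prepare(df, label_col="Label", test_size=0.2, seed=42):
    """Encode labels, split 80/20, then impute and scale using train statistics."""
    # Assumption: per-flow identifiers are dropped rather than used as features
    id_cols = [c for c in ("Flow ID", "Src IP", "Dst IP", "Timestamp") if c in df.columns]
    X = df.drop(columns=id_cols + [label_col])

    encoder = LabelEncoder()
    y = encoder.fit_transform(df[label_col])

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed
    )

    # Mean imputation, with means computed on the training split only
    means = X_train.mean()
    X_train = X_train.fillna(means)
    X_test = X_test.fillna(means)

    # Standardize features using the scaler fitted on the training split
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)
    return X_train, X_test, y_train, y_test, encoder


# Synthetic demo data with one missing value in Flow Byts/s
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "Flow ID": [f"flow-{i}" for i in range(20)],
    "Flow Duration": rng.integers(1, 1000, size=20),
    "Flow Byts/s": rng.random(20) * 1e4,
    "Label": ["Benign"] * 10 + ["Reconnaissance"] * 10,
})
demo.loc[3, "Flow Byts/s"] = np.nan

X_train, X_test, y_train, y_test, encoder = prepare(demo)
print(X_train.shape, X_test.shape, list(encoder.classes_))
```

Swapping `StandardScaler` for `MinMaxScaler` gives the Min-Max variant mentioned above; any resampling for class balance would normally be applied to the training split only.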