100+ datasets found
  1. 🌐 Global Cybersecurity Threats (2015-2024)

    • kaggle.com
    zip
    Updated Mar 16, 2025
    Cite
    Atharva Soundankar (2025). 🌐 Global Cybersecurity Threats (2015-2024) [Dataset]. https://www.kaggle.com/datasets/atharvasoundankar/global-cybersecurity-threats-2015-2024
    Explore at:
    Available download formats: zip (48178 bytes)
    Dataset updated
    Mar 16, 2025
    Authors
    Atharva Soundankar
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    📂

    The Global Cybersecurity Threats Dataset (2015-2024) provides extensive data on cyberattacks, malware types, targeted industries, and affected countries. It is designed for threat intelligence analysis, cybersecurity trend forecasting, and machine learning model development to enhance global digital security.

    📊 Column Descriptions

    Column Name: Description

    Country: Country where the attack occurred
    Year: Year of the incident
    Threat Type: Type of cybersecurity threat (e.g., Malware, DDoS)
    Attack Vector: Method of attack (e.g., Phishing, SQL Injection)
    Affected Industry: Industry targeted (e.g., Finance, Healthcare)
    Data Breached (GB): Volume of data compromised
    Financial Impact ($M): Estimated financial loss in millions
    Severity Level: Low, Medium, High, Critical
    Response Time (Hours): Time taken to mitigate the attack
    Mitigation Strategy: Countermeasures taken
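
As a rough illustration of how the columns above might be used, the sketch below sums the estimated financial impact per threat type. The sample rows are hypothetical, and the header names are taken from the column descriptions; the headers in the actual CSV file may differ.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows. Header names follow the column descriptions
# above and are an assumption about the real CSV file.
SAMPLE_CSV = """Country,Year,Threat Type,Financial Impact ($M),Severity Level
Germany,2019,Malware,12.5,High
Brazil,2019,DDoS,3.0,Low
Japan,2020,Malware,7.5,Critical
"""


def total_impact_by_threat(csv_text):
    """Sum the 'Financial Impact ($M)' column for each 'Threat Type'."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["Threat Type"]] += float(row["Financial Impact ($M)"])
    return dict(totals)


print(total_impact_by_threat(SAMPLE_CSV))
```

The same loop works on the downloaded file if `io.StringIO(csv_text)` is replaced by an open file handle.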
  2. Cybersecurity 🪪 Intrusion 🦠 Detection Dataset

    • kaggle.com
    Updated Feb 10, 2025
    Cite
    Dinesh Naveen Kumar Samudrala (2025). Cybersecurity 🪪 Intrusion 🦠 Detection Dataset [Dataset]. https://www.kaggle.com/datasets/dnkumars/cybersecurity-intrusion-detection-dataset
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Feb 10, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Dinesh Naveen Kumar Samudrala
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    This Cybersecurity Intrusion Detection Dataset is designed for detecting cyber intrusions based on network traffic and user behavior. Below, I’ll explain each aspect in detail, including the dataset structure, feature importance, possible analysis approaches, and how it can be used for machine learning.

    1. Understanding the Features

    The dataset consists of network-based and user behavior-based features. Each feature provides valuable information about potential cyber threats.

    A. Network-Based Features

    These features describe network-level information such as packet size, protocol type, and encryption methods.

    1. network_packet_size (Packet Size in Bytes)

      • Represents the size of network packets, ranging from 64 to 1500 bytes.
      • Packets on the lower end (~64 bytes) may indicate control messages, while larger packets (~1500 bytes) often carry bulk data.
      • Attackers may use abnormally small or large packets for reconnaissance or exploitation attempts.
    2. protocol_type (Communication Protocol)

      • The protocol used in the session: TCP, UDP, or ICMP.
      • TCP (Transmission Control Protocol): Reliable, connection-oriented (common for HTTP, HTTPS, SSH).
      • UDP (User Datagram Protocol): Faster but less reliable (used for VoIP, streaming).
      • ICMP (Internet Control Message Protocol): Used for network diagnostics (ping); often abused in Denial-of-Service (DoS) attacks.
    3. encryption_used (Encryption Protocol)

      • Values: AES, DES, None.
      • AES (Advanced Encryption Standard): Strong encryption, commonly used.
      • DES (Data Encryption Standard): Older encryption, weaker security.
      • None: Indicates unencrypted communication, which can be risky.
      • Attackers might use no encryption to avoid detection or weak encryption to exploit vulnerabilities.

    B. User Behavior-Based Features

    These features track user activities, such as login attempts and session duration.

    1. login_attempts (Number of Logins)

      • High values might indicate brute-force attacks (repeated login attempts).
      • Typical users have 1–3 login attempts, while an attack may have hundreds or thousands.
    2. session_duration (Session Length in Seconds)

      • A very long session might indicate unauthorized access or persistence by an attacker.
      • Attackers may try to stay connected to maintain access.
    3. failed_logins (Failed Login Attempts)

      • High failed login counts indicate credential stuffing or dictionary attacks.
      • Many failed attempts followed by a successful login could suggest an account was compromised.
    4. unusual_time_access (Login Time Anomaly)

      • A binary flag (0 or 1) indicating whether access happened at an unusual time.
      • Attackers often operate outside normal business hours to evade detection.
    5. ip_reputation_score (Trustworthiness of IP Address)

      • A score from 0 to 1, where higher values indicate suspicious activity.
      • IP addresses associated with botnets, spam, or previous attacks tend to have higher scores.
    6. browser_type (User’s Browser)

      • Common browsers: Chrome, Firefox, Edge, Safari.
      • Unknown: Could be an indicator of automated scripts or bots.

    2. Target Variable (attack_detected)

    • Binary classification: 1 means an attack was detected, 0 means normal activity.
    • The dataset is useful for supervised machine learning, where a model learns from labeled attack patterns.

    3. Possible Use Cases

    This dataset can be used for intrusion detection systems (IDS) and cybersecurity research. Some key applications include:

    A. Machine Learning-Based Intrusion Detection

    1. Supervised Learning Approaches

      • Classification Models (Logistic Regression, Decision Trees, Random Forest, XGBoost, SVM)
      • Train the model using labeled data (attack_detected as the target).
      • Evaluate using accuracy, precision, recall, F1-score.
    2. Deep Learning Approaches

      • Use Neural Networks (DNN, LSTM, CNN) for pattern recognition.
      • LSTMs work well for time-series-based network traffic analysis.
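
The evaluation metrics named under the supervised approaches (accuracy, precision, recall, F1-score) can be computed from predicted and true `attack_detected` labels as in the minimal sketch below. The label vectors are hypothetical, and the function name is an illustrative choice, not part of the dataset.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary labels (1 = attack)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # attacks correctly flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal traffic correctly passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed attacks

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels for eight sessions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(classification_metrics(y_true, y_pred))
```

For intrusion detection, recall (the share of attacks caught) and precision (the share of alerts that are real) are usually more informative than accuracy alone when attacks are rare.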

    B. Anomaly Detection (Unsupervised Learning)

    If attack labels are missing, anomaly detection can be used:

    • Autoencoders: Learn normal traffic and flag anomalies.
    • Isolation Forest: Detects outliers based on feature isolation.
    • One-Class SVM: Learns normal behavior and detects deviations.

    C. Rule-Based Detection

    • If certain thresholds are met (e.g., failed_logins > 10 & ip_reputation_score > 0.8), an alert is triggered.
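
The threshold rule quoted above can be written as a small predicate. This is a sketch only: the thresholds 10 and 0.8 come from the example in the description, while the function name and the dictionary-based record layout are assumptions made for illustration.

```python
def rule_based_alert(record, max_failed_logins=10, max_ip_score=0.8):
    """Return True when both thresholds from the example rule are exceeded."""
    return (record["failed_logins"] > max_failed_logins
            and record["ip_reputation_score"] > max_ip_score)


# Hypothetical session records using the dataset's feature names.
sessions = [
    {"failed_logins": 14, "ip_reputation_score": 0.91},  # both exceeded
    {"failed_logins": 14, "ip_reputation_score": 0.20},  # only failed logins
    {"failed_logins": 1, "ip_reputation_score": 0.95},   # only IP score
]
for session in sessions:
    print(session, "->", rule_based_alert(session))
```

Keeping the thresholds as parameters makes it easy to tune the rule without touching the logic.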

    4. Challenges & Considerations

    • Adversarial Attacks: Attackers may modify traffic to evade detection.
    • Concept Drift: Cyber threats...
  3. Data from: Traffic and Log Data Captured During a Cyber Defense Exercise

    • zenodo.org
    • data.niaid.nih.gov
    • +1 more
    application/gzip
    Updated Jun 12, 2020
    Cite
    Daniel Tovarňák; Stanislav Špaček; Jan Vykopal (2020). Traffic and Log Data Captured During a Cyber Defense Exercise [Dataset]. http://doi.org/10.5281/zenodo.3746129
    Explore at:
    Available download formats: application/gzip
    Dataset updated
    Jun 12, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Daniel Tovarňák; Stanislav Špaček; Jan Vykopal
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset was acquired during Cyber Czech – a hands-on cyber defense exercise (Red Team/Blue Team) held in March 2019 at Masaryk University, Brno, Czech Republic. Network traffic flows and a high variety of event logs were captured in an exercise network deployed in the KYPO Cyber Range Platform.

    Contents

    The dataset covers two distinct time intervals, which correspond to the official schedule of the exercise. The timestamps provided below are in the ISO 8601 date format.

    • Day 1, March 19, 2019
      • Start: 2019-03-19T11:00:00.000000+01:00
      • End: 2019-03-19T18:00:00.000000+01:00
    • Day 2, March 20, 2019
      • Start: 2019-03-20T08:00:00.000000+01:00
      • End: 2019-03-20T15:30:00.000000+01:00
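
Because the interval bounds above are ISO 8601 timestamps with a UTC offset, they can be parsed directly with the standard library. The sketch below computes the length of each interval and checks whether an arbitrary timestamp falls inside the exercise schedule; the helper names are illustrative and not part of the dataset.

```python
from datetime import datetime

# Interval bounds copied from the schedule above.
INTERVALS = {
    "Day 1": ("2019-03-19T11:00:00.000000+01:00", "2019-03-19T18:00:00.000000+01:00"),
    "Day 2": ("2019-03-20T08:00:00.000000+01:00", "2019-03-20T15:30:00.000000+01:00"),
}


def interval_hours(start, end):
    """Length of an interval in hours, given two ISO 8601 strings."""
    delta = datetime.fromisoformat(end) - datetime.fromisoformat(start)
    return delta.total_seconds() / 3600


def in_exercise(timestamp):
    """True if an offset-aware ISO 8601 timestamp lies within either interval."""
    moment = datetime.fromisoformat(timestamp)
    return any(
        datetime.fromisoformat(start) <= moment <= datetime.fromisoformat(end)
        for start, end in INTERVALS.values()
    )


for day, (start, end) in INTERVALS.items():
    print(day, interval_hours(start, end), "hours")
```

Comparisons between offset-aware datetimes account for the time zone, so a timestamp expressed in UTC is compared correctly against the +01:00 bounds.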

    The captured and collected data were normalized into three distinct event types and they are stored as structured JSON. The data are sorted by a timestamp, which represents the time they were observed. Each event type includes a raw payload ready for further processing and analysis. The description of the respective event types and the corresponding data files follows.

    • cz.muni.csirt.IpfixEntry.tgz – an archive of IPFIX traffic flows enriched with an additional payload of parsed application protocols in raw JSON.
    • cz.muni.csirt.SyslogEntry.tgz – an archive of Linux Syslog entries with the payload of corresponding text-based log messages.
    • cz.muni.csirt.WinlogEntry.tgz – an archive of Windows Event Log entries with the payload of original events in raw XML.

    Each archive listed above includes a directory of the same name with the following four files, ready to be processed.

    • data.json.gz – the actual data entries in a single gzipped JSON file.
    • dictionary.yml – data dictionary for the entries.
    • schema.ddl – data schema for Apache Spark analytics engine.
    • schema.jsch – JSON schema for the entries.
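
A minimal sketch of streaming entries from one of the `data.json.gz` files follows. It assumes the file stores one JSON object per line, which the description does not state; if the file is instead a single JSON array, `json.load` on the opened handle would be needed. The function name is illustrative.

```python
import gzip
import json


def iter_entries(path):
    """Yield one parsed entry per non-empty line of a gzipped JSON file.

    Assumption: the file is line-delimited JSON (one object per line).
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)
```

Streaming line by line avoids loading a whole archive into memory, for example `for entry in iter_entries("cz.muni.csirt.SyslogEntry/data.json.gz"): ...` after extracting the corresponding `.tgz` archive.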

    Finally, the exercise network topology is described in a machine-readable NetJSON format and it is a part of a set of auxiliary files archive – auxiliary-material.tgz – which includes the following.

    • global-gateway-config.json – the network configuration of the global gateway in the NetJSON format.
    • global-gateway-routing.json – the routing configuration of the global gateway in the NetJSON format.
    • redteam-attack-schedule.{csv,odt} – the schedule of the Red Team attacks in CSV and ODT format. Source for Table 2.
    • redteam-reserved-ip-ranges.{csv,odt} – the list of IP segments reserved for the Red Team in CSV and ODT format. Source for Table 1.
    • topology.{json,pdf,png} – the topology of the complete Cyber Czech exercise network in the NetJSON, PDF and PNG format.
    • topology-small.{pdf,png} – simplified topology in the PDF and PNG format. Source for Figure 1.

  4. 5.12 Cybersecurity (detail)

    • catalog.data.gov
    • data-academy.tempe.gov
    • +7 more
    Updated Aug 11, 2025
    + more versions
    Cite
    City of Tempe (2025). 5.12 Cybersecurity (detail) [Dataset]. https://catalog.data.gov/dataset/5-12-cybersecurity-detail-d8bb7
    Explore at:
    Dataset updated
    Aug 11, 2025
    Dataset provided by
    City of Tempe
    Description

    The National Institute of Standards and Technology (NIST) provides a Cybersecurity Framework (CSF) for benchmarking and measuring the maturity level of cybersecurity programs across all industries. The City uses this framework and toolset to measure and report on its internal cybersecurity program.

    The foundation for this measure is the Framework Core, a set of cybersecurity activities, desired outcomes, and applicable references that are common across critical infrastructure and industry sectors. These activities come from the published NIST CSF standard, along with the information security and customer privacy controls it references (NIST 800 Series Special Publications). The Framework Core presents industry standards, guidelines, and practices in a manner that allows for communication of cybersecurity activities and outcomes across the organization, from the executive level to the implementation/operations level.

    The Framework Core consists of five concurrent and continuous functions: identify, protect, detect, respond, and recover. When considered together, these functions provide a high-level, strategic view of the lifecycle of an organization’s management of cybersecurity risk. The Framework Core identifies underlying key categories and subcategories for each function, and matches them with example references, such as existing standards, guidelines, and practices for each subcategory.

    This page provides data for the Cybersecurity performance measure: Cybersecurity Framework (CSF) scores by each CSF category per fiscal year quarter (Performance Measure 5.12). The performance measure dashboard is available at 5.12 Cybersecurity.
    Additional Information

    Source: Maturity assessment / https://www.nist.gov/topics/cybersecurity
    Contact: Scott Campbell
    Contact E-Mail: Scott_Campbell@tempe.gov
    Data Source Type: Excel
    Preparation Method: The data is a summary of a detailed and confidential analysis of the city's cybersecurity program. Maturity scores of subcategories within the NIST CSF are combined, averaged, and rolled up to a summary score for each major category.
    Publish Frequency: Annual
    Publish Method: Manual
    Data Dictionary
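
The roll-up described under Preparation Method (subcategory scores averaged into one summary score per category) might look like the sketch below. The scores are hypothetical, since the underlying analysis is confidential, and a plain arithmetic mean is an assumption: the source does not say whether any weighting is applied.

```python
from statistics import mean

# Hypothetical subcategory maturity scores, grouped by CSF function.
SUBCATEGORY_SCORES = {
    "Identify": [3.0, 2.0, 4.0],
    "Protect": [2.5, 3.5],
    "Detect": [1.0, 2.0],
}


def roll_up(scores_by_category):
    """Average the subcategory scores into one summary score per category."""
    return {category: mean(scores)
            for category, scores in scores_by_category.items()}


print(roll_up(SUBCATEGORY_SCORES))
```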

  5. Cyber Security Indexes

    • kaggle.com
    zip
    Updated Apr 16, 2023
    Cite
    Kateryna Meleshenko (2023). Cyber Security Indexes [Dataset]. https://www.kaggle.com/datasets/katerynameleshenko/cyber-security-indexes
    Explore at:
    Available download formats: zip (3421 bytes)
    Dataset updated
    Apr 16, 2023
    Authors
    Kateryna Meleshenko
    Description

    The dataset "Cyber Security Indexes" includes four indicators which illustrate the current cyber security situation around the world. The data is provided for 193 countries and territories, grouped into five geographical regions: Africa, North America, South America, Europe and Asia-Pacific.

    The Cybersecurity Exposure Index (CEI) defines the level of exposure to cybercrime by country from 0 to 1; the higher the score, the higher the exposure (provided by 10guard). The indicator was last updated in 2020.

    The Global Cyber Security Index (GCI) is a trusted reference that measures the commitment of countries to cybersecurity at a global level – to raise awareness of the importance and different dimensions of the issue (provided by the International Telecommunication Union - ITU). The indicator was last updated in 2021.

    The National Cyber Security Index (NCSI) measures a country's readiness to address cyber threats and manage cyber incidents. It is composed of categories, capacities, and indicators (provided by NCSI). The indicator was last updated in January 2023.

    The Digital Development Level (DDL) defines the average percentage the country received from the maximum value of both indices (provided by NCSI). The indicator was last updated in January 2023.

    The dataset can be used for practising data cleaning, data visualization (on maps and round/bar charts), finding correlations between the indexes and predicting the missing data.
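
As one example of the correlation analysis mentioned above, the sketch below computes a Pearson correlation between two indexes while skipping countries with missing values. The index values are hypothetical and chosen to be perfectly linear for clarity; real data will not behave this neatly.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation over pairs where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / sqrt(var_x * var_y)


# Hypothetical CEI (exposure, 0 to 1) and GCI (commitment) values for five
# countries; None marks a missing value, as the dataset contains gaps.
cei = [0.1, 0.3, 0.5, 0.7, None]
gci = [90.0, 70.0, 50.0, 30.0, 40.0]
print(round(pearson(cei, gci), 6))
```

A negative coefficient here would mean that countries with higher exposure tend to score lower on commitment; dropping incomplete pairs is the simplest way to handle the missing data, though imputation is another option.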

    The data was used in the research for the analytical article "The Geography of Cybersecurity: Cyber Threats and Vulnerabilities".

  6. Cyber Security

    • kaggle.com
    zip
    Updated Jan 29, 2024
    Cite
    Rishi Kumar (2024). Cyber Security [Dataset]. https://www.kaggle.com/datasets/rishikumarrajvansh/cyber-security
    Explore at:
    Available download formats: zip (8913512 bytes)
    Dataset updated
    Jan 29, 2024
    Authors
    Rishi Kumar
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Business Context

    We are in a time where businesses are more digitally advanced than ever, and as technology improves, organizations’ security postures must be enhanced as well. Failure to do so could result in a costly data breach, as we’ve seen happen with many businesses. The cybercrime landscape has evolved, and threat actors are going after any type of organization, so in order to protect your business’s data, money and reputation, it is critical that you invest in an advanced security system. Cyber security can be described as the collective methods, technologies, and processes that help protect the confidentiality, integrity, and availability of computer systems, networks and data against cyber-attacks or unauthorized access.

    Information Security vs. Cyber Security vs. Network Security

    Information security (also known as InfoSec) ensures that both physical and digital data is protected from unauthorized access, use, disclosure, disruption, modification, inspection, recording or destruction. Information security differs from cyber security in that InfoSec aims to keep data in any form secure, whereas cyber security protects only digital data.

    Cyber security, a subset of information security, is the practice of defending your organization’s networks, computers and data from unauthorized digital access, attack or damage by implementing various processes, technologies and practices. With the countless sophisticated threat actors targeting all types of organizations, it is critical that your IT infrastructure is secured at all times to prevent a full-scale attack on your network and the risk of exposing your company’s data and reputation.

    Network security, a subset of cyber security, aims to protect any data that is being sent through devices in your network to ensure that the information is not changed or intercepted.
    The role of network security is to protect the organization’s IT infrastructure from all types of cyber threats, including:

    • Viruses, worms and Trojan horses
    • Zero-day attacks
    • Hacker attacks
    • Denial of service attacks
    • Spyware and adware

    Your network security team implements the hardware and software necessary to guard your security architecture. With the proper network security in place, your system can detect emerging threats before they infiltrate your network and compromise your data. There are many components to a network security system that work together to improve your security posture. The most common network security components include:

    • Firewalls
    • Anti-virus software
    • Intrusion detection and prevention systems (IDS/IPS)
    • Virtual private networks (VPN)

    Network Intrusions vs. Computer Intrusions vs. Cyber Attacks

    1. Computer Intrusions: Computer intrusions occur when someone tries to gain access to any part of your computer system. Computer intruders or hackers typically use automated computer programs when they try to compromise a computer’s security. There are several ways an intruder can try to gain access to your computer. They can:

      • Access your computer to view, change, or delete information on it
      • Crash or slow down your computer
      • Access your private data by examining the files on your system
      • Use your computer to access other computers on the Internet

    2. Network Intrusions: A network intrusion refers to any unauthorized activity on a digital network. Network intrusions often involve stealing valuable network resources and almost always jeopardize the security of networks and/or their data. In order to proactively detect and respond to network intrusions, organizations and their cyber security teams need a thorough understanding of how network intrusions work, and need to implement network intrusion detection and response systems that are designed with attack techniques and cover-up methods in mind.
    Network Intrusion Attack Techniques

    Given the amount of normal activity constantly taking place on digital networks, it can be very difficult to pinpoint anomalies that could indicate a network intrusion has occurred. Below are some of the most common network intrusion attack techniques that organizations should continually look for:

    • Living Off the Land: Attackers increasingly use existing tools and processes, along with stolen credentials, when compromising networks. These tools, such as operating system utilities, business productivity software and scripting languages, are clearly not malware and have legitimate uses. In fact, in most cases the vast majority of their usage is business justified, which allows an attacker to blend in.
    • Multi-Routing: If a network allows for asymmetric routing, attackers will often leverage multiple routes to access the targeted device or network. This allows them to avoid detection by having a large portion of suspicious packets bypass certain network segments and any relevant network intrusion systems.
    • Buffer Overwrit...

  7. CyberExploitDB

    • huggingface.co
    Updated Oct 6, 2024
    + more versions
    Cite
    Esteban Cara de Sexo (2024). CyberExploitDB [Dataset]. https://huggingface.co/datasets/Canstralian/CyberExploitDB
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Oct 6, 2024
    Authors
    Esteban Cara de Sexo
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    model_details:
      description: A comprehensive database and analysis tool for cyber exploits, vulnerabilities, and related information. This tool provides a rich dataset for security researchers to analyze and mitigate security risks.

    task_categories:
      • data_analysis

    structure:
      • data/
        • exploits.csv
        • vulnerabilities.csv
      • assets/
        • favicon.svg
      • .streamlit/
        • config.toml
      • main.py
      • data_processor.py
      • visualizations.py
      • README.md

    intended_use: Designed for security researchers, developers, and… See the full description on the dataset page: https://huggingface.co/datasets/Canstralian/CyberExploitDB.

  8. Healthcare Cybersecurity Survey 1-14372.csv

    • figshare.com
    csv
    Updated Jan 13, 2025
    Cite
    Zhalgas Temirbekov (2025). Healthcare Cybersecurity Survey 1-14372.csv [Dataset]. http://doi.org/10.6084/m9.figshare.28196003.v1
    Explore at:
    Available download formats: csv
    Dataset updated
    Jan 13, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Zhalgas Temirbekov
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The result of the survey.

  9. Survey data from a survey about cybersecurity training and usability of...

    • researchdata.se
    • demo.researchdata.se
    Updated Jun 29, 2021
    Cite
    Joakim Kävrestad (2021). Survey data from a survey about cybersecurity training and usability of security functions [Dataset]. http://doi.org/10.5878/pv4m-s237
    Explore at:
    Available download formats: (34816)
    Dataset updated
    Jun 29, 2021
    Dataset provided by
    University of Skövde
    Authors
    Joakim Kävrestad
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Sweden, Italy, United Kingdom
    Description

    This data set was acquired using a survey which intends to measure:

    • Participants' previous experience of cybersecurity training
    • Participants' perception of ideal cybersecurity training
    • Participants' perception of a specific cybersecurity training type called ContextBased MicroTraining
    • What usability aspects the participants find most important for security features

    Data was acquired from Sweden, the UK and Italy to allow for comparative analysis. Demographic data was collected to allow for further analysis based on it.

    The files included in this data set are:

    • Completesurvey: This document includes the full survey presented to the participants.
    • Dataset: This file contains the variables and data for the different questions (available as .sav (SPSS) and .csv).
    • Var_info: Contains information about the variables in the dataset.
    • Overview: Contains frequency tables for the survey questions (for the complete data set).
    • Sweden, UK, and Italy: Contain frequency tables for the survey questions divided by national sample groups.

    See attached description

  10. nigerian-telecom-cybersecurity-incident-logs

    • huggingface.co
    Updated Oct 6, 2025
    Cite
    Electric Sheep (2025). nigerian-telecom-cybersecurity-incident-logs [Dataset]. https://huggingface.co/datasets/electricsheepafrica/nigerian-telecom-cybersecurity-incident-logs
    Explore at:
    Dataset updated
    Oct 6, 2025
    Dataset authored and provided by
    Electric Sheep
    License

    https://choosealicense.com/licenses/gpl/

    Area covered
    Nigeria
    Description

    Cybersecurity Incident Logs

      Dataset Description
    

    Security events including intrusions, DDoS attacks, and malware on telecom infrastructure

      Dataset Information
    

    Category: Emerging and Advanced
    Format: CSV, Parquet
    Rows: 30,000
    Columns: 14
    Date Generated: 2025-10-05
    Location: data/cybersecurity_incident_logs/

      Schema
    

    Column         Type      Sample Values
    incident_id    String    SEC00000001
    detected_at    Datetime  2025-09-30 08:18:00
    incident_type  String…

    See the full description on the dataset page: https://huggingface.co/datasets/electricsheepafrica/nigerian-telecom-cybersecurity-incident-logs.

  11. Dataset to Train Intrusion Detection Systems based on Machine Learning...

    • zenodo.org
    application/gzip, bin +1
    Updated Nov 11, 2024
    + more versions
    Cite
    Esteban Damian Gutierrez Mlot (2024). Dataset to Train Intrusion Detection Systems based on Machine Learning Models for Electrical Substations [Dataset]. http://doi.org/10.5281/zenodo.14066350
    Explore at:
    Available download formats: bin, application/gzip, zip
    Dataset updated
    Nov 11, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Esteban Damian Gutierrez Mlot
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    DATASET

    This dataset is part of the research work titled "A Dataset to Train Intrusion Detection Systems based on Machine Learning Models for Electrical Substations," which is currently awaiting approval for publication. The dataset has been meticulously curated to support the development and evaluation of machine learning models tailored for detecting cyber intrusions in the context of electrical substations. It is intended to facilitate research and advancements in cybersecurity for critical infrastructure, specifically focusing on real-world scenarios within electrical substation environments. We encourage its use for experimentation and benchmarking in related areas of study.

    The following sections list the content of the dataset generated.

    Data

    • raw
      • iec6180
        • attack-free-data
          • capture61850-attackfree.pcap (from real substation)
          • capture61850-attackfree_PTP.pcap
          • capture61850-attackfree_normalfault.pcap
        • attack-data
          • capture61850-floodattack_withfault.pcap
          • capture61850-floodattack_withoutfault.pcap
          • capture61850-fuzzyattack_withfault.pcap
          • capture61850-fuzzyattack_withoutfault.pcap
          • capture61850-replay.pcap
          • capture61850-ptpattack.pcap
      • iec104
        • attack-free-data
          • capture104-attackfree.pcap (from real substation)
        • attack-data
          • capture104-dosattack.pcap
          • capture104-floodattack.pcap
          • capture104-fuzzyattack.pcap
          • capture104-iec104starvationattack.pcap
          • capture104-mitmattack.pcap
          • capture104-ntpddosattack.pcap
          • capture104-portscanattack.pcap
    • processed
      • iec6180
        • attack-free-data
          • capture61850-attackfree.csv
          • capture61850-attackfree_PTP.csv
          • capture61850-attackfree_normalfault.csv
        • attack-data
          • capture61850-floodattack_withfault.csv
          • capture61850-floodattack_withoutfault.csv
          • capture61850-fuzzyattack_withfault.csv
          • capture61850-fuzzyattack_withoutfault.csv
          • capture61850-replay.csv
          • capture61850-ptpattack.csv
        • headers_iec61850[all].txt
      • iec104
        • attack-free-data
          • capture104-attackfree.csv
        • attack-data
          • capture104-dosattack.csv
          • capture104-floodattack.csv
          • capture104-fuzzyattack.csv
          • capture104-iec104starvationattack.csv
          • capture104-mitmattack.csv
          • capture104-ntpddosattack.csv
          • capture104-portscanattack.csv
        • headers_iec104[all].txt

    Description

    • file type: the prefix is capture61850 or capture104, depending on whether the file contains network captures of the IEC 61850 or IEC 104 protocol.
    • attack: either attackfree (attack-free capture) or the attack name is added to the file name.
    • function: optionally, a suffix describing the captured functionality (normalfault) or a specific protocol capture (PTP).
    • file extension: the type can be PCAP (network capture) or CSV (flow file).
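
The naming convention above can be parsed mechanically. The sketch below is an illustration inferred from the listed file names, not code from the dataset's source archive; the regular expression and the function name are assumptions.

```python
import re

# Pattern inferred from the file names listed above:
#   capture<protocol>-<attack>[_<function>].<extension>
NAME_PATTERN = re.compile(
    r"^capture(?P<protocol>61850|104)"
    r"-(?P<attack>[A-Za-z0-9]+)"
    r"(?:_(?P<function>[A-Za-z0-9]+))?"
    r"\.(?P<extension>pcap|csv)$"
)


def parse_capture_name(filename):
    """Split a capture file name into protocol, attack, function and extension."""
    match = NAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"not a capture file name: {filename!r}")
    return match.groupdict()


for name in ("capture61850-attackfree_PTP.pcap",
             "capture104-iec104starvationattack.csv"):
    print(name, "->", parse_capture_name(name))
```

The optional `function` group is `None` when the file name has no suffix after the attack name, which makes it easy to separate attack-free captures from attack captures when building labels.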

    Results

    • results
      • test1-iec104
        • model-test1-iec104.pkl
        • test1-iec104.log
      • test1-iec61850
        • model-test1-iec61850.pkl
        • test1-iec61850.log
      • test2-iec61850
        • model-test2-iec61850.pkl
        • test2-iec61850.log


    Description

    The outcomes of different test executions are available as follows:

    • test1-iec104: IEC 104 protocol for all attacks and attack free scenario
    • test1-iec61850: IEC 61850 protocol for fuzzy attack with fault injection and attack free scenario
    • test2-iec61850: IEC 61850 protocol for fuzzy attack normal operation and attack free scenario


    Each test consists of the model results in Python pickle format (with a .pkl extension) and a detailed description of the execution conditions in an output log file (with a .log extension).

    Source Code

    A snapshot of the source code used to process these files is included under the filename source-code-cybersecurity-datasets-v2.0.zip. For an updated version, see the project's GitHub repository.

  12. Cybersecurity: Suspicious Web Threat Interactions

    • kaggle.com
    Updated Apr 27, 2024
    Cite
    JanCSG (2024). Cybersecurity: Suspicious Web Threat Interactions [Dataset]. https://www.kaggle.com/datasets/jancsg/cybersecurity-suspicious-web-threat-interactions
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Apr 27, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    JanCSG
    License

    https://www.gnu.org/licenses/gpl-3.0.html

    Description

    This dataset contains web traffic records collected through AWS CloudWatch, aimed at detecting suspicious activities and potential attack attempts.

    The data were generated by monitoring traffic to a production web server, using various detection rules to identify anomalous patterns.

    Context

    In today's cloud environments, cybersecurity is more crucial than ever. The ability to detect and respond to threats in real time can protect organizations from significant consequences. This dataset provides a view of web traffic that has been labeled as suspicious, offering a valuable resource for developers, data scientists, and security experts to enhance threat detection techniques.

    Dataset Content

    Each entry in the dataset represents a stream of traffic to a web server, including the following columns:

    bytes_in: Bytes received by the server.

    bytes_out: Bytes sent from the server.

    creation_time: Timestamp of when the record was created.

    end_time: Timestamp of when the connection ended.

    src_ip: Source IP address.

    src_ip_country_code: Country code of the source IP.

    protocol: Protocol used in the connection.

    response.code: HTTP response code.

    dst_port: Destination port on the server.

    dst_ip: Destination IP address.

    rule_names: Name of the rule that identified the traffic as suspicious.

    observation_name: Observations associated with the traffic.

    source.meta: Metadata related to the source.

    source.name: Name of the traffic source.

    time: Timestamp of the detected event.

    detection_types: Type of detection applied.
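As a rough illustration of working with these columns, the sketch below derives a connection duration and an inbound/outbound byte ratio per record. The sample rows are made up, and the ISO 8601 timestamp format is an assumption; only the column names come from the list above.

```python
import csv
import io
from datetime import datetime

# Made-up sample rows using column names from the list above.
# The ISO 8601 timestamp format is an assumption.
SAMPLE = """bytes_in,bytes_out,creation_time,end_time,src_ip,src_ip_country_code
5602,12990,2024-04-25T23:00:00Z,2024-04-25T23:10:00Z,192.0.2.10,AE
30912,18186,2024-04-25T23:00:00Z,2024-04-25T23:10:00Z,198.51.100.7,US
"""


def parse_ts(value):
    # fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarize(rows):
    """Yield (src_ip, duration_seconds, bytes_in / bytes_out) per record."""
    for row in rows:
        start = parse_ts(row["creation_time"])
        end = parse_ts(row["end_time"])
        bytes_out = int(row["bytes_out"])
        ratio = int(row["bytes_in"]) / bytes_out if bytes_out else float("inf")
        yield row["src_ip"], (end - start).total_seconds(), ratio


summary = list(summarize(csv.DictReader(io.StringIO(SAMPLE))))
```

Derived values like these are one possible starting point for the anomaly detection and classification uses the dataset description mentions.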

    Potential Uses

    This dataset is ideal for:

    • Anomaly Detection: Developing models to detect unusual behaviors in web traffic.
    • Classification Models: Training models to automatically classify traffic as normal or suspicious.
    • Security Analysis: Conducting security analyses to understand the tactics, techniques, and procedures of attackers.
  13. Database for Cyber Security Agencies

    • whoisdatacenter.com
    csv
    Updated Nov 30, 2025
    Cite
    AllHeart Web Inc (2025). Database for Cyber Security Agencies [Dataset]. https://whoisdatacenter.com/whois-database-for-cyber-security-companies/
    Explore at:
    Available download formats: csv
    Dataset updated
    Nov 30, 2025
    Dataset authored and provided by
    AllHeart Web Inc
    License

    https://whoisdatacenter.com/terms-of-use/

    Time period covered
    Mar 15, 1985 - Nov 30, 2025
    Description

    Strengthen your cyber defense with our extensive, daily-updated WHOIS database. Accessible in CSV, JSON, and XML, it's a crucial asset for any security strategy.

  14. Exploits: All registered in exploit-db, from January 2019 to October 2020

    • zenodo.org
    csv
    Updated Nov 8, 2020
    Cite
    Erika Bracamonte; Anthony Alarcon (2020). Exploits: All registered in exploit-db, from January 2019 to October 2020 [Dataset]. http://doi.org/10.5281/zenodo.4259954
    Explore at:
    Available download formats: csv
    Dataset updated
    Nov 8, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Erika Bracamonte; Anthony Alarcon
    License

    Attribution 1.0 (CC BY 1.0): https://creativecommons.org/licenses/by/1.0/
    License information was derived automatically

    Description

    This dataset contains all exploits registered on the exploit-db website from 02 January 2019 to 06 November 2020. A total of 2,665 exploits were found in this time range and stored in a CSV file. The CSV fields are as follows:

    • Id: exploit-db identifier.
    • Link Exploit: Link to download the exploit source code.
    • App: Link to download the application when the exploit concerns an application; empty otherwise.
    • Date: Publication date on the website.
    • Verification: Exploit verification status on the website.
    • Title: Exploit title.
    • Type: Exploit type. One of Local, Remote, or Webapp.
    • Platform: Platform the exploit targets.
    • Author: Name of exploit author.
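A short sketch of summarizing such a CSV by field value. The rows below are made up for illustration (including the date format and the Verification values); only the header names are taken from the field list above, and a subset of the fields is used.

```python
import csv
import io
from collections import Counter

# Made-up rows; only the header names come from the field list above.
SAMPLE = """Id,Date,Verification,Title,Type,Platform,Author
1,2019-01-02,Verified,Example A,Webapp,PHP,alice
2,2019-03-10,Unverified,Example B,Local,Linux,bob
3,2020-06-21,Verified,Example C,Webapp,PHP,carol
"""


def count_by(rows, field):
    """Count rows per distinct value of one CSV field."""
    return Counter(row[field] for row in rows)


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
type_counts = count_by(rows, "Type")
platform_counts = count_by(rows, "Platform")
```

With the real file, `csv.DictReader` would be given an open file handle instead of the in-memory sample.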
  15. Composed Encrypted Malicious Traffic Dataset for machine learning based...

    • data.mendeley.com
    Updated Oct 12, 2021
    + more versions
    Cite
    Zihao Wang (2021). Composed Encrypted Malicious Traffic Dataset for machine learning based encrypted malicious traffic analysis. [Dataset]. http://doi.org/10.17632/ztyk4h3v6s.2
    Explore at:
    Dataset updated
    Oct 12, 2021
    Authors
    Zihao Wang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This traffic dataset contains balanced amounts of encrypted malicious and legitimate traffic for encrypted malicious traffic detection. It is a secondary dataset of CSV feature data composed from five public traffic datasets. The dataset was composed according to three criteria. The first criterion is to combine widely used public datasets that contain both encrypted malicious and legitimate traffic in existing works, such as the Malware Capture Facility Project dataset and the CICIDS-2017 dataset. The second criterion is to ensure data balance, i.e., a balance of malicious and legitimate network traffic and a similar amount of network traffic contributed by each individual dataset. Thus, approximate proportions of malicious and legitimate traffic are extracted from each selected public dataset using random sampling. We also ensured that no selected public dataset contributes much more traffic than the others. The third criterion is that the dataset includes encrypted malicious and legitimate traffic from both conventional devices and IoT devices, as these devices are increasingly being deployed and work in the same environments, such as offices, homes, and other smart city settings.

    Based on these criteria, 5 public datasets were selected. After data pre-processing, details of each selected public dataset and of the final composed dataset are shown in the “Dataset Statistic Analysis Document”. The document summarizes the malicious and legitimate traffic size selected from each public dataset, the proportion of each dataset's selected traffic with respect to the total traffic size of the composed dataset (% w.r.t. the composed dataset), the proportion of selected encrypted traffic from each public dataset (% of selected public dataset), and the total traffic size of the composed dataset. From the table, we can observe that each public dataset contributes approximately 20% of the composed dataset, except for CICDS-2012 (due to its limited amount of encrypted malicious traffic). This achieves a balance across individual datasets and reduces bias towards traffic belonging to any one dataset during learning. We can also observe that the sizes of malicious and legitimate traffic are almost the same, thus achieving class balance. The datasets now made available were prepared for encrypted malicious traffic detection. Since the dataset is intended for machine learning model training, sample train and test sets are also provided. The train and test datasets are split at a ratio of 1:4, with stratification applied during the split. Such datasets can be used directly for machine or deep learning model training based on selected features.
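The stratified split mentioned above can be sketched in plain Python as follows. This is an illustration only, not the authors' procedure: the function name is hypothetical, and reading the 1:4 ratio as 20% test and 80% train is an assumption.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) with per-class proportions preserved.

    test_fraction=0.2 assumes the 1:4 ratio means test:train.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


labels = ["malicious"] * 50 + ["legitimate"] * 50
train_idx, test_idx = stratified_split(labels)
```

Splitting each class separately keeps the malicious/legitimate balance of the full set in both the train and the test portion.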

  16. Dataset: What Are Cybersecurity Education Papers About? A Systematic...

    • data.niaid.nih.gov
    Updated Jul 18, 2023
    Cite
    Švábenský, Valdemar; Vykopal, Jan; Čeleda, Pavel (2023). Dataset: What Are Cybersecurity Education Papers About? A Systematic Literature Review of SIGCSE and ITiCSE Conferences [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3506639
    Explore at:
    Dataset updated
    Jul 18, 2023
    Dataset provided by
    Masaryk University
    Authors
    Švábenský, Valdemar; Vykopal, Jan; Čeleda, Pavel
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains supplementary materials for the following conference paper:

    Valdemar Švábenský, Jan Vykopal, Pavel Čeleda. What Are Cybersecurity Education Papers About? A Systematic Literature Review of SIGCSE and ITiCSE Conferences. In Proceedings of the 51st ACM Technical Symposium on Computer Science Education (SIGCSE 2020). https://doi.org/10.1145/3328778.3366816

    Preprint available at: https://arxiv.org/abs/1911.11675

    How to cite

    If you use or build upon the materials, please use the BibTeX entry below to cite the original paper (not only this web link).

    @inproceedings{Svabensky2020what,
      author = {\v{S}v\'{a}bensk\'{y}, Valdemar and Vykopal, Jan and \v{C}eleda, Pavel},
      title = {{What Are Cybersecurity Education Papers About? A Systematic Literature Review of SIGCSE and ITiCSE Conferences}},
      booktitle = {Proceedings of the 51st ACM Technical Symposium on Computer Science Education},
      series = {SIGCSE '20},
      location = {Portland, OR, USA},
      publisher = {Association for Computing Machinery},
      address = {New York, NY, USA},
      month = {03},
      year = {2020},
      pages = {2--8},
      numpages = {7},
      isbn = {978-1-4503-6793-6},
      url = {https://doi.org/10.1145/3328778.3366816},
      doi = {10.1145/3328778.3366816},
    }

    Attached content

    The file "SIGCSE 2020 Literature Review.xlsx" is an Excel spreadsheet with three sheets corresponding to 1) all papers found by automated search, 2) manually excluded papers, and 3) papers included in the literature review. There are also three CSV files that correspond to the three individual sheets.

  17. MAD (MAlicious Traffic Dataset) in home and commercial environments - Home...

    • zenodo.org
    • data.niaid.nih.gov
    application/gzip
    Updated Jul 19, 2021
    Cite
    Carlos Alberto Martins de Sousa Teles; Felipe da R. Henriques (2021). MAD (MAlicious Traffic Dataset) in home and commercial environments - Home environment [Dataset]. http://doi.org/10.5281/zenodo.5094055
    Explore at:
    Available download formats: application/gzip
    Dataset updated
    Jul 19, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Carlos Alberto Martins de Sousa Teles; Felipe da R. Henriques
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    For the home environment we have: 01 Wifi Modem Router, 03 Smartphones, 01 server, 01 desktop, 01 Multifunction Printer, 01 network extender, 01 SmartTV, 01 Cable TV decoder and 01 firewall. This environment is a local network. The server has the Monitoring Environment and a network card, which provides connectivity and receives all network traffic for analysis.

    The results were obtained from Suricata and from Telegraf collections in the TICK stack. All evidence was gathered through queries via EveBox (which received data from Suricata) or Grafana, or through graphs built from information extracted from the InfluxDB (Grafana) and PostgreSQL (EveBox) databases.

    events.csv.gz - Suricata / Evebox collections

    net.csv.gz - Telegraf collections from the TICK stack

    netstat.csv.gz - Telegraf collections from the TICK stack

    For correlation purposes, use the events.csv.gz file as the basis. The correlation key is the 'timestamp' column in events.csv.gz, matched against the 'time' column in the net.csv.gz and netstat.csv.gz files.

    Collection took place at non-consecutive intervals between 2018-09-15 and 2019-02-04.
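One way the timestamp correlation described above might be sketched: each event is paired with the most recent Telegraf sample taken at or before it. The helper names are hypothetical, and the sketch assumes both columns hold ISO 8601 strings that `datetime.fromisoformat` can parse and that they share the same time zone handling; the actual formats in the files are not stated in the description.

```python
import bisect
import csv
import gzip
from datetime import datetime


def read_gz_csv(path):
    """Read a gzip-compressed CSV file into a list of dicts."""
    with gzip.open(path, "rt", newline="", encoding="utf-8") as fh:
        return list(csv.DictReader(fh))


def attach_latest_sample(events, samples, parse=datetime.fromisoformat):
    """Pair each event with the latest sample whose 'time' <= event 'timestamp'.

    Events that precede every sample are paired with None.
    """
    ordered = sorted(samples, key=lambda s: parse(s["time"]))
    times = [parse(s["time"]) for s in ordered]
    pairs = []
    for event in events:
        pos = bisect.bisect_right(times, parse(event["timestamp"]))
        pairs.append((event, ordered[pos - 1] if pos else None))
    return pairs
```

For example, `attach_latest_sample(read_gz_csv("events.csv.gz"), read_gz_csv("net.csv.gz"))` would yield one (event, sample) pair per Suricata event. An exact-match join is an alternative if the two sources happen to share identical timestamps.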

  18. AdDDoSDN: Adversarial DDoS Attacks Dataset for Software-Defined Networks

    • data.mendeley.com
    Updated Sep 19, 2025
    Cite
    Mohd Adil Bin Mokti (2025). AdDDoSDN: Adversarial DDoS Attacks Dataset for Software-Defined Networks [Dataset]. http://doi.org/10.17632/9jp6r68y98.1
    Explore at:
    Dataset updated
    Sep 19, 2025
    Authors
    Mohd Adil Bin Mokti
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    The AdDDoSDN dataset is a comprehensive network traffic corpus built for defensive SDN research, capturing coordinated DDoS attacks and benign enterprise activity through controlled Mininet experiments driven by a remote Ryu L3 controller to deliver high-quality labeled data for real-time detection development. The environment emulates a segmented four-subnet enterprise: h1 (192.168.10.10/24) acts as the external attacker, h2–h5 (192.168.20.10–13/24) form the corporate client subnet with h2 handling ICMP exchanges and h3/h5 generating rich TCP and UDP application sessions, h6 (192.168.30.10/24) resides in the server/DMZ subnet as the primary victim, and controller services operate on 192.168.0.0/24, providing realistic inter-subnet attack paths while preserving centralized SDN visibility.

    The dataset follows a structured, configurable timeline sourced from config.json, with the default cycle spanning roughly 35 minutes per run: a 5-second initialization period, 1,600 seconds of benign traffic mixing ICMP, Telnet, SSH, FTP, HTTP/S, and DNS exchanges, enhanced traditional attacks from h1 including an 88-second SYN flood and 176-second UDP flood against h6, plus an 88-second ICMP flood toward h4, and adversarial attacks from h1 to h6 comprising a 72-second TCP state-exhaustion phase with human-like timing patterns, a 24-second application-layer mimicry burst combining heavy HTTP range/post requests with legitimate queries, and a 72-second slow-read phase sustaining long-lived connections. Traditional phases operate around 20–30 packets per second with protocol-compliant options, while adversarial scripts emphasize mimicry and timing jitter.
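As a quick check of the "roughly 35 minutes" figure, the phase durations quoted above can be summed. The sketch assumes the phases run one after another with no gaps; the dictionary keys are illustrative labels, not names taken from config.json.

```python
# Phase durations in seconds, as stated in the description above.
# The dictionary keys are illustrative labels, not names from config.json.
PHASES = {
    "initialization": 5,
    "benign": 1600,
    "syn_flood": 88,
    "udp_flood": 176,
    "icmp_flood": 88,
    "adversarial_tcp_state_exhaustion": 72,
    "adversarial_app_layer_mimicry": 24,
    "adversarial_slow_read": 72,
}

total_seconds = sum(PHASES.values())
total_minutes = total_seconds / 60
print(f"{total_seconds} s = {total_minutes:.1f} min")  # 2125 s = 35.4 min
```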

    The dataset provides three synchronized data products derived from each capture cycle:

    1. Packet-level data (adddosdn_packet_dataset.csv): 30 header fields + 2 labels extracted directly from PCAP phases.
    2. SDN flow-level data (adddosdn_flow_dataset.csv): Controller statistics with derived rates and labels collected via the Ryu REST API.
    3. CICFlow aggregated data (adddosdn_cicflow_dataset.csv): 85 bidirectional behavioral features generated with CICFlowMeter.

    The dataset contains 3.5 million records in total across dataset instances, each representing a different temporal scenario. Labels span normal, syn_flood, udp_flood, icmp_flood, ad_syn, ad_udp, and ad_slow, with Label_binary collapsing them into benign (0) versus malicious (1) classes to maintain consistency across the packet, controller-flow, and behavioral representations.
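The collapse from the multi-class labels to Label_binary can be illustrated as below. The function name is hypothetical, and treating `normal` as the only benign label is an assumption based on the label list above.

```python
# Multi-class label values as listed in the description above.
MULTICLASS_LABELS = (
    "normal", "syn_flood", "udp_flood", "icmp_flood",
    "ad_syn", "ad_udp", "ad_slow",
)


def to_binary(label):
    """Map a multi-class label to 0 (benign) or 1 (malicious).

    Assumes 'normal' is the only benign label.
    """
    if label not in MULTICLASS_LABELS:
        raise ValueError(f"unexpected label: {label!r}")
    return 0 if label == "normal" else 1


binary = [to_binary(label) for label in ("normal", "ad_slow", "udp_flood")]
```

Rejecting unknown values rather than silently mapping them to 1 makes label typos visible early.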

  19. Dataset for Sandboxing use case SUC2 related to cyber attacks affecting Wide...

    • data.europa.eu
    unknown
    Updated Aug 18, 2024
    Cite
    Zenodo (2024). Dataset for Sandboxing use case SUC2 related to cyber attacks affecting Wide Area Protection [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-12949261?locale=fi
    Explore at:
    Available download formats: unknown (590022)
    Dataset updated
    Aug 18, 2024
    Dataset authored and provided by
    Zenodo (http://zenodo.org/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset relates to the second KIOS CoE sandboxing use case (SUC2), which includes three scenarios (S1-S3) that examine the behaviour of a wide area protection (WAP) scheme for power grids under a short-circuit fault and under two types of cyber attacks. The architecture of the University of Cyprus / KIOS CoE sandboxing environment used to extract these datasets, the full list of scenarios, and their detailed implementation are described in the supporting documents. A brief description of each of the three SUC2 scenarios is provided below.

    The dataset for the first scenario (S1) examines the operation of a wide area protection scheme on a transmission line, which receives data sent from PMUs at the two ends of the line, when a short-circuit fault occurs on the transmission line between buses 7 and 8 of the system. More details about scenario SUC2/S1 can be found in Section 1.3.1 of the SUC2 supporting document. The dataset includes electrical measurements of the current flow in line 7-8 (of the IEEE 9-bus system), in both magnitude and sinusoidal form. The dataset is provided as time-series measurements in MATLAB (.mat) and CSV files, which were recorded with a 30-second and 40-second time resolution, respectively. The RMS values were recorded by the Typhoon controller as they were sent by the two PMUs, while the sine wave measurements were recorded through the OPAL-RT.

    The dataset for the second scenario (S2) investigates the operation of a wide area protection scheme which receives data sent from PMUs when a MITM FDI cyber-attack is conducted on the measurements of bus 7. The attack is implemented virtually within the sandbox and introduces a multiplicative change to the current measurements before they are received by the Typhoon controller via the IEEE C37.118 protocol. Section 1.3.2 of the SUC2 supporting document provides more details about this scenario. The dataset includes electrical measurements of the current flow, in magnitude and sinusoidal format, of the transmission line between buses 7 and 8 of the digital twin of the IEEE 9-bus system. The dataset is provided as time-series measurements in MATLAB (.mat) and CSV files, which were recorded with a 30-second and 40-second time resolution, respectively. The magnitude values were recorded by the Typhoon controller, while the sinusoidal waveform data were recorded by the OPAL-RT.

    The dataset for the third scenario (S3) examines the operation of a wide area protection scheme which receives data sent from PMUs when a combined MITM and DoS cyber-attack is conducted, as an actual attack, in the isolated communication network of the sandboxing environment, disrupting the C37.118 UDP communication exchanged between the OPAL-RT 5707, where the digital twin of the IEEE 9-bus system was implemented, and the Typhoon controller. More details about this scenario can be found in Section 1.3.3 of the SUC2 supporting document. The dataset includes electrical measurements of the current flow magnitude of the transmission line between buses 7 and 8 of the digital twin of the IEEE 9-bus system. The dataset was recorded by the Typhoon controller and is provided as time-series measurements in MATLAB (.mat) and CSV files, which were recorded with a 30-second and 40-second time resolution, respectively. In addition, the dataset includes network traffic packets captured as .pcapng and .csv files.

  20. MedSec-25: IoMT Cybersecurity Dataset

    • kaggle.com
    zip
    Updated Sep 8, 2025
    Cite
    Muhammad Abdullah (2025). MedSec-25: IoMT Cybersecurity Dataset [Dataset]. https://www.kaggle.com/datasets/abdullah001234/medsec-25-iomt-cybersecurity-dataset
    Explore at:
    Available download formats: zip (38496221 bytes)
    Dataset updated
    Sep 8, 2025
    Authors
    Muhammad Abdullah
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Overview

    MedSec-25 is a comprehensive, labeled network traffic dataset designed specifically for the Internet of Medical Things (IoMT) in healthcare environments. It addresses the limitations of existing generic IoT datasets by capturing realistic traffic from a custom-built healthcare IoT lab that mimics real-world hospital operations. The dataset includes both benign (normal) traffic and malicious traffic from multi-staged attack campaigns inspired by the MITRE ATT&CK framework. This allows for the development and evaluation of machine learning-based intrusion detection systems (IDS) tailored to IoMT scenarios, where patient safety and data privacy are critical. The dataset was generated using a variety of medical sensors (e.g., ECG, EEG, HHI, Respiration, SpO2) and environmental sensors (e.g., thermistor, ultrasonic, PIR, flame) connected via Raspberry Pi nodes and an IoT server. Traffic was captured over 7.5 hours using tools like Wireshark and tcpdump, resulting in PCAPNG files. These were processed with CICFlowMeter to extract flow-based features, producing a cleaned CSV dataset with 554,534 bidirectional network flows and 84 features.

    Key Highlights:

    Realistic Setup: Built in a physical lab at Rochester Institute of Technology, Dubai, incorporating diverse IoMT devices, protocols (e.g., MQTT, SSH, Telnet, FTP, HTTP, DNS), and real-time patient interactions (anonymized to comply with privacy regulations like HIPAA).

    Multi-Staged Attacks: Unlike datasets focusing on isolated attacks, MedSec-25 simulates full attack chains: Reconnaissance (e.g., SYN/TCP scans, OS fingerprinting), Initial Access (e.g., brute-force, malformed MQTT packets), Lateral Movement (e.g., exploiting vulnerabilities to pivot between devices), and Exfiltration (e.g., data theft via MQTT).

    Imbalanced Nature: This is the cleaned (imbalanced) version of the dataset. Users may need to apply balancing techniques (e.g., SMOTE oversampling + random undersampling) for model training, as demonstrated in the associated paper.

    Size and Quality: 554,534 rows, no duplicates, no missing values (except 111 NaNs in Flow Byts/s, ~0.02%, which can be handled via imputation). Data types include float64 (45 columns), int64 (34 columns), and object (5 columns: Flow ID, Src IP, Dst IP, Timestamp, Label).

    Utility: Preliminary models trained on this dataset (e.g., KNN: 98.09% accuracy, Decision Tree: 98.35% accuracy) show excellent performance for detecting attack stages.

    This dataset is ideal for researchers in cybersecurity, machine learning, and healthcare IoT, enabling the creation of an IDS that can detect attacks at different phases to prevent escalation.

    Data Collection

    Benign Traffic: Generated over two days with active sensors, services (HTTP dashboard for patient monitoring, SSH/Telnet for remote access, FTP for file transfers), and real users (students/faculty) interacting with medical devices. No personally identifiable information was stored.

    Malicious Traffic: Two Kali Linux attacker machines simulated MITRE ATT&CK-inspired campaigns using tools like Nmap, Scapy, Metasploit, and custom Python scripts.

    Capture Tools: Wireshark and tcpdump for PCAPNG files (total ~1GB: 600MB benign, 400MB malicious).

    Processing: Combined PCAP files per label, extracted features with CICFlowMeter, labeled flows manually based on attack phases, and cleaned for ML readiness. The final cleaned CSV is ~350MB.

    Features

    The dataset includes 84 features extracted by CICFlowMeter, categorized as:

    Identifiers: Flow ID, Src IP, Src Port, Dst IP, Dst Port, Protocol, Timestamp.

    Time-Series Metrics: Flow Duration, Flow IAT Mean/Std/Max/Min, Fwd/Bwd IAT Tot/Mean/Std/Max/Min.

    Size/Count Statistics: Tot Fwd/Bwd Pkts, TotLen Fwd/Bwd Pkts, Fwd/Bwd Pkt Len Max/Min/Mean/Std, Pkt Len Min/Max/Mean/Std/Var, Pkt Size Avg.

    Flag Counts: Fwd/Bwd PSH/URG Flags, FIN/SYN/RST/PSH/ACK/URG/CWE/ECE Flag Cnt.

    Rates and Ratios: Flow Byts/s, Flow Pkts/s, Fwd/Bwd Pkts/s, Down/Up Ratio, Active/Idle Mean/Std/Max/Min.

    Segmentation and Others: Fwd/Bwd Seg Size Avg/Min, Subflow Fwd/Bwd Pkts/Byts, Init Fwd/Bwd Win Byts, Fwd Act Data Pkts, Fwd/Bwd Byts/b Avg, Fwd/Bwd Pkts/b Avg, Fwd/Bwd Blk Rate Avg.

    Labels

    The dataset is labeled with 5 classes representing benign behavior and attack stages:

    Reconnaissance: 401,683 flows

    Initial Access: 102,090 flows

    Exfiltration: 25,915 flows

    Lateral Movement: 12,498 flows

    Benign: 12,348 flows

    Note: The dataset is imbalanced, with Reconnaissance dominating. Apply balancing techniques for optimal ML performance.
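To make the imbalance concrete, the sketch below computes each class's share of the 554,534 flows and an inverse-frequency class weight. Class weighting is shown here as a simple alternative illustration; it is not the SMOTE plus undersampling approach the dataset description refers to.

```python
# Flow counts per class, as listed above.
CLASS_COUNTS = {
    "Reconnaissance": 401_683,
    "Initial Access": 102_090,
    "Exfiltration": 25_915,
    "Lateral Movement": 12_498,
    "Benign": 12_348,
}

total = sum(CLASS_COUNTS.values())
n_classes = len(CLASS_COUNTS)

# Share of all flows that belong to each class.
shares = {name: count / total for name, count in CLASS_COUNTS.items()}

# Inverse-frequency ("balanced") weights: total / (n_classes * class_count).
weights = {name: total / (n_classes * count) for name, count in CLASS_COUNTS.items()}
```

The counts sum to the 554,534 rows stated earlier, with Reconnaissance accounting for roughly 72% of all flows, so rarer classes such as Benign receive much larger weights.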

    Usage

    Preprocessing Suggestions: Encode categorical features (e.g., Protocol, Label) using LabelEncoder. Normalize numerical features with Min-Max Scaler or StandardScaler. Handle the minor NaNs in Flow Byts/s via mean imputation.
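The imputation and scaling steps suggested above can be sketched without any library, purely to show the arithmetic. The function names and sample values are made up; in practice the mean, minimum, and maximum would normally be computed on the training split only and then reused for the test split.

```python
import math


def impute_mean(values):
    """Replace NaN entries with the mean of the non-NaN entries."""
    present = [v for v in values if not math.isnan(v)]
    mean = sum(present) / len(present)
    return [mean if math.isnan(v) else v for v in values]


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Made-up values standing in for the Flow Byts/s column.
flow_bytes_per_s = [100.0, float("nan"), 300.0, 200.0]
scaled = min_max_scale(impute_mean(flow_bytes_per_s))
```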

    Model Training: Split into train/test (e.g., 80/20). Suitable for classification tasks w...

Cite
Atharva Soundankar (2025). 🌐 Global Cybersecurity Threats (2015-2024) [Dataset]. https://www.kaggle.com/datasets/atharvasoundankar/global-cybersecurity-threats-2015-2024

🌐 Global Cybersecurity Threats (2015-2024)

A comprehensive dataset tracking cybersecurity incidents, attack vectors, threat

Explore at:
2 scholarly articles cite this dataset (View in Google Scholar)
Available download formats: zip (48178 bytes)
Dataset updated
Mar 16, 2025
Authors
Atharva Soundankar
License

https://creativecommons.org/publicdomain/zero/1.0/

Description

📂

The Global Cybersecurity Threats Dataset (2015-2024) provides extensive data on cyberattacks, malware types, targeted industries, and affected countries. It is designed for threat intelligence analysis, cybersecurity trend forecasting, and machine learning model development to enhance global digital security.

📊 Column Descriptions

Column Name: Description
Country: Country where the attack occurred
Year: Year of the incident
Threat Type: Type of cybersecurity threat (e.g., Malware, DDoS)
Attack Vector: Method of attack (e.g., Phishing, SQL Injection)
Affected Industry: Industry targeted (e.g., Finance, Healthcare)
Data Breached (GB): Volume of data compromised
Financial Impact ($M): Estimated financial loss in millions
Severity Level: Low, Medium, High, Critical
Response Time (Hours): Time taken to mitigate the attack
Mitigation Strategy: Countermeasures taken