100+ datasets found
  1. Data from: Cybersecurity Threat Detection Dataset

    • paperswithcode.com
    Updated Mar 7, 2025
    (2025). Cybersecurity Threat Detection Dataset [Dataset]. https://paperswithcode.com/dataset/cybersecurity-threat-detection
    Description

    Problem Statement


    Organizations face an increasing number of sophisticated cybersecurity threats, including malware, phishing attacks, and unauthorized access. A financial institution experienced frequent attempts to breach its network, risking sensitive data and regulatory compliance. Traditional security measures were reactive and failed to detect threats in real time. The institution sought a proactive AI-driven solution to identify and prevent cybersecurity threats effectively.

    Challenge

    Developing an advanced threat detection system required addressing several challenges:

    Processing and analyzing large volumes of network traffic and user activity data in real time.

    Identifying new and evolving threats, such as zero-day vulnerabilities, with high accuracy.

    Minimizing false positives to ensure security teams could focus on genuine threats.

    Solution Provided

    An AI-powered threat detection system was developed using machine learning algorithms and advanced analytics. The solution was designed to:

    Continuously monitor network activity and user behavior to identify suspicious patterns.

    Detect and neutralize cybersecurity threats in real time, including malware and phishing attempts.

    Provide actionable insights to security teams for faster and more effective threat response.

    Development Steps

    Data Collection

    Collected network traffic logs, endpoint activity, and historical threat data to train machine learning models.

    Preprocessing

    Cleaned and standardized data, ensuring compatibility across diverse sources, and filtered out noise for accurate analysis.

    Model Development

    Developed machine learning algorithms for anomaly detection, behavioral analysis, and threat classification. Trained models on labeled datasets to recognize known threats and identify emerging attack patterns.
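
    The case study does not name the algorithms it used. As one illustration of the anomaly-detection step, the sketch below scores per-host event counts with the median/MAD "modified z-score", a common robust baseline. It is not the institution's method; the host names, counts and the 3.5 threshold are assumptions.

```python
from statistics import median


def robust_z_scores(values):
    """Modified z-scores: 0.6745 * (x - median) / MAD.

    The 0.6745 factor makes the score roughly comparable to a standard
    z-score for normally distributed data (Iglewicz and Hoaglin).
    """
    values = list(values)
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # No spread in the baseline, so nothing stands out by this measure.
        return [0.0] * len(values)
    return [0.6745 * (v - med) / mad for v in values]


def flag_anomalies(counts_by_host, threshold=3.5):
    """Return {host: score} for hosts whose count deviates beyond the threshold."""
    hosts = list(counts_by_host)
    scores = robust_z_scores(counts_by_host[h] for h in hosts)
    return {h: s for h, s in zip(hosts, scores) if abs(s) > threshold}


# Hypothetical failed-login counts per host for one monitoring window.
failed_logins = {"hostA": 3, "hostB": 4, "hostC": 2, "hostD": 5, "hostE": 48}
print(flag_anomalies(failed_logins))  # only hostE exceeds the threshold
```

    Median and MAD are used rather than mean and standard deviation because a large outlier inflates the standard deviation and partly hides itself: scored with mean and population standard deviation, hostE would reach only about 2.0.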

    Validation

    Tested the system against simulated and real-world threat scenarios to evaluate detection accuracy, response times, and reliability.
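
    Detection accuracy on its own can be misleading when attacks are rare, so validation usually also tracks precision, recall and false-positive rate. A minimal sketch, assuming binary labels (1 = threat) from a simulated scenario; the labels below are hypothetical, not results from the case study.

```python
def detection_metrics(y_true, y_pred):
    """Confusion-matrix metrics for a binary detector where 1 means 'threat'."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    def ratio(num, den):
        return num / den if den else 0.0

    return {
        "precision": ratio(tp, tp + fp),            # alerts that were real threats
        "recall": ratio(tp, tp + fn),               # real threats that were caught
        "false_positive_rate": ratio(fp, fp + tn),  # benign events that raised an alert
        "accuracy": ratio(tp + tn, len(pairs)),
    }


# Hypothetical ground truth and detector output for ten events (1 = attack).
truth = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
preds = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
metrics = detection_metrics(truth, preds)
print(metrics)
```

    Here accuracy is 0.8 while precision and recall are both 2/3 and the false-positive rate is 1/7; the false-positive rate is the figure most directly tied to keeping security teams focused on genuine threats.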

    Deployment

    Integrated the threat detection system into the institution’s existing cybersecurity infrastructure, including firewalls, SIEM (Security Information and Event Management) tools, and endpoint protection.

    Continuous Monitoring & Improvement

    Established a feedback loop to refine models using new threat data and adapt to evolving attack strategies.

    Results

    Enhanced Security Posture

    The system improved the institution’s ability to detect and prevent cybersecurity threats proactively, strengthening its overall security framework.

    Reduced Incidence of Cyber Attacks

    Real-time detection and response significantly reduced the frequency and impact of successful cyber attacks.

    Improved Threat Response Times

    Automated threat identification and prioritization enabled security teams to respond faster and more effectively to potential breaches.

    Minimized False Positives

    Advanced algorithms reduced false alarms, allowing security teams to focus on genuine threats and improve efficiency.

    Scalable and Adaptive Solution

    The system adapted to new threats and scaled effortlessly to protect growing organizational networks and data.

  2. IoMT-TrafficData: A Dataset for Benchmarking Intrusion Detection in IoMT

    • zenodo.org
    • data.niaid.nih.gov
    Updated Aug 30, 2024
    José Areia; Ivo Afonso Bispo; Leonel Santos; Rogério Luís Costa (2024). IoMT-TrafficData: A Dataset for Benchmarking Intrusion Detection in IoMT [Dataset]. http://doi.org/10.5281/zenodo.8116338
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    José Areia; Ivo Afonso Bispo; Leonel Santos; Rogério Luís Costa
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Article Information

    The work involved in developing the dataset and benchmarking it with machine learning is described in the article ‘IoMT-TrafficData: Dataset and Tools for Benchmarking Intrusion Detection in Internet of Medical Things’. DOI: 10.1109/ACCESS.2024.3437214.

    Please do cite the aforementioned article when using this dataset.

    Abstract

    The increasing importance of securing the Internet of Medical Things (IoMT), given its vulnerability to cyber-attacks, highlights the need for an effective intrusion detection system (IDS). In this study, our main objective was to develop a machine learning model for the IoMT to enhance the security of medical devices and protect patients’ private data. To address this issue, we built a scenario that utilised Internet of Things (IoT) and IoMT devices to simulate real-world attacks. We collected and cleaned the data, pre-processed it, and fed it into our machine-learning model to detect intrusions in the network. Our results revealed significant improvements in all performance metrics, indicating robustness and reproducibility in real-world scenarios. This research has implications for IoMT and cybersecurity, as it helps mitigate vulnerabilities and reduce the number of breaches accompanying the rapid growth of IoMT devices. The use of machine learning algorithms in intrusion detection systems is essential, and our study provides valuable insights and a road map for future research and for deploying such systems in live environments. By implementing our findings, we can contribute to a safer and more secure IoMT ecosystem, safeguarding patient privacy and ensuring the integrity of medical data.

    ZIP Folder Content

    The ZIP folder comprises two main components: Captures and Datasets. The captures folder contains all the captures used in this project, organized into separate folders by type of network analysis: BLE or IP-Based. The datasets folder is organized the same way, with datasets categorized by type: BLE, IP-Based Packet, and IP-Based Flows.

    To cater to diverse analytical needs, the datasets are provided in two formats: CSV (Comma-Separated Values) and pickle. The CSV format integrates easily with a wide range of data analysis tools, while the pickle format preserves the Python objects as they were saved, including data types that CSV reduces to plain text.

    This organization enables researchers to easily locate and utilize the specific captures and datasets they require, based on their preferred network analysis type or dataset type. The availability of different formats further enhances the flexibility and usability of the provided data.
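
    A minimal way to read one of the CSV files with only the Python standard library, assuming a header row. The column names (proto, orig_bytes, label) and the inline sample are hypothetical stand-ins, so check the real header first. The pickle files can be opened with pickle.load (or pandas.read_pickle if they contain DataFrames), but unpickling can run arbitrary code, so only load pickles from a source you trust.

```python
import csv
import io
from collections import Counter


def load_rows(fileobj):
    """Read CSV rows as dicts, converting numeric-looking fields to float."""
    rows = []
    for record in csv.DictReader(fileobj):
        converted = {}
        for key, value in record.items():
            try:
                converted[key] = float(value)
            except (TypeError, ValueError):
                # Empty cells and text fields (protocol names, labels) stay as strings.
                converted[key] = value
        rows.append(converted)
    return rows


# Stand-in for a real file; the column names here are hypothetical.
# With a real file: with open(path, newline="") as f: rows = load_rows(f)
sample = io.StringIO(
    "proto,orig_bytes,label\n"
    "tcp,120,benign\n"
    "udp,,attack\n"
    "tcp,4096,attack\n"
)
rows = load_rows(sample)
print(Counter(row["label"] for row in rows))
```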

    Datasets' Content

    Three sub-datasets are available: BLE, IP-Based Packet, and IP-Based Flows. The tables below list the features selected for each dataset and subsequently used by the evaluation model in the associated work.

    Identified Key Features Within Bluetooth Dataset

    Feature                                 Meaning
    btle.advertising_header                 BLE Advertising Packet Header
    btle.advertising_header.ch_sel          BLE Advertising Channel Selection Algorithm (ChSel)
    btle.advertising_header.length          BLE Advertising Payload Length
    btle.advertising_header.pdu_type        BLE Advertising PDU Type
    btle.advertising_header.randomized_rx   BLE Advertising Rx Address Is Random (RxAdd)
    btle.advertising_header.randomized_tx   BLE Advertising Tx Address Is Random (TxAdd)
    btle.advertising_header.rfu.1           Reserved for Future Use 1
    btle.advertising_header.rfu.2           Reserved for Future Use 2
    btle.advertising_header.rfu.3           Reserved for Future Use 3
    btle.advertising_header.rfu.4           Reserved for Future Use 4
    btle.control.instant                    Instant Value Within a BLE Control Packet
    btle.crc.incorrect                      Incorrect CRC
    btle.extended_advertising               Advertiser Data Information
    btle.extended_advertising.did           Advertiser Data Identifier
    btle.extended_advertising.sid           Advertiser Set Identifier
    btle.length                             BLE Length
    frame.cap_len                           Frame Length Stored in the Capture File
    frame.interface_id                      Interface ID
    frame.len                               Frame Length on the Wire
    nordic_ble.board_id                     Board ID
    nordic_ble.channel                      Channel Index
    nordic_ble.crcok                        Indicates Whether the CRC Is Correct
    nordic_ble.flags                        Flags
    nordic_ble.packet_counter               Packet Counter
    nordic_ble.packet_time                  Packet Time (Start to End)
    nordic_ble.phy                          PHY
    nordic_ble.protover                     Protocol Version
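
    The btle.advertising_header.* columns are sub-fields of the 2-byte advertising PDU header. A sketch of decoding them, assuming the Bluetooth 5.x layout (4-bit PDU type, one RFU bit, ChSel, TxAdd, RxAdd, then an 8-bit length; Bluetooth 4.x used a 6-bit length) with the first header byte carrying the low-order bits. How Wireshark maps its rfu.1 to rfu.4 fields onto these bits is not shown here, and the example bytes are made up.

```python
# PDU type codes for advertising PDUs on the primary channels (Bluetooth 5.x).
# Some codes are reused with other names on secondary channels (e.g. AUX_ADV_IND).
PDU_TYPES = {
    0b0000: "ADV_IND",
    0b0001: "ADV_DIRECT_IND",
    0b0010: "ADV_NONCONN_IND",
    0b0011: "SCAN_REQ",
    0b0100: "SCAN_RSP",
    0b0101: "CONNECT_IND",
    0b0110: "ADV_SCAN_IND",
    0b0111: "ADV_EXT_IND",
}


def decode_adv_header(header: bytes) -> dict:
    """Split a 2-byte advertising PDU header into its bit fields."""
    if len(header) != 2:
        raise ValueError("the advertising PDU header is exactly 2 bytes")
    first, length = header[0], header[1]
    pdu_type = first & 0x0F
    return {
        "pdu_type": pdu_type,
        "pdu_name": PDU_TYPES.get(pdu_type, "OTHER"),
        "rfu": (first >> 4) & 1,
        "ch_sel": (first >> 5) & 1,
        "tx_add_random": bool((first >> 6) & 1),  # TxAdd: advertiser address is random
        "rx_add_random": bool((first >> 7) & 1),  # RxAdd: target address is random
        "length": length,  # payload length in bytes
    }


print(decode_adv_header(bytes([0x40, 0x25])))  # ADV_IND, random Tx address, 37-byte payload
print(decode_adv_header(bytes([0xA1, 0x0C])))  # ADV_DIRECT_IND, ChSel=1, random Rx address
```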

    Identified Key Features Within IP-Based Packets Dataset

    Feature                     Meaning
    http.content_length         Length of content in an HTTP response
    http.request                Whether the packet is an HTTP request
    http.response.code          Status code of an HTTP response
    http.response_number        Sequential number of an HTTP response
    http.time                   Time taken for an HTTP transaction
    tcp.analysis.initial_rtt    Initial round-trip time for TCP connection
    tcp.connection.fin          TCP connection termination with a FIN flag
    tcp.connection.syn          TCP connection initiation with SYN flag
    tcp.connection.synack       TCP connection establishment with SYN-ACK flags
    tcp.flags.cwr               Congestion Window Reduced flag in TCP
    tcp.flags.ecn               ECN-Echo (Explicit Congestion Notification) flag in TCP
    tcp.flags.fin               FIN flag in TCP
    tcp.flags.ns                Nonce Sum flag in TCP
    tcp.flags.res               Reserved flags in TCP
    tcp.flags.syn               SYN flag in TCP
    tcp.flags.urg               Urgent flag in TCP
    tcp.urgent_pointer          Pointer to urgent data in TCP
    ip.frag_offset              Fragment offset in IP packets
    eth.dst.ig                  I/G bit of the destination MAC address (individual or group address)
    eth.src.ig                  I/G bit of the source MAC address (individual or group address)
    eth.src.lg                  L/G bit of the source MAC address (globally unique or locally administered)
    eth.src_not_group           Flags a source MAC address that is a group address (not allowed by IEEE 802.3)
    arp.isannouncement          Indicates if an ARP message is an announcement
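
    The tcp.flags.* columns are individual bits of the TCP flags field. A sketch of decoding them, assuming the 12-bit flags value as Wireshark displays it (FIN = 0x001 up to CWR = 0x080, NS = 0x100, three reserved bits above that). The connection_event helper is a hypothetical, rough analogue of the tcp.connection.* columns, not Wireshark's own logic.

```python
# Bit masks within the 12-bit TCP flags field.
TCP_FLAG_BITS = {
    "FIN": 0x001,
    "SYN": 0x002,
    "RST": 0x004,
    "PSH": 0x008,
    "ACK": 0x010,
    "URG": 0x020,
    "ECE": 0x040,  # ECN-Echo
    "CWR": 0x080,  # Congestion Window Reduced
    "NS": 0x100,   # ECN nonce sum (RFC 3540)
    "RES": 0xE00,  # three reserved bits
}


def decode_tcp_flags(flags):
    """Return the set of flag names whose bits are set in `flags`."""
    return {name for name, mask in TCP_FLAG_BITS.items() if flags & mask}


def connection_event(flags):
    """Hypothetical rough analogue of the tcp.connection.* columns for one segment."""
    names = decode_tcp_flags(flags)
    if {"SYN", "ACK"} <= names:
        return "synack"
    if "SYN" in names:
        return "syn"
    if "FIN" in names:
        return "fin"
    return None


print(sorted(decode_tcp_flags(0x012)))  # ['ACK', 'SYN']
print(connection_event(0x002), connection_event(0x012), connection_event(0x011))  # syn synack fin
```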

    Identified Key Features Within IP-Based Flows Dataset

    Feature              Meaning
    proto                Transport layer protocol of the connection
    service              Identification of an application protocol
    orig_bytes           Originator payload bytes
    resp_bytes           Responder payload bytes
    history              Connection state history
    orig_pkts            Originator sent packets
    resp_pkts            Responder sent packets
    flow_duration        Length of the flow in seconds
    fwd_pkts_tot         Forward packets total
    bwd_pkts_tot         Backward packets total
    fwd_data_pkts_tot    Forward data packets total
    bwd_data_pkts_tot    Backward data packets total
    fwd_pkts_per_sec     Forward packets per second
    bwd_pkts_per_sec     Backward packets per second
    flow_pkts_per_sec    Flow packets per second
    fwd_header_size      Forward header bytes
    bwd_header_size      Backward header bytes
    fwd_pkts_payload     Forward payload bytes
    bwd_pkts_payload     Backward payload bytes
    flow_pkts_payload    Flow payload bytes
    fwd_iat              Forward inter-arrival time
    bwd_iat              Backward inter-arrival time
    flow_iat             Flow inter-arrival time
    active               Flow active duration
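
    The flow columns split each connection into forward (originator to responder) and backward packets. Several names (proto, service, orig_bytes, resp_bytes, history, orig_pkts, resp_pkts) match fields of Zeek's conn.log; the exact definitions of the others are not given here. The sketch below computes a subset, assuming each *_iat column is a mean inter-arrival time and a "data packet" is one with a non-empty payload; the Packet type and sample values are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Packet:
    ts: float          # capture timestamp in seconds
    forward: bool      # True if sent by the flow originator
    header_len: int    # header bytes
    payload_len: int   # payload bytes


def mean_iat(timestamps):
    """Mean inter-arrival time; 0.0 when there are fewer than two packets."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return mean(gaps) if gaps else 0.0


def flow_features(packets):
    packets = sorted(packets, key=lambda p: p.ts)
    fwd = [p for p in packets if p.forward]
    bwd = [p for p in packets if not p.forward]
    duration = packets[-1].ts - packets[0].ts if packets else 0.0

    def rate(count):
        return count / duration if duration > 0 else 0.0

    return {
        "flow_duration": duration,
        "fwd_pkts_tot": len(fwd),
        "bwd_pkts_tot": len(bwd),
        "fwd_data_pkts_tot": sum(1 for p in fwd if p.payload_len > 0),
        "bwd_data_pkts_tot": sum(1 for p in bwd if p.payload_len > 0),
        "fwd_pkts_per_sec": rate(len(fwd)),
        "bwd_pkts_per_sec": rate(len(bwd)),
        "flow_pkts_per_sec": rate(len(packets)),
        "fwd_header_size": sum(p.header_len for p in fwd),
        "bwd_header_size": sum(p.header_len for p in bwd),
        "fwd_pkts_payload": sum(p.payload_len for p in fwd),
        "bwd_pkts_payload": sum(p.payload_len for p in bwd),
        "flow_pkts_payload": sum(p.payload_len for p in packets),
        "fwd_iat": mean_iat([p.ts for p in fwd]),
        "bwd_iat": mean_iat([p.ts for p in bwd]),
        "flow_iat": mean_iat([p.ts for p in packets]),
    }


# Hypothetical four-packet exchange: two header-only packets, then data each way.
flow = [
    Packet(0.0, True, 40, 0),
    Packet(0.1, False, 40, 0),
    Packet(0.2, True, 32, 100),
    Packet(1.0, False, 32, 500),
]
features = flow_features(flow)
print(features["flow_pkts_per_sec"], features["fwd_pkts_payload"])  # 4.0 100
```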

  3. Dataset Description for "Quantum AI for Cybersecurity Threat Prediction"

    • data.mendeley.com
    Updated Mar 20, 2025
    Bindu Garg (2025). Dataset Description for "Quantum AI for Cybersecurity Threat Prediction" [Dataset]. http://doi.org/10.17632/fswng37vbz.2
    Authors
    Bindu Garg
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is engineered to propel the development of quantum-enhanced anomaly detection systems for cybersecurity, combining real-world network traffic data with simulated attack scenarios. It comprises two datasets, malicious and non-malicious, crafted to train ML models that leverage quantum AI to identify subtle anomalies and mitigate cyber threats, particularly those resistant to classical detection methods. Derived from Wireshark captures of normal web browsing and attack simulations, it provides a baseline for quantum machine learning (QML) models.

    The dataset's strength lies in its fusion of traditional network attributes with frequency-based features (Source_Count, Destination_Count, Protocol_Count). These frequency features are intended to help QML algorithms discern complex patterns indicative of malicious behavior; for instance, a QML model could pick up minute deviations in source/destination frequency or unusual protocol usage that classical methods may miss.

    Column Descriptions:

    No. (Record Number): Unique identifier.
    Time: Timestamp of activity.
    Source: Source device/IP.
    Source_Count: Source frequency.
    Destination: Destination device/IP.
    Destination_Count: Destination frequency.
    Protocol: Network protocol.
    Protocol_Count: Protocol frequency.
    Length: Packet size.
    Info: Contextual details.
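
    The description does not say over what window Source_Count, Destination_Count and Protocol_Count are computed. A minimal sketch, assuming they are plain occurrence counts over the whole capture and that rows are dicts keyed by the column names above; the addresses come from documentation ranges and are purely illustrative.

```python
from collections import Counter


def add_count_features(rows):
    """Attach Source_Count, Destination_Count and Protocol_Count to every row.

    Each count is the number of rows sharing that row's value, i.e. a
    frequency over the whole capture (an assumption).
    """
    src = Counter(r["Source"] for r in rows)
    dst = Counter(r["Destination"] for r in rows)
    proto = Counter(r["Protocol"] for r in rows)
    return [
        {
            **r,
            "Source_Count": src[r["Source"]],
            "Destination_Count": dst[r["Destination"]],
            "Protocol_Count": proto[r["Protocol"]],
        }
        for r in rows
    ]


packets = [
    {"Source": "192.0.2.10", "Destination": "198.51.100.5", "Protocol": "TCP"},
    {"Source": "192.0.2.10", "Destination": "198.51.100.7", "Protocol": "DNS"},
    {"Source": "192.0.2.11", "Destination": "198.51.100.5", "Protocol": "TCP"},
]
for row in add_count_features(packets):
    print(row["Source"], row["Source_Count"], row["Destination_Count"], row["Protocol_Count"])
```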

    Uniqueness of the Dataset:

    • Two-Class Design: The dataset includes separate malicious and non-malicious traffic logs, essential for training ML models to differentiate between normal and attack patterns.
    • Frequency-Based Features: The inclusion of "Source_Count," "Destination_Count," and "Protocol_Count" significantly enhances analytical capabilities, allowing the detection of anomalies based on activity patterns.
    • Comprehensive Network Traffic Attributes: The dataset combines frequency features with standard network traffic attributes (Time, Source, Destination, Protocol, Length, Info), providing a holistic view of network activity.
    • Potential for Diverse Analysis: The combination of structured and semi-structured data (in the "Info" column) enables a wide range of analytical techniques, including time series analysis, machine learning, and natural language processing.
    • Cybersecurity Focus: Designed for cybersecurity threat prediction, it is valuable for researchers and practitioners in this domain.
    • Real-World Benign Traffic and Simulated Attacks: The dataset includes both benign traffic and simulated attacks, making it well suited to testing security systems before deployment.

    Conclusion:

    This dataset is a powerful tool for cybersecurity analysis. Its strength lies in its ability to establish a baseline and detect deviations, even subtle ones. The inclusion of malicious and non-malicious data enables precise model training for threat detection. It can support behavioral analysis, DDoS detection, malware analysis, forensics, and training. This dataset empowers security professionals to develop advanced solutions, enhancing network security by revealing valuable insights from seemingly routine network traffic.

  4. Drone-Based Malware Detection (DBMD)

    • kaggle.com
    Updated Jul 27, 2024
    DatasetEngineer (2024). Drone-Based Malware Detection (DBMD) [Dataset]. http://doi.org/10.34740/kaggle/dsv/9045375
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    DatasetEngineer
    License

    CC0 1.0 Universal, https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Welcome to the Drone-Based Malware Detection dataset! This dataset is designed to aid researchers and practitioners in exploring innovative cybersecurity solutions using drone-collected data. The dataset contains detailed information on network traffic, drone sensor readings, malware detection indicators, and environmental conditions. It offers a unique perspective by integrating data from drones with traditional network security metrics to enhance malware detection capabilities.

    Dataset Overview

    The dataset comprises four main categories:

    Network Traffic Data: Captures network traffic attributes including IP addresses, ports, protocols, packet sizes, and various derived metrics.

    Drone Sensor Data: Includes GPS coordinates, altitude, speed, heading, battery level, and other sensor readings from drones.

    Malware Detection Data: Contains indicators and scores relevant to detecting malware, such as anomaly scores, suspicious IP counts, reputation scores, and attack types.

    Environmental Data: Provides context through environmental conditions like location type, noise level, weather conditions, and more.

    Files and Features

    The dataset is divided into four separate CSV files:

    network_traffic_data.csv

    timestamp: Date and time of the traffic event.
    source_ip: Source IP address.
    destination_ip: Destination IP address.
    source_port: Source port number.
    destination_port: Destination port number.
    protocol: Network protocol (TCP, UDP, ICMP).
    packet_length: Length of the network packet.
    payload_data: Content of the packet payload.
    flag: Network flag (SYN, ACK, FIN, RST).
    traffic_volume: Volume of traffic in bytes.
    flow_duration: Duration of the network flow.
    flow_bytes_per_s: Bytes per second for the flow.
    flow_packets_per_s: Packets per second for the flow.
    packet_count: Number of packets in the flow.
    average_packet_size: Average size of packets.
    min_packet_size: Minimum packet size.
    max_packet_size: Maximum packet size.
    packet_size_variance: Variance in packet sizes.
    header_length: Length of the packet header.
    payload_length: Length of the packet payload.
    ip_ttl: Time to live for the IP packet.
    tcp_window_size: TCP window size.
    icmp_type: ICMP type (echo_request, echo_reply, destination_unreachable).
    dns_query_count: Number of DNS queries.
    dns_response_count: Number of DNS responses.
    http_method: HTTP method (GET, POST, PUT, DELETE).
    http_status_code: HTTP status code (200, 404, 500, 301).
    content_type: Content type (text/html, application/json, image/png).
    ssl_tls_version: SSL/TLS version.
    ssl_tls_cipher_suite: SSL/TLS cipher suite.

    drone_data.csv

    latitude: Latitude of the drone.
    longitude: Longitude of the drone.
    altitude: Altitude of the drone.
    speed: Speed of the drone.
    heading: Heading of the drone.
    battery_level: Battery level of the drone.
    drone_id: Unique identifier for the drone.
    flight_time: Total flight time.
    signal_strength: Strength of the drone's signal.
    temperature: Temperature at the drone's location.
    humidity: Humidity at the drone's location.
    pressure: Atmospheric pressure at the drone's location.
    wind_speed: Wind speed at the drone's location.
    wind_direction: Wind direction at the drone's location.
    gps_accuracy: Accuracy of the GPS signal.

    malware_detection_data.csv

    anomaly_score: Score indicating the level of anomaly detected.
    suspicious_ip_count: Number of suspicious IP addresses detected.
    malicious_payload_indicator: Indicator for malicious payload (0 or 1).
    reputation_score: Reputation score for the network entity.
    behavioral_score: Behavioral score indicating potential malicious activity.
    attack_type: Type of attack (DDoS, phishing, malware).
    signature_match: Indicator for signature match (0 or 1).
    sandbox_result: Result from sandbox analysis (clean, infected).
    heuristic_score: Heuristic score for potential threats.
    traffic_pattern: Pattern of the traffic (burst, steady).

    environmental_data.csv

    location_type: Type of location (urban, rural).
    nearby_devices: Number of nearby devices.
    signal_interference: Level of signal interference.
    noise_level: Noise level in the environment.
    time_of_day: Time of day (morning, afternoon, evening, night).
    day_of_week: Day of the week.
    weather_conditions: Weather conditions (sunny, rainy, cloudy, stormy).

    Usage and Applications

    This dataset can be used for:

    Cybersecurity Research: Developing and testing algorithms for malware detection using drone data.

    Machine Learning: Training models to identify malicious activity based on network traffic and drone sensor readings.

    Data Analysis: Exploring the relationships between environmental conditions, drone sensor data, and network traffic anomalies.

    Educational Purposes: Teaching data science, machine learning, and cybersecurity concepts using a comprehensive and multi-faceted dataset.
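
    Several network_traffic_data.csv columns (packet_count, average/min/max packet size, packet_size_variance, flow_bytes_per_s, flow_packets_per_s) are described as derived metrics, but the description does not say how they were computed. The sketch below shows one conventional way to derive them from a flow's packet lengths and duration, assuming population variance and total bytes equal to the sum of packet lengths; the sample flow is hypothetical.

```python
from statistics import mean, pvariance


def packet_size_stats(lengths, duration_s):
    """Derive size and rate columns for one flow from its packet lengths (bytes)."""
    if not lengths:
        raise ValueError("need at least one packet")
    total_bytes = sum(lengths)
    return {
        "packet_count": len(lengths),
        "average_packet_size": mean(lengths),
        "min_packet_size": min(lengths),
        "max_packet_size": max(lengths),
        # Population variance; the dataset may use the sample variance instead.
        "packet_size_variance": pvariance(lengths),
        "flow_bytes_per_s": total_bytes / duration_s if duration_s > 0 else 0.0,
        "flow_packets_per_s": len(lengths) / duration_s if duration_s > 0 else 0.0,
    }


# Hypothetical flow: two small control packets and two full-size data packets over 2 s.
print(packet_size_stats([60, 60, 1500, 1500], duration_s=2.0))
```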

    Acknowledgements

    This dataset is based on real-world data collected from drone sensors and network traffic monitoring s...

  5. Global cyberattack distribution 2023, by type

    • statista.com
    Updated Nov 14, 2024
    Statista (2024). Global cyberattack distribution 2023, by type [Dataset]. https://www.statista.com/statistics/1382266/cyber-attacks-worldwide-by-type/
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2023
    Area covered
    Worldwide
    Description

    In 2023, ransomware was the most frequently detected cyberattack worldwide, accounting for around 70 percent of all detected cyberattacks. Network breaches ranked second, with almost 19 percent of detections. Data exfiltration was also detected, though less frequently.

  6. Open Source Cyber Security Market Report | Global Forecast From 2025 To 2033...

    • dataintelo.com
    csv, pdf, pptx
    Updated Jan 7, 2025
    Dataintelo (2025). Open Source Cyber Security Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/open-source-cyber-security-market
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Open Source Cyber Security Market Outlook

    The global Open Source Cyber Security market size was valued at USD 5.2 billion in 2023 and is projected to reach USD 14.5 billion by 2032, growing at a Compound Annual Growth Rate (CAGR) of 11.8% during the forecast period. This growth is driven by increasing awareness of the benefits of open-source solutions, rising cyber threats, and stringent regulatory compliance requirements.

    One of the primary factors fueling the growth of the Open Source Cyber Security market is the cost-effectiveness of open-source solutions compared to proprietary software. Open-source cyber security tools often come at a fraction of the cost of their commercial counterparts, making them highly attractive for organizations seeking to manage budgets efficiently. Additionally, the flexibility and customization capabilities offered by open-source solutions enable organizations to tailor the tools according to their specific security needs, which in turn drives adoption across various industries.

    Another significant growth driver is the mounting frequency and sophistication of cyber-attacks. As cyber threats evolve, organizations need robust and adaptable security measures to protect sensitive data and systems. Open-source cyber security tools are often at the forefront of innovation, with large developer communities continuously improving and updating the software to address new vulnerabilities. This constant evolution helps open-source tools keep pace with the latest threats, making them an important component of modern cyber security strategies.

    Furthermore, the increasing regulatory pressure on organizations to maintain stringent security postures is propelling the adoption of open-source cyber security solutions. Regulations such as the General Data Protection Regulation (GDPR) in Europe and the California Consumer Privacy Act (CCPA) in the US mandate robust data protection measures, encouraging businesses to invest in advanced security solutions. Open-source tools offer transparency and community-driven support, which can help organizations demonstrate compliance with these regulations, thereby fostering market growth.

    Security Orchestration is becoming increasingly vital in the realm of open-source cyber security solutions. As organizations face a growing number of cyber threats, the ability to efficiently coordinate and manage various security tools is crucial. Security Orchestration enables the integration of multiple security systems and processes, allowing for streamlined operations and improved incident response times. This capability is particularly beneficial in environments where open-source tools are deployed, as it helps to unify disparate systems and enhance overall security effectiveness. By automating routine tasks and facilitating better communication between security components, Security Orchestration empowers organizations to respond more swiftly and effectively to cyber threats, thereby strengthening their security posture.

    Regionally, North America is expected to dominate the Open Source Cyber Security market due to the presence of leading technology companies, high adoption rates of advanced technologies, and stringent regulatory frameworks. However, the Asia Pacific region is anticipated to witness the highest growth rate during the forecast period, driven by the rapid digital transformation of businesses, increasing awareness about cyber security, and supportive government initiatives aimed at enhancing cyber resilience.

    Component Analysis

    The Open Source Cyber Security market can be segmented by components into Software and Services. In the software segment, various types of open-source solutions such as intrusion detection systems, firewalls, security information and event management (SIEM) systems, and encryption tools are gaining traction. These solutions offer robust protection against a wide range of cyber threats, making them essential for organizations across different sectors. The continuous evolution and innovation in open-source software, driven by a collaborative community of developers, ensure that these tools remain effective in mitigating the latest cyber threats.

    On the services front, the market includes professional services such as consulting, training, and support, as well as managed security services. Professional services are crucial for organizations that require expert guidance to implement and optimize open-source s

  7. Comprehensive, Multi-Source Cyber-Security Events Data Set

    • osti.gov
    Updated May 21, 2015
    Los Alamos National Lab. (LANL), Los Alamos, NM (United States) (2015). Comprehensive, Multi-Source Cyber-Security Events Data Set [Dataset]. http://doi.org/10.17021/1179829
    Dataset provided by
    USDOE Office of Science (SC)
    Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
    Description

    This data set represents 58 consecutive days of de-identified event data collected from five sources within Los Alamos National Laboratory’s corporate, internal computer network. The data sources include Windows-based authentication events from both individual computers and centralized Active Directory domain controller servers; process start and stop events from individual Windows computers; Domain Name System (DNS) lookups as collected on internal DNS servers; network flow data as collected at several key router locations; and a set of well-defined red teaming events that represent bad behavior within the 58 days. In total, the data set is approximately 12 gigabytes compressed across the five data elements and presents 1,648,275,307 events for 12,425 users, 17,684 computers, and 62,974 processes. Well-known system-related users (e.g., SYSTEM, Local Service) were not de-identified, although any well-known administrator accounts were. In the network flow data, well-known ports (e.g., 80, 443) were not de-identified. All other users, computers, processes, ports, times, and other details were de-identified as a unified set across all the data elements (e.g., U1 is the same U1 in all of the data). The specific timeframe used is not disclosed for security purposes. In addition, no data that allows association outside of LANL’s network is included. All data starts with a time epoch of 1 using a time resolution of 1 second. In the authentication data, failed authentication events are only included for users that had a successful authentication event somewhere within the data set.
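
    Because timestamps are relative (epoch 1, 1-second resolution) and the real dates are withheld, day-level analysis has to work in relative days. A minimal sketch, assuming each event has already been parsed into an integer time and a de-identified user token such as U1; the (time, user) record shape is an assumption, so consult the dataset's own file documentation for the actual layout.

```python
from collections import Counter

SECONDS_PER_DAY = 86_400
NUM_DAYS = 58  # length of the collection window stated in the description


def relative_day(t):
    """Map a relative timestamp (epoch 1, 1-second resolution) to a 0-based day."""
    if t < 1:
        raise ValueError("timestamps start at 1")
    day = (t - 1) // SECONDS_PER_DAY
    if day >= NUM_DAYS:
        raise ValueError("timestamp falls outside the 58-day window")
    return day


def events_per_user_day(events):
    """Count events per (user, relative day) from (time, user) pairs."""
    return Counter((user, relative_day(t)) for t, user in events)


# Hypothetical parsed events: (relative time in seconds, de-identified user).
events = [(1, "U1"), (86_400, "U1"), (86_401, "U1"), (90_000, "U2")]
print(events_per_user_day(events))
```

    Relative day 0 need not begin at midnight, so any hour-of-day pattern derived this way is an offset from the start of the capture, not a wall-clock time.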

  8. Data from: A Dataset of Cyber-Induced Mechanical Faults on Buildings with...

    • s.cnmilf.com
    • data.openei.org
    • +2 more
    Updated Jan 20, 2025
    National Renewable Energy Laboratory (2025). A Dataset of Cyber-Induced Mechanical Faults on Buildings with Network and Buildings Data [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/a-dataset-of-cyber-induced-mechanical-faults-on-buildings-with-network-and-buildings-data-54439
    Dataset provided by
    National Renewable Energy Laboratory
    Description

    We have collected data on cyber-induced mechanical faults in buildings using a simulation platform. A DOE reference building model was simulated under a rogue-device attack, and both the network data and the physical building data were collected to better understand the impacts of cyber attacks on the building and to help identify the source of a mechanical fault from the network data. Alfalfa is the tool used to simulate the DOE reference buildings; it acts as an interface to the model for querying its status and providing external input. The Building Automation System (BAS) is the centralized controller that sends control commands to the other BACnet devices on the network based on the building status received from Alfalfa. BACnet devices such as dampers listen for control commands from the BAS on the BACnet network and carry them out. The attacker is a malicious actor on the network who causes disruptions by launching cyber attacks.
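
    The network data is meant to help trace a mechanical fault back to its source. One simple way to illustrate that idea, assuming messages have already been decoded (for example from a packet capture) into sender address, BACnet service and target, is to flag state-changing commands that did not come from the BAS. The field names and addresses are hypothetical, and this allowlist rule is an illustration, not the method used by the dataset authors.

```python
from dataclasses import dataclass

# State-changing BACnet services (both are standard BACnet services).
COMMAND_SERVICES = {"WriteProperty", "WritePropertyMultiple"}


@dataclass(frozen=True)
class BacnetMessage:
    src: str      # sender's network address (hypothetical decoded field)
    service: str  # BACnet service name, e.g. "ReadProperty" or "WriteProperty"
    target: str   # commanded device or object (hypothetical naming)


def unauthorized_commands(messages, bas_addresses):
    """Return state-changing messages that did not come from an authorized BAS."""
    return [
        m for m in messages
        if m.service in COMMAND_SERVICES and m.src not in bas_addresses
    ]


bas = {"10.0.0.2"}
traffic = [
    BacnetMessage("10.0.0.2", "WriteProperty", "damper-1"),
    BacnetMessage("10.0.0.66", "ReadProperty", "damper-1"),
    BacnetMessage("10.0.0.66", "WriteProperty", "damper-1"),
]
for msg in unauthorized_commands(traffic, bas):
    print(msg)
```

    A source-address rule like this is easy to evade with spoofing, so it is best read as a starting point for correlating network events with the physical faults recorded in the building data.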

  9. Cyber Security Situational Awareness Market Report | Global Forecast From...

    • dataintelo.com
    csv, pdf, pptx
    Updated Sep 12, 2024
    Dataintelo (2024). Cyber Security Situational Awareness Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/global-cyber-security-situational-awareness-market
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Cyber Security Situational Awareness Market Outlook

    The global cyber security situational awareness market size was valued at approximately $29.2 billion in 2023 and is projected to reach around $72.4 billion by 2032, growing at a CAGR of 10.6% during the forecast period. The primary growth factors driving this market are the increasing frequency and sophistication of cyber-attacks and the growing adoption of IoT and connected devices, which necessitate advanced security measures to ensure data integrity and network security.
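
    The stated figures can be sanity-checked with the standard CAGR formula. Assuming growth is compounded over the nine years from the 2023 base to 2032 (the report's exact convention is not stated), the sketch below recovers roughly the quoted rate.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# USD billions: 2023 base value and 2032 projection from the report.
rate = cagr(29.2, 72.4, 2032 - 2023)
print(f"{rate:.1%}")  # 10.6%
```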



    The rapid digital transformation across industries presents both opportunities and challenges in terms of cyber security. As organizations increasingly rely on digital platforms and interconnected systems, the threat landscape becomes more complex and dynamic. This has led to a heightened demand for robust cyber security situational awareness solutions, which provide real-time visibility, threat detection, and response capabilities. The growing regulatory requirements for data protection and privacy also play a crucial role in driving market growth, as businesses strive to comply with stringent regulations such as GDPR, HIPAA, and others.



    Moreover, the rise of remote work and the increased use of cloud services have expanded the attack surface for malicious actors. Organizations are now more vulnerable to phishing attacks, ransomware, and other forms of cyber threats. This has led to a greater emphasis on enhancing cyber security frameworks and adopting advanced situational awareness tools. The integration of artificial intelligence (AI) and machine learning (ML) in cyber security solutions is another significant growth factor, enabling faster detection and mitigation of threats while reducing false positives.



    Furthermore, the increasing investment in cybersecurity by both public and private sectors is expected to fuel market growth. Governments worldwide are recognizing the importance of protecting critical infrastructure and are allocating significant resources to bolster cyber defenses. Private enterprises, facing the potential financial and reputational damage from cyber incidents, are also increasingly investing in advanced security solutions to safeguard their operations. This collective effort to enhance cyber resilience is a key driver of the cyber security situational awareness market.



    From a regional perspective, North America currently holds the largest market share due to the presence of major technology companies and a high adoption rate of advanced cyber security solutions. However, the Asia Pacific region is expected to exhibit the highest growth rate during the forecast period, driven by rapid digitalization, expanding internet penetration, and increasing awareness of cyber threats. Europe also remains a significant market, with stringent data protection regulations and substantial investments in cyber security infrastructure.



    Component Analysis



    The cyber security situational awareness market by component can be broadly segmented into solutions and services. Solutions encompass a variety of software and hardware tools designed to provide comprehensive situational awareness, including threat intelligence platforms, intrusion detection systems, security information and event management (SIEM) systems, and advanced analytics solutions. These tools are crucial for identifying, analyzing, and mitigating potential threats in real-time, ensuring the security and integrity of an organization's network and data.



    Within the solutions segment, SIEM systems are particularly notable for their ability to collect and analyze security-related data from various sources, providing a unified view of an organization's security posture. These systems leverage advanced analytics and machine learning algorithms to detect anomalies and potential threats, facilitating faster response times. Intrusion detection systems, on the other hand, focus on identifying unauthorized access attempts and other malicious activities within a network, enabling organizations to take proactive measures to thwart attacks.
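    The core idea behind a SIEM, normalizing events from different sources into one schema and then applying correlation rules across them, can be sketched in a few lines. The two log formats, their field names, and the "N denied events per IP" rule below are hypothetical; real SIEM products use far richer parsers, rules, and analytics.

```python
from collections import Counter


def normalize(event, source):
    """Map a raw record from one of two hypothetical log sources to a common schema."""
    if source == "firewall":
        return {"ip": event["src"], "action": event["verdict"]}
    if source == "auth":
        return {"ip": event["client_ip"], "action": "allow" if event["ok"] else "deny"}
    raise ValueError(f"unknown source: {source}")


def correlate(raw_events, threshold=3):
    """Return IPs with at least `threshold` denied events across all sources combined."""
    normalized = (normalize(event, source) for source, event in raw_events)
    denied = Counter(e["ip"] for e in normalized if e["action"] == "deny")
    return sorted(ip for ip, count in denied.items() if count >= threshold)


raw = [
    ("firewall", {"src": "203.0.113.5", "verdict": "deny"}),
    ("auth", {"client_ip": "203.0.113.5", "ok": False}),
    ("auth", {"client_ip": "203.0.113.5", "ok": False}),
    ("auth", {"client_ip": "198.51.100.7", "ok": True}),
]
print(correlate(raw))  # ['203.0.113.5']
```

    Note that neither source alone reaches the threshold (one firewall deny, two failed logins); the alert only appears once the events are combined, which is the "unified view" the paragraph describes.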



    On the services side, the market includes professional services such as consulting, training, and implementation, as well as managed services. Consulting services help organizations assess their current security posture, identify vulnerabilities, and develop strategies for enhancing situational awareness. Training services are essential for building the skills and knowledge required to effectively use advanced cyber security tools and respond to threats. Implementation services ensure that security sol

  10. Number of data compromises and impacted individuals in U.S. 2005-2024

    • statista.com
    • ai-chatbox.pro
    Updated May 23, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Statista (2025). Number of data compromises and impacted individuals in U.S. 2005-2024 [Dataset]. https://www.statista.com/statistics/273550/data-breaches-recorded-in-the-united-states-by-number-of-breaches-and-records-exposed/
    Dataset updated
    May 23, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    United States
    Description

    In 2024, the number of data compromises in the United States stood at 3,158 cases. Meanwhile, over 1.35 billion individuals were affected in the same year by data compromises, including data breaches, leakage, and exposure. While these are three different events, they have one thing in common: in each case, sensitive data is accessed by an unauthorized threat actor.

    Industries most vulnerable to data breaches

    Some industry sectors usually see more significant cases of private data violations than others, depending on the type and volume of personal information that organizations in those sectors store. In 2024, financial services, healthcare, and professional services were the three sectors that recorded the most data breaches. Overall, the number of data breaches in some industry sectors in the United States has gradually increased over the past few years, although other sectors saw a decrease.

    Largest data exposures worldwide

    In 2020, the adult streaming website CAM4 experienced a leakage of nearly 11 billion records, by far the most extensive data leakage reported. This case is unusual, though, because cyber security researchers found the vulnerability before cyber criminals did. The second-largest data breach is the Yahoo breach, dating back to 2013: the company first reported about one billion exposed records and later, in 2017, revised the figure to three billion. The third-largest breach occurred in March 2018 and involved India's national identification database, Aadhaar, exposing over 1.1 billion records.

  11. Large-Scale Attacks in IoT Environment

    • kaggle.com
    zip
    Updated May 7, 2025
    Nikita Manaenkov (2025). Large-Scale Attacks in IoT Environment [Dataset]. https://www.kaggle.com/datasets/nikitamanaenkov/large-scale-attacks-in-iot-environment
    Available download formats: zip (1,474,647,877 bytes)
    Dataset updated
    May 7, 2025
    Authors
    Nikita Manaenkov
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The CICIoT2023 dataset is a large-scale, realistic intrusion detection dataset designed to support security analytics and machine learning research in the Internet of Things (IoT) domain. Created by the Canadian Institute for Cybersecurity (CIC), the dataset captures 33 different types of attacks (including DDoS, DoS, Recon, Web-based, Brute Force, Spoofing, and Mirai) executed by malicious IoT devices against other IoT targets.

    The testbed consists of 105 real IoT devices of different types and manufacturers, including smart home devices and industrial equipment, configured in a complex network topology to emulate real-world conditions. The dataset includes benign and malicious traffic in various formats and supports feature extraction for both traditional ML and deep learning models.

    This dataset aims to address the lack of diversity and scale in previous IoT security datasets, offering a robust benchmark for evaluating intrusion detection systems (IDS) and enabling research in IoT cybersecurity, anomaly detection, and network forensics.
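    A common first step with a multi-class IDS dataset like this is to load the CSV parts and derive a binary benign/attack target before trying finer-grained classification. The sketch below assumes the extracted files are plain CSVs with a label column; the default column name `label` and benign value `BenignTraffic` are assumptions to check against the actual headers.

```python
import pandas as pd


def load_csvs(paths):
    """Concatenate several CSV files (e.g. the parts extracted from the zip) into one DataFrame."""
    return pd.concat((pd.read_csv(p) for p in paths), ignore_index=True)


def add_binary_label(df, label_col="label", benign_value="BenignTraffic"):
    """Return a copy of df with an `is_attack` column: 1 for malicious rows, 0 for benign ones.

    The default column name and benign label are assumptions; check them against
    the headers of the CSV files you actually download.
    """
    out = df.copy()
    out["is_attack"] = (out[label_col] != benign_value).astype(int)
    return out


def class_balance(df, col="is_attack"):
    """Fraction of rows per class; IDS datasets are often heavily imbalanced."""
    return df[col].value_counts(normalize=True).sort_index()
```

    Checking `class_balance` before training helps decide whether resampling or class weights are needed, since attack traffic can vastly outnumber benign traffic in datasets of this kind.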

    Source https://www.mdpi.com/1424-8220/23/13/5941

  12. Forchheim Image Dataset

    • paperswithcode.com
    Updated Mar 18, 2025
    (2025). Forchheim Image Dataset Dataset [Dataset]. https://paperswithcode.com/dataset/forchheim-image-dataset
    Dataset updated
    Mar 18, 2025
    Description


    The Forchheim Image Dataset is designed specifically for Source Camera Identification (SCI) tasks, offering a diverse range of images captured by various devices. It is a valuable resource for forensic and cybersecurity professionals working to trace digital images back to the cameras that captured them.

    The dataset contains 3,851 high-resolution images from 27 distinct digital devices, covering a broad range of camera models and manufacturers. To keep the focus on Source Camera Identification, only the 'original' (unprocessed) images from each device have been retained; all derivative files, such as edited or compressed versions, have been excluded.


    Key Features of the Forchheim Dataset:

    Diversity of Devices: Includes images from 27 unique devices, ranging from smartphones to high-end cameras, covering various sensor types, lenses, and software configurations.

    High-Quality Images: All images are preserved in their original, unaltered formats to ensure authenticity and integrity for SCI tasks.

    Exclusively for SCI: Derivative files and any post-processed images have been removed, ensuring that the dataset strictly serves the purpose of source camera identification.

    Applications: Ideal for forensic analysis, digital media forensics, image authentication, and cybersecurity research where tracing the origin of images is critical.

    Dataset Structure: The dataset is organized into folders by device, making it easier for researchers to access and analyze images based on their source.
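    Because the images are organized into one folder per device, the folder name can serve directly as the class label for SCI experiments. The sketch below assumes each device folder sits directly under a local root directory and holds the image files; the set of file extensions is a guess, and the per-device split is one reasonable choice rather than an official protocol for this dataset.

```python
import random
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}  # assumed file types


def index_images(root):
    """Return sorted (image_path, device_name) pairs, using the parent folder as the device."""
    root = Path(root)
    return sorted(
        (p, p.parent.name)
        for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def split_per_device(pairs, test_fraction=0.2, seed=0):
    """Split so that every device with 2+ images appears in both train and test."""
    by_device = {}
    for path, device in pairs:
        by_device.setdefault(device, []).append(path)
    rng = random.Random(seed)
    train, test = [], []
    for device, paths in sorted(by_device.items()):
        paths = paths[:]
        rng.shuffle(paths)
        # At least one test image per device, unless the device has only one image.
        n_test = max(1, int(len(paths) * test_fraction)) if len(paths) > 1 else 0
        test += [(p, device) for p in paths[:n_test]]
        train += [(p, device) for p in paths[n_test:]]
    return train, test
```

    Stratifying by device keeps every camera represented at test time, so accuracy is not inflated or deflated by devices that only appear in one split.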

    Potential Use Cases:

    Forensic Analysis: Identifying the source of images in legal cases or criminal investigations.

    Cybersecurity: Detecting manipulated or unauthorized images used in malicious campaigns.

    Academic Research: Training and testing machine learning models for image source attribution.

    This dataset is sourced from Kaggle.

  13. Open Source Security Market Report | Global Forecast From 2025 To 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Oct 5, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Dataintelo (2024). Open Source Security Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/open-source-security-market
    Available download formats: pptx, pdf, csv
    Dataset updated
    Oct 5, 2024
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Open Source Security Market Outlook



    The global open source security market size was valued at approximately USD 2.5 billion in 2023 and is expected to grow to around USD 7.9 billion by 2032, reflecting a robust compound annual growth rate (CAGR) of 13.6% during the forecast period. This growth is primarily driven by the increasing adoption of open source software (OSS) across various industries due to its cost-effectiveness and flexibility, coupled with a growing awareness of cybersecurity threats.



    One of the primary growth factors for the open source security market is the escalating number of cyber threats and data breaches, which have heightened the need for more robust security measures. Organizations are increasingly turning to open source security solutions to safeguard their systems and data. The flexibility and transparency offered by open source solutions allow organizations to customize security measures to fit their specific needs, which is an attractive proposition compared to proprietary software.



    Another significant growth driver is the rising adoption of open source software in enterprise IT ecosystems. As more businesses leverage OSS for various applications, from web development to cloud computing, the need for effective security solutions becomes paramount. Open source security tools are often more adaptable and rapidly updated, enabling organizations to quickly address vulnerabilities and stay ahead of potential threats. The collaborative nature of open source communities also means that security solutions benefit from continuous contributions from a global pool of developers.



    Additionally, cost considerations play a crucial role in the market's expansion. Open source security solutions often come with lower upfront costs compared to proprietary alternatives, making them particularly appealing to small and medium-sized enterprises (SMEs) that may have limited IT budgets. This cost advantage, combined with the potential for reduced total cost of ownership due to the ability to modify and improve the software, is expected to fuel the market's growth further.



    Regionally, North America is anticipated to hold the largest market share during the forecast period, driven by the early adoption of advanced technologies and a strong focus on cybersecurity. However, the Asia Pacific region is expected to witness the highest growth rate due to the rapid digital transformation in emerging economies like India and China, increasing cybersecurity investments, and the growing implementation of OSS across various industries.



    Component Analysis



    The open source security market is segmented by component into software and services. Software comprises various security tools and applications designed to protect open source environments, including firewalls, intrusion detection systems, and security monitoring tools. The software segment is expected to dominate the market due to the increasing deployment of open source security software that offers extensive customization and integration capabilities. These tools are essential for organizations to maintain the security and integrity of their open source applications.



    On the other hand, the services segment includes consulting, implementation, and maintenance services. As organizations adopt open source security solutions, the demand for expert services to effectively implement and manage these solutions is growing. Consulting services help organizations assess their security posture and develop strategies to mitigate risks. Implementation services ensure that open source security tools are correctly deployed and configured, while maintenance services provide ongoing support and updates to keep the security measures effective.



    The services segment is also set to experience significant growth, driven by the increasing complexity of cybersecurity threats and the need for specialized expertise. Many organizations prefer to outsource their security needs to external experts who can provide up-to-date knowledge and skills. This trend is particularly prominent among SMEs, which may lack the resources to maintain an in-house security team.



    Furthermore, the integration of artificial intelligence (AI) and machine learning (ML) into open source security solutions is enhancing their capabilities. AI and ML-powered security tools can analyze vast amounts of data to detect anomalies and predict potential threats, providing organizations with advanced protection mechanisms. This technological advancement is expected to drive the growth of bot

  14. Threat Intelligence Text Dataset

    • opendatabay.com
    Updated Jul 3, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Datasimple (2025). Threat Intelligence Text Dataset [Dataset]. https://www.opendatabay.com/data/ai-ml/8293a044-4601-409d-898b-a16bf6852ae2
    Dataset updated
    Jul 3, 2025
    Dataset authored and provided by
    Datasimple
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Website Analytics & User Experience
    Description

    This curated dataset, Cyber-BERT, is designed for Natural Language Processing (NLP) applications within the cybersecurity domain. It contains text extracted from various cybersecurity sources, encompassing topics such as malware analysis, vulnerabilities, cyber threats, and network security. The dataset is well-suited for training BERT-based models to perform essential tasks like threat detection, text classification, and broader cybersecurity research. The data has been meticulously preprocessed to ensure cleanliness, with URLs, non-text symbols, HTML tags, metadata, and redundant content removed.

    Columns

    • text: This column contains the processed cybersecurity-related text.

    Distribution

    The dataset is typically provided in a CSV file format, making it readily accessible for various applications. It contains approximately 50,000 samples, though the exact number may vary based on collection updates. The data has undergone significant preprocessing to enhance its utility for NLP tasks, including the removal of URLs, non-text symbols, HTML tags, metadata, and duplicate entries.
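    The preprocessing described above (removing URLs, HTML tags, non-text symbols, and duplicates) can be approximated with a few regular expressions. This is a sketch under assumptions: the exact cleaning rules used for this dataset are not documented, and the character whitelist below is an arbitrary choice that keeps letters, digits, basic punctuation, and hyphens so identifiers such as CVE numbers survive.

```python
import re

TAG_RE = re.compile(r"<[^>]+>")                   # HTML tags
URL_RE = re.compile(r"https?://\S+|www\.\S+")     # URLs
NON_TEXT_RE = re.compile(r"[^\w\s.,;:!?'\"()/-]")  # anything outside the whitelist
SPACE_RE = re.compile(r"\s+")


def clean_text(text):
    """Strip HTML tags, URLs, and non-text symbols, then collapse whitespace."""
    text = TAG_RE.sub(" ", text)
    text = URL_RE.sub(" ", text)
    text = NON_TEXT_RE.sub(" ", text)
    return SPACE_RE.sub(" ", text).strip()


def clean_corpus(texts):
    """Clean every document, dropping empty results and exact duplicates while keeping order."""
    seen = set()
    cleaned = []
    for t in texts:
        c = clean_text(t)
        if c and c not in seen:
            seen.add(c)
            cleaned.append(c)
    return cleaned


docs = [
    "<p>New RCE in Log4j: CVE-2021-44228 https://example.com/x</p>",
    "<p>New RCE in Log4j: CVE-2021-44228</p>",
    "   ",
]
print(clean_corpus(docs))  # ['New RCE in Log4j: CVE-2021-44228']
```

    Tags are removed before URLs so that links inside `href` attributes disappear together with their tags; deduplication runs after cleaning so that texts differing only in markup count as duplicates.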

    Usage

    This dataset offers a range of valuable applications, including:

    • Cyber Threat Detection: Utilise the dataset to train models for classifying security threats.
    • Named Entity Recognition (NER): Identify and extract key entities such as malware, exploits, and vulnerabilities from cybersecurity text.
    • Threat Intelligence Analysis: Extract valuable insights from cybersecurity reports and other relevant texts.
    • BERT Fine-Tuning: Build specialised NLP models tailored for security domains and specific cybersecurity challenges.

    Coverage

    The text within this dataset is extracted from prominent cybersecurity sources including TheHackerNews, CVE Details, Any.Run, and OpenPhish. The dataset's scope is global. Specific time ranges for the data content itself are not provided.

    License

    CC0

    Who Can Use It

    This dataset is an excellent resource for:

    • Researchers focused on advancing NLP techniques in cybersecurity.
    • Data Scientists and Machine Learning Engineers developing threat detection systems or text classification models.
    • Security Analysts looking to automate aspects of threat intelligence analysis.
    • Anyone involved in building specialised NLP models for security domains.

    Dataset Name Suggestions

    • Cyber-BERT
    • Cybersecurity NLP Corpus
    • Threat Intelligence Text Dataset
    • Security Text Analytics Data
    • BERT Security Dataset

    Attributes

    Original Data Source: Cyber-BERT

  15. EDGE-IIOTSET Dataset

    • paperswithcode.com
    Updated Oct 16, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2023). EDGE-IIOTSET Dataset [Dataset]. https://paperswithcode.com/dataset/edge-iiotset
    Dataset updated
    Oct 16, 2023
    Description

    ABSTRACT In this project, we propose a new comprehensive, realistic cyber security dataset of IoT and IIoT applications, called Edge-IIoTset, which can be used by machine learning-based intrusion detection systems in two different modes, namely centralized and federated learning. Specifically, the proposed testbed is organized into seven layers: the Cloud Computing Layer, Network Functions Virtualization Layer, Blockchain Network Layer, Fog Computing Layer, Software-Defined Networking Layer, Edge Computing Layer, and IoT and IIoT Perception Layer. In each layer, we propose emerging technologies that satisfy the key requirements of IoT and IIoT applications, such as the ThingsBoard IoT platform, OPNFV platform, Hyperledger Sawtooth, digital twin, ONOS SDN controller, Mosquitto MQTT brokers, and Modbus TCP/IP. The IoT data are generated by more than 10 types of IoT devices, such as low-cost digital sensors for temperature and humidity, an ultrasonic sensor, a water level detection sensor, a pH sensor meter, a soil moisture sensor, a heart rate sensor, and a flame sensor. Furthermore, we identify and analyze fourteen attacks related to IoT and IIoT connectivity protocols, categorized into five threats: DoS/DDoS attacks, information gathering, man-in-the-middle attacks, injection attacks, and malware attacks. In addition, we extract features from different sources, including alerts, system resources, logs, and network traffic, and propose 61 new features with high correlations, selected from 1,176 features found. After processing and analyzing the proposed dataset, we provide a primary exploratory data analysis and evaluate the performance of machine learning approaches (i.e., traditional machine learning as well as deep learning) in both centralized and federated learning modes.
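    The abstract distinguishes centralized from federated learning. In the federated mode, each client trains on its own local data and shares only model parameters, which a server then aggregates. One widely used aggregation rule is federated averaging (FedAvg), sketched below with NumPy; it illustrates the general idea and is not necessarily the exact federated setup evaluated in the paper.

```python
import numpy as np


def fed_avg(client_weights, client_sizes):
    """Federated averaging: weight each client's parameters by its number of samples.

    client_weights: list of per-client models, each a list of numpy arrays (one per layer).
    client_sizes:   number of local training samples each client used.
    """
    if len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = float(sum(client_sizes))
    n_layers = len(client_weights[0])
    return [
        sum(w[layer] * (n / total) for w, n in zip(client_weights, client_sizes))
        for layer in range(n_layers)
    ]


# Two toy clients, each with a 2-element bias vector and a 1x1 weight matrix.
client_a = [np.array([1.0, 1.0]), np.array([[0.0]])]
client_b = [np.array([3.0, 5.0]), np.array([[4.0]])]
global_model = fed_avg([client_a, client_b], client_sizes=[1, 3])
print(global_model)
```

    Weighting by sample count means a client with three times as much data pulls the global model three times as hard; in the centralized mode, the raw data would instead be pooled and a single model trained on it.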

    Instructions:

    Great news! The Edge-IIoT dataset has been featured as a "Document in the top 1% of Web of Science." This indicates that it is ranked within the top 1% of all publications indexed by the Web of Science (WoS) in terms of citations and impact.

    Please visit the Kaggle page for updates: https://www.kaggle.com/datasets/mohamedamineferrag/edgeiiotset-cyber-sec...

    Free use of the Edge-IIoTset dataset for academic research purposes is hereby granted in perpetuity. Commercial use is allowed only after obtaining permission from the lead author, Dr Mohamed Amine Ferrag, who has asserted his rights under copyright.

    The details of the Edge-IIoT dataset were published in the following paper. For academic or public use of these datasets, users must cite:

    Mohamed Amine Ferrag, Othmane Friha, Djallel Hamouda, Leandros Maglaras, Helge Janicke, "Edge-IIoTset: A New Comprehensive Realistic Cyber Security Dataset of IoT and IIoT Applications for Centralized and Federated Learning", IEEE Access, April 2022 (IF: 3.37), DOI: 10.1109/ACCESS.2022.3165809

    Link to paper : https://ieeexplore.ieee.org/document/9751703

    The directories of the Edge-IIoTset dataset include the following:

    •File 1 (Normal traffic)

    -File 1.1 (Distance): This file includes two documents, namely, Distance.csv and Distance.pcap. The IoT sensor (Ultrasonic sensor) is used to capture the IoT data.

    -File 1.2 (Flame_Sensor): This file includes two documents, namely, Flame_Sensor.csv and Flame_Sensor.pcap. The IoT sensor (Flame Sensor) is used to capture the IoT data.

    -File 1.3 (Heart_Rate): This file includes two documents, namely, Heart_Rate.csv and Heart_Rate.pcap. The IoT sensor (Heart Rate Sensor) is used to capture the IoT data.

    -File 1.4 (IR_Receiver): This file includes two documents, namely, IR_Receiver.csv and IR_Receiver.pcap. The IoT sensor (IR (Infrared) Receiver Sensor) is used to capture the IoT data.

    -File 1.5 (Modbus): This file includes two documents, namely, Modbus.csv and Modbus.pcap. The IoT sensor (Modbus Sensor) is used to capture the IoT data.

    -File 1.6 (phValue): This file includes two documents, namely, phValue.csv and phValue.pcap. The IoT sensor (pH-sensor PH-4502C) is used to capture the IoT data.

    -File 1.7 (Soil_Moisture): This file includes two documents, namely, Soil_Moisture.csv and Soil_Moisture.pcap. The IoT sensor (Soil Moisture Sensor v1.2) is used to capture the IoT data.

    -File 1.8 (Sound_Sensor): This file includes two documents, namely, Sound_Sensor.csv and Sound_Sensor.pcap. The IoT sensor (LM393 Sound Detection Sensor) is used to capture the IoT data.

    -File 1.9 (Temperature_and_Humidity): This file includes two documents, namely, Temperature_and_Humidity.csv and Temperature_and_Humidity.pcap. The IoT sensor (DHT11 Sensor) is used to capture the IoT data.

    -File 1.10 (Water_Level): This file includes two documents, namely, Water_Level.csv and Water_Level.pcap. The IoT sensor (Water sensor) is used to capture the IoT data.

    •File 2 (Attack traffic):

    -File 2.1 (Attack traffic (CSV files)): This file includes 14 documents, namely, Backdoor_attack.csv, DDoS_HTTP_Flood_attack.csv, DDoS_ICMP_Flood_attack.csv, DDoS_TCP_SYN_Flood_attack.csv, DDoS_UDP_Flood_attack.csv, MITM_attack.csv, OS_Fingerprinting_attack.csv, Password_attack.csv, Port_Scanning_attack.csv, Ransomware_attack.csv, SQL_injection_attack.csv, Uploading_attack.csv, Vulnerability_scanner_attack.csv, XSS_attack.csv. Each document covers one attack.

    -File 2.2 (Attack traffic (PCAP files)): This file includes 14 documents, namely, Backdoor_attack.pcap, DDoS_HTTP_Flood_attack.pcap, DDoS_ICMP_Flood_attack.pcap, DDoS_TCP_SYN_Flood_attack.pcap, DDoS_UDP_Flood_attack.pcap, MITM_attack.pcap, OS_Fingerprinting_attack.pcap, Password_attack.pcap, Port_Scanning_attack.pcap, Ransomware_attack.pcap, SQL_injection_attack.pcap, Uploading_attack.pcap, Vulnerability_scanner_attack.pcap, XSS_attack.pcap. Each document covers one attack.

    •File 3 (Selected dataset for ML and DL):

    -File 3.1 (DNN-EdgeIIoT-dataset): This file contains a selected dataset for the use of evaluating deep learning-based intrusion detection systems.

    -File 3.2 (ML-EdgeIIoT-dataset): This file contains a selected dataset for the use of evaluating traditional machine learning-based intrusion detection systems.
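    If you work from the raw per-attack CSVs in File 2.1 rather than the pre-selected datasets in File 3, you need to attach a label to each row yourself. The sketch below assumes the CSVs sit in one local folder and follow the `<Name>_attack.csv` naming listed above; the folder path and the `attack_name` column are hypothetical, and the columns inside these raw files may differ from the `Attack_type` column used in the DNN file. It uses `str.removesuffix`, so it needs Python 3.9 or later.

```python
from pathlib import Path

import pandas as pd


def load_attack_csvs(folder):
    """Concatenate per-attack CSVs, labelling each row with the attack name from its file name.

    Assumes files are named like 'Backdoor_attack.csv', as listed for File 2.1.
    """
    frames = []
    for path in sorted(Path(folder).glob("*_attack.csv")):
        df = pd.read_csv(path, low_memory=False)
        df["attack_name"] = path.stem.removesuffix("_attack")  # 'Backdoor_attack' -> 'Backdoor'
        frames.append(df)
    if not frames:
        raise FileNotFoundError(f"no *_attack.csv files found in {folder}")
    return pd.concat(frames, ignore_index=True)
```

    Normal traffic from File 1 could be appended the same way with a fixed "Normal" label, giving a single labelled table for training.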

    Step 1: Downloading the Edge-IIoTset dataset from the Kaggle platform (Google Colab):

        from google.colab import files

        !pip install -q kaggle

        files.upload()  # upload your kaggle.json API token

        !mkdir -p ~/.kaggle
        !cp kaggle.json ~/.kaggle/
        !chmod 600 ~/.kaggle/kaggle.json

        !kaggle datasets download -d mohamedamineferrag/edgeiiotset-cyber-security-dataset-of-iot-iiot -f "Edge-IIoTset dataset/Selected dataset for ML and DL/DNN-EdgeIIoT-dataset.csv"
        !unzip DNN-EdgeIIoT-dataset.csv.zip
        !rm DNN-EdgeIIoT-dataset.csv.zip

    Step 2: Reading the dataset's CSV file into a pandas DataFrame:

        import pandas as pd
        import numpy as np

        df = pd.read_csv('DNN-EdgeIIoT-dataset.csv', low_memory=False)

    Step 3: Exploring some of the DataFrame's contents:

        df.head(5)

        print(df['Attack_type'].value_counts())

    Step 4: Dropping data (columns, duplicated rows, NaN and null values):

        from sklearn.utils import shuffle

        drop_columns = ["frame.time", "ip.src_host", "ip.dst_host", "arp.src.proto_ipv4", "arp.dst.proto_ipv4",
                        "http.file_data", "http.request.full_uri", "icmp.transmit_timestamp",
                        "http.request.uri.query", "tcp.options", "tcp.payload", "tcp.srcport",
                        "tcp.dstport", "udp.port", "mqtt.msg"]

        df.drop(drop_columns, axis=1, inplace=True)
        df.dropna(axis=0, how='any', inplace=True)
        df.drop_duplicates(subset=None, keep="first", inplace=True)
        df = shuffle(df)
        df.isna().sum()

        print(df['Attack_type'].value_counts())

    Step 5: Categorical data encoding (dummy encoding):

        import numpy as np
        from sklearn.model_selection import train_test_split
        from sklearn.preprocessing import StandardScaler
        from sklearn import preprocessing

        def encode_text_dummy(df, name):
            # Replace column `name` with one indicator column per distinct value,
            # named "<column>-<value>".
            dummies = pd.get_dummies(df[name])
            for x in dummies.columns:
                dummy_name = f"{name}-{x}"
                df[dummy_name] = dummies[x]
            df.drop(name, axis=1, inplace=True)

        encode_text_dummy(df, 'http.request.method')
        encode_text_dummy(df, 'http.referer')
        encode_text_dummy(df, "http.request.version")
        encode_text_dummy(df, "dns.qry.name.len")
        encode_text_dummy(df, "mqtt.conack.flags")
        encode_text_dummy(df, "mqtt.protoname")
        encode_text_dummy(df, "mqtt.topic")

    Step 6: Creation of the preprocessed dataset:

        df.to_csv('preprocessed_DNN.csv', encoding='utf-8')
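    Step 5 imports `train_test_split` and `StandardScaler` without using them. A plausible next step, sketched here as an assumption rather than the authors' published pipeline, is a stratified train/test split followed by standardization fitted on the training rows only. The function name and its defaults are hypothetical; it assumes every remaining feature column is numeric or boolean, and `drop_cols` can exclude any other label columns that should not be used as features.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def split_and_scale(df, label_col="Attack_type", drop_cols=(), test_size=0.2, seed=42):
    """Stratified train/test split, then standardize features using training statistics only."""
    y = df[label_col]
    X = df.drop(columns=[label_col, *drop_cols]).astype(float)  # bool dummies -> 0.0/1.0
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )
    scaler = StandardScaler().fit(X_train)  # fit on train only to avoid test-set leakage
    return scaler.transform(X_train), scaler.transform(X_test), y_train, y_test
```

    Stratifying on the attack type keeps rare attack classes present in both splits, which matters for a dataset with many unevenly sized classes.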

    For more information about the dataset, please contact the lead author of this project, Dr Mohamed Amine Ferrag, on his email: mohamed.amine.ferrag@gmail.com

    More information about Dr. Mohamed Amine Ferrag is available at:

    https://www.linkedin.com/in/Mohamed-Amine-Ferrag

    https://dblp.uni-trier.de/pid/142/9937.html

    https://www.researchgate.net/profile/Mohamed_Amine_Ferrag

    https://scholar.google.fr/citations?user=IkPeqxMAAAAJ&hl=fr&oi=ao

    https://www.scopus.com/authid/detail.uri?authorId=56115001200

    https://publons.com/researcher/1322865/mohamed-amine-ferrag/

    https://orcid.org/0000-0002-0632-3172

    Last Updated: 27 Mar. 2023

  16. Intrusion Detection System Software Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated May 28, 2025
    Data Insights Market (2025). Intrusion Detection System Software Report [Dataset]. https://www.datainsightsmarket.com/reports/intrusion-detection-system-software-1967301
    Available download formats: doc, ppt, pdf
    Dataset updated
    May 28, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Intrusion Detection System (IDS) software market is growing strongly, driven by the escalating need for effective cybersecurity solutions across sectors. The increasing frequency and sophistication of cyberattacks, together with stringent data privacy regulations such as GDPR and CCPA, are compelling organizations to invest heavily in advanced IDS software. Expansion is further fueled by the proliferation of connected devices and the adoption of cloud computing, which enlarge the attack surface and call for comprehensive security measures. The precise 2025 market size is not provided; assuming a 2024 market size of $8 billion (a plausible figure based on industry reports) and a CAGR of 15% (a conservative estimate given the market dynamics), the 2025 market size would be approximately $8 billion × 1.15 = $9.2 billion. Growth is expected to continue throughout the forecast period (2025-2033), driven by continuous innovation in detection techniques (such as AI/ML-powered solutions), increasing demand for managed security services, and growing adoption of hybrid cloud environments. The market is segmented into network-based, host-based, and cloud-based IDS; network-based IDS currently dominates, but the cloud-based segment is growing fastest. Leading vendors such as SolarWinds, ManageEngine, Cisco, and Splunk compete to provide comprehensive, scalable, and user-friendly solutions. The market also features a considerable number of open-source options (such as Snort, Suricata, and Zeek), which offer cost-effective alternatives for smaller organizations. Although the market faces restraints such as the complexity of implementation and maintenance, rising cybersecurity threats are likely to outweigh these challenges, ensuring sustained expansion in the coming years.

    This market analysis highlights the significant opportunities and challenges in the IDS software market and its importance in the evolving cybersecurity landscape.

  17. Open Source Cyber Intelligence Tools Report

    • archivemarketresearch.com
    doc, pdf, ppt
    Updated Feb 21, 2025
    Archive Market Research (2025). Open Source Cyber Intelligence Tools Report [Dataset]. https://www.archivemarketresearch.com/reports/open-source-cyber-intelligence-tools-40013
    Available download formats: doc, ppt, pdf
    Dataset updated
    Feb 21, 2025
    Dataset authored and provided by
    Archive Market Research
    License

    https://www.archivemarketresearch.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The global open source cyber intelligence tools market was valued at $2107.3 million in 2023 and is projected to reach $3819.2 million by 2033, with a stated CAGR of 11.4% over the forecast period (2023-2033). Growth is driven by rising demand for advanced cyber threat detection and mitigation solutions, increasing adoption of open source software in the cybersecurity industry, and the growing need to protect sensitive information from cyberattacks. Increasing adoption of cloud-based solutions and growing awareness of cyber threats further boost expansion. North America holds the largest market share owing to the presence of key players and early adoption of advanced technologies, while Asia Pacific is expected to grow robustly as cyber threats multiply and awareness of cybersecurity risks increases. The market is highly competitive, with key players offering a wide range of open source cyber intelligence tools; they include Thales Group, Palantir Technologies, Cognyte, OpenText (Micro Focus), Recorded Future, Expert System, Hensoldt Analytics, Maltego, Cyware, and Babel Street. The report also gives a second, different estimate: growth from USD 1.5 billion in 2023 to USD 3.4 billion by 2033, at a CAGR of 9.5% over the forecast period.
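Start value, end value, and CAGR are linked by a single formula, so a quoted growth rate can be checked against its endpoints. A minimal sketch (the helper name `implied_cagr` is hypothetical), applied to the two pairs of figures quoted above over the ten years 2023 to 2033:

```python
def implied_cagr(start: float, end: float, years: int) -> float:
    """Annual growth rate that turns `start` into `end` over `years` years."""
    if start <= 0 or end <= 0 or years <= 0:
        raise ValueError("start, end and years must be positive")
    return (end / start) ** (1 / years) - 1


# Figures quoted in the report; 2023 -> 2033 is ten years of growth.
print(f"{implied_cagr(2107.3, 3819.2, 10):.2%}")  # USD millions
print(f"{implied_cagr(1.5, 3.4, 10):.2%}")        # USD billions
```

For these endpoints the implied rates come out at roughly 6.1% and 8.5% per year, which differ from the stated 11.4% and 9.5%.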

  18. 5.12 Cybersecurity (summary) - Archived

    • catalog.data.gov
    • performance.tempe.gov
    • +6 more
    Updated Jan 17, 2025
    City of Tempe (2025). 5.12 Cybersecurity (summary) - Archived [Dataset]. https://catalog.data.gov/dataset/5-12-cybersecurity-summary-823d7
    Dataset updated
    Jan 17, 2025
    Dataset provided by
    City of Tempe
    Description

    The National Institute of Standards and Technology (NIST) provides a Cybersecurity Framework (CSF) for benchmarking and measuring the maturity of cybersecurity programs across all industries. The City uses this framework and toolset to measure and report on its internal cybersecurity program. The foundation for this measure is the Framework Core, a set of cybersecurity activities, desired outcomes, and applicable references that are common across critical infrastructure and industry sectors. These activities come from the published NIST CSF standard, along with the information security and customer privacy controls it references (NIST 800 Series Special Publications). The Framework Core presents industry standards, guidelines, and practices in a way that supports communication of cybersecurity activities and outcomes across the organization, from the executive level to the implementation/operations level. It consists of five concurrent and continuous functions: Identify, Protect, Detect, Respond, and Recover. Taken together, these functions provide a high-level, strategic view of the lifecycle of an organization's management of cybersecurity risk. For each function, the Framework Core identifies underlying key categories and subcategories, and matches each subcategory with example references such as existing standards, guidelines, and practices.
    This page provides data for the Cybersecurity performance measure: the Cybersecurity Framework cumulative score summary per fiscal year quarter (Performance Measure 5.12). The performance measure page is available at 5.12 Cybersecurity.

    Additional Information
    Source: Maturity assessment / https://www.nist.gov/topics/cybersecurity
    Contact: Scott Campbell
    Contact E-Mail: Scott_Campbell@tempe.gov
    Data Source Type: Excel
    Preparation Method: The data is a summary of a detailed and confidential analysis of the city's cybersecurity program. Maturity scores of subcategories within the NIST CSF are combined, averaged, and rolled up to a summary score for each major category.
    Publish Frequency: Annual
    Publish Method: Manual
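The preparation method describes averaging subcategory maturity scores and rolling them up by category. A minimal sketch of that roll-up, assuming an unweighted mean at each level and CSF 1.1-style identifiers such as `ID.AM-1` (function `ID`, category `ID.AM`). The scores, the scale, and the extra function-level step are hypothetical; the City's actual analysis is confidential and may weight items differently.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical subcategory maturity scores (values and scale are illustrative only).
scores = {
    "ID.AM-1": 3.0, "ID.AM-2": 2.0,
    "PR.AC-1": 4.0, "PR.AC-3": 3.0,
    "DE.CM-1": 2.0,
}


def roll_up(sub_scores: dict[str, float]) -> tuple[dict[str, float], dict[str, float]]:
    """Average subcategory scores into categories, then categories into functions."""
    by_category: dict[str, list[float]] = defaultdict(list)
    for sub_id, score in sub_scores.items():
        by_category[sub_id.split("-")[0]].append(score)  # "ID.AM-1" -> "ID.AM"
    category_avg = {cat: mean(vals) for cat, vals in by_category.items()}

    by_function: dict[str, list[float]] = defaultdict(list)
    for cat, avg in category_avg.items():
        by_function[cat.split(".")[0]].append(avg)  # "ID.AM" -> "ID"
    function_avg = {fn: mean(vals) for fn, vals in by_function.items()}
    return category_avg, function_avg


category_scores, function_scores = roll_up(scores)
print(category_scores)
print(function_scores)
```

Averaging categories rather than pooling all subcategories keeps a category with many subcategories from dominating its function's score. That design choice is an assumption, not something the source states.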

  19. Defense Cyber Security Market Report

    • marketreportanalytics.com
    doc, pdf, ppt
    Updated Apr 20, 2025
    Market Report Analytics (2025). Defense Cyber Security Market Report [Dataset]. https://www.marketreportanalytics.com/reports/defense-cyber-security-market-89169
    Available download formats: pdf, ppt, doc
    Dataset updated
    Apr 20, 2025
    Dataset authored and provided by
    Market Report Analytics
    License

    https://www.marketreportanalytics.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The global defense cybersecurity market is growing rapidly and is projected to reach $22.95 billion in 2025, with a Compound Annual Growth Rate (CAGR) of 12.82% from 2025 to 2033. Several key factors drive this expansion. First, the increasing sophistication and frequency of cyberattacks targeting defense infrastructure demand robust, advanced cybersecurity solutions. Governments worldwide are significantly increasing investment to bolster their national security posture, recognizing the critical role cybersecurity plays in protecting sensitive data, critical infrastructure, and military operations. Second, the adoption of cloud computing and Internet of Things (IoT) devices within defense organizations expands the attack surface, making comprehensive cybersecurity measures indispensable. Finally, the growing need for proactive threat intelligence and advanced training programs for cybersecurity professionals further fuels market growth. The market is segmented into defense solutions, threat assessment, network fortification, and training services, each contributing to overall expansion. Leading companies such as General Dynamics-CSRA, Raytheon Technologies Corporation, and Lockheed Martin Corporation are at the forefront of innovation, developing and deploying advanced cybersecurity technologies to meet the evolving needs of the defense sector. North America, particularly the United States, currently dominates the market, driven by substantial defense budgets and advanced technological capabilities. However, the Asia-Pacific region is expected to grow significantly during the forecast period, fueled by increasing defense spending in countries such as China, India, and Japan and by rising awareness of cybersecurity threats.
    Europe also presents a substantial market opportunity, driven by increasing cross-border cyber threats and a greater emphasis on cybersecurity within the defense sector. Continued development of artificial intelligence (AI)-powered cybersecurity solutions, enhanced data analytics for threat detection, and the integration of cybersecurity into the broader defense ecosystem will shape future market trends. Although challenges such as the high cost of implementation and a shortage of skilled cybersecurity professionals exist, the overall market outlook remains highly positive, suggesting a sustained period of growth and innovation in the coming years.

    Recent developments include:
    • May 2023: SAIC introduced its encrypted query analytics and data retrieval (EQADR) platform, which performs next-generation cryptographic, cross-boundary data search, retrieval, and analysis. EQADR is designed to make data search and retrieval quicker, safer, and more reliable. Its cross-domain strategy delivers targeted, on-demand queries from higher-side networks to lower-side networks while securing sources, methods, and analytical tradecraft. The platform handles sensitive data transfers, keeps search terms hidden, and can sift effectively through open source data, with a view to reducing classified data storage costs and sharing intellectual property.
    • December 2022: The Army evaluated Zero Trust cybersecurity for JADC2. The company aimed to scale operational Zero Trust to accommodate different Army command levels, and demonstrated the platform's ability to detect and respond to malicious attacks in a warfighting environment using a digital model, as the Army tests technologies for joint all-domain command and control (JADC2). This Pentagon-wide effort focuses on linking platforms via a shared network through which decision-making data from multiple sensors and shooters is rapidly transmitted.

    Key drivers for this market are: growing severity of cyber attacks on military/government organizations, and increasing government initiatives to secure critical data. Potential restraints include: growing severity of cyber attacks on military/government organizations, and increasing government initiatives to secure critical data. Notable trends are: growing severity of cyber attacks on military/government organizations.

  20. Dataset to Train Intrusion Detection Systems based on Machine Learning...

    • zenodo.org
    application/gzip, bin +1
    Updated Nov 11, 2024
    Esteban Damian Gutierrez Mlot (2024). Dataset to Train Intrusion Detection Systems based on Machine Learning Models for Electrical Substations [Dataset]. http://doi.org/10.5281/zenodo.14066350
    Available download formats: bin, application/gzip, zip
    Dataset updated
    Nov 11, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Esteban Damian Gutierrez Mlot
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    DATASET

    This dataset is part of the research work titled "A Dataset to Train Intrusion Detection Systems based on Machine Learning Models for Electrical Substations," which is currently awaiting approval for publication. The dataset has been meticulously curated to support the development and evaluation of machine learning models tailored for detecting cyber intrusions in the context of electrical substations. It is intended to facilitate research and advancements in cybersecurity for critical infrastructure, specifically focusing on real-world scenarios within electrical substation environments. We encourage its use for experimentation and benchmarking in related areas of study.

    The following sections list the content of the dataset generated.

    Data

    • raw
      • iec6180
        • attack-free-data
          • capture61850-attackfree.pcap (from a real substation)
          • capture61850-attackfree_PTP.pcap
          • capture61850-attackfree_normalfault.pcap
        • attack-data
          • capture61850-floodattack_withfault.pcap
          • capture61850-floodattack_withoutfault.pcap
          • capture61850-fuzzyattack_withfault.pcap
          • capture61850-fuzzyattack_withoutfault.pcap
          • capture61850-replay.pcap
          • capture61850-ptpattack.pcap
      • iec104
        • attack-free-data
          • capture104-attackfree.pcap (from a real substation)
        • attack-data
          • capture104-dosattack.pcap
          • capture104-floodattack.pcap
          • capture104-fuzzyattack.pcap
          • capture104-iec104starvationattack.pcap
          • capture104-mitmattack.pcap
          • capture104-ntpddosattack.pcap
          • capture104-portscanattack.pcap
    • processed
      • iec6180
        • attack-free-data
          • capture61850-attackfree.csv
          • capture61850-attackfree_PTP.csv
          • capture61850-attackfree_normalfault.csv
        • attack-data
          • capture61850-floodattack_withfault.csv
          • capture61850-floodattack_withoutfault.csv
          • capture61850-fuzzyattack_withfault.csv
          • capture61850-fuzzyattack_withoutfault.csv
          • capture61850-replay.csv
          • capture61850-ptpattack.csv
        • headers_iec61850[all].txt
      • iec104
        • attack-free-data
          • capture104-attackfree.csv
        • attack-data
          • capture104-dosattack.csv
          • capture104-floodattack.csv
          • capture104-fuzzyattack.csv
          • capture104-iec104starvationattack.csv
          • capture104-mitmattack.csv
          • capture104-ntpddosattack.csv
          • capture104-portscanattack.csv
        • headers_iec104[all].txt
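Because attack-free and attack captures sit in separate folders, a training label can be derived from the directory alone. A minimal sketch, assuming the processed files are laid out as `processed/<protocol>/<attack-free-data|attack-data>/*.csv` exactly as listed (including the `iec6180` folder spelling). The 0/1 label encoding and the function name `build_manifest` are hypothetical choices, not part of the dataset's own source code.

```python
from pathlib import Path

# Hypothetical label encoding: 0 = attack-free traffic, 1 = attack traffic.
LABELS = {"attack-free-data": 0, "attack-data": 1}


def build_manifest(processed_root: Path) -> list[tuple[str, str, int]]:
    """Return (protocol_dir, csv_path, label) for every processed flow file."""
    manifest = []
    for csv_path in sorted(processed_root.glob("*/*/*.csv")):
        split_dir = csv_path.parent.name
        if split_dir not in LABELS:
            continue  # ignore CSVs outside the two labeled folders
        protocol_dir = csv_path.parent.parent.name  # e.g. "iec104" or "iec6180"
        manifest.append((protocol_dir, str(csv_path), LABELS[split_dir]))
    return manifest
```

The header files (`headers_iec61850[all].txt`, `headers_iec104[all].txt`) sit one level higher and are not `.csv`, so the glob skips them.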

    Description

    • file type: capture61850 or capture104, depending on whether the file contains network captures of the IEC 61850 or the IEC 104 protocol.
    • attack: attackfree for attack-free captures; otherwise the attack name is added to the file name.
    • function: optionally, a suffix describing the functionality captured (normalfault) or a specific protocol capture (PTP).
    • file extension: PCAP (network capture) or CSV (flow file).
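The naming convention can be parsed mechanically. A minimal sketch based on the file names in the listing, which use the prefixes `capture61850` and `capture104`. It assumes suffixes such as `_withfault` and `_withoutfault` occupy the same optional slot as `_PTP` and `_normalfault`; the regular expression and the `CaptureName` type are hypothetical helpers, not part of the dataset.

```python
import re
from typing import NamedTuple, Optional


class CaptureName(NamedTuple):
    protocol: str            # "61850" or "104"
    attack: str              # "attackfree" or an attack name such as "floodattack"
    function: Optional[str]  # optional suffix such as "PTP" or "normalfault"
    extension: str           # "pcap" (network capture) or "csv" (flow file)


# capture<protocol>-<attack>[_<function>].<pcap|csv>
_PATTERN = re.compile(r"^capture(61850|104)-([a-z0-9]+)(?:_([A-Za-z]+))?\.(pcap|csv)$")


def parse_capture_name(filename: str) -> CaptureName:
    """Split a capture file name into its convention-defined fields."""
    match = _PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unrecognized capture file name: {filename}")
    protocol, attack, function, extension = match.groups()
    return CaptureName(protocol, attack, function, extension)
```

For example, `parse_capture_name("capture61850-floodattack_withfault.csv")` yields protocol `61850`, attack `floodattack`, function `withfault`, extension `csv`.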

    Results

    • results
      • test1-iec104
        • model-test1-iec104.pkl
        • test1-iec104.log
      • test1-iec61850
        • model-test1-iec61850.pkl
        • test1-iec61850.log
      • test2-iec61850
        • model-test2-iec61850.pkl
        • test2-iec61850.log


    Description

    The outcomes of different test executions are available as follows:

    • test1-iec104: IEC 104 protocol, all attacks and the attack-free scenario
    • test1-iec61850: IEC 61850 protocol, fuzzy attack with fault injection and the attack-free scenario
    • test2-iec61850: IEC 61850 protocol, fuzzy attack under normal operation and the attack-free scenario


    Each test consists of the model results in Python pickle format (with a .pkl extension) and a detailed description of the execution conditions in an output log file (with a .log extension).
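A minimal sketch for loading each test's pickled results together with its log, assuming each folder under `results/` contains `model-<test>.pkl` and `<test>.log` as listed. The type of the unpickled object is not documented here, so the sketch returns it untouched; unpickling it may require the same libraries the authors used. Unpickling can execute arbitrary code, so only load files from a source you trust. The function name `load_test_results` is hypothetical.

```python
import pickle
from pathlib import Path


def load_test_results(results_root: Path) -> dict[str, tuple[object, str]]:
    """Map each test folder (e.g. "test1-iec104") to (unpickled results, log text)."""
    results = {}
    for test_dir in sorted(p for p in results_root.iterdir() if p.is_dir()):
        name = test_dir.name
        # Only unpickle files from a trusted source: pickle can run arbitrary code.
        with open(test_dir / f"model-{name}.pkl", "rb") as fh:
            model = pickle.load(fh)
        log_text = (test_dir / f"{name}.log").read_text(encoding="utf-8", errors="replace")
        results[name] = (model, log_text)
    return results
```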

    Source Code

    A snapshot of the source code used to process these files is included under the filename source-code-cybersecurity-datasets-v2.0.zip. For an updated version, see the GitHub repository.

(2025). Cybersecurity Threat Detection Dataset [Dataset]. https://paperswithcode.com/dataset/cybersecurity-threat-detection

Data from: Cybersecurity Threat Detection Dataset

Dataset updated
Mar 7, 2025
Description

Problem Statement


Organizations face an increasing number of sophisticated cybersecurity threats, including malware, phishing attacks, and unauthorized access. A financial institution experienced frequent attempts to breach its network, risking sensitive data and regulatory compliance. Traditional security measures were reactive and failed to detect threats in real time. The institution sought a proactive AI-driven solution to identify and prevent cybersecurity threats effectively.

Challenge

Developing an advanced threat detection system required addressing several challenges:

Processing and analyzing large volumes of network traffic and user activity data in real time.

Identifying new and evolving threats, such as zero-day vulnerabilities, with high accuracy.

Minimizing false positives to ensure security teams could focus on genuine threats.
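The case study does not say how false positives were reduced. One common approach is to calibrate the alert threshold on scores from known-benign activity so that only a chosen fraction of benign events would trigger an alert. A minimal sketch of that approach (a stand-in, not the institution's method; `threshold_for_fpr` is a hypothetical name):

```python
import math


def threshold_for_fpr(benign_scores: list[float], max_fpr: float) -> float:
    """Pick an alert threshold so that at most `max_fpr` of benign scores exceed it.

    Returns the lowest benign score satisfying the constraint; events scoring
    strictly above the threshold raise an alert.
    """
    if not benign_scores:
        raise ValueError("need at least one benign score")
    if not 0.0 <= max_fpr <= 1.0:
        raise ValueError("max_fpr must be between 0 and 1")
    ranked = sorted(benign_scores)
    # Number of benign events we may flag, capped so the index stays valid.
    allowed = min(math.floor(max_fpr * len(ranked)), len(ranked) - 1)
    # Only the `allowed` largest scores (fewer if there are ties) lie above this value.
    return ranked[len(ranked) - allowed - 1]
```

Lowering `max_fpr` raises the threshold, trading some detection sensitivity for fewer alerts on genuine but unusual activity.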

Solution Provided

An AI-powered threat detection system was developed using machine learning algorithms and advanced analytics. The solution was designed to:

Continuously monitor network activity and user behavior to identify suspicious patterns.

Detect and neutralize cybersecurity threats in real time, including malware and phishing attempts.

Provide actionable insights to security teams for faster and more effective threat response.

Development Steps

Data Collection

Collected network traffic logs, endpoint activity, and historical threat data to train machine learning models.

Preprocessing

Cleaned and standardized data, ensuring compatibility across diverse sources, and filtered out noise for accurate analysis.
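Making diverse sources compatible typically means mapping each feed's fields onto one shared schema and discarding records that cannot be mapped. A minimal sketch, assuming two hypothetical feeds (`firewall`, `endpoint`) with made-up field names; the case study does not describe the actual schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    source: str     # which feed the record came from
    host: str       # normalized host identifier
    bytes_out: int  # outbound byte count


# Hypothetical field mappings: each feed names the same quantities differently.
FIELD_MAP = {
    "firewall": {"host": "src_ip", "bytes_out": "sent_bytes"},
    "endpoint": {"host": "hostname", "bytes_out": "net_tx"},
}


def normalize(source: str, record: dict) -> Optional[Event]:
    """Map a raw record onto the common Event schema; return None for noise."""
    mapping = FIELD_MAP.get(source)
    if mapping is None:
        return None  # unknown feed
    try:
        host = str(record[mapping["host"]]).strip().lower()
        bytes_out = int(record[mapping["bytes_out"]])
    except (KeyError, TypeError, ValueError):
        return None  # malformed or incomplete record
    if not host or bytes_out < 0:
        return None  # implausible values treated as noise
    return Event(source, host, bytes_out)
```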

Model Development

Developed machine learning algorithms for anomaly detection, behavioral analysis, and threat classification. Trained models on labeled datasets to recognize known threats and identify emerging attack patterns.
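The algorithms used are not named. As a simple statistical stand-in for the anomaly-detection idea, a feature can be compared against a baseline learned from normal activity and flagged when it deviates by more than a set number of standard deviations. A minimal z-score sketch; the 3.0 threshold and the function names are illustrative assumptions.

```python
from statistics import mean, stdev


def fit_baseline(history: list[float]) -> tuple[float, float]:
    """Learn the mean and sample standard deviation of a feature (needs two or more values)."""
    return mean(history), stdev(history)


def is_anomalous(value: float, baseline: tuple[float, float], threshold: float = 3.0) -> bool:
    """Flag values more than `threshold` standard deviations from the baseline mean."""
    mu, sigma = baseline
    if sigma == 0:
        return value != mu  # constant baseline: any change is unusual
    return abs(value - mu) / sigma > threshold


baseline = fit_baseline([100, 110, 90, 105, 95])  # e.g. hourly logins for one user
print(is_anomalous(200, baseline), is_anomalous(108, baseline))
```

Production systems usually combine many features and learned models, but the baseline-then-deviation structure is the same.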

Validation

Tested the system against simulated and real-world threat scenarios to evaluate detection accuracy, response times, and reliability.
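Detection accuracy is usually reported through a few confusion-matrix metrics rather than a single accuracy figure. A minimal sketch computing precision, recall, and false positive rate from labeled test results, assuming 1 marks a threat and 0 marks benign activity:

```python
def detection_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Precision, recall, and false positive rate for binary labels (1 = threat)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
    }


print(detection_metrics([1, 1, 0, 0, 0, 1], [1, 0, 1, 0, 0, 1]))
```

Recall measures how many real threats were caught; the false positive rate measures how often benign activity raised an alarm, which is the quantity the case study aims to minimize.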

Deployment

Integrated the threat detection system into the institution’s existing cybersecurity infrastructure, including firewalls, SIEM (Security Information and Event Management) tools, and endpoint protection.

Continuous Monitoring & Improvement

Established a feedback loop to refine models using new threat data and adapt to evolving attack strategies.
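A feedback loop needs the baseline to absorb new data without retraining from scratch. One way to do this (an assumption; the case study does not describe its mechanism) is Welford's online algorithm, which updates a running mean and variance one observation at a time, for example as analysts confirm events as benign:

```python
import math
from dataclasses import dataclass


@dataclass
class RunningBaseline:
    """Welford's online mean/variance, updated one confirmed-benign value at a time."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, value: float) -> None:
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)

    @property
    def std(self) -> float:
        # Sample standard deviation; reported as 0.0 until two values have been seen.
        return math.sqrt(self.m2 / (self.count - 1)) if self.count > 1 else 0.0


baseline = RunningBaseline()
for value in [100, 110, 90, 105, 95]:
    baseline.update(value)
print(baseline.mean, round(baseline.std, 3))
```

Welford's update avoids the numerical cancellation that comes from tracking raw sums of squares, which matters when a baseline runs for a long time.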

Results

Enhanced Security Posture

The system improved the institution’s ability to detect and prevent cybersecurity threats proactively, strengthening its overall security framework.

Reduced Incidence of Cyber Attacks

Real-time detection and response significantly reduced the frequency and impact of successful cyber attacks.

Improved Threat Response Times

Automated threat identification and prioritization enabled security teams to respond faster and more effectively to potential breaches.

Minimized False Positives

Advanced algorithms reduced false alarms, allowing security teams to focus on genuine threats and improve efficiency.

Scalable and Adaptive Solution

The system adapted to new threats and scaled effortlessly to protect growing organizational networks and data.
