100+ datasets found

Data from: Cybersecurity Threat Detection Dataset
paperswithcode.com
Updated Mar 7, 2025
Cite
(2025). Cybersecurity Threat Detection Dataset [Dataset]. https://paperswithcode.com/dataset/cybersecurity-threat-detection
Dataset updated
Mar 7, 2025
Description
Problem Statement

Organizations face an increasing number of sophisticated cybersecurity threats, including malware, phishing attacks, and unauthorized access. A financial institution experienced frequent attempts to breach its network, risking sensitive data and regulatory compliance. Traditional security measures were reactive and failed to detect threats in real time. The institution sought a proactive AI-driven solution to identify and prevent cybersecurity threats effectively.

Challenge

Developing an advanced threat detection system required addressing several challenges:

Processing and analyzing large volumes of network traffic and user activity data in real time.

Identifying new and evolving threats, such as zero-day vulnerabilities, with high accuracy.

Minimizing false positives to ensure security teams could focus on genuine threats.

Solution Provided

An AI-powered threat detection system was developed using machine learning algorithms and advanced analytics. The solution was designed to:

Continuously monitor network activity and user behavior to identify suspicious patterns.

Detect and neutralize cybersecurity threats in real time, including malware and phishing attempts.

Provide actionable insights to security teams for faster and more effective threat response.

Development Steps

Data Collection

Collected network traffic logs, endpoint activity, and historical threat data to train machine learning models.

Preprocessing

Cleaned and standardized data, ensuring compatibility across diverse sources, and filtered out noise for accurate analysis.

Model Development

Developed machine learning algorithms for anomaly detection, behavioral analysis, and threat classification. Trained models on labeled datasets to recognize known threats and identify emerging attack patterns.

Validation

Tested the system against simulated and real-world threat scenarios to evaluate detection accuracy, response times, and reliability.

Deployment

Integrated the threat detection system into the institution’s existing cybersecurity infrastructure, including firewalls, SIEM (Security Information and Event Management) tools, and endpoint protection.

Continuous Monitoring & Improvement

Established a feedback loop to refine models using new threat data and adapt to evolving attack strategies.
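
The case study does not say which algorithms were used for anomaly detection. As one illustrative possibility only, the sketch below flags unusual per-minute event counts with a simple z-score against a historical baseline. The function names, the sample counts, and the threshold of 3 are all assumptions made for this example, not details from the case study.

```python
from statistics import mean, stdev


def fit_baseline(counts):
    """Return (mean, sample standard deviation) of historical event counts."""
    return mean(counts), stdev(counts)


def flag_anomalies(observations, baseline, threshold=3.0):
    """Return the observations whose z-score against the baseline exceeds threshold."""
    mu, sigma = baseline
    if sigma == 0:
        # A perfectly flat baseline: anything different from it is unusual.
        return [x for x in observations if x != mu]
    return [x for x in observations if abs(x - mu) / sigma > threshold]


# Hypothetical per-minute connection counts observed during normal operation.
history = [100, 102, 98, 101, 99, 100, 103, 97]
baseline = fit_baseline(history)

# New observations scored against the baseline; 180 is far outside it.
suspicious = flag_anomalies([101, 99, 180], baseline)
print(suspicious)
```

The baseline is fitted on historical data rather than on the window being scored, because a single large spike inside the scored window would inflate the standard deviation and could hide itself.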

Results

Enhanced Security Posture

The system improved the institution’s ability to detect and prevent cybersecurity threats proactively, strengthening its overall security framework.

Reduced Incidence of Cyber Attacks

Real-time detection and response significantly reduced the frequency and impact of successful cyber attacks.

Improved Threat Response Times

Automated threat identification and prioritization enabled security teams to respond faster and more effectively to potential breaches.

Minimized False Positives

Advanced algorithms reduced false alarms, allowing security teams to focus on genuine threats and improve efficiency.

Scalable and Adaptive Solution

The system adapted to new threats and scaled effortlessly to protect growing organizational networks and data.

IoMT-TrafficData: A Dataset for Benchmarking Intrusion Detection in IoMT

zenodo.org
data.niaid.nih.gov

Updated Aug 30, 2024

Cite

José Areia; Ivo Afonso Bispo; Leonel Santos; Rogério Luís Costa (2024). IoMT-TrafficData: A Dataset for Benchmarking Intrusion Detection in IoMT [Dataset]. http://doi.org/10.5281/zenodo.8116338

Unique identifier

https://doi.org/10.5281/zenodo.8116338

Dataset updated

Aug 30, 2024

Dataset provided by

Zenodo (http://zenodo.org/)

Authors

José Areia; Ivo Afonso Bispo; Leonel Santos; Rogério Luís Costa

License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Article Information

The work involved in developing the dataset and benchmarking its use of machine learning is set out in the article ‘IoMT-TrafficData: Dataset and Tools for Benchmarking Intrusion Detection in Internet of Medical Things’. DOI: 10.1109/ACCESS.2024.3437214.

Please do cite the aforementioned article when using this dataset.

Abstract

The increasing importance of securing the Internet of Medical Things (IoMT) due to its vulnerabilities to cyber-attacks highlights the need for an effective intrusion detection system (IDS). In this study, our main objective was to develop a Machine Learning Model for the IoMT to enhance the security of medical devices and protect patients’ private data. To address this issue, we built a scenario that utilised the Internet of Things (IoT) and IoMT devices to simulate real-world attacks. We collected and cleaned data, pre-processed it, and provided it into our machine-learning model to detect intrusions in the network. Our results revealed significant improvements in all performance metrics, indicating robustness and reproducibility in real-world scenarios. This research has implications in the context of IoMT and cybersecurity, as it helps mitigate vulnerabilities and lowers the number of breaches occurring with the rapid growth of IoMT devices. The use of machine learning algorithms for intrusion detection systems is essential, and our study provides valuable insights and a road map for future research and the deployment of such systems in live environments. By implementing our findings, we can contribute to a safer and more secure IoMT ecosystem, safeguarding patient privacy and ensuring the integrity of medical data.

ZIP Folder Content

The ZIP folder comprises two main components: Captures and Datasets. The captures folder includes all the captures used in this project, organized into separate folders by type of network analysis: BLE or IP-Based. The datasets folder is organized the same way, with datasets categorized by type: BLE, IP-Based Packet, and IP-Based Flows.

To cater to diverse analytical needs, the datasets are provided in two formats: CSV (Comma-Separated Values) and pickle. The CSV format facilitates seamless integration with various data analysis tools, while the pickle format preserves the intricate structures and relationships within the dataset.

This organization enables researchers to easily locate and utilize the specific captures and datasets they require, based on their preferred network analysis type or dataset type. The availability of different formats further enhances the flexibility and usability of the provided data.
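
As a rough illustration of the practical difference between the two formats, the sketch below reads a tiny CSV file and a tiny pickle file using only the Python standard library. The file names and the two sample columns are placeholders, not the actual file names inside the ZIP folder, and the real pickle files may need the library that wrote them (possibly pandas, which is an assumption) to be installed before they can be loaded.

```python
import csv
import pickle
import tempfile
from pathlib import Path


def load_csv_rows(path):
    """Read a CSV file into a list of dicts; every value comes back as a string."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def load_pickle(path):
    """Load a pickled object. Unpickling can execute code, so only open trusted files."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Demonstration with throwaway files; names and values are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "example.csv"
    csv_path.write_text("frame.len,btle.length\n37,11\n", encoding="utf-8")

    pkl_path = Path(tmp) / "example.pkl"
    with open(pkl_path, "wb") as handle:
        pickle.dump([{"frame.len": 37, "btle.length": 11}], handle)

    csv_rows = load_csv_rows(csv_path)
    pkl_rows = load_pickle(pkl_path)

print(csv_rows)  # values are strings and need explicit conversion
print(pkl_rows)  # values keep their original Python types
```

The CSV route is portable but loses type information, while the pickle route keeps types at the cost of being Python-specific and unsafe to load from untrusted sources.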

Datasets' Content

Within this dataset, three sub-datasets are available, namely BLE, IP-Based Packet, and IP-Based Flows. The tables below list the features selected for each dataset and used in the evaluation model in the accompanying work.

Identified Key Features Within Bluetooth Dataset

Feature	Meaning
btle.advertising_header	BLE Advertising Packet Header
btle.advertising_header.ch_sel	BLE Advertising Channel Selection Algorithm
btle.advertising_header.length	BLE Advertising Length
btle.advertising_header.pdu_type	BLE Advertising PDU Type
btle.advertising_header.randomized_rx	BLE Advertising Rx Address
btle.advertising_header.randomized_tx	BLE Advertising Tx Address
btle.advertising_header.rfu.1	Reserved For Future 1
btle.advertising_header.rfu.2	Reserved For Future 2
btle.advertising_header.rfu.3	Reserved For Future 3
btle.advertising_header.rfu.4	Reserved For Future 4
btle.control.instant	Instant Value Within a BLE Control Packet
btle.crc.incorrect	Incorrect CRC
btle.extended_advertising	Advertiser Data Information
btle.extended_advertising.did	Advertiser Data Identifier
btle.extended_advertising.sid	Advertiser Set Identifier
btle.length	BLE Length
frame.cap_len	Frame Length Stored Into the Capture File
frame.interface_id	Interface ID
frame.len	Frame Length Wire
nordic_ble.board_id	Board ID
nordic_ble.channel	Channel Index
nordic_ble.crcok	Indicates if CRC is Correct
nordic_ble.flags	Flags
nordic_ble.packet_counter	Packet Counter
nordic_ble.packet_time	Packet time (start to end)
nordic_ble.phy	PHY
nordic_ble.protover	Protocol Version

Identified Key Features Within IP-Based Packets Dataset

Feature	Meaning
http.content_length	Length of content in an HTTP response
http.request	HTTP request being made
http.response.code	Status code of an HTTP response
http.response_number	Sequential number of an HTTP response
http.time	Time taken for an HTTP transaction
tcp.analysis.initial_rtt	Initial round-trip time for TCP connection
tcp.connection.fin	TCP connection termination with a FIN flag
tcp.connection.syn	TCP connection initiation with SYN flag
tcp.connection.synack	TCP connection establishment with SYN-ACK flags
tcp.flags.cwr	Congestion Window Reduced flag in TCP
tcp.flags.ecn	Explicit Congestion Notification flag in TCP
tcp.flags.fin	FIN flag in TCP
tcp.flags.ns	Nonce Sum flag in TCP
tcp.flags.res	Reserved flags in TCP
tcp.flags.syn	SYN flag in TCP
tcp.flags.urg	Urgent flag in TCP
tcp.urgent_pointer	Pointer to urgent data in TCP
ip.frag_offset	Fragment offset in IP packets
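eth.dst.ig	IG bit of the Ethernet destination address (individual vs. group, i.e. unicast vs. multicast/broadcast)
eth.src.ig	IG bit of the Ethernet source address (individual vs. group)
eth.src.lg	LG bit of the Ethernet source address (globally unique vs. locally administered)
eth.src_not_group	Flag raised when the Ethernet source address is a group address, which is not allowed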
arp.isannouncement	Indicates if an ARP message is an announcement
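
To make the TCP flag features above more concrete, the following sketch splits a raw TCP flags value into per-flag 0/1 columns named after the corresponding fields. The bit positions follow the standard TCP header layout; the function name is hypothetical, and the rst, psh, and ack entries are included for completeness even though they are not among the selected features. How the dataset authors actually extracted these columns is not stated here.

```python
# Bit masks for the TCP header flags, lowest bit first.
TCP_FLAG_BITS = {
    "fin": 0x001,
    "syn": 0x002,
    "rst": 0x004,
    "psh": 0x008,
    "ack": 0x010,
    "urg": 0x020,
    "ecn": 0x040,  # ECN-Echo
    "cwr": 0x080,  # Congestion Window Reduced
    "ns": 0x100,   # Nonce Sum (historic)
}


def decode_tcp_flags(value):
    """Map a raw flags value to a dict of 0/1 indicators keyed by field name."""
    return {
        f"tcp.flags.{name}": int(bool(value & mask))
        for name, mask in TCP_FLAG_BITS.items()
    }


# 0x012 has the SYN and ACK bits set, as in the second step of a TCP handshake.
syn_ack = decode_tcp_flags(0x012)
print(syn_ack)
```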

Identified Key Features Within IP-Based Flows Dataset

Feature	Meaning
proto	Transport layer protocol of the connection
service	Identification of an application protocol
orig_bytes	Originator payload bytes
resp_bytes	Responder payload bytes
history	Connection state history
orig_pkts	Originator sent packets
resp_pkts	Responder sent packets
flow_duration	Length of the flow in seconds
fwd_pkts_tot	Forward packets total
bwd_pkts_tot	Backward packets total
fwd_data_pkts_tot	Forward data packets total
bwd_data_pkts_tot	Backward data packets total
fwd_pkts_per_sec	Forward packets per second
bwd_pkts_per_sec	Backward packets per second
flow_pkts_per_sec	Flow packets per second
fwd_header_size	Forward header bytes
bwd_header_size	Backward header bytes
fwd_pkts_payload	Forward payload bytes
bwd_pkts_payload	Backward payload bytes
flow_pkts_payload	Flow payload bytes
fwd_iat	Forward inter-arrival time
bwd_iat	Backward inter-arrival time
flow_iat	Flow inter-arrival time
active	Flow active duration
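
The table does not state how the rate features are derived. One natural reading, assumed here purely for illustration, is that each rate is a packet total divided by the flow duration in seconds. The function name and the sample values below are hypothetical.

```python
def flow_rates(fwd_pkts_tot, bwd_pkts_tot, flow_duration):
    """Return packets-per-second rates for a flow, or zeros for a zero-length flow."""
    if flow_duration <= 0:
        # Single-packet or instantaneous flows have no meaningful rate.
        return {"fwd_pkts_per_sec": 0.0, "bwd_pkts_per_sec": 0.0, "flow_pkts_per_sec": 0.0}
    return {
        "fwd_pkts_per_sec": fwd_pkts_tot / flow_duration,
        "bwd_pkts_per_sec": bwd_pkts_tot / flow_duration,
        "flow_pkts_per_sec": (fwd_pkts_tot + bwd_pkts_tot) / flow_duration,
    }


# A hypothetical 2-second flow with 30 forward and 10 backward packets.
rates = flow_rates(fwd_pkts_tot=30, bwd_pkts_tot=10, flow_duration=2.0)
print(rates)
```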

Dataset Description for "Quantum AI for Cybersecurity Threat Prediction"
data.mendeley.com
Updated Mar 20, 2025
Cite
Bindu Garg (2025). Dataset Description for "Quantum AI for Cybersecurity Threat Prediction" [Dataset]. http://doi.org/10.17632/fswng37vbz.2
Unique identifier
https://doi.org/10.17632/fswng37vbz.2
Dataset updated
Mar 20, 2025
Authors
Bindu Garg
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is engineered to propel the development of quantum-enhanced anomaly detection systems for cybersecurity, merging real-world network traffic data with the potential for simulated attack scenarios. It comprises two datasets—malicious and non-malicious—crafted to train ML models, leveraging quantum AI to identify subtle anomalies and mitigate cyber threats, particularly those resistant to classical detection methods. Derived from Wireshark captures of normal web browsing and attack simulations, it provides a crucial baseline for quantum machine learning (QML) models.

The dataset's strength lies in its fusion of traditional network attributes with frequency-based features (Source_Count, Destination_Count, Protocol_Count). These frequency features are paramount for QML algorithms to discern complex patterns indicative of malicious behavior. For instance, QML can identify minute deviations in source/destination frequency or unusual protocol usage, often missed by classical methods.

Column Descriptions:

No. (Record Number): Unique identifier.
Time: Timestamp of activity.
Source: Source device/IP.
Source_Count: Source frequency.
Destination: Destination device/IP.
Destination_Count: Destination frequency.
Protocol: Network protocol.
Protocol_Count: Protocol frequency.
Length: Packet size.
Info: Contextual details.
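
The description does not spell out how the three frequency columns were computed. The sketch below assumes the simplest interpretation, namely that each count is the number of rows in the same capture sharing that Source, Destination, or Protocol value. The function name and the sample rows are hypothetical.

```python
from collections import Counter

FREQUENCY_COLUMNS = ("Source", "Destination", "Protocol")


def add_frequency_columns(rows):
    """Return copies of rows with Source_Count, Destination_Count, Protocol_Count added."""
    counts = {col: Counter(row[col] for row in rows) for col in FREQUENCY_COLUMNS}
    return [
        {**row, **{f"{col}_Count": counts[col][row[col]] for col in FREQUENCY_COLUMNS}}
        for row in rows
    ]


# Three made-up packet records in the shape of a Wireshark export.
packets = [
    {"Source": "10.0.0.5", "Destination": "10.0.0.1", "Protocol": "TCP"},
    {"Source": "10.0.0.5", "Destination": "10.0.0.9", "Protocol": "DNS"},
    {"Source": "10.0.0.7", "Destination": "10.0.0.1", "Protocol": "TCP"},
]
enriched = add_frequency_columns(packets)
for record in enriched:
    print(record)
```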

Uniqueness of the Dataset:

• Two-Class Design: The dataset includes separate malicious and non-malicious traffic logs, essential for training ML models to differentiate between normal and attack patterns.
• Frequency-Based Features: The inclusion of "Source_Count," "Destination_Count," and "Protocol_Count" significantly enhances analytical capabilities, allowing the detection of anomalies based on activity patterns.
• Comprehensive Network Traffic Attributes: The dataset combines frequency features with standard network traffic attributes (Time, Source, Destination, Protocol, Length, Info), providing a holistic view of network activity.
• Potential for Diverse Analysis: The combination of structured and semi-structured data (in the "Info" column) enables a wide range of analytical techniques, including time series analysis, machine learning, and natural language processing.
• Cybersecurity Focus: Designed for cybersecurity threat prediction, it is valuable for researchers and practitioners in this domain.
• Real-World and Simulated Attacks: The dataset includes both benign traffic and simulated attacks, making it ideal for testing security systems before deployment.

Conclusion:

This dataset is a powerful tool for cybersecurity analysis. Its strength lies in its ability to establish a baseline and detect deviations, even subtle ones. The inclusion of malicious and non-malicious data enables precise model training for threat detection. It is vital for behavioral analysis, DDoS detection, malware analysis, forensics, and training. This dataset empowers security professionals to develop advanced solutions, enhancing network security by revealing valuable insights from seemingly routine network traffic.
Drone-Based Malware Detection (DBMD)
kaggle.com
Updated Jul 27, 2024
Cite
DatasetEngineer (2024). Drone-Based Malware Detection (DBMD) [Dataset]. http://doi.org/10.34740/kaggle/dsv/9045375
Unique identifier
https://doi.org/10.34740/kaggle/dsv/9045375
Dataset updated
Jul 27, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
DatasetEngineer
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Welcome to the Drone-Based Malware Detection dataset! This dataset is designed to aid researchers and practitioners in exploring innovative cybersecurity solutions using drone-collected data. The dataset contains detailed information on network traffic, drone sensor readings, malware detection indicators, and environmental conditions. It offers a unique perspective by integrating data from drones with traditional network security metrics to enhance malware detection capabilities.

Dataset Overview

The dataset comprises four main categories:

Network Traffic Data: Captures network traffic attributes including IP addresses, ports, protocols, packet sizes, and various derived metrics.
Drone Sensor Data: Includes GPS coordinates, altitude, speed, heading, battery level, and other sensor readings from drones.
Malware Detection Data: Contains indicators and scores relevant to detecting malware, such as anomaly scores, suspicious IP counts, reputation scores, and attack types.
Environmental Data: Provides context through environmental conditions like location type, noise level, weather conditions, and more.

Files and Features

The dataset is divided into four separate CSV files:

network_traffic_data.csv

timestamp: Date and time of the traffic event.
source_ip: Source IP address.
destination_ip: Destination IP address.
source_port: Source port number.
destination_port: Destination port number.
protocol: Network protocol (TCP, UDP, ICMP).
packet_length: Length of the network packet.
payload_data: Content of the packet payload.
flag: Network flag (SYN, ACK, FIN, RST).
traffic_volume: Volume of traffic in bytes.
flow_duration: Duration of the network flow.
flow_bytes_per_s: Bytes per second for the flow.
flow_packets_per_s: Packets per second for the flow.
packet_count: Number of packets in the flow.
average_packet_size: Average size of packets.
min_packet_size: Minimum packet size.
max_packet_size: Maximum packet size.
packet_size_variance: Variance in packet sizes.
header_length: Length of the packet header.
payload_length: Length of the packet payload.
ip_ttl: Time to live for the IP packet.
tcp_window_size: TCP window size.
icmp_type: ICMP type (echo_request, echo_reply, destination_unreachable).
dns_query_count: Number of DNS queries.
dns_response_count: Number of DNS responses.
http_method: HTTP method (GET, POST, PUT, DELETE).
http_status_code: HTTP status code (200, 404, 500, 301).
content_type: Content type (text/html, application/json, image/png).
ssl_tls_version: SSL/TLS version.
ssl_tls_cipher_suite: SSL/TLS cipher suite.

drone_data.csv

latitude: Latitude of the drone.
longitude: Longitude of the drone.
altitude: Altitude of the drone.
speed: Speed of the drone.
heading: Heading of the drone.
battery_level: Battery level of the drone.
drone_id: Unique identifier for the drone.
flight_time: Total flight time.
signal_strength: Strength of the drone's signal.
temperature: Temperature at the drone's location.
humidity: Humidity at the drone's location.
pressure: Atmospheric pressure at the drone's location.
wind_speed: Wind speed at the drone's location.
wind_direction: Wind direction at the drone's location.
gps_accuracy: Accuracy of the GPS signal.

malware_detection_data.csv

anomaly_score: Score indicating the level of anomaly detected.
suspicious_ip_count: Number of suspicious IP addresses detected.
malicious_payload_indicator: Indicator for malicious payload (0 or 1).
reputation_score: Reputation score for the network entity.
behavioral_score: Behavioral score indicating potential malicious activity.
attack_type: Type of attack (DDoS, phishing, malware).
signature_match: Indicator for signature match (0 or 1).
sandbox_result: Result from sandbox analysis (clean, infected).
heuristic_score: Heuristic score for potential threats.
traffic_pattern: Pattern of the traffic (burst, steady).

environmental_data.csv

location_type: Type of location (urban, rural).
nearby_devices: Number of nearby devices.
signal_interference: Level of signal interference.
noise_level: Noise level in the environment.
time_of_day: Time of day (morning, afternoon, evening, night).
day_of_week: Day of the week.
weather_conditions: Weather conditions (sunny, rainy, cloudy, stormy).

Usage and Applications

This dataset can be used for:

Cybersecurity Research: Developing and testing algorithms for malware detection using drone data.
Machine Learning: Training models to identify malicious activity based on network traffic and drone sensor readings.
Data Analysis: Exploring the relationships between environmental conditions, drone sensor data, and network traffic anomalies.
Educational Purposes: Teaching data science, machine learning, and cybersecurity concepts using a comprehensive and multi-faceted dataset.

Acknowledgements

This dataset is based on real-world data collected from drone sensors and network traffic monitoring s...
Global cyberattack distribution 2023, by type
statista.com
Updated Nov 14, 2024
Cite
Statista (2024). Global cyberattack distribution 2023, by type [Dataset]. https://www.statista.com/statistics/1382266/cyber-attacks-worldwide-by-type/
Dataset updated
Nov 14, 2024
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
2023
Area covered
Worldwide
Description
In 2023, ransomware was the most frequently detected cyberattack worldwide, accounting for around 70 percent of all detected cyberattacks. Network breaches ranked second, with almost 19 percent of detections. Data exfiltration was also among the detected cyberattacks, although less frequently.
Open Source Cyber Security Market Report | Global Forecast From 2025 To 2033...
dataintelo.com
csv, pdf, pptx
Updated Jan 7, 2025
Cite
Dataintelo (2025). Open Source Cyber Security Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/open-source-cyber-security-market
Available download formats: csv, pptx, pdf
Dataset updated
Jan 7, 2025
Dataset authored and provided by
Dataintelo
License
https://dataintelo.com/privacy-and-policy
Time period covered
2024 - 2032
Area covered
Global
Description
Open Source Cyber Security Market Outlook

The global Open Source Cyber Security market size was valued at USD 5.2 billion in 2023 and is projected to reach USD 14.5 billion by 2032, growing at a Compound Annual Growth Rate (CAGR) of 11.8% during the forecast period. This substantial growth is driven by increasing awareness about the benefits of open-source solutions, rising cyber threats, and stringent regulatory compliance requirements.
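
For readers who want to check a CAGR figure, the sketch below applies the standard formula, (end / start) ** (1 / years) - 1, to the two market sizes quoted above. It assumes 2023 as the base year and nine compounding periods to 2032, which is an assumption about the report's method rather than something it states. Under that assumption the result comes out at roughly 12%, slightly above the quoted 11.8%, so the report may be using a different base year or rounding.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values over a number of years."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# USD 5.2 billion in 2023 growing to USD 14.5 billion in 2032 (assumed 9 periods).
rate = cagr(5.2, 14.5, 2032 - 2023)
print(f"{rate:.1%}")
```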

One of the primary factors fueling the growth of the Open Source Cyber Security market is the cost-effectiveness of open-source solutions compared to proprietary software. Open-source cyber security tools often come at a fraction of the cost of their commercial counterparts, making them highly attractive for organizations seeking to manage budgets efficiently. Additionally, the flexibility and customization capabilities offered by open-source solutions enable organizations to tailor the tools according to their specific security needs, which in turn drives adoption across various industries.

Another significant growth driver is the mounting frequency and sophistication of cyber-attacks. As cyber threats evolve, organizations need robust and adaptable security measures to protect sensitive data and systems. Open-source cyber security tools are often at the forefront of innovation, with a large community of developers continuously improving and updating the software to address new vulnerabilities. This constant evolution ensures that open-source tools can effectively combat the latest threats, making them an essential component of modern cyber security strategies.

Furthermore, the increasing regulatory pressure on organizations to maintain stringent security postures is propelling the adoption of open-source cyber security solutions. Regulations such as the General Data Protection Regulation (GDPR) in Europe and the California Consumer Privacy Act (CCPA) in the US mandate robust data protection measures, encouraging businesses to invest in advanced security solutions. Open-source tools offer transparency and community-driven support, which can help organizations demonstrate compliance with these regulations, thereby fostering market growth.

Security Orchestration is becoming increasingly vital in the realm of open-source cyber security solutions. As organizations face a growing number of cyber threats, the ability to efficiently coordinate and manage various security tools is crucial. Security Orchestration enables the integration of multiple security systems and processes, allowing for streamlined operations and improved incident response times. This capability is particularly beneficial in environments where open-source tools are deployed, as it helps to unify disparate systems and enhance overall security effectiveness. By automating routine tasks and facilitating better communication between security components, Security Orchestration empowers organizations to respond more swiftly and effectively to cyber threats, thereby strengthening their security posture.

Regionally, North America is expected to dominate the Open Source Cyber Security market due to the presence of leading technology companies, high adoption rates of advanced technologies, and stringent regulatory frameworks. However, the Asia Pacific region is anticipated to witness the highest growth rate during the forecast period, driven by the rapid digital transformation of businesses, increasing awareness about cyber security, and supportive government initiatives aimed at enhancing cyber resilience.

Component Analysis

The Open Source Cyber Security market can be segmented by components into Software and Services. In the software segment, various types of open-source solutions such as intrusion detection systems, firewalls, security information and event management (SIEM) systems, and encryption tools are gaining traction. These solutions offer robust protection against a wide range of cyber threats, making them essential for organizations across different sectors. The continuous evolution and innovation in open-source software, driven by a collaborative community of developers, ensure that these tools remain effective in mitigating the latest cyber threats.

On the services front, the market includes professional services such as consulting, training, and support, as well as managed security services. Professional services are crucial for organizations that require expert guidance to implement and optimize open-source s
Comprehensive, Multi-Source Cyber-Security Events Data Set
osti.gov
Updated May 21, 2015
Cite
Los Alamos National Lab. (LANL), Los Alamos, NM (United States) (2015). Comprehensive, Multi-Source Cyber-Security Events Data Set [Dataset]. http://doi.org/10.17021/1179829
Unique identifier
https://doi.org/10.17021/1179829
Dataset updated
May 21, 2015
Dataset provided by
USDOE Office of Science (SC)
Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
Description
This data set represents 58 consecutive days of de-identified event data collected from five sources within Los Alamos National Laboratory’s corporate, internal computer network. The data sources include Windows-based authentication events from both individual computers and centralized Active Directory domain controller servers; process start and stop events from individual Windows computers; Domain Name Service (DNS) lookups as collected on internal DNS servers; network flow data as collected at several key router locations; and a set of well-defined red teaming events that present bad behavior within the 58 days. In total, the data set is approximately 12 gigabytes compressed across the five data elements and presents 1,648,275,307 events for 12,425 users, 17,684 computers, and 62,974 processes. Well-known system-related users (SYSTEM, Local Service) were not de-identified, though any well-known administrator accounts were still de-identified. In the network flow data, well-known ports (e.g. 80, 443, etc.) were not de-identified. All other users, computers, processes, ports, times, and other details were de-identified as a unified set across all the data elements (e.g. U1 is the same U1 in all of the data). The specific timeframe used is not disclosed for security purposes. In addition, no data that allows association outside of LANL’s network is included. All data starts with a time epoch of 1 using a time resolution of 1 second. In the authentication data, failed authentication events are only included for users that had a successful authentication event somewhere within the data set.
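
Because timestamps start at 1 and advance in one-second steps, an event time can be turned into a day number within the 58-day window with simple integer arithmetic. The sketch below assumes that time 1 is the first second of day 1; the function name is hypothetical and the actual clock time of day remains undisclosed.

```python
SECONDS_PER_DAY = 86_400


def day_and_offset(event_time):
    """Return (day number starting at 1, seconds elapsed within that day)."""
    if event_time < 1:
        raise ValueError("event times in this data set start at 1")
    day_index, seconds_into_day = divmod(event_time - 1, SECONDS_PER_DAY)
    return day_index + 1, seconds_into_day


first_event = day_and_offset(1)
start_of_day_two = day_and_offset(SECONDS_PER_DAY + 1)
last_possible = day_and_offset(58 * SECONDS_PER_DAY)
print(first_event, start_of_day_two, last_possible)
```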
Data from: A Dataset of Cyber-Induced Mechanical Faults on Buildings with...
s.cnmilf.com
data.openei.org
Updated Jan 20, 2025
Cite
National Renewable Energy Laboratory (2025). A Dataset of Cyber-Induced Mechanical Faults on Buildings with Network and Buildings Data [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/a-dataset-of-cyber-induced-mechanical-faults-on-buildings-with-network-and-buildings-data-54439
Dataset updated
Jan 20, 2025
Dataset provided by
National Renewable Energy Laboratory
Description
We collected data on cyber-induced mechanical faults in buildings using a simulation platform. A DOE reference building model was simulated under a rogue device attack, and both the network data and the physical building data were collected to better understand the impact of cyber attacks on the building and to help identify the source of a mechanical fault from the network data. Alfalfa is the tool used for simulating the DOE reference buildings; it acts as an interface to the model for querying its status and providing input externally. The Building Automation System (BAS) is the centralized controller that sends control commands to other BACnet devices on the network based on the building status received from Alfalfa. BACnet devices such as dampers listen for control commands from the BAS on the BACnet network and implement them. The attacker is the malicious actor on the network who creates disruptions by launching cyber-attacks.
Cyber Security Situational Awareness Market Report | Global Forecast From...
dataintelo.com
csv, pdf, pptx
Updated Sep 12, 2024
Cite
Dataintelo (2024). Cyber Security Situational Awareness Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/global-cyber-security-situational-awareness-market
Available download formats: csv, pdf, pptx
Dataset updated
Sep 12, 2024
Dataset authored and provided by
Dataintelo
License
https://dataintelo.com/privacy-and-policy
Time period covered
2024 - 2032
Area covered
Global
Description
Cyber Security Situational Awareness Market Outlook

The global cyber security situational awareness market size was valued at approximately $29.2 billion in 2023 and is projected to reach around $72.4 billion by 2032, growing at a CAGR of 10.6% during the forecast period. The primary growth factors driving this market are the increasing frequency and sophistication of cyber-attacks and the growing adoption of IoT and connected devices, which necessitate advanced security measures to ensure data integrity and network security.
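
The projection above can be reproduced approximately by compounding the 2023 value forward year by year. The sketch below assumes 2023 as the base year and applies the quoted 10.6% rate for nine years; the helper name is hypothetical. Under those assumptions the 2032 value lands close to the quoted figure of around $72.4 billion.

```python
def growth_trajectory(start_year, start_value, rate, years):
    """Return (year, value) pairs obtained by compounding start_value at rate."""
    return [
        (start_year + n, start_value * (1 + rate) ** n)
        for n in range(years + 1)
    ]


# $29.2 billion in 2023 compounded at 10.6% per year through 2032.
trajectory = growth_trajectory(2023, 29.2, 0.106, 2032 - 2023)
for year, value in trajectory:
    print(year, round(value, 1))
```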

The rapid digital transformation across industries presents both opportunities and challenges in terms of cyber security. As organizations increasingly rely on digital platforms and interconnected systems, the threat landscape becomes more complex and dynamic. This has led to a heightened demand for robust cyber security situational awareness solutions, which provide real-time visibility, threat detection, and response capabilities. The growing regulatory requirements for data protection and privacy also play a crucial role in driving market growth, as businesses strive to comply with stringent regulations such as GDPR, HIPAA, and others.

Moreover, the rise of remote work and the increased use of cloud services have expanded the attack surface for malicious actors. Organizations are now more vulnerable to phishing attacks, ransomware, and other forms of cyber threats. This has led to a greater emphasis on enhancing cyber security frameworks and adopting advanced situational awareness tools. The integration of artificial intelligence (AI) and machine learning (ML) in cyber security solutions is another significant growth factor, enabling faster detection and mitigation of threats while reducing false positives.

Furthermore, the increasing investment in cybersecurity by both public and private sectors is expected to fuel market growth. Governments worldwide are recognizing the importance of protecting critical infrastructure and are allocating significant resources to bolster cyber defenses. Private enterprises, facing the potential financial and reputational damage from cyber incidents, are also increasingly investing in advanced security solutions to safeguard their operations. This collective effort to enhance cyber resilience is a key driver of the cyber security situational awareness market.

From a regional perspective, North America currently holds the largest market share due to the presence of major technology companies and a high adoption rate of advanced cyber security solutions. However, the Asia Pacific region is expected to exhibit the highest growth rate during the forecast period, driven by rapid digitalization, expanding internet penetration, and increasing awareness of cyber threats. Europe also remains a significant market, with stringent data protection regulations and substantial investments in cyber security infrastructure.

Component Analysis

The cyber security situational awareness market by component can be broadly segmented into solutions and services. Solutions encompass a variety of software and hardware tools designed to provide comprehensive situational awareness, including threat intelligence platforms, intrusion detection systems, security information and event management (SIEM) systems, and advanced analytics solutions. These tools are crucial for identifying, analyzing, and mitigating potential threats in real-time, ensuring the security and integrity of an organization's network and data.

Within the solutions segment, SIEM systems are particularly notable for their ability to collect and analyze security-related data from various sources, providing a unified view of an organization's security posture. These systems leverage advanced analytics and machine learning algorithms to detect anomalies and potential threats, facilitating faster response times. Intrusion detection systems, on the other hand, focus on identifying unauthorized access attempts and other malicious activities within a network, enabling organizations to take proactive measures to thwart attacks.

On the services side, the market includes professional services such as consulting, training, and implementation, as well as managed services. Consulting services help organizations assess their current security posture, identify vulnerabilities, and develop strategies for enhancing situational awareness. Training services are essential for building the skills and knowledge required to effectively use advanced cyber security tools and respond to threats. Implementation services ensure that security sol
Number of data compromises and impacted individuals in U.S. 2005-2024
statista.com
ai-chatbox.pro
Updated May 23, 2025
Cite
Statista (2025). Number of data compromises and impacted individuals in U.S. 2005-2024 [Dataset]. https://www.statista.com/statistics/273550/data-breaches-recorded-in-the-united-states-by-number-of-breaches-and-records-exposed/
Dataset updated
May 23, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
United States
Description
In 2024, the number of data compromises in the United States stood at 3,158 cases. Over 1.35 billion individuals were affected in the same year by data compromises, including data breaches, leakage, and exposure. While these are three different events, they have one thing in common: in all three, sensitive data is accessed by an unauthorized threat actor.

Industries most vulnerable to data breaches

Some industry sectors usually see more significant cases of private data violations than others, depending on the type and volume of personal information that organizations in those sectors store. In 2024, financial services, healthcare, and professional services were the three industry sectors that recorded the most data breaches. Overall, the number of data breaches in some industry sectors in the United States, healthcare among them, has gradually increased within the past few years, while other sectors saw a decrease.

Largest data exposures worldwide

In 2020, an adult streaming website, CAM4, experienced a leakage of nearly 11 billion records, by far the most extensive reported data leakage. The case is unusual, though, because cyber security researchers found the vulnerability before cyber criminals did. The second-largest data breach is the Yahoo data breach, dating back to 2013. The company first reported about one billion exposed records and later, in 2017, updated the number of leaked records to three billion. In March 2018, the third-biggest data breach happened, involving India’s national identification database Aadhaar; over 1.1 billion records were exposed.
Large-Scale Attacks in IoT Environment
kaggle.com
zip
Updated May 7, 2025
Cite
Nikita Manaenkov (2025). Large-Scale Attacks in IoT Environment [Dataset]. https://www.kaggle.com/datasets/nikitamanaenkov/large-scale-attacks-in-iot-environment
Available download formats: zip (1474647877 bytes)
Dataset updated
May 7, 2025
Authors
Nikita Manaenkov
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The CICIoT2023 dataset is a large-scale, realistic intrusion detection dataset designed to support security analytics and machine learning research in the Internet of Things (IoT) domain. Created by the Canadian Institute for Cybersecurity (CIC), the dataset captures 33 different types of attacks (including DDoS, DoS, Recon, Web-based, Brute Force, Spoofing, and Mirai) executed by malicious IoT devices against other IoT targets.

The testbed consists of 105 real IoT devices of different types and manufacturers, including smart home devices and industrial equipment, configured in a complex network topology to emulate real-world conditions. The dataset includes benign and malicious traffic in various formats and supports feature extraction for both traditional ML and deep learning models.

This dataset aims to address the lack of diversity and scale in previous IoT security datasets, offering a robust benchmark for evaluating intrusion detection systems (IDS) and enabling research in IoT cybersecurity, anomaly detection, and network forensics.

Source https://www.mdpi.com/1424-8220/23/13/5941
Forchheim Image Dataset Dataset
paperswithcode.com
Updated Mar 18, 2025
Cite
(2025). Forchheim Image Dataset Dataset [Dataset]. https://paperswithcode.com/dataset/forchheim-image-dataset
Dataset updated
Mar 18, 2025
Description
The Forchheim Image Dataset is designed specifically for Source Camera Identification (SCI) tasks, offering a diverse range of images captured from various devices. This dataset is an essential resource for forensic and cybersecurity professionals who are working to trace the origin of digital images to the cameras that captured them.

The dataset contains a total of 3,851 high-resolution images, meticulously curated from 27 distinct digital devices, ensuring broad representation across different camera models and manufacturers. To maintain the focus on Source Camera Identification, only the ‘original’ (unprocessed) images from each device have been retained, while all other derivative files, such as edited or compressed versions, have been excluded.

Key Features of the Forchheim Dataset:

Diversity of Devices: Includes images from 27 unique devices, ranging from smartphones to high-end cameras, covering various sensor types, lenses, and software configurations.

High-Quality Images: All images are preserved in their original, unaltered formats to ensure authenticity and integrity for SCI tasks.

Exclusively for SCI: Derivative files and any post-processed images have been removed, ensuring that the dataset strictly serves the purpose of source camera identification.

Applications: Ideal for forensic analysis, digital media forensics, image authentication, and cybersecurity research where tracing the origin of images is critical.

Dataset Structure: The dataset is organized into folders by device, making it easier for researchers to access and analyze images based on their source.
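
Because the images are grouped into one folder per device, the folder name can serve as the class label for source camera identification. The following is a minimal sketch of building such an index. The root path, the function name, and the list of image extensions are assumptions for illustration; this description does not state the actual folder names or file formats.

```python
from collections import defaultdict
from pathlib import Path

# Assumed extensions; the dataset description does not list the file formats.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def index_images_by_device(root):
    """Map each device folder that contains images to a sorted list of image paths."""
    index = defaultdict(list)
    for device_dir in sorted(Path(root).iterdir()):
        if not device_dir.is_dir():
            continue  # skip stray files next to the device folders
        for path in sorted(device_dir.rglob("*")):
            if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS:
                index[device_dir.name].append(path)
    return dict(index)
```

Calling `index_images_by_device("forchheim/")` (a hypothetical location) would return a dictionary whose keys are device folder names, which can then be used as labels when training a classifier.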

Potential Use Cases:

Forensic Analysis: Identifying the source of images in legal cases or criminal investigations.

Cybersecurity: Detecting manipulated or unauthorized images used in malicious campaigns.

Academic Research: Training and testing machine learning models for image source attribution.

This dataset is sourced from Kaggle.
Open Source Security Market Report | Global Forecast From 2025 To 2033
dataintelo.com
csv, pdf, pptx
Updated Oct 5, 2024
Cite
Dataintelo (2024). Open Source Security Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/open-source-security-market
Available download formats: pptx, pdf, csv
Dataset updated
Oct 5, 2024
Dataset authored and provided by
Dataintelo
License
https://dataintelo.com/privacy-and-policy
Time period covered
2024 - 2032
Area covered
Global
Description
Open Source Security Market Outlook

The global open source security market size was valued at approximately USD 2.5 billion in 2023 and is expected to grow to around USD 7.9 billion by 2032, reflecting a robust compound annual growth rate (CAGR) of 13.6% during the forecast period. This growth is primarily driven by the increasing adoption of open source software (OSS) across various industries due to its cost-effectiveness and flexibility, coupled with a growing awareness of cybersecurity threats.

One of the primary growth factors for the open source security market is the escalating number of cyber threats and data breaches, which have heightened the need for more robust security measures. Organizations are increasingly turning to open source security solutions to safeguard their systems and data. The flexibility and transparency offered by open source solutions allow organizations to customize security measures to fit their specific needs, which is an attractive proposition compared to proprietary software.

Another significant growth driver is the rising adoption of open source software in enterprise IT ecosystems. As more businesses leverage OSS for various applications, from web development to cloud computing, the need for effective security solutions becomes paramount. Open source security tools are often more adaptable and rapidly updated, enabling organizations to quickly address vulnerabilities and stay ahead of potential threats. The collaborative nature of open source communities also means that security solutions benefit from continuous contributions from a global pool of developers.

Additionally, cost considerations play a crucial role in the market's expansion. Open source security solutions often come with lower upfront costs compared to proprietary alternatives, making them particularly appealing to small and medium-sized enterprises (SMEs) that may have limited IT budgets. This cost advantage, combined with the potential for reduced total cost of ownership due to the ability to modify and improve the software, is expected to fuel the market's growth further.

Regionally, North America is anticipated to hold the largest market share during the forecast period, driven by the early adoption of advanced technologies and a strong focus on cybersecurity. However, the Asia Pacific region is expected to witness the highest growth rate due to the rapid digital transformation in emerging economies like India and China, increasing cybersecurity investments, and the growing implementation of OSS across various industries.

Component Analysis

The open source security market is segmented by component into software and services. Software comprises various security tools and applications designed to protect open source environments, including firewalls, intrusion detection systems, and security monitoring tools. The software segment is expected to dominate the market due to the increasing deployment of open source security software that offers extensive customization and integration capabilities. These tools are essential for organizations to maintain the security and integrity of their open source applications.

On the other hand, the services segment includes consulting, implementation, and maintenance services. As organizations adopt open source security solutions, the demand for expert services to effectively implement and manage these solutions is growing. Consulting services help organizations assess their security posture and develop strategies to mitigate risks. Implementation services ensure that open source security tools are correctly deployed and configured, while maintenance services provide ongoing support and updates to keep the security measures effective.

The services segment is also set to experience significant growth, driven by the increasing complexity of cybersecurity threats and the need for specialized expertise. Many organizations prefer to outsource their security needs to external experts who can provide up-to-date knowledge and skills. This trend is particularly prominent among SMEs, which may lack the resources to maintain an in-house security team.

Furthermore, the integration of artificial intelligence (AI) and machine learning (ML) into open source security solutions is enhancing their capabilities. AI and ML-powered security tools can analyze vast amounts of data to detect anomalies and predict potential threats, providing organizations with advanced protection mechanisms. This technological advancement is expected to drive the growth of bot
Threat Intelligence Text Dataset
opendatabay.com
Updated Jul 3, 2025
Cite
Datasimple (2025). Threat Intelligence Text Dataset [Dataset]. https://www.opendatabay.com/data/ai-ml/8293a044-4601-409d-898b-a16bf6852ae2
Dataset updated
Jul 3, 2025
Dataset authored and provided by
Datasimple
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
Website Analytics & User Experience
Description
This curated dataset, Cyber-BERT, is designed for Natural Language Processing (NLP) applications within the cybersecurity domain. It contains text extracted from various cybersecurity sources, encompassing topics such as malware analysis, vulnerabilities, cyber threats, and network security. The dataset is well-suited for training BERT-based models to perform essential tasks like threat detection, text classification, and broader cybersecurity research. The data has been meticulously preprocessed to ensure cleanliness, with URLs, non-text symbols, HTML tags, metadata, and redundant content removed.

Columns

text: This column contains the processed cybersecurity-related text.

Distribution

The dataset is typically provided in a CSV file format, making it readily accessible for various applications. It contains approximately 50,000 samples, though the exact number may vary based on collection updates. The data has undergone significant preprocessing to enhance its utility for NLP tasks, including the removal of URLs, non-text symbols, HTML tags, metadata, and duplicate entries.
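
The cleaning steps listed above (removing URLs, HTML tags, non-text symbols, and duplicate entries) can be sketched with the Python standard library. This is an illustration under stated assumptions, not the dataset authors' pipeline: the regular expressions, the set of characters treated as text, and the function names `clean_text` and `clean_corpus` are all hypothetical.

```python
import html
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
TAG_RE = re.compile(r"<[^>]+>")
# Keep letters, digits, whitespace, and basic punctuation; this character
# set is an assumption, not the dataset authors' rule.
SYMBOL_RE = re.compile(r"[^A-Za-z0-9\s.,;:!?'\"()\-]")


def clean_text(raw):
    """Strip URLs, HTML tags, and non-text symbols, then collapse whitespace."""
    text = html.unescape(raw)
    text = URL_RE.sub(" ", text)
    text = TAG_RE.sub(" ", text)
    text = SYMBOL_RE.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()


def clean_corpus(samples):
    """Clean every sample; drop empty results and exact duplicates, keeping order."""
    seen = set()
    cleaned = []
    for sample in samples:
        text = clean_text(sample)
        if text and text not in seen:
            seen.add(text)
            cleaned.append(text)
    return cleaned
```

Since the published file is described as already preprocessed, a sketch like this is mainly useful for bringing additional text into the same shape before it is combined with the `text` column.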

Usage

This dataset offers a range of valuable applications, including:
* Cyber Threat Detection: Utilise the dataset to train models for classifying security threats.
* Named Entity Recognition (NER): Identify and extract key entities such as malware, exploits, and vulnerabilities from cybersecurity text.
* Threat Intelligence Analysis: Extract valuable insights from cybersecurity reports and other relevant texts.
* BERT Fine-Tuning: Build specialised NLP models tailored for security domains and specific cybersecurity challenges.

Coverage

The text within this dataset is extracted from prominent cybersecurity sources including TheHackerNews, CVE Details, Any.Run, and OpenPhish. The dataset's scope is global. Specific time ranges for the data content itself are not provided.

License

CC0

Who Can Use It

This dataset is an excellent resource for:
* Researchers focused on advancing NLP techniques in cybersecurity.
* Data Scientists and Machine Learning Engineers developing threat detection systems or text classification models.
* Security Analysts looking to automate aspects of threat intelligence analysis.
* Anyone involved in building specialised NLP models for security domains.

Dataset Name Suggestions

Cyber-BERT

Cybersecurity NLP Corpus

Threat Intelligence Text Dataset

Security Text Analytics Data

BERT Security Dataset

Attributes

Original Data Source: Cyber-BERT
EDGE-IIOTSET Dataset
paperswithcode.com
Updated Oct 16, 2023
Cite
(2023). EDGE-IIOTSET Dataset [Dataset]. https://paperswithcode.com/dataset/edge-iiotset
Dataset updated
Oct 16, 2023
Description
ABSTRACT

In this project, we propose a new comprehensive realistic cyber security dataset of IoT and IIoT applications, called Edge-IIoTset, which can be used by machine learning-based intrusion detection systems in two different modes, namely, centralized and federated learning. Specifically, the proposed testbed is organized into seven layers: Cloud Computing Layer, Network Functions Virtualization Layer, Blockchain Network Layer, Fog Computing Layer, Software-Defined Networking Layer, Edge Computing Layer, and IoT and IIoT Perception Layer. In each layer, we propose new emerging technologies that satisfy the key requirements of IoT and IIoT applications, such as the ThingsBoard IoT platform, OPNFV platform, Hyperledger Sawtooth, Digital twin, ONOS SDN controller, Mosquitto MQTT brokers, Modbus TCP/IP, etc. The IoT data are generated from various IoT devices (more than 10 types), such as low-cost digital sensors for sensing temperature and humidity, Ultrasonic sensor, Water level detection sensor, pH Sensor Meter, Soil Moisture sensor, Heart Rate Sensor, Flame Sensor, etc. We identify and analyze fourteen attacks related to IoT and IIoT connectivity protocols, which are categorized into five threats: DoS/DDoS attacks, Information gathering, Man in the middle attacks, Injection attacks, and Malware attacks. In addition, we extract features obtained from different sources, including alerts, system resources, logs, and network traffic, and propose 61 new features with high correlations from 1176 found features. After processing and analyzing the proposed realistic cyber security dataset, we provide a primary exploratory data analysis and evaluate the performance of machine learning approaches (i.e., traditional machine learning as well as deep learning) in both centralized and federated learning modes.

Instructions:

Great news! The Edge-IIoT dataset has been featured as a "Document in the top 1% of Web of Science." This indicates that it is ranked within the top 1% of all publications indexed by the Web of Science (WoS) in terms of citations and impact.

Please kindly visit kaggle link for the updates: https://www.kaggle.com/datasets/mohamedamineferrag/edgeiiotset-cyber-sec...

Free use of the Edge-IIoTset dataset for academic research purposes is hereby granted in perpetuity. Use for commercial purposes is allowable after asking the lead author, Dr Mohamed Amine Ferrag, who has asserted his right under the Copyright.

The details of the Edge-IIoT dataset were published in the following paper. For academic/public use of these datasets, users must cite the following paper:

Mohamed Amine Ferrag, Othmane Friha, Djallel Hamouda, Leandros Maglaras, Helge Janicke, "Edge-IIoTset: A New Comprehensive Realistic Cyber Security Dataset of IoT and IIoT Applications for Centralized and Federated Learning", IEEE Access, April 2022 (IF: 3.37), DOI: 10.1109/ACCESS.2022.3165809

Link to paper : https://ieeexplore.ieee.org/document/9751703

The directories of the Edge-IIoTset dataset include the following:

•File 1 (Normal traffic)

-File 1.1 (Distance): This file includes two documents, namely, Distance.csv and Distance.pcap. The IoT sensor (Ultrasonic sensor) is used to capture the IoT data.

-File 1.2 (Flame_Sensor): This file includes two documents, namely, Flame_Sensor.csv and Flame_Sensor.pcap. The IoT sensor (Flame Sensor) is used to capture the IoT data.

-File 1.3 (Heart_Rate): This file includes two documents, namely, Heart_Rate.csv and Heart_Rate.pcap. The IoT sensor (Heart Rate Sensor) is used to capture the IoT data.

-File 1.4 (IR_Receiver): This file includes two documents, namely, IR_Receiver.csv and IR_Receiver.pcap. The IoT sensor (IR (Infrared) Receiver Sensor) is used to capture the IoT data.

-File 1.5 (Modbus): This file includes two documents, namely, Modbus.csv and Modbus.pcap. The IoT sensor (Modbus Sensor) is used to capture the IoT data.

-File 1.6 (phValue): This file includes two documents, namely, phValue.csv and phValue.pcap. The IoT sensor (pH-sensor PH-4502C) is used to capture the IoT data.

-File 1.7 (Soil_Moisture): This file includes two documents, namely, Soil_Moisture.csv and Soil_Moisture.pcap. The IoT sensor (Soil Moisture Sensor v1.2) is used to capture the IoT data.

-File 1.8 (Sound_Sensor): This file includes two documents, namely, Sound_Sensor.csv and Sound_Sensor.pcap. The IoT sensor (LM393 Sound Detection Sensor) is used to capture the IoT data.

-File 1.9 (Temperature_and_Humidity): This file includes two documents, namely, Temperature_and_Humidity.csv and Temperature_and_Humidity.pcap. The IoT sensor (DHT11 Sensor) is used to capture the IoT data.

-File 1.10 (Water_Level): This file includes two documents, namely, Water_Level.csv and Water_Level.pcap. The IoT sensor (Water sensor) is used to capture the IoT data.

•File 2 (Attack traffic):

-File 2.1 (Attack traffic (CSV files)): This file includes 14 documents, namely, Backdoor_attack.csv, DDoS_HTTP_Flood_attack.csv, DDoS_ICMP_Flood_attack.csv, DDoS_TCP_SYN_Flood_attack.csv, DDoS_UDP_Flood_attack.csv, MITM_attack.csv, OS_Fingerprinting_attack.csv, Password_attack.csv, Port_Scanning_attack.csv, Ransomware_attack.csv, SQL_injection_attack.csv, Uploading_attack.csv, Vulnerability_scanner_attack.csv, XSS_attack.csv. Each document is specific to one attack.

-File 2.2 (Attack traffic (PCAP files)): This file includes 14 documents, namely, Backdoor_attack.pcap, DDoS_HTTP_Flood_attack.pcap, DDoS_ICMP_Flood_attack.pcap, DDoS_TCP_SYN_Flood_attack.pcap, DDoS_UDP_Flood_attack.pcap, MITM_attack.pcap, OS_Fingerprinting_attack.pcap, Password_attack.pcap, Port_Scanning_attack.pcap, Ransomware_attack.pcap, SQL_injection_attack.pcap, Uploading_attack.pcap, Vulnerability_scanner_attack.pcap, XSS_attack.pcap. Each document is specific to one attack.

•File 3 (Selected dataset for ML and DL):

-File 3.1 (DNN-EdgeIIoT-dataset): This file contains a selected dataset for the use of evaluating deep learning-based intrusion detection systems.

-File 3.2 (ML-EdgeIIoT-dataset): This file contains a selected dataset for the use of evaluating traditional machine learning-based intrusion detection systems.

Step 1: Downloading the Edge-IIoTset dataset from the Kaggle platform:

from google.colab import files

!pip install -q kaggle

files.upload()

!mkdir -p ~/.kaggle

!cp kaggle.json ~/.kaggle/

!chmod 600 ~/.kaggle/kaggle.json

!kaggle datasets download -d mohamedamineferrag/edgeiiotset-cyber-security-dataset-of-iot-iiot -f "Edge-IIoTset dataset/Selected dataset for ML and DL/DNN-EdgeIIoT-dataset.csv"

!unzip DNN-EdgeIIoT-dataset.csv.zip

!rm DNN-EdgeIIoT-dataset.csv.zip

Step 2: Reading the dataset's CSV file into a Pandas DataFrame:

import pandas as pd

import numpy as np

df = pd.read_csv('DNN-EdgeIIoT-dataset.csv', low_memory=False)

Step 3: Exploring some of the DataFrame's contents:

df.head(5)

print(df['Attack_type'].value_counts())

Step 4: Dropping data (columns, duplicated rows, NaN, null values):

from sklearn.utils import shuffle

drop_columns = ["frame.time", "ip.src_host", "ip.dst_host", "arp.src.proto_ipv4", "arp.dst.proto_ipv4",
                "http.file_data", "http.request.full_uri", "icmp.transmit_timestamp", "http.request.uri.query",
                "tcp.options", "tcp.payload", "tcp.srcport", "tcp.dstport", "udp.port", "mqtt.msg"]

df.drop(drop_columns, axis=1, inplace=True)

df.dropna(axis=0, how='any', inplace=True)

df.drop_duplicates(subset=None, keep="first", inplace=True)

df = shuffle(df)

df.isna().sum()

print(df['Attack_type'].value_counts())

Step 5: Categorical data encoding (dummy encoding):

import numpy as np

from sklearn.model_selection import train_test_split

from sklearn.preprocessing import StandardScaler

from sklearn import preprocessing

def encode_text_dummy(df, name):
    # Replace column `name` with one indicator column per distinct value.
    dummies = pd.get_dummies(df[name])
    for x in dummies.columns:
        dummy_name = f"{name}-{x}"
        df[dummy_name] = dummies[x]
    df.drop(name, axis=1, inplace=True)

encode_text_dummy(df,'http.request.method')

encode_text_dummy(df,'http.referer')

encode_text_dummy(df,"http.request.version")

encode_text_dummy(df,"dns.qry.name.len")

encode_text_dummy(df,"mqtt.conack.flags")

encode_text_dummy(df,"mqtt.protoname")

encode_text_dummy(df,"mqtt.topic")
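
To make the effect of Step 5 concrete, the following dependency-free sketch performs the same kind of dummy (one-hot) encoding on plain dictionaries. It is an illustration only, not the authors' code: the function name `dummy_encode`, the toy rows, and the `tcp.len` values are made up, and pandas may produce boolean rather than 0/1 indicator columns depending on its version.

```python
def dummy_encode(rows, name):
    """Replace column `name` in each row dict with one 0/1 column per category."""
    categories = sorted({row[name] for row in rows})
    encoded = []
    for row in rows:
        # Copy every column except the one being encoded.
        new_row = {key: value for key, value in row.items() if key != name}
        for category in categories:
            new_row[f"{name}-{category}"] = int(row[name] == category)
        encoded.append(new_row)
    return encoded


# Toy rows for illustration; not taken from the dataset.
rows = [
    {"http.request.method": "GET", "tcp.len": 120},
    {"http.request.method": "POST", "tcp.len": 64},
]
encoded = dummy_encode(rows, "http.request.method")
print(encoded[0])
# {'tcp.len': 120, 'http.request.method-GET': 1, 'http.request.method-POST': 0}
```

Each distinct value becomes its own column, so high-cardinality fields such as `mqtt.topic` can widen the table considerably.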

Step 6: Creation of the preprocessed dataset:

df.to_csv('preprocessed_DNN.csv', encoding='utf-8')

For more information about the dataset, please contact the lead author of this project, Dr Mohamed Amine Ferrag, on his email: mohamed.amine.ferrag@gmail.com

More information about Dr. Mohamed Amine Ferrag is available at:

https://www.linkedin.com/in/Mohamed-Amine-Ferrag

https://dblp.uni-trier.de/pid/142/9937.html

https://www.researchgate.net/profile/Mohamed_Amine_Ferrag

https://scholar.google.fr/citations?user=IkPeqxMAAAAJ&hl=fr&oi=ao

https://www.scopus.com/authid/detail.uri?authorId=56115001200

https://publons.com/researcher/1322865/mohamed-amine-ferrag/

https://orcid.org/0000-0002-0632-3172

Last Updated: 27 Mar. 2023
Intrusion Detection System Software Report
datainsightsmarket.com
doc, pdf, ppt
Updated May 28, 2025
Cite
Data Insights Market (2025). Intrusion Detection System Software Report [Dataset]. https://www.datainsightsmarket.com/reports/intrusion-detection-system-software-1967301
Available download formats: doc, ppt, pdf
Dataset updated
May 28, 2025
Dataset authored and provided by
Data Insights Market
License
https://www.datainsightsmarket.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The Intrusion Detection System (IDS) Software market is experiencing robust growth, driven by the escalating need for robust cybersecurity solutions across various sectors. The increasing frequency and sophistication of cyberattacks, coupled with stringent data privacy regulations like GDPR and CCPA, are compelling organizations to invest heavily in advanced IDS software. The market's expansion is further fueled by the proliferation of connected devices and the adoption of cloud computing, which expand the attack surface and necessitate comprehensive security measures. While the precise market size for 2025 isn't provided, considering a reasonable CAGR of 15% (a conservative estimate given the market dynamics) and assuming a 2024 market size of $8 billion (a plausible figure based on industry reports), the 2025 market size would be approximately $9.2 billion. This growth is expected to continue throughout the forecast period (2025-2033), driven by continuous innovation in detection techniques (like AI/ML-powered solutions), increasing demand for managed security services, and the growing adoption of hybrid cloud environments. Significant market segmentation exists, encompassing network-based IDS, host-based IDS, and cloud-based IDS. Network-based IDS dominates currently but the cloud-based segment is exhibiting the fastest growth rate. Leading vendors such as SolarWinds, ManageEngine, Cisco, and Splunk are actively competing to provide comprehensive, scalable, and user-friendly solutions. However, the market also features a considerable number of open-source options (like Snort, Suricata, and Zeek), offering cost-effective alternatives for smaller organizations. While the market faces restraints such as the complexity of implementation and maintenance, the rising cybersecurity threats are likely to outweigh these challenges, ensuring sustained market expansion in the coming years. 
This market analysis highlights the significant opportunities and challenges present within the IDS Software market, demonstrating its importance in the ever-evolving cybersecurity landscape.
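
The 2025 estimate in the description follows from compounding the assumed 2024 base by the assumed growth rate for one year. A minimal sketch of that arithmetic is shown below; the function names are illustrative, and the inputs are the report's own stated assumptions ($8 billion base, 15% CAGR), not measured values.

```python
def project_market_size(base_size, cagr, years):
    """Compound `base_size` forward by `years` at annual growth rate `cagr`."""
    return base_size * (1 + cagr) ** years


def implied_cagr(start_size, end_size, years):
    """Annual growth rate that turns `start_size` into `end_size` over `years`."""
    return (end_size / start_size) ** (1 / years) - 1


# Inputs in USD billion, taken from the assumptions stated in the description.
size_2025 = project_market_size(8.0, 0.15, 1)
print(f"2025: ${size_2025:.1f} billion")
# 2025: $9.2 billion
```

The inverse function, `implied_cagr`, is a quick way to check whether a start value, an end value, and a quoted growth rate in a report are mutually consistent.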
Open Source Cyber Intelligence Tools Report
archivemarketresearch.com
doc, pdf, ppt
Updated Feb 21, 2025
Cite
Archive Market Research (2025). Open Source Cyber Intelligence Tools Report [Dataset]. https://www.archivemarketresearch.com/reports/open-source-cyber-intelligence-tools-40013
Available download formats: doc, ppt, pdf
Dataset updated
Feb 21, 2025
Dataset authored and provided by
Archive Market Research
License
https://www.archivemarketresearch.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The global open source cyber intelligence tools market size was valued at $2107.3 million in 2023 and is projected to reach $3819.2 million by 2033, exhibiting a CAGR of 11.4% during the forecast period (2023-2033). The rising demand for advanced cyber threat detection and mitigation solutions, the increasing adoption of open source software in the cybersecurity industry, and the growing need to protect sensitive information from cyber attacks are driving the growth of the market. Additionally, the increasing adoption of cloud-based solutions and the growing awareness of cyber threats are further boosting the market expansion. North America holds the largest market share due to the presence of key players and the early adoption of advanced technologies. Asia Pacific is expected to witness robust growth due to the increasing number of cyber threats and the growing awareness about cybersecurity risks. The market is highly competitive, with key players offering a wide range of open source cyber intelligence tools. The key players in the market include Thales Group, Palantir Technologies, Cognyte, OpenText (Micro Focus), Recorded Future, Expert System, Hensoldt Analytics, Maltego, Cyware, and Babel Street. The open source cyber intelligence tools market is expected to grow from USD 1.5 billion in 2023 to USD 3.4 billion by 2033, at a CAGR of 9.5% over the forecast period.
5.12 Cybersecurity (summary) - Archived
catalog.data.gov
performance.tempe.gov
Updated Jan 17, 2025
Cite
City of Tempe (2025). 5.12 Cybersecurity (summary) - Archived [Dataset]. https://catalog.data.gov/dataset/5-12-cybersecurity-summary-823d7
Dataset updated
Jan 17, 2025
Dataset provided by
City of Tempe
Description
The National Institute of Standards and Technology (NIST) provides a Cybersecurity Framework (CSF) for benchmarking and measuring the maturity level of cyber security programs across all industries. The City uses this framework and toolset to measure and report on its internal cyber security program.The foundation for this measure is the Framework Core, a set of cybersecurity activities, desired outcomes and applicable references that are common across critical infrastructure/industry sectors. These activities come from the National Institute of Standards and Technology (NIST) Cybersecurity Framework (CSF) published standard, along with the information security and customer privacy controls it references (NIST 800 Series Special Publications). The Framework Core presents industry standards, guidelines, and practices in a manner that allows for communication of cybersecurity activities and outcomes across the organization from the executive level to the implementation/operations level. The Framework Core consists of five concurrent and continuous functions – identify, protect, detect, respond, and recover. When considered together, these functions provide a high-level, strategic view of the lifecycle of an organization’s management of cybersecurity risk. The Framework Core identifies underlying key categories and subcategories for each function, and matches them with example references, such as existing standards, guidelines and practices for each subcategory. 
This page provides data for the Cybersecurity performance measure.
Cybersecurity Framework cumulative score summary per fiscal year quarter (Performance Measure 5.12)
The performance measure page is available at 5.12 Cybersecurity.
Additional Information
Source: Maturity assessment / https://www.nist.gov/topics/cybersecurity
Contact: Scott Campbell
Contact E-Mail: Scott_Campbell@tempe.gov
Data Source Type: Excel
Preparation Method: The data is a summary of a detailed and confidential analysis of the city's cyber security program. Maturity scores of subcategories within NIST CSF are combined, averaged and rolled up to a summary score for each major category.
Publish Frequency: Annual
Publish Method: Manual
Data Dictionary
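The preparation method describes averaging subcategory maturity scores and rolling them up to one summary score per major category. A minimal sketch of that roll-up is shown below. It assumes that "major category" means one of the five Framework Core functions and that a plain arithmetic mean is used; the scores are invented for illustration, since the city's detailed scores are confidential.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical maturity scores keyed by (function, subcategory ID).
# The numeric values are made up; only the roll-up logic is the point here.
subcategory_scores = {
    ("Identify", "ID.AM-1"): 3.0,
    ("Identify", "ID.AM-2"): 2.0,
    ("Protect", "PR.AC-1"): 4.0,
    ("Protect", "PR.DS-1"): 3.0,
    ("Detect", "DE.CM-1"): 2.5,
    ("Respond", "RS.RP-1"): 3.5,
    ("Recover", "RC.RP-1"): 2.0,
}


def roll_up(scores):
    """Average subcategory scores into one summary score per function."""
    grouped = defaultdict(list)
    for (function, _subcategory), score in scores.items():
        grouped[function].append(score)
    return {function: mean(values) for function, values in grouped.items()}


summary = roll_up(subcategory_scores)
for function, score in summary.items():
    print(f"{function}: {score:.2f}")
```

A weighted mean or an intermediate roll-up through categories would be equally plausible; the source does not say which variant the city uses.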
Defense Cyber Security Market Report
marketreportanalytics.com
doc, pdf, ppt
Updated Apr 20, 2025
Market Report Analytics (2025). Defense Cyber Security Market Report [Dataset]. https://www.marketreportanalytics.com/reports/defense-cyber-security-market-89169
Available download formats: pdf, ppt, doc
Dataset updated
Apr 20, 2025
Dataset authored and provided by
Market Report Analytics
License
https://www.marketreportanalytics.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The global defense cybersecurity market is experiencing robust growth, projected to reach $22.95 billion in 2025 and maintain a Compound Annual Growth Rate (CAGR) of 12.82% from 2025 to 2033. This expansion is driven by several key factors. Firstly, the increasing sophistication and frequency of cyberattacks targeting defense infrastructure necessitate robust and advanced cybersecurity solutions. Governments worldwide are significantly increasing their investments in bolstering their national security posture, recognizing the critical role cybersecurity plays in protecting sensitive data, critical infrastructure, and military operations. Secondly, the adoption of cloud computing and Internet of Things (IoT) devices within defense organizations expands the attack surface, making comprehensive cybersecurity measures indispensable. Finally, the growing need for proactive threat intelligence and advanced training programs for cybersecurity professionals further fuels market growth. The market is segmented into various solutions, including defense solutions, threat assessment, network fortification, and training services, each contributing to the overall market expansion. Leading companies such as General Dynamics-CSRA, Raytheon Technologies Corporation, and Lockheed Martin Corporation are at the forefront of innovation, developing and deploying cutting-edge cybersecurity technologies to meet the evolving needs of the defense sector. The North American region, particularly the United States, currently dominates the market, driven by substantial defense budgets and advanced technological capabilities. However, the Asia-Pacific region is expected to witness significant growth during the forecast period, fueled by increasing defense spending in countries like China, India, and Japan, and a rising awareness of cybersecurity threats. 
Europe also presents a substantial market opportunity, driven by increasing cross-border cyber threats and a greater emphasis on cybersecurity within the defense sector. The continued development of artificial intelligence (AI)-powered cybersecurity solutions, enhanced data analytics for threat detection, and the integration of cybersecurity into the broader defense ecosystem will shape future market trends. While challenges such as the high cost of implementation and a shortage of skilled cybersecurity professionals exist, the overall market outlook remains highly positive, suggesting a sustained period of growth and innovation in the coming years.
Recent developments include:
May 2023: SAIC has introduced its new encrypted query analytics and data retrieval (EQADR) platform. The platform is capable of next-generation cryptographic, cross-boundary data search, retrieval, and analysis. The EQADR has been designed with a view to making it quicker, safer, and more reliable in terms of data search and retrieval. EQADR’s cross-domain strategy delivers targeted, on-demand queries from higher-side networks to lower-side networks while securing sources, methods, and analytical tradecraft. The platform is designed to handle sensitive data transfers, allowing search terms to remain hidden and enabling it to make an effective sift through open source data with a view to reducing classified data storage costs and sharing intellectual property.
December 2022: The Army Evaluates Zero Trust Cybersecurity for JADC2, the company to attain the scale Operational Zero Trust to accommodate different Army command levels and demonstrated the platform’s ability to detect and respond to malicious attacks in a warfighting environment using a digital model and Army to test technologies for joint all-domain command and control, also known as JADC2. The Pentagon-wide effort is focused on linking platforms via a shared network in which decision-making data from multiple sensors and shooters are rapidly transmitted.
Key drivers for this market are: Growing Severity of Cyber Attacks on Military/Government Organizations, Increasing Government Initiatives to Secure Critical Data.
Potential restraints include: Growing Severity of Cyber Attacks on Military/Government Organizations, Increasing Government Initiatives to Secure Critical Data.
Notable trends are: Growing Severity of Cyber Attacks on Military/Government Organizations.
Dataset to Train Intrusion Detection Systems based on Machine Learning...
zenodo.org
application/gzip, bin +1
Updated Nov 11, 2024
Esteban Damian Gutierrez Mlot (2024). Dataset to Train Intrusion Detection Systems based on Machine Learning Models for Electrical Substations [Dataset]. http://doi.org/10.5281/zenodo.14066350
Available download formats: bin, application/gzip, zip
Unique identifier
https://doi.org/10.5281/zenodo.14066350
Dataset updated
Nov 11, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Esteban Damian Gutierrez Mlot
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
DATASET

This dataset is part of the research work titled "A Dataset to Train Intrusion Detection Systems based on Machine Learning Models for Electrical Substations," which is currently awaiting approval for publication. The dataset has been meticulously curated to support the development and evaluation of machine learning models tailored for detecting cyber intrusions in the context of electrical substations. It is intended to facilitate research and advancements in cybersecurity for critical infrastructure, specifically focusing on real-world scenarios within electrical substation environments. We encourage its use for experimentation and benchmarking in related areas of study.

The following sections list the content of the dataset generated.

Data

raw
  iec6180
    attack-free-data
      capture61850-attackfree.pcap (from real substation)
      capture61850-attackfree_PTP.pcap
      capture61850-attackfree_normalfault.pcap
    attack-data
      capture61850-floodattack_withfault.pcap
      capture61850-floodattack_withoutfault.pcap
      capture61850-fuzzyattack_withfault.pcap
      capture61850-fuzzyattack_withoutfault.pcap
      capture61850-replay.pcap
      capture61850-ptpattack.pcap
  iec104
    attack-free-data
      capture104-attackfree.pcap (from real substation)
    attack-data
      capture104-dosattack.pcap
      capture104-floodattack.pcap
      capture104-fuzzyattack.pcap
      capture104-iec104starvationattack.pcap
      capture104-mitmattack.pcap
      capture104-ntpddosattack.pcap
      capture104-portscanattack.pcap

processed
  iec6180
    attack-free-data
      capture61850-attackfree.csv
      capture61850-attackfree_PTP.csv
      capture61850-attackfree_normalfault.csv
    attack-data
      capture61850-floodattack_withfault.csv
      capture61850-floodattack_withoutfault.csv
      capture61850-fuzzyattack_withfault.csv
      capture61850-fuzzyattack_withoutfault.csv
      capture61850-replay.csv
      capture61850-ptpattack.csv
    headers_iec61850[all].txt
  iec104
    attack-free-data
      capture104-attackfree.csv
    attack-data
      capture104-dosattack.csv
      capture104-floodattack.csv
      capture104-fuzzyattack.csv
      capture104-iec104starvationattack.csv
      capture104-mitmattack.csv
      capture104-ntpddosattack.csv
      capture104-portscanattack.csv
    headers_iec104[all].txt

Description

file type: either capture61850 or capture104, depending on whether the file contains network captures of the IEC 61850 or the IEC 104 protocol.

attack: either attackfree for attack-free traffic or the name of the attack, appended to the file type.

function: an optional suffix giving details about the captured functionality (normalfault) or a specific protocol capture (PTP).

file extension: either PCAP (network capture) or CSV (flow file).
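The naming convention above can be turned into labels programmatically. The following is a minimal sketch that splits a file name into its four parts with a regular expression; the `CaptureName` type and the `parse_capture_name` helper are hypothetical names introduced here, not part of the dataset's source code, and the pattern is inferred from the file names listed above.

```python
import re
from typing import NamedTuple, Optional


class CaptureName(NamedTuple):
    protocol: str            # "61850" or "104"
    attack: str              # "attackfree" or an attack name such as "floodattack"
    function: Optional[str]  # optional suffix such as "PTP" or "withfault"
    extension: str           # "pcap" or "csv"


# Pattern inferred from the listed file names: capture<protocol>-<attack>[_<function>].<ext>
_PATTERN = re.compile(
    r"^capture(?P<protocol>61850|104)"
    r"-(?P<attack>[A-Za-z0-9]+)"
    r"(?:_(?P<function>[A-Za-z0-9]+))?"
    r"\.(?P<extension>pcap|csv)$"
)


def parse_capture_name(filename: str) -> CaptureName:
    """Split a dataset file name into protocol, attack, function, and extension."""
    match = _PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unrecognized capture file name: {filename!r}")
    return CaptureName(**match.groupdict())


def is_attack(name: CaptureName) -> bool:
    """True when the file holds attack traffic rather than attack-free traffic."""
    return name.attack != "attackfree"


parsed = parse_capture_name("capture61850-floodattack_withfault.pcap")
print(parsed, is_attack(parsed))
```

Deriving the label from the file name keeps the labeling step independent of the CSV column layout, which is documented separately in the headers files.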

Results

results
  test1-iec104
    model-test1-iec104.pkl
    test1-iec104.log
  test1-iec61850
    model-test1-iec61850.pkl
    test1-iec61850.log
  test2-iec61850
    model-test2-iec61850.pkl
    test2-iec61850.log

Description

The outcomes of different test executions are available as follows:

test1-iec104: IEC 104 protocol, all attacks plus the attack-free scenario

test1-iec61850: IEC 61850 protocol, fuzzy attack with fault injection plus the attack-free scenario

test2-iec61850: IEC 61850 protocol, fuzzy attack under normal operation plus the attack-free scenario

Each test consists of the model results in Python pickle format (with a .pkl extension) and a detailed description of the execution conditions in an output log file (with a .log extension).
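A result directory can be inspected with the standard library alone. The sketch below assumes the layout listed above (a `model-<test>.pkl` file and a `<test>.log` file inside each test directory); `load_result` is a hypothetical helper. The source does not say which library produced the pickled models, so unpickling may require that library to be installed, and pickle files should only be loaded from sources you trust.

```python
import pickle
from pathlib import Path


def load_result(test_dir: Path):
    """Return (model_object, log_text) for a directory such as results/test1-iec104."""
    name = test_dir.name
    model_path = test_dir / f"model-{name}.pkl"
    log_path = test_dir / f"{name}.log"

    # pickle.load can execute arbitrary code; only use it on trusted files.
    with model_path.open("rb") as handle:
        model = pickle.load(handle)

    log_text = log_path.read_text(encoding="utf-8", errors="replace")
    return model, log_text


if __name__ == "__main__":
    results_dir = Path("results/test1-iec104")  # assumed local extraction path
    if results_dir.is_dir():
        model, log_text = load_result(results_dir)
        print(type(model))
        print(log_text[:500])
```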

Source Code

A snapshot of the source code used to process these files is included under the filename source-code-cybersecurity-datasets-v2.0.zip. For an updated version, see the project's GitHub repository.

(2025). Cybersecurity Threat Detection Dataset [Dataset]. https://paperswithcode.com/dataset/cybersecurity-threat-detection

Data from: Cybersecurity Threat Detection Dataset

Dataset updated

Mar 7, 2025

Description

Problem Statement

Organizations face an increasing number of sophisticated cybersecurity threats, including malware, phishing attacks, and unauthorized access. A financial institution experienced frequent attempts to breach its network, risking sensitive data and regulatory compliance. Traditional security measures were reactive and failed to detect threats in real time. The institution sought a proactive AI-driven solution to identify and prevent cybersecurity threats effectively.

Challenge

Developing an advanced threat detection system required addressing several challenges:

Processing and analyzing large volumes of network traffic and user activity data in real time.

Identifying new and evolving threats, such as zero-day vulnerabilities, with high accuracy.

Minimizing false positives to ensure security teams could focus on genuine threats.
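The accuracy and false-positive challenges listed above are usually tracked with a few standard metrics derived from a confusion matrix. The sketch below shows how precision, recall, and false positive rate are computed; the counts are hypothetical and are not results from the case study.

```python
def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute precision, recall, and false positive rate from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    false_positive_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return {
        "precision": precision,                      # share of alerts that were real threats
        "recall": recall,                            # share of real threats that were caught
        "false_positive_rate": false_positive_rate,  # share of benign events that raised an alert
    }


# Hypothetical counts for illustration only.
metrics = detection_metrics(tp=90, fp=10, tn=880, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In a setting where benign events vastly outnumber attacks, even a small false positive rate can produce more false alerts than true ones, which is why precision is worth tracking alongside it.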

Solution Provided

An AI-powered threat detection system was developed using machine learning algorithms and advanced analytics. The solution was designed to:

Continuously monitor network activity and user behavior to identify suspicious patterns.

Detect and neutralize cybersecurity threats in real time, including malware and phishing attempts.

Provide actionable insights to security teams for faster and more effective threat response.

Development Steps

Data Collection

Collected network traffic logs, endpoint activity, and historical threat data to train machine learning models.

Preprocessing

Cleaned and standardized data, ensuring compatibility across diverse sources, and filtered out noise for accurate analysis.

Model Development

Developed machine learning algorithms for anomaly detection, behavioral analysis, and threat classification. Trained models on labeled datasets to recognize known threats and identify emerging attack patterns.

Validation

Tested the system against simulated and real-world threat scenarios to evaluate detection accuracy, response times, and reliability.

Deployment

Integrated the threat detection system into the institution’s existing cybersecurity infrastructure, including firewalls, SIEM (Security Information and Event Management) tools, and endpoint protection.

Continuous Monitoring & Improvement

Established a feedback loop to refine models using new threat data and adapt to evolving attack strategies.
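The case study does not say which algorithms were used for anomaly detection. As a stand-in, the sketch below shows one of the simplest approaches, a z-score test against a baseline of normal activity. The request counts and the threshold of 3 standard deviations are assumptions chosen for illustration; this is not the institution's system.

```python
from statistics import mean, stdev


def fit_baseline(values):
    """Summarize normal activity as (mean, sample standard deviation)."""
    return mean(values), stdev(values)


def is_anomalous(value, baseline, threshold=3.0):
    """Flag a value whose z-score against the baseline exceeds the threshold."""
    mu, sigma = baseline
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold


# Hypothetical requests per minute observed during normal operation.
baseline_counts = [120, 115, 130, 125, 118, 122, 127, 119]
baseline = fit_baseline(baseline_counts)

# New observations: one typical value and one sharp spike.
flags = [is_anomalous(count, baseline) for count in [121, 400]]
print(flags)
```

Refitting the baseline on recent, confirmed-benign traffic is one simple way to realize the feedback loop described in the last step.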

Results

Enhanced Security Posture

The system improved the institution’s ability to detect and prevent cybersecurity threats proactively, strengthening its overall security framework.

Reduced Incidence of Cyber Attacks

Real-time detection and response significantly reduced the frequency and impact of successful cyber attacks.

Improved Threat Response Times

Automated threat identification and prioritization enabled security teams to respond faster and more effectively to potential breaches.

Minimized False Positives

Advanced algorithms reduced false alarms, allowing security teams to focus on genuine threats and improve efficiency.

Scalable and Adaptive Solution

The system adapted to new threats and scaled effortlessly to protect growing organizational networks and data.
