64 datasets found

Global Dataset of Cyber Incidents
zenodo.org
data.niaid.nih.gov
bin, csv, pdf, txt
Updated Apr 1, 2025
Cite
Kerstin Zettl-Schabath; Jakob Bund; Martin Müller; Camille Borrett; Jonas Hemmelskamp; Asaf Alibegovic; Enis Bajra; Alisa Jazxhi; Erik Kellenter; Annika Sachs; Callahan Shelley (2025). Global Dataset of Cyber Incidents [Dataset]. http://doi.org/10.5281/zenodo.14965395
Available download formats
pdf, bin, txt, csv
Unique identifier
https://doi.org/10.5281/zenodo.14965395
Dataset updated
Apr 1, 2025
Dataset provided by
European Repository of Cyber Incidents
Authors
Kerstin Zettl-Schabath; Jakob Bund; Martin Müller; Camille Borrett; Jonas Hemmelskamp; Asaf Alibegovic; Enis Bajra; Alisa Jazxhi; Erik Kellenter; Annika Sachs; Callahan Shelley
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
The European Repository of Cyber Incidents (EuRepoC) is releasing the Global Dataset of Cyber Incidents in Version 1.3 as an extract of our backend database. This official release contains fully consolidated cyber incident data reviewed by our interdisciplinary experts in the fields of politics, law and technology across all 60 variables covered by the European Repository. Version 1.3 covers the years 2000 – 2024 entirely. The Global Dataset is meant for reliable, evidence-based analysis. If you require real-time data, please refer to the download option in our TableView or contact us for special requirements (including API access).

The dataset now contains data on 3416 cyber incidents which started between 01.01.2000 and 31.12.2024. The European Repository of Cyber Incidents (EuRepoC) gathers, codes, and analyses publicly available information from over 220 sources and 600 Twitter accounts daily to report on dynamic trends in the global, and particularly the European, cyber threat environment.

For more information on the scope and data collection methodology see: https://eurepoc.eu/methodology

Full Codebook available here

Information about each file

Please scroll down this page entirely to see all files available. Zenodo only displays the attribution dataset by default.

Global Database (csv or xlsx):
This file includes all variables coded for each incident, organised such that one row corresponds to one incident - our main unit of investigation. Where multiple codes are present for a single variable for a single incident, these are separated with semi-colons within the same cell.

Receiver Dataset (csv or xlsx):
In this file, the data of affected entities and individuals (receivers) is restructured to facilitate analysis. Each cell contains only a single code, with the data "unpacked" across multiple rows. Thus, a single incident can span several rows, identifiable through the unique identifier assigned to each incident (incident_id).

Attribution Dataset (csv or xlsx):
This file follows a similar approach to the receiver dataset. The attribution data is "unpacked" over several rows, allowing each cell to contain only one code. Here too, a single incident may occupy several rows, with the unique identifier enabling easy tracking of each incident (incident_id). In addition, some attributions have multiple possible codes for one variable; these are also "unpacked" over several rows, with the attribution_id making it possible to track each attribution.

Dyadic Dataset (csv or xlsx):
The dyadic dataset focuses on state dyads. Each row represents one cyber incident in a specific dyad. Because an incident may affect multiple receivers, a single incident is duplicated across rows when it affected multiple countries.
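
The relationship between the one-row-per-incident Global Database and the "unpacked" Receiver and Attribution files can be illustrated with a minimal sketch. It assumes a semicolon-separated cell as described above; the sample rows and the column name `receiver_country` are hypothetical placeholders rather than names from the EuRepoC codebook, and only `incident_id` is mentioned in the description.

```python
import csv
import io

# Hypothetical two-row extract; "receiver_country" is a placeholder column
# name and the values are invented for illustration.
SAMPLE = """incident_id,receiver_country
1,Germany; France
2,Italy
"""


def unpack(rows, column, sep=";"):
    """Yield one row per code found in `column`, copying all other fields."""
    for row in rows:
        for code in row[column].split(sep):
            code = code.strip()
            if code:  # rows whose cell is empty produce no output rows
                yield {**row, column: code}


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
long_rows = list(unpack(rows, "receiver_country"))

for r in long_rows:
    print(r["incident_id"], r["receiver_country"])
```

After unpacking, incident 1 occupies two rows that share the same `incident_id`, which is the property the Receiver and Attribution files rely on for tracking incidents across rows.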
All-time biggest online data breaches 2025
statista.com
ai-chatbox.pro
Updated May 26, 2025
Cite
Statista (2025). All-time biggest online data breaches 2025 [Dataset]. https://www.statista.com/statistics/290525/cyber-crime-biggest-online-data-breaches-worldwide/
Dataset updated
May 26, 2025
Dataset authored and provided by
Statista http://statista.com/
Time period covered
Jan 2025
Area covered
Worldwide
Description
The largest reported data leakage as of January 2025 was the Cam4 data breach in March 2020, which exposed more than 10 billion data records. The second-largest data breach in history so far, the Yahoo data breach, occurred in 2013. The company initially reported about one billion exposed data records, but after an investigation, the company updated the number, revealing that three billion accounts were affected. The National Public Data Breach was announced in August 2024. The incident became public when personally identifiable information of individuals became available for sale on the dark web. Overall, the security professionals estimate the leakage of nearly three billion personal records. The next significant data leakage was the March 2018 security breach of India's national ID database, Aadhaar, with over 1.1 billion records exposed. This included biometric information such as identification numbers and fingerprint scans, which could be used to open bank accounts and receive financial aid, among other government services.

Cybercrime - the dark side of digitalization

As the world continues its journey into the digital age, corporations and governments across the globe have been increasing their reliance on technology to collect, analyze and store personal data. This, in turn, has led to a rise in the number of cyber crimes, ranging from minor breaches to global-scale attacks impacting billions of users, such as in the case of Yahoo. Within the U.S. alone, 1802 cases of data compromise were reported in 2022. This was a marked increase from the 447 cases reported a decade prior.

The high price of data protection

As of 2022, the average cost of a single data breach across all industries worldwide stood at around 4.35 million U.S. dollars. This was found to be most costly in the healthcare sector, with each leak reported to have cost the affected party a hefty 10.1 million U.S. dollars. The financial segment followed closely behind. Here, each breach resulted in a loss of approximately 6 million U.S. dollars - 1.5 million more than the global average.
Global cyberattack distribution 2023, by type
statista.com
Updated Nov 14, 2024
Cite
Statista (2024). Global cyberattack distribution 2023, by type [Dataset]. https://www.statista.com/statistics/1382266/cyber-attacks-worldwide-by-type/
Dataset updated
Nov 14, 2024
Dataset authored and provided by
Statista http://statista.com/
Time period covered
2023
Area covered
Worldwide
Description
In 2023, ransomware was the most frequently detected cyberattack worldwide, accounting for around 70 percent of all detected cyberattacks. Network breaches ranked second, with almost 19 percent of detections. Data exfiltration was also among the detected cyberattacks, though less frequently.
Number of data compromises and impacted individuals in U.S. 2005-2024
statista.com
ai-chatbox.pro
Updated Jul 14, 2025
Cite
Statista (2025). Number of data compromises and impacted individuals in U.S. 2005-2024 [Dataset]. https://www.statista.com/statistics/273550/data-breaches-recorded-in-the-united-states-by-number-of-breaches-and-records-exposed/
Dataset updated
Jul 14, 2025
Dataset authored and provided by
Statista http://statista.com/
Area covered
United States
Description
In 2024, the number of data compromises in the United States stood at 3,158 cases. Meanwhile, over 1.35 billion individuals were affected in the same year by data compromises, including data breaches, leakage, and exposure. While these are three different events, they have one thing in common: in all three, sensitive data is accessed by an unauthorized threat actor.

Industries most vulnerable to data breaches

Some industry sectors usually see more significant cases of private data violations than others. This is determined by the type and volume of the personal information that organizations in these sectors store. In 2024, financial services, healthcare, and professional services were the three industry sectors that recorded the most data breaches. Overall, the number of healthcare data breaches in some industry sectors in the United States has gradually increased within the past few years. However, some sectors saw a decrease.

Largest data exposures worldwide

In 2020, an adult streaming website, CAM4, experienced a leakage of nearly 11 billion records. This is by far the most extensive reported data leakage. The case is unusual, though, because cyber security researchers found the vulnerability before cyber criminals did. The second-largest data breach is the Yahoo data breach, dating back to 2013. The company first reported about one billion exposed records and later, in 2017, updated the number of leaked records to three billion. In March 2018, the third-biggest data breach happened, involving India’s national identification database Aadhaar. As a result of this incident, over 1.1 billion records were exposed.
Cyber-Physical Dataset for UAVs Under Normal Operations and Cyber-Attacks
ieee-dataport.org
Updated Dec 4, 2023
Cite
Muhammad Ismail (2023). Cyber-Physical Dataset for UAVs Under Normal Operations and Cyber-Attacks [Dataset]. https://ieee-dataport.org/documents/cyber-physical-dataset-uavs-under-normal-operations-and-cyber-attacks
Dataset updated
Dec 4, 2023
Authors
Muhammad Ismail
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
which hinders further research in this field.
Global Dataset of Cyber Incidents V.1.2
zenodo.org
bin, csv, json
Updated May 3, 2024
Cite
European Repository of Cyber Incidents (EuRepoC) (2024). Global Dataset of Cyber Incidents V.1.2 [Dataset]. http://doi.org/10.5281/zenodo.11108195
Available download formats
csv, bin, json
Unique identifier
https://doi.org/10.5281/zenodo.11108195
Dataset updated
May 3, 2024
Dataset provided by
European Repository of Cyber Incidents (EuRepoC)
Authors
European Repository of Cyber Incidents (EuRepoC)
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
May 2, 2024
Description
The dataset contains data on 2889 cyber incidents between 01.01.2000 and 02.05.2024 using 60 variables, including the start date, names and categories of receivers along with names and categories of initiators. The database was compiled as part of the European Repository of Cyber Incidents (EuRepoC) project.

EuRepoC gathers, codes, and analyses publicly available information from over 200 sources and 600 Twitter accounts daily to report on dynamic trends in the global, and particularly the European, cyber threat environment.

For more information on the scope and data collection methodology see: https://eurepoc.eu/methodology

Codebook available here

Information about each file:

Global Database (csv or xlsx):
This file includes all variables coded for each incident, organised such that one row corresponds to one incident - our main unit of investigation. Where multiple codes are present for a single variable for a single incident, these are separated with semi-colons within the same cell.

Receiver Dataset (csv):
In this file, the data of affected entities and individuals (receivers) is restructured to facilitate analysis. Each cell contains only a single code, with the data "unpacked" across multiple rows. Thus, a single incident can span several rows, identifiable through the unique identifier assigned to each incident (incident_id).

Attribution Dataset (csv):
This file follows a similar approach to the receiver dataset. The attribution data is "unpacked" over several rows, allowing each cell to contain only one code. Here too, a single incident may occupy several rows, with the unique identifier enabling easy tracking of each incident (incident_id). In addition, some attributions have multiple possible codes for one variable; these are also "unpacked" over several rows, with the attribution_id making it possible to track each attribution.

eurepoc_global_database_1.2 (json):
This file contains the whole database in JSON format.
Intrusion Detect. CICEV2023: DDoS Attack Profiling
kaggle.com
zip
Updated Mar 27, 2025
Cite
Agung Pambudi (2025). Intrusion Detect. CICEV2023: DDoS Attack Profiling [Dataset]. https://www.kaggle.com/datasets/agungpambudi/secure-intrusion-detection-ddos-attacks-profiling
Available download formats
zip (231762852 bytes)
Dataset updated
Mar 27, 2025
Authors
Agung Pambudi
Description
To cite the dataset please reference it as Y. Kim, S. Hakak, and A. Ghorbani. "DDoS Attack Dataset (CICEV2023) against EV Authentication in Charging Infrastructure," in 2023 20th Annual International Conference on Privacy, Security and Trust (PST), IEEE Computer Society, pp. 1-9, August 2023.

Explore a comprehensive dataset capturing DDoS attack scenarios within electric vehicle (EV) charging infrastructure. This dataset features diverse machine learning attributes, including packet access counts, system status details, and authentication profiles across multiple charging stations and grid services. Simulated attack scenarios, authentication protocols, and extensive profiling results offer invaluable insights for training and testing detection models in safeguarding EV charging systems against cyber threats.

Figure 1: Proposed simulator structure. Source: Y. Kim, S. Hakak, and A. Ghorbani.

Acknowledgment:

The authors sincerely appreciate the support provided by the Canadian Institute for Cybersecurity (CIC), as well as the funding received from the Canada Research Chair and the Atlantic Canada Opportunities Agency (ACOA).

Reference:

Y. Kim, S. Hakak, and A. Ghorbani. "DDoS Attack Dataset (CICEV2023) against EV Authentication in Charging Infrastructure," in 2023 20th Annual International Conference on Privacy, Security and Trust (PST), IEEE Computer Society, pp. 1-9, August 2023.
CTU-SME-11: a labeled dataset with real benign and malicious network traffic...
zenodo.org
data.niaid.nih.gov
bin, bz2, csv, html
Updated May 26, 2023
Cite
Štěpán Bendl; Veronica Valeros; Sebastian Garcia (2023). CTU-SME-11: a labeled dataset with real benign and malicious network traffic mimicking a small medium-size enterprise environment [Dataset]. http://doi.org/10.5281/zenodo.7958259
Available download formats
csv, html, bz2, bin
Unique identifier
https://doi.org/10.5281/zenodo.7958259
Dataset updated
May 26, 2023
Dataset provided by
Zenodo http://zenodo.org/
Authors
Štěpán Bendl; Veronica Valeros; Sebastian Garcia
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
As technology advances, the number and complexity of cyber-attacks increase, forcing defense techniques to be updated and improved. To help develop effective tools for detecting security threats it is essential to have reliable and representative security datasets. Many existing security datasets have limitations that make them unsuitable for research, including lack of labels, unbalanced traffic, and outdated threats.

CTU-SME-11 is a labeled network dataset designed to address the limitations of previous datasets. The dataset was captured in a real network that mimics a small-medium enterprise setting. Raw network traffic (packets) was captured from 11 devices using tcpdump for a duration of 7 days, from 20th to 26th of February, 2023 in Prague, Czech Republic. The devices were chosen based on the enterprise setting and consist of IoT, desktop and mobile devices, both bare metal and virtualized. The devices were infected with malware or exposed to Internet attacks, and factory reset to restore benign behavior.

The raw data was processed to generate network flows (Zeek logs), which were analyzed and labeled. The dataset contains two types of labels, a high-level label and a descriptive label, both assigned by experts. The former can take three values: benign, malicious or background. The latter contains detailed information about the specific behavior observed in the network flows. The dataset contains 99 million labeled network flows. The overall compressed size of the dataset is 80GB and the uncompressed size is 170GB.
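
As a rough sketch of how the high-level labels might be tallied, the snippet below parses a Zeek-style tab-separated log, using the `#fields` header line to name the columns. The field name `label`, the sample rows, and the helper `count_labels` are assumptions made for illustration; the actual column layout of the CTU-SME-11 files should be checked against the dataset's own documentation.

```python
from collections import Counter

# Tiny stand-in for a labeled Zeek log. The "label" field name and all
# sample values are assumptions for illustration only.
SAMPLE_LOG = "\n".join([
    "#fields\tts\tid.orig_h\tlabel",
    "1676851200.0\t192.168.1.10\tbenign",
    "1676851201.5\t192.168.1.11\tmalicious",
    "1676851202.0\t192.168.1.10\tbackground",
    "1676851203.2\t192.168.1.12\tbenign",
    "#close\t2023-02-26-23-59-59",
])


def count_labels(lines, label_field="label"):
    """Tally the values of `label_field` across the data rows of a Zeek TSV log."""
    fields = None
    counts = Counter()
    for line in lines:
        if line.startswith("#fields"):
            # Column names follow the "#fields" token on the header line.
            fields = line.split("\t")[1:]
        elif line and not line.startswith("#") and fields is not None:
            record = dict(zip(fields, line.split("\t")))
            counts[record[label_field]] += 1
    return counts


label_counts = count_labels(SAMPLE_LOG.splitlines())
print(label_counts)
```

With 99 million flows, a real run would stream the file line by line instead of holding it in memory; `count_labels` already accepts any iterable of lines, so an open file handle could be passed in place of the sample list.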
Data from: Malware Finances and Operations: a Data-Driven Study of the Value...
data.niaid.nih.gov
zenodo.org
Updated Jun 20, 2023
Cite
Nurmi, Juha (2023). Malware Finances and Operations: a Data-Driven Study of the Value Chain for Infections and Compromised Access [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8047204
Dataset updated
Jun 20, 2023
Dataset provided by
Niemelä, Mikko
Brumley, Billy
Nurmi, Juha
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The datasets demonstrate the malware economy and the value chain published in our paper, Malware Finances and Operations: a Data-Driven Study of the Value Chain for Infections and Compromised Access, at the 12th International Workshop on Cyber Crime (IWCC 2023), part of the ARES Conference, published by the International Conference Proceedings Series of the ACM ICPS.

Using the well-documented scripts, it is straightforward to reproduce our findings. It takes an estimated 1 hour of human time and 3 hours of computing time to duplicate our key findings from MalwareInfectionSet; around one hour with VictimAccessSet; and minutes to replicate the price calculations using AccountAccessSet. See the included README.md files and Python scripts.

We choose to represent each victim by a single JavaScript Object Notation (JSON) data file. Data sources provide sets of victim JSON data files from which we've extracted the essential information and omitted Personally Identifiable Information (PII). We collected, curated, and modelled three datasets, which we publish under the Creative Commons Attribution 4.0 International License.

MalwareInfectionSet

We discover (and, to the best of our knowledge, document scientifically for the first time) that malware networks appear to dump their data collections online. We collected these infostealer malware logs, which are available for free. We utilise 245 malware log dumps from 2019 and 2020 originating from 14 malware networks. The dataset contains 1.8 million victim files, with a dataset size of 15 GB.

VictimAccessSet

We demonstrate how infostealer malware networks sell access to infected victims. Genesis Market focuses on user-friendliness and continuous supply of compromised data. Marketplace listings include everything necessary to gain access to the victim's online accounts, including passwords and usernames, but also a detailed collection of information which provides a clone of the victim's browser session. Indeed, Genesis Market simplifies the import of compromised victim authentication data into a web browser session. We measure the prices on Genesis Market and how compromised device prices are determined. We crawled the website between April 2019 and May 2022, collecting the web pages offering the resources for sale. The dataset contains 0.5 million victim files, with a dataset size of 3.5 GB.

AccountAccessSet

The Database marketplace operates inside the anonymous Tor network. Vendors offer their goods for sale, and customers can purchase them with Bitcoin. The marketplace sells online accounts, such as PayPal and Spotify, as well as private datasets, such as driver's licence photographs and tax forms. We collect data from Database Market, where vendors sell online credentials, and investigate it in a similar way. To build our dataset, we crawled the website between November 2021 and June 2022, collecting the web pages offering the credentials for sale. The dataset contains 33,896 victim files, with a dataset size of 400 MB.

Credits

Authors

Billy Bob Brumley (Tampere University, Tampere, Finland)

Juha Nurmi (Tampere University, Tampere, Finland)

Mikko Niemelä (Cyber Intelligence House, Singapore)

Funding

This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme under project numbers 804476 (SCARE) and 952622 (SPIRS).

Alternative links to download: AccountAccessSet, MalwareInfectionSet, and VictimAccessSet.
UNB CIC IOT Dataset 2023 (Updated 2024-10-08)
kaggle.com
zip
Updated May 24, 2025
Cite
Md. Abdul Al Emon (2025). UNB CIC IOT Dataset 2023 (Updated 2024-10-08) [Dataset]. https://www.kaggle.com/datasets/mdabdulalemo/cic-iot-dataset2023-updated-2024-10-08
Available download formats
zip (3264262523 bytes)
Dataset updated
May 24, 2025
Authors
Md. Abdul Al Emon
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
The CIC IoT Dataset 2023 is a comprehensive benchmark developed by the Canadian Institute for Cybersecurity (CIC) to advance intrusion detection research in real-world Internet of Things (IoT) environments. This dataset was created using a network of 105 actual IoT devices, encompassing smart home gadgets, sensors, and cameras, to simulate authentic IoT traffic and attack scenarios.

Key Features:

Diverse Attack Scenarios: The dataset includes 33 distinct attacks categorized into seven classes: DDoS, DoS, Reconnaissance, Web-based, Brute Force, Spoofing, and Mirai. These attacks were executed by compromised IoT devices targeting other IoT devices, reflecting realistic threat vectors. (University of New Brunswick)

Extensive Data Collection: Network traffic was captured in real-time, resulting in over 46 million records. The data is available in various formats, including raw PCAP files and pre-extracted CSV features, facilitating different research needs.

Realistic IoT Topology: Unlike many datasets that rely on simulations, this dataset was generated using a large-scale IoT testbed with devices from multiple vendors, providing a heterogeneous and realistic network environment.

Benchmarking and Evaluation: The dataset has been utilized to evaluate the performance of machine learning and deep learning algorithms in classifying and detecting malicious versus benign IoT network traffic. (University of New Brunswick)

This dataset serves as a valuable resource for researchers and practitioners aiming to develop and test security analytics applications, intrusion detection systems, and other cybersecurity solutions tailored for IoT ecosystems. (University of New Brunswick)
Share of cyberattacks in Italy 2024, by reason
statista.com
ai-chatbox.pro
Updated Jul 11, 2025
Cite
Statista (2025). Share of cyberattacks in Italy 2024, by reason [Dataset]. https://www.statista.com/statistics/649358/share-cyber-attacks-in-italy-by-reason/
Dataset updated
Jul 11, 2025
Dataset authored and provided by
Statista http://statista.com/
Area covered
Italy
Description
During the first half of 2024, around ** percent of cyberattacks carried out in Italy had cybercrime as a purpose. Cyber espionage was another motivation, representing the main reason behind roughly **** percent of attacks. By contrast, information warfare only accounted for *** percent of the cyberattacks in the country in the last examined period.

Data breaches in Italy

In 2023, over half of the Italian digital population was alerted that their personal data had been breached, and **** percent of the alerted users had the misfortune of being affected by data compromise on the dark web. Despite a decrease in the number of data sets affected in data breaches between 2020 and 2023, Italy recorded almost *** million exposed data sets at the beginning of 2023. Meanwhile, the average cost of data breaches for both Italian companies and targeted users kept growing, reaching **** million U.S. dollars in 2024, up from the **** million U.S. dollars recorded in the previous year.

The Italian privacy landscape: GDPR effects

As a member state of the European Union, Italy is covered by the General Data Protection Regulation (GDPR). Since 2018, the GDPR has regulated online data privacy and has the responsibility to represent consumers’ interests within the digital and tech landscape of the Union. As of 2023, approximately *** fines were issued in Italy due to violations of the GDPR, making Italy the second country in Europe with the highest number of violations dispensed to tech companies. The highest GDPR fine ever issued in Italy was at the expense of Telecom Italia (TIM), one of the largest Italian telecommunications companies. TIM was fined approximately **** million euros in January 2020. The GDPR is enforced with the help of the country's Garante della Privacy, the national institution overseeing Italian users’ online rights, cybersecurity, and digital privacy.
Cyber security breaches survey 2023
gov.uk
beta.ukdataservice.ac.uk
Updated Apr 19, 2023
Cite
Department for Science, Innovation and Technology (2023). Cyber security breaches survey 2023 [Dataset]. https://www.gov.uk/government/statistics/cyber-security-breaches-survey-2023
Dataset updated
Apr 19, 2023
Dataset provided by
GOV.UK http://gov.uk/
Authors
Department for Science, Innovation and Technology
Description
The government has surveyed UK businesses, charities and educational institutions to find out how they approach cyber security and gain insight into the cyber security issues they face. The research informs government policy on cyber security and how government works with industry to build a prosperous and resilient digital UK.

Published

19 April 2023

Period covered

Respondents were asked about their approach to cyber security and any breaches or attacks over the 12 months before the interview. Main survey interviews took place between October 2022 and January 2023. Qualitative follow up interviews took place in December 2022 and January 2023.

Geographic coverage

UK

Further Information

The survey is part of the government’s National Cyber Strategy 2022.

There is a wide range of free government cyber security guidance and information for businesses, including details of free online training and support.

The survey was carried out by Ipsos UK. The report has been produced by Ipsos on behalf of the Department for Science, Innovation and Technology.

The UK Statistics Authority

This release is published in accordance with the Code of Practice for Statistics (2018), as produced by the UK Statistics Authority. The UKSA has the overall objective of promoting and safeguarding the production and publication of official statistics that serve the public good. It monitors and reports on all official statistics, and promotes good practice in this area.

Pre-release access

The document above contains a list of ministers and officials who have received privileged early access to this release. In line with best practice, the list has been kept to a minimum and those given access for briefing purposes had a maximum of 24 hours.

Contact information

The Lead Analyst for this release is Emma Johns. For any queries please contact cybersurveys@dsit.gov.uk.

For media enquiries only, please contact the press office on 020 7215 1000.
AIT Alert Data Set
data.niaid.nih.gov
zenodo.org
Updated Oct 14, 2024
Cite
Landauer, Max (2024). AIT Alert Data Set [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8263180
Dataset updated
Oct 14, 2024
Dataset provided by
Landauer, Max
Wurzenberger, Markus
Skopik, Florian
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This repository contains the AIT Alert Data Set (AIT-ADS), a collection of synthetic alerts suitable for evaluation of alert aggregation, alert correlation, alert filtering, and attack graph generation approaches. The alerts were forensically generated from the AIT Log Data Set V2 (AIT-LDSv2) and originate from three intrusion detection systems, namely Suricata, Wazuh, and AMiner. The data sets comprise eight scenarios, each of which has been targeted by a multi-step attack with attack steps such as scans, web application exploits, password cracking, remote command execution, privilege escalation, etc. Each scenario and attack chain has certain variations so that attack manifestations and resulting alert sequences vary in each scenario; this means that the data set makes it possible to develop and evaluate approaches that compute similarities of attack chains or merge them into meta-alerts. Since only a few benchmark alert data sets are publicly available, the AIT-ADS was developed to address common issues in the research domain of multi-step attack analysis; specifically, the alert data set contains many false positives caused by normal user behavior (e.g., user login attempts or software updates), heterogeneous alert formats (although all alerts are in JSON format, their fields are different for each IDS), repeated executions of attacks according to an attack plan, collection of alerts from diverse log sources (application logs and network traffic) and all components in the network (mail server, web server, DNS, firewall, file share, etc.), and labels for attack phases. For more information on how this alert data set was generated, check out our paper accompanying this data set [1] or our GitHub repository. More information on the original log data set, including a detailed description of scenarios and attacks, can be found in [2].

The alert data set contains two files for each of the eight scenarios, and a file for their labels:

_aminer.json contains alerts from AMiner IDS

_wazuh.json contains alerts from Wazuh IDS and Suricata IDS

labels.csv contains the start and end times of attack phases in each scenario

Besides false positive alerts, the alerts in the AIT-ADS correspond to the following attacks:

Scans (nmap, WPScan, dirb)

Webshell upload (CVE-2020-24186)

Password cracking (John the Ripper)

Privilege escalation

Remote command execution

Data exfiltration (DNSteal) and stopped service

The total number of alerts in the data set is 2,655,821, of which 2,293,628 originate from Wazuh, 306,635 originate from Suricata, and 55,558 originate from AMiner. The numbers of alerts in each scenario are as follows. fox: 473,104; harrison: 593,948; russellmitchell: 45,544; santos: 130,779; shaw: 70,782; wardbeck: 91,257; wheeler: 616,161; wilson: 634,246.
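
The two breakdowns above can be cross-checked against the stated total with a few lines of arithmetic. This is only a consistency check on the figures quoted in the description; the variable names are arbitrary.

```python
# Alert counts as quoted in the description above.
TOTAL_ALERTS = 2_655_821

alerts_by_ids = {
    "Wazuh": 2_293_628,
    "Suricata": 306_635,
    "AMiner": 55_558,
}

alerts_by_scenario = {
    "fox": 473_104,
    "harrison": 593_948,
    "russellmitchell": 45_544,
    "santos": 130_779,
    "shaw": 70_782,
    "wardbeck": 91_257,
    "wheeler": 616_161,
    "wilson": 634_246,
}

ids_sum = sum(alerts_by_ids.values())
scenario_sum = sum(alerts_by_scenario.values())

# Both breakdowns should add up to the same overall total.
print("per-IDS sum matches total:", ids_sum == TOTAL_ALERTS)
print("per-scenario sum matches total:", scenario_sum == TOTAL_ALERTS)

# Share of alerts contributed by each IDS, as a percentage of the total.
for name, count in alerts_by_ids.items():
    print(f"{name}: {100 * count / TOTAL_ALERTS:.1f}%")
```

Both sums agree with the stated total, so the per-IDS and per-scenario figures describe the same set of alerts.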

Acknowledgements: Partially funded by the European Defence Fund (EDF) projects AInception (101103385) and NEWSROOM (101121403), and the FFG project PRESENT (FO999899544). Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union. The European Union cannot be held responsible for them.

If you use the AIT-ADS, please cite the following publications:

[1] Landauer, M., Skopik, F., Wurzenberger, M. (2024): Introducing a New Alert Data Set for Multi-Step Attack Analysis. Proceedings of the 17th Cyber Security Experimentation and Test Workshop.

[2] Landauer, M., Skopik, F., Frank, M., Hotwagner, W., Wurzenberger, M., Rauber, A. (2023): Maintainable Log Datasets for Evaluation of Intrusion Detection Systems. IEEE Transactions on Dependable and Secure Computing, vol. 20, no. 4, pp. 3466-3482.
U.S. number of data sets affected in data breaches Q1 2020-Q2 2025
statista.com
Updated Jul 7, 2025
Statista (2025). U.S. number of data sets affected in data breaches Q1 2020-Q2 2025 [Dataset]. https://www.statista.com/statistics/1329989/us-number-of-records-exposed/
Dataset updated
Jul 7, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
United States
Description
Between the third quarter of 2024 and the second quarter of 2025, the number of records exposed in data breaches in the United States decreased significantly. In the most recent measured period, over **** million records were reported as leaked, down from around ****** million in the third quarter of 2024.
CICIoT2023
kaggle.com
Updated Apr 21, 2025
HIMADRI07 (2025). CICIoT2023 [Dataset]. https://www.kaggle.com/datasets/himadri07/ciciot2023
Dataset updated
Apr 21, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
HIMADRI07
License
MIT License, https://opensource.org/licenses/MIT
License information was derived automatically
Description
The CICIoT2023 dataset is a comprehensive and modern dataset designed for research in Internet of Things (IoT) security, particularly for intrusion detection and anomaly detection systems. Released by the Canadian Institute for Cybersecurity (CIC), this dataset reflects real-world IoT network traffic and attack scenarios, providing a valuable resource for machine learning and cybersecurity research.

The dataset was generated using a realistic testbed that simulates various IoT devices communicating over a network, including smart TVs, webcams, smart thermostats, and wearable devices. It captures both benign traffic and a wide variety of attack types such as Denial of Service (DoS), Distributed Denial of Service (DDoS), brute-force attacks, botnets, reconnaissance, and more advanced threats.

Key Features of CICIoT2023:

Contains a mix of normal and malicious IoT network traffic.

Includes 34 distinct attack types, covering modern and advanced cyber threat scenarios.

Provides labeled data suitable for supervised machine learning models.

Offers extracted network flow features (e.g., packet size, duration, flags, statistical summaries) which can be used for traffic classification and anomaly detection.

Supports research in intrusion detection, anomaly detection, and IoT security strategy development.

This dataset helps bridge the gap between traditional network security datasets and the unique, evolving patterns of IoT device communication, making it an excellent benchmark for evaluating the performance of AI-based security solutions.

I have further broken the data down into three parts: Train (5491971, 47), Validation (1176851, 47), and Test (1176851, 47).
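
The three row counts above are consistent with a 70/15/15 split of 7,845,673 rows. That ratio is inferred from the reported shapes and is not stated by the uploader, so the sketch below is only one plausible way to arrive at the same sizes; the function name and rounding scheme are assumptions.

```python
def split_sizes(n_rows, train_frac=0.70, val_frac=0.15):
    """Return (train, validation, test) row counts for the given fractions.

    The test part receives whatever remains after rounding the other two,
    so the three sizes always add up to n_rows.
    """
    train = round(n_rows * train_frac)
    val = round(n_rows * val_frac)
    test = n_rows - train - val
    return train, val, test

# Total rows implied by the reported shapes.
total_rows = 5_491_971 + 1_176_851 + 1_176_851
sizes = split_sizes(total_rows)
print(total_rows, sizes)
```

Each part keeps the same 47 columns, so only the row counts change between the splits.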
Cyber-attack scenarios for super-heaters system
data.niaid.nih.gov
zenodo.org
Updated Jul 12, 2024
Michał Syfert (2024). Cyber-attack scenarios for super-heaters system [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7612268
Dataset updated
Jul 12, 2024
Dataset authored and provided by
Michał Syfert
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Simulated cyber-attacks for a super-heaters system. For a detailed description, refer to the PDF file. The dataset is provided as tab-separated txt files.

If you have any questions, please contact michal.syfert@pw.edu.pl or anna.sztyber@pw.edu.pl.

Please cite: Sztyber-Betley, A.; Syfert, M.; Kościelny, J.M.; Górecka, Z. Controller Cyber-Attack Detection and Isolation. Sensors 2023, 23, 2778. https://doi.org/10.3390/s23052778
cyber-crimes
huggingface.co
Updated Oct 18, 2023
SchoolyAI (2023). cyber-crimes [Dataset]. https://huggingface.co/datasets/schooly/cyber-crimes
Dataset updated
Oct 18, 2023
Dataset authored and provided by
SchoolyAI
License
MIT License, https://opensource.org/licenses/MIT
License information was derived automatically
Description
schooly/cyber-crimes dataset hosted on Hugging Face and contributed by the HF Datasets community
Global number of breached user accounts Q1 2020-Q3 2024
statista.com
ai-chatbox.pro
Updated Jun 23, 2025
Statista (2025). Global number of breached user accounts Q1 2020-Q3 2024 [Dataset]. https://www.statista.com/statistics/1307426/number-of-data-breaches-worldwide/
Dataset updated
Jun 23, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
Worldwide
Description
During the third quarter of 2024, data breaches exposed more than *** million records worldwide. Since the first quarter of 2020, the highest number of data records were exposed in the first quarter of ***, more than *** million data sets. Data breaches remain among the biggest concerns of company leaders worldwide. The most common causes of sensitive information loss were operating system vulnerabilities on endpoint devices.

Which industries see the most data breaches?

Meanwhile, certain conditions make some industry sectors more prone to data breaches than others. According to the latest observations, the public administration experienced the highest number of data breaches between 2021 and 2022. The industry saw *** reported data breach incidents with confirmed data loss. The second were financial institutions, with *** data breach cases, followed by healthcare providers.

Data breach cost

Data breach incidents have various consequences, the most common impact being financial losses and business disruptions. As of 2023, the average data breach cost across businesses worldwide was **** million U.S. dollars. Meanwhile, a leaked data record cost about *** U.S. dollars. The United States saw the highest average breach cost globally, at **** million U.S. dollars.
PhiUSIIL Phishing URL Dataset
data.mendeley.com
Updated Nov 15, 2023
Arvind Prasad (2023). PhiUSIIL Phishing URL Dataset [Dataset]. http://doi.org/10.17632/shwpxscxy2.2
Unique identifier
https://doi.org/10.17632/shwpxscxy2.2
Dataset updated
Nov 15, 2023
Authors
Arvind Prasad
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
PhiUSIIL Phishing URL Dataset is a substantial dataset comprising 134,850 legitimate and 100,945 phishing URLs. Most of the URLs we analyzed while constructing the dataset are the latest URLs. Features are extracted from the source code of the webpage and URL. Features such as CharContinuationRate, URLTitleMatchScore, URLCharProb, and TLDLegitimateProb are derived from existing features.

Citation: Prasad, A., & Chandra, S. (2023). PhiUSIIL: A diverse security profile empowered phishing URL detection framework based on similarity index and incremental learning. Computers & Security, 103545. doi: https://doi.org/10.1016/j.cose.2023.103545
India Cyber Security Incidents: Total
ceicdata.com
Updated Dec 15, 2023
CEICdata.com (2023). India Cyber Security Incidents: Total [Dataset]. https://www.ceicdata.com/en/india/information-technology-statistics-cyber-security-incidents/cyber-security-incidents-total
Explore at:
Dataset updated
Dec 15, 2023
Dataset provided by
CEIC Data
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Dec 1, 2012 - Dec 1, 2023
Area covered
India
Variables measured
Technology
Description
India Cyber Security Incidents: Total data was reported at 1,592,917.000 Unit in 2023. This records an increase from the previous number of 1,391,457.000 Unit for 2022. India Cyber Security Incidents: Total data is updated yearly, averaging 49,908.500 Unit from Dec 2004 (Median) to 2023, with 20 observations. The data reached an all-time high of 1,592,917.000 Unit in 2023 and a record low of 23.000 Unit in 2004. India Cyber Security Incidents: Total data remains in active status in CEIC and is reported by the Indian Computer Emergency Response Team. The data is categorized under India Premium Database’s Transportation, Post and Telecom Sector – Table IN.TF010: Information Technology Statistics: Cyber Security Incidents.

Kerstin Zettl-Schabath; Jakob Bund; Martin Müller; Camille Borrett; Jonas Hemmelskamp; Asaf Alibegovic; Enis Bajra; Alisa Jazxhi; Erik Kellenter; Annika Sachs; Callahan Shelley (2025). Global Dataset of Cyber Incidents [Dataset]. http://doi.org/10.5281/zenodo.14965395

Global Dataset of Cyber Incidents

5 scholarly articles cite this dataset (View in Google Scholar)

Available download formats: pdf, bin, txt, csv

Unique identifier

https://doi.org/10.5281/zenodo.14965395

Dataset updated

Apr 1, 2025

Dataset provided by

European Repository of Cyber Incidents

License

Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically

Description

The European Repository of Cyber Incidents (EuRepoC) is releasing the Global Dataset of Cyber Incidents in Version 1.3 as an extract of our backend database. This official release contains fully consolidated cyber incident data reviewed by our interdisciplinary experts in the fields of politics, law and technology across all 60 variables covered by the European Repository. Version 1.3 covers the years 2000 – 2024 entirely. The Global Dataset is meant for reliable, evidence-based analysis. If you require real-time data, please refer to the download option in our TableView or contact us for special requirements (including API access).

The dataset now contains data on 3416 cyber incidents which started between 01.01.2000 and 31.12.2024. The European Repository of Cyber Incidents (EuRepoC) gathers, codes, and analyses publicly available information from over 220 sources and 600 Twitter accounts daily to report on dynamic trends in the global, and particularly the European, cyber threat environment.

For more information on the scope and data collection methodology see: https://eurepoc.eu/methodology

Full Codebook available here

Information about each file

Please scroll down to the end of this page to see all available files. Zenodo only displays the attribution dataset by default.

Global Database (csv or xlsx):
This file includes all variables coded for each incident, organised such that one row corresponds to one incident - our main unit of investigation. Where multiple codes are present for a single variable for a single incident, these are separated with semi-colons within the same cell.

Receiver Dataset (csv or xlsx):
In this file, the data of affected entities and individuals (receivers) is restructured to facilitate analysis. Each cell contains only a single code, with the data "unpacked" across multiple rows. Thus, a single incident can span several rows, identifiable through the unique identifier assigned to each incident (incident_id).
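
The relationship between the one-row-per-incident layout and the "unpacked" layout can be sketched as follows. The column name receiver_country and the sample rows are made up for this example (only incident_id is named in the description), and this is not EuRepoC's own processing code.

```python
import csv
import io

# Hypothetical extract in the one-row-per-incident layout, with
# semicolon-separated codes sharing a single cell.
GLOBAL_CSV = """incident_id,receiver_country
1,Germany;France
2,Ukraine
"""

def unpack(rows, column, sep=";"):
    """Yield one copy of each row per code found in the given column."""
    for row in rows:
        for code in row[column].split(sep):
            new_row = dict(row)
            new_row[column] = code.strip()
            yield new_row

rows = list(csv.DictReader(io.StringIO(GLOBAL_CSV)))
unpacked = list(unpack(rows, "receiver_country"))
for row in unpacked:
    print(row["incident_id"], row["receiver_country"])
```

After unpacking, incident 1 spans two rows that share the same incident_id, which is how rows can be grouped back into incidents.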

Attribution Dataset (csv or xlsx):
This file follows a similar approach to the receiver dataset. The attribution data is "unpacked" over several rows, so that each cell contains only one code. Here too, a single incident may occupy several rows, and the unique identifier (incident_id) makes it easy to track each incident. In addition, some attributions have multiple possible codes for one variable; these are also "unpacked" over several rows, and the attribution_id makes it possible to track each attribution.

Dyadic Dataset (csv or xlsx):
The dyadic dataset focuses on state dyads. Each row in the dataset represents one cyber incident in a specific dyad. Because an incident may affect receivers in multiple countries, a single incident can appear in several rows in this format.

