6 datasets found

Data Breaches
kaggle.com
Updated Nov 10, 2022
Cite
The Devastator (2022). Data Breaches [Dataset]. https://www.kaggle.com/datasets/thedevastator/data-breaches-a-comprehensive-list/code
Dataset updated
Nov 10, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
The Devastator
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Data Breaches Dataset

30,000 Records of cyber-security data breaches

About this dataset

This dataset compiles data breaches reported in various sources, including press reports, government news releases, and mainstream news articles. It covers breaches involving the theft or compromise of 30,000 or more records, although many smaller breaches occur continually. It also lists the method used in each breach; hacking is the most common.

Organizations of all types and sizes are susceptible to data breaches, which can have devastating consequences. This dataset can help shed light on which organizations are most at risk and how these breaches occur, so that steps can be taken to prevent them in the future.

How to use the dataset

There are many ways to use this dataset. Here are a few ideas:

Use the data to understand which types of organizations are most commonly breached, and what methods are used most often.

Analyze the data to see if there are any trends or patterns in when or how breaches occur.

Use the data to create a visualization or infographic showing the prevalence of data breaches.
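As a minimal sketch of the first two ideas, the Python below counts breaches by organization type, by method, and by year using only the standard library. It assumes the column names documented for df_1.csv under Columns; the sample rows are hypothetical placeholders, not real records.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows for illustration only; the header follows the
# df_1.csv columns documented for this dataset.
SAMPLE_CSV = """Entity,Year,Records,Organization type,Method,Sources
Example Bank,2019,120000,financial,hacked,example-source
Example Clinic,2019,45000,healthcare,lost / stolen media,example-source
Example Retailer,2020,3000000,retail,hacked,example-source
"""


def summarize(rows):
    """Count breaches per organization type, per method, and per year."""
    by_type = Counter(row["Organization type"] for row in rows)
    by_method = Counter(row["Method"] for row in rows)
    by_year = Counter(row["Year"] for row in rows)  # years kept as strings
    return by_type, by_method, by_year


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
by_type, by_method, by_year = summarize(rows)
print(by_method.most_common(1))  # [('hacked', 2)]
print(sorted(by_year.items()))   # [('2019', 2), ('2020', 1)]
```

For the real file, pass `open("df_1.csv", newline="", encoding="utf-8")` to `csv.DictReader` instead of the `StringIO` sample; the file name comes from the listing, while the UTF-8 encoding is an assumption.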

Research Ideas

This dataset can be used to identify trends in data breaches in terms of methods used, types of organizations breached, and geographical distribution.

This dataset can be used to study the effect of data breaches on organizational reputation and customer trust.

This dataset can be used by organizations to benchmark their own security measures against those of similar organizations that have experienced data breaches.

Acknowledgements

License

License: CC0 1.0 Universal (CC0 1.0) Public Domain Dedication. No copyright: you can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.

Columns

File: df_1.csv

| Column name       | Description                                                          |
|:------------------|:---------------------------------------------------------------------|
| Entity            | The name of the organization that was breached. (String)             |
| Year              | The year when the breach occurred. (Integer)                         |
| Records           | The number of records that were compromised in the breach. (Integer) |
| Organization type | The type of organization that was breached. (String)                 |
| Method            | The method that was used to breach the organization. (String)        |
| Sources           | The sources from which the data was collected. (String)              |
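The sketch below loads rows with the columns above into a typed record and checks them against the stated 30,000-record cut-off. Although Year and Records are documented as integers, it parses them defensively in case some values are blank, comma-formatted, or non-numeric; that is an assumption about the raw file, not something the listing states. `Breach`, `parse_int`, `load_breaches`, and `below_threshold` are hypothetical names.

```python
import csv
import io
from dataclasses import dataclass
from typing import Optional


@dataclass
class Breach:
    """One row of df_1.csv, with numeric columns parsed where possible."""
    entity: str
    year: Optional[int]
    records: Optional[int]
    org_type: str
    method: str
    sources: str


def parse_int(value: str) -> Optional[int]:
    """Parse values such as '1200000' or '1,200,000'; return None otherwise."""
    cleaned = value.strip().replace(",", "")
    try:
        return int(cleaned)
    except ValueError:
        return None  # blank or non-numeric text


def load_breaches(f) -> list[Breach]:
    """Read df_1.csv-style rows from an open text file."""
    return [
        Breach(
            entity=row["Entity"],
            year=parse_int(row["Year"]),
            records=parse_int(row["Records"]),
            org_type=row["Organization type"],
            method=row["Method"],
            sources=row["Sources"],
        )
        for row in csv.DictReader(f)
    ]


def below_threshold(breaches: list[Breach], threshold: int = 30_000) -> list[Breach]:
    """Rows whose parsed record count falls under the stated cut-off."""
    return [b for b in breaches if b.records is not None and b.records < threshold]


# Hypothetical rows for illustration only.
SAMPLE = (
    "Entity,Year,Records,Organization type,Method,Sources\n"
    'Example Corp,2018,"1,200,000",tech,hacked,example-source\n'
    "Example Agency,2015,unknown,government,poor security,example-source\n"
)
breaches = load_breaches(io.StringIO(SAMPLE))
print(breaches[0].records, breaches[1].records)  # 1200000 None
print(len(below_threshold(breaches)))            # 0
```

Keeping unparseable counts as `None` rather than dropping the row means such rows still appear in method and organization-type counts.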
All-time biggest online data breaches 2025
statista.com
Updated May 26, 2025
Cite
Statista (2025). All-time biggest online data breaches 2025 [Dataset]. https://www.statista.com/statistics/290525/cyber-crime-biggest-online-data-breaches-worldwide/
Dataset updated
May 26, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Jan 2025
Area covered
Worldwide
Description
The largest reported data leakage as of January 2025 was the CAM4 data breach in March 2020, which exposed more than 10 billion data records. The second-largest data breach in history, the Yahoo data breach, occurred in 2013. The company initially reported about one billion exposed data records, but after an investigation it revised the figure, revealing that three billion accounts were affected. The National Public Data breach was announced in August 2024; it became public when individuals' personally identifiable information was offered for sale on the dark web. Security professionals estimate that nearly three billion personal records were leaked. The next significant data leakage was the March 2018 breach of India's national ID database, Aadhaar, which exposed over 1.1 billion records. These included identification numbers and biometric information such as fingerprint scans, which could be used to open bank accounts and receive financial aid, among other government services.

Cybercrime: the dark side of digitalization

As the world continues its journey into the digital age, corporations and governments across the globe have increasingly relied on technology to collect, analyze, and store personal data. This, in turn, has led to a rise in cybercrime, ranging from minor breaches to global-scale attacks affecting billions of users, as in the case of Yahoo. In the U.S. alone, 1,802 cases of data compromise were reported in 2022, a marked increase from the 447 cases reported a decade earlier.

The high price of data protection

As of 2022, the average cost of a single data breach across all industries worldwide was around 4.35 million U.S. dollars. Breaches were most costly in the healthcare sector, where each one cost the affected party an average of 10.1 million U.S. dollars. The financial sector ranked second, with each breach resulting in a loss of approximately 6 million U.S. dollars, about 1.65 million more than the global average.
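A quick check of how the 2022 sector averages quoted here compare with the global average of 4.35 million U.S. dollars (all values in million U.S. dollars, rounded):

```latex
\begin{aligned}
6.0 - 4.35 &= 1.65 && \text{(financial sector vs.\ global average)}\\
10.1 - 4.35 &= 5.75 && \text{(healthcare vs.\ global average)}\\
10.1 / 4.35 &\approx 2.32 && \text{(healthcare as a multiple of the global average)}
\end{aligned}
```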
Number of data compromises and impacted individuals in U.S. 2005-2024
statista.com
Updated Jul 14, 2025
Cite
Statista (2025). Number of data compromises and impacted individuals in U.S. 2005-2024 [Dataset]. https://www.statista.com/statistics/273550/data-breaches-recorded-in-the-united-states-by-number-of-breaches-and-records-exposed/
Dataset updated
Jul 14, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
United States
Description
In 2024, the number of data compromises in the United States stood at 3,158 cases. Meanwhile, over 1.35 billion individuals were affected in the same year by data compromises, including data breaches, leakage, and exposure. While these are three different events, they have one thing in common: in each, sensitive data is accessed by an unauthorized threat actor.

Industries most vulnerable to data breaches

Some industry sectors usually see more significant cases of private data violations than others, depending on the type and volume of personal information that organizations in these sectors store. In 2024, financial services, healthcare, and professional services were the three industry sectors that recorded the most data breaches. Overall, the number of data breaches in some U.S. industry sectors, such as healthcare, has gradually increased over the past few years, while other sectors saw a decrease.

Largest data exposures worldwide

In 2020, the adult streaming website CAM4 experienced a leakage of nearly 11 billion records. This is by far the most extensive reported data leakage. The case is unusual, though, because cybersecurity researchers found the vulnerability before cybercriminals did. The second-largest data breach is the Yahoo breach, dating back to 2013. The company first reported about one billion exposed records, then in 2017 revised the number of leaked records to three billion. In March 2018, the third-biggest data breach occurred, involving India's national identification database Aadhaar. As a result of this incident, over 1.1 billion records were exposed.
Global number of breached user accounts Q1 2020-Q3 2024
statista.com
Updated Jun 23, 2025
Cite
Statista (2025). Global number of breached user accounts Q1 2020-Q3 2024 [Dataset]. https://www.statista.com/statistics/1307426/number-of-data-breaches-worldwide/
Dataset updated
Jun 23, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
Worldwide
Description
During the third quarter of 2024, data breaches exposed more than *** million records worldwide. Since the first quarter of 2020, the highest number of data records was exposed in the first quarter of ***, with more than *** million data sets. Data breaches remain among the biggest concerns of company leaders worldwide. The most common cause of sensitive information loss was operating system vulnerabilities on endpoint devices.

Which industries see the most data breaches?

Certain conditions make some industry sectors more prone to data breaches than others. According to the latest observations, public administration experienced the highest number of data breaches between 2021 and 2022, with *** reported data breach incidents with confirmed data loss. Financial institutions ranked second, with *** data breach cases, followed by healthcare providers.

Data breach cost

Data breach incidents have various consequences, the most common being financial losses and business disruptions. As of 2023, the average data breach cost across businesses worldwide was **** million U.S. dollars, while a single leaked data record cost about *** U.S. dollars. The United States saw the highest average breach cost globally, at **** million U.S. dollars.
IoMT-TrafficData: A Dataset for Benchmarking Intrusion Detection in IoMT
data.niaid.nih.gov
zenodo.org
Updated Aug 30, 2024
Cite
Santos, Leonel (2024). IoMT-TrafficData: A Dataset for Benchmarking Intrusion Detection in IoMT [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8116337
Dataset updated
Aug 30, 2024
Dataset provided by
Areia, José
Costa, Rogério Luís
Bispo, Ivo Afonso
Santos, Leonel
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Article Information

The development of the dataset and its benchmarking with machine-learning models are described in the article ‘IoMT-TrafficData: Dataset and Tools for Benchmarking Intrusion Detection in Internet of Medical Things’ (DOI: 10.1109/ACCESS.2024.3437214).

Please cite this article when using the dataset.

Abstract

The increasing importance of securing the Internet of Medical Things (IoMT) due to its vulnerabilities to cyber-attacks highlights the need for an effective intrusion detection system (IDS). In this study, our main objective was to develop a machine learning model for the IoMT to enhance the security of medical devices and protect patients’ private data. To address this issue, we built a scenario that utilised the Internet of Things (IoT) and IoMT devices to simulate real-world attacks. We collected and cleaned data, pre-processed it, and fed it into our machine learning model to detect intrusions in the network. Our results revealed significant improvements in all performance metrics, indicating robustness and reproducibility in real-world scenarios. This research has implications in the context of IoMT and cybersecurity, as it helps mitigate vulnerabilities and lowers the number of breaches occurring with the rapid growth of IoMT devices. The use of machine learning algorithms for intrusion detection systems is essential, and our study provides valuable insights and a road map for future research and the deployment of such systems in live environments. By implementing our findings, we can contribute to a safer and more secure IoMT ecosystem, safeguarding patient privacy and ensuring the integrity of medical data.

ZIP Folder Content

The ZIP folder has two main components: Captures and Datasets. The Captures folder contains all the captures used in this project, organized into separate folders by type of network analysis: BLE or IP-Based. The Datasets folder is organized the same way, with datasets categorized by type: BLE, IP-Based Packet, and IP-Based Flows.

To cater to diverse analytical needs, the datasets are provided in two formats: CSV (Comma-Separated Values) and pickle. The CSV format works with a wide range of data analysis tools, while the pickle format preserves the Python objects as they were saved, including data types that CSV does not record.

This organization enables researchers to easily locate and utilize the specific captures and datasets they require, based on their preferred network analysis type or dataset type. The availability of different formats further enhances the flexibility and usability of the provided data.
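To make the layout above concrete, here is a minimal sketch that indexes an extracted copy of the ZIP by top-level folder, type folder, and file extension, so that, for example, all BLE CSV files can be picked out. It assumes a hypothetical on-disk layout of `<root>/<Captures|Datasets>/<type>/...`; the real folder and file names inside the archive may differ, and `index_files` is an illustrative name.

```python
from collections import defaultdict
from pathlib import Path


def index_files(root: Path) -> dict[tuple[str, str, str], list[Path]]:
    """Group files by (top-level folder, type folder, lower-cased extension).

    Assumes a hypothetical layout of <root>/<Captures|Datasets>/<type>/...;
    files that sit less than two folders deep are skipped.
    """
    index: dict[tuple[str, str, str], list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        parts = path.relative_to(root).parts
        if len(parts) < 3:
            continue  # e.g. a README at the archive root
        index[(parts[0], parts[1], path.suffix.lower())].append(path)
    return dict(index)
```

For example, `index_files(Path("IoMT-TrafficData")).get(("Datasets", "BLE", ".csv"), [])` would list the BLE CSV files, where `IoMT-TrafficData` is a hypothetical extraction folder. Only load the pickle files from a source you trust, since unpickling can execute arbitrary code.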

Datasets' Content

The dataset contains three sub-datasets: BLE, IP-Based Packet, and IP-Based Flows. The tables below list the features selected for each sub-dataset and used in the model evaluation in the accompanying article.

Identified Key Features Within Bluetooth Dataset

| Feature                                | Meaning                                                  |
|:---------------------------------------|:---------------------------------------------------------|
| btle.advertising_header                | BLE Advertising Packet Header                            |
| btle.advertising_header.ch_sel         | BLE Advertising Channel Selection Algorithm              |
| btle.advertising_header.length         | BLE Advertising Length                                   |
| btle.advertising_header.pdu_type       | BLE Advertising PDU Type                                 |
| btle.advertising_header.randomized_rx  | BLE Advertising Rx Address Type (RxAdd: Public or Random) |
| btle.advertising_header.randomized_tx  | BLE Advertising Tx Address Type (TxAdd: Public or Random) |
| btle.advertising_header.rfu.1          | Reserved for Future Use 1                                |
| btle.advertising_header.rfu.2          | Reserved for Future Use 2                                |
| btle.advertising_header.rfu.3          | Reserved for Future Use 3                                |
| btle.advertising_header.rfu.4          | Reserved for Future Use 4                                |
| btle.control.instant                   | Instant Value Within a BLE Control Packet                |
| btle.crc.incorrect                     | Incorrect CRC                                            |
| btle.extended_advertising              | Advertiser Data Information                              |
| btle.extended_advertising.did          | Advertiser Data Identifier                               |
| btle.extended_advertising.sid          | Advertiser Set Identifier                                |
| btle.length                            | BLE Length                                               |
| frame.cap_len                          | Frame Length Stored in the Capture File                  |
| frame.interface_id                     | Interface ID                                             |
| frame.len                              | Frame Length on the Wire                                 |
| nordic_ble.board_id                    | Board ID                                                 |
| nordic_ble.channel                     | Channel Index                                            |
| nordic_ble.crcok                       | Indicates Whether the CRC Is Correct                     |
| nordic_ble.flags                       | Flags                                                    |
| nordic_ble.packet_counter              | Packet Counter                                           |
| nordic_ble.packet_time                 | Packet Time (Start to End)                               |
| nordic_ble.phy                         | PHY                                                      |
| nordic_ble.protover                    | Protocol Version                                         |
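To illustrate what the `btle.advertising_header.*` fields break out, the sketch below splits a raw 16-bit BLE advertising PDU header into PDU type, ChSel, TxAdd, RxAdd, and payload length, following the header layout in the Bluetooth Core Specification (5.0 and later). It assumes the exported `btle.advertising_header` value holds the header with the first transmitted octet in the low byte; the exact meaning of ChSel, TxAdd, and RxAdd varies by PDU type, and the mapping of Wireshark's `rfu.1` to `rfu.4` fields is not covered. `decode_adv_header` is a hypothetical helper.

```python
# Advertising-channel PDU type codes from the Bluetooth Core Specification
# (5.0 and later). Several codes are reused by AUX_* PDUs on secondary
# channels; only the primary-channel names are listed here.
PDU_TYPES = {
    0b0000: "ADV_IND",
    0b0001: "ADV_DIRECT_IND",
    0b0010: "ADV_NONCONN_IND",
    0b0011: "SCAN_REQ",
    0b0100: "SCAN_RSP",
    0b0101: "CONNECT_IND",
    0b0110: "ADV_SCAN_IND",
    0b0111: "ADV_EXT_IND",
}


def decode_adv_header(header: int) -> dict:
    """Split a 16-bit advertising PDU header whose first octet is the low byte."""
    pdu_type = header & 0x0F              # bits 0-3; bit 4 is reserved (RFU)
    return {
        "pdu_type": PDU_TYPES.get(pdu_type, f"other (0b{pdu_type:04b})"),
        "ch_sel": (header >> 5) & 1,      # bit 5: ChSel, channel selection algorithm #2
        "tx_add": (header >> 6) & 1,      # bit 6: TxAdd, 1 = sender address is random
        "rx_add": (header >> 7) & 1,      # bit 7: RxAdd, 1 = receiver address is random
        "length": (header >> 8) & 0xFF,   # second octet: payload length in bytes
    }


# A CSV export may store the header as a hex string such as "0x2540";
# int(value, 0) accepts that prefix.
print(decode_adv_header(int("0x2540", 0)))
# {'pdu_type': 'ADV_IND', 'ch_sel': 0, 'tx_add': 1, 'rx_add': 0, 'length': 37}
```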

Identified Key Features Within IP-Based Packets Dataset

| Feature                  | Meaning                                                                          |
|:-------------------------|:---------------------------------------------------------------------------------|
| http.content_length      | Length of the content in an HTTP message                                         |
| http.request             | Indicates that the packet is an HTTP request                                     |
| http.response.code       | Status code of an HTTP response                                                  |
| http.response_number     | Sequential number of an HTTP response                                            |
| http.time                | Time taken for an HTTP transaction                                               |
| tcp.analysis.initial_rtt | Initial round-trip time of the TCP connection                                    |
| tcp.connection.fin       | TCP connection termination with the FIN flag                                     |
| tcp.connection.syn       | TCP connection initiation with the SYN flag                                      |
| tcp.connection.synack    | TCP connection establishment with the SYN-ACK flags                              |
| tcp.flags.cwr            | Congestion Window Reduced (CWR) flag in TCP                                      |
| tcp.flags.ecn            | ECN-Echo (ECE) flag in TCP (Explicit Congestion Notification)                    |
| tcp.flags.fin            | FIN flag in TCP                                                                  |
| tcp.flags.ns             | Nonce Sum (NS) flag in TCP                                                       |
| tcp.flags.res            | Reserved flags in TCP                                                            |
| tcp.flags.syn            | SYN flag in TCP                                                                  |
| tcp.flags.urg            | Urgent (URG) flag in TCP                                                         |
| tcp.urgent_pointer       | Pointer to urgent data in TCP                                                    |
| ip.frag_offset           | Fragment offset in IP packets                                                    |
| eth.dst.ig               | IG bit of the Ethernet destination address (individual or group address)         |
| eth.src.ig               | IG bit of the Ethernet source address (individual or group address)              |
| eth.src.lg               | LG bit of the Ethernet source address (globally unique or locally administered)  |
| eth.src_not_group        | Check that the Ethernet source address is not a group address                    |
| arp.isannouncement       | Indicates whether an ARP message is an announcement                              |
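The `tcp.flags.*` features above are individual bits of the 12-bit flags field in the TCP header. As a sketch of how they relate, the code below decodes that field into named 0/1 values and tests for SYN plus ACK, roughly the condition `tcp.connection.synack` marks. The bit masks follow RFC 9293, RFC 3168 (CWR, ECE), and the historic RFC 3540 (NS); the dictionary keys mirror Wireshark field suffixes, and `decode_tcp_flags` and `is_syn_ack` are hypothetical helpers, not part of the dataset tooling.

```python
# Bit masks within the 12-bit TCP flags field.
TCP_FLAG_BITS = {
    "fin": 0x001,
    "syn": 0x002,
    "reset": 0x004,
    "push": 0x008,
    "ack": 0x010,
    "urg": 0x020,
    "ecn": 0x040,  # ECN-Echo (ECE)
    "cwr": 0x080,
    "ns": 0x100,   # nonce sum
}
RESERVED_MASK = 0xE00  # the three reserved bits (tcp.flags.res)


def decode_tcp_flags(flags: int) -> dict[str, int]:
    """Return each named flag as 0/1, plus the reserved bits as a 3-bit value."""
    decoded = {name: int(bool(flags & mask)) for name, mask in TCP_FLAG_BITS.items()}
    decoded["res"] = (flags & RESERVED_MASK) >> 9
    return decoded


def is_syn_ack(flags: int) -> bool:
    """True when both SYN and ACK are set (other bits are ignored)."""
    syn_ack = TCP_FLAG_BITS["syn"] | TCP_FLAG_BITS["ack"]
    return (flags & syn_ack) == syn_ack


print(is_syn_ack(0x012), is_syn_ack(0x002))  # True False
```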

Identified Key Features Within IP-Based Flows Dataset

| Feature           | Meaning                                   |
|:------------------|:------------------------------------------|
| proto             | Transport layer protocol of the connection |
| service           | Identification of an application protocol |
| orig_bytes        | Originator payload bytes                  |
| resp_bytes        | Responder payload bytes                   |
| history           | Connection state history                  |
| orig_pkts         | Originator sent packets                   |
| resp_pkts         | Responder sent packets                    |
| flow_duration     | Length of the flow in seconds             |
| fwd_pkts_tot      | Forward packets total                     |
| bwd_pkts_tot      | Backward packets total                    |
| fwd_data_pkts_tot | Forward data packets total                |
| bwd_data_pkts_tot | Backward data packets total               |
| fwd_pkts_per_sec  | Forward packets per second                |
| bwd_pkts_per_sec  | Backward packets per second               |
| flow_pkts_per_sec | Flow packets per second                   |
| fwd_header_size   | Forward header bytes                      |
| bwd_header_size   | Backward header bytes                     |
| fwd_pkts_payload  | Forward payload bytes                     |
| bwd_pkts_payload  | Backward payload bytes                    |
| flow_pkts_payload | Flow payload bytes                        |
| fwd_iat           | Forward inter-arrival time                |
| bwd_iat           | Backward inter-arrival time               |
| flow_iat          | Flow inter-arrival time                   |
| active            | Flow active duration                      |
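The listing does not say exactly how the rate and inter-arrival features are computed. A common definition, assumed here, divides packet counts by the flow duration and takes inter-arrival times as the gaps between consecutive packet timestamps; the sketch below shows that reading. `packet_rates` and `inter_arrival_times` are hypothetical helpers, and the dataset may report IAT as an aggregate (for example a mean) rather than a list.

```python
def packet_rates(duration: float, fwd_pkts: int, bwd_pkts: int) -> dict[str, float]:
    """Packets per second in each direction and for the whole flow.

    Zero-length flows (e.g. a single packet) return 0.0 instead of dividing by zero.
    """
    if duration <= 0:
        return {"fwd_pkts_per_sec": 0.0, "bwd_pkts_per_sec": 0.0, "flow_pkts_per_sec": 0.0}
    return {
        "fwd_pkts_per_sec": fwd_pkts / duration,
        "bwd_pkts_per_sec": bwd_pkts / duration,
        "flow_pkts_per_sec": (fwd_pkts + bwd_pkts) / duration,
    }


def inter_arrival_times(timestamps: list[float]) -> list[float]:
    """Gaps in seconds between consecutive packets, after sorting by time."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


print(packet_rates(2.0, 6, 4))
# {'fwd_pkts_per_sec': 3.0, 'bwd_pkts_per_sec': 2.0, 'flow_pkts_per_sec': 5.0}
print(inter_arrival_times([1.5, 0.0, 0.5]))  # [0.5, 1.0]
```

Under this reading, forward IAT would use only forward-direction timestamps and flow IAT would use the timestamps of all packets in both directions.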
Average cost per data breach in the United States 2006-2024
statista.com
Updated Jun 23, 2025
Cite
Statista (2025). Average cost per data breach in the United States 2006-2024 [Dataset]. https://www.statista.com/statistics/273575/us-average-cost-incurred-by-a-data-breach/
Dataset updated
Jun 23, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
United States
Description
As of 2024, the average cost of a data breach in the United States amounted to **** million U.S. dollars, down from **** million U.S. dollars in the previous year. The global average cost per data breach was **** million U.S. dollars in 2024.

Cost of a data breach in different countries worldwide

Data breaches pose a major threat to organizations globally, and the monetary damage they cause has increased in many markets over the past decade. In 2023, Canada ranked second after the U.S. in data breach costs, with an average of **** million U.S. dollars. Since 2019, the average monetary damage caused by loss of sensitive information in Canada has increased notably. In the United Kingdom, the average cost of a data breach in 2024 amounted to around **** million U.S. dollars, while in Germany it stood at **** million U.S. dollars.

The cost of a data breach by industry and segment

Data breach costs vary by industry and segment. For the fourth consecutive year, the global healthcare sector registered the highest data breach costs, which in 2024 amounted to about **** million U.S. dollars. Financial institutions ranked second, with an average cost of *** million U.S. dollars per data breach. Detection and escalation was the costliest segment of data breach costs worldwide, at **** U.S. dollars on average. Lost business ranked second, and post-breach response was the third-costliest segment.