100+ datasets found

Number of data compromises and impacted individuals in U.S. 2005-2024
statista.com
ai-chatbox.pro
Updated May 23, 2025
Statista (2025). Number of data compromises and impacted individuals in U.S. 2005-2024 [Dataset]. https://www.statista.com/statistics/273550/data-breaches-recorded-in-the-united-states-by-number-of-breaches-and-records-exposed/
Explore at:
Dataset updated
May 23, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
United States
Description
In 2024, the number of data compromises in the United States stood at 3,158 cases. Meanwhile, over 1.35 billion individuals were affected in the same year by data compromises, including data breaches, leakage, and exposure. While these are three different events, they have one thing in common: in all three, sensitive data is accessed by an unauthorized threat actor.

Industries most vulnerable to data breaches

Some industry sectors usually see more significant cases of private data violations than others. This is determined by the type and volume of the personal information that organizations in these sectors store. In 2024, financial services, healthcare, and professional services were the three industry sectors that recorded the most data breaches. Overall, the number of data breaches in some industry sectors in the United States has gradually increased within the past few years, while other sectors saw a decrease.

Largest data exposures worldwide

In 2020, an adult streaming website, CAM4, experienced a leakage of nearly 11 billion records. This is by far the most extensive reported data leakage. The case is unusual, though, because cybersecurity researchers found the vulnerability before cybercriminals did. The second-largest data breach is the Yahoo data breach, dating back to 2013. The company first reported about one billion exposed records and then, in 2017, updated the number of leaked records to three billion. In March 2018, the third-biggest data breach happened, involving India’s national identification database, Aadhaar. As a result of this incident, over 1.1 billion records were exposed.
Data Breach Notifications Affecting Washington Residents (Personal...
data.wa.gov
catalog.data.gov
application/rdfxml +5
Updated Jun 30, 2025
+ more versions
Washington State Attorney General's Office Consumer Protection Division (2025). Data Breach Notifications Affecting Washington Residents (Personal Information Breakdown) [Dataset]. https://data.wa.gov/Consumer-Protection/Data-Breach-Notifications-Affecting-Washington-Res/padd-mby7
Explore at:
Available download formats: csv, json, application/rssxml, application/rdfxml, tsv, xml
Dataset updated
Jun 30, 2025
Dataset authored and provided by
Washington State Attorney General's Office Consumer Protection Division
Area covered
Washington
Description
Washington law requires entities impacted by a data breach to notify the Attorney General’s Office (AGO) when the personal information of more than 500 Washingtonians was compromised as a result of the breach. This dataset breaks out the individual types of breached personal information that were identified in each notice our office received. This data is used to produce the AGO’s Annual Data Breach Report. For additional statistics relating to data breaches, see also the main dataset at: https://data.wa.gov/Consumer-Protection/Data-Breach-Notifications-Affecting-Washington-Res/sb4j-ca4h.
Global number of breached user accounts Q1 2020-Q3 2024
statista.com
ai-chatbox.pro
Updated Jun 23, 2025
Statista (2025). Global number of breached user accounts Q1 2020-Q3 2024 [Dataset]. https://www.statista.com/statistics/1307426/number-of-data-breaches-worldwide/
Explore at:
Dataset updated
Jun 23, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
Worldwide
Description
During the third quarter of 2024, data breaches exposed more than *** million records worldwide. Since the first quarter of 2020, the highest number of data records were exposed in the first quarter of ***, more than *** million data sets. Data breaches remain among the biggest concerns of company leaders worldwide. The most common causes of sensitive information loss were operating system vulnerabilities on endpoint devices.

Which industries see the most data breaches?

Meanwhile, certain conditions make some industry sectors more prone to data breaches than others. According to the latest observations, the public administration experienced the highest number of data breaches between 2021 and 2022. The industry saw *** reported data breach incidents with confirmed data loss. The second were financial institutions, with *** data breach cases, followed by healthcare providers.

Data breach cost

Data breach incidents have various consequences, the most common impact being financial losses and business disruptions. As of 2023, the average data breach cost across businesses worldwide was **** million U.S. dollars. Meanwhile, a leaked data record cost about *** U.S. dollars. The United States saw the highest average breach cost globally, at **** million U.S. dollars.
"Pwned Passwords" Dataset
academictorrents.com
bittorrent
Updated Aug 3, 2018
haveibeenpwned.com (2018). "Pwned Passwords" Dataset [Dataset]. https://academictorrents.com/details/53555c69e3799d876159d7290ea60e56b35e36a9
Explore at:
Available download formats: bittorrent (11101449979)
Dataset updated
Aug 3, 2018
Dataset provided by
Have I Been Pwned? (http://haveibeenpwned.com/)
License
No license specified (https://academictorrents.com/nolicensespecified)
Description
Version 3 with 517M hashes and counts of password usage, ordered by most to least prevalent.

Pwned Passwords are 517,238,891 real-world passwords previously exposed in data breaches. This exposure makes them unsuitable for ongoing use, as they're at much greater risk of being used to take over other accounts. They're searchable online below as well as being downloadable for use in other online systems. The entire set of passwords is downloadable for free below, with each password represented as a SHA-1 hash to protect the original value (some passwords contain personally identifiable information), followed by a count of how many times that password had been seen in the source data breaches. The list may be integrated into other systems and used to verify whether a password has previously appeared in a data breach, after which a system may warn the user or even block the password outright.
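As a rough illustration of the integration described above, the sketch below checks a candidate password against lines that pair a SHA-1 hash with a count. The `HASH:COUNT` line layout with a colon separator, the function names, and the sample counts are assumptions made for this example; the downloaded files should be checked for their actual layout.

```python
import hashlib


def sha1_hex(password):
    """Return the uppercase hex SHA-1 digest of the UTF-8 encoded password."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest().upper()


def breach_count(password, lines):
    """Return the count stored next to the password's hash, or 0 if absent.

    `lines` is any iterable of 'HASH:COUNT' strings (assumed layout).
    """
    target = sha1_hex(password)
    for line in lines:
        hash_hex, sep, count = line.strip().partition(":")
        if sep and hash_hex.upper() == target:
            return int(count)
    return 0


# Tiny in-memory stand-in for the real list; the counts are made up.
SAMPLE_LINES = [
    f"{sha1_hex('password')}:3",
    f"{sha1_hex('123456')}:7",
]

if __name__ == "__main__":
    for candidate in ("password", "correct horse battery staple"):
        seen = breach_count(candidate, SAMPLE_LINES)
        verdict = "warn or block" if seen else "not in sample list"
        print(f"{candidate!r}: seen {seen} times -> {verdict}")
```

A linear scan like this is only practical for small samples. With hundreds of millions of entries, a real integration would more likely load the hashes into an indexed store and look each one up directly.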
Data from: A dataset containing S&P500 information security breaches and...
dataverse.nl
csv, pdf, xlsx
Updated Jul 2, 2024
Nynke Voermans; Francesco Lelli (2024). A dataset containing S&P500 information security breaches and related financial firm performances [Dataset]. http://doi.org/10.34894/Z2IDYZ
Explore at:
Available download formats: xlsx(1831436), csv(2072394), csv(22649), xlsx(36429), xlsx(89376), csv(93304), csv(2014511), csv(58832), pdf(153677), csv(23048), xlsx(1844452), xlsx(64856), xlsx(36000)
Unique identifier
https://doi.org/10.34894/Z2IDYZ
Dataset updated
Jul 2, 2024
Dataset provided by
DataverseNL
Authors
Nynke Voermans; Francesco Lelli
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
In this document, comprehensive datasets are presented to advance research on information security breaches. The datasets include data on disclosed information security breaches affecting S&P500 companies between 2020 and 2023, collected through manual search of the Internet. Overall, the datasets include 504 companies, with detailed information security breach and financial data available for 97 firms that experienced a disclosed information security breach. This document will describe the datasets in detail, explain the data collection procedure, and show the initial versions of the datasets. Contact at Tilburg University: Francesco Lelli
All-time biggest online data breaches 2025
statista.com
ai-chatbox.pro
Updated May 26, 2025
Statista (2025). All-time biggest online data breaches 2025 [Dataset]. https://www.statista.com/statistics/290525/cyber-crime-biggest-online-data-breaches-worldwide/
Explore at:
Dataset updated
May 26, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Jan 2025
Area covered
Worldwide
Description
The largest reported data leakage as of January 2025 was the Cam4 data breach in March 2020, which exposed more than 10 billion data records. The second-largest data breach in history so far, the Yahoo data breach, occurred in 2013. The company initially reported about one billion exposed data records, but after an investigation, the company updated the number, revealing that three billion accounts were affected. The National Public Data Breach was announced in August 2024. The incident became public when personally identifiable information of individuals became available for sale on the dark web. Overall, the security professionals estimate the leakage of nearly three billion personal records. The next significant data leakage was the March 2018 security breach of India's national ID database, Aadhaar, with over 1.1 billion records exposed. This included biometric information such as identification numbers and fingerprint scans, which could be used to open bank accounts and receive financial aid, among other government services.

Cybercrime - the dark side of digitalization

As the world continues its journey into the digital age, corporations and governments across the globe have been increasing their reliance on technology to collect, analyze and store personal data. This, in turn, has led to a rise in the number of cyber crimes, ranging from minor breaches to global-scale attacks impacting billions of users, such as in the case of Yahoo. Within the U.S. alone, 1,802 cases of data compromise were reported in 2022. This was a marked increase from the 447 cases reported a decade prior.

The high price of data protection

As of 2022, the average cost of a single data breach across all industries worldwide stood at around 4.35 million U.S. dollars. This was found to be most costly in the healthcare sector, with each leak reported to have cost the affected party a hefty 10.1 million U.S. dollars. The financial segment followed closely behind. Here, each breach resulted in a loss of approximately 6 million U.S. dollars - 1.5 million more than the global average.
Healthcare Ransomware Dataset
kaggle.com
Updated Feb 21, 2025
Rivalytics (2025). Healthcare Ransomware Dataset [Dataset]. https://www.kaggle.com/datasets/rivalytics/healthcare-ransomware-dataset
Explore at:
Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Feb 21, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Rivalytics
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
📌 Context of the Dataset

The Healthcare Ransomware Dataset was created to simulate real-world cyberattacks in the healthcare industry. Hospitals, clinics, and research labs have become prime targets for ransomware due to their reliance on real-time patient data and legacy IT infrastructure. This dataset provides insight into attack patterns, recovery times, and cybersecurity practices across different healthcare organizations.

Why is this important?

Ransomware attacks on healthcare organizations can shut down entire hospitals, delay treatments, and put lives at risk. Understanding how different healthcare organizations respond to attacks can help develop better security strategies. The dataset allows cybersecurity analysts, data scientists, and researchers to study patterns in ransomware incidents and explore predictive modeling for risk mitigation.

📌 Sources and Research Inspiration This simulated dataset was inspired by real-world cybersecurity reports and built using insights from official sources, including:

1️⃣ IBM Cost of a Data Breach Report (2024)

The healthcare sector had the highest average cost of data breaches ($10.93 million per incident). On average, organizations recovered only 64.8% of their data after paying ransom. Healthcare breaches took 277 days on average to detect and contain.

2️⃣ Sophos State of Ransomware in Healthcare (2024)

67% of healthcare organizations were hit by ransomware in 2024, an increase from 60% in 2023. 66% of backup compromise attempts succeeded, making data recovery significantly more difficult. The most common attack vectors included exploited vulnerabilities (34%) and compromised credentials (34%).

3️⃣ Health & Human Services (HHS) Cybersecurity Reports

Ransomware incidents in healthcare have doubled since 2016. Organizations that fail to monitor threats frequently experience higher infection rates.

4️⃣ Cybersecurity & Infrastructure Security Agency (CISA) Alerts

Identified phishing, unpatched software, and exposed RDP ports as top ransomware entry points. Only 13% of healthcare organizations monitor cyber threats more than once per day, increasing the risk of undetected attacks.

5️⃣ Emsisoft 2020 Report on Ransomware in Healthcare

The number of ransomware attacks in healthcare increased by 278% between 2018 and 2023. 560 healthcare facilities were affected in a single year, disrupting patient care and emergency services.

📌 Why is This a Simulated Dataset?

This dataset does not contain real patient data or actual ransomware cases. Instead, it was built using probabilistic modeling and structured randomness based on industry benchmarks and cybersecurity reports.

How It Was Created:

1️⃣ Defining the Dataset Structure

The dataset was designed to simulate realistic attack patterns in healthcare, using actual ransomware case studies as inspiration.

Columns were selected based on what real-world cybersecurity teams track, such as: Attack methods (phishing, RDP exploits, credential theft). Infection rates, recovery time, and backup compromise rates. Organization type (hospitals, clinics, research labs) and monitoring frequency.

2️⃣ Generating Realistic Data Using ChatGPT & Python

ChatGPT assisted in defining relationships between attack factors, ensuring that key cybersecurity concepts were accurately reflected. Python’s NumPy and Pandas libraries were used to introduce randomized attack simulations based on real-world statistics. Data was validated against industry research to ensure it aligns with actual ransomware attack trends.

3️⃣ Ensuring Logical Relationships Between Data Points

Hospitals take longer to recover due to larger infrastructure and compliance requirements. Organizations that track more cyber threats recover faster because they detect attacks earlier. Backup security significantly impacts recovery time, reflecting the real-world risk of backup encryption attacks.
‘List of Top Data Breaches (2004 - 2021)’ analyzed by Analyst-2
analyst-2.ai
Updated Feb 14, 2022
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘List of Top Data Breaches (2004 - 2021)’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-list-of-top-data-breaches-2004-2021-e7ac/746cf4e2/?iid=002-610&v=presentation
Explore at:
Dataset updated
Feb 14, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘List of Top Data Breaches (2004 - 2021)’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/hishaamarmghan/list-of-top-data-breaches-2004-2021 on 14 February 2022.

--- Dataset description provided by original source is as follows ---

This is a dataset containing all the major data breaches in the world from 2004 to 2021

As we know, there is a big issue related to the privacy of our data. Many major companies in the world still to this day face this issue every single day. Even with a great team of people working on their security, many still suffer. In order to tackle this situation, it is only right that we must study this issue in great depth and therefore I pulled this data from Wikipedia to conduct data analysis. I would encourage others to take a look at this as well and find as many insights as possible.

This data contains 5 columns:
1. Entity: The name of the company, organization or institute
2. Year: The year in which the data breach took place
3. Records: How many records were compromised (can include information like email, passwords etc.)
4. Organization type: Which sector the organization belongs to
5. Method: Was it hacked? Were the files lost? Was it an inside job?

Here is the source for the dataset: https://en.wikipedia.org/wiki/List_of_data_breaches

Here is the GitHub link for a guide on how it was scraped: https://github.com/hishaamarmghan/Data-Breaches-Scraping-Cleaning

--- Original source retains full ownership of the source dataset ---
CNSS Jabaroot Leaked Data (April 2025)
breachlookup.me
Updated Apr 8, 2025
Caisse Nationale de Sécurité Sociale (CNSS) - Data Source (2025). CNSS Jabaroot Leaked Data (April 2025) [Dataset]. https://breachlookup.me/2025/cnss
Explore at:
Dataset updated
Apr 8, 2025
Dataset provided by
National Social Security Fund
Authors
Caisse Nationale de Sécurité Sociale (CNSS) - Data Source
Area covered
Morocco
Description
Searchable datasets derived from the CNSS data breach posted by Jabaroot on breachforums in April 2025. Includes company details (~500k records) and employee information (~2M records).
Cloud-based Database Security Market Research Report 2033
growthmarketreports.com
csv, pdf, pptx
Updated Jun 30, 2025
The citation is currently not available for this dataset.
Explore at:
Available download formats: pdf, csv, pptx
Dataset updated
Jun 30, 2025
Dataset authored and provided by
Growth Market Reports
Time period covered
2024 - 2032
Area covered
Global
Description
Cloud-based Database Security Market Outlook

As per our latest research, the global cloud-based database security market size reached USD 7.4 billion in 2024, reflecting robust demand across diverse industries. The market is poised to grow at a compelling CAGR of 17.2% from 2025 to 2033, with the market size forecasted to reach USD 25.7 billion by 2033. This strong growth trajectory is primarily driven by the increasing adoption of cloud infrastructure, the proliferation of data-centric business models, and escalating concerns over cyber threats targeting sensitive and mission-critical data.

A major growth factor for the cloud-based database security market is the exponential rise in cloud adoption across enterprises of all sizes. Organizations are migrating their workloads and databases to the cloud to leverage scalability, cost-efficiency, and agility. However, this migration has also amplified the exposure of databases to sophisticated cyberattacks, prompting a surge in demand for advanced cloud-based security solutions. The increasing frequency of data breaches, ransomware attacks, and compliance requirements such as GDPR, HIPAA, and CCPA have made database security a board-level priority. Consequently, businesses are investing in comprehensive security frameworks that encompass threat detection, access control, encryption, and compliance management, thereby fueling market growth.

Another significant driver is the rapid digital transformation initiatives undertaken by sectors such as BFSI, healthcare, retail, and government. The surge in digital transactions, electronic health records, and online retailing has led to an unprecedented volume of sensitive data being stored and processed in cloud databases. This data is a lucrative target for cybercriminals, necessitating robust security measures. Innovations in artificial intelligence (AI), machine learning (ML), and automation are being integrated into cloud-based database security solutions, enabling real-time threat intelligence, anomaly detection, and automated response mechanisms. These advancements are not only enhancing the efficacy of security protocols but also reducing manual intervention and operational costs.

Furthermore, the evolving regulatory landscape is compelling organizations to adopt cloud-based database security solutions. Governments and regulatory bodies worldwide are imposing stringent data protection laws, mandating businesses to implement advanced security controls and maintain audit trails. Non-compliance can result in hefty fines, reputational damage, and loss of customer trust. As a result, companies are increasingly opting for cloud-native security platforms that offer centralized visibility, automated compliance reporting, and seamless integration with existing IT infrastructure. The growing awareness about the shared responsibility model in cloud security is also encouraging enterprises to proactively secure their databases, driving sustained market expansion.

From a regional perspective, North America currently dominates the cloud-based database security market, accounting for the largest share in 2024, followed by Europe and Asia Pacific. The region's leadership is attributed to the high concentration of cloud service providers, early adoption of advanced technologies, and stringent regulatory frameworks. However, Asia Pacific is expected to exhibit the fastest growth during the forecast period, driven by rapid digitalization, expanding IT infrastructure, and increasing investments in cybersecurity. Latin America and the Middle East & Africa are also witnessing steady growth, fueled by rising awareness and government initiatives to bolster data security.

Component Analysis

The component segment of the cloud-based database security market is bifurcated into software and services. Software solutions encompass a wide array of security tools, including database activity monitoring, data encryption, access management, and vulnerability assessment. These
Phishing Websites Dataset
data.mendeley.com
Updated Sep 24, 2020
+ more versions
Phishing Websites Dataset [Dataset]. https://data.mendeley.com/datasets/72ptz43s9v/1
Explore at:
Unique identifier
https://doi.org/10.17632/72ptz43s9v.1
Dataset updated
Sep 24, 2020
Authors
Grega Vrbančič
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
These data consist of a collection of legitimate as well as phishing website instances. Each website is represented by a set of features that denote whether the website is legitimate or not. The data can serve as input for a machine learning process.

Two variants of the Phishing Dataset are presented in this repository.

Full variant - dataset_full.csv
Short description of the full variant dataset:
Total number of instances: 88,647
Number of legitimate website instances (labeled as 0): 58,000
Number of phishing website instances (labeled as 1): 30,647
Total number of features: 111

Small variant - dataset_small.csv
Short description of the small variant dataset:
Total number of instances: 58,645
Number of legitimate website instances (labeled as 0): 27,998
Number of phishing website instances (labeled as 1): 30,647
Total number of features: 111
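The instance counts listed for the two variants can be sanity-checked with a few lines of arithmetic. The sketch below copies the counts from the description; the dictionary layout and the function names (`counts_are_consistent`, `phishing_share`) are hypothetical and not part of the dataset.

```python
# Counts copied from the dataset description above. The dictionary layout
# and function names are illustrative assumptions, not part of the dataset.
VARIANTS = {
    "dataset_full.csv": {"total": 88_647, "legitimate": 58_000, "phishing": 30_647},
    "dataset_small.csv": {"total": 58_645, "legitimate": 27_998, "phishing": 30_647},
}


def counts_are_consistent(variant):
    """Check that the two class counts add up to the stated total."""
    return variant["legitimate"] + variant["phishing"] == variant["total"]


def phishing_share(variant):
    """Fraction of instances labeled as phishing (label 1)."""
    return variant["phishing"] / variant["total"]


if __name__ == "__main__":
    for name, variant in VARIANTS.items():
        print(
            f"{name}: consistent={counts_are_consistent(variant)}, "
            f"phishing share={phishing_share(variant):.1%}"
        )
```

By these counts, roughly one third of the full variant is phishing, while the small variant is close to balanced. That difference may matter when choosing evaluation metrics for a classifier.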
Biggest data breaches in the U.S. 2024, by impact
statista.com
Updated Mar 7, 2025
Statista (2025). Biggest data breaches in the U.S. 2024, by impact [Dataset]. https://www.statista.com/statistics/1448545/us-biggest-data-breaches/
Explore at:
Dataset updated
Mar 7, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Dec 2024
Area covered
United States
Description
As of December 2024, the most significant data breach incident in the United States was the Yahoo data breach that dates back to 2013-2016. Impacting over three billion online users, this incident still remains one of the most significant data breaches worldwide. The second-biggest case was the January 2021 data breach at Microsoft, involving about 30 thousand companies in the United States and around 60 thousand companies around the world.
Cyber-Security-Breaches
huggingface.co
Updated Jun 30, 2014
+ more versions
SchoolyAI (2014). Cyber-Security-Breaches [Dataset]. https://huggingface.co/datasets/schooly/Cyber-Security-Breaches
Explore at:
Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jun 30, 2014
Dataset authored and provided by
SchoolyAI
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
schooly/Cyber-Security-Breaches dataset hosted on Hugging Face and contributed by the HF Datasets community
Oracle Cloud Infrastructure Breach Data (March 2025)
breachlookup.me
Updated Mar 15, 2025
Oracle Cloud Infrastructure (OCI) - Data Source (2025). Oracle Cloud Infrastructure Breach Data (March 2025) [Dataset]. https://breachlookup.me/2025/oracle
Explore at:
Dataset updated
Mar 15, 2025
Dataset provided by
Oracle (http://www.oracle.com/)
Authors
Oracle Cloud Infrastructure (OCI) - Data Source
Description
Searchable database of domains and records compromised in the March 2025 Oracle Cloud Infrastructure security breach. Includes information about affected tenants and compromised credentials.
CISA TTP Articles Data Set
zenodo.org
csv, json
Updated May 11, 2025
Dženan Hamzić; Florian Skopik; Markus Wurzenberger; Max Landauer (2025). CISA TTP Articles Data Set [Dataset]. http://doi.org/10.5281/zenodo.14659512
Explore at:
Available download formats: json, csv
Unique identifier
https://doi.org/10.5281/zenodo.14659512
Dataset updated
May 11, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Dženan Hamzić; Florian Skopik; Markus Wurzenberger; Max Landauer
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Sep 27, 2024
Description
This dataset contains 77 cybersecurity articles crawled from the public CISA website. All of these articles were publicly available at the time of crawling, without the need for any subscription or paid service. The articles were published from July 2020 to February 2024 and were selected for this dataset if they contained explicitly mentioned MITRE ATT&CK TTPs (Tactics, Techniques, and Procedures).

The data set supports research in the domain of Cyber Threat Intelligence as it may act as a ground truth for TTP labeling. Specifically, this dataset is designed to facilitate research and analysis related to the identification and classification of TTPs in cybersecurity advisories.

Each crawled article is represented by the following four columns:

RawText: The unfiltered text extracted from the main content of each article (class: "l-full_main").

TTP: A set of MITRE ATT&CK TTP (Tactics, Techniques, and Procedures) IDs identified within the article's RawText. These IDs are extracted using the regex pattern: (?:TA\d{4}|T\d{4,5}(?:\.\d{3})?).

CleanText: A cleaned version of the RawText, with tables and TTP IDs removed for clarity.

URL: The URL of the original article.
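The regex quoted for the TTP column can be tried out directly. The sketch below applies that pattern to a made-up sentence; the function name `extract_ttps` and the sample text are assumptions for illustration, and the dataset authors' actual extraction code may differ in its details.

```python
import re

# Pattern quoted from the dataset description: tactic IDs (TA + 4 digits)
# or technique IDs (T + 4-5 digits, with an optional .NNN sub-technique).
TTP_PATTERN = re.compile(r"(?:TA\d{4}|T\d{4,5}(?:\.\d{3})?)")


def extract_ttps(raw_text):
    """Return the unique TTP IDs found in raw_text, sorted for stable output."""
    return sorted(set(TTP_PATTERN.findall(raw_text)))


# Made-up advisory text, not taken from the dataset.
SAMPLE_TEXT = (
    "Actors gained initial access (TA0001) through spearphishing "
    "attachments (T1566.001) and later used T1059. "
    "T1566.001 was observed again during lateral movement."
)

if __name__ == "__main__":
    print(extract_ttps(SAMPLE_TEXT))
```

Note that the pattern as quoted has no word boundaries, so it can also match inside longer tokens. Wrapping it in `\b...\b` would be one way to tighten it if that matters for a given use.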

About the crawling process

All advisories were gathered on Sept 27th, 2024 from the CISA website by sifting through all advisory URLs backwards in time until 2020. All articles that explicitly mentioned TTPs were selected for the data set. To detect the presence of TTP IDs, each article was checked for the presence of any of the following phrases in the main content:

"MITRE ATT&CK Tactics and Techniques"

"Tactics and Techniques"

"MITRE ATT&CK Techniques"

The data set is available in CSV and JSON formats, both containing the same data.

Acknowledgments: Funded by the European Union under the European Defence Fund (GA no. 101121403 - NEWSROOM and GA no. 101121418 - EUCINF). Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Commission. Neither the European Union nor the granting authority can be held responsible for them. This work is co-funded by the Austrian FFG Kiras project ASOC (GA no. FO999905301).
Cyber Security Breaches Survey: Combined Dataset, 2016-2022
beta.ukdataservice.ac.uk
Updated 2025
Culture Department For Digital (2025). Cyber Security Breaches Survey: Combined Dataset, 2016-2022 [Dataset]. http://doi.org/10.5255/ukda-sn-8971-1
Explore at:
Unique identifier
https://doi.org/10.5255/ukda-sn-8971-1
Dataset updated
2025
Dataset provided by
UK Data Service (https://ukdataservice.ac.uk/)
DataCite (https://www.datacite.org/)
Authors
Culture Department For Digital
Description
The Cyber Security Breaches Survey (CSBS) is run to understand organisations' approaches and attitudes to cyber security, and to understand their experience of cyber security breaches. The aim of the survey is to support the Government by providing evidence that can inform policies which help to make Britain a safer place to do business online.

These surveys have been conducted annually since 2016 to understand the views of UK organisations on cyber security. Data are collected on topics including online use; attitudes of organisations to cyber security and awareness of Government initiatives; approaches to cyber security (including investment and processes); incidences and impact of a cyber security breach or attack; and how breaches are dealt with by the organisation. This information helps to inform Government policy towards organisations, including how best to target key messages to businesses and charities so that they are cyber secure (and so that the UK is the safest place in the world to do business online). The study is funded by the DCMS as part of the government's £2.6 billion National Cyber Strategy 2022 to protect and promote the UK in cyber space.

The underlying data are useful for researchers to better understand the response across a range of organisations and for wider comparability over time. The survey originally only covered businesses but was expanded to include charities from the 2018 survey onwards. From 2020, the survey includes a sample of education institutions (primary and secondary schools, further and higher education). Please note that the UK Data Service only holds datasets on each specific year from 2018 onwards.

Cyber Security Breaches Survey: Combined Dataset, 2016-2022 includes data from 2016 to 2022. This is cross-sectional data only and not all variables are included in all years. For longitudinal data, please access the Cyber Security Longitudinal Survey: Wave 1, 2021 (available from the UK Data Archive under SN 8969) and onwards.

Further information and additional publications can be found on the GOV.UK Cyber Security Breaches Survey webpage.
Data Breach Notification Reports
mass.gov
Updated May 7, 2025
Office of Consumer Affairs and Business Regulation (2025). Data Breach Notification Reports [Dataset]. https://www.mass.gov/lists/data-breach-notification-reports
Dataset updated
May 7, 2025
Dataset authored and provided by
Office of Consumer Affairs and Business Regulation
Area covered
Massachusetts
Description
View Data Breach Notification Reports, which include how many breaches are reported each year and the number of affected residents.
Data from: Malware Finances and Operations: a Data-Driven Study of the Value...
data.niaid.nih.gov
zenodo.org
Updated Jun 20, 2023
Nurmi, Juha (2023). Malware Finances and Operations: a Data-Driven Study of the Value Chain for Infections and Compromised Access [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8047204
Dataset updated
Jun 20, 2023
Dataset provided by
Niemelä, Mikko
Brumley, Billy
Nurmi, Juha
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description

The datasets document the malware economy and the value chain described in our paper, Malware Finances and Operations: a Data-Driven Study of the Value Chain for Infections and Compromised Access, presented at the 12th International Workshop on Cyber Crime (IWCC 2023), part of the ARES Conference, and published in the ACM International Conference Proceedings Series (ICPS).

Using the well-documented scripts, it is straightforward to reproduce our findings. It takes an estimated 1 hour of human time and 3 hours of computing time to duplicate our key findings from MalwareInfectionSet; around one hour with VictimAccessSet; and minutes to replicate the price calculations using AccountAccessSet. See the included README.md files and Python scripts.

We choose to represent each victim by a single JavaScript Object Notation (JSON) data file. Data sources provide sets of victim JSON data files from which we've extracted the essential information and omitted Personally Identifiable Information (PII). We collected, curated, and modelled three datasets, which we publish under the Creative Commons Attribution 4.0 International License.

MalwareInfectionSet We discover (and, to the best of our knowledge, document scientifically for the first time) that malware networks appear to dump their data collections online. We collected these infostealer malware logs available for free. We utilise 245 malware log dumps from 2019 and 2020 originating from 14 malware networks. The dataset contains 1.8 million victim files, with a dataset size of 15 GB.

VictimAccessSet We demonstrate how infostealer malware networks sell access to infected victims. Genesis Market focuses on user-friendliness and a continuous supply of compromised data. Marketplace listings include everything necessary to gain access to the victim's online accounts, including passwords and usernames, as well as a detailed collection of information that provides a clone of the victim's browser session. Indeed, Genesis Market simplifies the import of compromised victim authentication data into a web browser session. We measure the prices on Genesis Market and how compromised device prices are determined. We crawled the website between April 2019 and May 2022, collecting the web pages offering the resources for sale. The dataset contains 0.5 million victim files, with a dataset size of 3.5 GB.

AccountAccessSet The Database marketplace operates inside the anonymous Tor network. Vendors offer their goods for sale, and customers can purchase them with Bitcoins. The marketplace sells online accounts, such as PayPal and Spotify, as well as private datasets, such as driver's licence photographs and tax forms. We then collect data from Database Market, where vendors sell online credentials, and investigate similarly. To build our dataset, we crawled the website between November 2021 and June 2022, collecting the web pages offering the credentials for sale. The dataset contains 33,896 victim files, with a dataset size of 400 MB.
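Because each victim is represented by a single JSON file, a first look at any of the three datasets can be a simple directory walk. The following is a minimal sketch, not the authors' scripts: the function name `summarize_victim_files` and the directory path are hypothetical, and the sketch makes no assumption about field names, tallying only whatever top-level keys it finds. The README.md files and Python scripts shipped with the datasets remain the reference for the actual layout.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_victim_files(root):
    """Count JSON files under root and tally their top-level keys.

    Hypothetical helper: it assumes only that each victim is stored as
    one .json file somewhere below `root`.
    """
    key_counts = Counter()
    n_files = 0
    n_skipped = 0
    for path in sorted(Path(root).rglob("*.json")):
        try:
            with path.open(encoding="utf-8") as fh:
                record = json.load(fh)
        except (json.JSONDecodeError, UnicodeDecodeError):
            # Unreadable or malformed files are counted, not fatal.
            n_skipped += 1
            continue
        n_files += 1
        if isinstance(record, dict):
            key_counts.update(record.keys())
    return n_files, n_skipped, key_counts
```

Calling `summarize_victim_files("MalwareInfectionSet")` (path assumed) returns the number of parsed files, the number skipped, and a `Counter` of top-level keys, which gives a quick sanity check against the file counts quoted above.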

Credits Authors

Billy Bob Brumley (Tampere University, Tampere, Finland)

Juha Nurmi (Tampere University, Tampere, Finland)

Mikko Niemelä (Cyber Intelligence House, Singapore)

Funding

This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme under project numbers 804476 (SCARE) and 952622 (SPIRS).

Alternative links to download: AccountAccessSet, MalwareInfectionSet, and VictimAccessSet.
Enterprises that identified an Internet security breach, by industry and...
open.canada.ca
www150.statcan.gc.ca
csv, html, xml
Updated Jan 17, 2023
Statistics Canada (2023). Enterprises that identified an Internet security breach, by industry and size of enterprise [Dataset]. https://open.canada.ca/data/en/dataset/546b13e4-de44-41b1-a05b-2fede8b8e40f
Available download formats: csv, html, xml
Dataset updated
Jan 17, 2023
Dataset provided by
Statistics Canada: https://statcan.gc.ca/en
License
Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
Description
Survey of digital technology and Internet use, by enterprises that identified an Internet security breach, North American Industry Classification System (NAICS) and size of enterprise for Canada in 2013.
Website Fingerprinting Dataset of Browsing Network Traffic for Desktop and...
ieee-dataport.org
Updated Oct 21, 2024
Mohamad Amar Irsyad Mohd Aminuddin (2024). Website Fingerprinting Dataset of Browsing Network Traffic for Desktop and Mobile Webpages [Dataset]. https://ieee-dataport.org/documents/website-fingerprinting-dataset-browsing-network-traffic-desktop-and-mobile-webpages
Dataset updated
Oct 21, 2024
Authors
Mohamad Amar Irsyad Mohd Aminuddin
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is a dataset of Tor cell files extracted from browsing simulations using Tor Browser. The simulations cover both desktop and mobile webpages. The data were collected using the WFP-Collector tool (https://github.com/irsyadpage/WFP-Collector). All the necessary configuration to perform the simulation is detailed in the tool repository. The webpage URLs are the first 100 websites listed at https://dataforseo.com/free-seo-stats/top-1000-websites. Each webpage URL is visited 90 times in each of the desktop and mobile browsing modes.

Number of data compromises and impacted individuals in U.S. 2005-2024

163 scholarly articles cite this dataset (View in Google Scholar)

