100+ datasets found

Car-Hacking Dataset
kaggle.com
zip
Updated Feb 3, 2025
Cite
Pranav J (2025). Car-Hacking Dataset [Dataset]. https://www.kaggle.com/datasets/pranavjha24/car-hacking-dataset
Available download formats: zip (137852368 bytes)
Dataset updated
Feb 3, 2025
Authors
Pranav J
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
Dataset Description

Title: Car Hacking: CAN Intrusion Detection Dataset (Mirror)

Source: Original dataset curated by the Hacking and Countermeasure Research Lab (HCRL) at Korea University

This dataset captures Controller Area Network (CAN) bus traffic from a real vehicle, simulating both normal driving scenarios and four types of cyberattacks:

Denial-of-Service (DoS)

Fuzzy/Flooding Attacks

Spoofing Attacks (RPM/Gear/Speed gauge manipulation)

Replay Attacks

It is widely used to develop machine learning models for detecting intrusions in automotive communication systems.

Dataset Structure

Files:

normal.csv: Benign CAN traffic (2.1M messages).

attack_DoS.csv, attack_Fuzzy.csv, attack_spoofing.csv, attack_replay.csv: Attack-specific logs.

Features:

Timestamp, CAN ID, Data Length Code (DLC), Data (hexadecimal payload).

Label (0 for normal, 1 for attack).

Usage Examples

Train ML/DL models (e.g., Random Forest, LSTM) for real-time CAN intrusion detection.

Benchmark automotive cybersecurity solutions.

Study CAN protocol vulnerabilities and attack patterns.

Citation

If you use this dataset, cite the original work:

Publications:

Song, Hyun Min, Jiyoung Woo, and Huy Kang Kim. "In-vehicle network intrusion detection using deep convolutional neural network." Vehicular Communications 21 (2020): 100198.

Seo, Eunbi, Hyun Min Song, and Huy Kang Kim. "GIDS: GAN based Intrusion Detection System for In-Vehicle Network." 2018 16th Annual Conference on Privacy, Security and Trust (PST). IEEE, 2018.

This Kaggle dataset is a mirror of the original HCRL work.

For reproducibility, include the above citation in publications or projects using this data.
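
The feature list above (Timestamp, CAN ID, DLC, hexadecimal Data, Label) could be read with a small parser along the following lines. This is only a sketch: the comma-separated layout, the column order, the hexadecimal CAN ID, the space-separated payload bytes, and the sample row are all assumptions made for illustration and are not taken from the actual CSV files.

```python
from dataclasses import dataclass


@dataclass
class CanFrame:
    timestamp: float   # capture time in seconds
    can_id: int        # arbitration ID, parsed from hexadecimal text
    dlc: int           # Data Length Code: number of payload bytes
    payload: bytes     # raw payload decoded from hexadecimal
    label: int         # 0 for normal, 1 for attack


def parse_row(row: str) -> CanFrame:
    """Parse one record in the assumed 'ts,id,dlc,data,label' layout."""
    ts, can_id, dlc, data, label = [field.strip() for field in row.split(",")]
    payload = bytes.fromhex(data.replace(" ", ""))
    if len(payload) != int(dlc):
        raise ValueError(f"DLC {dlc} does not match {len(payload)} payload bytes")
    return CanFrame(float(ts), int(can_id, 16), int(dlc), payload, int(label))


# Hypothetical sample row, invented for this sketch.
sample = "1478198376.389427,0316,8,05 21 68 09 21 21 00 6f,0"
frame = parse_row(sample)
print(frame)
```

Checking the DLC against the decoded payload length is a cheap sanity test that catches misaligned columns before any model training starts.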
Number of data compromises and impacted individuals in U.S. 2005-2024
statista.com
Cite
Statista, Number of data compromises and impacted individuals in U.S. 2005-2024 [Dataset]. https://www.statista.com/statistics/273550/data-breaches-recorded-in-the-united-states-by-number-of-breaches-and-records-exposed/
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
United States
Description
In 2024, the number of data compromises in the United States stood at 3,158 cases. Meanwhile, over 1.35 billion individuals were affected in the same year by data compromises, including data breaches, leakage, and exposure. While these are three different events, they have one thing in common: in all three, sensitive data is accessed by an unauthorized threat actor.

Industries most vulnerable to data breaches

Some industry sectors usually see more significant cases of private data violations than others. This is determined by the type and volume of the personal information that organizations in these sectors store. In 2024, financial services, healthcare, and professional services were the three industry sectors that recorded the most data breaches. Overall, the number of healthcare data breaches in some industry sectors in the United States has gradually increased within the past few years. However, some sectors saw a decrease.

Largest data exposures worldwide

In 2020, an adult streaming website, CAM4, experienced a leakage of nearly 11 billion records. This is by far the most extensive reported data leakage. This case, though, is unique because cyber security researchers found the vulnerability before the cyber criminals did. The second-largest data breach is the Yahoo data breach, dating back to 2013. The company first reported about one billion exposed records, then later, in 2017, updated the number of leaked records to three billion. In March 2018, the third-biggest data breach happened, involving India's national identification database, Aadhaar. As a result of this incident, over 1.1 billion records were exposed.
U.S. health data breaches caused by hacking 2014 - H1 2024
statista.com
Updated Nov 28, 2025
Cite
Statista (2025). U.S. health data breaches caused by hacking 2014 - H1 2024 [Dataset]. https://www.statista.com/statistics/972228/health-data-breaches-caused-by-hacking-us/
Dataset updated
Nov 28, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
United States
Description
In the first half of 2024, the share of health-related U.S. data breaches caused by hacking was ** percent, which marked a *** percent increase from 2023, reaching its highest rate since 2014.

Malicious Server Hack

kaggle.com

zip

Updated Oct 10, 2020

Cite

LPLenka (2020). Malicious Server Hack [Dataset]. https://www.kaggle.com/lplenka/malicious-server-hack

Available download formats: zip (596537 bytes)

Dataset updated

Oct 10, 2020

Authors

LPLenka

License

https://creativecommons.org/publicdomain/zero/1.0/

Description

Context

Countries across the globe have adopted digital payments, and with the increased volume of digital payments, hacking has become a common event: a hacker can try to steal your details using just the phone number linked to your bank account. However, there is data with some anonymized variables from which one can predict that a hack is going to happen.

Your task is to build a predictive model that identifies a pattern in these variables and signals that a hack is going to happen, so that the cybersecurity team can stop it before it actually happens. You have to predict the column MALICIOUS_OFFENSE.

Content

Column	Description
INCIDENT_ID	Unique identifier for an incident log
DATE	Date of incident occurrence
X_1 - X_15	Anonymized logging parameters
MALICIOUS_OFFENSE	[Target] Indicates if the incident was a hack [1: Yes; 0: No]

Data from: Health IT, hacking, and cybersecurity: national trends in data...
search.dataone.org
datadryad.org
Updated Apr 6, 2025
Cite
Jay G. Ronquillo; J. Erik Winterholler; Kamil Cwikla; Raphael Szymanski; Christopher Levy (2025). Health IT, hacking, and cybersecurity: national trends in data breaches of protected health information [Dataset]. http://doi.org/10.5061/dryad.24275c6
Unique identifier
https://doi.org/10.5061/dryad.24275c6
Dataset updated
Apr 6, 2025
Dataset provided by
Dryad Digital Repository
Authors
Jay G. Ronquillo; J. Erik Winterholler; Kamil Cwikla; Raphael Szymanski; Christopher Levy
Time period covered
May 25, 2019
Description
Objective: The rapid adoption of health information technology (IT), coupled with growing reports of ransomware and hacking, has made cybersecurity a priority in health care. This study leverages federal data to better understand current cybersecurity threats in the context of health IT.

Materials and Methods: Retrospective observational study of all available reported data breaches in the United States from 2013 to 2017, downloaded from a publicly available federal regulatory database.

Results: There were 1512 data breaches affecting 154 415 257 patient records from a heterogeneous distribution of covered entities (P < .001). There were 128 electronic medical record-related breaches of 4 867 920 patient records, while 363 hacking incidents affected 130 702 378 records.

Discussion and Conclusion: Despite making up less than 25% of all breaches, hacking was responsible for nearly 85% of all affected patient records. As medicine becomes increasingly interconnected and ...
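
The two percentages in the conclusion can be checked directly against the counts reported in the Results paragraph. The short calculation below is only a reader's sanity check using those published figures; the variable names are chosen here for illustration.

```python
# Counts as reported in the Results paragraph above.
total_breaches = 1512
total_records = 154_415_257
hacking_breaches = 363
hacking_records = 130_702_378

# Share of breach incidents and of affected records attributable to hacking.
share_of_breaches = hacking_breaches / total_breaches
share_of_records = hacking_records / total_records

print(f"Hacking share of breaches: {share_of_breaches:.1%}")
print(f"Hacking share of records:  {share_of_records:.1%}")
```

Both results are consistent with the abstract: hacking accounts for just under a quarter of incidents but roughly 85% of affected records.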
Hacker News Cumulative Data
kaggle.com
zip
Updated May 7, 2025
Cite
NexoCodAI (2025). Hacker News Cumulative Data [Dataset]. https://www.kaggle.com/datasets/nexocodai/automated-hacker-news-dataset-cumulative
Available download formats: zip (175826 bytes)
Dataset updated
May 7, 2025
Authors
NexoCodAI
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Introducing the Hacker News Top Stories Cumulative Dataset, a meticulously curated collection of top-ranked articles from the Hacker News community. This dataset offers a comprehensive view of the most discussed and impactful stories, making it an invaluable resource for researchers, data scientists, and enthusiasts interested in analyzing trends in technology, startups, and digital culture.

Key Features:

Comprehensive Coverage: Aggregates top stories over time, providing a historical perspective on trending topics and discussions within the tech community.

Rich Metadata: Each entry includes detailed information such as the story title, URL, author, publication date, score, number of comments, and the date it was featured as a top story.

Consistent Updates: The dataset is updated daily, ensuring users have access to the latest information and trends.

Potential Use Cases:

Trend Analysis: Identify and analyze emerging topics and shifts in the tech industry over time.

Sentiment Analysis: Examine community reactions and sentiments towards specific events or announcements.

Content Recommendation Systems: Develop algorithms to recommend articles based on popularity and user engagement metrics.

Sociological Research: Study the dynamics of online communities and the dissemination of information.

Data Structure:

The dataset is presented in CSV format with the following columns:

story_id: Unique identifier for each story.

title: The headline of the story.

url: Direct link to the content.

author: Username of the individual who submitted the story.

created_at: Timestamp of when the story was published.

points: Score indicating the story's popularity.

num_comments: Number of comments the story received.

scrape_date: Date when the story was added to the dataset.

Data Source:

All data is sourced from the Hacker News API, ensuring accuracy and reliability.

Licensing and Usage:

This dataset is released under the CC0 1.0 Universal (CC0 1.0) Public Domain Dedication, allowing for unrestricted use in both commercial and non-commercial projects.

Get Started:

To access the dataset and explore its potential applications, visit the Hacker News Top Stories Cumulative Dataset.

Stay ahead of the curve by leveraging this dataset to gain insights into the ever-evolving landscape of technology and innovation.
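
As a rough illustration of working with the columns listed under Data Structure, the sketch below ranks stories by points. It assumes that a cumulative file may list the same story_id on several scrape dates and keeps only the most recent snapshot; that assumption, the ISO-style date strings, and the sample rows are all invented for this example rather than taken from the dataset.

```python
import csv
import io

# Hypothetical rows using the documented column names.
SAMPLE = """story_id,title,url,author,created_at,points,num_comments,scrape_date
1,Example A,https://example.com/a,alice,2025-05-01T10:00:00,120,45,2025-05-02
2,Example B,https://example.com/b,bob,2025-05-01T12:00:00,300,80,2025-05-02
1,Example A,https://example.com/a,alice,2025-05-01T10:00:00,150,60,2025-05-03
"""


def latest_snapshot(rows):
    """Keep one row per story_id, preferring the most recent scrape_date."""
    latest = {}
    for row in rows:
        story_id = row["story_id"]
        # ISO-formatted dates compare correctly as plain strings.
        if story_id not in latest or row["scrape_date"] > latest[story_id]["scrape_date"]:
            latest[story_id] = row
    return latest


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
stories = latest_snapshot(rows)
ranked = sorted(stories.values(), key=lambda r: int(r["points"]), reverse=True)
top_titles = [r["title"] for r in ranked]
print(top_titles)
```

Deduplicating before ranking avoids counting one story several times if it stayed on the front page across multiple scrapes.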
Global Password Hacking Software Market Risk Analysis 2025-2032
statsndata.org
excel, pdf
Updated Oct 2025
Cite
Stats N Data (2025). Global Password Hacking Software Market Risk Analysis 2025-2032 [Dataset]. https://www.statsndata.org/report/password-hacking-software-market-59196
Available download formats: pdf, excel
Dataset updated
Oct 2025
Dataset authored and provided by
Stats N Data
License
https://www.statsndata.org/how-to-order
Area covered
Global
Description
The Password Hacking Software market has evolved significantly in recent years as both a response to and a driver of increasing cybersecurity threats. This software is primarily utilized by security professionals and ethical hackers to assess the robustness of password security systems, identify vulnerabilities, and
Global data breaches caused by hacking 2023-2024, by industry
statista.com
Updated Sep 18, 2025
Cite
Statista (2025). Global data breaches caused by hacking 2023-2024, by industry [Dataset]. https://www.statista.com/statistics/1419277/data-breaches-hacking-by-industry/
Dataset updated
Sep 18, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Nov 1, 2023 - Oct 31, 2024
Area covered
Worldwide
Description
Between November 2023 and October 2024, organizations in the manufacturing sector worldwide saw around 818 incidents of data breaches caused by hacking. The healthcare industry ranked second, with 745 data breaches in the measured period. Furthermore, hacking caused 564 data breach incidents in the professional sector.
hackaprompt-dataset
huggingface.co
Updated Oct 23, 2023
Cite
hackaprompt (2023). hackaprompt-dataset [Dataset]. https://huggingface.co/datasets/hackaprompt/hackaprompt-dataset
Dataset updated
Oct 23, 2023
Dataset authored and provided by
hackaprompt
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
Dataset Card for HackAPrompt 💻🔍

This dataset contains submissions from a prompt hacking competition. An in-depth analysis of the dataset has been accepted at the EMNLP 2023 conference. 📊👾 Submissions were sourced from two environments: a playground for experimentation and an official submissions platform. The playground itself can be accessed here 🎮 More details about the competition itself here 🏆

Dataset Details 📋 Dataset Description 📄

We… See the full description on the dataset page: https://huggingface.co/datasets/hackaprompt/hackaprompt-dataset.
Cyber Security
kaggle.com
zip
Updated Jan 29, 2024
Cite
Rishi Kumar (2024). Cyber Security [Dataset]. https://www.kaggle.com/datasets/rishikumarrajvansh/cyber-security
Available download formats: zip (8913512 bytes)
Dataset updated
Jan 29, 2024
Authors
Rishi Kumar
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Business Context: We are in a time where businesses are more digitally advanced than ever, and as technology improves, organizations’ security postures must be enhanced as well. Failure to do so could result in a costly data breach, as we’ve seen happen with many businesses. The cybercrime landscape has evolved, and threat actors are going after any type of organization, so in order to protect your business’s data, money and reputation, it is critical that you invest in an advanced security system. Cyber security can be described as the collective methods, technologies, and processes to help protect the confidentiality, integrity, and availability of computer systems, networks and data, against cyber-attacks or unauthorized access.

Information Security vs. Cyber Security vs. Network Security

Information security (also known as InfoSec) ensures that both physical and digital data is protected from unauthorized access, use, disclosure, disruption, modification, inspection, recording or destruction. Information security differs from cyber security in that InfoSec aims to keep data in any form secure, whereas cyber security protects only digital data.

Cyber security, a subset of information security, is the practice of defending your organization’s networks, computers and data from unauthorized digital access, attack or damage by implementing various processes, technologies and practices. With the countless sophisticated threat actors targeting all types of organizations, it is critical that your IT infrastructure is secured at all times to prevent a full-scale attack on your network and the risk of exposing your company’s data and reputation.

Network security, a subset of cyber security, aims to protect any data that is being sent through devices in your network to ensure that the information is not changed or intercepted. The role of network security is to protect the organization’s IT infrastructure from all types of cyber threats, including:

a. Viruses, worms and Trojan horses

b. Zero-day attacks

c. Hacker attacks

d. Denial of service attacks

e. Spyware and adware

Your network security team implements the hardware and software necessary to guard your security architecture. With the proper network security in place, your system can detect emerging threats before they infiltrate your network and compromise your data. There are many components to a network security system that work together to improve your security posture. The most common network security components include:

a. Firewalls

b. Anti-virus software

c. Intrusion detection and prevention systems (IDS/IPS)

d. Virtual private networks (VPN)

Network Intrusions vs. Computer Intrusions vs. Cyber Attacks

1. Computer Intrusions: Computer intrusions occur when someone tries to gain access to any part of your computer system. Computer intruders or hackers typically use automated computer programs when they try to compromise a computer’s security. There are several ways an intruder can try to gain access to your computer. They can:

a. Access your computer to view, change, or delete information on it

b. Crash or slow down your computer

c. Access your private data by examining the files on your system

d. Use your computer to access other computers on the Internet

2. Network Intrusions: A network intrusion refers to any unauthorized activity on a digital network. Network intrusions often involve stealing valuable network resources and almost always jeopardize the security of networks and/or their data. In order to proactively detect and respond to network intrusions, organizations and their cyber security teams need to have a thorough understanding of how network intrusions work and implement network intrusion detection and response systems that are designed with attack techniques and cover-up methods in mind.

Network Intrusion Attack Techniques: Given the amount of normal activity constantly taking place on digital networks, it can be very difficult to pinpoint anomalies that could indicate a network intrusion has occurred. Below are some of the most common network intrusion attack techniques that organizations should continually look for:

Living Off the Land: Attackers increasingly use existing tools and processes and stolen credentials when compromising networks. These tools, such as operating system utilities, business productivity software and scripting languages, are clearly not malware and have very legitimate usage as well. In fact, in most cases, the vast majority of the usage is business justified, allowing an attacker to blend in.

Multi-Routing: If a network allows for asymmetric routing, attackers will often leverage multiple routes to access the targeted device or network. This allows them to avoid being detected by having a large portion of suspicious packets bypass certain network segments and any relevant network intrusion systems.

Buffer Overwrit...
Joint Report On Publicly Available Hacking Tools - Catalogue - Canadian...
data.urbandatacentre.ca
Updated Oct 19, 2025
+ more versions
Cite
(2025). Joint Report On Publicly Available Hacking Tools - Catalogue - Canadian Urban Data Catalogue (CUDC) [Dataset]. https://data.urbandatacentre.ca/dataset/gov-canada-b83288ca-8c65-431f-b85f-ffa26f975d45
Dataset updated
Oct 19, 2025
License
Open Government Licence - Canada 2.0 (https://open.canada.ca/en/open-government-licence-canada)
License information was derived automatically
Area covered
Canada
Description
"This report is a collaborative research effort by the cyber security authorities of five nations: Australia, Canada, New Zealand, the UK and USA. In it we highlight the use of five publicly-available tools, which have been used for malicious purposes in recent cyber incidents around the world. To aid the work of network defenders and systems administrators, we also provide advice on limiting the effectiveness of these tools and detecting their use on a network."
Cybersecurity Breaches Information 2010-2023
kaggle.com
zip
Updated Dec 11, 2023
Cite
Hello_world 3112 (2023). Cybersecurity Breaches Information 2010-2023 [Dataset]. https://www.kaggle.com/datasets/sumanth3112/hello-world
Available download formats: zip (43549 bytes)
Dataset updated
Dec 11, 2023
Authors
Hello_world 3112
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Each row in the dataset represents a specific data breach incident. Here's an explanation of the columns in the dataset:

Number: An identifier for each data breach incident.

Name_of_Covered_Entity: The name of the organization or entity that experienced the data breach.

Business_Associate_Involved: Information about whether a business associate was involved in the breach.

Total_Individuals: The total number of individuals affected by the breach.

Individuals_Affected: The number of individuals whose information was compromised.

Type_of_Breach: The method or nature of the data breach (e.g., theft, loss, hacking/IT incident, unauthorized access/disclosure).

Location_of_Breached_Information: The location or type of device where the breached information was stored (e.g., laptop, desktop computer, network server).

Breach_Start: The start date of the data breach.

Breach_End: The end date of the data breach.

Branch: A categorical identifier, possibly indicating a specific branch or division of the organization.

Department: A categorical identifier, possibly indicating a specific department within the organization.

CountryBranch: The country associated with the branch.

Employee(who find out breach): The employee who discovered the breach.

Employee URL: A URL link associated with the employee who discovered the breach.

Estimate Stole Data(GB): An estimate of the amount of data stolen in gigabytes.
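
One quantity that can be derived from the Breach_Start and Breach_End columns described above is the duration of each breach. The helper below is a minimal sketch that assumes both dates are stored as ISO strings (YYYY-MM-DD); the listing does not state the actual date format, so the parsing step may need adjusting, and the function name is hypothetical.

```python
from datetime import date


def breach_duration_days(breach_start: str, breach_end: str) -> int:
    """Return the number of days between two ISO-formatted dates."""
    start = date.fromisoformat(breach_start)
    end = date.fromisoformat(breach_end)
    if end < start:
        raise ValueError("Breach_End precedes Breach_Start")
    return (end - start).days


# Hypothetical values, not taken from the dataset.
duration = breach_duration_days("2023-01-10", "2023-02-09")
print(duration)
```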
Password Hacking Software Report
datainsightsmarket.com
doc, pdf, ppt
Updated Jul 15, 2025
Cite
Data Insights Market (2025). Password Hacking Software Report [Dataset]. https://www.datainsightsmarket.com/reports/password-hacking-software-1982234
Available download formats: pdf, ppt, doc
Dataset updated
Jul 15, 2025
Dataset authored and provided by
Data Insights Market
License
https://www.datainsightsmarket.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
Explore the booming password hacking software market, projected to reach $6.1 billion by 2033. This in-depth analysis covers market size, growth drivers, key players (Hashcat, John the Ripper, Burp Suite), and regional trends. Learn about ethical hacking, penetration testing, and the evolving landscape of cybersecurity.
Ethical Hacking Certification Report
datainsightsmarket.com
doc, pdf, ppt
Updated Sep 28, 2025
Cite
Data Insights Market (2025). Ethical Hacking Certification Report [Dataset]. https://www.datainsightsmarket.com/reports/ethical-hacking-certification-1988933
Available download formats: doc, ppt, pdf
Dataset updated
Sep 28, 2025
Dataset authored and provided by
Data Insights Market
License
https://www.datainsightsmarket.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
Explore the booming Ethical Hacking Certification market, driven by rising cyber threats. Discover key insights, market size, CAGR, and trends impacting SMEs and enterprises globally through 2033.
Automotive Controller Area Network (CAN) Bus Intrusion Dataset v2
figshare.com
data.4tu.nl
zip
Updated Jul 28, 2020
+ more versions
Cite
Guillaume Dupont; Alexios Lekidis; J. (Jerry) den Hartog; S. (Sandro) Etalle (2020). Automotive Controller Area Network (CAN) Bus Intrusion Dataset v2 [Dataset]. http://doi.org/10.4121/uuid:b74b4928-c377-4585-9432-2004dfa20a5d
Available download formats: zip
Unique identifier
https://doi.org/10.4121/uuid:b74b4928-c377-4585-9432-2004dfa20a5d
Dataset updated
Jul 28, 2020
Dataset provided by
4TU.ResearchData
Authors
Guillaume Dupont; Alexios Lekidis; J. (Jerry) den Hartog; S. (Sandro) Etalle
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
This dataset contains automotive Controller Area Network (CAN) bus data from three systems: two cars (Opel Astra and Renault Clio) and a CAN bus prototype we built ourselves. It is meant for evaluating CAN bus Network Intrusion Detection Systems (NIDS). For each system, the dataset consists of a collection of log files captured from its CAN bus: normal (attack-free) data for training and testing detection algorithms, and different CAN bus attacks (Diagnostic, Fuzzing, Replay, Suspension, and Denial-of-Service attacks).
Data from: The extent and consequences of p-hacking in science
data-staging.niaid.nih.gov
researchdata.edu.au
+3more
zip
Updated Feb 24, 2016
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Megan L. Head; Luke Holman; Rob Lanfear; Andrew T. Kahn; Michael D. Jennions (2016). The extent and consequences of p-hacking in science [Dataset]. http://doi.org/10.5061/dryad.79d43
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.79d43
Dataset updated
Feb 24, 2016
Dataset provided by
Australian National University
Authors
Megan L. Head; Luke Holman; Rob Lanfear; Andrew T. Kahn; Michael D. Jennions
License
https://spdx.org/licenses/CC0-1.0.html
Description
A focus on novel, confirmatory, and statistically significant results leads to substantial bias in the scientific literature. One type of bias, known as “p-hacking,” occurs when researchers collect or select data or statistical analyses until nonsignificant results become significant. Here, we use text-mining to demonstrate that p-hacking is widespread throughout science. We then illustrate how one can test for p-hacking when performing a meta-analysis and show that, while p-hacking is probably common, its effect seems to be weak relative to the real effect sizes being measured. This result suggests that p-hacking probably does not drastically alter scientific consensuses drawn from meta-analyses.
Data Breaches
kaggle.com
zip
Updated Nov 10, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
The Devastator (2022). Data Breaches [Dataset]. https://www.kaggle.com/datasets/thedevastator/data-breaches-a-comprehensive-list/discussion
Available download formats: zip (9067 bytes)
Dataset updated
Nov 10, 2022
Authors
The Devastator
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Data Breaches Dataset

30,000 Records of cyber-security data breaches

About this dataset

This dataset is a compilation of data from various sources detailing data breaches. These sources include press reports, government news releases, and mainstream news articles. The list includes those involving the theft or compromise of 30,000 or more records, although many smaller breaches occur continually. In addition, the various methods used in the breaches are listed, with hacking being the most common.

Organizations of all types and sizes are susceptible to data breaches, which can have devastating consequences. This dataset can help shed light on which organizations are most at risk and how these breaches occur so that steps can be taken to prevent them in the future.

How to use the dataset

There are many ways to use this dataset. Here are a few ideas:

Use the data to understand which types of organizations are most commonly breached, and what methods are used most often.

Analyze the data to see if there are any trends or patterns in when or how breaches occur.

Use the data to create a visualization or infographic showing the prevalence of data breaches.

Research Ideas

This dataset can be used to identify trends in data breaches in terms of methods used, types of organizations breached, and geographical distribution.

This dataset can be used to study the effect of data breaches on organizational reputation and customer trust.

This dataset can be used by organizations to benchmark their own security measures against those of similar organizations that have experienced data breaches.

Acknowledgements

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

Columns

File: df_1.csv

| Column name | Description |
|:----------------------|:---------------------------------------------------------------------|
| Entity | The name of the organization that was breached. (String) |
| Year | The year when the breach occurred. (Integer) |
| Records | The number of records that were compromised in the breach. (Integer) |
| Organization type | The type of organization that was breached. (String) |
| Method | The method that was used to breach the organization. (String) |
| Sources | The sources from which the data was collected. (String) |
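
The first usage idea above, finding which breach methods are most common, might be approached along these lines. The sketch sums the Records column per Method using the documented column names; the sample rows and the method labels in them are invented for illustration, and the real file may contain non-numeric Records values that would need cleaning first.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows using the documented df_1.csv columns.
SAMPLE = """Entity,Year,Records,Organization type,Method,Sources
Example Corp,2015,50000,retail,hacked,[1]
Example Health,2016,120000,healthcare,lost / stolen media,[2]
Example Bank,2016,75000,financial,hacked,[3]
"""


def records_by_method(csv_text: str) -> dict:
    """Sum compromised records for each breach method."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["Method"]] += int(row["Records"])
    return dict(totals)


totals = records_by_method(SAMPLE)
for method, records in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{method}: {records}")
```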
Ethical Hacking Service Report
datainsightsmarket.com
doc, pdf, ppt
Updated Jan 28, 2025
Cite
Data Insights Market (2025). Ethical Hacking Service Report [Dataset]. https://www.datainsightsmarket.com/reports/ethical-hacking-service-1966542
Available download formats: doc, ppt, pdf
Dataset updated
Jan 28, 2025
Dataset authored and provided by
Data Insights Market
License
https://www.datainsightsmarket.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The size of the Ethical Hacking Service market was valued at USD XXX million in 2024 and is projected to reach USD XXX million by 2033, with an expected CAGR of XX% during the forecast period.
Data from: Malware Finances and Operations: a Data-Driven Study of the Value...
zenodo.org
data.niaid.nih.gov
+1more
zip
Updated Jun 20, 2023
Cite
Juha Nurmi; Mikko Niemelä; Billy Brumley (2023). Malware Finances and Operations: a Data-Driven Study of the Value Chain for Infections and Compromised Access [Dataset]. http://doi.org/10.5281/zenodo.8047205
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.8047205
Dataset updated
Jun 20, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Juha Nurmi; Mikko Niemelä; Billy Brumley
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The datasets demonstrate the malware economy and the value chain published in our paper, Malware Finances and Operations: a Data-Driven Study of the Value Chain for Infections and Compromised Access, at the 12th International Workshop on Cyber Crime (IWCC 2023), part of the ARES Conference, published by the International Conference Proceedings Series of the ACM ICPS.

Using the well-documented scripts, it is straightforward to reproduce our findings. It takes an estimated 1 hour of human time and 3 hours of computing time to duplicate our key findings from MalwareInfectionSet; around one hour with VictimAccessSet; and minutes to replicate the price calculations using AccountAccessSet. See the included README.md files and Python scripts.

We choose to represent each victim by a single JavaScript Object Notation (JSON) data file. Data sources provide sets of victim JSON data files from which we've extracted the essential information and omitted Personally Identifiable Information (PII). We collected, curated, and modelled three datasets, which we publish under the Creative Commons Attribution 4.0 International License.

1. MalwareInfectionSet
We discover (and, to the best of our knowledge, document scientifically for the first time) that malware networks appear to dump their data collections online. We collected these infostealer malware logs available for free. We utilise 245 malware log dumps from 2019 and 2020 originating from 14 malware networks. The dataset contains 1.8 million victim files, with a dataset size of 15 GB.

2. VictimAccessSet
We demonstrate how infostealer malware networks sell access to infected victims. Genesis Market focuses on user-friendliness and a continuous supply of compromised data. Marketplace listings include everything necessary to gain access to the victim's online accounts: passwords and usernames, as well as a detailed collection of information that provides a clone of the victim's browser session. Indeed, Genesis Market simplifies the import of compromised victim authentication data into a web browser session. We measure the prices on Genesis Market and how compromised device prices are determined. We crawled the website between April 2019 and May 2022, collecting the web pages offering the resources for sale. The dataset contains 0.5 million victim files, with a dataset size of 3.5 GB.

3. AccountAccessSet
We then collect data from Database Market, where vendors sell online credentials, and investigate it in a similar way. The Database marketplace operates inside the anonymous Tor network. Vendors offer their goods for sale, and customers can purchase them with Bitcoin. The marketplace sells online accounts, such as PayPal and Spotify, as well as private datasets, such as driver's licence photographs and tax forms. To build our dataset, we crawled the website between November 2021 and June 2022, collecting the web pages offering the credentials for sale. The dataset contains 33,896 victim files, with a dataset size of 400 MB.

Credits

Authors

Billy Bob Brumley (Tampere University, Tampere, Finland)

Juha Nurmi (Tampere University, Tampere, Finland)

Mikko Niemelä (Cyber Intelligence House, Singapore)

Funding

This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme under project numbers 804476 (SCARE) and 952622 (SPIRS).

Alternative links to download: AccountAccessSet, MalwareInfectionSet, and VictimAccessSet.
Global number of breached user accounts Q1 2020-Q3 2025
statista.com
Updated Oct 14, 2025
Cite
Statista (2025). Global number of breached user accounts Q1 2020-Q3 2025 [Dataset]. https://www.statista.com/statistics/1307426/number-of-data-breaches-worldwide/
Explore at:
Dataset updated
Oct 14, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
Worldwide
Description
During the third quarter of 2025, data breaches exposed more than ** million records worldwide. Since the first quarter of 2020, the highest number of data records were exposed in the third quarter of ****, more than **** billion data sets. Data breaches remain among the biggest concerns of company leaders worldwide. The most common causes of sensitive information loss were operating system vulnerabilities on endpoint devices.

Which industries see the most data breaches?

Meanwhile, certain conditions make some industry sectors more prone to data breaches than others. According to the latest observations, the public administration experienced the highest number of data breaches between 2021 and 2022. The industry saw *** reported data breach incidents with confirmed data loss. The second were financial institutions, with *** data breach cases, followed by healthcare providers.

Data breach cost

Data breach incidents have various consequences, the most common impact being financial losses and business disruptions. As of 2023, the average data breach cost across businesses worldwide was **** million U.S. dollars. Meanwhile, a leaked data record cost about *** U.S. dollars. The United States saw the highest average breach cost globally, at **** million U.S. dollars.

Cite

Pranav J (2025). Car-Hacking Dataset [Dataset]. https://www.kaggle.com/datasets/pranavjha24/car-hacking-dataset

Car-Hacking Dataset

Car-Hacking Dataset for the intrusion detection

Explore at:

Available download formats: zip (137852368 bytes)

Dataset updated

Feb 3, 2025

Authors

Pranav J

License

MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically

Description

Dataset Description

Title: Car Hacking: CAN Intrusion Detection Dataset (Mirror)

Source: Original dataset curated by the Hacking and Countermeasure Research Lab (HCRL) at Korea University

This dataset captures Controller Area Network (CAN) bus traffic from a real vehicle, simulating both normal driving scenarios and four types of cyberattacks:

Denial-of-Service (DoS)

Fuzzy/Flooding Attacks

Spoofing Attacks (RPM/Gear/Speed gauge manipulation)

Replay Attacks

It is widely used to develop machine learning models for detecting intrusions in automotive communication systems.

Dataset Structure

Files:

normal.csv: Benign CAN traffic (2.1M messages).

attack_DoS.csv, attack_Fuzzy.csv, attack_spoofing.csv, attack_replay.csv: Attack-specific logs.

Features:

Timestamp, CAN ID, Data Length Code (DLC), Data (hexadecimal payload).

Label (0 for normal, 1 for attack).
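
The feature list above can be turned into a small parser. The following is a minimal sketch, not the dataset's official loader: it assumes each CSV row holds the five fields in the order listed (timestamp, CAN ID in hex, DLC, hex payload, label) with no header row, and the `CanFrame` type, the helper names, and the two sample rows are hypothetical and purely illustrative. Check the actual files before relying on this layout.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class CanFrame:
    """One CAN message (hypothetical container, not part of the dataset)."""
    timestamp: float  # capture time as given in the log
    can_id: int       # CAN identifier, parsed from hexadecimal
    dlc: int          # Data Length Code: number of payload bytes
    data: bytes       # payload decoded from the hexadecimal string
    label: int        # 0 for normal, 1 for attack


def parse_row(row):
    """Parse one row, ASSUMING the column order timestamp, CAN ID, DLC, data, label."""
    timestamp, can_id, dlc, data, label = (field.strip() for field in row)
    frame = CanFrame(
        timestamp=float(timestamp),
        can_id=int(can_id, 16),
        dlc=int(dlc),
        data=bytes.fromhex(data),
        label=int(label),
    )
    # Basic consistency checks: payload length must match DLC, label must be binary.
    if len(frame.data) != frame.dlc:
        raise ValueError(f"DLC {frame.dlc} does not match payload length {len(frame.data)}")
    if frame.label not in (0, 1):
        raise ValueError(f"unexpected label {frame.label}")
    return frame


def read_frames(text_stream):
    """Read all non-empty rows from a text stream of CSV data."""
    return [parse_row(row) for row in csv.reader(text_stream) if row]


# Synthetic rows for illustration only; they are not taken from the dataset.
SAMPLE = """\
1478198376.389427,0316,8,052168092168091b,0
1478198376.389636,0000,8,0000000000000000,1
"""

frames = read_frames(io.StringIO(SAMPLE))
attack_ratio = sum(f.label for f in frames) / len(frames)
print(frames[0].can_id, frames[0].data.hex(), attack_ratio)
```

To use this on a real file, pass an open file object (for example `open("normal.csv", newline="")`) to `read_frames`, and adjust `parse_row` if the file has a header row or a different column order.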

Usage Examples

Train ML/DL models (e.g., Random Forest, LSTM) for real-time CAN intrusion detection.

Benchmark automotive cybersecurity solutions.

Study CAN protocol vulnerabilities and attack patterns.

Citation

If you use this dataset, cite the original work:

Publication

Song, Hyun Min, Jiyoung Woo, and Huy Kang Kim. "In-vehicle network intrusion detection using deep convolutional neural network." Vehicular Communications 21 (2020): 100198.

Seo, Eunbi, Hyun Min Song, and Huy Kang Kim. "GIDS: GAN based Intrusion Detection System for In-Vehicle Network." 2018 16th Annual Conference on Privacy, Security and Trust (PST). IEEE, 2018.

This Kaggle dataset is a mirror of the original HCRL work.

For reproducibility, include the above citation in publications or projects using this data.

