MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Description
Title: Car Hacking: CAN Intrusion Detection Dataset (Mirror)
Source: Original dataset curated by the Hacking and Countermeasure Research Lab (HCRL) at Korea University
This dataset captures Controller Area Network (CAN) bus traffic from a real vehicle, simulating both normal driving scenarios and four types of cyberattacks:
Denial-of-Service (DoS)
Fuzzy/Flooding Attacks
Spoofing Attacks (RPM/Gear/Speed gauge manipulation)
Replay Attacks
It is widely used to develop machine learning models for detecting intrusions in automotive communication systems.
Dataset Structure
Files:
normal.csv: Benign CAN traffic (2.1M messages).
attack_DoS.csv, attack_Fuzzy.csv, attack_spoofing.csv, attack_replay.csv: Attack-specific logs.
Features:
Timestamp, CAN ID, Data Length Code (DLC), Data (hexadecimal payload).
Label (0 for normal, 1 for attack).
Usage Examples
Train ML/DL models (e.g., Random Forest, LSTM) for real-time CAN intrusion detection.
Benchmark automotive cybersecurity solutions.
Study CAN protocol vulnerabilities and attack patterns.
Citation
If you use this dataset, cite the original work:
Song, Hyun Min, Jiyoung Woo, and Huy Kang Kim. "In-vehicle network intrusion detection using deep convolutional neural network." Vehicular Communications 21 (2020): 100198.
Seo, Eunbi, Hyun Min Song, and Huy Kang Kim. "GIDS: GAN based Intrusion Detection System for In-Vehicle Network." 2018 16th Annual Conference on Privacy, Security and Trust (PST). IEEE, 2018.
This Kaggle dataset is a mirror of the original HCRL work.
For reproducibility, include the above citation in publications or projects using this data.
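The file layout described above can be read with a few lines of Python. This is a minimal sketch, assuming the features appear as CSV columns in the order listed and that the CAN ID and payload are hexadecimal strings; the header names (`Timestamp`, `CAN_ID`, `DLC`, `Data`, `Label`) and the sample rows are illustrative assumptions, not taken from the actual files.

```python
import csv
import io

# Hypothetical sample in the layout described above; real files may differ.
SAMPLE = """Timestamp,CAN_ID,DLC,Data,Label
1478198376.389427,0316,8,052168092121006f,0
1478198376.389636,0000,8,0000000000000000,1
1478198376.389864,018f,8,fe5b0000003c0000,0
"""

def parse_can_rows(text):
    """Parse CSV text into dicts with typed fields."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        rows.append({
            "timestamp": float(rec["Timestamp"]),
            "can_id": int(rec["CAN_ID"], 16),        # hex string -> integer ID
            "dlc": int(rec["DLC"]),                  # number of payload bytes
            "payload": bytes.fromhex(rec["Data"]),   # hex string -> raw bytes
            "label": int(rec["Label"]),              # 0 = normal, 1 = attack
        })
    return rows

rows = parse_can_rows(SAMPLE)
# Fraction of messages labelled as attack traffic in this sample.
attack_fraction = sum(r["label"] for r in rows) / len(rows)
```

Converting the payload to `bytes` makes it easy to check that its length matches the DLC before feeding rows to a model.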
In 2024, the number of data compromises in the United States stood at 3,158 cases. Meanwhile, over 1.35 billion individuals were affected in the same year by data compromises, including data breaches, leakage, and exposure. While these are three different events, they have one thing in common: in all three, sensitive data is accessed by an unauthorized threat actor.
Industries most vulnerable to data breaches
Some industry sectors usually see more significant cases of private data violations than others. This is determined by the type and volume of personal information that organizations in these sectors store. In 2024, financial services, healthcare, and professional services were the three industry sectors that recorded the most data breaches. Overall, the number of healthcare data breaches in some industry sectors in the United States has gradually increased within the past few years. However, some sectors saw a decrease.
Largest data exposures worldwide
In 2020, an adult streaming website, CAM4, experienced a leakage of nearly 11 billion records. This is by far the most extensive reported data leakage. The case is unusual, though, because cyber security researchers found the vulnerability before cyber criminals did. The second-largest data breach is the Yahoo data breach, dating back to 2013. The company first reported about one billion exposed records, then later, in 2017, updated the number of leaked records to three billion. In March 2018, the third-biggest data breach happened, involving India’s national identification database Aadhaar. As a result of this incident, over 1.1 billion records were exposed.
In the first half of 2024, the share of health-related U.S. data breaches caused by hacking was ** percent, which marked a *** percent increase from 2023, reaching its highest rate since 2014.
https://creativecommons.org/publicdomain/zero/1.0/
Countries across the globe have adopted digital payments, and with the increased volume of digital payments, hacking has become a common event: a hacker can try to obtain your details using only the phone number linked to your bank account. However, there is data with some anonymized variables from which one can predict whether a hack is going to happen.
Your task is to build a predictive model that identifies patterns in these variables and flags that a hack is going to happen, so that the cybersecurity team can stop it before it actually happens. You have to predict the column "MALICIOUS_OFFENSE".
| Column | Description |
|---|---|
| INCIDENT_ID | Unique identifier for an incident log |
| DATE | Date of incident occurrence |
| X_1 - X_15 | Anonymized logging parameters |
| MALICIOUS_OFFENSE | [Target] Indicates if the incident was a hack [1: Yes; 0:No] |
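As a starting point for the modelling task, the columns in the table can be split into a feature matrix and a target vector. The sketch below assumes the anonymized parameters are named `X_1` through `X_15` and hold numeric values; the sample rows and incident IDs are made up for illustration. It also computes the accuracy of always predicting the majority class, which is a useful reference point for any real model.

```python
import csv
import io

FEATURES = [f"X_{i}" for i in range(1, 16)]  # X_1 ... X_15, per the table
TARGET = "MALICIOUS_OFFENSE"

def split_features_target(text):
    """Return (X, y): feature rows as floats and targets as ints."""
    X, y = [], []
    for rec in csv.DictReader(io.StringIO(text)):
        X.append([float(rec[name]) for name in FEATURES])
        y.append(int(rec[TARGET]))
    return X, y

def majority_baseline_accuracy(y):
    """Accuracy obtained by always predicting the most frequent class."""
    positives = sum(y)
    return max(positives, len(y) - positives) / len(y)

# Made-up rows for illustration only.
header = ",".join(["INCIDENT_ID", "DATE"] + FEATURES + [TARGET])
sample = "\n".join([
    header,
    "CR_1,2018-01-01," + ",".join(["0"] * 15) + ",0",
    "CR_2,2018-01-02," + ",".join(["1"] * 15) + ",1",
    "CR_3,2018-01-03," + ",".join(["2"] * 15) + ",0",
])
X, y = split_features_target(sample)
baseline = majority_baseline_accuracy(y)
```

`INCIDENT_ID` is left out of the features on purpose, since a unique identifier carries no predictive signal.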
Objective: The rapid adoption of health information technology (IT), coupled with growing reports of ransomware and hacking, has made cybersecurity a priority in health care. This study leverages federal data to better understand current cybersecurity threats in the context of health IT.
Materials and Methods: Retrospective observational study of all available reported data breaches in the United States from 2013 to 2017, downloaded from a publicly available federal regulatory database.
Results: There were 1512 data breaches affecting 154 415 257 patient records from a heterogeneous distribution of covered entities (P < .001). There were 128 electronic medical record-related breaches of 4 867 920 patient records, while 363 hacking incidents affected 130 702 378 records.
Discussion and Conclusion: Despite making up less than 25% of all breaches, hacking was responsible for nearly 85% of all affected patient records. As medicine becomes increasingly interconnected and ...
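The two percentages quoted in the discussion follow directly from the counts reported in the results. A short check, using only the figures stated above (the variable names are ours):

```python
# Counts as reported in the Results paragraph.
total_breaches = 1512
total_records = 154_415_257
hacking_breaches = 363
hacking_records = 130_702_378

share_of_breaches = hacking_breaches / total_breaches
share_of_records = hacking_records / total_records

print(f"{share_of_breaches:.1%} of breaches, {share_of_records:.1%} of records")
# prints: 24.0% of breaches, 84.6% of records
```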
https://creativecommons.org/publicdomain/zero/1.0/
Introducing the Hacker News Top Stories Cumulative Dataset, a meticulously curated collection of top-ranked articles from the Hacker News community. This dataset offers a comprehensive view of the most discussed and impactful stories, making it an invaluable resource for researchers, data scientists, and enthusiasts interested in analyzing trends in technology, startups, and digital culture.
Key Features:
Comprehensive Coverage: Aggregates top stories over time, providing a historical perspective on trending topics and discussions within the tech community.
Rich Metadata: Each entry includes detailed information such as the story title, URL, author, publication date, score, number of comments, and the date it was featured as a top story.
Consistent Updates: The dataset is updated daily, ensuring users have access to the latest information and trends.
Potential Use Cases:
Trend Analysis: Identify and analyze emerging topics and shifts in the tech industry over time.
Sentiment Analysis: Examine community reactions and sentiments towards specific events or announcements.
Content Recommendation Systems: Develop algorithms to recommend articles based on popularity and user engagement metrics.
Sociological Research: Study the dynamics of online communities and the dissemination of information.
Data Structure:
The dataset is presented in CSV format with the following columns:
story_id: Unique identifier for each story.
title: The headline of the story.
url: Direct link to the content.
author: Username of the individual who submitted the story.
created_at: Timestamp of when the story was published.
points: Score indicating the story's popularity.
num_comments: Number of comments the story received.
scrape_date: Date when the story was added to the dataset.
Data Source:
All data is sourced from the Hacker News API, ensuring accuracy and reliability.
Licensing and Usage:
This dataset is released under the CC0 1.0 Universal (CC0 1.0) Public Domain Dedication, allowing for unrestricted use in both commercial and non-commercial projects.
Get Started:
To access the dataset and explore its potential applications, visit the Hacker News Top Stories Cumulative Dataset.
Stay ahead of the curve by leveraging this dataset to gain insights into the ever-evolving landscape of technology and innovation.
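The column layout listed under Data Structure can be loaded with the standard `csv` module. The sketch below rests on two assumptions that the description does not confirm: that a story may appear on several scrape dates in the cumulative file, and that `scrape_date` is an ISO-formatted date. It keeps the most recent snapshot per `story_id` and ranks stories by points. The sample rows are invented.

```python
import csv
import io

# Invented rows following the documented column names.
SAMPLE = """story_id,title,url,author,created_at,points,num_comments,scrape_date
101,Example A,https://example.com/a,alice,2024-01-01T10:00:00,120,45,2024-01-01
101,Example A,https://example.com/a,alice,2024-01-01T10:00:00,310,98,2024-01-02
102,Example B,https://example.com/b,bob,2024-01-02T08:30:00,75,12,2024-01-02
"""

def latest_snapshot_per_story(text):
    """Keep, for each story_id, the row with the latest scrape_date."""
    latest = {}
    for rec in csv.DictReader(io.StringIO(text)):
        rec["points"] = int(rec["points"])
        rec["num_comments"] = int(rec["num_comments"])
        prev = latest.get(rec["story_id"])
        # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
        if prev is None or rec["scrape_date"] > prev["scrape_date"]:
            latest[rec["story_id"]] = rec
    return latest

stories = latest_snapshot_per_story(SAMPLE)
top = sorted(stories.values(), key=lambda r: r["points"], reverse=True)
```

If each story appears only once in the real file, the deduplication step is a no-op and the ranking still applies.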
https://www.statsndata.org/how-to-order
The Password Hacking Software market has evolved significantly in recent years as both a response to and a driver of increasing cybersecurity threats. This software is primarily utilized by security professionals and ethical hackers to assess the robustness of password security systems, identify vulnerabilities, and
Between November 2023 and October 2024, organizations in the manufacturing sector worldwide saw around 818 incidents of data breaches caused by hacking. The healthcare industry ranked second, with 745 data breaches in the measured period. Furthermore, hacking caused 564 data breach incidents in the professional sector.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card for HackAPrompt 💻🔍
This dataset contains submissions from a prompt hacking competition. An in-depth analysis of the dataset has been accepted at the EMNLP 2023 conference. 📊👾 Submissions were sourced from two environments: a playground for experimentation and an official submissions platform. The playground itself can be accessed here 🎮 More details about the competition itself here 🏆
Dataset Details 📋
Dataset Description 📄
We… See the full description on the dataset page: https://huggingface.co/datasets/hackaprompt/hackaprompt-dataset.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Business Context:
We are in a time where businesses are more digitally advanced than ever, and as technology improves, organizations’ security postures must be enhanced as well. Failure to do so could result in a costly data breach, as we’ve seen happen with many businesses. The cybercrime landscape has evolved, and threat actors are going after any type of organization, so in order to protect your business’s data, money and reputation, it is critical that you invest in an advanced security system.
Cyber security can be described as the collective methods, technologies, and processes that help protect the confidentiality, integrity, and availability of computer systems, networks and data against cyber-attacks or unauthorized access.
a. Information Security vs. Cyber Security vs. Network Security:
Information security (also known as InfoSec) ensures that both physical and digital data is protected from unauthorized access, use, disclosure, disruption, modification, inspection, recording or destruction. Information security differs from cyber security in that InfoSec aims to keep data in any form secure, whereas cyber security protects only digital data.
Cyber security, a subset of information security, is the practice of defending your organization’s networks, computers and data from unauthorized digital access, attack or damage by implementing various processes, technologies and practices. With countless sophisticated threat actors targeting all types of organizations, it is critical that your IT infrastructure is secured at all times to prevent a full-scale attack on your network and the risk of exposing your company’s data and reputation.
Network security, a subset of cyber security, aims to protect any data that is being sent through devices in your network to ensure that the information is not changed or intercepted.
The role of network security is to protect the organization’s IT infrastructure from all types of cyber threats, including:
a. Viruses, worms and Trojan horses
b. Zero-day attacks
c. Hacker attacks
d. Denial of service attacks
e. Spyware and adware
Your network security team implements the hardware and software necessary to guard your security architecture. With the proper network security in place, your system can detect emerging threats before they infiltrate your network and compromise your data. There are many components to a network security system that work together to improve your security posture. The most common network security components include:
a. Firewalls
b. Anti-virus software
c. Intrusion detection and prevention systems (IDS/IPS)
d. Virtual private networks (VPN)
Network Intrusions vs. Computer Intrusions vs. Cyber Attacks
1. Computer Intrusions: Computer intrusions occur when someone tries to gain access to any part of your computer system. Computer intruders or hackers typically use automated computer programs when they try to compromise a computer’s security. There are several ways an intruder can try to gain access to your computer. They can:
a. Access your computer to view, change, or delete information on it
b. Crash or slow down your computer
c. Access your private data by examining the files on your system
d. Use your computer to access other computers on the Internet
2. Network Intrusions: A network intrusion refers to any unauthorized activity on a digital network. Network intrusions often involve stealing valuable network resources and almost always jeopardize the security of networks and/or their data. In order to proactively detect and respond to network intrusions, organizations and their cyber security teams need to have a thorough understanding of how network intrusions work and implement network intrusion detection and response systems that are designed with attack techniques and cover-up methods in mind.
Network Intrusion Attack Techniques:
Given the amount of normal activity constantly taking place on digital networks, it can be very difficult to pinpoint anomalies that could indicate a network intrusion has occurred. Below are some of the most common network intrusion attack techniques that organizations should continually look for:
Living Off the Land: Attackers increasingly use existing tools and processes and stolen credentials when compromising networks. These tools, such as operating system utilities, business productivity software and scripting languages, are clearly not malware and have very legitimate usage as well. In fact, in most cases, the vast majority of the usage is business justified, allowing an attacker to blend in.
Multi-Routing: If a network allows for asymmetric routing, attackers will often leverage multiple routes to access the targeted device or network. This allows them to avoid being detected by having a large portion of suspicious packets bypass certain network segments and any relevant network intrusion systems.
Buffer Overwrit...
Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
"This report is a collaborative research effort by the cyber security authorities of five nations: Australia, Canada, New Zealand, the UK and USA. In it we highlight the use of five publicly-available tools, which have been used for malicious purposes in recent cyber incidents around the world. To aid the work of network defenders and systems administrators, we also provide advice on limiting the effectiveness of these tools and detecting their use on a network."
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Each row in the dataset represents a specific data breach incident. Here's an explanation of the columns in the dataset:
Number: An identifier for each data breach incident.
Name_of_Covered_Entity: The name of the organization or entity that experienced the data breach.
Business_Associate_Involved: Information about whether a business associate was involved in the breach.
Total_Individuals: The total number of individuals affected by the breach.
Individuals_Affected: The number of individuals whose information was compromised.
Type_of_Breach: The method or nature of the data breach (e.g., theft, loss, hacking/IT incident, unauthorized access/disclosure).
Location_of_Breached_Information: The location or type of device where the breached information was stored (e.g., laptop, desktop computer, network server).
Breach_Start: The start date of the data breach.
Breach_End: The end date of the data breach.
Branch: A categorical identifier, possibly indicating a specific branch or division of the organization.
Department: A categorical identifier, possibly indicating a specific department within the organization.
CountryBranch: The country associated with the branch.
Employee(who find out breach): The employee who discovered the breach.
Employee URL: A URL link associated with the employee who discovered the breach.
Estimate Stole Data(GB): An estimate of the amount of data stolen in gigabytes.
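One quantity that can be derived from these columns is how long each breach lasted. The sketch below assumes `Breach_Start` and `Breach_End` are stored as ISO dates (`YYYY-MM-DD`), which the description does not state; the sample record is invented.

```python
from datetime import date

def breach_duration_days(start, end):
    """Number of days between two ISO-formatted dates."""
    return (date.fromisoformat(end) - date.fromisoformat(start)).days

# Invented record using the documented column names.
record = {
    "Breach_Start": "2015-03-01",
    "Breach_End": "2015-03-15",
    "Individuals_Affected": "1200",
}
duration = breach_duration_days(record["Breach_Start"], record["Breach_End"])
```

If the real files use another date format, `datetime.strptime` with a matching format string would replace `date.fromisoformat`.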
https://www.datainsightsmarket.com/privacy-policy
Explore the booming password hacking software market, projected to reach $6.1 billion by 2033. This in-depth analysis covers market size, growth drivers, key players (Hashcat, John the Ripper, Burp Suite), and regional trends. Learn about ethical hacking, penetration testing, and the evolving landscape of cybersecurity.
https://www.datainsightsmarket.com/privacy-policy
Explore the booming Ethical Hacking Certification market, driven by rising cyber threats. Discover key insights, market size, CAGR, and trends impacting SMEs and enterprises globally through 2033.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This dataset contains automotive Controller Area Network (CAN) bus data from three systems: two cars (Opel Astra and Renault Clio) and a CAN bus prototype we built ourselves. It is intended for evaluating CAN bus Network Intrusion Detection Systems (NIDS). For each system, the dataset consists of a collection of log files captured from its CAN bus: normal (attack-free) data for training and testing detection algorithms, and different CAN bus attacks (Diagnostic, Fuzzing, Replay, Suspension, and Denial-of-Service attacks).
https://spdx.org/licenses/CC0-1.0.html
A focus on novel, confirmatory, and statistically significant results leads to substantial bias in the scientific literature. One type of bias, known as “p-hacking,” occurs when researchers collect or select data or statistical analyses until nonsignificant results become significant. Here, we use text-mining to demonstrate that p-hacking is widespread throughout science. We then illustrate how one can test for p-hacking when performing a meta-analysis and show that, while p-hacking is probably common, its effect seems to be weak relative to the real effect sizes being measured. This result suggests that p-hacking probably does not drastically alter scientific consensuses drawn from meta-analyses.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset is a compilation of data from various sources detailing data breaches. These sources include press reports, government news releases, and mainstream news articles. The list includes those involving the theft or compromise of 30,000 or more records, although many smaller breaches occur continually. In addition, the various methods used in the breaches are listed, with hacking being the most common.
Organizations of all types and sizes are susceptible to data breaches, which can have devastating consequences. This dataset can help shed light on which organizations are most at risk and how these breaches occur, so that steps can be taken to prevent them in the future.
There are many ways to use this dataset. Here are a few ideas:
- Use the data to understand which types of organizations are most commonly breached, and what methods are used most often.
- Analyze the data to see if there are any trends or patterns in when or how breaches occur.
- Use the data to create a visualization or infographic showing the prevalence of data breaches.
This dataset can be used to identify trends in data breaches in terms of methods used, types of organizations breached, and geographical distribution.
This dataset can be used to study the effect of data breaches on organizational reputation and customer trust.
This dataset can be used by organizations to benchmark their own security measures against those of similar organizations that have experienced data breaches.
License
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.
File: df_1.csv
| Column name | Description |
|:---|:---|
| Entity | The name of the organization that was breached. (String) |
| Year | The year when the breach occurred. (Integer) |
| Records | The number of records that were compromised in the breach. (Integer) |
| Organization type | The type of organization that was breached. (String) |
| Method | The method that was used to breach the organization. (String) |
| Sources | The sources from which the data was collected. (String) |
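The first use case suggested above, finding which breach methods are most common, can be sketched with the documented df_1.csv columns. The sample rows and method labels below are invented, and the sketch assumes `Records` contains clean integers, as the column description implies; the real file may need extra cleaning.

```python
import csv
import io
from collections import defaultdict

# Invented rows following the documented column names.
SAMPLE = """Entity,Year,Records,Organization type,Method,Sources
Example Corp,2019,50000,web,hacked,[1]
Example Bank,2020,120000,financial,hacked,[2]
Example Clinic,2020,30000,healthcare,lost / stolen media,[3]
"""

def records_by_method(text):
    """Sum the number of compromised records for each breach method."""
    totals = defaultdict(int)
    for rec in csv.DictReader(io.StringIO(text)):
        totals[rec["Method"]] += int(rec["Records"])
    return dict(totals)

totals = records_by_method(SAMPLE)
# Methods ordered from most to fewest compromised records.
ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
```

Grouping on `Organization type` or `Year` instead of `Method` gives the other breakdowns mentioned in the description.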
https://www.datainsightsmarket.com/privacy-policy
The size of the Ethical Hacking Service market was valued at USD XXX million in 2024 and is projected to reach USD XXX million by 2033, with an expected CAGR of XX% during the forecast period.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The datasets demonstrate the malware economy and the value chain published in our paper, Malware Finances and Operations: a Data-Driven Study of the Value Chain for Infections and Compromised Access, at the 12th International Workshop on Cyber Crime (IWCC 2023), part of the ARES Conference, published by the International Conference Proceedings Series of the ACM ICPS.
Using the well-documented scripts, it is straightforward to reproduce our findings. It takes an estimated 1 hour of human time and 3 hours of computing time to duplicate our key findings from MalwareInfectionSet; around one hour with VictimAccessSet; and minutes to replicate the price calculations using AccountAccessSet. See the included README.md files and Python scripts.
We choose to represent each victim by a single JavaScript Object Notation (JSON) data file. Data sources provide sets of victim JSON data files from which we've extracted the essential information and omitted Personally Identifiable Information (PII). We collected, curated, and modelled three datasets, which we publish under the Creative Commons Attribution 4.0 International License.
1. MalwareInfectionSet
We discover (and, to the best of our knowledge, document scientifically for the first time) that malware networks appear to dump their data collections online. We collected these infostealer malware logs available for free. We utilise 245 malware log dumps from 2019 and 2020 originating from 14 malware networks. The dataset contains 1.8 million victim files, with a dataset size of 15 GB.
2. VictimAccessSet
We demonstrate how Infostealer malware networks sell access to infected victims. Genesis Market focuses on user-friendliness and continuous supply of compromised data. Marketplace listings include everything necessary to gain access to the victim's online accounts, including passwords and usernames, but also detailed collection of information which provides a clone of the victim's browser session. Indeed, Genesis Market simplifies the import of compromised victim authentication data into a web browser session. We measure the prices on Genesis Market and how compromised device prices are determined. We crawled the website between April 2019 and May 2022, collecting the web pages offering the resources for sale. The dataset contains 0.5 million victim files, with a dataset size of 3.5 GB.
3. AccountAccessSet
The Database marketplace operates inside the anonymous Tor network. Vendors offer their goods for sale, and customers can purchase them with Bitcoins. The marketplace sells online accounts, such as PayPal and Spotify, as well as private datasets, such as driver's licence photographs and tax forms. We then collect data from Database Market, where vendors sell online credentials, and investigate similarly. To build our dataset, we crawled the website between November 2021 and June 2022, collecting the web pages offering the credentials for sale. The dataset contains 33,896 victim files, with a dataset size of 400 MB.
Credits Authors
Funding
This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme under project numbers 804476 (SCARE) and 952622 (SPIRS).
Alternative links to download: AccountAccessSet, MalwareInfectionSet, and VictimAccessSet.
During the third quarter of 2025, data breaches exposed more than ** million records worldwide. Since the first quarter of 2020, the highest number of data records were exposed in the third quarter of ****, more than **** billion data sets. Data breaches remain among the biggest concerns of company leaders worldwide. The most common causes of sensitive information loss were operating system vulnerabilities on endpoint devices.
Which industries see the most data breaches?
Certain conditions make some industry sectors more prone to data breaches than others. According to the latest observations, public administration experienced the highest number of data breaches between 2021 and 2022. The industry saw *** reported data breach incidents with confirmed data loss. Financial institutions came second, with *** data breach cases, followed by healthcare providers.
Data breach cost
Data breach incidents have various consequences, the most common being financial losses and business disruptions. As of 2023, the average data breach cost across businesses worldwide was **** million U.S. dollars. Meanwhile, a leaked data record cost about *** U.S. dollars. The United States saw the highest average breach cost globally, at **** million U.S. dollars.