100+ datasets found
  1. Car-Hacking Dataset

    • kaggle.com
    zip
    Updated Feb 3, 2025
    Cite
    Pranav J (2025). Car-Hacking Dataset [Dataset]. https://www.kaggle.com/datasets/pranavjha24/car-hacking-dataset
    Available download formats: zip (137852368 bytes)
    Dataset updated
    Feb 3, 2025
    Authors
    Pranav J
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Dataset Description

    Title: Car Hacking: CAN Intrusion Detection Dataset (Mirror)

    Source: Original dataset curated by the Hacking and Countermeasure Research Lab (HCRL) at Korea University

    This dataset captures Controller Area Network (CAN) bus traffic from a real vehicle, simulating both normal driving scenarios and four types of cyberattacks:

    Denial-of-Service (DoS)

    Fuzzy/Flooding Attacks

    Spoofing Attacks (RPM/Gear/Speed gauge manipulation)

    Replay Attacks

    It is widely used to develop machine learning models for detecting intrusions in automotive communication systems.

    Dataset Structure

    Files:

    normal.csv: Benign CAN traffic (2.1M messages).

    attack_DoS.csv, attack_Fuzzy.csv, attack_spoofing.csv, attack_replay.csv: Attack-specific logs.

    Features:

    Timestamp, CAN ID, Data Length Code (DLC), Data (hexadecimal payload).

    Label (0 for normal, 1 for attack).
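
    The features listed above can be read into a typed record. The following is a minimal sketch, not the dataset's official loader: it assumes comma-separated rows in the order timestamp, CAN ID (hex), DLC, payload (space-separated hex bytes), label. The sample row and the names `CanFrame` and `parse_row` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CanFrame:
    timestamp: float
    can_id: int
    dlc: int
    payload: bytes
    label: int  # 0 for normal, 1 for attack


def parse_row(row: str) -> CanFrame:
    """Parse one CSV row; the column order is an assumption, not confirmed by the source."""
    ts, can_id, dlc, data, label = [field.strip() for field in row.split(",")]
    frame = CanFrame(
        timestamp=float(ts),
        can_id=int(can_id, 16),          # CAN ID assumed to be hexadecimal
        dlc=int(dlc),
        payload=bytes.fromhex(data.replace(" ", "")),
        label=int(label),
    )
    # The DLC should match the number of payload bytes actually present.
    if len(frame.payload) != frame.dlc:
        raise ValueError(f"DLC {frame.dlc} does not match payload length {len(frame.payload)}")
    if frame.label not in (0, 1):
        raise ValueError(f"unexpected label {frame.label}")
    return frame


# Hypothetical sample row, made up for illustration.
sample = "1478198376.389427,0316,8,05 21 68 09 21 21 00 6f,0"
frame = parse_row(sample)
print(frame)
```

    Checking the DLC against the payload length at parse time keeps malformed rows out of any later feature extraction.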

    Usage Examples

    Train ML/DL models (e.g., Random Forest, LSTM) for real-time CAN intrusion detection.

    Benchmark automotive cybersecurity solutions.

    Study CAN protocol vulnerabilities and attack patterns.

    Citation

    If you use this dataset, cite the original work:

    Publication

    Song, Hyun Min, Jiyoung Woo, and Huy Kang Kim. "In-vehicle network intrusion detection using deep convolutional neural network." Vehicular Communications 21 (2020): 100198.

    Seo, Eunbi, Hyun Min Song, and Huy Kang Kim. "GIDS: GAN based Intrusion Detection System for In-Vehicle Network." 2018 16th Annual Conference on Privacy, Security and Trust (PST). IEEE, 2018.

    This Kaggle dataset is a mirror of the original HCRL work.

    For reproducibility, include the above citation in publications or projects using this data.

  2. Number of data compromises and impacted individuals in U.S. 2005-2024

    • statista.com
    Cite
    Statista, Number of data compromises and impacted individuals in U.S. 2005-2024 [Dataset]. https://www.statista.com/statistics/273550/data-breaches-recorded-in-the-united-states-by-number-of-breaches-and-records-exposed/
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    United States
    Description

    In 2024, the number of data compromises in the United States stood at 3,158 cases. Meanwhile, over 1.35 billion individuals were affected in the same year by data compromises, including data breaches, leakage, and exposure. While these are three different events, they have one thing in common: in all three, sensitive data is accessed by an unauthorized threat actor.

    Industries most vulnerable to data breaches

    Some industry sectors usually see more significant cases of private data violations than others. This is determined by the type and volume of the personal information that organizations in these sectors store. In 2024, financial services, healthcare, and professional services were the three industry sectors that recorded the most data breaches. Overall, the number of healthcare data breaches in some industry sectors in the United States has gradually increased within the past few years. However, some sectors saw a decrease.

    Largest data exposures worldwide

    In 2020, an adult streaming website, CAM4, experienced a leakage of nearly 11 billion records. This is by far the most extensive reported data leakage. The case is unique, though, because cyber security researchers found the vulnerability before cyber criminals did. The second-largest data breach is the Yahoo data breach, dating back to 2013. The company first reported about one billion exposed records, then in 2017 revised the number of leaked records to three billion. In March 2018, the third-biggest data breach happened, involving India’s national identification database Aadhaar. As a result of this incident, over 1.1 billion records were exposed.

  3. U.S. health data breaches caused by hacking 2014 - H1 2024

    • statista.com
    Updated Nov 28, 2025
    Cite
    Statista (2025). U.S. health data breaches caused by hacking 2014 - H1 2024 [Dataset]. https://www.statista.com/statistics/972228/health-data-breaches-caused-by-hacking-us/
    Dataset updated
    Nov 28, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    United States
    Description

    In the first half of 2024, the share of health-related U.S. data breaches caused by hacking was ** percent, which marked a *** percent increase from 2023, reaching its highest rate since 2014.

  4. Malicious Server Hack

    • kaggle.com
    zip
    Updated Oct 10, 2020
    Cite
    LPLenka (2020). Malicious Server Hack [Dataset]. https://www.kaggle.com/lplenka/malicious-server-hack
    Available download formats: zip (596537 bytes)
    Dataset updated
    Oct 10, 2020
    Authors
    LPLenka
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    Countries across the globe have adapted to digital payments, and with the increased volume of digital payments, hacking has become a common event in which a hacker can try to obtain your details using just the phone number linked to your bank account. However, there is data with some anonymized variables from which one can predict that a hack is going to happen.

    Your task is to build a predictive model that can identify a pattern in these variables and flag that a hack is going to happen, so that the cybersecurity team can stop it before it actually happens. You have to predict the column MALICIOUS_OFFENSE.

    Content

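    | Column            | Description                                                    |
    |:------------------|:---------------------------------------------------------------|
    | INCIDENT_ID       | Unique identifier for an incident log                          |
    | DATE              | Date of incident occurrence                                    |
    | X_1 - X_15        | Anonymized logging parameters                                  |
    | MALICIOUS_OFFENSE | [Target] Indicates if the incident was a hack [1: Yes; 0: No]  |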
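
    Before training a model on the columns above, it usually helps to check how balanced the target is. The following is a minimal sketch that assumes a CSV whose header uses the column names listed above. The inline sample rows and the function name `class_balance` are hypothetical, and only two of the X_1 to X_15 parameters are shown.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows; the real file is assumed to have the same header names.
SAMPLE_CSV = """INCIDENT_ID,DATE,X_1,X_2,MALICIOUS_OFFENSE
CR_1,01-JAN-20,0,36,0
CR_2,02-JAN-20,1,12,0
CR_3,03-JAN-20,0,40,1
CR_4,04-JAN-20,2,7,0
"""


def class_balance(csv_text: str, target: str = "MALICIOUS_OFFENSE") -> dict:
    """Return the fraction of rows per target value (values stay as strings)."""
    counts = Counter(row[target] for row in csv.DictReader(io.StringIO(csv_text)))
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


balance = class_balance(SAMPLE_CSV)
print(balance)
```

    If the positive class turns out to be rare, plain accuracy is a weak metric, and recall or precision on the hack class is usually more informative.
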
  5. Data from: Health IT, hacking, and cybersecurity: national trends in data...

    • search.dataone.org
    • datadryad.org
    Updated Apr 6, 2025
    Cite
    Jay G. Ronquillo; J. Erik Winterholler; Kamil Cwikla; Raphael Szymanski; Christopher Levy (2025). Health IT, hacking, and cybersecurity: national trends in data breaches of protected health information [Dataset]. http://doi.org/10.5061/dryad.24275c6
    Dataset updated
    Apr 6, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Jay G. Ronquillo; J. Erik Winterholler; Kamil Cwikla; Raphael Szymanski; Christopher Levy
    Time period covered
    May 25, 2019
    Description

    Objective: The rapid adoption of health information technology (IT), coupled with growing reports of ransomware and hacking, has made cybersecurity a priority in health care. This study leverages federal data in order to better understand current cybersecurity threats in the context of health IT.

    Materials and Methods: Retrospective observational study of all available reported data breaches in the United States from 2013 to 2017, downloaded from a publicly available federal regulatory database.

    Results: There were 1512 data breaches affecting 154 415 257 patient records from a heterogeneous distribution of covered entities (P < .001). There were 128 electronic medical record-related breaches of 4 867 920 patient records, while 363 hacking incidents affected 130 702 378 records.

    Discussion and Conclusion: Despite making up less than 25% of all breaches, hacking was responsible for nearly 85% of all affected patient records. As medicine becomes increasingly interconnected and ...
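
    The percentages in the conclusion follow directly from the counts reported in the Results paragraph. The short check below uses only those reported figures; the variable and function names are illustrative.

```python
# Counts as reported in the Results paragraph above.
total_breaches = 1512
total_records = 154_415_257
hacking_breaches = 363
hacking_records = 130_702_378
emr_breaches = 128
emr_records = 4_867_920


def share(part: int, whole: int) -> float:
    """Fraction of the whole accounted for by the part."""
    return part / whole


hacking_breach_share = share(hacking_breaches, total_breaches)
hacking_record_share = share(hacking_records, total_records)
emr_breach_share = share(emr_breaches, total_breaches)
emr_record_share = share(emr_records, total_records)

print(f"Hacking: {hacking_breach_share:.1%} of breaches, {hacking_record_share:.1%} of records")
print(f"EMR-related: {emr_breach_share:.1%} of breaches, {emr_record_share:.1%} of records")
```

    The hacking shares come out just under 25% of breaches and just under 85% of records, consistent with the wording of the conclusion.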

  6. Hacker News Cumulative Data

    • kaggle.com
    zip
    Updated May 7, 2025
    Cite
    NexoCodAI (2025). Hacker News Cumulative Data [Dataset]. https://www.kaggle.com/datasets/nexocodai/automated-hacker-news-dataset-cumulative
    Available download formats: zip (175826 bytes)
    Dataset updated
    May 7, 2025
    Authors
    NexoCodAI
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Introducing the Hacker News Top Stories Cumulative Dataset, a meticulously curated collection of top-ranked articles from the Hacker News community. This dataset offers a comprehensive view of the most discussed and impactful stories, making it an invaluable resource for researchers, data scientists, and enthusiasts interested in analyzing trends in technology, startups, and digital culture.

    Key Features:

    Comprehensive Coverage: Aggregates top stories over time, providing a historical perspective on trending topics and discussions within the tech community.

    Rich Metadata: Each entry includes detailed information such as the story title, URL, author, publication date, score, number of comments, and the date it was featured as a top story.

    Consistent Updates: The dataset is updated daily, ensuring users have access to the latest information and trends.

    Potential Use Cases:

    Trend Analysis: Identify and analyze emerging topics and shifts in the tech industry over time.

    Sentiment Analysis: Examine community reactions and sentiments towards specific events or announcements.

    Content Recommendation Systems: Develop algorithms to recommend articles based on popularity and user engagement metrics.

    Sociological Research: Study the dynamics of online communities and the dissemination of information.

    Data Structure:

    The dataset is presented in CSV format with the following columns:

    story_id: Unique identifier for each story.
    title: The headline of the story.
    url: Direct link to the content.
    author: Username of the individual who submitted the story.
    created_at: Timestamp of when the story was published.
    points: Score indicating the story's popularity.
    num_comments: Number of comments the story received.
    scrape_date: Date when the story was added to the dataset.

    Data Source:

    All data is sourced from the Hacker News API, ensuring accuracy and reliability.
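
    Because the dataset is cumulative, a story might appear on more than one scrape date. The following is a minimal sketch of keeping only the most recent row per story, using the column names from the Data Structure list above. It assumes that repeats occur and that scrape_date is an ISO-style date string that sorts correctly as text; the sample rows and the function name `latest_snapshot` are hypothetical.

```python
import csv
import io

# Hypothetical sample rows using the documented column names.
SAMPLE_CSV = """story_id,title,url,author,created_at,points,num_comments,scrape_date
1,Example A,https://example.com/a,alice,2025-05-01T10:00:00Z,120,30,2025-05-01
1,Example A,https://example.com/a,alice,2025-05-01T10:00:00Z,250,75,2025-05-02
2,Example B,https://example.com/b,bob,2025-05-02T08:00:00Z,90,12,2025-05-02
"""


def latest_snapshot(csv_text: str) -> dict:
    """Keep, for each story_id, the row with the greatest scrape_date."""
    latest = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        previous = latest.get(row["story_id"])
        # ISO dates compare correctly as strings (assumed format).
        if previous is None or row["scrape_date"] > previous["scrape_date"]:
            latest[row["story_id"]] = row
    return latest


stories = latest_snapshot(SAMPLE_CSV)
# Rank the deduplicated stories by score, highest first.
top = sorted(stories.values(), key=lambda r: int(r["points"]), reverse=True)
for row in top:
    print(row["story_id"], row["title"], row["points"])
```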

    Licensing and Usage:

    This dataset is released under the CC0 1.0 Universal (CC0 1.0) Public Domain Dedication, allowing for unrestricted use in both commercial and non-commercial projects.

    Get Started:

    To access the dataset and explore its potential applications, visit the Hacker News Top Stories Cumulative Dataset.

    Stay ahead of the curve by leveraging this dataset to gain insights into the ever-evolving landscape of technology and innovation.

  7. Global Password Hacking Software Market Risk Analysis 2025-2032

    • statsndata.org
    excel, pdf
    Updated Oct 2025
    Cite
    Stats N Data (2025). Global Password Hacking Software Market Risk Analysis 2025-2032 [Dataset]. https://www.statsndata.org/report/password-hacking-software-market-59196
    Available download formats: pdf, excel
    Dataset updated
    Oct 2025
    Dataset authored and provided by
    Stats N Data
    License

    https://www.statsndata.org/how-to-order

    Area covered
    Global
    Description

    The Password Hacking Software market has evolved significantly in recent years as both a response to and a driver of increasing cybersecurity threats. This software is primarily utilized by security professionals and ethical hackers to assess the robustness of password security systems, identify vulnerabilities, and

  8. Global data breaches caused by hacking 2023-2024, by industry

    • statista.com
    Updated Sep 18, 2025
    Cite
    Statista (2025). Global data breaches caused by hacking 2023-2024, by industry [Dataset]. https://www.statista.com/statistics/1419277/data-breaches-hacking-by-industry/
    Dataset updated
    Sep 18, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Nov 1, 2023 - Oct 31, 2024
    Area covered
    Worldwide
    Description

    Between November 2023 and October 2024, organizations in the manufacturing sector worldwide saw around 818 incidents of data breaches caused by hacking. The healthcare industry ranked second, with 745 data breaches in the measured period. Furthermore, hacking caused 564 data breach incidents in the professional sector.

  9. hackaprompt-dataset

    • huggingface.co
    Updated Oct 23, 2023
    Cite
    hackaprompt (2023). hackaprompt-dataset [Dataset]. https://huggingface.co/datasets/hackaprompt/hackaprompt-dataset
    Dataset updated
    Oct 23, 2023
    Dataset authored and provided by
    hackaprompt
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Dataset Card for HackAPrompt 💻🔍

    This dataset contains submissions from a prompt hacking competition. An in-depth analysis of the dataset has been accepted at the EMNLP 2023 conference. 📊👾 Submissions were sourced from two environments: a playground for experimentation and an official submissions platform. The playground itself can be accessed here 🎮 More details about the competition itself here 🏆

    Dataset Details 📋

    Dataset Description 📄

    We… See the full description on the dataset page: https://huggingface.co/datasets/hackaprompt/hackaprompt-dataset.

  10. Cyber Security

    • kaggle.com
    zip
    Updated Jan 29, 2024
    Cite
    Rishi Kumar (2024). Cyber Security [Dataset]. https://www.kaggle.com/datasets/rishikumarrajvansh/cyber-security
    Available download formats: zip (8913512 bytes)
    Dataset updated
    Jan 29, 2024
    Authors
    Rishi Kumar
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Business Context: We are in a time where businesses are more digitally advanced than ever, and as technology improves, organizations’ security postures must be enhanced as well. Failure to do so could result in a costly data breach, as we’ve seen happen with many businesses. The cybercrime landscape has evolved, and threat actors are going after any type of organization, so in order to protect your business’s data, money and reputation, it is critical that you invest in an advanced security system. Cyber security can be described as the collective methods, technologies, and processes to help protect the confidentiality, integrity, and availability of computer systems, networks and data, against cyber-attacks or unauthorized access.

    Information Security vs. Cyber Security vs. Network Security

    Information security (also known as InfoSec) ensures that both physical and digital data is protected from unauthorized access, use, disclosure, disruption, modification, inspection, recording or destruction. Information security differs from cyber security in that InfoSec aims to keep data in any form secure, whereas cyber security protects only digital data.

    Cyber security, a subset of information security, is the practice of defending your organization’s networks, computers and data from unauthorized digital access, attack or damage by implementing various processes, technologies and practices. With the countless sophisticated threat actors targeting all types of organizations, it is critical that your IT infrastructure is secured at all times to prevent a full-scale attack on your network and risk exposing your company’s data and reputation.

    Network security, a subset of cyber security, aims to protect any data that is being sent through devices in your network to ensure that the information is not changed or intercepted. The role of network security is to protect the organization’s IT infrastructure from all types of cyber threats including:

    • Viruses, worms and Trojan horses
    • Zero-day attacks
    • Hacker attacks
    • Denial of service attacks
    • Spyware and adware

    Your network security team implements the hardware and software necessary to guard your security architecture. With the proper network security in place, your system can detect emerging threats before they infiltrate your network and compromise your data. There are many components to a network security system that work together to improve your security posture. The most common network security components include:

    • Firewalls
    • Anti-virus software
    • Intrusion detection and prevention systems (IDS/IPS)
    • Virtual private networks (VPN)

    Network Intrusions vs. Computer Intrusions vs. Cyber Attacks

    1. Computer Intrusions: Computer intrusions occur when someone tries to gain access to any part of your computer system. Computer intruders or hackers typically use automated computer programs when they try to compromise a computer’s security. There are several ways an intruder can try to gain access to your computer. They can:

    • Access your computer to view, change, or delete information on your computer
    • Crash or slow down your computer
    • Access your private data by examining the files on your system
    • Use your computer to access other computers on the Internet

    2. Network Intrusions: A network intrusion refers to any unauthorized activity on a digital network. Network intrusions often involve stealing valuable network resources and almost always jeopardize the security of networks and/or their data. In order to proactively detect and respond to network intrusions, organizations and their cyber security teams need to have a thorough understanding of how network intrusions work and implement network intrusion, detection, and response systems that are designed with attack techniques and cover-up methods in mind.

    Network Intrusion Attack Techniques

    Given the amount of normal activity constantly taking place on digital networks, it can be very difficult to pinpoint anomalies that could indicate a network intrusion has occurred. Below are some of the most common network intrusion attack techniques that organizations should continually look for:

    Living Off the Land: Attackers increasingly use existing tools and processes and stolen credentials when compromising networks. These tools, such as operating system utilities, business productivity software and scripting languages, are clearly not malware and have very legitimate usage as well. In fact, in most cases, the vast majority of the usage is business justified, allowing an attacker to blend in.

    Multi-Routing: If a network allows for asymmetric routing, attackers will often leverage multiple routes to access the targeted device or network. This allows them to avoid being detected by having a large portion of suspicious packets bypass certain network segments and any relevant network intrusion systems.

    Buffer Overwrit...

  11. Joint Report On Publicly Available Hacking Tools - Catalogue - Canadian...

    • data.urbandatacentre.ca
    Updated Oct 19, 2025
    Cite
    (2025). Joint Report On Publicly Available Hacking Tools - Catalogue - Canadian Urban Data Catalogue (CUDC) [Dataset]. https://data.urbandatacentre.ca/dataset/gov-canada-b83288ca-8c65-431f-b85f-ffa26f975d45
    Dataset updated
    Oct 19, 2025
    License

    Open Government Licence - Canada 2.0 (https://open.canada.ca/en/open-government-licence-canada)
    License information was derived automatically

    Area covered
    Canada
    Description

    "This report is a collaborative research effort by the cyber security authorities of five nations: Australia, Canada, New Zealand, the UK and USA. In it we highlight the use of five publicly-available tools, which have been used for malicious purposes in recent cyber incidents around the world. To aid the work of network defenders and systems administrators, we also provide advice on limiting the effectiveness of these tools and detecting their use on a network."

  12. Cybersecurity Breaches Information 2010-2023

    • kaggle.com
    zip
    Updated Dec 11, 2023
    Cite
    Hello_world 3112 (2023). Cybersecurity Breaches Information 2010-2023 [Dataset]. https://www.kaggle.com/datasets/sumanth3112/hello-world
    Available download formats: zip (43549 bytes)
    Dataset updated
    Dec 11, 2023
    Authors
    Hello_world 3112
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Each row in the dataset represents a specific data breach incident. Here's an explanation of the columns in the dataset:

    Number: An identifier for each data breach incident.

    Name_of_Covered_Entity: The name of the organization or entity that experienced the data breach.

    Business_Associate_Involved: Information about whether a business associate was involved in the breach.

    Total_Individuals: The total number of individuals affected by the breach.

    Individuals_Affected: The number of individuals whose information was compromised.

    Type_of_Breach: The method or nature of the data breach (e.g., theft, loss, hacking/IT incident, unauthorized access/disclosure).

    Location_of_Breached_Information: The location or type of device where the breached information was stored (e.g., laptop, desktop computer, network server).

    Breach_Start: The start date of the data breach.

    Breach_End: The end date of the data breach.

    Branch: A categorical identifier, possibly indicating a specific branch or division of the organization.

    Department: A categorical identifier, possibly indicating a specific department within the organization.

    CountryBranch: The country associated with the branch.

    Employee(who find out breach): The employee who discovered the breach.

    Employee URL: A URL link associated with the employee who discovered the breach.

    Estimate Stole Data(GB): An estimate of the amount of data stolen in gigabytes.

  13. Password Hacking Software Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Jul 15, 2025
    Cite
    Data Insights Market (2025). Password Hacking Software Report [Dataset]. https://www.datainsightsmarket.com/reports/password-hacking-software-1982234
    Available download formats: pdf, ppt, doc
    Dataset updated
    Jul 15, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    Explore the booming password hacking software market, projected to reach $6.1 billion by 2033. This in-depth analysis covers market size, growth drivers, key players (Hashcat, John the Ripper, Burp Suite), and regional trends. Learn about ethical hacking, penetration testing, and the evolving landscape of cybersecurity.

  14. Ethical Hacking Certification Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Sep 28, 2025
    Cite
    Data Insights Market (2025). Ethical Hacking Certification Report [Dataset]. https://www.datainsightsmarket.com/reports/ethical-hacking-certification-1988933
    Available download formats: doc, ppt, pdf
    Dataset updated
    Sep 28, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    Explore the booming Ethical Hacking Certification market, driven by rising cyber threats. Discover key insights, market size, CAGR, and trends impacting SMEs and enterprises globally through 2033.

  15. Automotive Controller Area Network (CAN) Bus Intrusion Dataset v2

    • figshare.com
    • data.4tu.nl
    zip
    Updated Jul 28, 2020
    Cite
    Guillaume Dupont; Alexios Lekidis; J. (Jerry) den Hartog; S. (Sandro) Etalle (2020). Automotive Controller Area Network (CAN) Bus Intrusion Dataset v2 [Dataset]. http://doi.org/10.4121/uuid:b74b4928-c377-4585-9432-2004dfa20a5d
    Available download formats: zip
    Dataset updated
    Jul 28, 2020
    Dataset provided by
    4TU.ResearchData
    Authors
    Guillaume Dupont; Alexios Lekidis; J. (Jerry) den Hartog; S. (Sandro) Etalle
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
    License information was derived automatically

    Description

    This dataset contains automotive Controller Area Network (CAN) bus data from three systems: two cars (Opel Astra and Renault Clio) and a CAN bus prototype we built ourselves. It is meant for evaluating CAN bus Network Intrusion Detection Systems (NIDS). For each system, the dataset consists of a collection of log files captured from its CAN bus: normal (attack-free) data for training and testing detection algorithms, and different CAN bus attacks (Diagnostic, Fuzzing, Replay, Suspension, and Denial-of-Service attacks).

  16. Data from: The extent and consequences of p-hacking in science

    • data-staging.niaid.nih.gov
    • researchdata.edu.au
    • +3more
    zip
    Updated Feb 24, 2016
    Cite
    Megan L. Head; Luke Holman; Rob Lanfear; Andrew T. Kahn; Michael D. Jennions (2016). The extent and consequences of p-hacking in science [Dataset]. http://doi.org/10.5061/dryad.79d43
    Available download formats: zip
    Dataset updated
    Feb 24, 2016
    Dataset provided by
    Australian National University
    Authors
    Megan L. Head; Luke Holman; Rob Lanfear; Andrew T. Kahn; Michael D. Jennions
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    A focus on novel, confirmatory, and statistically significant results leads to substantial bias in the scientific literature. One type of bias, known as “p-hacking,” occurs when researchers collect or select data or statistical analyses until nonsignificant results become significant. Here, we use text-mining to demonstrate that p-hacking is widespread throughout science. We then illustrate how one can test for p-hacking when performing a meta-analysis and show that, while p-hacking is probably common, its effect seems to be weak relative to the real effect sizes being measured. This result suggests that p-hacking probably does not drastically alter scientific consensuses drawn from meta-analyses.
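
    The description does not spell out how the test for p-hacking works. One simple check, assumed here purely for illustration and not necessarily the authors' exact procedure, compares how many reported p-values fall in two adjacent narrow bins just below 0.05. If results were not being nudged under the threshold, the bin closer to 0.05 should not hold noticeably more values than its neighbor, which a one-sided exact binomial test can assess. The bin counts and function names below are hypothetical.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float = 0.5) -> float:
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def p_hacking_test(lower_bin_count: int, upper_bin_count: int) -> float:
    """One-sided test: is the bin nearest 0.05 over-represented relative to its neighbor?

    Under the null hypothesis, a p-value landing in either bin is treated as
    equally likely (p = 0.5). That is a simplifying assumption of this sketch.
    """
    n = lower_bin_count + upper_bin_count
    return binomial_upper_tail(upper_bin_count, n)


# Hypothetical counts: 40 p-values in the lower bin, 60 in the bin nearest 0.05.
p_value = p_hacking_test(lower_bin_count=40, upper_bin_count=60)
print(f"one-sided binomial p-value: {p_value:.4f}")
```

    A small p-value here only says the upper bin is over-represented in the sample. It says nothing about effect sizes, which is the separate question the description raises for meta-analyses.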

  17. Data Breaches

    • kaggle.com
    zip
    Updated Nov 10, 2022
    Cite
    The Devastator (2022). Data Breaches [Dataset]. https://www.kaggle.com/datasets/thedevastator/data-breaches-a-comprehensive-list/discussion
    Available download formats: zip (9067 bytes)
    Dataset updated
    Nov 10, 2022
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Data Breaches Dataset

    30,000 Records of cyber-security data breaches

    About this dataset

    This dataset is a compilation of data from various sources detailing data breaches. These sources include press reports, government news releases, and mainstream news articles. The list includes those involving the theft or compromise of 30,000 or more records, although many smaller breaches occur continually. In addition, the various methods used in the breaches are listed, with hacking being the most common.

    Organizations of all types and sizes are susceptible to data breaches, which can have devastating consequences. This dataset can help shed light on which organizations are most at risk and how these breaches occur so that steps can be taken to prevent them in the future.

    How to use the dataset

    There are many ways to use this dataset. Here are a few ideas:

    • Use the data to understand which types of organizations are most commonly breached, and what methods are used most often.
    • Analyze the data to see if there are any trends or patterns in when or how breaches occur.
    • Use the data to create a visualization or infographic showing the prevalence of data breaches.

    Research Ideas

    • This dataset can be used to identify trends in data breaches in terms of methods used, types of organizations breached, and geographical distribution.

    • This dataset can be used to study the effect of data breaches on organizational reputation and customer trust.

    • This dataset can be used by organizations to benchmark their own security measures against those of similar organizations that have experienced data breaches.

    Acknowledgements

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    File: df_1.csv

    | Column name       | Description                                                          |
    |:------------------|:---------------------------------------------------------------------|
    | Entity            | The name of the organization that was breached. (String)             |
    | Year              | The year when the breach occurred. (Integer)                         |
    | Records           | The number of records that were compromised in the breach. (Integer) |
    | Organization type | The type of organization that was breached. (String)                 |
    | Method            | The method that was used to breach the organization. (String)        |
    | Sources           | The sources from which the data was collected. (String)              |
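
    The suggested analysis of which methods are used most often can be sketched with the columns of df_1.csv described above. This is a minimal example under assumptions: the inline sample rows and method labels are hypothetical, and rows whose Records value is not a plain integer are skipped rather than repaired.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows using the documented column names.
SAMPLE_CSV = """Entity,Year,Records,Organization type,Method,Sources
Example Corp,2019,50000,retail,hacked,[1]
Example Bank,2020,120000,financial,hacked,[2]
Example Clinic,2020,30000,healthcare,lost / stolen media,[3]
Example Agency,2021,unknown,government,poor security,[4]
"""


def records_by_method(csv_text: str) -> dict:
    """Sum compromised records per breach method, skipping non-numeric counts."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if not row["Records"].isdigit():
            continue  # the real file may contain non-numeric entries (assumption)
        totals[row["Method"]] += int(row["Records"])
    return dict(totals)


totals = records_by_method(SAMPLE_CSV)
for method, records in sorted(totals.items(), key=lambda item: item[1], reverse=True):
    print(f"{method}: {records}")
```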

  18. Ethical Hacking Service Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Jan 28, 2025
    Cite
    Data Insights Market (2025). Ethical Hacking Service Report [Dataset]. https://www.datainsightsmarket.com/reports/ethical-hacking-service-1966542
    Available download formats: doc, ppt, pdf
    Dataset updated
    Jan 28, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The size of the Ethical Hacking Service market was valued at USD XXX million in 2024 and is projected to reach USD XXX million by 2033, with an expected CAGR of XX% during the forecast period.

  19. Data from: Malware Finances and Operations: a Data-Driven Study of the Value...

    • zenodo.org
    • data.niaid.nih.gov
    • +1 more
    zip
    Updated Jun 20, 2023
    Cite
    Juha Nurmi; Mikko Niemelä; Billy Brumley (2023). Malware Finances and Operations: a Data-Driven Study of the Value Chain for Infections and Compromised Access [Dataset]. http://doi.org/10.5281/zenodo.8047205
    Available download formats: zip
    Dataset updated
    Jun 20, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Juha Nurmi; Mikko Niemelä; Billy Brumley
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    The datasets demonstrate the malware economy and the value chain described in our paper, Malware Finances and Operations: a Data-Driven Study of the Value Chain for Infections and Compromised Access, presented at the 12th International Workshop on Cyber Crime (IWCC 2023), part of the ARES Conference, and published in the ACM International Conference Proceeding Series (ICPS).

    Using the well-documented scripts, it is straightforward to reproduce our findings. It takes an estimated 1 hour of human time and 3 hours of computing time to duplicate our key findings from MalwareInfectionSet; around one hour with VictimAccessSet; and minutes to replicate the price calculations using AccountAccessSet. See the included README.md files and Python scripts.

    We choose to represent each victim by a single JavaScript Object Notation (JSON) data file. Data sources provide sets of victim JSON data files from which we've extracted the essential information and omitted Personally Identifiable Information (PII). We collected, curated, and modelled three datasets, which we publish under the Creative Commons Attribution 4.0 International License.

    1. MalwareInfectionSet
    We discover (and, to the best of our knowledge, document scientifically for the first time) that malware networks appear to dump their data collections online. We collected these infostealer malware logs available for free. We utilise 245 malware log dumps from 2019 and 2020 originating from 14 malware networks. The dataset contains 1.8 million victim files, with a dataset size of 15 GB.

    2. VictimAccessSet
    We demonstrate how infostealer malware networks sell access to infected victims. Genesis Market focuses on user-friendliness and a continuous supply of compromised data. Marketplace listings include everything necessary to gain access to the victim's online accounts, including passwords and usernames, but also a detailed collection of information that provides a clone of the victim's browser session. Indeed, Genesis Market simplifies the import of compromised victim authentication data into a web browser session. We measure the prices on Genesis Market and how compromised device prices are determined. We crawled the website between April 2019 and May 2022, collecting the web pages offering the resources for sale. The dataset contains 0.5 million victim files, with a dataset size of 3.5 GB.

    3. AccountAccessSet
    The Database marketplace operates inside the anonymous Tor network. Vendors offer their goods for sale, and customers can purchase them with Bitcoin. The marketplace sells online accounts, such as PayPal and Spotify, as well as private datasets, such as driver's licence photographs and tax forms. We collect data from Database Market, where vendors sell online credentials, and investigate it similarly. To build our dataset, we crawled the website between November 2021 and June 2022, collecting the web pages offering the credentials for sale. The dataset contains 33,896 victim files, with a dataset size of 400 MB.

    Credits

    Authors

    • Billy Bob Brumley (Tampere University, Tampere, Finland)
    • Juha Nurmi (Tampere University, Tampere, Finland)
    • Mikko Niemelä (Cyber Intelligence House, Singapore)

    Funding

    This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme under project numbers 804476 (SCARE) and 952622 (SPIRS).

    Alternative links to download: AccountAccessSet, MalwareInfectionSet, and VictimAccessSet.

  20. Global number of breached user accounts Q1 2020-Q3 2025

    • statista.com
    Updated Oct 14, 2025
    Cite
    Statista (2025). Global number of breached user accounts Q1 2020-Q3 2025 [Dataset]. https://www.statista.com/statistics/1307426/number-of-data-breaches-worldwide/
    Dataset updated
    Oct 14, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Worldwide
    Description

    During the third quarter of 2025, data breaches exposed more than ** million records worldwide. Since the first quarter of 2020, the highest number of data records were exposed in the third quarter of ****, more than **** billion data sets. Data breaches remain among the biggest concerns of company leaders worldwide. The most common causes of sensitive information loss were operating system vulnerabilities on endpoint devices.

    Which industries see the most data breaches?

    Meanwhile, certain conditions make some industry sectors more prone to data breaches than others. According to the latest observations, the public administration experienced the highest number of data breaches between 2021 and 2022. The industry saw *** reported data breach incidents with confirmed data loss. The second were financial institutions, with *** data breach cases, followed by healthcare providers.

    Data breach cost

    Data breach incidents have various consequences, the most common impact being financial losses and business disruptions. As of 2023, the average data breach cost across businesses worldwide was **** million U.S. dollars. Meanwhile, a leaked data record cost about *** U.S. dollars. The United States saw the highest average breach cost globally, at **** million U.S. dollars.

Cite
Pranav J (2025). Car-Hacking Dataset [Dataset]. https://www.kaggle.com/datasets/pranavjha24/car-hacking-dataset

Car-Hacking Dataset

Car-Hacking Dataset for intrusion detection

Available download formats: zip (137852368 bytes)
Dataset updated
Feb 3, 2025
Authors
Pranav J
License

MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically

Description

Dataset Description

Title: Car Hacking: CAN Intrusion Detection Dataset (Mirror)

Source: Original dataset curated by the Hacking and Countermeasure Research Lab (HCRL) at Korea University

This dataset captures Controller Area Network (CAN) bus traffic from a real vehicle, simulating both normal driving scenarios and four types of cyberattacks:

Denial-of-Service (DoS)

Fuzzy/Flooding Attacks

Spoofing Attacks (RPM/Gear/Speed gauge manipulation)

Replay Attacks

It is widely used to develop machine learning models for detecting intrusions in automotive communication systems.

Dataset Structure

Files:

normal.csv: Benign CAN traffic (2.1M messages).

attack_DoS.csv, attack_Fuzzy.csv, attack_spoofing.csv, attack_replay.csv: Attack-specific logs.

Features:

Timestamp, CAN ID, Data Length Code (DLC), Data (hexadecimal payload).

Label (0 for normal, 1 for attack).
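A minimal sketch of reading rows with the features listed above. The page does not give the column order, the delimiter, or the exact payload formatting, so the layout below (comma-separated Timestamp, CAN ID in hex, DLC, space-separated hex payload, Label) and both sample rows are assumptions for illustration only; check the actual CSV files before relying on it.

```python
import csv
import io
from dataclasses import dataclass

# Hypothetical rows in the assumed order: Timestamp, CAN ID, DLC, Data, Label.
SAMPLE = """1478198376.389427,0316,8,05 21 68 09 21 21 00 6f,0
1478198376.389636,0000,8,00 00 00 00 00 00 00 00,1
"""

@dataclass
class CanFrame:
    timestamp: float
    can_id: int      # CAN ID, parsed from its hexadecimal text form
    dlc: int         # Data Length Code: declared payload length in bytes
    payload: bytes   # decoded hexadecimal payload
    is_attack: bool  # Label column: 0 = normal, 1 = attack

def parse_frames(csv_text):
    """Parse CSV text into CanFrame objects, checking DLC against the payload length."""
    frames = []
    for timestamp, can_id, dlc, data, label in csv.reader(io.StringIO(csv_text)):
        payload = bytes.fromhex(data)  # fromhex ignores the spaces between bytes
        if len(payload) != int(dlc):
            raise ValueError(f"DLC {dlc} does not match payload {data!r}")
        frames.append(
            CanFrame(float(timestamp), int(can_id, 16), int(dlc), payload, label.strip() == "1")
        )
    return frames

frames = parse_frames(SAMPLE)
print(sum(f.is_attack for f in frames), "of", len(frames), "frames labeled as attack")
```

Decoding the payload to bytes up front keeps later feature extraction (per-byte values, inter-arrival times per CAN ID) independent of the text format.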

Usage Examples

Train ML/DL models (e.g., Random Forest, LSTM) for real-time CAN intrusion detection.

Benchmark automotive cybersecurity solutions.

Study CAN protocol vulnerabilities and attack patterns.

Citation

If you use this dataset, cite the original work:

Publication

Song, Hyun Min, Jiyoung Woo, and Huy Kang Kim. "In-vehicle network intrusion detection using deep convolutional neural network." Vehicular Communications 21 (2020): 100198.

Seo, Eunbi, Hyun Min Song, and Huy Kang Kim. "GIDS: GAN based Intrusion Detection System for In-Vehicle Network." 2018 16th Annual Conference on Privacy, Security and Trust (PST). IEEE, 2018.

This Kaggle dataset is a mirror of the original HCRL work.

For reproducibility, include the above citation in publications or projects using this data.
