9 datasets found
  1. Healthcare Ransomware Dataset

    • kaggle.com
    Updated Feb 21, 2025
    Cite
    Rivalytics (2025). Healthcare Ransomware Dataset [Dataset]. https://www.kaggle.com/datasets/rivalytics/healthcare-ransomware-dataset
    Explore at: Croissant (Croissant is a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 21, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Rivalytics
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    📌 Context of the Dataset

    The Healthcare Ransomware Dataset was created to simulate real-world cyberattacks in the healthcare industry. Hospitals, clinics, and research labs have become prime targets for ransomware due to their reliance on real-time patient data and legacy IT infrastructure. This dataset provides insight into attack patterns, recovery times, and cybersecurity practices across different healthcare organizations.

    Why is this important?

    Ransomware attacks on healthcare organizations can shut down entire hospitals, delay treatments, and put lives at risk. Understanding how different healthcare organizations respond to attacks can help develop better security strategies. The dataset allows cybersecurity analysts, data scientists, and researchers to study patterns in ransomware incidents and explore predictive modeling for risk mitigation.

    📌 Sources and Research Inspiration

    This simulated dataset was inspired by real-world cybersecurity reports and built using insights from official sources, including:

    1️⃣ IBM Cost of a Data Breach Report (2024)

    The healthcare sector had the highest average cost of data breaches ($10.93 million per incident). On average, organizations recovered only 64.8% of their data after paying ransom. Healthcare breaches took 277 days on average to detect and contain.

    2️⃣ Sophos State of Ransomware in Healthcare (2024)

    67% of healthcare organizations were hit by ransomware in 2024, an increase from 60% in 2023. 66% of backup compromise attempts succeeded, making data recovery significantly more difficult. The most common attack vectors included exploited vulnerabilities (34%) and compromised credentials (34%).

    3️⃣ Health & Human Services (HHS) Cybersecurity Reports

    Ransomware incidents in healthcare have doubled since 2016. Organizations that fail to monitor threats frequently experience higher infection rates.

    4️⃣ Cybersecurity & Infrastructure Security Agency (CISA) Alerts

    CISA alerts identified phishing, unpatched software, and exposed RDP ports as top ransomware entry points. Only 13% of healthcare organizations monitor cyber threats more than once per day, increasing the risk of undetected attacks.

    5️⃣ Emsisoft 2020 Report on Ransomware in Healthcare

    The number of ransomware attacks in healthcare increased by 278% between 2018 and 2023. 560 healthcare facilities were affected in a single year, disrupting patient care and emergency services.
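    Several of the benchmarks above mix percentage-point and relative changes, which are easy to conflate when turning them into simulation parameters. The short sketch below (Python, standard library only; the function names are illustrative, not from the dataset) reads the Sophos figure (from 60% to 67% of organizations hit) both ways and converts the "increased by 278%" figure into a multiplier.

```python
def point_change(old_pct: float, new_pct: float) -> float:
    """Difference between two percentages, in percentage points."""
    return new_pct - old_pct


def relative_change(old: float, new: float) -> float:
    """Relative change as a fraction of the starting value: (new - old) / old."""
    return (new - old) / old


def multiplier_from_increase(pct_increase: float) -> float:
    """An increase of X% means the new value is (1 + X/100) times the old one."""
    return 1 + pct_increase / 100


if __name__ == "__main__":
    # Sophos: share of healthcare organizations hit by ransomware, 60% (2023) vs 67% (2024)
    print(point_change(60, 67))               # 7 percentage points
    print(round(relative_change(60, 67), 3))  # 0.117, i.e. about 11.7% relative growth
    # "Increased by 278%" means about 3.78x the starting level, not 2.78x
    print(round(multiplier_from_increase(278), 2))  # 3.78
```

    The distinction matters when calibrating a simulation: a 7-point rise in the share of organizations hit is a roughly 12% relative increase, not a 7% one.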

    📌 Why is This a Simulated Dataset?

    This dataset does not contain real patient data or actual ransomware cases. Instead, it was built using probabilistic modeling and structured randomness based on industry benchmarks and cybersecurity reports.

    How It Was Created:

    1️⃣ Defining the Dataset Structure

    The dataset was designed to simulate realistic attack patterns in healthcare, using actual ransomware case studies as inspiration.

    Columns were selected based on what real-world cybersecurity teams track, such as attack methods (phishing, RDP exploits, credential theft); infection rates, recovery time, and backup compromise rates; and organization type (hospitals, clinics, research labs) and monitoring frequency.

    2️⃣ Generating Realistic Data Using ChatGPT & Python

    ChatGPT assisted in defining relationships between attack factors, ensuring that key cybersecurity concepts were accurately reflected. Python’s NumPy and Pandas libraries were used to introduce randomized attack simulations based on real-world statistics. Data was validated against industry research to ensure it aligns with actual ransomware attack trends.

    3️⃣ Ensuring Logical Relationships Between Data Points

    Hospitals take longer to recover due to larger infrastructure and compliance requirements. Organizations that track more cyber threats recover faster because they detect attacks earlier. Backup security significantly impacts recovery time, reflecting the real-world risk of backup encryption attacks.
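    The three relationships above can be sketched as a small seeded simulation. This is a hypothetical illustration, not the author's generation script: the column names, baseline recovery times, monitoring multipliers, and backup effect are invented placeholders, and it uses only Python's standard library (`random`) instead of the NumPy and Pandas stack the description mentions, so that it runs without extra dependencies.

```python
import random
from typing import NamedTuple

# Hypothetical categories and effect sizes: placeholders for illustration only,
# not values from the Healthcare Ransomware Dataset.
ORG_TYPES = ["hospital", "clinic", "research_lab"]
ATTACK_METHODS = ["phishing", "rdp_exploit", "credential_theft"]
BASE_RECOVERY_DAYS = {"hospital": 20.0, "clinic": 10.0, "research_lab": 12.0}
MONITORING_MULTIPLIER = {"daily": 1.0, "weekly": 1.3, "monthly": 1.6}
BACKUP_PENALTY = 1.5         # compromised backups stretch recovery by 50%
P_BACKUP_COMPROMISED = 0.66  # loosely inspired by the Sophos backup figure


class Incident(NamedTuple):
    org_type: str
    attack_method: str
    monitoring: str
    backup_compromised: bool
    recovery_days: float


def simulate_incident(rng: random.Random) -> Incident:
    org = rng.choice(ORG_TYPES)
    method = rng.choice(ATTACK_METHODS)
    monitoring = rng.choice(list(MONITORING_MULTIPLIER))
    backup_hit = rng.random() < P_BACKUP_COMPROMISED

    days = BASE_RECOVERY_DAYS[org]             # hospitals start from the longest baseline
    days *= MONITORING_MULTIPLIER[monitoring]  # less frequent monitoring: later detection, slower recovery
    if backup_hit:
        days *= BACKUP_PENALTY                 # encrypted or deleted backups slow restoration
    days *= rng.lognormvariate(0.0, 0.25)      # positive multiplicative noise
    return Incident(org, method, monitoring, backup_hit, round(days, 1))


def simulate(n: int, seed: int = 42) -> list[Incident]:
    rng = random.Random(seed)  # seeded so the same rows are produced on every run
    return [simulate_incident(rng) for _ in range(n)]


def mean_recovery(rows: list[Incident], **filters) -> float:
    """Average recovery_days over rows whose fields match all keyword filters."""
    subset = [r.recovery_days for r in rows
              if all(getattr(r, k) == v for k, v in filters.items())]
    return sum(subset) / len(subset)


if __name__ == "__main__":
    rows = simulate(10_000)
    for org in ORG_TYPES:
        print(org, round(mean_recovery(rows, org_type=org), 1))
```

    Because each row is a named tuple, passing the list to `pandas.DataFrame` should yield one column per field, which is a natural next step if you want the Pandas-based workflow the description refers to.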

  2. Number of data compromises and impacted individuals in U.S. 2005-2024

    • statista.com
    • ai-chatbox.pro
    Updated May 23, 2025
    Cite
    Statista (2025). Number of data compromises and impacted individuals in U.S. 2005-2024 [Dataset]. https://www.statista.com/statistics/273550/data-breaches-recorded-in-the-united-states-by-number-of-breaches-and-records-exposed/
    Dataset updated
    May 23, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    United States
    Description

    In 2024, the number of data compromises in the United States stood at 3,158 cases. Meanwhile, over 1.35 billion individuals were affected in the same year by data compromises, including data breaches, leakage, and exposure. While these are three different events, they have one thing in common: in each, sensitive data is accessed by an unauthorized threat actor.

    Industries most vulnerable to data breaches

    Some industry sectors usually see more significant cases of private data violations than others, depending on the type and volume of personal information that organizations in these sectors store. In 2024, financial services, healthcare, and professional services were the three industry sectors that recorded the most data breaches. Overall, the number of data breaches in some industry sectors in the United States has gradually increased within the past few years, while other sectors saw a decrease.

    Largest data exposures worldwide

    In 2020, the adult streaming website CAM4 experienced a leakage of nearly 11 billion records, by far the most extensive reported data leakage. This case, though, is unique because cybersecurity researchers found the vulnerability before cybercriminals did. The second-largest data breach is the Yahoo data breach, dating back to 2013. The company first reported about one billion exposed records, then in 2017 updated the number of leaked records to three billion. In March 2018, the third-biggest data breach happened, involving India’s national identification database Aadhaar; over 1.1 billion records were exposed.

  3. FishVis, predicted occurrence and vulnerability for 13 fish species for...

    • catalog.data.gov
    • data.usgs.gov
    • +4 more
    Updated Jul 6, 2024
    Cite
    U.S. Geological Survey (2024). FishVis, predicted occurrence and vulnerability for 13 fish species for current (1961 - 1990) and future (2046 - 2100) climate conditions in Great Lakes streams [Dataset]. https://catalog.data.gov/dataset/fishvis-predicted-occurrence-and-vulnerability-for-13-fish-species-for-current-1961-1990-a
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    The Great Lakes
    Description

    Climate change is expected to alter the distributions and community composition of stream fishes in the Great Lakes region in the 21st century, in part as a result of altered hydrological systems (stream temperature, streamflow, and habitat). Resource managers need information and tools to understand where fish species and stream habitats are expected to change under future conditions. Fish sample collections and environmental variables from multiple sources across the United States Great Lakes Basin were integrated and used to develop empirical models to predict fish species occurrence under present-day climate conditions. Random Forests models were used to predict the probability of occurrence of 13 lotic fish species within each stream reach in the study area. Downscaled climate data from general circulation models were integrated with the fish species occurrence models to project fish species occurrence under future climate conditions. The 13 fish species represented three ecological guilds associated with water temperature (cold, cool, and warm), and the species were distributed in streams across the Great Lakes region. Vulnerability (loss of species) and opportunity (gain of species) scores were calculated for all stream reaches by evaluating changes in fish species occurrence from present-day to future climate conditions. The 13 fish species included 4 cold-water species, 5 cool-water species, and 4 warm-water species. Presently, the 4 cold-water species occupy from 15 percent (55,000 kilometers [km]) to 35 percent (130,000 km) of the total stream length (369,215 km) across the study area; the 5 cool-water species, from 9 percent (33,000 km) to 58 percent (215,000 km); and the 4 warm-water species, from 9 percent (33,000 km) to 38 percent (141,000 km). 
Fish models linked to projections from 13 downscaled climate models projected that in the mid to late 21st century (2046–65 and 2081–2100, respectively) habitats suitable for all 4 cold-water species and 4 of 5 cool-water species under present-day conditions will decline by as much as 86 percent and as little as 33 percent, and habitats suitable for all 4 warm-water species will increase by as much as 33 percent and as little as 7 percent. This report documents the approach and data used to predict and project fish species occurrence under present-day and future climate conditions for 13 lotic fish species in the United States Great Lakes Basin. A Web-based decision support mapping application termed “FishVis” was developed to provide a means to integrate, visualize, query, and download the results of these projected climate-driven responses and help inform conservation planning efforts within the region. A geodatabase containing the full dataset of results mapped in FishVis can be downloaded from the FishVis mapping application at http://ccviewer.wim.usgs.gov/FishVis/ or through USGS ScienceBase as a Data Release (Stewart and others, 2016). The geodatabase contains five feature classes, each with its own metadata record, and includes data attributed to the stream reach (fishvis_reacha83 and fishvis_search_reacha83), catchment (fishvis_catcha83 and fishvis_reacha83), and huc12 (fishvis_huc12a83). The citation for the USGS Scientific Investigations Report that documents this dataset is: Stewart, J.S., Covert, S.A., Estes, N.J., Westenbroek, S.M., Krueger, Damon, Wieferich, D.J., Slattery, M.T., Lyons, J.D., McKenna, J.E., Jr., Infante, D.M., Bruce, J.L., 2016, FishVis, A regional decision support tool for identifying vulnerabilities of riverine habitat and fishes to climate change in the Great Lakes Region: U.S. Geological Survey Scientific Investigations Report 2016-5124, 15 p., http://dx.doi.org/10.3133/sir20165124.
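    The vulnerability (loss of species) and opportunity (gain of species) scores described above can be illustrated as counts of species lost and gained in a single stream reach. This is a simplified sketch under stated assumptions, not the FishVis method: it assumes each reach has a predicted probability of occurrence per species under present-day and future climates, and it uses a hypothetical presence threshold of 0.5 to turn probabilities into presence or absence. The report may define the scores differently (for example, with other thresholds or weighting), and the species keys are placeholders.

```python
from typing import Dict, Set, Tuple

# Hypothetical cut-off for calling a species "present" in a reach.
PRESENCE_THRESHOLD = 0.5


def present(probs: Dict[str, float], threshold: float = PRESENCE_THRESHOLD) -> Set[str]:
    """Species whose predicted probability of occurrence meets the threshold."""
    return {species for species, p in probs.items() if p >= threshold}


def reach_scores(current: Dict[str, float], future: Dict[str, float]) -> Tuple[int, int]:
    """Return (vulnerability, opportunity) for one stream reach.

    vulnerability: species predicted present now but absent in the future (loss)
    opportunity:   species predicted absent now but present in the future (gain)
    """
    now, later = present(current), present(future)
    return len(now - later), len(later - now)


if __name__ == "__main__":
    # Made-up probabilities for one reach, for illustration only.
    current = {"cold_water_sp": 0.80, "cool_water_sp": 0.60, "warm_water_sp": 0.20}
    future = {"cold_water_sp": 0.30, "cool_water_sp": 0.55, "warm_water_sp": 0.70}
    print(reach_scores(current, future))  # (1, 1)
```

    Here the cold-water species drops below the threshold (one loss), the warm-water species rises above it (one gain), and the cool-water species stays present, so it counts toward neither score.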

  4. Global number of breached user accounts Q1 2020-Q3 2024

    • statista.com
    • ai-chatbox.pro
    Updated Jun 23, 2025
    Cite
    Statista (2025). Global number of breached user accounts Q1 2020-Q3 2024 [Dataset]. https://www.statista.com/statistics/1307426/number-of-data-breaches-worldwide/
    Dataset updated
    Jun 23, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Worldwide
    Description

    During the third quarter of 2024, data breaches exposed more than *** million records worldwide. Since the first quarter of 2020, the highest number of data records was exposed in the first quarter of ***, with more than *** million data sets. Data breaches remain among the biggest concerns of company leaders worldwide. The most common causes of sensitive information loss were operating system vulnerabilities on endpoint devices.

    Which industries see the most data breaches?

    Certain conditions make some industry sectors more prone to data breaches than others. According to the latest observations, public administration experienced the highest number of data breaches between 2021 and 2022, with *** reported data breach incidents with confirmed data loss. Financial institutions came second, with *** data breach cases, followed by healthcare providers.

    Data breach cost

    Data breach incidents have various consequences, the most common being financial losses and business disruptions. As of 2023, the average data breach cost across businesses worldwide was **** million U.S. dollars, while a leaked data record cost about *** U.S. dollars. The United States saw the highest average breach cost globally, at **** million U.S. dollars.

  5. IoT Healthcare Security Dataset

    • ieee-dataport.org
    Updated Aug 16, 2021
    Cite
    Faisal Hussain (2021). IoT Healthcare Security Dataset [Dataset]. https://ieee-dataport.org/documents/iot-healthcare-security-dataset
    Dataset updated
    Aug 16, 2021
    Authors
    Faisal Hussain
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    smart city

  6. Software Vulnerability Dumps

    • data.public.lu
    json
    Updated May 26, 2025
    Cite
    Computer Incident Response Center Luxembourg (2025). Software Vulnerability Dumps [Dataset]. https://data.public.lu/datasets/software-vulnerability-dumps/
    Available download formats: json
    Dataset updated
    May 26, 2025
    Dataset authored and provided by
    Computer Incident Response Center Luxembourg
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A daily dump of all vulnerability sources, including CVE and many others, is exported and published with expanded values, as seen on https://vulnerability.circl.lu/dumps/

  7. How to Login Router Account?: A Complete Guide Dataset

    • paperswithcode.com
    Updated Jun 17, 2025
    Cite
    (2025). How to Login Router Account?: A Complete Guide Dataset [Dataset]. https://paperswithcode.com/dataset/how-to-login-router-account-a-complete-guide
    Dataset updated
    Jun 17, 2025
    Description

    In the digital era, our daily lives revolve around internet connectivity. From smart homes to remote work and entertainment, everything depends on a stable and secure network. At the heart of this network is your router, and to control, configure, or troubleshoot it, you need to access the router account login interface.

    In this guide, we'll walk you through everything you need to know about your router account login: how to access it, why it’s important, common issues you might face, and tips to keep it secure.

    What Is a Router Account Login?

    A router account login is the administrative gateway to your router’s settings and features. It allows you to access the router's web-based management panel, where you can make changes such as:

    Configuring Wi-Fi settings

    Changing the network name (SSID) and password

    Updating firmware

    Setting up parental controls or guest networks

    Managing connected devices

    Enhancing security settings

    Whether you’re troubleshooting a slow connection or simply want to update your password, knowing how to use your router account login is essential.

    Why Accessing Your Router Account Login Matters

    Most people set up their router once and forget about it. However, logging into your router’s account gives you control over your network and the ability to:

    Improve security by changing default credentials

    Optimize performance by choosing the best channel or frequency band

    Monitor devices connected to your network

    Set limits for children or guests

    Update firmware to fix bugs or vulnerabilities

    In short, using the router account login interface helps keep your internet experience secure, fast, and tailored to your needs.

    How to Access Your Router Account Login

    Accessing your router account login is usually straightforward. Follow these steps:

    Connect to the Router: Make sure your device (laptop, smartphone, or desktop) is connected to the router, either via Wi-Fi or through an Ethernet cable.

    Find the Router’s IP Address: The default IP address is often printed on the back or bottom of the router. Common default IPs include:

    192.168.0.1

    192.168.1.1

    10.0.0.1

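    If you’re unsure, you can find it from a command line:

    ipconfig (Windows, in Command Prompt): look for the "Default Gateway" address.
    netstat -nr | grep default (macOS, in Terminal): the gateway is the address on the "default" line.
    ip route | grep default (Linux): the gateway is the address after "default via".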

    Enter the IP in a Browser: Open a web browser, type the IP address into the address bar, and press Enter.

    Enter Username and Password: You’ll be prompted to enter your router account credentials. These are often set to default values like:

    Username: admin

    Password: admin or password

    If you’ve changed them before and forgotten the details, you may need to reset the router.
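    On Linux, the step of finding the router's IP address can also be automated. The sketch below parses the text printed by `ip route show default`, whose default-route lines have the form `default via <gateway> dev <interface> ...`. The function name is hypothetical, and it only handles that Linux output; Windows `ipconfig` and macOS `netstat -nr` print the gateway in different formats.

```python
import re
from typing import Optional

# Matches lines such as "default via 192.168.1.1 dev wlan0 proto dhcp metric 600".
_DEFAULT_ROUTE = re.compile(r"^\s*default\s+via\s+(\S+)")


def default_gateway(ip_route_output: str) -> Optional[str]:
    """Return the gateway from the first default route, or None if there is none."""
    for line in ip_route_output.splitlines():
        match = _DEFAULT_ROUTE.match(line)
        if match:
            return match.group(1)
    return None


if __name__ == "__main__":
    # On a Linux machine the text would come from, for example:
    #   subprocess.run(["ip", "route", "show", "default"], capture_output=True, text=True).stdout
    sample = (
        "default via 192.168.1.1 dev wlan0 proto dhcp metric 600\n"
        "192.168.1.0/24 dev wlan0 proto kernel scope link src 192.168.1.23\n"
    )
    print(default_gateway(sample))  # 192.168.1.1
```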

    Common Issues with Router Account Login

    Sometimes users have trouble reaching their router's login page. Here are common issues and how to resolve them:

    Incorrect IP Address: If you enter the wrong IP, you won’t reach the login page. Double-check the correct IP through your system’s settings.

    Forgotten Username or Password: If you’ve changed your login credentials and can’t remember them, you may need to reset the router to factory settings (this also erases any custom settings). Hold down the reset button (usually found on the back) for 10–30 seconds.

    Browser Cache Issues: Clear your browser’s cache or try incognito/private mode if the login page isn’t loading correctly.

    No Internet Access: Even without internet access, you should still be able to reach the router’s settings locally, as long as your device is connected to the router.

    Securing Your Router Account Login

    Your router is the first line of defense in your home or office network. Keeping your router account login secure is critical for protecting your personal data.

    Here are essential tips:

    Change Default Credentials: Attackers often target routers that still use default usernames and passwords.

    Use Strong Wi-Fi Security: Enable WPA2 or WPA3 encryption with a strong Wi-Fi password.

    Update Firmware Regularly: Firmware updates patch known vulnerabilities.

    Disable Remote Access: Unless you need it, turn off remote management.

    Use a Secure Admin Password: Use a long, complex combination of characters, and store it safely.
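    For the secure-password tip, a random admin password can be generated with Python's standard `secrets` module, which is intended for security-sensitive randomness (unlike `random`). This is a minimal sketch: the 20-character default and the rule of at least one lowercase letter, uppercase letter, digit, and symbol are assumptions, and some router interfaces reject certain symbols, in which case `ALPHABET` would need narrowing.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length: int = 20) -> str:
    """Return a random password containing every character class at least once."""
    if length < 4:
        raise ValueError("length must be at least 4 to include every character class")
    while True:
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(length))
        # Re-draw until each class is present; this rejection step keeps the result
        # uniform over all passwords that satisfy the rule.
        if (any(c.islower() for c in candidate)
                and any(c.isupper() for c in candidate)
                and any(c.isdigit() for c in candidate)
                and any(c in string.punctuation for c in candidate)):
            return candidate


if __name__ == "__main__":
    print(generate_password())
```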

    What You Can Do After Logging In

    Once logged in, the router account login dashboard gives you access to many useful tools and settings:

    Change Network Name and Password: Make your Wi-Fi easier to identify and more secure.

    Manage Connected Devices: Monitor who’s using your network and block unauthorized users.

    Set Parental Controls: Limit internet access during specific hours or block inappropriate content.

    Create a Guest Network: Allow guests to connect without giving them access to your main network.

    Update Router Firmware: Check for updates and install them to keep your router running smoothly.

    Router Brands and Login Interfaces

    Most major router brands (such as TP-Link, Netgear, Linksys, ASUS, and D-Link) offer similar router account login processes, though their interfaces may differ in layout. Many also offer mobile apps that simplify the login and management process.

    Conclusion

    Understanding how to access and use your router account login is essential for maintaining a secure and reliable internet connection. Whether you're a casual user wanting to update a Wi-Fi password or a more advanced user aiming to configure custom settings, this access puts full control of your network in your hands.

  8. Mortality Risk from High Temperatures in London (Triple Jeopardy Mapping)

    • data.wu.ac.at
    • data.europa.eu
    html
    Updated Mar 15, 2018
    Cite
    Greater London Authority (GLA) (2018). Mortality Risk from High Temperatures in London (Triple Jeopardy Mapping) [Dataset]. https://data.wu.ac.at/schema/data_gov_uk/ZmUwZTI2YWMtNWYxNC00MTRkLTg0YWYtMzY3OTdhODI3YWMw
    Available download formats: html
    Dataset updated
    Mar 15, 2018
    Dataset provided by
    Greater London Authority (GLA)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    London
    Description

    A heatwave is a prolonged period of unusually hot weather. While there is no standard definition of a heatwave in England, the Met Office generally uses the World Meteorological Organization definition: "when the daily maximum temperature of more than five consecutive days exceeds the average maximum temperature by 5°C, the normal period being 1961-1990". Heatwaves are common in both the northern and southern hemispheres during summer and have historically been associated with health problems and increased mortality. The urban heat island (UHI) is the phenomenon whereby temperatures are higher in cities than in surrounding rural areas, due, for example, to urban surfaces and anthropogenic heat sources. For an example of an urban heat island map during an average summer, see this dataset. For an example of an urban heat island map during a warm summer, see this dataset. As well as outdoor temperature, an individual’s heat exposure may also depend on the type of building they are in, if indoors. Indoor temperature exposure may depend on a number of characteristics, such as building geometry, construction materials, window sizes, and the ability to add extra ventilation. People also differ in their vulnerability to heat, with some more prone to negative health effects when exposed to high temperatures. This Triple Jeopardy dataset combines:

    • Urban Heat Island information for London, based on the 55 days from 26 May to 19 July 2006, of which the last four days were considered a heatwave
    • An estimate of the indoor temperatures for individual dwellings in London across this time period
    • Population age and distribution, with age used as a proxy for heat vulnerability

    From this, local levels of heat-related mortality were estimated using a mortality model derived from epidemiological data.
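    The study window can be checked with date arithmetic: counting both endpoints, 26 May to 19 July 2006 spans 55 days, and the last four days classed as a heatwave fall on 16 to 19 July. A minimal sketch using only Python's standard `datetime` module (the helper names are illustrative):

```python
from datetime import date, timedelta

START = date(2006, 5, 26)  # first day of the UHI study period
END = date(2006, 7, 19)    # last day of the study period


def inclusive_days(start: date, end: date) -> int:
    """Number of calendar days from start to end, counting both endpoints."""
    return (end - start).days + 1


def last_n_days(end: date, n: int) -> list:
    """The final n days of a period ending on `end`, in chronological order."""
    return [end - timedelta(days=i) for i in range(n - 1, -1, -1)]


if __name__ == "__main__":
    print(inclusive_days(START, END))                    # 55
    print([d.isoformat() for d in last_n_days(END, 4)])  # ['2006-07-16', '2006-07-17', '2006-07-18', '2006-07-19']
```

    Subtracting two dates alone gives 54; the extra day comes from counting both 26 May and 19 July.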
    The dataset comprises four layers:

    • Ind_Temp_A: indoor temperature anomaly, the difference in degrees Celsius between the estimated indoor temperatures for dwellings and the average indoor temperature estimate for the whole of London, averaged by ward. Positive numbers show dwellings with a greater tendency to overheat than the London average.
    • HeatMortpM: total estimated mortality due to heat (outdoor and indoor) per million population over the entire 55-day period, inclusive of age effects.
    • HeatMorUHI: estimated mortality per million population due to increased outdoor temperature exposure caused by the UHI over the 55-day period (excluding the effect of overheating housing), inclusive of age effects.
    • HeatMorInd: estimated mortality per million population due to increased temperature exposure caused by heat-vulnerable dwellings (excluding the effect of the UHI) over the 55-day period, inclusive of age effects.

    More information is on this website and in the Triple Jeopardy leaflet. The maps are also available as one combined PDF.
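    The Ind_Temp_A and per-million mortality layers rest on two simple calculations: each dwelling's deviation from the London-wide mean indoor temperature, averaged within each ward, and a death count scaled to a rate per million population. The sketch below is a simplified reading of the layer descriptions, not the GLA's method: it assumes each dwelling already has a single period-averaged indoor temperature, takes the London average as an unweighted mean over dwellings, and ignores the age adjustment and epidemiological mortality model that the real layers include. Ward names and values in the example are made up.

```python
from collections import defaultdict
from statistics import mean


def ward_temperature_anomaly(dwellings: list) -> dict:
    """Mean of (dwelling indoor temperature - London-wide mean) per ward, in degrees C.

    `dwellings` holds (ward, period-average indoor temperature) pairs.
    Positive values mean the ward's dwellings tend to run hotter than the London average.
    """
    london_mean = mean(temp for _, temp in dwellings)
    diffs_by_ward = defaultdict(list)
    for ward, temp in dwellings:
        diffs_by_ward[ward].append(temp - london_mean)
    return {ward: mean(diffs) for ward, diffs in diffs_by_ward.items()}


def per_million(deaths: float, population: int) -> float:
    """Express a death count as a rate per million population (no age adjustment)."""
    return deaths / population * 1_000_000


if __name__ == "__main__":
    sample = [("Ward A", 26.0), ("Ward A", 28.0), ("Ward B", 24.0), ("Ward B", 22.0)]
    print(ward_temperature_anomaly(sample))   # {'Ward A': 2.0, 'Ward B': -2.0}
    print(round(per_million(3, 250_000), 1))  # 12.0
```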

  9. PROactive Cohort Study: Data

    • dataverse.nl
    docx, pdf, xlsx
    Updated Oct 21, 2024
    Cite
    Sanne Nijhof; Elise van de Putte; Johanna Wilhelmina Hoefnagels (2024). PROactive Cohort Study: Data [Dataset]. http://doi.org/10.34894/FXUGHW
    Available download formats: pdf(84294), pdf(1121139), pdf(81313), xlsx(1633165), pdf(92817), pdf(246346), pdf(472708), pdf(219248), pdf(146208), docx(82916), pdf(79030), pdf(248513), pdf(203559), pdf(644462)
    Dataset updated
    Oct 21, 2024
    Dataset provided by
    DataverseNL
    Authors
    Sanne Nijhof; Elise van de Putte; Johanna Wilhelmina Hoefnagels
    License

    https://dataverse.nl/api/datasets/:persistentId/versions/9.0/customlicense?persistentId=doi:10.34894/FXUGHW

    Description

    Children with a chronic disease face more obstacles than their healthy peers, which may affect their physical, social-emotional, and cognitive development. In the long run, children with a chronic disease reach developmental milestones later than their healthy peers, and many will remain dependent on medication and/or be limited in their daily activities. Because of this increased vulnerability, the PROactive Cohort Study aims to assess fatigue, participation, and psychosocial well-being across children with various chronic diseases over the course of their lifespan. These factors can influence their identity and how they grow into autonomous adults who take part in society. The PROactive Cohort Study also aims to help people with chronic and/or life-threatening conditions increase their ability to adapt and their self-management capacities. To this end, PROactive also systematically monitors the child's capacity and ability to play and the well-being of patients and their families. This knowledge can be used to create prevention and treatment strategies in an innovative and interactive way, and will help to assess vulnerabilities and resilience among children with chronic and/or life-threatening conditions and their families. The cohort study follows a continuous longitudinal design. It is based at the Wilhelmina Children's Hospital in the Netherlands and has been running since December 2016. Children with a chronic disease (e.g. cystic fibrosis, juvenile idiopathic arthritis, chronic kidney disease, or congenital heart disease) in a broad age range (2-18 years) are included, as well as their parent(s). Patient-reported outcome measures (PROMs) are collected from parents (for children aged 2-18 years) and from the children themselves (aged 8-18 years). The PROactive Cohort Study uses a flexible design in which the research assessment is an integrated part of clinical care.
Children are included when they visit the outpatient clinic and are followed up annually, preferably linked to another outpatient visit.

Healthcare Ransomware Dataset

Analyze attacks, strengthen security, and improve recovery in healthcare

199 scholarly articles cite this dataset (View in Google Scholar)