48 datasets found
  1. Healthcare Ransomware Dataset

    • kaggle.com
    Updated Feb 21, 2025
    Cite
    Rivalytics (2025). Healthcare Ransomware Dataset [Dataset]. https://www.kaggle.com/datasets/rivalytics/healthcare-ransomware-dataset
    Explore at:
    Croissant, a format for machine-learning datasets (learn more at mlcommons.org/croissant).
    Dataset updated
    Feb 21, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Rivalytics
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    📌 Context of the Dataset

    The Healthcare Ransomware Dataset was created to simulate real-world cyberattacks in the healthcare industry. Hospitals, clinics, and research labs have become prime targets for ransomware due to their reliance on real-time patient data and legacy IT infrastructure. This dataset provides insight into attack patterns, recovery times, and cybersecurity practices across different healthcare organizations.

    Why is this important?

    Ransomware attacks on healthcare organizations can shut down entire hospitals, delay treatments, and put lives at risk. Understanding how different healthcare organizations respond to attacks can help develop better security strategies. The dataset allows cybersecurity analysts, data scientists, and researchers to study patterns in ransomware incidents and explore predictive modeling for risk mitigation.

    📌 Sources and Research Inspiration This simulated dataset was inspired by real-world cybersecurity reports and built using insights from official sources, including:

    1️⃣ IBM Cost of a Data Breach Report (2024)

    The healthcare sector had the highest average cost of data breaches ($10.93 million per incident). On average, organizations recovered only 64.8% of their data after paying ransom. Healthcare breaches took 277 days on average to detect and contain.

    2️⃣ Sophos State of Ransomware in Healthcare (2024)

    67% of healthcare organizations were hit by ransomware in 2024, an increase from 60% in 2023. 66% of backup compromise attempts succeeded, making data recovery significantly more difficult. The most common attack vectors included exploited vulnerabilities (34%) and compromised credentials (34%).

    3️⃣ Health & Human Services (HHS) Cybersecurity Reports

    Ransomware incidents in healthcare have doubled since 2016. Organizations that fail to monitor threats frequently experience higher infection rates.

    4️⃣ Cybersecurity & Infrastructure Security Agency (CISA) Alerts

    Identified phishing, unpatched software, and exposed RDP ports as top ransomware entry points. Only 13% of healthcare organizations monitor cyber threats more than once per day, increasing the risk of undetected attacks.

    5️⃣ Emsisoft 2020 Report on Ransomware in Healthcare

    The number of ransomware attacks in healthcare increased by 278% between 2018 and 2023. 560 healthcare facilities were affected in a single year, disrupting patient care and emergency services.

    📌 Why is This a Simulated Dataset?

    This dataset does not contain real patient data or actual ransomware cases. Instead, it was built using probabilistic modeling and structured randomness based on industry benchmarks and cybersecurity reports.

    How It Was Created:

    1️⃣ Defining the Dataset Structure

    The dataset was designed to simulate realistic attack patterns in healthcare, using actual ransomware case studies as inspiration.

    Columns were selected based on what real-world cybersecurity teams track, such as:

    • Attack methods (phishing, RDP exploits, credential theft).
    • Infection rates, recovery time, and backup compromise rates.
    • Organization type (hospitals, clinics, research labs) and monitoring frequency.

    2️⃣ Generating Realistic Data Using ChatGPT & Python

    ChatGPT assisted in defining relationships between attack factors, ensuring that key cybersecurity concepts were accurately reflected. Python’s NumPy and Pandas libraries were used to introduce randomized attack simulations based on real-world statistics. Data was validated against industry research to ensure it aligns with actual ransomware attack trends.

    3️⃣ Ensuring Logical Relationships Between Data Points

    Hospitals take longer to recover due to larger infrastructure and compliance requirements. Organizations that track more cyber threats recover faster because they detect attacks earlier. Backup security significantly impacts recovery time, reflecting the real-world risk of backup encryption attacks.
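
    As a rough illustration of the generation process described above, the following is a minimal, hypothetical sketch of how correlated ransomware-incident records could be simulated with NumPy and Pandas. It is not the authors' actual script: the column names, probabilities, and coefficients are assumptions chosen only to mirror the stated relationships (hospitals recover more slowly, frequent monitoring speeds recovery, compromised backups add delay).

      import numpy as np
      import pandas as pd

      rng = np.random.default_rng(42)
      n = 1_000  # number of simulated incidents (hypothetical)

      org_type = rng.choice(["hospital", "clinic", "research_lab"], size=n, p=[0.5, 0.3, 0.2])
      attack_vector = rng.choice(["phishing", "rdp_exploit", "credential_theft"], size=n, p=[0.4, 0.3, 0.3])
      monitoring_per_day = rng.integers(0, 5, size=n)      # threat checks per day
      backup_compromised = rng.random(n) < 0.66            # rate inspired by the Sophos figure cited above

      # Encode the logical relationships described in the text.
      base_days = np.where(org_type == "hospital", 14.0, 7.0)
      recovery_days = (base_days
                       - 1.5 * monitoring_per_day
                       + np.where(backup_compromised, 10.0, 0.0)
                       + rng.normal(0, 2, size=n))
      recovery_days = np.clip(recovery_days, 1, None)

      df = pd.DataFrame({
          "org_type": org_type,
          "attack_vector": attack_vector,
          "monitoring_per_day": monitoring_per_day,
          "backup_compromised": backup_compromised,
          "recovery_days": recovery_days.round(1),
      })
      print(df.groupby("org_type")["recovery_days"].mean())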

  2. Healthcare Professionals Data | Healthcare & Hospital Executives in Europe |...

    • datarade.ai
    Updated Jan 1, 2018
    Cite
    Success.ai (2018). Healthcare Professionals Data | Healthcare & Hospital Executives in Europe | Verified Global Profiles from 700M+ Dataset | Best Price Guarantee [Dataset]. https://datarade.ai/data-products/healthcare-professionals-data-healthcare-hospital-executi-success-ai
    Explore at:
    .bin, .json, .xml, .csv, .xls, .sql, .txt (available download formats)
    Dataset updated
    Jan 1, 2018
    Dataset provided by
    Area covered
    Denmark, Belarus, Luxembourg, Åland Islands, Russian Federation, Jersey, Holy See, Finland, Sweden, Guernsey
    Description

    Success.ai’s Healthcare Professionals Data for Healthcare & Hospital Executives in Europe provides a reliable and comprehensive dataset tailored for businesses aiming to connect with decision-makers in the European healthcare and hospital sectors. Covering healthcare executives, hospital administrators, and medical directors, this dataset offers verified contact details, professional insights, and leadership profiles.

    With access to over 700 million verified global profiles and data from 70 million businesses, Success.ai ensures your outreach, market research, and partnership strategies are powered by accurate, continuously updated, and GDPR-compliant data. Backed by our Best Price Guarantee, this solution is indispensable for navigating and thriving in Europe’s healthcare industry.

    Why Choose Success.ai’s Healthcare Professionals Data?

    1. Verified Contact Data for Targeted Engagement

      • Access verified work emails, phone numbers, and LinkedIn profiles of healthcare executives, hospital administrators, and medical directors.
      • AI-driven validation ensures 99% accuracy, reducing data gaps and improving communication effectiveness.
    2. Comprehensive Coverage of European Healthcare Professionals

      • Includes profiles of professionals from top hospitals, healthcare organizations, and medical institutions across Europe.
      • Gain insights into regional healthcare trends, operational challenges, and emerging technologies.
    3. Continuously Updated Datasets

      • Real-time updates capture changes in leadership roles, organizational structures, and market dynamics.
      • Stay aligned with the fast-evolving healthcare landscape to identify emerging opportunities.
    4. Ethical and Compliant

      • Fully adheres to GDPR, CCPA, and other global data privacy regulations, ensuring responsible and lawful data usage.

    Data Highlights:

    • 700M+ Verified Global Profiles: Connect with healthcare professionals and decision-makers in Europe’s hospital and healthcare sectors.
    • 70M+ Business Profiles: Access detailed firmographic data, including hospital sizes, revenue ranges, and geographic footprints.
    • Leadership Insights: Engage with CEOs, medical directors, and administrative leaders shaping healthcare strategies.
    • Regional Healthcare Trends: Understand trends in digital healthcare adoption, operational efficiency, and patient care management.

    Key Features of the Dataset:

    1. Comprehensive Professional Profiles

      • Identify and connect with key players, including hospital executives, medical directors, and department heads in the healthcare industry.
      • Access data on professional histories, certifications, and areas of expertise for precise targeting.
    2. Advanced Filters for Precision Campaigns

      • Filter professionals by hospital size, geographic location, or job function (administrative, medical, or operational).
      • Tailor campaigns to align with specific needs such as digital transformation, patient care solutions, or regulatory compliance.
    3. Healthcare Industry Insights

      • Leverage data on operational trends, hospital management practices, and regional healthcare needs.
      • Refine product offerings and outreach strategies to address pressing challenges in the European healthcare market.
    4. AI-Driven Enrichment

      • Profiles enriched with actionable data allow for personalized messaging, highlight unique value propositions, and improve engagement outcomes with healthcare professionals.

    Strategic Use Cases:

    1. Marketing and Outreach to Healthcare Executives

      • Promote healthcare IT solutions, medical devices, or operational efficiency tools to executives managing hospitals and clinics.
      • Use verified contact data for multi-channel outreach, including email, phone, and digital marketing.
    2. Partnership Development and Collaboration

      • Build relationships with hospitals, healthcare providers, and medical institutions exploring strategic partnerships or new technology adoption.
      • Foster alliances that drive patient care improvements, cost savings, or operational efficiency.
    3. Market Research and Competitive Analysis

      • Analyze trends in European healthcare to refine product development, marketing strategies, and engagement plans.
      • Benchmark against competitors to identify growth opportunities, underserved segments, and innovative solutions.
    4. Recruitment and Workforce Solutions

      • Target HR professionals and hiring managers in healthcare institutions recruiting for administrative, medical, or operational roles.
      • Provide workforce optimization platforms, training solutions, or staffing services tailored to the healthcare sector.

    Why Choose Success.ai?

    1. Best Price Guarantee

      • Access premium-quality healthcare professional data at competitive prices, ensuring strong ROI for your marketing, sales, and strategic initiatives.
    2. Seamless Integration
      ...

  3. VHA hospitals Timely Care Data

    • kaggle.com
    Updated Jan 28, 2023
    Cite
    The Devastator (2023). VHA hospitals Timely Care Data [Dataset]. https://www.kaggle.com/datasets/thedevastator/vha-hospitals-timely-care-data
    Explore at:
    Croissant, a format for machine-learning datasets (learn more at mlcommons.org/croissant).
    Dataset updated
    Jan 28, 2023
    Dataset provided by
    Kaggle
    Authors
    The Devastator
    Description

    VHA hospitals Timely Care Data

    Performance on Clinical Measures and Processes of Care

    By US Open Data Portal, data.gov [source]

    About this dataset

    This dataset provides an inside look at the performance of Veterans Health Administration (VHA) hospitals on timely and effective care measures. It contains detailed information such as hospital names, addresses, census-designated cities and locations, states, ZIP codes, county names, phone numbers, and associated conditions. Additionally, each entry includes a score, sample size, and any notes or footnotes to give further context. The data are collected either through Quality Improvement Organizations for external peer review programs or directly from electronic medical records. By understanding these performance scores on timely care measures, we can gain valuable insight into how VA healthcare services are delivering value throughout the country.


    How to use the dataset

    This dataset contains information about the performance of Veterans Health Administration hospitals on timely and effective care measures. In this dataset, you can find the hospital name, address, city, state, ZIP code, county name, phone number associated with each hospital as well as data related to the timely and effective care measure such as conditions being measured and their associated scores.

    To use this dataset effectively, we recommend first identifying an area of interest for analysis. For example: which condition most impacts wait times for patients? Once that has been identified, you can narrow down which fields best fit your needs; if you are studying wait times, for instance, "Score" may be more valuable to filter on than Footnote. Also consider using aggregation functions over certain fields (like the average score over time) to get a better understanding of overall performance by factor, for instance by Location.

    Ultimately, this dataset provides a snapshot of how Veterans Health Administration hospitals are performing on timely and effective care measures, so any research should focus on that aspect of healthcare delivery.
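
    As a starting point for the kind of analysis suggested above, here is a minimal pandas sketch, assuming the data has been exported as the csv-1.csv file listed in the Columns section; the condition label used in the filter is a hypothetical example, while the column names follow that column list.

      import pandas as pd

      df = pd.read_csv("csv-1.csv")

      # Keep one condition of interest (label is an assumption) and make scores numeric.
      ed = df[df["Condition"] == "Emergency Department"].copy()
      ed["Score"] = pd.to_numeric(ed["Score"], errors="coerce")

      # Average score per state as a rough view of regional performance.
      by_state = ed.dropna(subset=["Score"]).groupby("State")["Score"].mean().sort_values()
      print(by_state.head(10))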

    Research Ideas

    • Analyzing and predicting hospital performance on a regional level to improve the quality of healthcare for veterans across the country.
    • Using this dataset to identify trends and develop strategies for hospitals that consistently score low on timely and effective care measures, with the goal of improving patient outcomes.
    • Comparison analysis between different VHA hospitals to discover patterns and best practices in providing effective care so they can be shared with other hospitals in the system

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    License: Dataset copyright by authors.

    • You are free to:
      • Share - copy and redistribute the material in any medium or format for any purpose, even commercially.
      • Adapt - remix, transform, and build upon the material for any purpose, even commercially.
    • You must:
      • Give appropriate credit - provide a link to the license, and indicate if changes were made.
      • ShareAlike - you must distribute your contributions under the same license as the original.
      • Keep intact - all notices that refer to this license, including copyright notices.

    Columns

    File: csv-1.csv

    | Column name | Description |
    | ----------- | ----------- |
    | Hospital Name | Name of the VHA hospital. (String) |
    | Address | Street address of the VHA hospital. (String) |
    | City | City where the VHA hospital is located. (String) |
    | State | State where the VHA hospital is located. (String) |
    | ZIP Code | ZIP code of the VHA hospital. (Integer) |
    | County Name | County where the VHA hospital is located. (String) |
    | Phone Number | Phone number of the VHA hospital. (String) |
    | Condition | Condition being measured. (String) |
    | Measure Name | Measure used to measure the condition. (String) |
    | Score | Score achieved by the VHA h... |

  4. Healthcare Professionals Data | Global Healthcare Professionals Contact Data...

    • datarade.ai
    Updated Oct 27, 2021
    Cite
    Success.ai (2021). Healthcare Professionals Data | Global Healthcare Professionals Contact Data | Verified Profiles, Work Emails & Phone Data | Best Price Guaranteed [Dataset]. https://datarade.ai/data-products/healthcare-professionals-data-global-healthcare-professiona-success-ai
    Explore at:
    .bin, .json, .xml, .csv, .xls, .sql, .txt (available download formats)
    Dataset updated
    Oct 27, 2021
    Dataset provided by
    Area covered
    United Kingdom, Virgin Islands (British), Grenada, Myanmar, Niue, United Arab Emirates, Russian Federation, Greece, Argentina, Qatar
    Description

    Success.ai’s B2B Contact Data and Healthcare Professionals Data for Global Healthcare Professionals offers businesses a powerful resource to connect with healthcare administrators and decision-makers across the globe. Derived from over 170 million verified professional profiles, this dataset delivers unparalleled accuracy and reach, enabling effective outreach and building strategic relationships with professionals in the healthcare sector.

    Why Choose Success.ai’s Global Healthcare Professionals Contact Data?

    1. Verified Contact Information:

      • Access accurate and up-to-date work emails and direct phone numbers for healthcare administrators, executives, and other key professionals.
      • Every profile is validated using advanced AI algorithms, ensuring up to 99% accuracy.
    2. Global Reach Across Healthcare:

      • Connect with healthcare professionals and decision-makers in hospitals, clinics, research institutions, and public health organizations worldwide.
      • Includes data for regions such as North America, Europe, Asia-Pacific, and beyond.
    3. Continuously Updated Profiles:

      • Ensure your campaigns are supported by the latest data with our real-time updates.
      • Adapt to industry trends and professional movements dynamically.
    4. Compliance with Data Privacy Laws:

      • Fully adheres to global regulations like GDPR and CCPA, ensuring ethical use of contact information.

    Data Highlights:

    • 170M+ Verified Professional Profiles: A vast dataset encompassing professionals from multiple industries, including healthcare.
    • 50M Work Emails: Verified and AI-validated for precise and reliable communication.
    • 30M Company Profiles: Gain insights into the organizations where healthcare professionals operate.
    • 700M Global Professional Profiles: Comprehensive datasets that enhance your outreach and analytics.

    Key Features of the Dataset:

    • Comprehensive Professional Profiles: Verified work emails, direct phone numbers, and LinkedIn profiles for accurate targeting.
    • Customizable Segmentation: Filter by job titles, industries, company sizes, and geographic locations.
    • AI-Driven Insights: Profiles enriched with role-specific and industry-specific insights for maximum relevance.

    Strategic Use Cases:

    1. Sales and Business Development:

      • Equip your teams with accurate data to directly connect with healthcare administrators and executives.
      • Streamline outreach efforts to build relationships and close deals faster.
    2. Targeted Marketing Campaigns:

      • Execute tailored email and phone campaigns for healthcare professionals and organizations.
      • Maximize engagement with hyper-personalized outreach strategies.
    3. Recruitment in Healthcare:

      • Find and connect with top talent for executive, administrative, and clinical roles in the healthcare industry.
      • Ensure you’re reaching the right candidates with continuously updated contact information.
    4. Market Research and Strategic Planning:

      • Analyze trends in healthcare hiring, administration, and innovation using rich data insights.
      • Identify partnership opportunities with institutions and organizations at the forefront of healthcare.

    Why Success.ai is Your Trusted Partner?

    1. Best Price Guarantee:

      • High-quality datasets at the most competitive prices in the market.
    2. Tailored Solutions:

      • Flexible options for accessing and integrating datasets based on your unique business needs.
    3. Seamless Integration:

      • Choose API integration or downloadable datasets to match your workflows.
    4. Unmatched Accuracy and Scale:

      • Sourced from 170M verified professional profiles, our data is enriched and validated with AI to deliver industry-leading accuracy.

    APIs for Advanced Data Solutions:

    • Data Enrichment API: Enhance your existing data with real-time updates and additional insights for healthcare professionals.

    • Lead Generation API: Directly integrate our healthcare professional contact data into your CRM or marketing platforms for seamless campaigns.

    Transform your outreach and engagement strategies with B2B Contact Data for Global Healthcare Professionals from Success.ai. Whether you’re targeting administrators, executives, or decision-makers, our verified and continuously updated profiles provide the precision and depth you need to succeed.

    Enjoy the benefits of our Best Price Guarantee and experience the difference with Success.ai. Contact us now to empower your business with AI-validated contact data that drives real results!

    No one beats us on price. Period.

  5. Hand Washing Video Dataset Annotated According to the World Health...

    • explore.openaire.eu
    • data.niaid.nih.gov
    Updated Dec 29, 2021
    Cite
    Atis Elsts; Maksims Ivanovs; Martins Lulla; Aleksejs Rutkovskis; Aija Vilde; Agita Melbārde-Kelmere; Olga Zemlanuhina; Andreta Slavinska; Olegs Sabelnikovs (2021). Hand Washing Video Dataset Annotated According to the World Health Organization's Handwashing Guidelines - Jurmala Hospital Subset [Dataset]. http://doi.org/10.5281/zenodo.5808763
    Explore at:
    Dataset updated
    Dec 29, 2021
    Authors
    Atis Elsts; Maksims Ivanovs; Martins Lulla; Aleksejs Rutkovskis; Aija Vilde; Agita Melbārde-Kelmere; Olga Zemlanuhina; Andreta Slavinska; Olegs Sabelnikovs
    Area covered
    Jūrmala
    Description

    Overview: This is a large-scale real-world dataset with videos recording medical staff washing their hands as part of their normal job duties in the Jurmala Hospital located in Jurmala, Latvia. There are 2427 hand washing episodes in total, almost all of which are annotated by two persons. The annotations classify the washing movements according to the World Health Organization's (WHO) guidelines by marking each frame in each video with a certain movement code. This dataset is part of a three-dataset series, all following the same format:

    • https://zenodo.org/record/4537209 - data collected in Pauls Stradins Clinical University Hospital
    • https://zenodo.org/record/5808764 - data collected in Jurmala Hospital
    • https://zenodo.org/record/5808789 - data collected in the Medical Education Technology Center (METC) of Riga Stradins University

    Applications: The intention of this dataset is twofold: to serve as a basis for training machine learning classifiers for automated hand washing movement recognition and quality control, and to allow investigation of the real-world quality of washing performed by working medical staff.

    Statistics:

    • Frame rate: 30 FPS
    • Resolution: 320x240 and 640x480
    • Number of videos: 2427
    • Number of annotation files: 4818

    Movement codes (both in CSV and JSON files):

    • 1: Hand washing movement - Palm to palm
    • 2: Hand washing movement - Palm over dorsum, fingers interlaced
    • 3: Hand washing movement - Palm to palm, fingers interlaced
    • 4: Hand washing movement - Backs of fingers to opposing palm, fingers interlocked
    • 5: Hand washing movement - Rotational rubbing of the thumb
    • 6: Hand washing movement - Fingertips to palm
    • 7: Turning off the faucet with a paper towel
    • 0: Other hand washing movement

    Acknowledgments: The dataset collection was funded by the Latvian Council of Science project "Automated hand washing quality control and quality evaluation system with real-time feedback", No: lzp - Nr. 2020/2-0309.

    References: For more detailed information, see this article describing a similar dataset collected in a different project: M. Lulla, A. Rutkovskis, A. Slavinska, A. Vilde, A. Gromova, M. Ivanovs, A. Skadins, R. Kadikis, A. Elsts. Hand-Washing Video Dataset Annotated According to the World Health Organization's Hand-Washing Guidelines. Data. 2021; 6(4):38. https://doi.org/10.3390/data6040038

    Contact information: atis.elsts@edi.lv
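
    The per-frame annotation schema is not reproduced in this listing, so the following is only a minimal sketch of how one annotation file might be summarized, assuming a CSV with one row per frame and a movement-code column; the file name and column name are hypothetical, while the code-to-movement mapping and the 30 FPS frame rate come from the description above.

      import pandas as pd

      MOVEMENTS = {
          0: "Other hand washing movement",
          1: "Palm to palm",
          2: "Palm over dorsum, fingers interlaced",
          3: "Palm to palm, fingers interlaced",
          4: "Backs of fingers to opposing palm, fingers interlocked",
          5: "Rotational rubbing of the thumb",
          6: "Fingertips to palm",
          7: "Turning off the faucet with a paper towel",
      }
      FPS = 30  # frame rate stated in the dataset description

      ann = pd.read_csv("episode_0001_annotator1.csv")            # hypothetical file name
      counts = ann["movement_code"].value_counts().sort_index()   # hypothetical column name

      # Seconds spent on each movement in this episode.
      for code, n_frames in counts.items():
          print(f"{MOVEMENTS.get(code, 'unknown'):55s} {n_frames / FPS:6.1f} s")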

  6. Children's Hospitals Pricing Information

    • kaggle.com
    Updated Dec 18, 2023
    Cite
    The Devastator (2023). Children's Hospitals Pricing Information [Dataset]. https://www.kaggle.com/datasets/thedevastator/children-s-hospitals-pricing-information/data
    Explore at:
    Croissant, a format for machine-learning datasets (learn more at mlcommons.org/croissant).
    Dataset updated
    Dec 18, 2023
    Dataset provided by
    Kaggle
    Authors
    The Devastator
    Description

    Children's Hospitals Pricing Information

    By Amber Thomas [source]

    About this dataset

    This dataset contains machine-readable hospital pricing information for Children's Hospitals and Clinics of Minnesota. It includes three separate files:

    1. 2022-top-25-hospital-based-clinics-list.csv: This file provides the top 25 primary care procedure prices, including procedure codes, fees, and insurance coverage details.
    2. 2022-standard-list-of-charges-hospital-op.csv: This file includes machine-readable hospital pricing information, including procedure codes, fees, and insurance coverage details.
    3. 2022-msdrg.csv: This file also contains machine-readable hospital pricing information, including procedure codes, fees, and insurance coverage details.

    The data was collected programmatically using a custom script written in Node.js and Microsoft Playwright. These files were then mirrored on the data.world platform using the Import from URL option.

    If you find any errors in the dataset or have any questions or concerns, please leave a note in the Discussion tab of this dataset or contact support@data.world for assistance.

    How to use the dataset

    • Dataset Overview:

      • The dataset contains three files:
        a) 2022-top-25-hospital-based-clinics-list.csv: This file includes the top 25 primary care procedure prices for Children's Hospitals and Clinics of Minnesota, including procedure codes, fees, and insurance coverages.
        b) 2022-standard-list-of-charges-hospital-op.csv: This file includes machine-readable hospital pricing information for Children's Hospitals and Clinics of Minnesota, including procedure codes, fees, and insurance coverages.
        c) 2022-msdrg.csv: This file includes machine-readable hospital pricing information for Children's Hospitals and Clinics of Minnesota, including MSDRG (Medicare Severity Diagnosis Related Groups) codes, fees, and insurance coverages.
    • Data Collection:

      • The data was collected programmatically using a custom script written in Node.js with the assistance of Microsoft Playwright.
      • These datasets were programmatically mirrored on the data.world platform using the Import from URL option.
    • Usage Guidelines:

      • Explore Procedure Prices: You can analyze the top 25 primary care procedure prices by referring to the '2022-top-25-hospital-based-clinics-list.csv' file. It provides information on procedure codes (identifiers), associated fees (costs), and insurance coverage details.

      • Analyze Hospital Price Information: The '2022-standard-list-of-charges-hospital-op.csv' contains comprehensive machine-readable hospital pricing information. You can examine various procedures by their respective codes along with associated fees as well as corresponding insurance coverage details.

      • Understand MSDRG Codes & Fees: The '2022-msdrg.csv' file includes machine-readable hospital pricing information based on MSDRG (Medicare Severity Diagnosis Related Groups) codes. You can explore the relationship between diagnosis groups and associated fees, along with insurance coverage details.

    • Reporting Errors:

      • If you identify any errors or discrepancies in the dataset, please leave a note in the Discussion tab of this dataset to notify others who may be interested.
      • Alternatively, you can reach out to the data.world team at support@data.world for further assistance.

    Research Ideas

    • Comparative Analysis: Researchers and healthcare professionals can use this dataset to compare the pricing of primary care procedures at Children's Hospitals and Clinics of Minnesota with other hospitals. This can help identify any variations or discrepancies in pricing, enabling better cost management and transparency.
    • Insurance Coverage Analysis: The insurance coverage information provided in this dataset can be used to analyze which procedures are covered by different insurance providers. This analysis can help patients understand their out-of-pocket expenses for specific procedures and choose the best insurance plan accordingly.
    • Cost Estimation: Patients can utilize this dataset to estimate the cost of primary care procedures at Children's Hospitals and Clinics of Minnesota before seeking medical treatment. By comparing procedure fees across different hospitals, patients can make informed decisions about where to receive their healthcare services based on affordability and quality.
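
    Building on the comparative-analysis and cost-estimation ideas above, a minimal pandas loading sketch could look like the following; only the file name comes from the dataset description, and the column names are hypothetical placeholders to be replaced with the actual headers.

      import pandas as pd

      charges = pd.read_csv("2022-standard-list-of-charges-hospital-op.csv")

      # Hypothetical column names: a procedure code, a description, and a gross charge.
      charges["gross_charge"] = pd.to_numeric(charges["gross_charge"], errors="coerce")

      # Ten most expensive outpatient procedures by listed charge.
      top10 = (charges.dropna(subset=["gross_charge"])
                      .sort_values("gross_charge", ascending=False)
                      .head(10))
      print(top10[["procedure_code", "description", "gross_charge"]])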

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    **Unknown License - Please chec...

  7. Dataset of book subjects where books includes Which country has the World's...

    • workwithdata.com
    Updated Jul 18, 2024
    Cite
    Work With Data (2024). Dataset of book subjects where books includes Which country has the World's best health care? [Dataset]. https://www.workwithdata.com/datasets/book-subjects?f=1&fcol0=j0-book&fop0=includes&fval0=Which+country+has+the+World%27s+best+health+care
    Explore at:
    Dataset updated
    Jul 18, 2024
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is about book subjects and has 5 rows. It is filtered to subjects where the books include "Which country has the World's best health care?". It features 10 columns including book subject, number of authors, number of books, earliest publication date, and latest publication date. The preview is ordered by number of books (descending).

  8. Global patterns of current and future road infrastructure - Supplementary...

    • zenodo.org
    • explore.openaire.eu
    bin, zip
    Updated Apr 7, 2022
    Cite
    Meijer; Meijer; Huijbregts; Huijbregts; Schotten; Schipper; Schipper; Schotten (2022). Global patterns of current and future road infrastructure - Supplementary spatial data [Dataset]. http://doi.org/10.5281/zenodo.6420961
    Explore at:
    zip, bin (available download formats)
    Dataset updated
    Apr 7, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Meijer; Meijer; Huijbregts; Huijbregts; Schotten; Schipper; Schipper; Schotten
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Global patterns of current and future road infrastructure - Supplementary spatial data

    Authors: Johan Meijer, Mark Huijbregts, Kees Schotten, Aafke Schipper

    Research paper summary: Georeferenced information on road infrastructure is essential for spatial planning, socio-economic assessments and environmental impact analyses. Yet current global road maps are typically outdated or characterized by spatial bias in coverage. In the Global Roads Inventory Project we gathered, harmonized and integrated nearly 60 geospatial datasets on road infrastructure into a global roads dataset. The resulting dataset covers 222 countries and includes over 21 million km of roads, which is two to three times the total length in the currently best available country-based global roads datasets. We then related total road length per country to country area, population density, GDP and OECD membership, resulting in a regression model with an adjusted R2 of 0.90, and found that the highest road densities are associated with densely populated and wealthier countries. Applying our regression model to future population densities and GDP estimates from the Shared Socioeconomic Pathway (SSP) scenarios, we obtained a tentative estimate of 3.0–4.7 million km of additional road length for the year 2050. Large increases in road length were projected for developing nations in some of the world's last remaining wilderness areas, such as the Amazon, the Congo basin and New Guinea. This highlights the need for accurate spatial road datasets to underpin strategic spatial planning in order to reduce the impacts of roads in remaining pristine ecosystems.

    Contents: The GRIP dataset consists of global and regional vector datasets in ESRI file geodatabase and shapefile format, and global raster datasets of road density at a 5 arcminute resolution (~8x8 km). The GRIP dataset is mainly aimed at providing a roads dataset that is easily usable for scientific global environmental and biodiversity modelling projects. The dataset is not suitable for navigation. GRIP4 is based on many different sources (including OpenStreetMap), and to the best of our ability we have verified their public availability, which was a criterion in our research. The UNSDI-Transportation data model was applied for harmonization of the individual source datasets. GRIP4 is provided under a Creative Commons License (CC-0) and is free to use. The GRIP database and future global road infrastructure scenario projections following the Shared Socioeconomic Pathways (SSPs) are described in the paper by Meijer et al (2018). Due to shapefile file size limitations the global file is only available in ESRI file geodatabase format.

    Regional coding of the other vector datasets in shapefile and ESRI fgdb format:

    • Region 1: North America
    • Region 2: Central and South America
    • Region 3: Africa
    • Region 4: Europe
    • Region 5: Middle East and Central Asia
    • Region 6: South and East Asia
    • Region 7: Oceania

    Road density raster data:

    • Total density, all types combined
    • Type 1 density (highways)
    • Type 2 density (primary roads)
    • Type 3 density (secondary roads)
    • Type 4 density (tertiary roads)
    • Type 5 density (local roads)

    Keyword: global, data, roads, infrastructure, network, global roads inventory project (GRIP), SSP scenarios
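
    As an illustration only, here is a minimal rasterio sketch for summarizing one of the road-density rasters listed above; the file name is hypothetical, and it is assumed the raster is distributed in a GDAL-readable format with a nodata value set.

      import rasterio

      # Hypothetical file name for the "total density, all types combined" raster.
      with rasterio.open("grip4_total_road_density.tif") as src:
          density = src.read(1, masked=True)   # masked array honouring the nodata value
          print("resolution:", src.res)        # ~5 arcminutes per the description
          print("mean road density:", float(density.mean()))
          print("max road density:", float(density.max()))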

  9. Johns Hopkins COVID-19 Case Tracker

    • data.world
    csv, zip
    Updated Jun 8, 2025
    Cite
    The Associated Press (2025). Johns Hopkins COVID-19 Case Tracker [Dataset]. https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker
    Explore at:
    zip, csv (available download formats)
    Dataset updated
    Jun 8, 2025
    Dataset provided by
    data.world, Inc.
    Authors
    The Associated Press
    Time period covered
    Jan 22, 2020 - Mar 9, 2023
    Area covered
    Description

    Updates

    • Notice of data discontinuation: Since the start of the pandemic, AP has reported case and death counts from data provided by Johns Hopkins University. Johns Hopkins University has announced that they will stop their daily data collection efforts after March 10. As Johns Hopkins stops providing data, the AP will also stop collecting daily numbers for COVID cases and deaths. The HHS and CDC now collect and visualize key metrics for the pandemic. AP advises using those resources when reporting on the pandemic going forward.

    • April 9, 2020

      • The population estimate data for New York County, NY has been updated to include all five New York City counties (Kings County, Queens County, Bronx County, Richmond County and New York County). This has been done to match the Johns Hopkins COVID-19 data, which aggregates counts for the five New York City counties to New York County.
    • April 20, 2020

      • Johns Hopkins death totals in the US now include confirmed and probable deaths in accordance with CDC guidelines as of April 14. One significant result of this change was an increase of more than 3,700 deaths in the New York City count. This change will likely result in increases for death counts elsewhere as well. The AP does not alter the Johns Hopkins source data, so probable deaths are included in this dataset as well.
    • April 29, 2020

      • The AP is now providing timeseries data for counts of COVID-19 cases and deaths. The raw counts are provided here unaltered, along with a population column with Census ACS-5 estimates and calculated daily case and death rates per 100,000 people. Please read the updated caveats section for more information.
    • September 1st, 2020

      • Johns Hopkins is now providing counts for the five New York City counties individually.
    • February 12, 2021

      • The Ohio Department of Health recently announced that as many as 4,000 COVID-19 deaths may have been underreported through the state’s reporting system, and that the "daily reported death counts will be high for a two to three-day period."
      • Because deaths data will be anomalous for consecutive days, we have chosen to freeze Ohio's rolling average for daily deaths at the last valid measure until Johns Hopkins is able to back-distribute the data. The raw daily death counts, as reported by Johns Hopkins and including the backlogged death data, will still be present in the new_deaths column.
    • February 16, 2021

      - Johns Hopkins has reconciled Ohio's historical deaths data with the state.

      Overview

    The AP is using data collected by the Johns Hopkins University Center for Systems Science and Engineering as our source for outbreak caseloads and death counts for the United States and globally.

    The Hopkins data is available at the county level in the United States. The AP has paired this data with population figures and county rural/urban designations, and has calculated caseload and death rates per 100,000 people. Be aware that caseloads may reflect the availability of tests -- and the ability to turn around test results quickly -- rather than actual disease spread or true infection rates.
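
    As a sketch of the per-capita calculation described above, assuming a county-level timeseries extract that carries the population and new_deaths columns mentioned in the update notes (the file name and the other column names are hypothetical):

      import pandas as pd

      df = pd.read_csv("jhu_county_timeseries.csv", parse_dates=["date"])

      # Deaths per 100,000 residents, mirroring the AP's calculated rate columns.
      df["new_deaths_per_100k"] = df["new_deaths"] / df["population"] * 100_000

      # 7-day rolling average per county to smooth reporting artifacts,
      # similar in spirit to the Ohio adjustment described in the updates.
      df = df.sort_values(["county_fips", "date"])
      df["deaths_7day_avg"] = (df.groupby("county_fips")["new_deaths"]
                                 .transform(lambda s: s.rolling(7, min_periods=1).mean()))
      print(df[["date", "county_fips", "new_deaths_per_100k", "deaths_7day_avg"]].tail())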

    This data is from the Hopkins dashboard that is updated regularly throughout the day. Like all organizations dealing with data, Hopkins is constantly refining and cleaning up their feed, so there may be brief moments where data does not appear correctly. At this link, you’ll find the Hopkins daily data reports, and a clean version of their feed.

    The AP is updating this dataset hourly at 45 minutes past the hour.

    To learn more about AP's data journalism capabilities for publishers, corporations and financial institutions, go here or email kromano@ap.org.

    Queries

    Use AP's queries to filter the data or to join to other datasets we've made available to help cover the coronavirus pandemic

    Interactive

    The AP has designed an interactive map to track COVID-19 cases reported by Johns Hopkins.

    https://datawrapper.dwcdn.net/nRyaf/15/

    Interactive Embed Code

    <iframe title="USA counties (2018) choropleth map Mapping COVID-19 cases by county" aria-describedby="" id="datawrapper-chart-nRyaf" src="https://datawrapper.dwcdn.net/nRyaf/10/" scrolling="no" frameborder="0" style="width: 0; min-width: 100% !important;" height="400"></iframe><script type="text/javascript">(function() {'use strict';window.addEventListener('message', function(event) {if (typeof event.data['datawrapper-height'] !== 'undefined') {for (var chartId in event.data['datawrapper-height']) {var iframe = document.getElementById('datawrapper-chart-' + chartId) || document.querySelector("iframe[src*='" + chartId + "']");if (!iframe) {continue;}iframe.style.height = event.data['datawrapper-height'][chartId] + 'px';}}});})();</script>
    

    Caveats

    • This data represents the number of cases and deaths reported by each state and has been collected by Johns Hopkins from a number of sources cited on their website.
    • In some cases, deaths or cases of people who've crossed state lines -- either to receive treatment or because they became sick and couldn't return home while traveling -- are reported in a state they aren't currently in, because of state reporting rules.
    • In some states, there are a number of cases not assigned to a specific county -- for those cases, the county name is "unassigned to a single county"
    • This data should be credited to Johns Hopkins University's COVID-19 tracking project. The AP is simply making it available here for ease of use for reporters and members.
    • Caseloads may reflect the availability of tests -- and the ability to turn around test results quickly -- rather than actual disease spread or true infection rates.
    • Population estimates at the county level are drawn from 2014-18 5-year estimates from the American Community Survey.
    • The Urban/Rural classification scheme is from the Centers for Disease Control and Prevention's National Center for Health Statistics. It puts each county into one of six categories -- from Large Central Metro to Non-Core -- according to population and other characteristics. More details about the classifications can be found here.

    Johns Hopkins timeseries data

    • Johns Hopkins pulls data regularly to update their dashboard. Once a day, around 8pm EDT, Johns Hopkins adds the counts for all areas they cover to the timeseries file. These counts are snapshots of the latest cumulative counts provided by the source on that day. This can lead to inconsistencies if a source updates their historical data for accuracy, either increasing or decreasing the latest cumulative count.
    • Johns Hopkins periodically edits their historical timeseries data for accuracy. They provide a file documenting all errors in their timeseries files that they have identified and fixed here.

    Attribution

    This data should be credited to Johns Hopkins University COVID-19 tracking project

  10. World Countries Generalized

    • hub.arcgis.com
    • pacificgeoportal.com
    Updated May 5, 2022
    Cite
    Esri (2022). World Countries Generalized [Dataset]. https://hub.arcgis.com/datasets/esri::world-countries-generalized
    Explore at:
    Dataset updated
    May 5, 2022
    Dataset authored and provided by
    Esri (http://esri.com/)
    Area covered
    World,
    Description

    World Countries Generalized represents generalized boundaries for the countries of the world as of August 2022. The generalized political boundaries improve draw performance and effectiveness at a global or continental level. This layer is best viewed out beyond a scale of 1:5,000,000. This layer's geography was developed by Esri and sourced from Garmin International, Inc., the U.S. Central Intelligence Agency (The World Factbook), and the National Geographic Society for use as a world basemap. It is updated annually as country names or significant borders change.

  11. Global Tropical Cyclone "Best Track" Position and Intensity Data

    • data.ucar.edu
    • oidc.rda.ucar.edu
    ascii
    Updated Aug 4, 2024
    Cite
    Bureau of Meteorology, Australia; Joint Typhoon Warning Center, U.S. Navy, U.S. Department of Defense; National Hurricane Center,Tropical Prediction Center, National Centers for Environmental Prediction, National Weather Service, NOAA, U.S. Department of Commerce; Research Data Archive, Computational and Information Systems Laboratory, National Center for Atmospheric Research, University Corporation for Atmospheric Research; Science Applications International Corporation (2024). Global Tropical Cyclone "Best Track" Position and Intensity Data [Dataset]. https://data.ucar.edu/dataset/global-tropical-cyclone-best-track-position-and-intensity-data
    Explore at:
    ascii (available download formats)
    Dataset updated
    Aug 4, 2024
    Dataset provided by
    Research Data Archive at the National Center for Atmospheric Research, Computational and Information Systems Laboratory
    Authors
    Bureau of Meteorology, Australia; Joint Typhoon Warning Center, U.S. Navy, U.S. Department of Defense; National Hurricane Center,Tropical Prediction Center, National Centers for Environmental Prediction, National Weather Service, NOAA, U.S. Department of Commerce; Research Data Archive, Computational and Information Systems Laboratory, National Center for Atmospheric Research, University Corporation for Atmospheric Research; Science Applications International Corporation
    Time period covered
    Jun 25, 1851 - Nov 26, 2011
    Area covered
    Description

    Time series of tropical cyclone "best track" position and intensity data are provided for all ocean basins where tropical cyclones occur. Position and intensity data are available at 6-hourly intervals over the duration of each cyclone's life. The general period of record begins in 1851, but this varies by ocean basin. See the inventories [http://rda.ucar.edu/datasets/ds824.1/inventories/] for data availability specific to each basin. This data set was received as a revision to an NCDC tropical cyclone data set, with data generally available through the late 1990s. Since then, the set is being continually updated from the U.S. NOAA National Hurricane Center and the U.S. Navy Joint Typhoon Warning Center best track archives. For a complete history of updates for each ocean basin, see the dataset documentation [http://rda.ucar.edu/datasets/ds824.1/docs/].

  12. Relevance and Redundancy ranking: Code and Supplementary material

    • springernature.figshare.com
    pdf
    Updated May 31, 2023
    Cite
    Arvind Kumar Shekar; Tom Bocklisch; Patricia Iglesias Sanchez; Christoph Nikolas Straehle; Emmanuel Mueller (2023). Relevance and Redundancy ranking: Code and Supplementary material [Dataset]. http://doi.org/10.6084/m9.figshare.5418706.v1
    Explore at:
    pdf (available download formats)
    Dataset updated
    May 31, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Arvind Kumar Shekar; Tom Bocklisch; Patricia Iglesias Sanchez; Christoph Nikolas Straehle; Emmanuel Mueller
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains the code for Relevance and Redundancy ranking (RaR), an efficient filter-based feature ranking framework for evaluating relevance based on multi-feature interactions and redundancy on mixed datasets. Source code is in .scala and .sbt format and metadata in .xml, all of which can be accessed and edited in standard, openly accessible text editing software. Diagrams are in the openly accessible .png format.

    • Supplementary_2.pdf: contains the results of experiments on multiple classifiers, along with parameter settings and a description of how KLD converges to mutual information based on its symmetry.
    • dataGenerator.zip: synthetic data generator inspired by the NIPS Workshop on Variable and Feature Selection (2001), http://www.clopinet.com/isabelle/Projects/NIPS2001/
    • rar-mfs-master.zip: the Relevance and Redundancy framework, containing an overview diagram, example datasets, source code and metadata. Details on installing and running are provided below.

    Background. Feature ranking is beneficial for gaining knowledge and identifying the relevant features of a high-dimensional dataset. However, in several datasets, a few features by themselves may have only a small correlation with the target classes, yet become strongly correlated with the target when combined with other features; that is, multiple features exhibit interactions among themselves. It is necessary to rank the features based on these interactions for better analysis and classifier performance, but evaluating these interactions on large datasets is computationally challenging. Furthermore, datasets often have features with redundant information, and using such redundant features hinders both the efficiency and the generalization capability of the classifier. The major challenge is to efficiently rank the features based on relevance and redundancy on mixed datasets. In the related publication, we propose a filter-based framework based on Relevance and Redundancy (RaR); RaR computes a single score that quantifies feature relevance by considering interactions between features and redundancy. The top-ranked features of RaR are characterized by maximum relevance and non-redundancy. The evaluation on synthetic and real-world datasets demonstrates that our approach outperforms several state-of-the-art feature selection techniques.

    Relevance and Redundancy Framework (rar-mfs)

    rar-mfs is an algorithm for feature selection and can be employed to select features from labelled data sets. The Relevance and Redundancy Framework (RaR), which is the theory behind the implementation, is a novel feature selection algorithm that:

    • works on large data sets (polynomial runtime),
    • can handle differently typed features (e.g. nominal features and continuous features), and
    • handles multivariate correlations.

    Installation. The tool is written in Scala and uses the Weka framework to load and handle data sets. You can either run it independently, providing the data as an .arff or .csv file, or include the algorithm as a (Maven/Ivy) dependency in your project. As an example data set we use heart-c.

    Project dependency. The project is published to Maven Central. To depend on the project use:

    • Maven:

      <dependency>
        <groupId>de.hpi.kddm</groupId>
        <artifactId>rar-mfs_2.11</artifactId>
        <version>1.0.2</version>
      </dependency>

    • sbt:

      libraryDependencies += "de.hpi.kddm" %% "rar-mfs" % "1.0.2"

    To run the algorithm from Scala:

      import java.io.File
      import de.hpi.kddm.rar._
      // ...
      val dataSet = de.hpi.kddm.rar.Runner.loadCSVDataSet(new File("heart-c.csv"), isNormalized = false, "")
      val algorithm = new RaRSearch(
        HicsContrastPramsFA(numIterations = config.samples, maxRetries = 1, alphaFixed = config.alpha, maxInstances = 1000),
        RaRParamsFixed(k = 5, numberOfMonteCarlosFixed = 5000, parallelismFactor = 4))
      algorithm.selectFeatures(dataSet)

    Command line tool. EITHER download the prebuilt binary, which requires only a recent Java installation (>= 6):

    1. download the prebuilt jar from the releases tab (latest)
    2. run java -jar rar-mfs-1.0.2.jar --help

    Example usage of the prebuilt jar:

      rar-mfs > java -jar rar-mfs-1.0.2.jar arff --samples 100 --subsetSize 5 --nonorm heart-c.arff
      Feature Ranking:
        1 - age (12)
        2 - sex (8)
        3 - cp (11)
        ...

    OR build the repository on your own:

    1. make sure sbt is installed
    2. clone the repository
    3. run sbt run

    Simple example using sbt directly after cloning the repository:

      rar-mfs > sbt "run arff --samples 100 --subsetSize 5 --nonorm heart-c.arff"
      Feature Ranking:
        1 - age (12)
        2 - sex (8)
        3 - cp (11)
        ...

    Optional: to speed up the algorithm, consider using a fast solver such as Gurobi (http://www.gurobi.com/). Install the solver and put the provided gurobi.jar into the Java classpath.

    Algorithm idea. An abstract overview of the different steps of the proposed feature selection algorithm is shown in the diagram at https://github.com/tmbo/rar-mfs/blob/master/docu/images/algorithm_overview.png. The Relevance and Redundancy ranking framework (RaR) is a method able to handle large-scale data sets and data sets with mixed features. Instead of directly selecting a subset, a feature ranking gives a more detailed overview of the relevance of the features. The method consists of a multistep approach where we:

    1. repeatedly sample subsets from the whole feature space and examine their relevance and redundancy: an exploration of the search space to gather more and more knowledge about the relevance and redundancy of features,
    2. deduce scores for features based on the scores of the subsets, and
    3. create the best possible ranking given the sampled insights.

    Parameters:

    | Parameter | Default value | Description |
    | --------- | ------------- | ----------- |
    | m - contrast iterations | 100 | Number of different slices to evaluate while comparing marginal and conditional probabilities |
    | alpha - subspace slice size | 0.01 | Percentage of all instances to use as part of a slice which is used to compare distributions |
    | n - sampling iterations | 1000 | Number of different subsets to select in the sampling phase |
    | k - sample set size | 5 | Maximum size of the subsets to be selected in the sampling phase |

  13. Dataset of development of business during the COVID-19 crisis

    • data.mendeley.com
    • narcis.nl
    Updated Nov 9, 2020
    Cite
    Tatiana N. Litvinova (2020). Dataset of development of business during the COVID-19 crisis [Dataset]. http://doi.org/10.17632/9vvrd34f8t.1
    Explore at:
    Dataset updated
    Nov 9, 2020
    Authors
    Tatiana N. Litvinova
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    To create the dataset, the top 10 countries leading in the incidence of COVID-19 in the world were selected as of October 22, 2020 (on the eve of the second wave of the pandemic) that are presented in the Global 500 ranking for 2020: USA, India, Brazil, Russia, Spain, France and Mexico. For each of these countries, no more than 10 of the largest transnational corporations included in the Global 500 rating for 2020 and 2019 were selected separately. Arithmetic averages were calculated, along with the change (increase) in indicators such as the profitability of enterprises, their ranking position (competitiveness), asset value and number of employees. The arithmetic mean values of these indicators across all countries of the sample were then found, characterizing the situation in international entrepreneurship as a whole in the context of the COVID-19 crisis in 2020 on the eve of the second wave of the pandemic. The data are collected in a single Microsoft Excel table.

    The dataset is a unique database that combines COVID-19 statistics and entrepreneurship statistics. It is flexible and can be supplemented with data from other countries and newer statistics on the COVID-19 pandemic. Because the cells contain formulas rather than ready-made numbers, adding and/or changing values in the original table at the beginning of the dataset automatically recalculates most of the subsequent tables and updates the graphs. This allows the dataset to be used not just as an array of data, but as an analytical tool for automating scientific research on the impact of the COVID-19 pandemic and crisis on international entrepreneurship. The dataset includes not only tabular data, but also charts that provide data visualization.

    The dataset contains not only actual but also forecast data on morbidity and mortality from COVID-19 for the period of the second wave of the pandemic in 2020. The forecasts are presented in the form of a normal distribution of predicted values and the probability of their occurrence in practice. This allows for a broad scenario analysis of the impact of the COVID-19 pandemic and crisis on international entrepreneurship, by substituting various predicted morbidity and mortality rates into the risk assessment tables and obtaining automatically calculated consequences (changes) for the characteristics of international entrepreneurship. It is also possible to substitute the actual values identified during and after the second wave of the pandemic to check the reliability of the pre-made forecasts and conduct a plan-versus-actual analysis. The dataset contains not only the numerical values of the initial and predicted values of the set of studied indicators, but also their qualitative interpretation, reflecting the presence and level of risks of the pandemic and the COVID-19 crisis for international entrepreneurship.
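
    The scenario logic described above (a normal distribution of predicted values with associated probabilities) can also be reproduced outside Excel; the following is a minimal scipy sketch in which the mean and standard deviation are purely illustrative numbers, not figures taken from the dataset.

      from scipy.stats import norm

      # Illustrative forecast of new COVID-19 cases during the second wave for one country;
      # mu and sigma are arbitrary placeholders, not values from the dataset.
      mu, sigma = 1_000_000, 150_000
      forecast = norm(loc=mu, scale=sigma)

      for scenario in (800_000, 1_000_000, 1_200_000):
          # Probability that the actual case count does not exceed this scenario value.
          print(f"P(cases <= {scenario:,}) = {forecast.cdf(scenario):.2f}")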

  14. MoreFixes: Largest CVE dataset with fixes

    • data.niaid.nih.gov
    • zenodo.org
    Updated Oct 23, 2024
    + more versions
    Cite
    GADYATSKAYA, Olga (2024). MoreFixes: Largest CVE dataset with fixes [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11199119
    Explore at:
    Dataset updated
    Oct 23, 2024
    Dataset provided by
    GADYATSKAYA, Olga
    Rietveld, Kristian F. D.
    Rahim Nouri, Sajad
    Akhoundali, Jafar
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    In our work, we have designed and implemented a novel workflow with several heuristic methods that combines state-of-the-art approaches to gathering CVE fix commits. As a consequence of these improvements, we have been able to gather the largest programming-language-independent real-world dataset of CVE vulnerabilities with their associated fix commits. Our dataset, containing 29,203 unique CVEs from 7,238 unique GitHub projects, is, to the best of our knowledge, by far the biggest CVE vulnerability dataset with fix commits available today. These CVEs are associated with 35,276 unique commits in the SQL dump and 39,931 patch commit files that fixed those vulnerabilities (some patch files could not be stored in the SQL dump for technical reasons). Our larger dataset thus substantially improves over the current real-world vulnerability datasets and enables further progress in research on vulnerability detection and software security. We used the NVD (nvd.nist.gov) and the GitHub Security Advisory Database as the main sources of our pipeline.

    We release to the community a 16GB PostgreSQL database that contains information on CVEs up to 2024-09-26, CWEs of each CVE, files and methods changed by each commit, and repository metadata. Additionally, patch files related to the fix commits are available as a separate package. Furthermore, we make our dataset collection tool also available to the community.

    The cvedataset-patches.zip file contains the fix patches, and postgrescvedumper.sql.zip contains a PostgreSQL dump of the fixes, together with several other fields such as CVEs, CWEs, repository metadata, commit data, file changes, methods changed, etc.

    MoreFixes' data-storage strategy is based on CVEFixes for storing CVE fix commits from open-source repositories, and it uses a modified version of Prospector (part of Project KB from SAP) as a module to detect the fix commits of a CVE. Our full methodology is presented in the paper "MoreFixes: A Large-Scale Dataset of CVE Fix Commits Mined through Enhanced Repository Discovery", published at the PROMISE conference (2024).

    For more information about usage and sample queries, visit the Github repository: https://github.com/JafarAkhondali/Morefixes

    If you are using this dataset, please be aware that the mined repositories are under different licenses and you are responsible for handling any licensing issues. The same applies to CVEFixes.

    This product uses the NVD API but is not endorsed or certified by the NVD.

    This research was partially supported by the Dutch Research Council (NWO) under the project NWA.1215.18.008 Cyber Security by Integrated Design (C-SIDe).

    To restore the dataset, you can use the docker-compose file available in the GitHub repository. The default database credentials after restoring the dump are:

    POSTGRES_USER=postgrescvedumper POSTGRES_DB=postgrescvedumper POSTGRES_PASSWORD=a42a18537d74c3b7e584c769152c3d
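    After the dump has been restored, a quick connectivity check can be run from Python; the connection settings below are the defaults listed above, while the table and column names in the sample query are placeholders that should be replaced with the actual schema documented in the GitHub repository.

```python
import psycopg2  # assumes the psycopg2-binary package is installed

# Default credentials from the dump (see above); adjust host/port to your docker-compose setup.
conn = psycopg2.connect(
    host="localhost",
    port=5432,
    dbname="postgrescvedumper",
    user="postgrescvedumper",
    password="a42a18537d74c3b7e584c769152c3d",
)

with conn, conn.cursor() as cur:
    # Hypothetical query: count fix commits per CVE (table and column names are placeholders).
    cur.execute(
        "SELECT cve_id, COUNT(*) AS n_commits "
        "FROM fixes GROUP BY cve_id ORDER BY n_commits DESC LIMIT 10"
    )
    for cve_id, n_commits in cur.fetchall():
        print(cve_id, n_commits)
```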

    Please use this for citation:

    @inproceedings{morefixes2024,
     title={MoreFixes: A large-scale dataset of CVE fix commits mined through enhanced repository discovery},
     author={Akhoundali, Jafar and Nouri, Sajad Rahim and Rietveld, Kristian and Gadyatskaya, Olga},
     booktitle={Proceedings of the 20th International Conference on Predictive Models and Data Analytics in Software Engineering},
     pages={42--51},
     year={2024}
    }
    
  15. US Healthcare Readmissions and Mortality

    • kaggle.com
    Updated Jan 23, 2023
    Cite
    The Devastator (2023). US Healthcare Readmissions and Mortality [Dataset]. https://www.kaggle.com/datasets/thedevastator/us-healthcare-readmissions-and-mortality/discussion?sort=undefined
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jan 23, 2023
    Dataset provided by
    Kaggle
    Authors
    The Devastator
    Area covered
    United States
    Description

    US Healthcare Readmissions and Mortality

    Evaluating Hospital Performance

    By Health [source]

    About this dataset

    This dataset contains detailed information about 30-day readmission and mortality rates of U.S. hospitals. It is an essential tool for stakeholders aiming to identify opportunities for improving healthcare quality and performance across the country. Providers benefit from access to comprehensive data on readmissions, mortality rates, scores, measure start/end dates, comparison to the national average, and other pertinent fields such as ZIP codes, phone numbers, and county names. Use this dataset to evaluate how hospitals are meeting industry standards from a quality and outcomes perspective, in order to make more informed decisions when designing patient care strategies and policies.

    How to use the dataset

    This dataset provides data on 30-day readmission and mortality rates of U.S. hospitals, useful in understanding the quality of healthcare being provided. This data can provide insight into the effectiveness of treatments, patient care, and staff performance at different healthcare facilities throughout the country.

    In order to use this dataset effectively, it is important to understand each column and how best to interpret it. The 'Hospital Name' column gives the name of the facility; 'Address' lists a street address for the hospital; 'City' indicates its geographic location; 'State' specifies a two-letter state abbreviation; 'ZIP Code' provides each facility's 5-digit ZIP code; 'County Name' specifies the county in which the hospital resides; 'Phone Number' lists a contact number for the facility; 'Measure Name' identifies which measure is being recorded (for instance, Elective Delivery Before 39 Weeks); 'Score' reflects an average score based on patient feedback surveys taken over the time frame listed under 'Measure Start Date'. There are also columns for lower and higher estimates ('Lower Estimate' and 'Higher Estimate'), which capture variability that researchers can track when seeking further answers or formulating future studies. Lastly, 'Footnote' may highlight additional details pertinent to analysis, such as values that deviate from national averages.

    This dataset can be used by hospitals, research facilities, and other interested parties to provide insightful information when making decisions about patient care standards throughout America. It can help find patterns in readmissions and mortality along county lines, or answer questions about performance fluctuations between different hospital locations over an extended period of time. So if you are ever curious about 30-day readmissions within US hospitals, don't hesitate to dive into this insightful dataset!
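    As a starting point, the hospital-level CSV named in the Columns section below can be explored with pandas; the column names follow the descriptions above but should be verified against the actual file header before use.

```python
import pandas as pd

# File and column names as described in this dataset card; check them against the CSV header.
df = pd.read_csv("Readmissions_and_Deaths_-_Hospital.csv")

# Scores are often reported as text (e.g. "Not Available"), so coerce to numeric.
df["Score"] = pd.to_numeric(df["Score"], errors="coerce")

# Example: average score per state and measure, highest first.
by_state = (
    df.groupby(["State", "Measure Name"])["Score"]
      .mean()
      .sort_values(ascending=False)
)
print(by_state.head(10))
```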

    Research Ideas

    • Comparing hospitals on a regional or national basis to measure the quality of care provided for readmission and mortality rates.
    • Analyzing the effects of technological advancements such as telemedicine, virtual visits, and AI on readmission and mortality rates at different hospitals.
    • Using measures such as the Lower Estimate and Higher Estimate scores to identify systematic problems in readmission or mortality rate management at hospitals and to inform public healthcare policy.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors and the original data source.

    License

    License: Dataset copyright by authors. You are free to:
    • Share - copy and redistribute the material in any medium or format for any purpose, even commercially.
    • Adapt - remix, transform, and build upon the material for any purpose, even commercially.
    You must:
    • Give appropriate credit - provide a link to the license, and indicate if changes were made.
    • ShareAlike - distribute your contributions under the same license as the original.
    • Keep intact - all notices that refer to this license, including copyright notices.

    Columns

    File: Readmissions_and_Deaths_-_Hospital.csv | Column name | Description | |:-------------------------|:---------------------------------------------------------------------------------------------------| | Hospital Name ...

  16. MGD: Music Genre Dataset

    • data.niaid.nih.gov
    • zenodo.org
    Updated May 28, 2021
    Cite
    Mariana O. Silva (2021). MGD: Music Genre Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4778562
    Explore at:
    Dataset updated
    May 28, 2021
    Dataset provided by
    Gabriel P. Oliveira
    Mariana O. Silva
    Anisio Lacerda
    Danilo B. Seufitelli
    Mirella M. Moro
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    MGD: Music Genre Dataset

    Over recent years, the world has seen a dramatic change in the way people consume music, moving from physical records to streaming services. Since 2017, such services have become the main source of revenue within the global recorded music market. This dataset is therefore built using data from Spotify, which provides a weekly chart of the 200 most streamed songs for each country and territory in which it is present, as well as an aggregated global chart.

    Considering that countries behave differently when it comes to musical tastes, we use chart data from global and regional markets from January 2017 to December 2019, considering eight of the top 10 music markets according to IFPI: United States (1st), Japan (2nd), United Kingdom (3rd), Germany (4th), France (5th), Canada (8th), Australia (9th), and Brazil (10th).

    We also provide information about the hit songs and artists present in the charts, such as all collaborating artists within a song (since the charts only provide the main ones) and their respective genres, which is the core of this work. MGD also provides data about musical collaboration, as we build collaboration networks based on artist partnerships in hit songs. Therefore, this dataset contains:

    Genre Networks: Success-based genre collaboration networks

    Genre Mapping: Genre mapping from Spotify genres to super-genres

    Artist Networks: Success-based artist collaboration networks

    Artists: Some artist data

    Hit Songs: Hit Song data and features

    Charts: Enhanced data from Spotify Weekly Top 200 Charts
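    The success-based collaboration networks listed above are derived from artist partnerships on charted songs. A minimal sketch of that construction, using invented song records rather than the actual MGD files, could look like this:

```python
import itertools
import networkx as nx

# Toy hit-song records: each entry lists all collaborating artists on one charted song.
# These are invented examples; the real MGD files contain the actual charts and networks.
hit_songs = [
    {"title": "Song A", "artists": ["Artist 1", "Artist 2"]},
    {"title": "Song B", "artists": ["Artist 2", "Artist 3", "Artist 4"]},
    {"title": "Song C", "artists": ["Artist 1", "Artist 3"]},
]

G = nx.Graph()
for song in hit_songs:
    # Connect every pair of collaborating artists; the edge weight counts shared hit songs.
    for a, b in itertools.combinations(song["artists"], 2):
        if G.has_edge(a, b):
            G[a][b]["weight"] += 1
        else:
            G.add_edge(a, b, weight=1)

print(G.number_of_nodes(), "artists,", G.number_of_edges(), "collaboration edges")
```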

    This dataset was originally built for a conference paper at ISMIR 2020. If you make use of the dataset, please also cite the following paper:

    Gabriel P. Oliveira, Mariana O. Silva, Danilo B. Seufitelli, Anisio Lacerda, and Mirella M. Moro. Detecting Collaboration Profiles in Success-based Music Genre Networks. In Proceedings of the 21st International Society for Music Information Retrieval Conference (ISMIR 2020), 2020.

    @inproceedings{ismir/OliveiraSSLM20,
     title = {Detecting Collaboration Profiles in Success-based Music Genre Networks},
     author = {Gabriel P. Oliveira and Mariana O. Silva and Danilo B. Seufitelli and Anisio Lacerda and Mirella M. Moro},
     booktitle = {21st International Society for Music Information Retrieval Conference},
     pages = {726--732},
     year = {2020}
    }

  17. Myket Android Application Install Dataset

    • paperswithcode.com
    Updated Aug 12, 2023
    + more versions
    Cite
    Erfan Loghmani; Mohammadamin Fazli (2023). Myket Android Application Install Dataset [Dataset]. https://paperswithcode.com/dataset/myket-android-application-install
    Explore at:
    Dataset updated
    Aug 12, 2023
    Authors
    Erfan Loghmani; Mohammadamin Fazli
    Description

    This dataset contains information on application install interactions of users in the Myket android application market. The dataset was created for the purpose of evaluating interaction prediction models, requiring user and item identifiers along with timestamps of the interactions. Hence, the dataset can be used for interaction prediction and building a recommendation system. Furthermore, the data forms a dynamic network of interactions, and we can also perform network representation learning on the nodes in the network, which are users and applications.

    Data Creation

    The dataset was initially generated by the Myket data team, and later cleaned and subsampled by Erfan Loghmani, a master's student at Sharif University of Technology at the time. The data team focused on a two-week period and randomly sampled one third of the users with interactions during that period. They then selected install and update interactions for three months before and after the two-week period, resulting in interactions spanning about six months and two weeks.

    We further subsampled and cleaned the data to focus on application download interactions. We identified the top 8,000 most installed applications and selected interactions related to them. We retained users with more than 32 interactions, resulting in 280,391 users. From this group, we randomly selected 10,000 users, and the data was filtered to include only interactions for these users. The detailed procedure can be found here.

    Data Structure

    The dataset has two main files:

    • myket.csv: This file contains the interaction information and follows the same format as the datasets used in the "JODIE: Predicting Dynamic Embedding Trajectory in Temporal Interaction Networks" (ACM SIGKDD 2019) project. However, this data does not contain state labels or interaction features, so the associated columns are all zero.
    • app_info_sample.csv: This file comprises features associated with the applications present in the sample. For each application, information such as the approximate number of installs, average rating, count of ratings, and category is included. These features provide insights into the applications present in the dataset.

    Dataset Details

    • Total instances: 694,121 install interaction instances
    • Instance format: triplets of (user_id, app_name, timestamp)
    • 10,000 users and 7,988 Android applications
    • Item features for 7,606 applications

    For a detailed summary of the data's statistics, including information on users, applications, and interactions, please refer to the Python notebook available at summary-stats.ipynb. The notebook provides an overview of the dataset's characteristics and can be helpful for understanding the data's structure before using it for research or analysis.
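    As a quick start, both files can be inspected with pandas; the column layout assumed below follows the JODIE-style triplet format described above, so check the real headers before relying on it.

```python
import pandas as pd

# The interaction file follows the JODIE-style layout described above; inspect the
# header first, since the exact column names are an assumption here.
interactions = pd.read_csv("myket.csv")
print(interactions.columns.tolist())
print(interactions.head())

# Assuming the item column is the second one (JODIE files typically store the item there),
# reproduce a "most installed applications" count like the table below.
item_col = interactions.columns[1]
print(interactions[item_col].value_counts().head(20))

# Per-application features shipped alongside the interactions.
app_info = pd.read_csv("app_info_sample.csv")
print(app_info.head())
```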

    Top 20 Most Installed Applications

    | Package Name | Count of Interactions |
    | ---------------------------------- | --------------------- |
    | com.instagram.android | 15292 |
    | ir.resaneh1.iptv | 12143 |
    | com.tencent.ig | 7919 |
    | com.ForgeGames.SpecialForcesGroup2 | 7797 |
    | ir.nomogame.ClutchGame | 6193 |
    | com.dts.freefireth | 6041 |
    | com.whatsapp | 5876 |
    | com.supercell.clashofclans | 5817 |
    | com.mojang.minecraftpe | 5649 |
    | com.lenovo.anyshare.gps | 5076 |
    | ir.medu.shad | 4673 |
    | com.firsttouchgames.dls3 | 4641 |
    | com.activision.callofduty.shooter | 4357 |
    | com.tencent.iglite | 4126 |
    | com.aparat | 3598 |
    | com.kiloo.subwaysurf | 3135 |
    | com.supercell.clashroyale | 2793 |
    | co.palang.QuizOfKings | 2589 |
    | com.nazdika.app | 2436 |
    | com.digikala | 2413 |

    Comparison with SNAP Datasets

    The Myket dataset introduced in this repository exhibits distinct characteristics compared to the real-world datasets used by the JODIE project. The table below provides a comparative overview of the key dataset characteristics:

    | Dataset | #Users | #Items | #Interactions | Average Interactions per User | Average Unique Items per User |
    | --- | --- | --- | --- | --- | --- |
    | Myket | 10,000 | 7,988 | 694,121 | 69.4 | 54.6 |
    | LastFM | 980 | 1,000 | 1,293,103 | 1,319.5 | 158.2 |
    | Reddit | 10,000 | 984 | 672,447 | 67.2 | 7.9 |
    | Wikipedia | 8,227 | 1,000 | 157,474 | 19.1 | 2.2 |
    | MOOC | 7,047 | 97 | 411,749 | 58.4 | 25.3 |

    The Myket dataset stands out by having an ample number of both users and items, highlighting its relevance for real-world, large-scale applications. Unlike LastFM, Reddit, and Wikipedia datasets, where users exhibit repetitive item interactions, the Myket dataset contains a comparatively lower amount of repetitive interactions. This unique characteristic reflects the diverse nature of user behaviors in the Android application market environment.

    Citation

    If you use this dataset in your research, please cite the following preprint:

    @misc{loghmani2023effect,
     title={Effect of Choosing Loss Function when Using T-batching for Representation Learning on Dynamic Networks},
     author={Erfan Loghmani and MohammadAmin Fazli},
     year={2023},
     eprint={2308.06862},
     archivePrefix={arXiv},
     primaryClass={cs.LG}
    }

  18. Success.ai | LinkedIn Data – 700M Public Profiles & 70M Companies Full...

    • datarade.ai
    Updated Jan 1, 2022
    + more versions
    Cite
    Success.ai (2022). Success.ai | LinkedIn Data – 700M Public Profiles & 70M Companies Full Global Dataset – Best Price Guarantee [Dataset]. https://datarade.ai/data-products/success-ai-linkedin-data-700m-public-profiles-70m-compa-success-ai
    Explore at:
    Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
    Dataset updated
    Jan 1, 2022
    Dataset provided by
    Area covered
    Finland, Sao Tome and Principe, Portugal, Bahrain, Georgia, Costa Rica, Andorra, Mali, Dominica, Cayman Islands
    Description

    Success.ai’s LinkedIn Data Solutions offer unparalleled access to a vast dataset of 700 million public LinkedIn profiles and 70 million LinkedIn company records, making it one of the most comprehensive and reliable LinkedIn datasets available on the market today. Our employee data and LinkedIn data are ideal for businesses looking to streamline recruitment efforts, build highly targeted lead lists, or develop personalized B2B marketing campaigns.

    Whether you’re looking for recruiting data, conducting investment research, or seeking to enrich your CRM systems with accurate and up-to-date LinkedIn profile data, Success.ai provides everything you need with pinpoint precision. By tapping into LinkedIn company data, you’ll have access to over 40 critical data points per profile, including education, professional history, and skills.

    Key Benefits of Success.ai’s LinkedIn Data: Our LinkedIn data solution offers more than just a dataset. With GDPR-compliant data, AI-enhanced accuracy, and a price match guarantee, Success.ai ensures you receive the highest-quality data at the best price in the market. Our datasets are delivered in Parquet format for easy integration into your systems, and with millions of profiles updated daily, you can trust that you’re always working with fresh, relevant data.

    Global Reach and Industry Coverage: Our LinkedIn data covers professionals across all industries and sectors, providing you with detailed insights into businesses around the world. Our geographic coverage spans 259M profiles in the United States, 22M in the United Kingdom, 27M in India, and thousands of profiles in regions such as Europe, Latin America, and Asia Pacific. With LinkedIn company data, you can access profiles of top companies from the United States (6M+), United Kingdom (2M+), and beyond, helping you scale your outreach globally.

    Why Choose Success.ai’s LinkedIn Data: Success.ai stands out for its tailored approach and white-glove service, making it easy for businesses to receive exactly the data they need without managing complex data platforms. Our dedicated Success Managers will curate and deliver your dataset based on your specific requirements, so you can focus on what matters most—reaching the right audience. Whether you’re sourcing employee data, LinkedIn profile data, or recruiting data, our service ensures a seamless experience with 99% data accuracy.

    • Best Price Guarantee: We offer unbeatable pricing on LinkedIn data, and we’ll match any competitor.
    • Global Scale: Access 700 million LinkedIn profiles and 70 million company records globally.
    • AI-Verified Accuracy: Enjoy 99% data accuracy through our advanced AI and manual validation processes.
    • Real-Time Data: Profiles are updated daily, ensuring you always have the most relevant insights.
    • Tailored Solutions: Get custom-curated LinkedIn data delivered directly, without managing platforms.
    • Ethically Sourced Data: Compliant with global privacy laws, ensuring responsible data usage.
    • Comprehensive Profiles: Over 40 data points per profile, including job titles, skills, and company details.
    • Wide Industry Coverage: Covering sectors from tech to finance across regions like the US, UK, Europe, and Asia.

    Key Use Cases:

    • Sales Prospecting and Lead Generation: Build targeted lead lists using LinkedIn company data and professional profiles, helping sales teams engage decision-makers at high-value accounts.
    • Recruitment and Talent Sourcing: Use LinkedIn profile data to identify and reach top candidates globally. Our employee data includes work history, skills, and education, providing all the details you need for successful recruitment.
    • Account-Based Marketing (ABM): Use our LinkedIn company data to tailor marketing campaigns to key accounts, making your outreach efforts more personalized and effective.
    • Investment Research & Due Diligence: Identify companies with strong growth potential using LinkedIn company data. Access key data points such as funding history, employee count, and company trends to fuel investment decisions.
    • Competitor Analysis: Stay ahead of your competition by tracking hiring trends, employee movement, and company growth through LinkedIn data. Use these insights to adjust your market strategy and improve your competitive positioning.
    • CRM Data Enrichment: Enhance your CRM systems with real-time updates from Success.ai’s LinkedIn data, ensuring that your sales and marketing teams are always working with accurate and up-to-date information.
    Comprehensive Data Points for LinkedIn Profiles:

    Our LinkedIn profile data includes over 40 key data points for every individual and company, ensuring a complete understanding of each contact:

    • LinkedIn URL: Access direct links to LinkedIn profiles for immediate insights.
    • Full Name: Verified first and last names.
    • Job Title: Current job titles and prior experience.
    • Company Information: Company name, LinkedIn URL, domain, and location.
    • Work and Per...

  19. Most popular database management systems worldwide 2024

    • statista.com
    Updated Jun 19, 2024
    Cite
    Statista (2024). Most popular database management systems worldwide 2024 [Dataset]. https://www.statista.com/statistics/809750/worldwide-popularity-ranking-database-management-systems/
    Explore at:
    Dataset updated
    Jun 19, 2024
    Dataset authored and provided by
    Statistahttp://statista.com/
    Time period covered
    Jun 2024
    Area covered
    Worldwide
    Description

    As of June 2024, the most popular database management system (DBMS) worldwide was Oracle, with a ranking score of 1244.08; MySQL and Microsoft SQL Server rounded out the top three. Although the database management industry contains some of the largest companies in the tech industry, such as Microsoft, Oracle, and IBM, a number of free and open-source DBMSs such as PostgreSQL and MariaDB remain competitive.

    Database Management Systems

    As the name implies, DBMSs provide a platform through which developers can organize, update, and control large databases. Given the business world's growing focus on big data and data analytics, knowledge of SQL programming languages has become an important asset for software developers around the world, and database management skills are seen as highly desirable. In addition to providing developers with the tools needed to operate databases, DBMSs are also integral to the way consumers access information through applications, which further illustrates the importance of the software.

  20. Living Standards Survey IV 1998-1999 - World Bank SHIP Harmonized Dataset -...

    • datacatalog.ihsn.org
    • catalog.ihsn.org
    • +2more
    Updated Mar 29, 2019
    + more versions
    Cite
    Ghana Statistical Service (GSS) (2019). Living Standards Survey IV 1998-1999 - World Bank SHIP Harmonized Dataset - Ghana [Dataset]. https://datacatalog.ihsn.org/catalog/2359
    Explore at:
    Dataset updated
    Mar 29, 2019
    Dataset provided by
    Ghana Statistical Services
    Authors
    Ghana Statistical Service (GSS)
    Time period covered
    1998 - 1999
    Area covered
    Ghana
    Description

    Abstract

    Survey-based Harmonized Indicators (SHIP) files are harmonized data files from household surveys conducted by countries in Africa. To ensure the quality and transparency of the data, it is critical to document the procedures for compiling consumption aggregates and other indicators so that the results can be easily reproduced. This process enables the consistency and continuity that make temporal and cross-country comparisons more reliable.

    Four harmonized data files are prepared for each survey to generate a set of harmonized variables that have the same variable names. Invariably, in each survey, questions are asked in slightly different ways, which poses challenges for the consistent definition of harmonized variables. The harmonized household survey data present the best available variables with harmonized definitions, but not identical variables. The four harmonized data files are:

    a) Individual level file (labor force indicators in a separate file): This file has information on basic characteristics of individuals such as age and sex, literacy, education, health, anthropometry, and child survival.
    b) Labor force file: This file has information on the labor force, including employment/unemployment, earnings, sectors of employment, etc.
    c) Household level file: This file has information on household expenditure, household head characteristics (age and sex, level of education, employment), housing amenities, assets, and access to infrastructure and services.
    d) Household Expenditure file: This file has consumption/expenditure aggregates by consumption groups according to the UN Classification of Individual Consumption According to Purpose (COICOP).

    Geographic coverage

    National

    Analysis unit

    • Individual level for datasets with suffix _I and _L
    • Household level for datasets with suffix _H and _E

    Universe

    The survey covered all de jure household members (usual residents).

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    SAMPLE DESIGN FOR ROUND 4 OF THE GLSS

    A nationally representative sample of households was selected in order to achieve the survey objectives.

    Sample Frame

    For the purposes of this survey, the list of the 1984 population census Enumeration Areas (EAs) with population and household information was used as the sampling frame. The primary sampling units were the 1984 EAs, with the secondary units being the households in the EAs. This frame, though quite old and not fully adequate, was the best available at the time. Indeed, the same frame was used for the earlier rounds of the GLSS.

    Stratification

    In order to increase precision and reliability of the estimates, the technique of stratification was employed in the sample design, using geographical factors, ecological zones, and location of residence as the main controls. Specifically, the EAs were first stratified according to the three ecological zones, namely Coastal, Forest, and Savannah, and then within each zone further stratification was done based on the size of the locality into rural or urban.

    SAMPLE SELECTION

    EAs: A two-stage sample was selected for the survey. At the first stage, 300 EAs were selected using systematic sampling with the probability proportional to size (PPS) method, where the size measure is the 1984 number of households in the EA. This was achieved by ordering the list of EAs with their sizes according to the strata. The size column was then cumulated, and with a random start and a fixed interval the sample EAs were selected.
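    The selection step described above is systematic PPS sampling over the cumulated size column. The sketch below illustrates the idea with made-up EA sizes; it is an illustration of the technique, not the GSS implementation.

```python
import random

def systematic_pps_sample(sizes, n_samples, seed=None):
    """Systematic sampling with probability proportional to size (PPS).

    `sizes` holds the household count per enumeration area (EA), already ordered
    by stratum; the function returns the indices of the selected EAs.
    """
    rng = random.Random(seed)
    total = sum(sizes)
    interval = total / n_samples          # fixed sampling interval
    start = rng.uniform(0, interval)      # random start within the first interval

    selected, cumulative, idx = [], 0, 0
    for k in range(n_samples):
        target = start + k * interval
        # Advance until the cumulated size column reaches the target point.
        while cumulative + sizes[idx] <= target:
            cumulative += sizes[idx]
            idx += 1
        selected.append(idx)
    return selected

# Toy example: 10 EAs with varying household counts, select 3 of them.
ea_household_counts = [120, 80, 200, 150, 60, 90, 300, 110, 70, 180]
print(systematic_pps_sample(ea_household_counts, 3, seed=42))
```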

    It was observed that some of the selected EAs had grown in size over time and therefore needed segmentation. In this connection, such EAs were divided into approximately equal parts, each segment constituting about 200 households. Only one segment was then randomly selected for listing of the households.

    Households: At the second stage, a fixed number of 20 households was systematically selected from each selected EA, giving a total of 6,000 households. An additional 5 households were selected as a reserve to replace missing households. An equal number of households was selected from each EA in order to reflect the labour force focus of the survey.

    NOTE: The above sample selection procedure deviated slightly from that used for the earlier rounds of the GLSS; as such, the sample is not self-weighting. This is because:
    1. given the long period between 1984 and the GLSS 4 fieldwork, the number of households in the various EAs is likely to have grown at different rates;
    2. the listing exercise was not properly done, as some of the selected EAs were not listed completely. Moreover, the segmentation done for larger EAs during the listing was somewhat arbitrary.

    Mode of data collection

    Face-to-face [f2f]

Cite
Rivalytics (2025). Healthcare Ransomware Dataset [Dataset]. https://www.kaggle.com/datasets/rivalytics/healthcare-ransomware-dataset

Healthcare Ransomware Dataset

Analyze attacks, strengthen security, and improve recovery in healthcare

Explore at:
177 scholarly articles cite this dataset (View in Google Scholar)
Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Feb 21, 2025
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Rivalytics
License

Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically

Description

📌 Context of the Dataset

The Healthcare Ransomware Dataset was created to simulate real-world cyberattacks in the healthcare industry. Hospitals, clinics, and research labs have become prime targets for ransomware due to their reliance on real-time patient data and legacy IT infrastructure. This dataset provides insight into attack patterns, recovery times, and cybersecurity practices across different healthcare organizations.

Why is this important?

Ransomware attacks on healthcare organizations can shut down entire hospitals, delay treatments, and put lives at risk. Understanding how different healthcare organizations respond to attacks can help develop better security strategies. The dataset allows cybersecurity analysts, data scientists, and researchers to study patterns in ransomware incidents and explore predictive modeling for risk mitigation.

📌 Sources and Research Inspiration This simulated dataset was inspired by real-world cybersecurity reports and built using insights from official sources, including:

1️⃣ IBM Cost of a Data Breach Report (2024)

The healthcare sector had the highest average cost of data breaches ($10.93 million per incident). On average, organizations recovered only 64.8% of their data after paying ransom. Healthcare breaches took 277 days on average to detect and contain.

2️⃣ Sophos State of Ransomware in Healthcare (2024)

67% of healthcare organizations were hit by ransomware in 2024, an increase from 60% in 2023. 66% of backup compromise attempts succeeded, making data recovery significantly more difficult. The most common attack vectors included exploited vulnerabilities (34%) and compromised credentials (34%).

3️⃣ Health & Human Services (HHS) Cybersecurity Reports

Ransomware incidents in healthcare have doubled since 2016. Organizations that fail to monitor threats frequently experience higher infection rates.

4️⃣ Cybersecurity & Infrastructure Security Agency (CISA) Alerts

Identified phishing, unpatched software, and exposed RDP ports as top ransomware entry points. Only 13% of healthcare organizations monitor cyber threats more than once per day, increasing the risk of undetected attacks.

5️⃣ Emsisoft 2020 Report on Ransomware in Healthcare

The number of ransomware attacks in healthcare increased by 278% between 2018 and 2023. 560 healthcare facilities were affected in a single year, disrupting patient care and emergency services.

📌 Why is This a Simulated Dataset?

This dataset does not contain real patient data or actual ransomware cases. Instead, it was built using probabilistic modeling and structured randomness based on industry benchmarks and cybersecurity reports.

How It Was Created:

1️⃣ Defining the Dataset Structure

The dataset was designed to simulate realistic attack patterns in healthcare, using actual ransomware case studies as inspiration.

Columns were selected based on what real-world cybersecurity teams track, such as:
• Attack methods (phishing, RDP exploits, credential theft).
• Infection rates, recovery time, and backup compromise rates.
• Organization type (hospitals, clinics, research labs) and monitoring frequency.

2️⃣ Generating Realistic Data Using ChatGPT & Python

ChatGPT assisted in defining relationships between attack factors, ensuring that key cybersecurity concepts were accurately reflected. Python’s NumPy and Pandas libraries were used to introduce randomized attack simulations based on real-world statistics. Data was validated against industry research to ensure it aligns with actual ransomware attack trends.
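For illustration only, a NumPy/Pandas simulation of this kind might look like the sketch below; the column names, probabilities, and effect sizes are invented placeholders rather than the parameters used to build the published dataset.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
n = 500  # number of simulated incidents

# Illustrative categories and probabilities only; the published dataset's schema may differ.
org_types = rng.choice(["Hospital", "Clinic", "Research Lab"], size=n, p=[0.5, 0.3, 0.2])
attack_vectors = rng.choice(
    ["Phishing", "RDP Exploit", "Credential Theft", "Unpatched Software"],
    size=n, p=[0.35, 0.25, 0.25, 0.15],
)
infection_rate = rng.beta(2, 5, size=n)        # share of devices encrypted
backup_compromised = rng.random(n) < 0.66      # roughly mirrors the backup-compromise statistic above

# Encode simple logical relationships: hospitals recover more slowly, and a
# compromised backup adds further delay.
base_days = rng.gamma(shape=2.0, scale=5.0, size=n)
recovery_days = base_days + np.where(org_types == "Hospital", 7, 0) + np.where(backup_compromised, 10, 0)

df = pd.DataFrame({
    "org_type": org_types,
    "attack_vector": attack_vectors,
    "infection_rate": infection_rate.round(2),
    "backup_compromised": backup_compromised,
    "recovery_days": recovery_days.round(1),
})
print(df.groupby("org_type")["recovery_days"].mean())
```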

3️⃣ Ensuring Logical Relationships Between Data Points

Hospitals take longer to recover due to larger infrastructure and compliance requirements. Organizations that track more cyber threats recover faster because they detect attacks earlier. Backup security significantly impacts recovery time, reflecting the real-world risk of backup encryption attacks.
