100+ datasets found

Comprehensive Medical Q&A Dataset
kaggle.com
zip
Updated Nov 24, 2023
The Devastator (2023). Comprehensive Medical Q&A Dataset [Dataset]. https://www.kaggle.com/datasets/thedevastator/comprehensive-medical-q-a-dataset
Available download formats: zip (5,126,941 bytes)
Dataset updated
Nov 24, 2023
Authors
The Devastator
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Comprehensive Medical Q&A Dataset

Unlocking Healthcare Data with Natural Language Processing

By Huggingface Hub [source]

About this dataset

The MedQuad dataset provides a comprehensive source of medical questions and answers for natural language processing. It contains over 43,000 patient inquiries from real-life situations, categorized into 31 distinct question types, which makes it useful for studying relationships between treatments, chronic diseases, medical protocols and more. Answers come not only from doctors but also from other healthcare professionals such as nurses and pharmacists, giving researchers a broader range of responses to analyze.

How to use the dataset

To make the most of this dataset, start by looking at the column names and the information they hold: qtype (the type of medical question), Question (the question itself), and Answer (the expert response). The qtype column lets you group the dataset by question topic. Once you have narrowed the data down with qtype, begin the analysis by asking questions such as “What treatments do most patients search for?” or “Are there any correlations between chronic conditions and protocols?” Then use simple queries such as SELECT Answer FROM MedQuad WHERE qtype='Treatment' AND Question LIKE '%pain%' to get closer to answering them.
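The SQL query above presumes the CSV has already been loaded into a table named MedQuad. A minimal sketch using only Python's standard library (csv and sqlite3) is shown below. The function names load_medquad and answers_for are hypothetical, and the case-insensitive qtype comparison is an assumption: check the actual capitalization of qtype values in train.csv before relying on a literal match such as 'Treatment'.

```python
import csv
import sqlite3
from typing import Iterable, List


def load_medquad(lines: Iterable[str]) -> sqlite3.Connection:
    """Load train.csv-style rows (columns: qtype, Question, Answer) into an
    in-memory SQLite table named MedQuad. `lines` can be an open file object
    or any iterable of CSV lines."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MedQuad (qtype TEXT, Question TEXT, Answer TEXT)")
    reader = csv.DictReader(lines)
    conn.executemany(
        "INSERT INTO MedQuad (qtype, Question, Answer) VALUES (?, ?, ?)",
        ((row["qtype"], row["Question"], row["Answer"]) for row in reader),
    )
    conn.commit()
    return conn


def answers_for(conn: sqlite3.Connection, qtype: str, keyword: str) -> List[str]:
    """Return answers whose qtype matches (ignoring case) and whose question
    contains `keyword`. Values are bound as parameters, not pasted into SQL."""
    cursor = conn.execute(
        "SELECT Answer FROM MedQuad "
        "WHERE LOWER(qtype) = LOWER(?) AND Question LIKE ?",
        (qtype, f"%{keyword}%"),  # SQLite's LIKE ignores case for ASCII text
    )
    return [answer for (answer,) in cursor]
```

For the real file, something like `open("train.csv", newline="", encoding="utf-8")` can be passed to load_medquad; the path and encoding are assumptions about the download.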

Once you have gained new insights about healthcare from the answers in this dynamic dataset, it is time to act: use that understanding of patient needs to develop educational materials and implement any changes it suggests. If you need more criteria for querying the dataset, check whether MedQuad offers additional columns; columns may be added periodically that further enhance analysis, so look out for update notifications.

Finally, once your use case has made an impact, remember to cite the dataset properly and give credit where it is due.

Research Ideas

Developing medical diagnostic tools that use natural language processing (NLP) to better identify and diagnose health conditions in patients.

Creating predictive models to anticipate treatment options for different medical conditions using machine learning techniques.

Leveraging the dataset to build chatbots and virtual assistants that can answer a broad range of healthcare questions with expert-level accuracy.

Acknowledgements

If you use this dataset in your research, please credit the original authors and Huggingface Hub, the data source.

License

License: CC0 1.0 Universal (CC0 1.0) Public Domain Dedication. No copyright: you can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.

Columns

File: train.csv

| Column name | Description |
|:------------|:------------|
| qtype | The type of medical question. (String) |
| Question | The medical question posed by the patient. (String) |
| Answer | The expert response to the medical question. (String) |
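Since qtype is the first column you filter on, checking how questions are distributed across types is a sensible first step. The sketch below assumes the three columns listed for train.csv and uses only the standard library; qtype_counts is a hypothetical helper name.

```python
import csv
from collections import Counter
from typing import Iterable

# Column names as listed for train.csv.
EXPECTED_COLUMNS = ("qtype", "Question", "Answer")


def qtype_counts(lines: Iterable[str]) -> Counter:
    """Count questions per qtype after checking that the expected columns exist."""
    reader = csv.DictReader(lines)
    header = reader.fieldnames or []  # accessing fieldnames reads the header row
    missing = [name for name in EXPECTED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return Counter(row["qtype"] for row in reader)
```

Calling `.most_common(10)` on the result then lists the ten most frequent question types.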

AI Training Dataset In Healthcare Market Report
archivemarketresearch.com
doc, pdf, ppt
Updated Jun 20, 2025
Archive Market Research (2025). AI Training Dataset In Healthcare Market Report [Dataset]. https://www.archivemarketresearch.com/reports/ai-training-dataset-in-healthcare-market-5352
Available download formats: pdf, ppt, doc
Dataset updated
Jun 20, 2025
Dataset authored and provided by
Archive Market Research
License
https://www.archivemarketresearch.com/privacy-policy
Time period covered
2025 - 2033
Area covered
global
Variables measured
Market Size
Description
The AI Training Dataset In Healthcare Market size was valued at USD 341.8 million in 2023 and is projected to reach USD 1,464.13 million by 2032, exhibiting a CAGR of 23.1% during the forecast period. The growth is attributed to the rising adoption of AI in healthcare, increasing demand for accurate and reliable training datasets, government initiatives to promote AI in healthcare, and technological advancements in data collection and annotation.

Healthcare AI training datasets are vital for building effective algorithms and for improving patient care and diagnosis. These datasets include large volumes of thoroughly labeled electronic health records (EHRs), medical images such as X-ray and MRI scans, and genomics data. They help AI systems identify trends, make forecasts, and support new approaches to treating disease. However, patient privacy and the ethical use of patient information are of the utmost importance, which requires strong anonymization and compliance with laws such as HIPAA. Continued growth in the size and variety of datasets is needed to address existing bias and to make AI work well across different populations and diseases, providing safer solutions for global health.
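The quoted market figures can be sanity-checked with the standard compound annual growth rate formula, CAGR = (end / start)^(1 / years) − 1. The sketch below is only an arithmetic check on the numbers stated in the description, not the report's own methodology, which is not given here. Taken at face value, USD 341.8 million growing to USD 1,464.13 million implies about 23.1% per year over seven years of compounding, but only about 17.5% per year over the nine years from 2023 to 2032. Compounding 23.1% for nine years from 2023 would give roughly USD 2.2 billion, so the stated CAGR appears to assume a later base year than 2023.

```python
def cagr(start: float, end: float, years: float) -> float:
    """Compound annual growth rate between two values `years` apart."""
    if start <= 0 or years <= 0:
        raise ValueError("start and years must be positive")
    return (end / start) ** (1.0 / years) - 1.0


def project(start: float, rate: float, years: float) -> float:
    """Value after compounding `start` at `rate` per year for `years` years."""
    return start * (1.0 + rate) ** years


start_2023, end_2032 = 341.8, 1464.13  # USD million, as quoted in the description
print(f"implied CAGR over 9 years: {cagr(start_2023, end_2032, 9):.1%}")
print(f"implied CAGR over 7 years: {cagr(start_2023, end_2032, 7):.1%}")
print(f"341.8 compounded at 23.1% for 9 years: {project(start_2023, 0.231, 9):.0f}")
```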
Multilingual Healthcare Text Dataset (Hi, En, Pu)
kaggle.com
zip
Updated Feb 13, 2025
Kajol Bagga (2025). Multilingual Healthcare Text Dataset (Hi, En, Pu) [Dataset]. https://www.kaggle.com/datasets/kajolagga/multilingual-healthcare-text-dataset-hi-en-pu
Available download formats: zip (421,647 bytes)
Dataset updated
Feb 13, 2025
Authors
Kajol Bagga
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This dataset contains three healthcare datasets in English, Hindi, and Punjabi; the Hindi and Punjabi versions were translated from English. The datasets cover medical diagnoses, disease names, and related healthcare information. The data has been carefully cleaned and formatted to ensure accuracy and usability for applications including machine learning, NLP, and healthcare analysis.

- Diagnosis: Description of the medical condition or disease.
- Symptoms: List of symptoms associated with the diagnosis.
- Treatment: Common treatments or recommended procedures.
- Severity: Severity level of the disease (e.g., mild, moderate, severe).
- Risk Factors: Known risk factors associated with the condition.
- Language: Specifies the language of the dataset (Hindi, Punjabi, or English).

The purpose of these datasets is to facilitate research and development in regional language processing, especially in the healthcare sector.

Column Descriptions:

Original Data Columns:
- patient_id – Unique identifier for each patient.
- age – Age of the patient.
- gender – Gender of the patient (e.g., Male/Female/Other).
- Diagnosis – The diagnosed medical condition or disease.
- Remarks – Additional notes or comments from the doctor.
- doctor_id – Unique identifier for the doctor treating the patient.
- Patient History – Medical history of the patient, including previous conditions.
- age_group – Categorized age group (e.g., Child, Adult, Senior).
- gender_numeric – Numeric encoding for gender (e.g., 0 = Female, 1 = Male).
- symptoms – List of symptoms reported by the patient.
- treatment – Recommended treatment or medication.
- timespan – Duration of the illness or treatment period.
- Diagnosis Category – General category of the diagnosis (e.g., Cardiovascular, Neurological).

Pseudonymized Data Columns: These columns replace personally identifiable information with anonymized versions for privacy compliance:

- Pseudonymized_patient_id – An anonymized patient identifier.
- Pseudonymized_age – Anonymized age value.
- Pseudonymized_gender – Anonymized gender field.
- Pseudonymized_Diagnosis – Diagnosis field with anonymized identifiers.
- Pseudonymized_Remarks – Anonymized doctor notes.
- Pseudonymized_doctor_id – Anonymized doctor identifier.
- Pseudonymized_Patient History – Anonymized version of patient history.
- Pseudonymized_age_group – Anonymized version of age groups.
- Pseudonymized_gender_numeric – Anonymized numeric encoding of gender.
- Pseudonymized_symptoms – Anonymized symptom descriptions.
- Pseudonymized_treatment – Anonymized treatment descriptions.
- Pseudonymized_timespan – Anonymized illness/treatment duration.
- Pseudonymized_Diagnosis Category – Anonymized category of diagnosis.
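The column lists follow a naming pattern in which each original column has a counterpart carrying the Pseudonymized_ prefix. Assuming the CSV headers match those names exactly (including the spaces in names such as Patient History), a small helper can build a row view that keeps only the pseudonymized fields, for example before sharing an extract. The helper names below are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

PREFIX = "Pseudonymized_"


def pseudonymized_pairs(header: Sequence[str]) -> List[Tuple[str, str]]:
    """Pair each original column with its Pseudonymized_ counterpart, if present."""
    present = set(header)
    return [
        (name, PREFIX + name)
        for name in header
        if not name.startswith(PREFIX) and PREFIX + name in present
    ]


def privacy_safe_row(row: Dict[str, str], header: Sequence[str]) -> Dict[str, str]:
    """Keep only the pseudonymized fields of a row (e.g. a csv.DictReader row)."""
    return {pseudo: row[pseudo] for _, pseudo in pseudonymized_pairs(header)}
```

Any column that lacks a Pseudonymized_ counterpart is simply left out of the view, so the helper errs on the side of omitting data.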
Public Health Portfolio (Directly Funded Research - Programmes and Training...
nihr.opendatasoft.com
nihr.aws-ec2-eu-central-1.opendatasoft.com
csv, excel, json
Updated Nov 4, 2025
(2025). Public Health Portfolio (Directly Funded Research - Programmes and Training Awards) [Dataset]. https://nihr.opendatasoft.com/explore/dataset/phof-datase/
Available download formats: excel, json, csv
Dataset updated
Nov 4, 2025
License
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Description
This Public Health Portfolio (Directly Funded Research - Programmes and Training Awards) dataset contains NIHR directly funded research awards, where the funding is allocated to an award holder or host organisation to carry out a specific piece of research or complete a training award. The NIHR also invests significantly in centres of excellence, collaborations, services and facilities to support research in England; collectively these form NIHR infrastructure support. NIHR infrastructure supported projects are available in the separate Public Health Portfolio (Infrastructure Support) dataset.

NIHR directly funded research awards (Programmes and Training Awards) funded between January 2006 and the present extraction date are eligible for inclusion in this dataset. An agreed set of inclusion/exclusion criteria is used to categorise awards as public health awards (see below). Following inclusion in the dataset, public health awards are second-level coded to one of the four Public Health Outcomes Framework domains: (1) wider determinants, (2) health improvement, (3) health protection, and (4) healthcare and premature mortality. More information on the Public Health Outcomes Framework domains can be found here.

This dataset is updated quarterly to include new NIHR awards categorised as public health awards. Please note that Public Health Research Programme projects showing an Award Budget of £0.00 are undertaken by an on-call team (for example, PHIRST, the Public Health Review Team, or the Knowledge Mobilisation Team) as part of an ongoing programme of work.

Inclusion Criteria

The NIHR Public Health Overview project team worked with colleagues across NIHR public health research to define the inclusion criteria for NIHR public health research. NIHR directly funded research awards are categorised as public health if they are determined to be ‘investigations of interventions in, or studies of, populations that are anticipated to have an effect on health or on health inequity at a population level.’ This definition of public health is intentionally broad, to capture the wide range of NIHR public health research across prevention, health improvement, health protection, and healthcare services (both within and outside of NHS settings).

This dataset does not reflect the NIHR’s total investment in public health research; the intention is to showcase a subset of the wider NIHR public health portfolio. It includes NIHR directly funded research awards categorised as public health awards. It does not include public health awards or projects funded by any of the three NIHR Research Schools or NIHR Health Protection Research Units.

Disclaimers

Users of this dataset should acknowledge the broad definition of public health that has been used to develop the inclusion criteria. Please note that this dataset is currently subject to a limited data quality review, and we are working to improve our data collection methodologies. Some awards may also appear in other NIHR curated datasets.

Further Information

Further information on the individual awards shown in the dataset can be found on the NIHR’s Funding & Awards website. Further information on individual NIHR Research Programmes' decision-making processes for funding health and social care research can be found here.

The NIHR is one of the main funders of public health research in the UK. Public health research falls within the remit of a range of NIHR Directly Funded Research (Programmes and Training Awards) and NIHR Infrastructure Support. Further information on NIHR’s investment in public health research is available from:
- NIHR School for Public Health
- NIHR Public Health Policy Research Unit
- NIHR Health Protection Research Units
- NIHR Public Health Research Programme Health Determinants Research Collaborations (HDRC)
- NIHR Public Health Research Programme Public Health Intervention Responsive Studies Teams (PHIRST)
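The £0.00 note matters when summarising budgets: treating those rows as ordinary awards would pull averages down. A minimal sketch, assuming each award is a dict with an "Award Budget" string such as "£1,234.50" (the field name is taken from the note above, but the exact export format is an assumption), keeps zero-budget rows apart before averaging. For simplicity it treats every £0.00 row as on-call work, whereas the note applies specifically to Public Health Research Programme projects.

```python
from decimal import Decimal
from typing import Dict, List, Tuple

Award = Dict[str, str]


def parse_gbp(text: str) -> Decimal:
    """Parse a budget string such as '£1,234.50' into a Decimal."""
    return Decimal(text.replace("£", "").replace(",", "").strip())


def split_on_call(awards: List[Award]) -> Tuple[List[Award], List[Award]]:
    """Separate costed awards from £0.00 rows (assumed to be on-call team work)."""
    costed: List[Award] = []
    on_call: List[Award] = []
    for award in awards:
        target = on_call if parse_gbp(award["Award Budget"]) == 0 else costed
        target.append(award)
    return costed, on_call


def mean_costed_budget(awards: List[Award]) -> Decimal:
    """Average budget over costed awards only; returns 0 if there are none."""
    costed, _ = split_on_call(awards)
    if not costed:
        return Decimal(0)
    return sum(parse_gbp(a["Award Budget"]) for a in costed) / len(costed)
```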
Healthcare Patient Satisfaction - Data Collection
kaggle.com
zip
Updated Sep 21, 2023
KagglePro (2023). Healthcare Patient Satisfaction - Data Collection [Dataset]. https://www.kaggle.com/datasets/kaggleprollc/healthcare-patient-satisfaction-data-collection
Available download formats: zip (42,995,888 bytes)
Dataset updated
Sep 21, 2023
Authors
KagglePro
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
In the U.S., every hospital that receives payments from Medicare and Medicaid is required to report quality data annually to the Centers for Medicare & Medicaid Services (CMS). This data helps gauge patient satisfaction levels across the country. While overall hospital scores can be influenced by the quality of customer service, satisfaction may also vary with the type of hospital or its location.

Year: 2016 - 2020

The Star Rating Program, implemented by the Centers for Medicare & Medicaid Services (CMS), uses a five-star grading system to evaluate the experiences of Medicare beneficiaries with their health plans and the overall healthcare system. Health plans receive scores from 1 to 5 stars, with 5 stars denoting the highest quality.

Benefits:

Historical Analysis: With data spanning from 2016 to 2020, researchers and analysts can observe trends over time, understanding how patient satisfaction has evolved over these years.

Benchmarking: Hospitals can compare their performance against national averages or against peer institutions to see where they stand.

Identifying Areas for Improvement: By analyzing specific metrics and feedback, hospitals can pinpoint areas where their services may be lacking and need enhancement.

Policy and Decision Making: Governments and healthcare administrators can use the data to make informed decisions about healthcare policies, funding allocations, and other strategic decisions.

Research and Academic Purposes: Academics and researchers can use the dataset for various studies, including correlational studies, predictions, and more.

Geographical Insights: The dataset may provide insights into regional variations in patient satisfaction, helping to identify areas or states with particularly high or low scores.

Understanding Factors Affecting Satisfaction: By correlating satisfaction scores with other variables (e.g., hospital type, size, location), it might be possible to determine which factors play the most significant role in patient satisfaction.

Performance Evaluation: Hospitals can use the data to evaluate the efficacy of any interventions or changes they've made over the years in terms of improving patient satisfaction.

Enhancing Patient Trust: Demonstrating transparency and a commitment to improvement can enhance patient trust and loyalty.

Informed Patients: By making such data publicly available, potential patients can make more informed decisions about where to seek care based on the satisfaction ratings of previous patients.
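Several of the uses above (historical analysis, benchmarking, geographical insights) come down to averaging ratings by location and year. A minimal sketch, assuming rows are dicts with hypothetical keys State, Year and Star Rating (the actual column names in the CMS files may differ), and that non-numeric placeholders such as "Not Available" should be skipped:

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def state_year_means(rows: Iterable[Dict[str, str]]) -> Dict[Tuple[str, int], float]:
    """Average star rating per (state, year), ignoring non-numeric ratings."""
    buckets: Dict[Tuple[str, int], List[int]] = defaultdict(list)
    for row in rows:
        rating = row["Star Rating"].strip()
        if not rating.isdigit():  # skip placeholders such as "Not Available"
            continue
        buckets[(row["State"], int(row["Year"]))].append(int(rating))
    return {key: mean(values) for key, values in buckets.items()}
```

Comparing a state's value with the same year's values for other states gives a simple benchmark; the sketch weights every hospital row equally, which is a design choice rather than a CMS convention.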

Source: https://data.cms.gov/provider-data/archived-data/hospitals
Health Plan Prior Authorization Data
catalog.data.gov
data.wa.gov
+2more
Updated Dec 20, 2024
data.wa.gov (2024). Health Plan Prior Authorization Data [Dataset]. https://catalog.data.gov/dataset/health-plan-prior-authorization-data
Dataset updated
Dec 20, 2024
Dataset provided by
data.wa.gov
Description
In 2020, the Washington State Legislature enacted Engrossed Substitute Senate Bill (ESSB) 6404 (Chapter 316, Laws of 2020, codified at RCW 48.43.0161), which requires health carriers with at least one percent of the market share in Washington State to report certain aggregated and de-identified data related to prior authorization annually to the Office of the Insurance Commissioner (OIC). Prior authorization is a utilization review tool used by carriers to review the medical necessity of requested health care services for specific health plan enrollees. Carriers choose the services that are subject to prior authorization review.

The reported data includes prior authorization information for the following categories of health services:
• Inpatient medical/surgical
• Outpatient medical/surgical
• Inpatient mental health and substance use disorder
• Outpatient mental health and substance use disorder
• Diabetes supplies and equipment
• Durable medical equipment

For each category of services, carriers must report the following information for the prior plan year (PY) for their individual and group health plans:
• The 10 codes with the highest number of prior authorization requests, and the percent of approved requests.
• The 10 codes with the highest percentage of approved prior authorization requests, and the total number of requests.
• The 10 codes with the highest percentage of prior authorization requests that were initially denied and then approved on appeal, and the total number of such requests.

Carriers must also include the average response time in hours for prior authorization requests, and the number of requests, for each covered service in the lists above for:
• Expedited decisions.
• Standard decisions.
• Extenuating-circumstances decisions.

Engrossed Second Substitute House Bill 1357 added prescription drug prior authorization reporting requirements for health carriers beginning in reporting year 2024. Carriers were given the opportunity to submit voluntary prescription drug prior authorization data for the 2023 reporting period. Prescription drug reporting was required for the 2024 reporting period.
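The first two required lists are aggregations over request-level records. A minimal sketch, assuming hypothetical records that carry only a service code and an approved flag (the carriers' submissions are already aggregated and their layout is not described here). Ties are broken by request count and then by code, which is an arbitrary choice.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Tuple


class Request(NamedTuple):
    code: str        # service code (hypothetical field)
    approved: bool   # final decision on the request


def _tallies(requests: Iterable[Request]) -> Tuple[Counter, Counter]:
    """Count total and approved requests per code."""
    totals: Counter = Counter()
    approved: Counter = Counter()
    for req in requests:
        totals[req.code] += 1
        if req.approved:
            approved[req.code] += 1
    return totals, approved


def top_by_volume(requests: Iterable[Request], n: int = 10) -> List[Tuple[str, int, float]]:
    """Codes with the most requests: (code, total requests, percent approved)."""
    totals, approved = _tallies(requests)
    ranked = sorted(totals, key=lambda c: (-totals[c], c))
    return [(c, totals[c], 100.0 * approved[c] / totals[c]) for c in ranked[:n]]


def top_by_approval_rate(requests: Iterable[Request], n: int = 10) -> List[Tuple[str, float, int]]:
    """Codes with the highest approval percentage: (code, percent approved, total requests)."""
    totals, approved = _tallies(requests)
    ranked = sorted(totals, key=lambda c: (-approved[c] / totals[c], -totals[c], c))
    return [(c, 100.0 * approved[c] / totals[c], totals[c]) for c in ranked[:n]]
```

The third list (initially denied, then approved on appeal) would need an extra field recording the initial decision.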
A granular assessment of the day-to-day variation in emergency presentations...
healthdatagateway.org
unknown
Updated Oct 8, 2024
This publication uses data from PIONEER, an ethically approved database and analytical environment (East Midlands Derby Research Ethics 20/EM/0158) (2024). A granular assessment of the day-to-day variation in emergency presentations [Dataset]. https://healthdatagateway.org/en/dataset/175
Available download formats: unknown
Dataset updated
Oct 8, 2024
Dataset authored and provided by
This publication uses data from PIONEER, an ethically approved database and analytical environment (East Midlands Derby Research Ethics 20/EM/0158)
License
https://www.pioneerdatahub.co.uk/data/data-request-process/
Description
The acute-care pathway (from the emergency department (ED) through acute medical units or ambulatory care and on to wards) is the most visible aspect of the hospital health-care system to most patients. Acute hospital admissions are increasing yearly and overcrowded emergency departments and high bed occupancy rates are associated with a range of adverse patient outcomes. Predicted growth in demand for acute care driven by an ageing population and increasing multimorbidity is likely to exacerbate these problems in the absence of innovation to improve the processes of care.

Key targets for Emergency Medicine services are changing, moving away from previous 4-hour targets. This will likely impact the assessment of patients admitted to hospital through Emergency Departments.

This data set provides highly granular patient level information, showing the day-to-day variation in case mix and acuity. The data includes detailed demography, co-morbidity, symptoms, longitudinal acuity scores, physiology and laboratory results, all investigations, prescriptions, diagnoses and outcomes. It could be used to develop new pathways or understand the prevalence or severity of specific disease presentations.
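Day-to-day variation in case mix and acuity can be summarised by grouping admissions by calendar day. A minimal sketch, assuming hypothetical records of (admission timestamp, NEWS2 score) pairs; the PIONEER data model is not described here, and the coefficient of variation of daily counts is just one simple measure of variability.

```python
from collections import defaultdict
from datetime import date, datetime
from statistics import mean, pstdev
from typing import Dict, Iterable, List, Tuple


def daily_variation(
    admissions: Iterable[Tuple[datetime, int]],
) -> Tuple[Dict[date, Tuple[int, float]], float]:
    """Return ({day: (admissions, mean NEWS2)}, coefficient of variation of daily counts)."""
    by_day: Dict[date, List[int]] = defaultdict(list)
    for admitted_at, news2 in admissions:
        by_day[admitted_at.date()].append(news2)
    summary = {day: (len(scores), mean(scores)) for day, scores in sorted(by_day.items())}
    counts = [n for n, _ in summary.values()]
    cv = pstdev(counts) / mean(counts) if counts else 0.0
    return summary, cv
```

Days with no admissions do not appear in the summary, so a real analysis would fill them in as zero before computing variability.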

PIONEER geography: The West Midlands (WM) has a population of 5.9 million & includes a diverse ethnic & socio-economic mix.

Electronic Health Record: University Hospital Birmingham is one of the largest NHS Trusts in England, providing direct acute services & specialist care across four hospital sites, with 2.2 million patient episodes per year, 2750 beds & an expanded 250 ITU bed capacity during COVID. UHB runs a fully electronic healthcare record (EHR) (PICS; Birmingham Systems), a shared primary & secondary care record (Your Care Connected) & a patient portal “My Health”.

Scope: All patients with a medical emergency admitted to hospital, flowing through the acute medical unit. Longitudinal & individually linked, so that the preceding & subsequent health journey can be mapped & healthcare utilisation prior to & after admission understood. The dataset includes patient demographics, co-morbidities taken from ICD-10 & SNOMED-CT codes. Serial, structured data pertaining to process of care (timings, admissions, wards and readmissions), physiology readings (NEWS2 score and clinical frailty scale), Charlson comorbidity index and time dimensions.

Available supplementary data: Matched controls; ambulance data, OMOP data, synthetic data.

Available supplementary support: Analytics, Model build, validation & refinement; A.I.; Data partner support for ETL (extract, transform & load) process, Clinical expertise, Patient & end-user access, Purchaser access, Regulatory requirements, Data-driven trials, “fast screen” services.
Healthcare Payments Data Snapshot
data.chhs.ca.gov
data.ca.gov
+3more
csv, pdf, zip
Updated Nov 7, 2025
Department of Health Care Access and Information (2025). Healthcare Payments Data Snapshot [Dataset]. https://data.chhs.ca.gov/dataset/healthcare-payments-data-snapshot
Available download formats: zip, pdf (458278), csv (907195), csv (107962), csv (1023), pdf (218738), csv (769), pdf (245152), csv (4432152), csv (1003)
Dataset updated
Nov 7, 2025
Dataset authored and provided by
Department of Health Care Access and Information
Description
This dataset contains data for the Healthcare Payments Data (HPD) Snapshot visualization. The Enrollment data file contains counts of claims and encounter data collected for California's statewide HPD Program. It includes counts of enrollment records, service records from medical and pharmacy claims, and the number of individuals represented across these records. Aggregate counts are grouped by payer type (Commercial, Medi-Cal, or Medicare), product type, and year. The Medical data file contains counts of medical procedures from medical claims and encounter data in HPD. Procedures are categorized using claim line procedure codes and grouped by year, type of setting (e.g., outpatient, laboratory, ambulance), and payer type. The Pharmacy data file contains counts of drug prescriptions from pharmacy claims and encounter data in HPD. Prescriptions are categorized by name and drug class using the reported National Drug Code (NDC) and grouped by year, payer type, and whether the drug dispensed is branded or a generic.
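As an example of working with the Pharmacy file's grouping (year, payer type, branded or generic), the generic share of prescriptions per year and payer can be computed as below. The record keys year, payer_type, brand_generic and prescriptions are hypothetical stand-ins for the file's actual column names.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def generic_share(rows: Iterable[dict]) -> Dict[Tuple[int, str], float]:
    """Fraction of prescriptions dispensed as generics, per (year, payer type)."""
    totals: Dict[Tuple[int, str], int] = defaultdict(int)
    generics: Dict[Tuple[int, str], int] = defaultdict(int)
    for row in rows:
        key = (row["year"], row["payer_type"])
        totals[key] += row["prescriptions"]
        if row["brand_generic"] == "Generic":
            generics[key] += row["prescriptions"]
    # Groups with a zero total are skipped to avoid dividing by zero.
    return {key: generics[key] / total for key, total in totals.items() if total}
```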
Complex Healthcare Dataset - Dataset - LDM
service.tib.eu
Updated Jan 2, 2025
(2025). Complex Healthcare Dataset - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/complex-healthcare-dataset
Dataset updated
Jan 2, 2025
Description
The dataset used in this paper is a complex healthcare dataset, which includes various attributes such as medical coding, laboratory reports, imaging procedures, payment claims, and public health databases.
Health, United States
catalog.data.gov
healthdata.gov
+3more
Updated Apr 23, 2025
Centers for Disease Control and Prevention (2025). Health, United States [Dataset]. https://catalog.data.gov/dataset/health-united-states-e04e6
Dataset updated
Apr 23, 2025
Dataset provided by
Centers for Disease Control and Prevention (http://www.cdc.gov/)
Area covered
United States
Description
Health, United States is the report on the health status of the country. Every year, the report presents an overview of national health trends organized around four subject areas: health status and determinants, utilization of health resources, health care resources, and health care expenditures and payers.
Office-based Health Care Providers Database
catalog.data.gov
data.virginia.gov
+2more
Updated Jul 11, 2025
Office of the National Coordinator for Health Information Technology (2025). Office-based Health Care Providers Database [Dataset]. https://catalog.data.gov/dataset/office-based-health-care-providers-database
Dataset updated
Jul 11, 2025
Dataset provided by
Office of the National Coordinator for Health Information Technology (http://healthit.gov/)
Description
ONC uses the SK&A Office-based Provider Database to calculate the counts of medical doctors, doctors of osteopathy, nurse practitioners, and physician assistants at the state and county level from 2011 through 2013. These counts are reported as a total, segmented by provider type, and separately as counts of primary care providers.
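Because counts exist at both county and state level, a natural consistency check is that county rows roll up to state totals. A minimal sketch, assuming hypothetical rows with state, provider_type and count keys. It also assumes the input rows use only the four non-overlapping provider types; primary care counts overlap them and should be kept out of the input, and the "All providers" label is an illustrative name.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def state_totals(county_rows: Iterable[dict]) -> Dict[Tuple[str, str], int]:
    """Sum county-level provider counts into (state, provider type) totals,
    plus an (state, "All providers") total per state."""
    totals: Counter = Counter()
    for row in county_rows:
        totals[(row["state"], row["provider_type"])] += row["count"]
        totals[(row["state"], "All providers")] += row["count"]
    return dict(totals)
```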
Call Center Metrics for the Health Service System
catalog.data.gov
data.sfgov.org
+2more
Updated Mar 29, 2025
data.sfgov.org (2025). Call Center Metrics for the Health Service System [Dataset]. https://catalog.data.gov/dataset/call-center-metrics-for-the-health-service-system
Dataset updated
Mar 29, 2025
Dataset provided by
data.sfgov.org
Description
This dataset captures monthly data from the HSS phone system, including metrics for Calls Answered, Average Speed of Answer, Abandonment Rate, and In-person Assistance. The data supports the City's Performance Measures requirements. In April 2023, HSS switched to a new phone system, WEBEX (Finess).
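The dataset does not define how Average Speed of Answer and Abandonment Rate are calculated. A common call-center convention, assumed here and not confirmed for HSS, is total wait time of answered calls divided by calls answered, and abandoned calls divided by calls offered (answered plus abandoned). A minimal sketch:

```python
from typing import List


def average_speed_of_answer(wait_seconds_answered: List[float]) -> float:
    """Mean wait, in seconds, over calls that were answered (0.0 if none)."""
    if not wait_seconds_answered:
        return 0.0
    return sum(wait_seconds_answered) / len(wait_seconds_answered)


def abandonment_rate(answered: int, abandoned: int) -> float:
    """Share of offered calls (answered + abandoned) that hung up before answer."""
    offered = answered + abandoned
    return abandoned / offered if offered else 0.0
```

Month-to-month comparisons across April 2023 deserve care, since the phone system changed then.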
Revolutionizing Healthcare Through Information Technology
catalog.data.gov
s.cnmilf.com
+1more
Updated May 14, 2025
NCO NITRD (2025). Revolutionizing Healthcare Through Information Technology [Dataset]. https://catalog.data.gov/dataset/revolutionizing-healthcare-through-information-technology
Dataset updated
May 14, 2025
Dataset provided by
NCO NITRD
Description
The President's Information Technology Advisory Committee (PITAC) is appointed by the President to provide independent expert advice on maintaining America's preeminence in advanced information technology (IT). PITAC members are IT leaders in industry and academia with expertise relevant to critical elements of the national information infrastructure, such as high-performance computing, large-scale networking, and high-assurance software and systems design. The Committee's studies help guide the Administration's efforts to accelerate the development and adoption of information technologies vital for American prosperity in the 21st century.
Pre-2012 Home Health Agencies & Hospice Annual Utilization Report - Complete...
data.chhs.ca.gov
healthdata.gov
+4more
html, pdf, txt, xls +2
Updated Nov 7, 2025
Department of Health Care Access and Information (2025). Pre-2012 Home Health Agencies & Hospice Annual Utilization Report - Complete Data Set [Dataset]. https://data.chhs.ca.gov/dataset/pre-2012-home-health-agencies-hospice-annual-utilization-report-complete-data-set
Available download formats: pdf, txt, xls, xlsx, zip, html
Dataset updated
Nov 7, 2025
Dataset authored and provided by
Department of Health Care Access and Information
Description
Home Health Agencies (HHAs) provide at-home skilled nursing, personal care and therapeutic services. Hospices provide palliative care and alleviate the physical, emotional, social and spiritual discomforts of individuals in the last phases of life due to a terminal disease. In addition, hospices provide supportive care for the primary caregiver and the family of the hospice patient. Home health agencies and hospices submit an annual utilization report to the Office at the end of each calendar year. The report includes information on service capacity, visits, utilization, patient characteristics, capital/equipment expenditures, and gross revenues. The documentation, including report forms, is available for each reporting year.
Medical Staff People Tracking Dataset
gts.ai
json
Updated Nov 20, 2023
Globose Technology Solutions Pvt. Ltd. (2023). Medical Staff People Tracking Dataset [Dataset]. https://gts.ai/dataset-download/de-identified-dictation-notes/
Available download formats: json
Dataset updated
Nov 20, 2023
Dataset authored and provided by
Globose Technology Solutions Pvt. Ltd.
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
The Medical Staff People Tracking Dataset provides high-quality, anonymized clinical and movement data of healthcare personnel in medical environments. It is designed to support AI and ML models for hospital workflow optimization, safety monitoring, and activity analysis while ensuring privacy and compliance.
Health, lifestyle, health care use and supply, causes of death; key figures
data.overheid.nl
cbs.nl
atom, json
Updated Apr 7, 2025
Centraal Bureau voor de Statistiek (Rijk) (2025). Health, lifestyle, health care use and supply, causes of death; key figures [Dataset]. https://data.overheid.nl/dataset/4268-health--lifestyle--health-care-use-and-supply--causes-of-death--key-figures
Available download formats: atom (KB), json (KB)
Dataset updated
Apr 7, 2025
Dataset provided by
Centraal Bureau voor de Statistiek
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This table provides an overview of the key figures on health and care available on StatLine. All figures are taken from other tables on StatLine, either directly or through a simple conversion. In the original tables, breakdowns by characteristics of individuals or by other variables are possible. The time between the end of the reporting year and the moment data become available differs between the data series. The number of exam passes/graduates in year t is the number of persons who obtained a diploma in the school/study year starting in t-1 and ending in t.

Data available from: 2001

Status of the figures:

2024: Most available figures are definite. Figures are provisional for:
- causes of death;
- youth care;
- persons employed in health and welfare;
- persons employed in healthcare;
- Mbo health care graduates;
- Hbo nursing graduates / medicine graduates (university).

2023: Most available figures are definite. Figures are provisional for:
- perinatal mortality at pregnancy duration at least 24 weeks;
- diagnoses known to the general practitioner;
- hospital admissions by some diagnoses;
- average period of hospitalisation;
- supplied drugs;
- AWBZ/Wlz-funded long term care;
- physicians and nurses employed in care;
- persons employed in health and welfare;
- average distance to facilities;
- profitability and operating results at institutions.
Figures are revised provisional for:
- expenditures on health and welfare.

2022: Most available figures are definite. Figures are revised provisional for:
- expenditures on health and welfare.

2021: Most available figures are definite. Figures are revised provisional for:
- expenditures on health and welfare.

2020 and earlier: All available figures are definite.

Changes as of 4 July 2025: More recent figures have been added for:
- causes of death;
- life expectancy;
- life expectancy in perceived good health;
- self-perceived health;
- hospital admissions by some diagnoses;
- sickness absence;
- average period of hospitalisation;
- contacts with health professionals;
- youth care;
- smoking, heavy drinkers, physical activity;
- overweight;
- high blood pressure;
- physicians and nurses employed in care;
- persons employed in health and welfare;
- persons employed in healthcare;
- Mbo health care graduates;
- Hbo nursing graduates / medicine graduates (university);
- expenditures on health and welfare;
- profitability and operating results at institutions.

Changes as of 18 December 2024:
- Distance to facilities: the figures withdrawn on 5 June have been replaced (unchanged).
- Youth care: the previously published final results for 2021 and 2022 have been adjusted due to improvements in the processing.
- Due to a revision of the statistics Expenditure on health and welfare 2021, figures for expenditure on health and welfare care have been replaced from 2021 onwards.
- Due to the revision of the National Accounts, the figures on persons employed in health and welfare have been replaced for all years.
- AWBZ/Wlz-funded long term care: from 2015, the series Wlz residential care including total package at home has been replaced by total Wlz care. This series fits better with the chosen demarcation of indications for Wlz care.

When will new figures be published? New figures will be published in December 2025.
EMRBots: a 10,000-patient database
figshare.com
zip
Updated Sep 3, 2018
Cite
Uri Kartoun (2018). EMRBots: a 10,000-patient database [Dataset]. http://doi.org/10.6084/m9.figshare.7040060.v3
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.7040060.v3
Dataset updated
Sep 3, 2018
Dataset provided by
Figshare: http://figshare.com/
Authors
Uri Kartoun
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
A database of 10,000 virtual patients containing a total of 36,143 admissions and 10,726,505 lab observations.
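The totals above translate into rough averages that give a sense of how dense the synthetic records are. The calculation below uses only the three counts stated in the description; averages say nothing about how admissions or lab observations are distributed across individual patients.

```python
# Totals stated in the EMRBots description.
patients = 10_000
admissions = 36_143
lab_observations = 10_726_505

# Simple averages derived from those totals.
admissions_per_patient = admissions / patients
labs_per_admission = lab_observations / admissions
labs_per_patient = lab_observations / patients

print(f"{admissions_per_patient:.2f} admissions per patient")
print(f"{labs_per_admission:.1f} lab observations per admission")
print(f"{labs_per_patient:.1f} lab observations per patient")
```

This prints about 3.61 admissions per patient, 296.8 lab observations per admission and 1072.7 lab observations per patient.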
EHRCon: Dataset for Checking Consistency between Unstructured Notes and...
physionet.org
Updated Mar 19, 2025
Cite
Yeonsu Kwon; Jiho Kim; Gyubok Lee; Seongsu Bae; Daeun Kyung; Wonchul Cha; Tom Pollard; Alistair Johnson; Edward Choi (2025). EHRCon: Dataset for Checking Consistency between Unstructured Notes and Structured Tables in Electronic Health Records [Dataset]. http://doi.org/10.13026/m4vd-y789
Unique identifier
https://doi.org/10.13026/m4vd-y789
Dataset updated
Mar 19, 2025
Authors
Yeonsu Kwon; Jiho Kim; Gyubok Lee; Seongsu Bae; Daeun Kyung; Wonchul Cha; Tom Pollard; Alistair Johnson; Edward Choi
License
https://github.com/MIT-LCP/license-and-dua/tree/master/drafts
Description
Electronic Health Records (EHRs) are integral for storing comprehensive patient medical records, combining structured data (e.g., medications) with detailed clinical notes (e.g., physician notes). These elements are essential for straightforward data retrieval and provide deep, contextual insights into patient care. However, they often suffer from discrepancies due to unintuitive EHR system designs and human errors, posing serious risks to patient safety. To address this, we developed EHRCon, a new dataset and task specifically designed to ensure data consistency between structured tables and unstructured notes in EHRs. EHRCon was crafted in collaboration with healthcare professionals using the MIMIC-III EHR dataset, and includes manual annotations of 4,101 entities across 105 clinical notes checked against database entries for consistency. EHRCon has two versions, one using the original MIMIC-III schema, and another using the OMOP CDM schema, in order to increase its applicability and generalizability.
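The consistency task can be pictured as comparing values mentioned in a note against structured rows. The toy check below only illustrates that idea and is not EHRCon's annotation procedure or schema: the `NoteEntity` class, the `check_consistency` helper and all example values are hypothetical, and in practice entities would first have to be extracted from free text and matched to MIMIC-III or OMOP CDM tables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NoteEntity:
    """An entity mentioned in a clinical note (hypothetical structure)."""
    name: str   # e.g. a medication name
    value: str  # e.g. the dose stated in the note


def check_consistency(entities, table):
    """Compare note entities with structured values keyed by entity name.

    Returns a dict mapping each entity name to 'consistent',
    'inconsistent' (the structured value differs) or 'missing'
    (no structured entry exists for that entity).
    """
    results = {}
    for ent in entities:
        if ent.name not in table:
            results[ent.name] = "missing"
        elif table[ent.name] == ent.value:
            results[ent.name] = "consistent"
        else:
            results[ent.name] = "inconsistent"
    return results


# Hypothetical example: three medications mentioned in one note.
note = [
    NoteEntity("heparin", "5000 units"),
    NoteEntity("aspirin", "81 mg"),
    NoteEntity("metoprolol", "25 mg"),
]
structured = {"heparin": "5000 units", "aspirin": "325 mg"}

result = check_consistency(note, structured)
print(result)
```

Exact string matching is the simplest possible comparison; real notes use abbreviations, units and relative times, which is why a curated, annotated benchmark like EHRCon is useful.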
EHR Developers Reported by Health Care Providers Participating in Federal...
catalog.data.gov
data.virginia.gov
+2 more
Updated Jul 11, 2025
Cite
Office of the National Coordinator for Health Information Technology (2025). EHR Developers Reported by Health Care Providers Participating in Federal Programs [Dataset]. https://catalog.data.gov/dataset/ehr-developers-reported-by-health-care-providers-participating-in-federal-programs
Dataset updated
Jul 11, 2025
Dataset provided by
Office of the National Coordinator for Health Information Technology: http://healthit.gov/
Description
The Medicare & Medicaid Electronic Health Record (EHR) Incentive Programs provide incentives to eligible ambulatory and inpatient providers to adopt electronic health records. This dataset provides the counts of health care providers that have reported a developer's product through participation in the Medicare EHR Incentive Program, beginning in 2011. It enables tracking of trends in health IT adoption by developer, for both office-based health care providers and non-federal acute-care hospitals. Filter the data by Program Year to get the most recent counts by health care provider type. The most recent data are for the 2016 Program Year.
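Filtering to the latest Program Year and totalling by provider type can be sketched with Python's standard `csv` module. The column names (`program_year`, `provider_type`, `developer`, `provider_count`) and the toy rows are hypothetical placeholders, not values from the real file; check the headers of the actual download before adapting this.

```python
import csv
import io
from collections import defaultdict

# Hypothetical column names and toy rows; the real file's headers may differ.
SAMPLE = """program_year,provider_type,developer,provider_count
2015,Hospital,Developer A,120
2016,Hospital,Developer A,130
2016,Office-based,Developer B,45
2016,Office-based,Developer A,30
"""


def latest_counts_by_provider_type(rows):
    """Keep only the most recent program year, then total counts per provider type."""
    rows = list(rows)
    latest = max(int(r["program_year"]) for r in rows)
    totals = defaultdict(int)
    for r in rows:
        if int(r["program_year"]) == latest:
            totals[r["provider_type"]] += int(r["provider_count"])
    return latest, dict(totals)


year, totals = latest_counts_by_provider_type(csv.DictReader(io.StringIO(SAMPLE)))
print(year, totals)
```

With the toy rows this prints `2016 {'Hospital': 130, 'Office-based': 75}`; for the real data, `io.StringIO(SAMPLE)` would be replaced by the opened CSV file.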
Australian synthetic healthcare data with Synthea
data.csiro.au
researchdata.edu.au
Updated Jul 4, 2024
Cite
Ibrahima Diouf; Mitchell O'Brien; Hamed Hassanzadeh; Donna Truran; Hoa Ngo; Parnesh Raniga; Denis Bauer; David Hansen; Sankalp Khanna; Roc Reguant Comellas; Michael Lawley; John Grimes (2024). Australian synthetic healthcare data with Synthea [Dataset]. http://doi.org/10.25919/efcw-bm49
Unique identifier
https://doi.org/10.25919/efcw-bm49
Dataset updated
Jul 4, 2024
Dataset provided by
CSIRO: http://www.csiro.au/
Authors
Ibrahima Diouf; Mitchell O'Brien; Hamed Hassanzadeh; Donna Truran; Hoa Ngo; Parnesh Raniga; Denis Bauer; David Hansen; Sankalp Khanna; Roc Reguant Comellas; Michael Lawley; John Grimes
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Australia
Dataset funded by
CSIRO: http://www.csiro.au/
Description
We developed an Australianised version of Synthea. Synthea is synthetic data generation software that uses publicly available population aggregate statistics such as demographics, disease prevalence and incidence rates, and health reports. Synthea generates data based on manually curated models of clinical workflows and disease progression that cover a patient’s entire life, and it does not use real patient data, which guarantees a completely synthetic dataset. We generated 117,258 synthetic patients from Queensland.
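The general idea of generating records from aggregate statistics rather than from real patients can be shown with a deliberately simplified sampler. This is not how Synthea works (Synthea uses curated models of clinical workflows and disease progression over a whole life); the `synthesize_patients` function and every rate below are made-up placeholders, not Queensland figures.

```python
import random

# Made-up aggregate statistics (placeholders, not real Queensland figures).
SEX_PROBS = {"female": 0.5, "male": 0.5}
PREVALENCE = {"hypertension": 0.3, "asthma": 0.1}


def synthesize_patients(n, seed=0):
    """Draw n synthetic patients independently from aggregate rates.

    Each patient gets a sex sampled from SEX_PROBS, and each condition
    is present with its marginal prevalence. No real records are used.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    sexes = list(SEX_PROBS)
    weights = list(SEX_PROBS.values())
    patients = []
    for i in range(n):
        patients.append({
            "id": i,
            "sex": rng.choices(sexes, weights=weights)[0],
            "conditions": [c for c, p in PREVALENCE.items() if rng.random() < p],
        })
    return patients


cohort = synthesize_patients(10_000)
share = sum("hypertension" in p["conditions"] for p in cohort) / len(cohort)
print(f"hypertension share: {share:.3f}")  # close to 0.3 by construction
```

Because each condition is drawn independently, this toy ignores comorbidity, ageing and disease progression, which is precisely what Synthea's curated models add.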

Comprehensive Medical Q&A Dataset

Unlocking Healthcare Data with Natural Language Processing

Cite

The Devastator (2023). Comprehensive Medical Q&A Dataset [Dataset]. https://www.kaggle.com/datasets/thedevastator/comprehensive-medical-q-a-dataset

5 scholarly articles cite this dataset.

Available download formats: zip (5126941 bytes)

Dataset updated

Nov 24, 2023

Authors

The Devastator

License

https://creativecommons.org/publicdomain/zero/1.0/

Description


By Huggingface Hub [source]

About this dataset

The MedQuad dataset provides a comprehensive source of medical questions and answers for natural language processing. With over 43,000 patient inquiries from real-life situations categorized into 31 distinct types of questions, the dataset offers an invaluable opportunity to research correlations between treatments, chronic diseases, medical protocols and more. Answers in this database come not only from doctors but also from other healthcare professionals such as nurses and pharmacists, providing a more complete array of responses to help researchers unlock deeper insights within the realm of healthcare. This trove of knowledge is just waiting to be mined, so grab your data mining tools and start exploring!


How to use the dataset

To make the most of this dataset, start by looking at the column names and the information they offer: qtype (the type of medical question), Question (the question itself), and Answer (the expert response). The qtype column lets you group the dataset by the question topics you are interested in. Once you have narrowed your selection as far as possible using qtype, you can analyze the data. Start by asking questions such as “What treatments do most patients search for?” or “Are there any correlations between chronic conditions and protocols?” Then use simple queries such as SELECT Answer FROM MedQuad WHERE qtype='Treatment' AND Question LIKE '%pain%' to get closer to answering them.
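A query like the one above can be run locally with Python's built-in `sqlite3` module. This is a sketch under assumptions: the table name `MedQuad` is taken from the query, the rows below are invented placeholders rather than real MedQuad records, and the exact spelling and case of the `qtype` labels in the real file should be checked first (in SQLite, `=` on text is case-sensitive, while `LIKE` is case-insensitive for ASCII letters by default).

```python
import sqlite3

# In-memory table mirroring the three columns of train.csv.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MedQuad (qtype TEXT, Question TEXT, Answer TEXT)")

# Placeholder rows for illustration only (not taken from the dataset).
rows = [
    ("Treatment", "What are the treatments for back pain?", "Answer A"),
    ("Treatment", "What are the treatments for asthma?", "Answer B"),
    ("Symptoms", "What are the symptoms of chest pain?", "Answer C"),
]
conn.executemany("INSERT INTO MedQuad VALUES (?, ?, ?)", rows)

# Same filter as the query in the text, using parameters instead of literals.
query = "SELECT Answer FROM MedQuad WHERE qtype = ? AND Question LIKE ?"
answers = [a for (a,) in conn.execute(query, ("Treatment", "%pain%"))]
print(answers)
conn.close()
```

Only the first row matches both conditions, so this prints `['Answer A']`. To query the real data, the rows could instead be read from train.csv with `csv.reader` (skipping the header) and passed to `executemany`.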

Once you have gained new insights about healthcare from the answers in this dynamic dataset, it is time for action! Use that newfound understanding of patient needs to develop educational materials and implement any changes it suggests. If you need more criteria for querying, check whether MedQuad offers additional columns; columns that could further enhance your analysis may be added periodically, so look out for update notifications.

Finally, once your use case has made an impact, don't forget proper citation etiquette: give credit where credit is due!

Research Ideas

Developing medical diagnostic tools that use natural language processing (NLP) to better identify and diagnose health conditions in patients.

Creating predictive models to anticipate treatment options for different medical conditions using machine learning techniques.

Leveraging the dataset to build chatbots and virtual assistants that are able to answer a broad range of questions about healthcare with expert-level accuracy.

Acknowledgements

If you use this dataset in your research, please credit the original authors. Data Source

License

License: CC0 1.0 Universal (CC0 1.0) Public Domain Dedication. No Copyright: you can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

Columns

File: train.csv

| Column name | Description |
|:------------|:------------|
| qtype | The type of medical question. (String) |
| Question | The medical question posed by the patient. (String) |
| Answer | The expert response to the medical question. (String) |
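Because qtype is the natural grouping key, a useful first step is to count questions per type. The sketch below uses only the standard library and assumes train.csv has exactly the header `qtype,Question,Answer` listed above; the `count_by_qtype` helper is hypothetical, and the in-memory sample stands in for the real file.

```python
import csv
import io
from collections import Counter


def count_by_qtype(fileobj):
    """Count rows per qtype in a CSV with columns qtype, Question, Answer."""
    reader = csv.DictReader(fileobj)
    return Counter(row["qtype"] for row in reader)


# Stand-in for open("train.csv", newline="", encoding="utf-8").
sample = io.StringIO(
    "qtype,Question,Answer\n"
    'Treatment,"What are the treatments for gout?",Answer A\n'
    'Symptoms,"What are the symptoms of gout?",Answer B\n'
    'Treatment,"What are the treatments for lupus?",Answer C\n'
)
counts = count_by_qtype(sample)
print(counts.most_common())
```

For the sample this prints `[('Treatment', 2), ('Symptoms', 1)]`; `csv.DictReader` handles the quoted Question field, which matters because real questions and answers often contain commas.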

Acknowledgements

If you use this dataset in your research, please credit the original authors and Huggingface Hub.


