100+ datasets found

Healthcare Dataset
gts.ai
json
Updated Oct 19, 2024
Cite
GTS (2024). Healthcare Dataset [Dataset]. https://gts.ai/dataset-download/healthcare-dataset/
Available download formats: json
Dataset updated
Oct 19, 2024
Dataset provided by
GLOBOSE TECHNOLOGY SOLUTIONS PRIVATE LIMITED
Authors
GTS
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
Explore our synthetic healthcare dataset designed for machine learning, data science, and healthcare analytics.
Comprehensive Medical Q&A Dataset
kaggle.com
zip
Updated Nov 24, 2023
Cite
The Devastator (2023). Comprehensive Medical Q&A Dataset [Dataset]. https://www.kaggle.com/datasets/thedevastator/comprehensive-medical-q-a-dataset
Available download formats: zip (5126941 bytes)
Dataset updated
Nov 24, 2023
Authors
The Devastator
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Comprehensive Medical Q&A Dataset

Unlocking Healthcare Data with Natural Language Processing

By Huggingface Hub [source]

About this dataset

The MedQuad dataset provides a comprehensive source of medical questions and answers for natural language processing. With over 43,000 patient inquiries from real-life situations categorized into 31 distinct types of questions, the dataset offers an invaluable opportunity to research correlations between treatments, chronic diseases, medical protocols and more. Answers provided in this database come not only from doctors but also other healthcare professionals such as nurses and pharmacists, providing a more complete array of responses to help researchers unlock deeper insights within the realm of healthcare. This incredible trove of knowledge is just waiting to be mined - so grab your data mining equipment and get exploring!

How to use the dataset

In order to make the most out of this dataset, start by having a look at the column names and understanding what information they offer: qtype (the type of medical question), Question (the question in itself), and Answer (the expert response). The qtype column will help you categorize the dataset according to your desired question topics. Once you have filtered down your criteria as much as possible using qtype, it is time to analyze the data. Start by asking yourself questions such as “What treatments do most patients search for?” or “Are there any correlations between chronic conditions and protocols?” Then use simple queries such as SELECT Answer FROM MedQuad WHERE qtype='Treatment' AND Question LIKE '%pain%' to get closer to answering those questions.
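The query above can be tried without a database server. The following is a minimal sketch using Python's built-in sqlite3 module; the sample rows are hypothetical stand-ins for train.csv, and the helper names build_table and answers_for are illustrative only. The casing of the qtype values follows the query in the text and should be checked against the actual file.

```python
import sqlite3

# Hypothetical rows standing in for train.csv; the real file is not reproduced here.
SAMPLE_ROWS = [
    ("Treatment", "What are the treatments for chronic pain?", "Options include physical therapy."),
    ("Symptoms", "What are the symptoms of chronic pain?", "Persistent aching is common."),
    ("Treatment", "What are the treatments for asthma?", "Inhaled corticosteroids are often used."),
]

def build_table(rows):
    """Create an in-memory MedQuad table with the three documented columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MedQuad (qtype TEXT, Question TEXT, Answer TEXT)")
    conn.executemany("INSERT INTO MedQuad VALUES (?, ?, ?)", rows)
    return conn

def answers_for(conn, qtype, keyword):
    """Return answers whose qtype matches exactly and whose question mentions the keyword."""
    cursor = conn.execute(
        "SELECT Answer FROM MedQuad WHERE qtype = ? AND Question LIKE ?",
        (qtype, f"%{keyword}%"),
    )
    return [row[0] for row in cursor]

conn = build_table(SAMPLE_ROWS)
pain_answers = answers_for(conn, "Treatment", "pain")
print(pain_answers)
```

Passing the question type and keyword as parameters, rather than formatting them into the SQL string, keeps the query safe to reuse with arbitrary search terms.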

Once you have obtained new insights about healthcare from the answers in this dynamic data set, it is time for action. Use that understanding of patient needs to develop educational materials and implement any changes the analysis suggests. If you need more criteria for querying, check whether MedQuad offers additional columns; extra columns may be added periodically that enhance analysis capabilities, so look out for notifications when this happens.

Finally, once your use case has made an impact, remember proper citation etiquette: give credit where credit is due.

Research Ideas

Developing medical diagnostic tools that use natural language processing (NLP) to better identify and diagnose health conditions in patients.

Creating predictive models to anticipate treatment options for different medical conditions using machine learning techniques.

Leveraging the dataset to build chatbots and virtual assistants that are able to answer a broad range of questions about healthcare with expert-level accuracy

Acknowledgements

If you use this dataset in your research, please credit the original authors. Data Source

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

Columns

File: train.csv

| Column name | Description |
|:------------|:------------|
| qtype | The type of medical question. (String) |
| Question | The medical question posed by the patient. (String) |
| Answer | The expert response to the medical question. (String) |

Acknowledgements

If you use this dataset in your research, please credit the original authors and Huggingface Hub.
healthcare-dataset-stroke-data
kaggle.com
zip
Updated Dec 3, 2023
Cite
Aouatif Cherdid (2023). healthcare-dataset-stroke-data [Dataset]. https://www.kaggle.com/datasets/aouatifcherdid/healthcare-dataset-stroke-data
Available download formats: zip (69007 bytes)
Dataset updated
Dec 3, 2023
Authors
Aouatif Cherdid
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Dataset

This dataset was created by Aouatif Cherdid

Released under CC0: Public Domain

Medical Staff People Tracking Dataset
gts.ai
json
Updated Nov 20, 2023
Cite
Globose Technology Solutions Pvt. Ltd. (2023). Medical Staff People Tracking Dataset [Dataset]. https://gts.ai/dataset-download/de-identified-dictation-notes/
Available download formats: json
Dataset updated
Nov 20, 2023
Dataset authored and provided by
Globose Technology Solutions Pvt. Ltd.
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
The Medical Staff People Tracking Dataset provides high-quality, anonymized clinical and movement data of healthcare personnel in medical environments. It is designed to support AI and ML models for hospital workflow optimization, safety monitoring, and activity analysis while ensuring privacy and compliance.
Open Database of Healthcare Facilities
open.canada.ca
catalogue.arctic-sdi.org
csv, esri rest +4
Updated Mar 2, 2022
Cite
Statistics Canada (2022). Open Database of Healthcare Facilities [Dataset]. https://open.canada.ca/data/en/dataset/a1bcd4ee-8e57-499b-9c6f-94f6902fdf32
Available download formats: fgdb/gdb, esri rest, csv, html, pdf, wms
Dataset updated
Mar 2, 2022
Dataset provided by
Statistics Canada
License
Open Government Licence - Canada 2.0 (https://open.canada.ca/en/open-government-licence-canada)
License information was derived automatically
Description
The Open Database of Healthcare Facilities (ODHF) is a collection of open data containing the names, types, and locations of health facilities across Canada. It is released under the Open Government License - Canada. The ODHF compiles open, publicly available, and directly-provided data on health facilities across Canada. Data sources include regional health authorities, provincial, territorial and municipal governments, and public health and professional healthcare bodies. This database aims to provide enhanced access to a harmonized listing of health facilities across Canada by making them available as open data. This database is a component of the Linkable Open Data Environment (LODE).
Global Health Statistics Dataset
gts.ai
json
Updated Jan 24, 2025
Cite
GTS (2025). Global Health Statistics Dataset [Dataset]. https://gts.ai/dataset-download/global-health-statistics-dataset/
Available download formats: json
Dataset updated
Jan 24, 2025
Dataset provided by
GLOBOSE TECHNOLOGY SOLUTIONS PRIVATE LIMITED
Authors
GTS
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
Access comprehensive global health data on disease prevalence, mortality rates, treatment effectiveness, and healthcare infrastructure.
Healthcare Diabetes Dataset
kaggle.com
zip
Updated Aug 23, 2023
Cite
Nandita Pore (2023). Healthcare Diabetes Dataset [Dataset]. https://www.kaggle.com/datasets/nanditapore/healthcare-diabetes
Available download formats: zip (27316 bytes)
Dataset updated
Aug 23, 2023
Authors
Nandita Pore
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Description: Welcome to the Diabetes Prediction Dataset, a valuable resource for researchers, data scientists, and medical professionals interested in the field of diabetes risk assessment and prediction. This dataset contains a diverse range of health-related attributes, meticulously collected to aid in the development of predictive models for identifying individuals at risk of diabetes. By sharing this dataset, we aim to foster collaboration and innovation within the data science community, leading to improved early diagnosis and personalized treatment strategies for diabetes.

Columns:
1. Id: Unique identifier for each data entry.
2. Pregnancies: Number of times pregnant.
3. Glucose: Plasma glucose concentration over 2 hours in an oral glucose tolerance test.
4. BloodPressure: Diastolic blood pressure (mm Hg).
5. SkinThickness: Triceps skinfold thickness (mm).
6. Insulin: 2-Hour serum insulin (mu U/ml).
7. BMI: Body mass index (weight in kg / height in m^2).
8. DiabetesPedigreeFunction: Diabetes pedigree function, a genetic score of diabetes.
9. Age: Age in years.
10. Outcome: Binary classification indicating the presence (1) or absence (0) of diabetes.
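As a rough illustration of how the listed columns might be read and typed, here is a minimal sketch using only Python's standard library. The two sample rows are hypothetical and not taken from the real file, and the helper names bmi and load_rows are illustrative. The bmi function simply restates the formula given for the BMI column.

```python
import csv
import io

# Column names as listed in the dataset description.
COLUMNS = ["Id", "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
           "Insulin", "BMI", "DiabetesPedigreeFunction", "Age", "Outcome"]

# Hypothetical rows for illustration only; not taken from the real file.
SAMPLE_CSV = """Id,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
1,2,120,70,25,80,28.5,0.45,33,0
2,5,165,82,30,150,34.1,0.62,51,1
"""

def bmi(weight_kg, height_m):
    """Body mass index: weight in kg divided by height in metres squared."""
    return weight_kg / height_m ** 2

def load_rows(text):
    """Parse CSV text into dicts, with numeric measurements as floats and Id/Outcome as ints."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = {name: float(raw[name]) for name in COLUMNS}
        row["Id"] = int(raw["Id"])
        row["Outcome"] = int(raw["Outcome"])
        rows.append(row)
    return rows

rows = load_rows(SAMPLE_CSV)
# Ids of entries labelled as diabetic (Outcome == 1).
positive = [r["Id"] for r in rows if r["Outcome"] == 1]
print(positive, round(bmi(70, 1.75), 1))
```

Keeping Outcome as an integer makes it straightforward to split the rows into positive and negative cases before fitting any model.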

Utilize this dataset to explore the relationships between various health indicators and the likelihood of diabetes. You can apply machine learning techniques to develop predictive models, feature selection strategies, and data visualization to uncover insights that may contribute to more accurate risk assessments. As you embark on your journey with this dataset, remember that your discoveries could have a profound impact on diabetes prevention and management.

Please ensure that you adhere to ethical guidelines and respect the privacy of individuals represented in this dataset. Proper citation and recognition of this dataset's source are appreciated to promote collaboration and knowledge sharing.

Start your exploration of the Diabetes Prediction Dataset today and contribute to the ongoing efforts to combat diabetes through data-driven insights and innovations.
Healthcare Dataset
universe.roboflow.com
zip
Updated May 12, 2025
Cite
healthcare (2025). Healthcare Dataset [Dataset]. https://universe.roboflow.com/healthcare-cditm/healthcare-pann5/model/6
Available download formats: zip
Dataset updated
May 12, 2025
Dataset authored and provided by
healthcare
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Variables measured
Objects
Description
Healthcare

## Overview
Healthcare is a dataset for computer vision tasks - it contains Objects annotations for 302 images.

## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.

## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
The MultiCaRe Dataset: A Multimodal Case Report Dataset with Clinical Cases,...
zenodo.org
bin, csv, zip
Updated Jan 5, 2024
Cite
Mauro Nievas Offidani; Claudio Delrieux (2024). The MultiCaRe Dataset: A Multimodal Case Report Dataset with Clinical Cases, Labeled Images and Captions from Open Access PMC Articles [Dataset]. http://doi.org/10.5281/zenodo.10079370
Available download formats: zip, bin, csv
Unique identifier
https://doi.org/10.5281/zenodo.10079370
Dataset updated
Jan 5, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Mauro Nievas Offidani; Claudio Delrieux
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
The dataset contains multi-modal data from over 75,000 open access and de-identified case reports, including metadata, clinical cases, image captions and more than 130,000 images. Images and clinical cases belong to different medical specialties, such as oncology, cardiology, surgery and pathology. The structure of the dataset makes it easy to map images to their corresponding article metadata, clinical case, captions and image labels. Details of the data structure can be found in the file data_dictionary.csv.

Almost 100,000 patients and almost 400,000 medical doctors and researchers were involved in the creation of the articles included in this dataset. The citation data of each article can be found in the metadata.parquet file.

Refer to the examples showcased in this GitHub repository to understand how to optimize the use of this dataset.

For a detailed insight about the contents of this dataset, please refer to this data article published in Data In Brief.
EHR Dataset for Patient Treatment Classification
data.mendeley.com
Updated May 10, 2020
Cite
Mujiono Sadikin (2020). EHR Dataset for Patient Treatment Classification [Dataset]. http://doi.org/10.17632/7kv3rctx7m.1
Unique identifier
https://doi.org/10.17632/7kv3rctx7m.1
Dataset updated
May 10, 2020
Authors
Mujiono Sadikin
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
The dataset consists of electronic health records collected from a private hospital in Indonesia. It contains patients' laboratory test results, which are used to determine whether the patient's next treatment is in-care or out-care. The task associated with the dataset is classification.
Healthcare Management System
kaggle.com
zip
Updated Dec 23, 2023
Cite
Anouska Abhisikta (2023). Healthcare Management System [Dataset]. https://www.kaggle.com/datasets/anouskaabhisikta/healthcare-management-system
Available download formats: zip (74279 bytes)
Dataset updated
Dec 23, 2023
Authors
Anouska Abhisikta
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Patients Table:

PatientID: Unique identifier for each patient.

firstname: First name of the patient.

lastname: Last name of the patient.

email: Email address of the patient.

This table stores information about individual patients, including their names and contact details.

Doctors Table:

DoctorID: Unique identifier for each doctor.

DoctorName: Full name of the doctor.

Specialization: Area of medical specialization.

DoctorContact: Contact details of the doctor.

This table contains details about healthcare providers, including their names, specializations, and contact information.

Appointments Table:

AppointmentID: Unique identifier for each appointment.

Date: Date of the appointment.

Time: Time of the appointment.

PatientID: Foreign key referencing the Patients table, indicating the patient for the appointment.

DoctorID: Foreign key referencing the Doctors table, indicating the doctor for the appointment.

This table records scheduled appointments, linking patients to doctors.

MedicalProcedure Table:

ProcedureID: Unique identifier for each medical procedure.

ProcedureName: Name or description of the medical procedure.

AppointmentID: Foreign key referencing the Appointments table, indicating the appointment associated with the procedure.

This table stores details about medical procedures associated with specific appointments.

Billing Table:

InvoiceID: Unique identifier for each billing transaction.

PatientID: Foreign key referencing the Patients table, indicating the patient for the billing transaction.

Items: Description of items or services billed.

Amount: Amount charged for the billing transaction.

This table maintains records of billing transactions, associating them with specific patients.

demo Table:

ID: Primary key, serves as a unique identifier for each record.

Name: Name of the entity.

Hint: Additional information or hint about the entity.

This table appears to be a demonstration or testing table, possibly unrelated to the healthcare management system.

This dataset schema is designed to capture comprehensive information about patients, doctors, appointments, medical procedures, and billing transactions in a healthcare management system. Adjustments can be made based on specific requirements, and additional attributes can be included as needed.
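The relationships described above can be sketched as an in-memory SQLite schema. This is a minimal illustration, not the dataset's actual DDL: the column types, the quoting of Date and Time, and all sample values are assumptions, and the demo table is omitted because it is described as possibly unrelated.

```python
import sqlite3

# Table and column names follow the description; the types are assumptions.
SCHEMA = """
CREATE TABLE Patients (
    PatientID INTEGER PRIMARY KEY,
    firstname TEXT, lastname TEXT, email TEXT
);
CREATE TABLE Doctors (
    DoctorID INTEGER PRIMARY KEY,
    DoctorName TEXT, Specialization TEXT, DoctorContact TEXT
);
CREATE TABLE Appointments (
    AppointmentID INTEGER PRIMARY KEY,
    "Date" TEXT, "Time" TEXT,
    PatientID INTEGER REFERENCES Patients(PatientID),
    DoctorID INTEGER REFERENCES Doctors(DoctorID)
);
CREATE TABLE MedicalProcedure (
    ProcedureID INTEGER PRIMARY KEY,
    ProcedureName TEXT,
    AppointmentID INTEGER REFERENCES Appointments(AppointmentID)
);
CREATE TABLE Billing (
    InvoiceID TEXT PRIMARY KEY,
    PatientID INTEGER REFERENCES Patients(PatientID),
    Items TEXT, Amount REAL
);
"""

def open_db():
    """Create the schema in memory with foreign-key enforcement switched on."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
    conn.executescript(SCHEMA)
    return conn

conn = open_db()
# Hypothetical sample records, for illustration only.
conn.execute("INSERT INTO Patients VALUES (1, 'Ada', 'Example', 'ada@example.org')")
conn.execute("INSERT INTO Doctors VALUES (10, 'Dr. Sample', 'Cardiology', '555-0100')")
conn.execute("INSERT INTO Appointments VALUES (100, '2024-01-15', '09:30', 1, 10)")
conn.execute("INSERT INTO MedicalProcedure VALUES (1000, 'ECG', 100)")

# Follow the foreign keys from an appointment to its patient, doctor and procedure.
schedule = conn.execute("""
    SELECT p.firstname, d.DoctorName, m.ProcedureName
    FROM Appointments AS a
    JOIN Patients AS p ON p.PatientID = a.PatientID
    JOIN Doctors AS d ON d.DoctorID = a.DoctorID
    JOIN MedicalProcedure AS m ON m.AppointmentID = a.AppointmentID
""").fetchall()
print(schedule)
```

With enforcement enabled, inserting an appointment that references a non-existent PatientID raises sqlite3.IntegrityError, which is one way to check the linkage between the CSV files after loading them.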
Kenya - Healthcare Facilities - Dataset - ENERGYDATA.INFO
energydata.info
Updated Nov 28, 2023
Cite
(2023). Kenya - Healthcare Facilities - Dataset - ENERGYDATA.INFO [Dataset]. https://energydata.info/dataset/kenya-healthcare-facilities
Dataset updated
Nov 28, 2023
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Area covered
Kenya
Description
Data on healthcare facility locations in Kenya. The dataset was provided by the Government of Kenya.
Mental Health - Datasets - CTData.org
data.ctdata.org
Updated Jun 24, 2016
Cite
(2016). Mental Health - Datasets - CTData.org [Dataset]. http://data.ctdata.org/dataset/mental-health
Dataset updated
Jun 24, 2016
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Mental Health reports the prevalence of mental illness in the past year by age range.
Reddit Mental Health Dataset
zenodo.org
csv
Updated Oct 16, 2020
Cite
Daniel M. Low; Laurie Rumker; Tanya Talkar; John Torous; Guillermo Cecchi; Satrajit S. Ghosh (2020). Reddit Mental Health Dataset [Dataset]. http://doi.org/10.17605/osf.io/7peyq
Available download formats: csv
Unique identifier
https://doi.org/10.17605/osf.io/7peyq
Dataset updated
Oct 16, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Daniel M. Low; Laurie Rumker; Tanya Talkar; John Torous; Guillermo Cecchi; Satrajit S. Ghosh
License
ODC Public Domain Dedication and Licence (PDDL) v1.0 (http://www.opendatacommons.org/licenses/pddl/1.0/)
License information was derived automatically
Description
This dataset contains posts from 28 subreddits (15 mental health support groups) from 2018-2020. We used this dataset to understand the impact of COVID-19 on mental health support groups from January to April, 2020 and included older timeframes to obtain baseline posts before COVID-19.

Please cite if you use this dataset:

Low, D. M., Rumker, L., Torous, J., Cecchi, G., Ghosh, S. S., & Talkar, T. (2020). Natural Language Processing Reveals Vulnerable Mental Health Support Groups and Heightened Health Anxiety on Reddit During COVID-19: Observational Study. Journal of medical Internet research, 22(10), e22635.

@article{low2020natural, title={Natural Language Processing Reveals Vulnerable Mental Health Support Groups and Heightened Health Anxiety on Reddit During COVID-19: Observational Study}, author={Low, Daniel M and Rumker, Laurie and Torous, John and Cecchi, Guillermo and Ghosh, Satrajit S and Talkar, Tanya}, journal={Journal of medical Internet research}, volume={22}, number={10}, pages={e22635}, year={2020}, publisher={JMIR Publications Inc., Toronto, Canada} }

License

This dataset is made available under the Public Domain Dedication and License v1.0 whose full text can be found at: http://www.opendatacommons.org/licenses/pddl/1.0/

It was downloaded using pushshift API. Re-use of this data is subject to Reddit API terms.

Reddit Mental Health Dataset

Contains posts and text features for the following timeframes from 28 mental health and non-mental health subreddits:

15 specific mental health support groups (r/EDAnonymous, r/addiction, r/alcoholism, r/adhd, r/anxiety, r/autism, r/bipolarreddit, r/bpd, r/depression, r/healthanxiety, r/lonely, r/ptsd, r/schizophrenia, r/socialanxiety, and r/suicidewatch)

2 broad mental health subreddits (r/mentalhealth, r/COVID19_support)

11 non-mental health subreddits (r/conspiracy, r/divorce, r/fitness, r/guns, r/jokes, r/legaladvice, r/meditation, r/parenting, r/personalfinance, r/relationships, r/teaching).

filenames and corresponding timeframes:

post: Jan 1 to April 20, 2020 (called "mid-pandemic" in manuscript; r/COVID19_support appears). Unique users: 320,364.

pre: Dec 2018 to Dec 2019. A full year which provides more data for a baseline of Reddit posts. Unique users: 327,289.

2019: Jan 1 to April 20, 2019 (r/EDAnonymous appears). A control for seasonal fluctuations to match post data. Unique users: 282,560.

2018: Jan 1 to April 20, 2018. A control for seasonal fluctuations to match post data. Unique users: 177,089

Unique users across all time windows (pre and 2019 overlap): 826,961.

See manuscript Supplementary Materials (https://doi.org/10.31234/osf.io/xvwcy) for more information.

Note: if subsampling (e.g., to balance subreddits), we recommend bootstrapping analyses for unbiased results.
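One way to read the note above is to repeat the balanced subsample many times and look at the spread of the resulting estimates rather than trusting a single draw. The following is a minimal sketch under that interpretation; the per-post feature values, the function names, and the choice of a percentile interval are all assumptions made for illustration and are not taken from the manuscript.

```python
import random
import statistics

def balanced_bootstrap(groups, per_group, n_resamples=1000, seed=0):
    """Resample every group to the same size and record the mean of each draw."""
    rng = random.Random(seed)
    draws = {name: [] for name in groups}
    for _ in range(n_resamples):
        for name, values in groups.items():
            sample = rng.choices(values, k=per_group)  # sampling with replacement
            draws[name].append(statistics.fmean(sample))
    return draws

def percentile_interval(estimates, alpha=0.05):
    """Percentile interval over the bootstrap estimates."""
    ordered = sorted(estimates)
    lo_index = int(len(ordered) * alpha / 2)
    hi_index = max(lo_index, int(len(ordered) * (1 - alpha / 2)) - 1)
    return ordered[lo_index], ordered[hi_index]

# Hypothetical per-post feature values for two subreddits of unequal size.
groups = {
    "r/anxiety": [0.8, 0.6, 0.9, 0.7, 0.75, 0.65, 0.85, 0.7],
    "r/fitness": [0.2, 0.3, 0.25, 0.35],
}
draws = balanced_bootstrap(groups, per_group=4)
for name, estimates in draws.items():
    print(name, percentile_interval(estimates))
```

Fixing the seed keeps the resampling reproducible, which matters when the intervals are reported alongside other results.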
Data from: Dataset of health insurance portfolio
data.mendeley.com
producciocientifica.uv.es
Updated Nov 26, 2025
Cite
Josep Lledó (2025). Dataset of health insurance portfolio [Dataset]. http://doi.org/10.17632/386vmj2tbk.4
Unique identifier
https://doi.org/10.17632/386vmj2tbk.4
Dataset updated
Nov 26, 2025
Authors
Josep Lledó
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
The data is formatted as a spreadsheet covering the primary activities of a non-life health insurance portfolio over three full years (2017, 2018 and 2019). The dataset comprises 228,711 rows and 42 columns. Each row represents an insured individual's policy, and each column represents a distinct variable.
Data from: UK Health Accounts
ons.gov.uk
cy.ons.gov.uk
xlsx
Updated Apr 30, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Office for National Statistics (2025). UK Health Accounts [Dataset]. https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/healthcaresystem/datasets/healthaccountsreferencetables
Available download formats: xlsx
Dataset updated
Apr 30, 2025
Dataset provided by
Office for National Statistics (http://www.ons.gov.uk/)
License
Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
License information was derived automatically
Area covered
United Kingdom
Description
UK healthcare expenditure data by financing scheme, function and provider, and additional analyses produced to internationally standardised definitions.
MIMIC-III Clinical Database
physionet.org
oppositeofnorth.com
Updated Sep 4, 2016
Cite
Alistair Johnson; Tom Pollard; Roger Mark (2016). MIMIC-III Clinical Database [Dataset]. http://doi.org/10.13026/C2XW26
Unique identifier
https://doi.org/10.13026/C2XW26
Dataset updated
Sep 4, 2016
Authors
Alistair Johnson; Tom Pollard; Roger Mark
License
https://github.com/MIT-LCP/license-and-dua/tree/master/drafts
Description
MIMIC-III is a large, freely-available database comprising deidentified health-related data associated with over forty thousand patients who stayed in critical care units of the Beth Israel Deaconess Medical Center between 2001 and 2012. The database includes information such as demographics, vital sign measurements made at the bedside (~1 data point per hour), laboratory test results, procedures, medications, caregiver notes, imaging reports, and mortality (including post-hospital discharge). MIMIC supports a diverse range of analytic studies spanning epidemiology, clinical decision-rule improvement, and electronic tool development. It is notable for three factors: it is freely available to researchers worldwide; it encompasses a diverse and very large population of ICU patients; and it contains highly granular data, including vital signs, laboratory results, and medications.
MedQuAD: Medical Question-Answer Dataset
kaggle.com
zip
Updated Sep 7, 2024
Cite
Afroz (2024). MedQuAD: Medical Question-Answer Dataset [Dataset]. https://www.kaggle.com/datasets/pythonafroz/medquad-medical-question-answer-for-ai-research
Available download formats: zip (5188686 bytes)
Dataset updated
Sep 7, 2024
Authors
Afroz
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
License information was derived automatically
Description
Medical Questions: Unveiling the MedQuAD Dataset

Have you ever wondered where medical chatbots or intelligent search engines for health information get their knowledge? The answer lies in large datasets like MedQuAD! This rich resource provides a treasure trove of real-world medical questions and informative answers, paving the way for advancements in Natural Language Processing (NLP) and Information Retrieval (IR) within the healthcare domain.

What is MedQuAD?

MedQuAD, short for Medical Question Answering Dataset, is a collection of question-answer pairs meticulously curated from 12 trusted National Institutes of Health (NIH) websites. These websites cover a wide range of health topics, from cancer.gov to GARD (Genetic and Rare Diseases Information Resource).

What makes MedQuAD unique?

Beyond the sheer volume of data, MedQuAD offers unique features that empower researchers and developers:

Diversity of Questions: MedQuAD encompasses a spectrum of 37 question types, ranging from treatment options and diagnosis inquiries to understanding side effects. This variety reflects the diverse needs of individuals seeking medical information.

Focus on Specific Entities: MedQuAD goes beyond just questions and answers. It delves deeper by associating each question with the entity it focuses on, such as diseases, drugs, or other medical tests. This targeted approach facilitates more focused research and NLP applications.

Rich Annotations: While the answers from MedlinePlus collections are excluded due to copyright restrictions, MedQuAD retains valuable annotations within its XML files. These annotations include question type, synonyms, unique identifiers (CUI) for medical concepts, and semantic types. This additional information opens doors for more sophisticated NLP tasks.

The Power of MedQuAD

MedQuAD serves as a valuable springboard for various applications in the medical NLP and IR field. Here are some potential uses:

Training Chatbots and Virtual Assistants: AI-powered medical chatbots can leverage MedQuAD to learn how to respond accurately and informatively to a wide range of health inquiries from users.

Developing Intelligent Search Engines: Search engines can be enhanced to provide more relevant and accurate health information by drawing insights from the question types and focuses presented in MedQuAD.

Studying User Concerns in Healthcare: Analyzing the types of questions within MedQuAD can reveal valuable insights into what information users are most interested in and what areas require clearer explanations.

In essence, MedQuAD is a powerful tool for unlocking the potential of NLP and IR in the medical domain. By leveraging this rich dataset, researchers and developers are paving the way for a future where individuals can access accurate and comprehensive health information with increasing ease and efficiency.

Reference:

If you use the MedQuAD dataset or the associated QA test collection, please cite the following paper: Ben Abacha, A., & Demner-Fushman, D. (2019). A Question-Entailment Approach to Question Answering. BMC Bioinformatics, 20(1), 511. https://doi.org/10.1186/s12859-019-3119-4
Hospital Annual Financial Data - Selected Data & Pivot Tables
data.chhs.ca.gov
data.ca.gov
csv, data, doc, html +5
Updated Oct 8, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Department of Health Care Access and Information (2025). Hospital Annual Financial Data - Selected Data & Pivot Tables [Dataset]. https://data.chhs.ca.gov/dataset/hospital-annual-financial-data-selected-data-pivot-tables
Available download formats: xlsx, xlsm, xls, csv, pdf, html, zip, data, doc
Dataset updated
Oct 8, 2025
Dataset authored and provided by
Department of Health Care Access and Information
Description
On an annual basis (individual hospital fiscal year), individual hospitals and hospital systems report detailed facility-level data on services capacity, inpatient/outpatient utilization, patients, revenues and expenses by type and payer, balance sheet and income statement.

Due to the large size of the complete dataset, a selected set of data representing a wide range of commonly used data items, has been created that can be easily managed and downloaded. The selected data file includes general hospital information, utilization data by payer, revenue data by payer, expense data by natural expense category, financial ratios, and labor information.

There are two groups of data contained in this dataset: 1) Selected Data - Calendar Year: To make it easier to compare hospitals by year, hospital reports with report periods ending within a given calendar year are grouped together. The Pivot Tables for a specific calendar year are also found here. 2) Selected Data - Fiscal Year: Hospital reports with report periods ending within a given fiscal year (July-June) are grouped together.
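The two groupings can be illustrated with a small helper that maps a report period end date to a calendar year and to a July-June fiscal year. This is a sketch under assumptions: the "2022-23" label format and the function names are chosen here for illustration, and the hospital names and dates are hypothetical. The publisher's own files may label years differently.

```python
from collections import defaultdict
from datetime import date

def calendar_year(period_end):
    """Calendar-year grouping: the year in which the report period ends."""
    return period_end.year

def fiscal_year_label(period_end):
    """July-June fiscal-year grouping, labelled like '2022-23' (label format is an assumption)."""
    start_year = period_end.year if period_end.month >= 7 else period_end.year - 1
    return f"{start_year}-{(start_year + 1) % 100:02d}"

# Hypothetical report period end dates for three hospitals.
reports = [
    ("Hospital A", date(2023, 6, 30)),
    ("Hospital B", date(2023, 12, 31)),
    ("Hospital C", date(2023, 3, 31)),
]

by_fiscal_year = defaultdict(list)
for name, period_end in reports:
    by_fiscal_year[fiscal_year_label(period_end)].append(name)

print(dict(by_fiscal_year))
```

All three hypothetical reports end in calendar year 2023, yet they fall into two different fiscal years, which is why the two groupings are published separately.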
Study of Womens Health Across the Nation (SWAN) Public Use Data
catalog.data.gov
data.virginia.gov
Updated Jul 26, 2023
Cite
National Institutes of Health (NIH) (2023). Study of Womens Health Across the Nation (SWAN) Public Use Data [Dataset]. https://catalog.data.gov/dataset/study-of-womens-health-across-the-nation-swan-public-use-data
Dataset updated
Jul 26, 2023
Dataset provided by
National Institutes of Health (NIH)
Description
The SWAN Public Use Datasets provide access to longitudinal data describing the physical, biological, psychological, and social changes that occur during the menopausal transition. Data collected from 3,302 SWAN participants from Baseline through the 10th Annual Follow-Up visit are currently available to the public. Registered users are able to download datasets in a variety of formats, search variables and view recent publications.
