100+ datasets found
  1. Healthcare Dataset

    • gts.ai
    json
    Updated Oct 19, 2024
    Citation: GTS (2024). Healthcare Dataset [Dataset]. https://gts.ai/dataset-download/healthcare-dataset/
    Available download formats: json
    Dataset updated: Oct 19, 2024
    Dataset provided by: GLOBOSE TECHNOLOGY SOLUTIONS PRIVATE LIMITED
    Authors: GTS
    License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/); license information was derived automatically

    Description

    Explore our synthetic healthcare dataset designed for machine learning, data science, and healthcare analytics.

  2. Comprehensive Medical Q&A Dataset

    • kaggle.com
    zip
    Updated Nov 24, 2023
    Citation: The Devastator (2023). Comprehensive Medical Q&A Dataset [Dataset]. https://www.kaggle.com/datasets/thedevastator/comprehensive-medical-q-a-dataset
    Available download formats: zip (5,126,941 bytes)
    Dataset updated: Nov 24, 2023
    Authors: The Devastator
    License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    Comprehensive Medical Q&A Dataset

    Unlocking Healthcare Data with Natural Language Processing

    By Huggingface Hub [source]

    About this dataset

    The MedQuad dataset is a comprehensive source of medical questions and answers for natural language processing. Its more than 43,000 patient inquiries from real-life situations, categorized into 31 distinct question types, make it possible to study relationships between treatments, chronic diseases, medical protocols and more. Answers come not only from doctors but also from other healthcare professionals such as nurses and pharmacists, giving researchers a broader range of responses from which to draw insights. This trove of knowledge is waiting to be mined, so get exploring!

    How to use the dataset

    To make the most of this dataset, start by looking at the column names and the information they hold: qtype (the type of medical question), Question (the question itself), and Answer (the expert response). The qtype column lets you narrow the dataset down to the question topics you care about. Once you have filtered it with qtype, analyze the data: start with questions such as “What treatments do most patients search for?” or “Are there any correlations between chronic conditions and protocols?”, then use simple queries such as SELECT Answer FROM MedQuad WHERE qtype='Treatment' AND Question LIKE '%pain%' to work toward answers.
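    The example query above can be tried without a database server using Python's built-in `sqlite3` module. This is an illustrative sketch only: the table name `MedQuad`, the placeholder rows, and the lowercase `qtype` value `treatment` are assumptions (check the distinct `qtype` values in `train.csv` before filtering), and real use would load the CSV rows instead of the placeholders.

```python
import sqlite3

# Hypothetical placeholder rows using the dataset's column names (qtype, Question, Answer).
SAMPLE_ROWS = [
    ("treatment", "What are the treatments for chronic back pain?", "Placeholder answer A."),
    ("symptoms", "What are the symptoms of migraine?", "Placeholder answer B."),
    ("treatment", "What are the treatments for asthma?", "Placeholder answer C."),
]


def find_answers(conn: sqlite3.Connection, qtype: str, keyword: str) -> list[str]:
    """Return answers whose qtype matches and whose question contains `keyword`."""
    cur = conn.execute(
        # Parameterized query: avoids manual quoting and SQL injection.
        "SELECT Answer FROM MedQuad WHERE qtype = ? AND Question LIKE ?",
        (qtype, f"%{keyword}%"),
    )
    return [row[0] for row in cur.fetchall()]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MedQuad (qtype TEXT, Question TEXT, Answer TEXT)")
conn.executemany("INSERT INTO MedQuad VALUES (?, ?, ?)", SAMPLE_ROWS)

print(find_answers(conn, "treatment", "pain"))
```

    Passing the filter values as parameters rather than formatting them into the SQL string keeps the query safe when the keyword comes from user input.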

    Once you have gained new insights about healthcare from the answers in this dynamic dataset, it is time for action: use your understanding of patient needs to develop educational materials and implement any changes that are needed. If you need more criteria for querying, check whether MedQuad offers additional columns; extra columns may be added periodically, so watch for update notifications.

    Finally, once your use case has made an impact, follow proper citation etiquette and give credit where credit is due.

    Research Ideas

    • Developing medical diagnostic tools that use natural language processing (NLP) to better identify and diagnose health conditions in patients.
    • Creating predictive models to anticipate treatment options for different medical conditions using machine learning techniques.
    • Leveraging the dataset to build chatbots and virtual assistants that can answer a broad range of healthcare questions with expert-level accuracy.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors and Huggingface Hub (the data source).

    License

    License: CC0 1.0 Universal (CC0 1.0) Public Domain Dedication. No copyright: you can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.

    Columns

    File: train.csv

    | Column name | Description |
    |:------------|:------------|
    | qtype | The type of medical question. (String) |
    | Question | The medical question posed by the patient. (String) |
    | Answer | The expert response to the medical question. (String) |

  3. healthcare-dataset-stroke-data

    • kaggle.com
    zip
    Updated Dec 3, 2023
    Citation: Aouatif Cherdid (2023). healthcare-dataset-stroke-data [Dataset]. https://www.kaggle.com/datasets/aouatifcherdid/healthcare-dataset-stroke-data
    Available download formats: zip (69,007 bytes)
    Dataset updated: Dec 3, 2023
    Authors: Aouatif Cherdid
    License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    Dataset

    This dataset was created by Aouatif Cherdid

    Released under CC0: Public Domain

  4. Medical Staff People Tracking Dataset

    • gts.ai
    json
    Updated Nov 20, 2023
    Citation: Globose Technology Solutions Pvt. Ltd. (2023). Medical Staff People Tracking Dataset [Dataset]. https://gts.ai/dataset-download/de-identified-dictation-notes/
    Available download formats: json
    Dataset updated: Nov 20, 2023
    Dataset authored and provided by: Globose Technology Solutions Pvt. Ltd.
    License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/); license information was derived automatically

    Description

    The Medical Staff People Tracking Dataset provides high-quality, anonymized clinical and movement data of healthcare personnel in medical environments. It is designed to support AI and ML models for hospital workflow optimization, safety monitoring, and activity analysis while ensuring privacy and compliance.

  5. Open Database of Healthcare Facilities

    • open.canada.ca
    • catalogue.arctic-sdi.org
    csv, esri rest +4
    Updated Mar 2, 2022
    Citation: Statistics Canada (2022). Open Database of Healthcare Facilities [Dataset]. https://open.canada.ca/data/en/dataset/a1bcd4ee-8e57-499b-9c6f-94f6902fdf32
    Available download formats: fgdb/gdb, esri rest, csv, html, pdf, wms
    Dataset updated: Mar 2, 2022
    Dataset provided by: Statistics Canada
    License: Open Government Licence - Canada 2.0 (https://open.canada.ca/en/open-government-licence-canada); license information was derived automatically

    Description

    The Open Database of Healthcare Facilities (ODHF) is a collection of open data containing the names, types, and locations of health facilities across Canada. It is released under the Open Government License - Canada. The ODHF compiles open, publicly available, and directly-provided data on health facilities across Canada. Data sources include regional health authorities, provincial, territorial and municipal governments, and public health and professional healthcare bodies. This database aims to provide enhanced access to a harmonized listing of health facilities across Canada by making them available as open data. This database is a component of the Linkable Open Data Environment (LODE).

  6. Global Health Statistics Dataset

    • gts.ai
    json
    Updated Jan 24, 2025
    Citation: GTS (2025). Global Health Statistics Dataset [Dataset]. https://gts.ai/dataset-download/global-health-statistics-dataset/
    Available download formats: json
    Dataset updated: Jan 24, 2025
    Dataset provided by: GLOBOSE TECHNOLOGY SOLUTIONS PRIVATE LIMITED
    Authors: GTS
    License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/); license information was derived automatically

    Description

    Access comprehensive global health data on disease prevalence, mortality rates, treatment effectiveness, and healthcare infrastructure.

  7. Healthcare Diabetes Dataset

    • kaggle.com
    zip
    Updated Aug 23, 2023
    Citation: Nandita Pore (2023). Healthcare Diabetes Dataset [Dataset]. https://www.kaggle.com/datasets/nanditapore/healthcare-diabetes
    Available download formats: zip (27,316 bytes)
    Dataset updated: Aug 23, 2023
    Authors: Nandita Pore
    License: Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0); license information was derived automatically

    Description

    Welcome to the Diabetes Prediction Dataset, a valuable resource for researchers, data scientists, and medical professionals interested in diabetes risk assessment and prediction. This dataset contains a diverse range of health-related attributes, collected to aid the development of predictive models that identify individuals at risk of diabetes. By sharing this dataset, we aim to foster collaboration and innovation within the data science community, leading to improved early diagnosis and personalized treatment strategies for diabetes.

    Columns:

    1. Id: Unique identifier for each data entry.
    2. Pregnancies: Number of times pregnant.
    3. Glucose: Plasma glucose concentration 2 hours into an oral glucose tolerance test.
    4. BloodPressure: Diastolic blood pressure (mm Hg).
    5. SkinThickness: Triceps skinfold thickness (mm).
    6. Insulin: 2-hour serum insulin (μU/ml).
    7. BMI: Body mass index (weight in kg / (height in m)^2).
    8. DiabetesPedigreeFunction: Diabetes pedigree function, a score of genetic predisposition to diabetes based on family history.
    9. Age: Age in years.
    10. Outcome: Binary class label indicating the presence (1) or absence (0) of diabetes.
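    A minimal sketch of loading these columns with Python's standard library and computing two simple descriptive statistics. The sample rows are made up, and the parsing rules (Id and Outcome as integers, every other column as a float) are assumptions about the file. Zero values in columns such as Glucose or BloodPressure are physiologically implausible and may stand for missing measurements; this sketch does not handle that.

```python
import csv
import io
from statistics import mean

# Hypothetical rows in the column layout described above (made-up values, not real patients).
SAMPLE_CSV = """Id,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
1,2,140,70,30,80,32.0,0.5,45,1
2,0,95,68,25,60,24.5,0.3,29,0
3,4,120,75,28,0,28.0,0.4,38,0
"""


def load_rows(text: str) -> list[dict]:
    """Parse CSV text into dicts: Id and Outcome as int, every other column as float."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = {key: float(value) for key, value in raw.items()}
        row["Id"] = int(raw["Id"])
        row["Outcome"] = int(raw["Outcome"])
        rows.append(row)
    return rows


def outcome_prevalence(rows: list[dict]) -> float:
    """Fraction of rows labeled Outcome == 1 (diabetes present)."""
    return sum(row["Outcome"] for row in rows) / len(rows)


def mean_by_outcome(rows: list[dict], column: str) -> dict[int, float]:
    """Mean of `column` for each Outcome class present in the data."""
    groups: dict[int, list[float]] = {}
    for row in rows:
        groups.setdefault(row["Outcome"], []).append(row[column])
    return {outcome: mean(values) for outcome, values in groups.items()}


rows = load_rows(SAMPLE_CSV)
print(outcome_prevalence(rows), mean_by_outcome(rows, "BMI"))
```

    For the real file, replace `SAMPLE_CSV` with the contents of the downloaded CSV; the class balance reported by `outcome_prevalence` is worth checking before training any classifier.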

    Utilize this dataset to explore the relationships between various health indicators and the likelihood of diabetes. You can apply machine learning techniques to develop predictive models, feature selection strategies, and data visualization to uncover insights that may contribute to more accurate risk assessments. As you embark on your journey with this dataset, remember that your discoveries could have a profound impact on diabetes prevention and management.

    Please ensure that you adhere to ethical guidelines and respect the privacy of individuals represented in this dataset. Proper citation and recognition of this dataset's source are appreciated to promote collaboration and knowledge sharing.

    Start your exploration of the Diabetes Prediction Dataset today and contribute to the ongoing efforts to combat diabetes through data-driven insights and innovations.

  8. Healthcare Dataset

    • universe.roboflow.com
    zip
    Updated May 12, 2025
    Citation: healthcare (2025). Healthcare Dataset [Dataset]. https://universe.roboflow.com/healthcare-cditm/healthcare-pann5/model/6
    Available download formats: zip
    Dataset updated: May 12, 2025
    Dataset authored and provided by: healthcare
    License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/); license information was derived automatically
    Variables measured: Objects
    Description

    Healthcare

    ## Overview
    
    Healthcare is a dataset for computer vision tasks: it contains object annotations for 302 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  9. The MultiCaRe Dataset: A Multimodal Case Report Dataset with Clinical Cases,...

    • zenodo.org
    bin, csv, zip
    Updated Jan 5, 2024
    Citation: Mauro Nievas Offidani; Claudio Delrieux (2024). The MultiCaRe Dataset: A Multimodal Case Report Dataset with Clinical Cases, Labeled Images and Captions from Open Access PMC Articles [Dataset]. http://doi.org/10.5281/zenodo.10079370
    Available download formats: zip, bin, csv
    Dataset updated: Jan 5, 2024
    Dataset provided by: Zenodo (http://zenodo.org/)
    Authors: Mauro Nievas Offidani; Claudio Delrieux
    License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/); license information was derived automatically

    Description

    The dataset contains multimodal data from over 75,000 open access and de-identified case reports, including metadata, clinical cases, image captions and more than 130,000 images. Images and clinical cases belong to different medical specialties, such as oncology, cardiology, surgery and pathology. The structure of the dataset makes it easy to map images to their corresponding article metadata, clinical case, captions and image labels. Details of the data structure can be found in the file data_dictionary.csv.

    Almost 100,000 patients and almost 400,000 medical doctors and researchers were involved in the creation of the articles included in this dataset. The citation data of each article can be found in the metadata.parquet file.

    Refer to the examples showcased in this GitHub repository to understand how to optimize the use of this dataset.

    For detailed information about the contents of this dataset, refer to the data article published in Data in Brief.

  10. EHR Dataset for Patient Treatment Classification

    • data.mendeley.com
    Updated May 10, 2020
    Citation: Mujiono Sadikin (2020). EHR Dataset for Patient Treatment Classification [Dataset]. http://doi.org/10.17632/7kv3rctx7m.1
    Dataset updated: May 10, 2020
    Authors: Mujiono Sadikin
    License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/); license information was derived automatically

    Description

    This Electronic Health Record (EHR) dataset was collected from a private hospital in Indonesia. It contains patients' laboratory test results, which are used to determine the patient's next treatment: in-care (inpatient) or out-care (outpatient). The task associated with the dataset is classification.

  11. Healthcare Management System

    • kaggle.com
    zip
    Updated Dec 23, 2023
    Citation: Anouska Abhisikta (2023). Healthcare Management System [Dataset]. https://www.kaggle.com/datasets/anouskaabhisikta/healthcare-management-system
    Available download formats: zip (74,279 bytes)
    Dataset updated: Dec 23, 2023
    Authors: Anouska Abhisikta
    License: Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0); license information was derived automatically

    Description

    Patients Table:

    • PatientID: Unique identifier for each patient.
    • firstname: First name of the patient.
    • lastname: Last name of the patient.
    • email: Email address of the patient.

    This table stores information about individual patients, including their names and contact details.

    Doctors Table:

    • DoctorID: Unique identifier for each doctor.
    • DoctorName: Full name of the doctor.
    • Specialization: Area of medical specialization.
    • DoctorContact: Contact details of the doctor.

    This table contains details about healthcare providers, including their names, specializations, and contact information.

    Appointments Table:

    • AppointmentID: Unique identifier for each appointment.
    • Date: Date of the appointment.
    • Time: Time of the appointment.
    • PatientID: Foreign key referencing the Patients table, indicating the patient for the appointment.
    • DoctorID: Foreign key referencing the Doctors table, indicating the doctor for the appointment.

    This table records scheduled appointments, linking patients to doctors.

    MedicalProcedure Table:

    • ProcedureID: Unique identifier for each medical procedure.
    • ProcedureName: Name or description of the medical procedure.
    • AppointmentID: Foreign key referencing the Appointments table, indicating the appointment associated with the procedure.

    This table stores details about medical procedures associated with specific appointments.

    Billing Table:

    • InvoiceID: Unique identifier for each billing transaction.
    • PatientID: Foreign key referencing the Patients table, indicating the patient for the billing transaction.
    • Items: Description of items or services billed.
    • Amount: Amount charged for the billing transaction.

    This table maintains records of billing transactions, associating them with specific patients.

    demo Table:

    • ID: Primary key, serves as a unique identifier for each record.
    • Name: Name of the entity.
    • Hint: Additional information or hint about the entity.

    This table appears to be a demonstration or testing table, possibly unrelated to the healthcare management system.

    This dataset schema is designed to capture comprehensive information about patients, doctors, appointments, medical procedures, and billing transactions in a healthcare management system. Adjustments can be made based on specific requirements, and additional attributes can be included as needed.
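    The schema above maps naturally onto SQL. Below is a sketch using Python's built-in `sqlite3`: the column types, the sample rows, and the join query are assumptions for illustration (the description does not specify types), and the demo table is omitted because it appears unrelated to the system. SQLite only enforces the foreign keys when `PRAGMA foreign_keys = ON` is set on the connection.

```python
import sqlite3

# Sketch of the described tables; column types are assumptions.
SCHEMA = """
CREATE TABLE Patients (
    PatientID INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT, email TEXT
);
CREATE TABLE Doctors (
    DoctorID INTEGER PRIMARY KEY, DoctorName TEXT, Specialization TEXT, DoctorContact TEXT
);
CREATE TABLE Appointments (
    AppointmentID INTEGER PRIMARY KEY,
    "Date" TEXT,
    "Time" TEXT,
    PatientID INTEGER REFERENCES Patients(PatientID),
    DoctorID  INTEGER REFERENCES Doctors(DoctorID)
);
CREATE TABLE MedicalProcedure (
    ProcedureID INTEGER PRIMARY KEY,
    ProcedureName TEXT,
    AppointmentID INTEGER REFERENCES Appointments(AppointmentID)
);
CREATE TABLE Billing (
    InvoiceID INTEGER PRIMARY KEY,
    PatientID INTEGER REFERENCES Patients(PatientID),
    Items TEXT,
    Amount REAL
);
"""


def procedures_with_people(conn: sqlite3.Connection) -> list[tuple]:
    """List each procedure with the patient's full name and the doctor's name."""
    return conn.execute(
        """
        SELECT mp.ProcedureName, p.firstname || ' ' || p.lastname, d.DoctorName
        FROM MedicalProcedure AS mp
        JOIN Appointments AS a ON a.AppointmentID = mp.AppointmentID
        JOIN Patients AS p ON p.PatientID = a.PatientID
        JOIN Doctors AS d ON d.DoctorID = a.DoctorID
        ORDER BY mp.ProcedureID
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript(SCHEMA)

# Hypothetical sample rows.
conn.execute("INSERT INTO Patients VALUES (1, 'Jane', 'Doe', 'jane.doe@example.com')")
conn.execute("INSERT INTO Doctors VALUES (10, 'Dr. Smith', 'Cardiology', 'smith@example.com')")
conn.execute("INSERT INTO Appointments VALUES (100, '2023-05-01', '09:30', 1, 10)")
conn.execute("INSERT INTO MedicalProcedure VALUES (1000, 'ECG', 100)")

print(procedures_with_people(conn))
```

    With foreign keys enabled, inserting an appointment that references a nonexistent patient raises `sqlite3.IntegrityError`, which is a cheap way to validate the CSV files' referential consistency when importing them.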

  12. Kenya - Healthcare Facilities - Dataset - ENERGYDATA.INFO

    • energydata.info
    Updated Nov 28, 2023
    Citation: (2023). Kenya - Healthcare Facilities - Dataset - ENERGYDATA.INFO [Dataset]. https://energydata.info/dataset/kenya-healthcare-facilities
    Dataset updated: Nov 28, 2023
    License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/); license information was derived automatically
    Area covered: Kenya
    Description

    Data on healthcare facility locations in Kenya. The dataset was provided by the Government of Kenya.

  13. Mental Health - Datasets - CTData.org

    • data.ctdata.org
    Updated Jun 24, 2016
    Citation: (2016). Mental Health - Datasets - CTData.org [Dataset]. http://data.ctdata.org/dataset/mental-health
    Dataset updated: Jun 24, 2016
    License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/); license information was derived automatically

    Description

    Mental Health reports the past-year prevalence of mental illness by age range.

  14. Reddit Mental Health Dataset

    • zenodo.org
    csv
    Updated Oct 16, 2020
    Citation: Daniel M. Low; Laurie Rumker; Tanya Talkar; John Torous; Guillermo Cecchi; Satrajit S. Ghosh (2020). Reddit Mental Health Dataset [Dataset]. http://doi.org/10.17605/osf.io/7peyq
    Available download formats: csv
    Dataset updated: Oct 16, 2020
    Dataset provided by: Zenodo (http://zenodo.org/)
    Authors: Daniel M. Low; Laurie Rumker; Tanya Talkar; John Torous; Guillermo Cecchi; Satrajit S. Ghosh
    License: ODC Public Domain Dedication and Licence (PDDL) v1.0 (http://www.opendatacommons.org/licenses/pddl/1.0/); license information was derived automatically

    Description

    This dataset contains posts from 28 subreddits (15 mental health support groups) from 2018-2020. We used this dataset to understand the impact of COVID-19 on mental health support groups from January to April, 2020 and included older timeframes to obtain baseline posts before COVID-19.

    Please cite if you use this dataset:

    Low, D. M., Rumker, L., Torous, J., Cecchi, G., Ghosh, S. S., & Talkar, T. (2020). Natural Language Processing Reveals Vulnerable Mental Health Support Groups and Heightened Health Anxiety on Reddit During COVID-19: Observational Study. Journal of Medical Internet Research, 22(10), e22635.

    @article{low2020natural,
     title={Natural Language Processing Reveals Vulnerable Mental Health Support Groups and Heightened Health Anxiety on Reddit During COVID-19: Observational Study},
     author={Low, Daniel M and Rumker, Laurie and Torous, John and Cecchi, Guillermo and Ghosh, Satrajit S and Talkar, Tanya},
     journal={Journal of medical Internet research},
     volume={22},
     number={10},
     pages={e22635},
     year={2020},
     publisher={JMIR Publications Inc., Toronto, Canada}
    }


    License

    This dataset is made available under the Public Domain Dedication and License v1.0 whose full text can be found at: http://www.opendatacommons.org/licenses/pddl/1.0/

    The data was downloaded using the Pushshift API. Re-use of this data is subject to the Reddit API terms.

    Reddit Mental Health Dataset

    Contains posts and text features for the following timeframes from 28 mental health and non-mental health subreddits:

    • 15 specific mental health support groups (r/EDAnonymous, r/addiction, r/alcoholism, r/adhd, r/anxiety, r/autism, r/bipolarreddit, r/bpd, r/depression, r/healthanxiety, r/lonely, r/ptsd, r/schizophrenia, r/socialanxiety, and r/suicidewatch)
    • 2 broad mental health subreddits (r/mentalhealth, r/COVID19_support)
    • 11 non-mental health subreddits (r/conspiracy, r/divorce, r/fitness, r/guns, r/jokes, r/legaladvice, r/meditation, r/parenting, r/personalfinance, r/relationships, r/teaching).

    Filenames and corresponding timeframes:

    • post: Jan 1 to April 20, 2020 (called "mid-pandemic" in the manuscript; r/COVID19_support appears). Unique users: 320,364.
    • pre: Dec 2018 to Dec 2019. A full year that provides more data for a baseline of Reddit posts. Unique users: 327,289.
    • 2019: Jan 1 to April 20, 2019 (r/EDAnonymous appears). A control for seasonal fluctuations to match post data. Unique users: 282,560.
    • 2018: Jan 1 to April 20, 2018. A control for seasonal fluctuations to match post data. Unique users: 177,089.

    Unique users across all time windows (pre and 2019 overlap): 826,961.

    See manuscript Supplementary Materials (https://doi.org/10.31234/osf.io/xvwcy) for more information.

    Note: if subsampling (e.g., to balance subreddits), we recommend bootstrapping analyses for unbiased results.
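    The bootstrapping recommendation can be illustrated with a generic percentile bootstrap. This is not the authors' analysis code: the per-post scores, the choice of the mean as the statistic, and the parameter defaults (`n_resamples`, `alpha`, `seed`) are hypothetical. The sketch first draws an equal-sized subsample per subreddit, then resamples each subsample with replacement to get a confidence interval for its mean.

```python
import random
from statistics import mean


def balanced_subsample(posts_by_subreddit: dict[str, list[float]], k: int, seed: int = 0) -> dict[str, list[float]]:
    """Draw k items (without replacement) from each subreddit to balance group sizes."""
    rng = random.Random(seed)
    return {name: rng.sample(posts, k) for name, posts in posts_by_subreddit.items()}


def bootstrap_ci(values: list[float], n_resamples: int = 2000, alpha: float = 0.05, seed: int = 0) -> tuple[float, float]:
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)  # fixed seed for reproducible intervals
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    stats = sorted(mean(rng.choices(values, k=n)) for _ in range(n_resamples))
    lo_idx = int(n_resamples * alpha / 2)
    hi_idx = n_resamples - 1 - lo_idx
    return stats[lo_idx], stats[hi_idx]


# Hypothetical per-post scores (e.g., a text feature) for two subreddits.
scores = {
    "r/anxiety": [0.2, 0.5, 0.4, 0.9, 0.3, 0.6],
    "r/jokes": [0.1, 0.0, 0.2, 0.1],
}
balanced = balanced_subsample(scores, k=4)
for name, sample in balanced.items():
    print(name, bootstrap_ci(sample, n_resamples=1000))
```

    In practice one would also repeat the subsampling step with different seeds, so that conclusions do not hinge on one particular balanced draw.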

  15. Data from: Dataset of health insurance portfolio

    • data.mendeley.com
    • producciocientifica.uv.es
    Updated Nov 26, 2025
    Citation: Josep Lledó (2025). Dataset of health insurance portfolio [Dataset]. http://doi.org/10.17632/386vmj2tbk.4
    Dataset updated: Nov 26, 2025
    Authors: Josep Lledó
    License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/); license information was derived automatically

    Description

    The data is formatted as a spreadsheet covering the main activity of a non-life health insurance portfolio over three full years (2017, 2018 and 2019). The dataset comprises 228,711 rows and 42 columns. Each row represents an individual insured's policy, and each column represents a distinct variable.

  16. Data from: UK Health Accounts

    • ons.gov.uk
    • cy.ons.gov.uk
    xlsx
    Updated Apr 30, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Office for National Statistics (2025). UK Health Accounts [Dataset]. https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/healthcaresystem/datasets/healthaccountsreferencetables
    Explore at:
    xlsxAvailable download formats
    Dataset updated
    Apr 30, 2025
    Dataset provided by
    Office for National Statisticshttp://www.ons.gov.uk/
    License

    Open Government Licence 3.0http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Area covered
    United Kingdom
    Description

    UK healthcare expenditure data by financing scheme, function and provider, and additional analyses produced to internationally standardised definitions.

  17. MIMIC-III Clinical Database

    • physionet.org
    • oppositeofnorth.com
    Updated Sep 4, 2016
    Citation: Alistair Johnson; Tom Pollard; Roger Mark (2016). MIMIC-III Clinical Database [Dataset]. http://doi.org/10.13026/C2XW26
    Dataset updated: Sep 4, 2016
    Authors: Alistair Johnson; Tom Pollard; Roger Mark
    License: https://github.com/MIT-LCP/license-and-dua/tree/master/drafts

    Description

    MIMIC-III is a large, freely available database comprising deidentified health-related data associated with over forty thousand patients who stayed in critical care units of the Beth Israel Deaconess Medical Center between 2001 and 2012. The database includes information such as demographics, vital sign measurements made at the bedside (~1 data point per hour), laboratory test results, procedures, medications, caregiver notes, imaging reports, and mortality (including post-hospital discharge). MIMIC supports a diverse range of analytic studies spanning epidemiology, clinical decision-rule improvement, and electronic tool development. It is notable for three factors: it is freely available to researchers worldwide; it encompasses a diverse and very large population of ICU patients; and it contains highly granular data, including vital signs, laboratory results, and medications.

  18. MedQuAD: Medical Question-Answer Dataset

    • kaggle.com
    zip
    Updated Sep 7, 2024
    Citation: Afroz (2024). MedQuAD: Medical Question-Answer Dataset [Dataset]. https://www.kaggle.com/datasets/pythonafroz/medquad-medical-question-answer-for-ai-research
    Available download formats: zip (5,188,686 bytes)
    Dataset updated: Sep 7, 2024
    Authors: Afroz
    License: Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/); license information was derived automatically

    Description

    Medical Questions: Unveiling the MedQuAD Dataset

    Have you ever wondered where medical chatbots or intelligent search engines for health information get their knowledge? The answer lies in large datasets like MedQuAD! This rich resource provides a treasure trove of real-world medical questions and informative answers, paving the way for advancements in Natural Language Processing (NLP) and Information Retrieval (IR) within the healthcare domain.

    What is MedQuAD?

    MedQuAD, short for Medical Question Answering Dataset, is a collection of question-answer pairs meticulously curated from 12 trusted National Institutes of Health (NIH) websites. These websites cover a wide range of health topics, from cancer.gov to GARD (Genetic and Rare Diseases Information Resource).

    What makes MedQuAD unique?

    Beyond the sheer volume of data, MedQuAD offers unique features that empower researchers and developers:

    1. Diversity of Questions: MedQuAD encompasses a spectrum of 37 question types, ranging from treatment options and diagnosis inquiries to understanding side effects. This variety reflects the diverse needs of individuals seeking medical information.
    2. Focus on Specific Entities: MedQuAD goes beyond just questions and answers. It delves deeper by associating each question with the entity it focuses on, such as diseases, drugs, or other medical tests. This targeted approach facilitates more focused research and NLP applications.
    3. Rich Annotations: While the answers from MedlinePlus collections are excluded due to copyright restrictions, MedQuAD retains valuable annotations within its XML files. These annotations include question type, synonyms, unique identifiers (CUI) for medical concepts, and semantic types. This additional information opens doors for more sophisticated NLP tasks.
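    The annotations mentioned above live in per-document XML files. The sketch below parses one such document with Python's `xml.etree.ElementTree`; the element and attribute names (`Focus`, `CUI`, `QAPair`, `qtype`, and so on) reflect our understanding of the MedQuAD layout and should be verified against an actual file, and every value in the sample is a placeholder. Empty `Answer` elements are kept, since some collections ship without answer text.

```python
import xml.etree.ElementTree as ET

# Hypothetical document in the assumed MedQuAD layout; all values are placeholders.
SAMPLE_XML = """<Document id="0000001" source="ExampleSource" url="https://example.org/condition">
  <Focus>Example condition</Focus>
  <FocusAnnotations>
    <UMLS>
      <CUIs><CUI>C0000000</CUI></CUIs>
      <SemanticTypes><SemanticType>T047</SemanticType></SemanticTypes>
      <SemanticGroup>Disorders</SemanticGroup>
    </UMLS>
  </FocusAnnotations>
  <QAPairs>
    <QAPair pid="1">
      <Question qid="0000001-1" qtype="information">What is Example condition?</Question>
      <Answer>Placeholder answer text.</Answer>
    </QAPair>
    <QAPair pid="2">
      <Question qid="0000001-2" qtype="treatment">What are the treatments for Example condition?</Question>
      <Answer></Answer>
    </QAPair>
  </QAPairs>
</Document>"""


def parse_document(xml_text: str) -> dict:
    """Extract the focus entity, its UMLS CUIs, and the question-answer pairs."""
    root = ET.fromstring(xml_text)
    cuis = [cui.text for cui in root.iterfind("FocusAnnotations/UMLS/CUIs/CUI")]
    pairs = []
    for qa in root.iterfind("QAPairs/QAPair"):
        question = qa.find("Question")
        pairs.append({
            "qid": question.get("qid"),
            "qtype": question.get("qtype"),
            "question": (question.text or "").strip(),
            # findtext returns "" for an empty <Answer/>, e.g. where answers were removed.
            "answer": (qa.findtext("Answer") or "").strip(),
        })
    return {"id": root.get("id"), "focus": root.findtext("Focus"), "cuis": cuis, "qa_pairs": pairs}


doc = parse_document(SAMPLE_XML)
print(doc["focus"], doc["cuis"], [pair["qtype"] for pair in doc["qa_pairs"]])
```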

    The Power of MedQuAD

    MedQuAD serves as a valuable springboard for various applications in the medical NLP and IR field. Here are some potential uses:

    1. Training Chatbots and Virtual Assistants: AI-powered medical chatbots can leverage MedQuAD to learn how to respond accurately and informatively to a wide range of health inquiries from users.
    2. Developing Intelligent Search Engines: Search engines can be enhanced to provide more relevant and accurate health information by drawing insights from the question types and focuses presented in MedQuAD.
    3. Studying User Concerns in Healthcare: Analyzing the types of questions within MedQuAD can reveal valuable insights into what information users are most interested in and what areas require clearer explanations.

    In essence, MedQuAD is a powerful tool for unlocking the potential of NLP and IR in the medical domain. By leveraging this rich dataset, researchers and developers are paving the way for a future where individuals can access accurate and comprehensive health information with increasing ease and efficiency.

    Reference:

    If you use the MedQuAD dataset or the associated QA test collection, please cite the following paper: Ben Abacha, A., & Demner-Fushman, D. (2019). A Question-Entailment Approach to Question Answering. BMC Bioinformatics, 20(1), 511. https://doi.org/10.1186/s12859-019-3119-4

  19. Hospital Annual Financial Data - Selected Data & Pivot Tables

    • data.chhs.ca.gov
    • data.ca.gov
    • +4 more
    csv, data, doc, html +5
    Updated Oct 8, 2025
    Citation: Department of Health Care Access and Information (2025). Hospital Annual Financial Data - Selected Data & Pivot Tables [Dataset]. https://data.chhs.ca.gov/dataset/hospital-annual-financial-data-selected-data-pivot-tables
    Available download formats: xlsx, xlsm, xls, csv, pdf, html, doc, zip, data (multiple files per format)
    Dataset updated: Oct 8, 2025
    Dataset authored and provided by: Department of Health Care Access and Information
    Description

    On an annual basis (individual hospital fiscal year), individual hospitals and hospital systems report detailed facility-level data on services capacity, inpatient/outpatient utilization, patients, revenues and expenses by type and payer, balance sheet and income statement.

    Because the complete dataset is large, a selected set of commonly used data items has been created that is easy to manage and download. The selected data file includes general hospital information, utilization data by payer, revenue data by payer, expense data by natural expense category, financial ratios, and labor information.

    There are two groups of data in this dataset:

    1. Selected Data - Calendar Year: To make it easier to compare hospitals by year, hospital reports with report periods ending within a given calendar year are grouped together. The Pivot Tables for a specific calendar year are also found here.
    2. Selected Data - Fiscal Year: Hospital reports with report periods ending within a given fiscal year (July-June) are grouped together.
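    The two groupings differ only in how a report's period-end date is bucketed. A minimal sketch, assuming each report is identified by its period-end date; the hospital names are hypothetical, and the `2022-2023` label style for a July-June fiscal year is an assumption rather than the publisher's convention.

```python
from collections import defaultdict
from datetime import date


def calendar_year_group(report_end: date) -> int:
    """Calendar-year grouping: the year in which the report period ends."""
    return report_end.year


def fiscal_year_group(report_end: date) -> str:
    """July-June fiscal-year grouping.

    A period ending between July 1 of year Y and June 30 of year Y+1 is
    labeled "Y-(Y+1)"; the label format is a hypothetical choice.
    """
    start = report_end.year if report_end.month >= 7 else report_end.year - 1
    return f"{start}-{start + 1}"


def group_by(reports: dict[str, date], key) -> dict:
    """Group report names by the bucket that `key` assigns to each period-end date."""
    groups = defaultdict(list)
    for name, period_end in reports.items():
        groups[key(period_end)].append(name)
    return dict(groups)


# Hypothetical hospitals and report period-end dates.
reports = {
    "Hospital A": date(2023, 6, 30),
    "Hospital B": date(2023, 12, 31),
    "Hospital C": date(2024, 3, 31),
}
print(group_by(reports, calendar_year_group))
print(group_by(reports, fiscal_year_group))
```

    Hospitals A and B land in the same calendar-year group but in different fiscal-year groups, which is why the two file sets are not interchangeable for year-over-year comparisons.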

  20. Study of Women's Health Across the Nation (SWAN) Public Use Data

    • catalog.data.gov
    • data.virginia.gov
    • +2 more
    Updated Jul 26, 2023
    Citation: National Institutes of Health (NIH) (2023). Study of Womens Health Across the Nation (SWAN) Public Use Data [Dataset]. https://catalog.data.gov/dataset/study-of-womens-health-across-the-nation-swan-public-use-data
    Dataset updated: Jul 26, 2023
    Dataset provided by: National Institutes of Health (NIH)
    Description

    The SWAN Public Use Datasets provide access to longitudinal data describing the physical, biological, psychological, and social changes that occur during the menopausal transition. Data collected from 3,302 SWAN participants from Baseline through the 10th Annual Follow-Up visit are currently available to the public. Registered users are able to download datasets in a variety of formats, search variables and view recent publications.
