100+ datasets found
  1. Personalized Healthcare Treatment Plans Dataset

    • paperswithcode.com
    Updated Mar 6, 2025
    + more versions
    Cite
    (2025). Personalized Healthcare Treatment Plans Dataset [Dataset]. https://paperswithcode.com/dataset/personalized-healthcare-treatment-plans
    Explore at:
    Dataset updated
    Mar 6, 2025
    Description

    Problem Statement


    Healthcare providers often rely on generalized treatment protocols that may not address the unique needs of individual patients. This approach led to variable treatment outcomes, reduced efficacy, and limited patient satisfaction. A leading hospital sought a solution for developing personalized treatment plans tailored to each patient’s medical history, genetic profile, and current health status.

    Challenge

    Implementing a personalized healthcare treatment system involved overcoming the following challenges:

    Integrating diverse patient data, including medical history, lab results, genetic information, and lifestyle factors.

    Developing predictive models capable of identifying optimal treatment plans for individual patients.

    Ensuring compliance with privacy regulations and maintaining data security throughout the process.

    Solution Provided

    An advanced healthcare treatment recommendation system was developed using machine learning models and predictive analytics. The solution was designed to:

    Analyze patient data to identify patterns and predict treatment outcomes.

    Recommend individualized treatment plans optimized for efficacy and patient preferences.

    Continuously learn and adapt to improve recommendations based on new medical insights and patient feedback.

    Development Steps

    Data Collection

    Aggregated data from electronic health records (EHR), genetic testing reports, and patient-provided health information.

    Preprocessing

    Standardized and anonymized data to ensure accuracy, consistency, and compliance with healthcare privacy regulations.
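
    The preprocessing step above can be illustrated with a minimal sketch. It assumes keyed hashing for identifiers and z-score scaling for numeric values; the source does not say which techniques were actually used, and the key, field names, and sample records below are hypothetical. Keyed hashing is pseudonymization rather than full anonymization, so a real pipeline would need further safeguards.

```python
import hashlib
import hmac
import statistics

# Hypothetical secret key; a real system would load it from a secrets manager.
SECRET_KEY = b"replace-with-a-real-secret"


def pseudonymize(patient_id: str, key: bytes = SECRET_KEY) -> str:
    """Replace a patient identifier with a keyed SHA-256 digest (hex string)."""
    return hmac.new(key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


def standardize(values: list[float]) -> list[float]:
    """Rescale values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # All values are identical, so there is no spread to scale by.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# Hypothetical records, for illustration only.
records = [
    {"patient_id": "MRN-001", "glucose_mg_dl": 90.0},
    {"patient_id": "MRN-002", "glucose_mg_dl": 110.0},
    {"patient_id": "MRN-003", "glucose_mg_dl": 130.0},
]

ids = [pseudonymize(r["patient_id"]) for r in records]
scaled = standardize([r["glucose_mg_dl"] for r in records])
```

    The same identifier always maps to the same digest, so records from different sources can still be joined after the raw identifier is dropped.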

    Model Development

    Trained machine learning models to identify correlations between patient characteristics and treatment outcomes. Developed predictive algorithms to recommend personalized treatment plans for conditions like chronic diseases, cancer, and rare disorders.

    Validation

    Tested the system on historical patient data to evaluate its accuracy in predicting successful treatment outcomes.

    Deployment

    Integrated the solution into the hospital’s clinical decision support systems, enabling healthcare providers to access personalized treatment recommendations during consultations.

    Continuous Monitoring & Improvement

    Established a feedback mechanism to refine models using real-world treatment outcomes and patient satisfaction data.

    Results

    Improved Patient Outcomes

    The system delivered personalized treatment recommendations that significantly improved recovery rates and health outcomes.

    Increased Treatment Efficacy

    Optimized treatment plans reduced trial-and-error approaches, leading to more effective interventions and fewer side effects.

    Personalized Healthcare Experiences

    Patients reported higher satisfaction levels due to treatment plans tailored to their individual needs and preferences.

    Enhanced Decision-Making

    Healthcare providers benefited from data-driven insights, enabling more informed and confident decisions.

    Scalable and Future-Ready Solution

    The system scaled seamlessly to support diverse medical specialties and adapted to incorporate emerging medical research.

  2. Gold Standard/Manual Reviewed Annotated Datasets for Technical Validation

    • figshare.com
    xlsx
    Updated Nov 13, 2023
    Cite
    Zoie SY Wong (2023). Gold Standard/Manual Reviewed Annotated Datasets for Technical Validation [Dataset]. http://doi.org/10.6084/m9.figshare.23504922.v1
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Nov 13, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Zoie SY Wong
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This page shares the technical validation datasets used to evaluate a Large Dataset of Annotated Incident Reports on Medication Errors and its machine annotator. The files contained in this repository include the IFMIR gold standard dataset (CrossValid_IFMIR_522.xlsx), randomly sampled labeled incident reports from 2010–2020 (InternalValid_JQ2010-20_40.xlsx), randomly sampled labeled incident reports from 2021 (ExternalValid_JQ2021_20.xlsx), and error-free reports (Error_analysis.xlsx).

    To use any of these datasets, one should also cite the original data source: Medical Adverse Event Information Collection Project [Iryō jiko jōhō shūshū-tō jigyō]. Japan Council for Quality Health Care; 2022. Available from: https://www.med-safe.jp/index.html

  3. Test dataset of ChatGPT in medical field

    • scidb.cn
    Updated Mar 3, 2023
    Cite
    robin shen (2023). Test dataset of ChatGPT in medical field [Dataset]. http://doi.org/10.57760/sciencedb.o00130.00001
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Mar 3, 2023
    Dataset provided by
    Science Data Bank
    Authors
    robin shen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The researcher tests the QA capability of ChatGPT in the medical field from the following aspects:

    1. Test its reserve of medical knowledge.
    2. Check its ability to read and understand medical literature.
    3. Test its ability to assist diagnosis after reading case data.
    4. Test its error-correction ability for case data.
    5. Test its ability to standardize medical terms.
    6. Test its ability to evaluate experts.
    7. Check its ability to evaluate medical institutions.

    The conclusions are:

    • ChatGPT has great potential in medical and health care applications, and may directly replace human beings or even professionals at a certain level in some fields.
    • The researcher preliminarily believes that ChatGPT has basic medical knowledge and the ability to hold multi-round dialogues, and that its ability to understand Chinese is not weak.
    • ChatGPT has the ability to read, understand, and correct cases.
    • ChatGPT has the ability to extract information and standardize terminology, and does both very well.
    • ChatGPT has the ability to reason over medical knowledge.
    • ChatGPT has the ability of continuous learning. After continuous training, its level improved significantly.
    • ChatGPT does not have the ability to academically evaluate Chinese medical talents, and the results are not ideal.
    • ChatGPT does not have the ability to academically evaluate Chinese medical institutions, and the results are not ideal.
    • ChatGPT is an epoch-making product, which can become a useful assistant for medical diagnosis and treatment, knowledge services, literature reading, and review and paper writing.

  4. CarePrecise Collection U.S. HCP/HCO Dataset

    • datarade.ai
    .csv
    Updated Oct 27, 2021
    Cite
    CarePrecise (2021). CarePrecise Collection U.S. HCP/HCO Dataset [Dataset]. https://datarade.ai/data-products/careprecise-collection-u-s-hcp-hco-dataset-careprecise
    Explore at:
    Available download formats: .csv
    Dataset updated
    Oct 27, 2021
    Dataset authored and provided by
    CarePrecise
    Area covered
    United States of America
    Description

    The CarePrecise U.S. HCP/HCO Collection Dataset includes deep data on all 6.7 million U.S. HIPAA-covered healthcare practitioners and organizations. Monthly full updates. Includes linkages between the individual practitioners and their practice groups, hospitals, and hospital systems. Licensing plans are available for basic (internal use), derivative products, and redistribution. Data updates are delivered quarterly or monthly to suit customer need; FTP push is available, standard delivery is via CDN. Single download for evaluation is available. CarePrecise is a leader in the fields of HCP/HCO data, supplying provider data to the industry since 2008. Note regarding pricing: The Collection price shown in Pricing is separate from email addresses. Email addresses are priced as low as $0.075 per, based on volume. Pricing shown is without derivative product (DP) licensing for use in web applications; DP license ranges in price from $1,900/year to $9,000/year on top of data purchase, based on application and overall exposure estimate. DP license is sold in two-year term and requires a license agreement.

  5. LLM Health Benchmarks Dataset

    • paperswithcode.com
    Updated Feb 14, 2025
    + more versions
    Cite
    (2025). LLM Health Benchmarks Dataset [Dataset]. https://paperswithcode.com/dataset/llm-health-benchmarks
    Explore at:
    Dataset updated
    Feb 14, 2025
    Description

    LLM Health Benchmarks Dataset

    The Health Benchmarks Dataset is a specialized resource for evaluating large language models (LLMs) in different medical specialties. It provides structured question-answer pairs designed to test the performance of AI models in understanding and generating domain-specific knowledge.

    Primary Purpose

    This dataset is built to:

    • Benchmark LLMs in medical specialties and subfields.
    • Assess the accuracy and contextual understanding of AI in healthcare.
    • Serve as a standardized evaluation suite for AI systems designed for medical applications.

    Key Features

    • Covers 50+ medical and health-related topics, including both clinical and non-clinical domains.
    • Includes ~7,500 structured question-answer pairs.
    • Designed for fine-grained performance evaluation in medical specialties.

    Applications

    • LLM Evaluation: Benchmarking AI models for domain-specific performance.
    • Healthcare AI Research: Standardized testing for AI in healthcare.
    • Medical Education AI: Testing AI systems designed for tutoring medical students.

    Dataset Structure

    The dataset is organized by medical specialties and subfields, each represented as a split. Below is a snapshot:

    Specialty | Number of Rows
    Lab Medicine | 158
    Ethics | 174
    Dermatology | 170
    Gastroenterology | 163
    Internal Medicine | 178
    Oncology | 180
    Orthopedics | 177
    General Surgery | 178
    Pediatrics | 180
    ...(and more)...

    Each split contains:

    • Questions: The medical questions for the specialty.
    • Answers: Corresponding high-quality answers.

    Usage Instructions

    Here’s how you can load and use the dataset:

    from datasets import load_dataset

    # Load the dataset
    dataset = load_dataset("yesilhealth/Health_Benchmarks")

    # Access specific specialty splits
    oncology = dataset["Oncology"]
    internal_medicine = dataset["Internal_Medicine"]

    # View sample data
    print(oncology[:5])
    

    Evaluation Workflow

    • Model Input: Provide the questions from each split to the LLM.
    • Model Output: Collect the AI-generated answers.
    • Scoring: Compare model answers to ground truth answers using metrics such as Exact Match (EM), F1 Score, and Semantic Similarity.
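
    As a rough illustration of the scoring step, the sketch below computes Exact Match and a token-level F1 score for one answer pair. The normalization used here (lowercasing and whitespace splitting) is an assumption, not the dataset authors' scoring code, and the example strings are invented. Semantic similarity is left out because it needs an embedding model.

```python
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase the text and split it on whitespace (an assumed normalization)."""
    return text.lower().split()


def exact_match(prediction: str, truth: str) -> bool:
    """True when both answers are identical after normalization."""
    return normalize(prediction) == normalize(truth)


def token_f1(prediction: str, truth: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize(prediction)
    truth_tokens = normalize(truth)
    if not pred_tokens or not truth_tokens:
        # Two empty answers count as a match; one empty answer scores zero.
        return float(pred_tokens == truth_tokens)
    # Multiset intersection counts each shared token at most as often as it
    # appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(truth_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)


# Invented example pair, for illustration only.
model_answer = "Metformin is first-line therapy"
reference = "metformin is the first-line therapy"

em = exact_match(model_answer, reference)
f1 = token_f1(model_answer, reference)
```

    Exact Match is strict and penalizes harmless rewording, which is why it is usually reported together with a softer metric such as F1.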

    Citation

    If you use this dataset for research or development, please cite:

    @dataset{yesilhealth_health_benchmarks,
      title={Health Benchmarks Dataset},
      author={Yesil Health AI},
      year={2024},
      url={https://huggingface.co/datasets/yesilhealth/Health_Benchmarks}
    }

    License

    This dataset is licensed under the Apache 2.0 License.

    Feedback

    For questions, suggestions, or feedback, feel free to contact us via email at hello@yesilhealth.com.

  6. MedMNIST: Standardized Biomedical Images

    • kaggle.com
    Updated Feb 2, 2024
    + more versions
    Cite
    Möbius (2024). MedMNIST: Standardized Biomedical Images [Dataset]. https://www.kaggle.com/datasets/arashnic/standardized-biomedical-images-medmnist
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 2, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Möbius
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    MedMNIST v2 - A large-scale lightweight benchmark for 2D and 3D biomedical image classification: https://www.nature.com/articles/s41597-022-01721-8

    A large-scale MNIST-like collection of standardized biomedical images, including 12 datasets for 2D and 6 datasets for 3D. All images are pre-processed into 28x28 (2D) or 28x28x28 (3D) with the corresponding classification labels, so that no background knowledge is required for users. Covering primary data modalities in biomedical images, MedMNIST is designed to perform classification on lightweight 2D and 3D images with various data scales (from 100 to 100,000) and diverse tasks (binary/multi-class, ordinal regression and multi-label). The resulting dataset, consisting of approximately 708K 2D images and 10K 3D images in total, could support numerous research and educational purposes in biomedical image analysis, computer vision and machine learning. The dataset providers benchmark several baseline methods on MedMNIST, including 2D / 3D neural networks and open-source / commercial AutoML tools.

    MedMNIST Landscape: [figure: medmnistlandscape]

    About the MedMNIST Landscape figure: The horizontal axis denotes the base-10 logarithm of the dataset scale, and the vertical axis denotes the base-10 logarithm of the imaging resolution. Upward and downward triangles distinguish 2D datasets from 3D datasets, and the four colors represent different tasks.
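
    To make the axes of the landscape figure concrete, the sketch below maps a dataset to its plotted position by taking base-10 logarithms of its scale and resolution. The function name and the sample values are hypothetical and are not taken from the MedMNIST paper.

```python
import math


def landscape_point(n_samples: int, resolution: int) -> tuple[float, float]:
    """Return (log10 of dataset scale, log10 of imaging resolution)."""
    return (math.log10(n_samples), math.log10(resolution))


# Hypothetical datasets at the two ends of the stated scale range
# (100 to 100,000 samples), both assumed to have a resolution of 28.
small = landscape_point(100, 28)
large = landscape_point(100_000, 28)
```

    On this logarithmic scale the two hypothetical datasets sit three units apart horizontally, although one is a thousand times larger than the other.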

    Key Features


    Diverse: It covers diverse data modalities, dataset scales (from 100 to 100,000), and tasks (binary/multi-class, multi-label, and ordinal regression). It is as diverse as the VDD and MSD to fairly evaluate the generalizable performance of machine learning algorithms in different settings, but both 2D and 3D biomedical images are provided.

    Standardized: Each sub-dataset is pre-processed into the same format, which requires no background knowledge for users. As an MNIST-like dataset collection to perform classification tasks on small images, it primarily focuses on the machine learning part rather than the end-to-end system. Furthermore, we provide standard train-validation-test splits for all datasets in MedMNIST, therefore algorithms could be easily compared.

    User-Friendly: The small size of 28×28 (2D) or 28×28×28 (3D) is lightweight and ideal for evaluating machine learning algorithms. We also offer a larger-size version, MedMNIST+: 64x64 (2D), 128x128 (2D), 224x224 (2D), and 64x64x64 (3D). Serving as a complement to the 28-size MedMNIST, this could be a standardized resource for developing medical foundation models. All these datasets are accessible via the same API.

    Educational: As an interdisciplinary research area, biomedical image analysis is difficult for researchers from other communities to get hands-on experience with, as it requires background knowledge from computer vision, machine learning, biomedical imaging, and clinical science. Our data with the Creative Commons (CC) License is easy to use for educational purposes.

    Refer to the paper to learn more about data : https://www.nature.com/articles/s41597-022-01721-8

    Starter Code: download more data and training

    Github Page: https://github.com/MedMNIST/MedMNIST

    My Kaggle Starter Notebook: https://www.kaggle.com/code/arashnic/medmnist-download-and-use-data?scriptVersionId=161421937

    Acknowledgements

    Jiancheng Yang, Rui Shi, Donglai Wei, Zequan Liu, Lin Zhao, Bilian Ke, Hanspeter Pfister, Bingbing Ni
    Shanghai Jiao Tong University, Shanghai, China; Boston College, Chestnut Hill, MA; RWTH Aachen University, Aachen, Germany; Fudan Institute of Metabolic Diseases, Zhongshan Hospital, Fudan University, Shanghai, China; Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China; Harvard University, Cambridge, MA

    License and Citation

    The code is under Apache-2.0 License.

    The MedMNIST dataset is licensed under Creative Commons Attribution 4.0 International (CC BY 4.0)...

  7. Healthcare Patient Monitoring Dataset

    • paperswithcode.com
    Updated Mar 7, 2025
    + more versions
    Cite
    (2025). Healthcare Patient Monitoring Dataset [Dataset]. https://paperswithcode.com/dataset/healthcare-patient-monitoring
    Explore at:
    Dataset updated
    Mar 7, 2025
    Description

    Problem Statement


    Hospitals and healthcare providers faced challenges in ensuring continuous monitoring of patient vitals, especially for high-risk patients. Traditional monitoring methods often lacked real-time data processing and timely alerts, leading to delayed responses and increased hospital readmissions. The healthcare provider needed a solution to monitor patient health continuously and deliver actionable insights for improved care.

    Challenge

    Implementing an advanced patient monitoring system involved overcoming several challenges:

    Collecting and analyzing real-time data from multiple IoT-enabled medical devices.

    Ensuring accurate health insights while minimizing false alarms.

    Integrating the system seamlessly with hospital workflows and electronic health records (EHR).

    Solution Provided

    A comprehensive patient monitoring system was developed using IoT-enabled medical devices and AI-based monitoring systems. The solution was designed to:

    Continuously collect patient vital data such as heart rate, blood pressure, oxygen levels, and temperature.

    Analyze data in real-time to detect anomalies and provide early warnings for potential health issues.

    Send alerts to healthcare professionals and caregivers for timely interventions.
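
    The anomaly detection described above could, under simple assumptions, look like the following sketch. It flags a reading when it deviates strongly from a rolling window of recent readings (a z-score rule). This is an illustrative stand-in, not the system's actual model; the class name, window size, threshold, and heart-rate values are all hypothetical.

```python
from collections import deque
import statistics


class VitalMonitor:
    """Flags readings that deviate strongly from a rolling baseline."""

    def __init__(self, window: int = 30, threshold: float = 3.0, min_samples: int = 5):
        self.readings = deque(maxlen=window)  # most recent readings only
        self.threshold = threshold            # z-score above which we alert
        self.min_samples = min_samples        # readings needed before checking

    def update(self, value: float) -> bool:
        """Record a reading and return True if it looks anomalous."""
        is_anomaly = False
        if len(self.readings) >= self.min_samples:
            mean = statistics.fmean(self.readings)
            std = statistics.pstdev(self.readings)
            # A flat baseline (std == 0) gives no scale, so no alert is raised.
            if std > 0 and abs(value - mean) / std > self.threshold:
                is_anomaly = True
        # Every reading, anomalous or not, joins the rolling baseline.
        self.readings.append(value)
        return is_anomaly


# Hypothetical heart-rate stream in beats per minute.
heart_rates = [72, 74, 73, 75, 71, 74, 73, 140, 72]
monitor = VitalMonitor()
alerts = [monitor.update(bpm) for bpm in heart_rates]
flagged = [bpm for bpm, alert in zip(heart_rates, alerts) if alert]
```

    A fixed threshold trades sensitivity against false alarms, which is the tension the challenge list mentions; a deployed system would likely tune it per vital sign and per patient.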

    Development Steps

    Data Collection

    Deployed IoT-enabled devices such as wearable monitors, smart sensors, and bedside equipment to collect patient data continuously.

    Preprocessing

    Cleaned and standardized data streams to ensure accurate analysis and integration with hospital systems.

    AI Model Development

    Built machine learning models to analyze vital trends and detect abnormalities in real-time.

    Validation

    Tested the system in controlled environments to ensure accuracy and reliability in detecting health issues.

    Deployment

    Implemented the solution in hospitals and care facilities, integrating it with EHR systems and alert mechanisms for seamless operation.

    Continuous Monitoring & Improvement

    Established a feedback loop to refine models and algorithms based on real-world data and healthcare provider feedback.

    Results

    Enhanced Patient Care

    Real-time monitoring and proactive alerts enabled healthcare professionals to provide timely interventions, improving patient outcomes.

    Early Detection of Health Issues

    The system detected potential health complications early, reducing the severity of conditions and preventing critical events.

    Reduced Hospital Readmissions

    Continuous monitoring helped manage patient health effectively, leading to a significant decrease in readmission rates.

    Improved Operational Efficiency

    Automation and real-time insights reduced the burden on healthcare staff, allowing them to focus on critical cases.

    Scalable Solution

    The system adapted seamlessly to various healthcare settings, including hospitals, clinics, and home care environments.

  8. EHRSHOT

    • redivis.com
    • stanford.redivis.com
    application/jsonl +7
    Updated Feb 13, 2025
    Cite
    Shah Lab (2025). EHRSHOT [Dataset]. http://doi.org/10.57761/0gv9-nd83
    Explore at:
    Available download formats: csv, application/jsonl, sas, parquet, stata, spss, arrow, avro
    Dataset updated
    Feb 13, 2025
    Dataset provided by
    Redivis Inc.
    Authors
    Shah Lab
    Description

    Abstract

    👂💉 EHRSHOT is a dataset for benchmarking the few-shot performance of foundation models for clinical prediction tasks. EHRSHOT contains de-identified structured data (e.g., diagnosis and procedure codes, medications, lab values) from the electronic health records (EHRs) of 6,739 Stanford Medicine patients and includes 15 prediction tasks. Unlike MIMIC-III/IV and other popular EHR datasets, EHRSHOT is longitudinal and includes data beyond ICU and emergency department patients.

    ⚡️ Quickstart

    1. To reproduce the original EHRSHOT paper, download the EHRSHOT_ASSETS.zip file from the "Files" tab.
    2. To work with OMOP CDM formatted data, download all the tables in the "Tables" tab.

    ⚙️ Please see the "Methodology" section below for details on the dataset and downloadable files.

    Methodology

    1. 📖 Overview

    EHRSHOT is a benchmark for evaluating models on few-shot learning for patient classification tasks. The dataset contains:

    • 6,739 patients
    • 41.6 million clinical events
    • 921,499 visits
    • 15 prediction tasks
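
    The headline counts above imply some useful per-patient averages. The short calculation below derives them; the event count is the rounded "41.6 million" figure from the listing, so the results are approximate.

```python
# Counts as stated in the overview; the event count is rounded in the source.
patients = 6_739
clinical_events = 41_600_000
visits = 921_499

events_per_patient = clinical_events / patients
visits_per_patient = visits / patients
events_per_visit = clinical_events / visits

print(f"events per patient: {events_per_patient:,.0f}")
print(f"visits per patient: {visits_per_patient:.1f}")
print(f"events per visit:   {events_per_visit:.1f}")
```

    This works out to roughly 6,200 events and about 137 visits per patient, which is consistent with the dataset being longitudinal rather than limited to single ICU or emergency department stays.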


    2. 💽 Dataset

    EHRSHOT is sourced from Stanford’s STARR-OMOP database.

    • Data follows the OMOP CDM and is fully de-identified.
    • Unlike most other EHR research datasets, EHRSHOT is not restricted to ED/ICU visits and instead includes longitudinal patient data for all hospital encounter types.
    • EHRSHOT does not contain clinical notes or images.


    We provide two versions of the dataset:

    • EHRSHOT-Original is the same exact dataset used in the original EHRSHOT paper.
    • EHRSHOT-OMOP is a more complete version of the EHRSHOT dataset which includes all OMOP CDM tables and additional OMOP metadata.


    To access the raw data, please see the "Tables" and "Files" tabs above.

    3. 💽 Data Files and Formats

    We provide EHRSHOT in two file formats:

    • OMOP CDM v5.4
    • Medical Event Data Standard (MEDS)


    Within the "Tables" tab...

    1. EHRSHOT-OMOP

    * Dataset Version: EHRSHOT-OMOP

    * Notes: Contains all OMOP CDM tables for the EHRSHOT patients. Note that this dataset is slightly different than the original EHRSHOT dataset, as these tables contain the full OMOP schema rather than a filtered subset.

    Within the "Files" tab...

    1. EHRSHOT_ASSETS.zip

    * Dataset Version: EHRSHOT-Original

    * Data Format: FEMR 0.1.16

    * Notes: The original EHRSHOT dataset as detailed in the paper. Also includes model weights.

    2. EHRSHOT_MEDS.zip

    * Dataset Version: EHRSHOT-Original

    * Data Format: MEDS 0.3.3

    * Notes: The original EHRSHOT dataset as detailed in the paper. It does not include any models.

    3. EHRSHOT_OMOP_MEDS.zip

    * Dataset Version: EHRSHOT-OMOP

    * Data Format: MEDS 0.3.3 + MEDS-ETL 0.3.8

    * Notes: Converts the dataset from EHRSHOT-OMOP into MEDS format via the `meds_etl_omop` command from MEDS-ETL.

    4. EHRSHOT_OMOP_MEDS_Reader.zip

    * Dataset Version: EHRSHOT-OMOP

    * Data Format: MEDS Reader 0.1.9 + MEDS 0.3.3 + MEDS-ETL 0.3.8

    * Notes: Same data as EHRSHOT_OMOP_MEDS.zip, but converted into a MEDS-Reader database for faster reads.

    4. 🤖 Model

    We also release the full weights of CLMBR-T-base, a 141M parameter clinical foundation model pretrained on the structured EHR data of 2.57M patients. Please download from https://huggingface.co/StanfordShahLab/clmbr-t-base

    5. 🧑‍💻 Code

    Please see our Github repo to obtain code for loading the dataset and running a set of pretrained baseline models: https://github.com/som-shahlab/ehrshot-benchmark/

    Usage

    NOTE: You must authenticate to Redivis using your formal affiliation's email address. If you use gmail or other personal email addresses, you will not be granted access.

    Access to the EHRSHOT dataset requires the following:

    • Verified Affiliation with an Academic, Government, o
  9. Dataset for "Public health insurance coverage in India before and after...

    • figshare.com
    bin
    Updated Aug 10, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Sanjay K Mohanty; Ashish Kumar Upadhyay; Suraj Maiti; Radhe Shyam Mishra; Fabrice Kämpfen; Jürgen Maurer; Owen O'Donell (2023). Dataset for "Public health insurance coverage in India before and after PM-JAY: repeated cross-sectional analysis of nationally representative survey data" [Dataset]. http://doi.org/10.6084/m9.figshare.23919078.v1
    Explore at:
    Available download formats: bin
    Dataset updated
    Aug 10, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Sanjay K Mohanty; Ashish Kumar Upadhyay; Suraj Maiti; Radhe Shyam Mishra; Fabrice Kämpfen; Jürgen Maurer; Owen O'Donell
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    India
    Description

    Public health insurance coverage in India before and after PM-JAY: repeated cross-sectional analysis of nationally representative survey data

    The National Family Health Survey (NFHS), India data is a publicly available dataset that can be accessed on request. It can be downloaded after registration from the Demographic and Health Survey (DHS) website at The DHS Program - Request Access To Datasets. We used data from the fourth and fifth rounds of the NFHS, which can be accessed after registration at https://dhsprogram.com/data/dataset/India_Standard-DHS_2015.cfm?flag=0 (NFHS 4) and https://dhsprogram.com/data/dataset/India_Standard-DHS_2020.cfm?flag=0 (NFHS 5). These datasets (HR files) were used to build this combined dataset for the paper entitled "Public health insurance coverage in India before and after PM-JAY: repeated cross-sectional analysis of nationally representative survey data", submitted to BMJ Global Health in August 2023.

  10. Minimum Hospital Data Set

    • healthinformationportal.eu
    html
    Updated Mar 4, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Federal Public Service (FPS) Health, Food Chain Safety, and Environment (2022). Minimum Hospital Data Set [Dataset]. https://www.healthinformationportal.eu/health-information-sources/minimum-hospital-data-set
    Explore at:
    Available download formats: html
    Dataset updated
    Mar 4, 2022
    Dataset authored and provided by
    Federal Public Service (FPS) Health, Food Chain Safety, and Environment
    License

    https://fair.healthdata.be/dataset/12d69eca-4449-47d2-943d-e4448a467292

    Variables measured
    sex, title, topics, acronym, country, language, data_owners, description, contact_name, geo_coverage, and 14 more
    Measurement technique
    Hospital resources & Healthcare administrative area resources
    Description

    The MZG is a registration with which all non-psychiatric hospitals in Belgium must make their (anonymised) administrative, medical and nursing data available to the Federal Public Service (FPS) Public Health. The aim of the MZG is to support the government's health policy by

    • Determining the needs for hospital facilities;
    • Describing the qualitative and quantitative accreditation standards of hospitals and their services;
    • Organising the financing of hospitals;
    • Determining policy for the practice of medicine;
    • Outlining epidemiological policy.

    The MZG also aims to support the health policy of hospitals by providing national and individual feedback so that a hospital can compare itself with other hospitals and adapt its internal policy.

    All reports can be found here (in French/Dutch).

  11. Addressing the Challenges of Health Data Standard Adoption and Usage: A...

    • zenodo.org
    bin
    Updated May 12, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Alberto Marfoglia; Valerio Antonio Arcobelli; SERENA MOSCATO; Antonino Amedeo La Mattina; Sabato Mellone; ANTONELLA CARBONARO (2025). Addressing the Challenges of Health Data Standard Adoption and Usage: A Systematic Review - Data Extraction [Dataset]. http://doi.org/10.5281/zenodo.15358180
    Explore at:
    Available download formats: bin
    Dataset updated
    May 12, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Alberto Marfoglia; Valerio Antonio Arcobelli; SERENA MOSCATO; Antonino Amedeo La Mattina; Sabato Mellone; ANTONELLA CARBONARO
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    May 7, 2025
    Description

    This table presents the data extraction from the 99 studies included according to the criteria outlined in the main manuscript. It is provided as supplementary material to enhance the readability of the paper while ensuring that all relevant information is preserved and accessible without loss of detail.

    The names of the variables and their descriptions are provided in the attached file, along with the following details:

    Variable: Description
    Ref.: The citation in the format: First author et al. [Year] (e.g., AuthorA et al. [2022]). This identifies the study's primary citation for easy reference.
    Title: The title of the paper.
    Standard: The healthcare data standard used in the study. Possible values are: OMOP, OpenEHR, FHIR.
    Study Location: The country where the study was conducted.
    Objective for using the standard
        Detailed: A comprehensive explanation of the specific objective of using the standard in the study, describing how it supports the study’s goals.
        Short: The primary purpose for applying the healthcare standard. Possible values are: Secondary data reuse, Data exchange, Clinical decision support, Vocabulary definition, EHR system design,
    Application domain
        Type: The type of application domain in which the healthcare standard is used. Possible values are: Clinical (studies with a direct impact on clinical practice, applying established tools or methods in healthcare settings, e.g., predicting in-hospital mortality for heart attack patients) and Research (studies proposing innovative tools, methodologies, or frameworks still in the design/testing phase, not yet clinically implemented).
        Healthcare Area: The relevant healthcare domain for the study, such as Cardiovascular, Intensive Care Unit, Emergency Department, Oncology, Biology, etc.
        Cluster: The healthcare domain grouped into clusters for easier readability. Possible values include: Clinical Medicine, Clinical Services and Diagnostics, Public Health, Health Information Management and Biomedical Sciences.
    Use: Reports whether the results of the paper serve a Primary use (direct care) or a Secondary use (repurposing existing data or tools for new objectives).
    Scale: The scale of the study. Possible values are: Single center (one hospital/clinic), Multi-center (multiple institutions), Regional (specific region), National level (countrywide).
    Dataset magnitude in patients: The magnitude of the dataset expressed as a letter. Possible values are: A (<10 to 99), B (100 to 9,999), C (10,000 to 999,999) and D (1,000,000 and above).
    N° Elements: The number of input variables in the standardization process.
    Percentuage of mapped variables: The percentage of variables that were successfully standardised.
    Coverage of the standard: The standardisation methodology, i.e., whether the standard was adapted or not.
    ETL Tools
        Data cleaning & extraction: The tools adopted for supporting data cleaning and extraction.
        Mapping: The tools adopted for the mapping of the variables.
        Validation: The tools adopted for the validation of the standardization process.
    Database: The database adopted for storing the result of the healthcare data standardization.
    Process efficiency and Economic assessment: Information about the economic impact, if the consequences are concrete and were measured by the authors (e.g., actual cost savings, resource usage reductions). If the authors did not measure the economic impact, this field remains blank.
    Comments by authors
        Limitations: The significant limitations or challenges faced during the study regarding the adopted standard, such as issues with data compatibility, scalability, or the need for customization.
        Advantages: The benefits of applying the standard model, such as improved data consistency, enhanced clinical outcomes, better interoperability, or more efficient workflows.
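
The letter categories for "Dataset magnitude in patients" can be expressed as a small lookup. The following is a minimal sketch, assuming the ranges listed above are applied to an integer patient count; the function name `dataset_magnitude` is hypothetical and does not come from the dataset or its attached file.

```python
def dataset_magnitude(n_patients: int) -> str:
    """Map a patient count to the magnitude letter described in the variable list.

    Assumed ranges (taken from the description above):
    A: up to 99, B: 100 to 9,999, C: 10,000 to 999,999, D: 1,000,000 and above.
    """
    if n_patients < 0:
        raise ValueError("patient count cannot be negative")
    if n_patients < 100:
        return "A"
    if n_patients < 10_000:
        return "B"
    if n_patients < 1_000_000:
        return "C"
    return "D"


if __name__ == "__main__":
    for count in (42, 275, 50_000, 2_000_000):
        print(count, dataset_magnitude(count))
```

Using open-ended comparisons in ascending order keeps the boundaries in one place, so a study with exactly 100 or exactly 10,000 patients falls into the higher category, as the listed ranges imply.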
  12. PolyMed: A Medical Dataset Addressing Disease Imbalance for Robust Automatic...

    • data.niaid.nih.gov
    • zenodo.org
    Updated May 3, 2023
    Cite
    Dong-ho Lee (2023). PolyMed: A Medical Dataset Addressing Disease Imbalance for Robust Automatic Diagnosis Systems [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7866102
    Explore at:
    Dataset updated
    May 3, 2023
    Dataset provided by
    Chan-Yang Ju
    Dong-ho Lee
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    We introduce the PolyMed dataset, designed to address the limitations of existing medical case data for Automatic Diagnosis Systems (ADS). ADS assists doctors by predicting diseases based on patients' basic information, such as age, gender, and symptoms. However, these systems face challenges due to imbalanced disease label data and difficulties in accessing or collecting medical data. To tackle these issues, the PolyMed dataset has been developed to improve the evaluation of ADS by incorporating medical knowledge graph data and diagnosis case data. The dataset aims to provide comprehensive evaluation, include diverse disease information, effectively utilize external knowledge, and perform tasks closer to real-world scenarios.

    We have also made the data collection tools publicly available to enable researchers and other interested parties to contribute additional data in a standardized format. These tools feature a range of customizable input fields that can be selectively utilized according to the user's specific requirements, ensuring consistency and professionalism in the data collection process.

    All training and test code for our data is available at https://github.com/krchanyang/PolyMed

  13. Dataset of book subjects that contain The political economy of universal...

    • workwithdata.com
    Updated Nov 7, 2024
    Cite
    Work With Data (2024). Dataset of book subjects that contain The political economy of universal healthcare in Africa : evidence from Ghana [Dataset]. https://www.workwithdata.com/datasets/book-subjects?f=1&fcol0=j0-book&fop0=%3D&fval0=The+political+economy+of+universal+healthcare+in+Africa+%3A+evidence+from+Ghana&j=1&j0=books
    Explore at:
    Dataset updated
    Nov 7, 2024
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Ghana
    Description

    This dataset is about book subjects. It has 1 row and is filtered where the book is The political economy of universal healthcare in Africa : evidence from Ghana. It features 10 columns including number of authors, number of books, earliest publication date, and latest publication date.

  14. Global Health Expenditure Database

    • datacatalog.hshsl.umaryland.edu
    Updated Mar 27, 2024
    Cite
    World Health Organization (2024). Global Health Expenditure Database [Dataset]. https://datacatalog.hshsl.umaryland.edu/dataset/77
    Explore at:
    Dataset updated
    Mar 27, 2024
    Dataset authored and provided by
    World Health Organization (https://who.int/)
    Time period covered
    Jan 1, 2000 - Present
    Description

    The Global Health Expenditure Database (GHED) provides internationally comparable data on health spending for close to 190 countries. The database is open access and supports the goal of Universal Health Coverage by helping monitor the availability of resources for health and the extent to which they are used efficiently and equitably. This, in turn, helps ensure health services are available and affordable when people need them...WHO works collaboratively with Member States and updates the database annually using available data such as government budgets and health accounts studies. Where necessary, modifications and estimates are made to ensure the comprehensiveness and consistency of the data across countries and years. GHED is the source of the health expenditure data republished by the World Bank and the WHO Global Health Observatory. (from website)

  15. Data from: MedMNIST Classification Decathlon: A Lightweight AutoML Benchmark...

    • data.niaid.nih.gov
    • explore.openaire.eu
    Updated Apr 19, 2023
    Cite
    Jiancheng Yang (2023). MedMNIST Classification Decathlon: A Lightweight AutoML Benchmark for Medical Image Analysis [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4269851
    Explore at:
    Dataset updated
    Apr 19, 2023
    Dataset provided by
    Bingbing Ni
    Rui Shi
    Jiancheng Yang
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This data repository for MedMNIST v1 is out of date! Please check the latest version of MedMNIST v2.

    Abstract

    We present MedMNIST, a collection of 10 pre-processed medical open datasets. MedMNIST is standardized to perform classification tasks on lightweight 28x28 images, which requires no background knowledge. Covering the primary data modalities in medical image analysis, it is diverse in data scale (from 100 to 100,000) and tasks (binary/multi-class, ordinal regression and multi-label). MedMNIST could be used for educational purposes, rapid prototyping, multi-modal machine learning or AutoML in medical image analysis. Moreover, the MedMNIST Classification Decathlon is designed to benchmark AutoML algorithms on all 10 datasets; we have compared several baseline methods, including open-source and commercial AutoML tools. The datasets, evaluation code and baseline methods for MedMNIST are publicly available at https://medmnist.github.io/.

    Please note that this dataset is NOT intended for clinical use.

    We recommend our official code to download, parse and use the MedMNIST dataset:

    pip install medmnist

    Citation and Licenses

    If you find this project useful, please cite our ISBI'21 paper as: Jiancheng Yang, Rui Shi, Bingbing Ni. "MedMNIST Classification Decathlon: A Lightweight AutoML Benchmark for Medical Image Analysis," arXiv preprint arXiv:2010.14925, 2020.

    or using BibTeX:

    @article{medmnist,
      title={MedMNIST Classification Decathlon: A Lightweight AutoML Benchmark for Medical Image Analysis},
      author={Yang, Jiancheng and Shi, Rui and Ni, Bingbing},
      journal={arXiv preprint arXiv:2010.14925},
      year={2020}
    }

    Besides, please cite the corresponding paper if you use any subset of MedMNIST. Each subset uses the same license as that of the source dataset.

    PathMNIST

    Jakob Nikolas Kather, Johannes Krisam, et al., "Predicting survival from colorectal cancer histology slides using deep learning: A retrospective multicenter study," PLOS Medicine, vol. 16, no. 1, pp. 1–22, 01 2019.

    License: CC BY 4.0

    ChestMNIST

    Xiaosong Wang, Yifan Peng, et al., "Chestx-ray8: Hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases," in CVPR, 2017, pp. 3462–3471.

    License: CC0 1.0

    DermaMNIST

    Philipp Tschandl, Cliff Rosendahl, and Harald Kittler, "The ham10000 dataset, a large collection of multisource dermatoscopic images of common pigmented skin lesions," Scientific data, vol. 5, pp. 180161, 2018.

    Noel Codella, Veronica Rotemberg, Philipp Tschandl, M. Emre Celebi, Stephen Dusza, David Gutman, Brian Helba, Aadi Kalloo, Konstantinos Liopyris, Michael Marchetti, Harald Kittler, and Allan Halpern: “Skin Lesion Analysis Toward Melanoma Detection 2018: A Challenge Hosted by the International Skin Imaging Collaboration (ISIC)”, 2018; arXiv:1902.03368.

    License: CC BY-NC 4.0

    OCTMNIST/PneumoniaMNIST

    Daniel S. Kermany, Michael Goldbaum, et al., "Identifying medical diagnoses and treatable diseases by image-based deep learning," Cell, vol. 172, no. 5, pp. 1122 – 1131.e9, 2018.

    License: CC BY 4.0

    RetinaMNIST

    DeepDR Diabetic Retinopathy Image Dataset (DeepDRiD), "The 2nd diabetic retinopathy – grading and image quality estimation challenge," https://isbi.deepdr.org/data.html, 2020.

    License: CC BY 4.0

    BreastMNIST

    Walid Al-Dhabyani, Mohammed Gomaa, Hussien Khaled, and Aly Fahmy, "Dataset of breast ultrasound images," Data in Brief, vol. 28, pp. 104863, 2020.

    License: CC BY 4.0

    OrganMNIST_{Axial,Coronal,Sagittal}

    Patrick Bilic, Patrick Ferdinand Christ, et al., "The liver tumor segmentation benchmark (lits)," arXiv preprint arXiv:1901.04056, 2019.

    Xuanang Xu, Fugen Zhou, et al., "Efficient multiple organ localization in ct image using 3d region proposal network," IEEE Transactions on Medical Imaging, vol. 38, no. 8, pp. 1885–1898, 2019.

    License: CC BY 4.0

  16. MedAlign

    • redivis.com
    • stanford.redivis.com
    application/jsonl +7
    Updated Mar 30, 2025
    Cite
    Shah Lab (2025). MedAlign [Dataset]. http://doi.org/10.57761/5b7c-pm72
    Explore at:
    Available download formats: avro, arrow, sas, parquet, csv, stata, application/jsonl, spss
    Dataset updated
    Mar 30, 2025
    Dataset provided by
    Redivis Inc.
    Authors
    Shah Lab
    Description

    Abstract

    MedAlign is a benchmark dataset of 983 clinician-curated natural language instructions for EHR data, grounded by 275 longitudinal EHRs. It includes reference responses for 303 instructions and supports evaluation of LLMs on healthcare-specific tasks.

    Methodology

    IMPORTANT USAGE NOTE: MedAlign only includes test set examples. No training examples are provided for fine-tuning models.

    1. Overview

    MedAlign is a longitudinal EHR benchmark for instruction-following with LLMs. The dataset includes:

    • 275 patients
    • 46,252 clinical notes
    • 128 clinical note types
    • 3.6 million clinical events

    2. EHR Data

    EHR data is sourced from Stanford’s STARR-OMOP database. Data are standardized in the OMOP CDM schema and scrubbed of identifying PHI. Complete technical details are included in the paper, but key highlights:

    • Dates are jittered within patient to conceal real dates (but preserve deltas between dates)
    • Data for patients >= 90 years old are removed
    • Unstructured text fields not mappable to OMOP standard concepts are redacted
    • All clinical note text has been scrubbed of PHI variables using hiding-in-plain-sight (HIPS) (Carrell et al. 2013)
    • HIV test results are redacted
    • Provider names and NPIs are redacted
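
The idea of jittering dates within a patient while preserving deltas can be illustrated with a short sketch. It assumes one random offset is drawn per patient and applied to all of that patient's dates; the function name `jitter_patient_dates`, the `max_shift_days` parameter, and the input layout are hypothetical, and the procedure actually used for MedAlign is described in the paper, not here.

```python
import random
from datetime import date, timedelta


def jitter_patient_dates(events, max_shift_days=30, seed=None):
    """Shift each patient's dates by a single random per-patient offset.

    `events` is a list of (patient_id, date) tuples. Because every date of a
    given patient moves by the same number of days, the intervals between
    that patient's events are unchanged while the real dates are concealed.
    """
    rng = random.Random(seed)
    offsets = {}  # patient_id -> timedelta, drawn once per patient
    shifted = []
    for patient_id, day in events:
        if patient_id not in offsets:
            days = rng.randint(-max_shift_days, max_shift_days)
            offsets[patient_id] = timedelta(days=days)
        shifted.append((patient_id, day + offsets[patient_id]))
    return shifted


if __name__ == "__main__":
    sample = [
        ("p1", date(2020, 1, 1)),
        ("p1", date(2020, 1, 11)),
        ("p2", date(2021, 5, 5)),
    ]
    for patient_id, day in jitter_patient_dates(sample, seed=7):
        print(patient_id, day.isoformat())
```

Drawing the offset per patient instead of per event is what keeps the deltas intact; a per-event offset would conceal dates equally well but would distort the clinical timeline.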

    3. Instruction Following Benchmark

    See "medalign_instructions_responses_v1_2.zip" for instructions, responses, and EHR text timelines.

    Please see our Github repo to obtain code for loading the dataset.

    Usage

    Access to the MedAlign dataset requires the following:

    • Verified Affiliation (Academic, Government, Industry Research Lab). Please use your verified email address when applying; do not use Gmail or personal emails. Applications using personal, unverified email addresses will be rejected.
    • Encryption Verification / Attestation for Data Storage
    • Signing the terms of the MedAlign Data Set License 1.0
    • Providing a short description of your intended research use of MedAlign
    • CITI Training

    These data must remain on your encrypted machine. Redistribution of data is FORBIDDEN and will result in immediate termination of access privileges.

    IMPORTANT NOTES:

    • Our policy on derived works aligns with PhysioNet's guidelines, requiring that these artifacts be hosted on Redivis. If you create derived research artifacts based on MedAlign (such as additional annotations or synthetic data), please contact us to discuss hosting arrangements.
    • Sending MedAlign data over a non-HIPAA-compliant API is a violation of the DUA.

    Please allow 7-10 business days to process applications.

  17. Health Plan Prior Authorization Data

    • catalog.data.gov
    • data.wa.gov
    • +1 more
    Updated Dec 20, 2024
    Cite
    data.wa.gov (2024). Health Plan Prior Authorization Data [Dataset]. https://catalog.data.gov/dataset/health-plan-prior-authorization-data
    Explore at:
    Dataset updated
    Dec 20, 2024
    Dataset provided by
    data.wa.gov
    Description

    In 2020, the Washington State Legislature enacted Engrossed Substitute Senate Bill (ESSB) 6404 (Chapter 316, Laws of 2020, codified at RCW 48.43.0161), which requires that health carriers with at least one percent of the market share in Washington State annually report certain aggregated and de-identified data related to prior authorization to the Office of the Insurance Commissioner (OIC). Prior authorization is a utilization review tool used by carriers to review the medical necessity of requested health care services for specific health plan enrollees. Carriers choose the services that are subject to prior authorization review.

    The reported data includes prior authorization information for the following categories of health services:

    • Inpatient medical/surgical
    • Outpatient medical/surgical
    • Inpatient mental health and substance use disorder
    • Outpatient mental health and substance use disorder
    • Diabetes supplies and equipment
    • Durable medical equipment

    The carriers must report the following information for the prior plan year (PY) for their individual and group health plans for each category of services:

    • The 10 codes with the highest number of prior authorization requests and the percent of approved requests.
    • The 10 codes with the highest percentage of approved prior authorization requests and the total number of requests.
    • The 10 codes with the highest percentage of prior authorization requests that were initially denied and then approved on appeal and the total number of such requests.

    Carriers also must include the average response time in hours for prior authorization requests and the number of requests for each covered service in the lists above for:

    • Expedited decisions.
    • Standard decisions.
    • Extenuating-circumstances decisions.

    Engrossed Second Substitute House Bill 1357 added additional prescription drug prior authorization reporting requirements for health carriers beginning in reporting year 2024. Carriers were provided the opportunity to submit voluntary prescription drug prior authorization data for the 2023 reporting period. Prescription drug reporting was required for the 2024 reporting period.
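
The first reporting requirement (the most-requested codes together with their approval percentage) is a simple aggregation. The sketch below is only an illustration of that calculation: it assumes request-level records of the form (code, approved), whereas the published dataset is already aggregated and de-identified. The function name `top_codes_by_requests` and the sample codes are hypothetical.

```python
from collections import defaultdict


def top_codes_by_requests(records, n=10):
    """Return the n most-requested codes with their approval percentage.

    `records` is an iterable of (code, approved) pairs, where `approved` is a
    bool. The result is a list of (code, total_requests, percent_approved),
    sorted by request count (ties broken alphabetically by code).
    """
    totals = defaultdict(int)
    approved_counts = defaultdict(int)
    for code, approved in records:
        totals[code] += 1
        if approved:
            approved_counts[code] += 1
    ranked = sorted(totals, key=lambda code: (-totals[code], code))
    return [
        (code, totals[code], 100.0 * approved_counts[code] / totals[code])
        for code in ranked[:n]
    ]


if __name__ == "__main__":
    sample = [("X1", True), ("X1", False), ("X2", True), ("X1", True)]
    for code, total, pct in top_codes_by_requests(sample, n=2):
        print(f"{code}: {total} requests, {pct:.1f}% approved")
```

The other two lists in the requirement (highest approval percentage, highest denied-then-approved-on-appeal percentage) would follow the same pattern with a different sort key.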

  18. NPPES Healthcare Providers Database Data Package

    • johnsnowlabs.com
    csv
    Updated Jan 20, 2021
    Cite
    John Snow Labs (2021). NPPES Healthcare Providers Database Data Package [Dataset]. https://www.johnsnowlabs.com/marketplace/nppes-healthcare-providers-database-data-package/
    Explore at:
    Available download formats: csv
    Dataset updated
    Jan 20, 2021
    Dataset authored and provided by
    John Snow Labs
    Description

    The data package contains NPI-related datasets: the NPI numbers of all covered health care professionals, the deactivated NPIs, and the different codes used within the NPI dataset.

  19. clinical-field-mappings

    • huggingface.co
    Updated May 8, 2025
    Cite
    Tiago Silva (2025). clinical-field-mappings [Dataset]. https://huggingface.co/datasets/tsilva/clinical-field-mappings
    Explore at:
    Dataset updated
    May 8, 2025
    Authors
    Tiago Silva
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    🚑 Clinical Field Mappings for Healthcare Systems

    This synthetic dataset provides a wide variety of alternative names for clinical database fields, mapping them to standardized targets for healthcare data normalization.

    Using LLMs, we generated and validated thousands of plausible variations, including misspellings, abbreviations, country-specific nuances, and common real-world typos.

    This dataset is perfect for training models that need to standardize, clean, or map heterogeneous healthcare data schemas into unified, normalized formats.

    ✅ Applications include:

    • Data cleaning and ETL pipelines for clinical databases
    • Fine-tuning LLMs for schema matching
    • Clinical data interoperability projects
    • Zero-shot field matching research

    The dataset is machine-generated and validated with LLM feedback loops to ensure high-quality mappings.
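
To make the field-mapping task concrete, here is a minimal sketch of mapping an alternative field name to a standardized target. It is a plain string-similarity baseline using Python's standard `difflib` module, not the LLM-based approach the dataset was generated with; the variant table, the target names, and the `map_field` function are all hypothetical and are not taken from the dataset.

```python
import difflib

# Hypothetical variants and standardized targets, for illustration only.
KNOWN_VARIANTS = {
    "patient_id": "patient_id",
    "pat_id": "patient_id",
    "date_of_birth": "date_of_birth",
    "dob": "date_of_birth",
    "birthdate": "date_of_birth",
    "blood_pressure": "blood_pressure",
    "bp": "blood_pressure",
}


def normalize(name: str) -> str:
    """Lowercase a field name and unify separators to underscores."""
    return name.strip().lower().replace(" ", "_").replace("-", "_")


def map_field(name: str, cutoff: float = 0.8):
    """Return the standardized target for a field name, or None if unknown.

    An exact lookup on the normalized name is tried first; otherwise the
    closest known variant above `cutoff` similarity is used, which catches
    simple misspellings.
    """
    key = normalize(name)
    if key in KNOWN_VARIANTS:
        return KNOWN_VARIANTS[key]
    close = difflib.get_close_matches(key, list(KNOWN_VARIANTS), n=1, cutoff=cutoff)
    return KNOWN_VARIANTS[close[0]] if close else None


if __name__ == "__main__":
    for raw in ("DOB", "Date of Birth", "blod_pressure", "unrelated"):
        print(raw, "->", map_field(raw))
```

A baseline like this handles casing, separators, and small typos, but not abbreviations or country-specific names it has never seen, which is the gap a trained schema-matching model would be expected to cover.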

  20. Canadian Clinical Drug Data Set (CCDD)

    • open.canada.ca
    csv, pdf, txt
    Updated May 28, 2025
    + more versions
    Cite
    Health Canada (2025). Canadian Clinical Drug Data Set (CCDD) [Dataset]. https://open.canada.ca/data/dataset/3e0a7b9e-a5e9-4131-bde4-ac685a1f1a38
    Explore at:
    Available download formats: csv, txt, pdf
    Dataset updated
    May 28, 2025
    Dataset provided by
    Health Canada (http://www.hc-sc.gc.ca/)
    License

    Open Government Licence - Canada 2.0 (https://open.canada.ca/en/open-government-licence-canada)
    License information was derived automatically

    Area covered
    Canada
    Description

    The Canadian Clinical Drug Data Set (CCDD) is a drug terminology and coding system designed to allow the interchange of standardized drug and medical device information between diverse digital health systems. Some use cases include electronic prescribing, electronic medical records, medication reconciliation and analytics. It also provides for the classification and identification of defined groups of medications (called special groupings), such as narcotic and controlled drugs. It can be used by knowledge-base vendors, clinicians, researchers, statistical users, government agencies, healthcare organisations and consumers.

    The data source for the CCDD is the Drug Product Database (DPD), which contains information on drugs approved by Health Canada. However, the data is modeled differently, following the CCDD Editorial Guidelines, which take into consideration international terminology standards. For example, DPD uses the dosage form “tablet (delayed-release)”, whereas CCDD uses the equivalent term “gastro-resistant tablet.” The Canadian Clinical Drug Data Set does not replace the Health Canada Drug Product Database (DPD) but is published in addition to it.

    The scope of health products included in CCDD is limited to those classified as human in DPD (veterinary, radiopharmaceutical and disinfectant products are out of scope). Some exclusions apply within the human class but are subject to periodic review. For a full list of exclusions, please see the Scope section in the CCDD Editorial Guidelines. In addition, a limited number of medical devices that are commonly prescribed and dispensed at a community pharmacy are included.

    This data set was developed in collaboration with Canada Health Infoway and is also available in their Terminology Gateway at https://tgateway.infoway-inforoute.ca/ccdd.html (free login required).

Cite
(2025). Personalized Healthcare Treatment Plans Dataset [Dataset]. https://paperswithcode.com/dataset/personalized-healthcare-treatment-plans

Personalized Healthcare Treatment Plans Dataset

Explore at:
6 scholarly articles cite this dataset (View in Google Scholar)
Dataset updated
Mar 6, 2025
Description

Problem Statement

👉 Download the case studies here

Healthcare providers often rely on generalized treatment protocols that may not address the unique needs of individual patients. This approach leads to variability in treatment outcomes, reduced efficacy, and limited patient satisfaction. A leading hospital sought a solution to develop personalized treatment plans tailored to each patient’s medical history, genetic profile, and current health status.

Challenge

Implementing a personalized healthcare treatment system involved overcoming the following challenges:

Integrating diverse patient data, including medical history, lab results, genetic information, and lifestyle factors.

Developing predictive models capable of identifying optimal treatment plans for individual patients.

Ensuring compliance with privacy regulations and maintaining data security throughout the process.

Solution Provided

An advanced healthcare treatment recommendation system was developed using machine learning models and predictive analytics. The solution was designed to:

Analyze patient data to identify patterns and predict treatment outcomes.

Recommend individualized treatment plans optimized for efficacy and patient preferences.

Continuously learn and adapt to improve recommendations based on new medical insights and patient feedback.

Development Steps

Data Collection

Aggregated data from electronic health records (EHR), genetic testing reports, and patient-provided health information.

Preprocessing

Standardized and anonymized data to ensure accuracy, consistency, and compliance with healthcare privacy regulations.
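
The description gives no detail on how the preprocessing was done, so the following is only a sketch of what one standardization and pseudonymization step could look like. It assumes records are dictionaries with `patient_id`, `name`, `weight`, and `weight_unit` fields; those field names, the functions, and the use of a keyed hash are all assumptions, not the case study's method. Replacing identifiers with a keyed hash is pseudonymization and is generally not sufficient on its own for full anonymization.

```python
import hashlib
import hmac

LB_TO_KG = 0.45359237  # exact definition of the international pound


def pseudonymize_id(patient_id: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed SHA-256 hash (HMAC).

    The same input and key always give the same output, so records of one
    patient can still be linked without exposing the original identifier.
    """
    return hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


def standardize_record(record: dict, secret_key: bytes) -> dict:
    """Return a cleaned copy: no name, hashed ID, weight in kilograms."""
    weight = float(record["weight"])
    unit = record.get("weight_unit", "kg").lower()
    if unit == "lb":
        weight *= LB_TO_KG
    elif unit != "kg":
        raise ValueError(f"unsupported weight unit: {unit}")
    return {
        "patient_key": pseudonymize_id(record["patient_id"], secret_key),
        "weight_kg": weight,
    }


if __name__ == "__main__":
    raw = {"patient_id": "123", "name": "Jane Doe", "weight": 150, "weight_unit": "lb"}
    print(standardize_record(raw, secret_key=b"example-key"))
```

Building the output dictionary from an explicit list of allowed fields, instead of deleting known identifiers from the input, means that an unexpected identifying field is dropped by default.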

Model Development

Trained machine learning models to identify correlations between patient characteristics and treatment outcomes. Developed predictive algorithms to recommend personalized treatment plans for conditions like chronic diseases, cancer, and rare disorders.

Validation

Tested the system on historical patient data to evaluate its accuracy in predicting successful treatment outcomes.

Deployment

Integrated the solution into the hospital’s clinical decision support systems, enabling healthcare providers to access personalized treatment recommendations during consultations.

Continuous Monitoring & Improvement

Established a feedback mechanism to refine models using real-world treatment outcomes and patient satisfaction data.

Results

Improved Patient Outcomes

The system delivered personalized treatment recommendations that significantly improved recovery rates and health outcomes.

Increased Treatment Efficacy

Optimized treatment plans reduced trial-and-error approaches, leading to more effective interventions and fewer side effects.

Personalized Healthcare Experiences

Patients reported higher satisfaction levels due to treatment plans tailored to their individual needs and preferences.

Enhanced Decision-Making

Healthcare providers benefited from data-driven insights, enabling more informed and confident decisions.

Scalable and Future-Ready Solution

The system scaled seamlessly to support diverse medical specialties and adapted to incorporate emerging medical research.
