25 datasets found
  1. EHR Dataset for Patient Treatment Classification

    • data.mendeley.com
    • paperswithcode.com
    Updated May 10, 2020
    Cite
    Mujiono Sadikin (2020). EHR Dataset for Patient Treatment Classification [Dataset]. http://doi.org/10.17632/7kv3rctx7m.1
    Dataset updated
    May 10, 2020
    Authors
    Mujiono Sadikin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset is an electronic health record (EHR) prediction dataset collected from a private hospital in Indonesia. It contains patients' laboratory test results, which are used to determine each patient's next treatment: in care (inpatient) or out care (outpatient). The task embedded in the dataset is classification.
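    As an illustration of the classification task, here is a minimal nearest-centroid baseline in Python. The feature names (`haemoglobin`, `leucocyte`) and the label values `"in"`/`"out"` are hypothetical placeholders, not the dataset's confirmed column names or label encoding, so check the downloaded file before adapting it. Features are left unscaled for brevity; a real baseline would normally standardize them first.

```python
from statistics import mean


def fit_centroids(rows, features, label_key):
    """Compute the mean feature vector for each class label."""
    groups = {}
    for row in rows:
        groups.setdefault(row[label_key], []).append(row)
    return {
        label: [mean(r[f] for r in members) for f in features]
        for label, members in groups.items()
    }


def predict(centroids, row, features):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def squared_distance(centroid):
        return sum((row[f] - c) ** 2 for f, c in zip(features, centroid))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))


# Hypothetical toy records: two lab values and a care-setting label.
FEATURES = ["haemoglobin", "leucocyte"]
train = [
    {"haemoglobin": 10.0, "leucocyte": 15.0, "label": "in"},
    {"haemoglobin": 9.0, "leucocyte": 14.0, "label": "in"},
    {"haemoglobin": 14.0, "leucocyte": 7.0, "label": "out"},
    {"haemoglobin": 15.0, "leucocyte": 6.0, "label": "out"},
]
centroids = fit_centroids(train, FEATURES, "label")
print(predict(centroids, {"haemoglobin": 9.5, "leucocyte": 13.0}, FEATURES))  # in
```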

  2. EHR-Rel Dataset

    • paperswithcode.com
    • opendatalab.com
    • +1 more
    Updated Jun 29, 2022
    Cite
    Claudia Schulz; Josh Levy-Kramer; Camille Van Assel; Miklos Kepes; Nils Hammerla (2022). EHR-Rel Dataset [Dataset]. https://paperswithcode.com/dataset/ehr-rel
    Dataset updated
    Jun 29, 2022
    Authors
    Claudia Schulz; Josh Levy-Kramer; Camille Van Assel; Miklos Kepes; Nils Hammerla
    Description

    EHR-RelB is a benchmark dataset for biomedical concept relatedness, consisting of 3630 concept pairs sampled from electronic health records (EHRs). EHR-RelA is a smaller dataset of 111 concept pairs, which are mainly unrelated.
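    Relatedness benchmarks of this kind are commonly scored by the Spearman rank correlation between a model's similarity scores and the human relatedness ratings of the concept pairs. The Python sketch below computes Spearman's rho with average ranks for ties; it is not the official EHR-Rel evaluation script, and the ratings and scores in the example are invented.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend `end` across a run of tied values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5  # undefined if either input is constant


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


gold = [3.0, 1.0, 2.0, 0.0]    # hypothetical human relatedness ratings
model = [0.9, 0.2, 0.5, 0.1]   # hypothetical model similarity scores
print(spearman(gold, model))   # 1.0
```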

  3. INSPECT EHR

    • redivis.com
    application/jsonl +7
    Updated Apr 19, 2025
    Cite
    Shah Lab (2025). INSPECT EHR [Dataset]. http://doi.org/10.57761/ak51-d519
    Available download formats: avro, csv, parquet, stata, arrow, spss, application/jsonl, sas
    Dataset updated
    Apr 19, 2025
    Dataset provided by
    Redivis Inc.
    Authors
    Shah Lab
    Description

    Abstract

    The INSPECT dataset (Integrating Numerous Sources for Prognostic Evaluation of Clinical Timelines) contains de-identified longitudinal electronic health records (EHRs) from a large cohort of pulmonary embolism (PE) patients, along with ground-truth labels for multiple outcomes. It includes the EHRs of 19,390 patients linked to 23,248 CT pulmonary angiography (CTPA) studies with paired radiology impressions.

    Methodology

    1. Overview

    INSPECT is a large-scale 3D multimodal medical imaging dataset:

    • 19,390 patients
    • 23,248 CT scans
    • 225+ million clinical events
    • 3 linked modalities

    2. CT Scans + Radiology Impression Notes

    Imaging data are available for download from the Stanford AIMI Center.

    3. EHR Data

    EHR data are sourced from Stanford’s STARR-OMOP database. The data are standardized to the OMOP CDM schema and fully de-identified. Complete technical details are in the paper; key highlights:

    • Dates are jittered within each patient to conceal real dates (while preserving the deltas between dates)
    • Data for patients >= 90 years old are removed
    • Data for minors (< 18 years old) are removed
    • Unstructured text fields not mappable to OMOP standard concepts are redacted
    • All clinical note text is redacted
    • HIV test results are redacted
    • Provider names and NPIs are redacted
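    The date-jittering and age-exclusion steps above can be illustrated with a short Python sketch. It assumes one random offset per patient drawn from a hypothetical ±365-day range and a single age per patient; the actual jitter range and the point in time at which age is evaluated are defined in the INSPECT paper, so treat these values as placeholders.

```python
import random
from datetime import date, timedelta


def jitter_patient_dates(dates_by_patient, max_shift_days=365, seed=0):
    """Shift all of a patient's dates by one random per-patient offset.

    Because every date of a patient moves by the same offset, the intervals
    between that patient's events are unchanged.
    """
    rng = random.Random(seed)
    jittered = {}
    for patient_id in sorted(dates_by_patient):  # sorted for reproducibility
        offset = timedelta(days=rng.randint(-max_shift_days, max_shift_days))
        jittered[patient_id] = [d + offset for d in dates_by_patient[patient_id]]
    return jittered


def within_age_limits(age_years):
    """Keep adults younger than 90: drop minors (< 18) and anyone aged 90 or older."""
    return 18 <= age_years < 90


records = {
    "p1": [date(2019, 3, 1), date(2019, 3, 15)],
    "p2": [date(2020, 7, 4)],
}
shifted = jitter_patient_dates(records)
# The 14-day gap between p1's two events survives the jitter.
print((shifted["p1"][1] - shifted["p1"][0]).days)  # 14
```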

    Please see our GitHub repo for code to load the dataset (including a full data preprocessing pipeline for reproducibility) and to run a set of pretrained baseline models.

    Usage

    Access to the INSPECT dataset requires the following:

    • Verified Affiliation (Academic, Government, Industry Research Lab). Apply with your verified institutional email address; do not use Gmail or other personal email addresses.
    • Encryption Verification / Attestation for Data Storage
    • Signing the terms of the INSPECT Data Set License 1.0
    • Providing a short description of your intended research use of INSPECT
    • CITI Training

    **These data must remain on your encrypted machine. Redistribution of data is FORBIDDEN and will result in immediate termination of access privileges.**

    IMPORTANT NOTES:

    • Our policy on derived works aligns with PhysioNet's guidelines, requiring that these artifacts be hosted on Redivis. If you create derived research artifacts based on INSPECT EHR (such as additional annotations or synthetic data), please contact us to discuss hosting arrangements.
    • Sending INSPECT data over a non-HIPAA-compliant API is a violation of the DUA.

    Please allow 7-10 business days to process applications.

  4. Example (synthetic) electronic health record data

    • rdr.ucl.ac.uk
    application/csv
    Updated Apr 24, 2024
    Cite
    Steve Harris; Wai Shing Lai (2024). Example (synthetic) electronic health record data [Dataset]. http://doi.org/10.5522/04/25676298.v1
    Available download formats: application/csv
    Dataset updated
    Apr 24, 2024
    Dataset provided by
    University College London
    Authors
    Steve Harris; Wai Shing Lai
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    These data are modelled using the OMOP Common Data Model v5.3.

    Correlated data source: NG tube vocabularies

    Generation rules:

    • The patient's age should be between 18 and 100 at the moment of the visit.
    • Ethnicity data use the 2021 census data for England and Wales (Census in England and Wales 2021).
    • Gender is equally distributed between Male and Female (50% each).
    • Every person in the record has a link in procedure_occurrence with the concept "Checking the position of nasogastric tube using X-ray".
    • 2% of person records have a link in procedure_occurrence with the concept "Plain chest X-ray".
    • 60% of visit_occurrence records have the visit concept "Inpatient Visit", while 40% have "Emergency Room Visit".

    Notes:

    • Version 0
    • Generated by a man-made rule/story generator
    • Structurally correct: all tables are linked through their relationships
    • National ethnicity data were used to generate a realistic distribution (see below)

    2021 Census ethnic group figures for England and Wales (population %):

    • Asian or Asian British: Bangladeshi - 1.1
    • Asian or Asian British: Chinese - 0.7
    • Asian or Asian British: Indian - 3.1
    • Asian or Asian British: Pakistani - 2.7
    • Asian or Asian British: any other Asian background - 1.6
    • Black or African or Caribbean or Black British: African - 2.5
    • Black or African or Caribbean or Black British: Caribbean - 1
    • Black or African or Caribbean or Black British: other Black or African or Caribbean background - 0.5
    • Mixed multiple ethnic groups: White and Asian - 0.8
    • Mixed multiple ethnic groups: White and Black African - 0.4
    • Mixed multiple ethnic groups: White and Black Caribbean - 0.9
    • Mixed multiple ethnic groups: any other Mixed or multiple ethnic background - 0.8
    • White: English or Welsh or Scottish or Northern Irish or British - 74.4
    • White: Irish - 0.9
    • White: Gypsy or Irish Traveller - 0.1
    • White: any other White background - 6.4
    • Other ethnic group: any other ethnic group - 1.6
    • Other ethnic group: Arab - 0.6
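    The generation rules above translate fairly directly into code. The Python sketch below is an illustration under stated assumptions, not the generator the authors used: it creates one visit per person (so the 60/40 visit split is applied per person), samples gender and the 2% chest X-ray flag randomly rather than enforcing exact proportions, uses concept names as plain strings instead of OMOP concept IDs, and includes only a subset of the census groups.

```python
import random
from dataclasses import dataclass, field

# Relative weights from the census figures. Only a subset of groups is shown
# for brevity; random.choices normalizes whatever weights are supplied.
ETHNICITY_WEIGHTS = {
    "White: English or Welsh or Scottish or Northern Irish or British": 74.4,
    "White: any other White background": 6.4,
    "Asian or Asian British: Indian": 3.1,
    "Asian or Asian British: Pakistani": 2.7,
    "Black or African or Caribbean or Black British: African": 2.5,
}
NG_TUBE_XRAY = "Checking the position of nasogastric tube using X-ray"
CHEST_XRAY = "Plain chest X-ray"


@dataclass
class SyntheticPerson:
    person_id: int
    age_at_visit: int
    gender: str
    ethnicity: str
    visit_concept: str
    procedures: list[str] = field(default_factory=list)


def generate_people(n, seed=42):
    rng = random.Random(seed)
    groups = list(ETHNICITY_WEIGHTS)
    weights = list(ETHNICITY_WEIGHTS.values())
    people = []
    for person_id in range(1, n + 1):
        procedures = [NG_TUBE_XRAY]          # every person has the NG tube check
        if rng.random() < 0.02:              # about 2% also get a plain chest X-ray
            procedures.append(CHEST_XRAY)
        people.append(SyntheticPerson(
            person_id=person_id,
            age_at_visit=rng.randint(18, 100),      # inclusive on both ends
            gender=rng.choice(["Male", "Female"]),  # 50/50 in expectation
            ethnicity=rng.choices(groups, weights=weights, k=1)[0],
            visit_concept="Inpatient Visit" if rng.random() < 0.6 else "Emergency Room Visit",
            procedures=procedures,
        ))
    return people


people = generate_people(1000)
print(sum(p.visit_concept == "Inpatient Visit" for p in people) / len(people))
```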

  5. EHRSHOT

    • redivis.com
    application/jsonl +7
    Updated Feb 13, 2025
    Cite
    Shah Lab (2025). EHRSHOT [Dataset]. http://doi.org/10.57761/0gv9-nd83
    Available download formats: csv, application/jsonl, sas, parquet, stata, spss, arrow, avro
    Dataset updated
    Feb 13, 2025
    Dataset provided by
    Redivis Inc.
    Authors
    Shah Lab
    Description

    Abstract

    👂💉 EHRSHOT is a dataset for benchmarking the few-shot performance of foundation models for clinical prediction tasks. EHRSHOT contains de-identified structured data (e.g., diagnosis and procedure codes, medications, lab values) from the electronic health records (EHRs) of 6,739 Stanford Medicine patients and includes 15 prediction tasks. Unlike MIMIC-III/IV and other popular EHR datasets, EHRSHOT is longitudinal and includes data beyond ICU and emergency department patients.

    ⚡️ Quickstart

    1. To reproduce the original EHRSHOT paper, download the EHRSHOT_ASSETS.zip file from the "Files" tab.
    2. To work with OMOP CDM formatted data, download all the tables in the "Tables" tab.

    ⚙️ Please see the "Methodology" section below for details on the dataset and downloadable files.

    Methodology

    1. 📖 Overview

    EHRSHOT is a benchmark for evaluating models on few-shot learning for patient classification tasks. The dataset contains:

    • 6,739 patients
    • 41.6 million clinical events
    • 921,499 visits
    • 15 prediction tasks
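    Few-shot evaluation means training or adapting a model on only k labeled examples per class. The Python sketch below shows one way to draw such a stratified k-shot training set; it is not the sampling code from the ehrshot-benchmark repository, and it assumes one label per patient, whereas a benchmark task may attach several labeled prediction times to the same patient.

```python
import random
from collections import defaultdict


def sample_k_shot(labels, k, seed=0):
    """Draw k examples per label value, without replacement.

    `labels` maps an example ID (e.g., a patient ID) to its label.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for example_id, label in labels.items():
        by_label[label].append(example_id)
    selected = []
    for label in sorted(by_label):
        pool = sorted(by_label[label])  # deterministic order before sampling
        if len(pool) < k:
            raise ValueError(f"only {len(pool)} examples with label {label!r}; need {k}")
        selected.extend(rng.sample(pool, k))
    return selected


# Hypothetical binary task: 30 patients, every third one positive.
labels = {patient_id: patient_id % 3 == 0 for patient_id in range(30)}
train_ids = sample_k_shot(labels, k=4)
print(len(train_ids), sum(labels[i] for i in train_ids))  # 8 4
```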

    2. 💽 Dataset

    EHRSHOT is sourced from Stanford’s STARR-OMOP database.

    • Data follows the OMOP CDM and is fully de-identified.
    • Unlike most other EHR research datasets, EHRSHOT is not restricted to ED/ICU visits and instead includes longitudinal patient data for all hospital encounter types.
    • EHRSHOT does not contain clinical notes or images.

    We provide two versions of the dataset:

    • EHRSHOT-Original is exactly the same dataset used in the original EHRSHOT paper.
    • EHRSHOT-OMOP is a more complete version of the EHRSHOT dataset which includes all OMOP CDM tables and additional OMOP metadata.

    To access the raw data, see the "Tables" and "Files" tabs above.

    3. 💽 Data Files and Formats

    We provide EHRSHOT in two file formats:

    • OMOP CDM v5.4
    • Medical Event Data Standard (MEDS)

    Within the "Tables" tab...

    1. EHRSHOT-OMOP

    * Dataset Version: EHRSHOT-OMOP

    * Notes: Contains all OMOP CDM tables for the EHRSHOT patients. Note that this dataset is slightly different from the original EHRSHOT dataset, as these tables contain the full OMOP schema rather than a filtered subset.

    Within the "Files" tab...

    1. EHRSHOT_ASSETS.zip

    * Dataset Version: EHRSHOT-Original

    * Data Format: FEMR 0.1.16

    * Notes: The original EHRSHOT dataset as detailed in the paper. Also includes model weights.

    2. EHRSHOT_MEDS.zip

    * Dataset Version: EHRSHOT-Original

    * Data Format: MEDS 0.3.3

    * Notes: The original EHRSHOT dataset as detailed in the paper. It does not include any models.

    3. EHRSHOT_OMOP_MEDS.zip

    * Dataset Version: EHRSHOT-OMOP

    * Data Format: MEDS 0.3.3 + MEDS-ETL 0.3.8

    * Notes: Converts the dataset from EHRSHOT-OMOP into MEDS format via the `meds_etl_omop` command from MEDS-ETL.

    4. EHRSHOT_OMOP_MEDS_Reader.zip

    * Dataset Version: EHRSHOT-OMOP

    * Data Format: MEDS Reader 0.1.9 + MEDS 0.3.3 + MEDS-ETL 0.3.8

    * Notes: Same data as EHRSHOT_OMOP_MEDS.zip, but converted into a MEDS-Reader database for faster reads.
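    To make the OMOP-to-MEDS conversion more concrete, the Python sketch below flattens rows from an OMOP `measurement` table into MEDS-style event records with the core columns `subject_id`, `time`, `code`, and `numeric_value`. It is a simplified illustration, not what `meds_etl_omop` actually does: the `vocabulary/concept_code` code format and the concept IDs in the example are assumptions made for illustration only.

```python
from datetime import datetime


def omop_measurements_to_meds(measurements, concepts):
    """Convert OMOP measurement rows into MEDS-style event dicts.

    `concepts` maps concept_id -> (vocabulary_id, concept_code), as would be
    looked up from the OMOP `concept` table.
    """
    events = []
    for row in measurements:
        vocabulary_id, concept_code = concepts[row["measurement_concept_id"]]
        events.append({
            "subject_id": row["person_id"],
            "time": datetime.fromisoformat(row["measurement_datetime"]),
            "code": f"{vocabulary_id}/{concept_code}",
            "numeric_value": row.get("value_as_number"),  # None if not numeric
        })
    # Order events by subject, then chronologically within each subject.
    events.sort(key=lambda e: (e["subject_id"], e["time"]))
    return events


# Hypothetical concept IDs and rows, for illustration only.
concepts = {1001: ("LOINC", "2160-0"), 1002: ("LOINC", "718-7")}
measurements = [
    {"person_id": 2, "measurement_concept_id": 1002,
     "measurement_datetime": "2020-01-02T08:00:00", "value_as_number": 13.1},
    {"person_id": 1, "measurement_concept_id": 1001,
     "measurement_datetime": "2020-01-01T09:30:00", "value_as_number": 0.9},
]
for event in omop_measurements_to_meds(measurements, concepts):
    print(event["subject_id"], event["time"].isoformat(), event["code"], event["numeric_value"])
```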

    4. 🤖 Model

    We also release the full weights of **CLMBR-T-base**, a 141M-parameter clinical foundation model pretrained on the structured EHR data of 2.57M patients. Please download it from https://huggingface.co/StanfordShahLab/clmbr-t-base

    5. 🧑‍💻 Code

    Please see our GitHub repo for code to load the dataset and run a set of pretrained baseline models: https://github.com/som-shahlab/ehrshot-benchmark/

    Usage

    **NOTE: You must authenticate to Redivis using your formal affiliation's email address. If you use Gmail or other personal email addresses, you will not be granted access.**

    Access to the EHRSHOT dataset requires the following:

    • Verified Affiliation with an **Academic, Government,** o
  6. EMRBots: a 100,000-patient database

    • figshare.com
    zip
    Updated Sep 3, 2018
    Cite
    Uri Kartoun (2018). EMRBots: a 100,000-patient database [Dataset]. http://doi.org/10.6084/m9.figshare.7040198.v1
    Available download formats: zip
    Dataset updated
    Sep 3, 2018
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Uri Kartoun
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A database of 100,000 virtual patients with a total of 361,760 admissions and 107,535,387 lab observations.
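    The headline counts imply some useful averages. The short Python calculation below derives them from the three figures in the description; these are simple means, and the description does not say how admissions or lab observations are distributed across patients.

```python
PATIENTS = 100_000
ADMISSIONS = 361_760
LAB_OBSERVATIONS = 107_535_387

admissions_per_patient = ADMISSIONS / PATIENTS
labs_per_admission = LAB_OBSERVATIONS / ADMISSIONS
labs_per_patient = LAB_OBSERVATIONS / PATIENTS

print(f"{admissions_per_patient:.2f} admissions per patient")      # 3.62
print(f"{labs_per_admission:.1f} lab observations per admission")  # 297.3
print(f"{labs_per_patient:.1f} lab observations per patient")      # 1075.4
```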

  7. EHR data from MIMIC-III

    • scidb.cn
    Updated Aug 24, 2021
    Cite
    Tingyi Wanyan; Hossein Honarvar; Ariful Azad; Ying Ding; Benjamin S. Glicksberg (2021). EHR data from MIMIC-III [Dataset]. http://doi.org/10.11922/sciencedb.j00104.00094
    Dataset updated
    Aug 24, 2021
    Dataset provided by
    Science Data Bank
    Authors
    Tingyi Wanyan; Hossein Honarvar; Ariful Azad; Ying Ding; Benjamin S. Glicksberg
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We conducted our experiments on de-identified EHR data from MIMIC-III. This dataset contains various clinical data on patients admitted to the ICU, such as disease diagnoses in the form of International Classification of Diseases, Ninth Revision (ICD-9) codes and lab test results, as detailed in the Supplementary Materials. We collected data for 5,956 patients, extracting lab tests every hour from admission. A total of 409 unique lab tests and 3,387 unique disease diagnoses were observed. The diagnoses were obtained as ICD-9 codes and represented using one-hot encoding, where one indicates that a patient has the disease and zero indicates that they do not. We binned the lab test events into 6, 12, 24, and 48 hours prior to patient death or discharge from the ICU. From these data, we performed mortality prediction with 10-fold cross-validation.
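    The preprocessing described above (diagnosis indicator vectors, lab events binned by time before death or discharge, and 10-fold cross-validation) can be sketched in Python as follows. This is an interpretation, not the authors' code: it treats each of the 6, 12, 24, and 48 hour bins as a cumulative look-back window ending at death or discharge, which may differ from the paper's exact binning, and the folds are contiguous with no shuffling or outcome stratification, which a real mortality study would usually add. Lab names, ICD-9 codes, and timestamps are hypothetical.

```python
from datetime import datetime, timedelta

WINDOW_HOURS = (6, 12, 24, 48)


def multi_hot(codes, vocabulary):
    """1 at the index of each ICD-9 code the patient has, 0 elsewhere."""
    index = {code: i for i, code in enumerate(vocabulary)}
    vector = [0] * len(vocabulary)
    for code in codes:
        vector[index[code]] = 1
    return vector


def lab_events_in_windows(events, end_time):
    """Select (time, lab_name, value) events inside each look-back window before end_time."""
    return {
        hours: [e for e in events if end_time - timedelta(hours=hours) <= e[0] < end_time]
        for hours in WINDOW_HOURS
    }


def k_fold_indices(n, k=10):
    """Yield (train, test) index lists for k contiguous, non-overlapping folds."""
    start = 0
    for fold in range(k):
        size = n // k + (1 if fold < n % k else 0)
        test = list(range(start, start + size))
        train = list(range(start)) + list(range(start + size, n))
        yield train, test
        start += size


end = datetime(2020, 1, 2, 0, 0)  # hypothetical discharge or death time
events = [
    (end - timedelta(hours=1), "lactate", 2.1),
    (end - timedelta(hours=10), "lactate", 1.8),
    (end - timedelta(hours=30), "creatinine", 1.2),
]
print({h: len(v) for h, v in lab_events_in_windows(events, end).items()})
# {6: 1, 12: 2, 24: 2, 48: 3}
```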

  8. Healthcare Patient Monitoring Dataset

    • paperswithcode.com
    Updated Mar 7, 2025
    Cite
    (2025). Healthcare Patient Monitoring Dataset [Dataset]. https://paperswithcode.com/dataset/healthcare-patient-monitoring
    Dataset updated
    Mar 7, 2025
    Description

    Problem Statement

    Hospitals and healthcare providers faced challenges in ensuring continuous monitoring of patient vitals, especially for high-risk patients. Traditional monitoring methods often lacked real-time data processing and timely alerts, leading to delayed responses and increased hospital readmissions. The healthcare provider needed a solution to monitor patient health continuously and deliver actionable insights for improved care.

    Challenge

    Implementing an advanced patient monitoring system involved overcoming several challenges:

    Collecting and analyzing real-time data from multiple IoT-enabled medical devices.

    Ensuring accurate health insights while minimizing false alarms.

    Integrating the system seamlessly with hospital workflows and electronic health records (EHR).

    Solution Provided

    A comprehensive patient monitoring system was developed using IoT-enabled medical devices and AI-based monitoring systems. The solution was designed to:

    Continuously collect patient vital data such as heart rate, blood pressure, oxygen levels, and temperature.

    Analyze data in real-time to detect anomalies and provide early warnings for potential health issues.

    Send alerts to healthcare professionals and caregivers for timely interventions.
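    The solution is described only at a high level. As one illustration of detecting anomalies in real time while limiting false alarms, the Python sketch below flags a vital-sign reading when it deviates from a rolling baseline by more than a z-score threshold and raises an alert only after several consecutive outliers. The window size, threshold, and consecutive-reading rule are hypothetical and not clinically validated; the case study does not say which models it actually used.

```python
from collections import deque
from statistics import mean, pstdev


class VitalSignMonitor:
    """Rolling z-score detector with a consecutive-reading rule to limit false alarms."""

    def __init__(self, window=30, z_threshold=3.0, consecutive=3, min_baseline=5):
        self.history = deque(maxlen=window)  # recent readings judged normal
        self.z_threshold = z_threshold
        self.consecutive = consecutive
        self.min_baseline = min_baseline
        self.run = 0  # current streak of anomalous readings

    def update(self, value):
        """Ingest one reading; return True if an alert should be raised."""
        anomalous = False
        if len(self.history) >= self.min_baseline:
            mu = mean(self.history)
            sigma = pstdev(self.history)
            anomalous = sigma > 0 and abs(value - mu) / sigma > self.z_threshold
        if anomalous:
            self.run += 1
        else:
            self.run = 0
            self.history.append(value)  # outliers do not contaminate the baseline
        return self.run >= self.consecutive


monitor = VitalSignMonitor()
baseline = [70, 72, 71, 69, 70, 73, 71, 70, 72, 71]  # hypothetical heart rates (bpm)
alerts = [monitor.update(hr) for hr in baseline + [140, 141, 142]]
print(alerts[-3:])  # [False, False, True]
```

    Requiring several consecutive outliers trades a short detection delay for fewer alarms triggered by a single noisy reading.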

    Development Steps

    Data Collection

    Deployed IoT-enabled devices such as wearable monitors, smart sensors, and bedside equipment to collect patient data continuously.

    Preprocessing

    Cleaned and standardized data streams to ensure accurate analysis and integration with hospital systems.

    AI Model Development

    Built machine learning models to analyze vital-sign trends and detect abnormalities in real time.

    Validation

    Tested the system in controlled environments to ensure accuracy and reliability in detecting health issues.

    Deployment

    Implemented the solution in hospitals and care facilities, integrating it with EHR systems and alert mechanisms for seamless operation.

    Continuous Monitoring & Improvement

    Established a feedback loop to refine models and algorithms based on real-world data and healthcare provider feedback.

    Results

    Enhanced Patient Care

    Real-time monitoring and proactive alerts enabled healthcare professionals to provide timely interventions, improving patient outcomes.

    Early Detection of Health Issues

    The system detected potential health complications early, reducing the severity of conditions and preventing critical events.

    Reduced Hospital Readmissions

    Continuous monitoring helped manage patient health effectively, leading to a significant decrease in readmission rates.

    Improved Operational Efficiency

    Automation and real-time insights reduced the burden on healthcare staff, allowing them to focus on critical cases.

    Scalable Solution

    The system adapted seamlessly to various healthcare settings, including hospitals, clinics, and home care environments.

  9. MIMIC-III Dataset

    • paperswithcode.com
    • opendatalab.com
    Updated Apr 20, 2022
    Cite
    Alistair E.W. Johnson; Tom J. Pollard; Lu Shen; Li-wei H. Lehman; Mengling Feng; Mohammad Ghassemi; Benjamin Moody; Peter Szolovits; Leo Anthony Celi; Roger G. Mark (2023). MIMIC-III Dataset [Dataset]. https://paperswithcode.com/dataset/mimic-iii
    Dataset updated
    Apr 20, 2022
    Authors
    Alistair E.W. Johnson; Tom J. Pollard; Lu Shen; Li-wei H. Lehman; Mengling Feng; Mohammad Ghassemi; Benjamin Moody; Peter Szolovits; Leo Anthony Celi; Roger G. Mark
    Description

    The Medical Information Mart for Intensive Care III (MIMIC-III) dataset is a large, de-identified, publicly available collection of medical records. Each record in the dataset includes ICD-9 codes, which identify diagnoses and procedures performed. Each code is partitioned into sub-codes, which often include specific circumstantial details. The dataset consists of 112,000 clinical reports (average length 709.3 tokens) and 1,159 top-level ICD-9 codes; each report is assigned 7.6 codes on average. Data include vital signs, medications, laboratory measurements, observations and notes charted by care providers, fluid balance, procedure codes, diagnostic codes, imaging reports, hospital length of stay, survival data, and more.
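    Rolling detailed sub-codes up to their top-level category is a common first step when working with ICD-9 labels like these. The Python sketch below handles ICD-9-CM diagnosis codes only (procedure codes use a separate two-digit category scheme) and assumes "top-level" means the three-character category (four characters for E codes); the description does not state exactly how its 1,159 top-level codes were defined.

```python
def icd9_diagnosis_category(code):
    """Map an ICD-9-CM diagnosis code to its top-level category.

    Accepts dotted ("995.92") or undotted ("99592") forms.
    """
    normalized = code.strip().upper().replace(".", "")
    if not normalized:
        raise ValueError("empty ICD-9 code")
    if normalized.startswith("E"):
        # External-cause (E) codes: "E" plus three digits, e.g. E880.
        return normalized[:4]
    # Numeric codes and supplementary V codes: three characters, e.g. 995, V30.
    return normalized[:3]


report_codes = ["995.92", "99591", "4019", "V3000", "E8809"]
print(sorted({icd9_diagnosis_category(c) for c in report_codes}))
# ['401', '995', 'E880', 'V30']
```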

    The database supports applications including academic and industrial research, quality improvement initiatives, and higher education coursework.

  10. Diabetes 130-US hospitals for years 1999-2008 Data Set Raw

    • figshare.com
    txt
    Updated Mar 18, 2024
    Cite
    Lukas Heumos; Eljas Roellin (2024). Diabetes 130-US hospitals for years 1999-2008 Data Set Raw [Dataset]. http://doi.org/10.6084/m9.figshare.25429204.v1
    Available download formats: txt
    Dataset updated
    Mar 18, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Lukas Heumos; Eljas Roellin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Source: http://archive.ics.uci.edu/ml/datasets/Diabetes+130-US+hospitals+for+years+1999-2008

    Details: https://github.com/theislab/ehrapy-datasets

    This is the original diabetic_data.csv file, downloaded from the source link above on 18 Mar 2024 under a CC BY 4.0 license. It is stored here for the convenience of ehrapy users.
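    As a starting point for working with the raw file, the Python sketch below reads it with the standard `csv` module. It assumes the column names of the UCI release (for example `race` and `readmitted`), that missing values are encoded as `?`, and that `readmitted` takes the values `NO`, `<30`, and `>30`; verify these against the downloaded file. The inline sample rows are invented.

```python
import csv
import io
from collections import Counter


def load_rows(handle):
    """Read diabetic_data.csv, turning the '?' missing-value marker into None."""
    return [
        {key: (None if value == "?" else value) for key, value in row.items()}
        for row in csv.DictReader(handle)
    ]


def is_early_readmission(row):
    """One common binary target: readmitted within 30 days ("<30")."""
    return row["readmitted"] == "<30"


# Tiny inline sample; in practice use open("diabetic_data.csv", newline="").
sample = (
    "encounter_id,patient_nbr,race,readmitted\n"
    "1,10,Caucasian,NO\n"
    "2,11,?,<30\n"
    "3,10,Caucasian,>30\n"
)
rows = load_rows(io.StringIO(sample))
print(Counter(row["readmitted"] for row in rows))
print(sum(is_early_readmission(row) for row in rows))  # 1
```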

  11. Synthetic Skin Cancer Detection Dataset

    • opendatabay.com
    Updated May 20, 2025
    Cite
    Opendatabay Labs (2025). Synthetic Skin Cancer Detection Dataset [Dataset]. https://www.opendatabay.com/data/synthetic/0536d52f-a9dd-4e31-9caf-e1a47fd836d9
    Dataset updated
    May 20, 2025
    Dataset authored and provided by
    Opendatabay Labs
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Patient Health Records & Digital Health
    Description

    The Synthetic Skin Cancer Detection Dataset is designed for educational and research purposes to analyze factors associated with skin cancer types, their diagnosis, and treatment options. The dataset includes anonymized, synthetic data on various clinical and demographic factors for individuals diagnosed with different types of skin cancer.

    Dataset Features

    • Participant ID: Unique identifier for each participant.
    • MEL: Melanoma (Yes/No).
    • NV: Melanocytic Nevus (Yes/No).
    • BCC: Basal Cell Carcinoma (Yes/No).
    • AKIEC: Actinic Keratoses (Yes/No).
    • BKL: Benign Keratosis (Yes/No).
    • DF: Dermatofibroma (Yes/No).
    • VASC: Vascular Lesions (Yes/No).
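    A natural first look at these flags is to count the "Yes" answers per diagnosis, which is what this dataset's distribution plot shows. The Python sketch below does that tally; it assumes a CSV export whose header uses the feature names listed above and literal Yes/No values, which the listing does not confirm. Because a participant can have "Yes" in more than one column, the counts need not sum to the number of participants.

```python
import csv
import io
from collections import Counter

DIAGNOSIS_COLUMNS = ["MEL", "NV", "BCC", "AKIEC", "BKL", "DF", "VASC"]


def yes_counts(handle):
    """Count how many participants have 'Yes' in each diagnosis column."""
    counts = Counter({column: 0 for column in DIAGNOSIS_COLUMNS})
    for row in csv.DictReader(handle):
        for column in DIAGNOSIS_COLUMNS:
            if row[column].strip().lower() == "yes":
                counts[column] += 1
    return counts


# Invented sample rows; in practice pass open(path, newline="") instead.
sample = (
    "Participant ID,MEL,NV,BCC,AKIEC,BKL,DF,VASC\n"
    "1,Yes,No,No,No,No,No,No\n"
    "2,No,Yes,No,No,No,No,Yes\n"
)
print(dict(yes_counts(io.StringIO(sample))))
```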

    Distribution

    [Figure: Distribution of "Yes" diagnosis counts in the synthetic skin cancer dataset (yes_diagnosis_counts.png)]

    Usage

    This dataset can be used for the following applications:

    • Cancer Research: Investigate the relationship between various demographic, clinical, and tumour-type factors with the presence of skin cancer.
    • Predictive Modeling: Build machine learning models to predict the presence or progression of different skin cancer types based on participant data.
    • Clinical Research: Study the impact of various factors like lesion type, treatment options, and demographic data on skin cancer prognosis.
    • Educational Purposes: Provide a dataset for students and researchers in oncology, dermatology, and medical data science to analyze skin cancer detection and treatment patterns.

    Coverage

    This synthetic dataset is fully anonymized and complies with data privacy standards. It includes a broad set of factors to support diverse research and analysis in the oncology and medical domains, particularly in dermatology.

    License

    CC0 (Public Domain)

    Who Can Use It

    • Cancer Researchers: To explore correlations between clinical factors, lesion types, and treatment outcomes in skin cancer.
    • Dermatologists and Healthcare Providers: To analyze how skin lesions and treatments affect cancer progression and recovery.
    • Data Scientists and Machine Learning Practitioners: To develop predictive models for skin cancer diagnosis, classification, and prognosis.
    • Educators and Students: As a resource for studying skin cancer analytics and medical data science.
  12. Emr Dataset

    • universe.roboflow.com
    zip
    Updated Nov 16, 2022
    Cite
    Ollie (2022). Emr Dataset [Dataset]. https://universe.roboflow.com/ollie-qqdet/emr-dm1j5
    Available download formats: zip
    Dataset updated
    Nov 16, 2022
    Dataset authored and provided by
    Ollie
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Vehicles Bounding Boxes
    Description

    Emr

    ## Overview
    
    Emr is a dataset for object detection tasks - it contains Vehicles annotations for 1,260 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  13. INSPIRE Download Service (predefined ATOM) for Dataset Statutes II |...

    • gimi9.com
    Cite
    INSPIRE Download Service (predefined ATOM) for Dataset Statutes II | gimi9.com [Dataset]. https://gimi9.com/dataset/eu_dfcb07b4-68ca-0002-1025-19dc2b9049d8/
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Description of the INSPIRE Download Service (predefined Atom) for the development plan "Statute II" of the municipality of Ehr. The links for downloading the datasets are generated dynamically from GetMap calls to a WMS interface.
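    The GetMap mechanism mentioned above is part of the OGC Web Map Service (WMS) standard. For illustration, the Python sketch below assembles a WMS 1.3.0 GetMap request URL. The service URL, layer name, bounding box, and the EPSG:25832 coordinate reference system are placeholders, since the listing does not give the actual endpoint or layer.

```python
from urllib.parse import urlencode


def build_getmap_url(base_url, layer, bbox, width=1024, height=1024,
                     crs="EPSG:25832", image_format="image/png"):
    """Build a WMS 1.3.0 GetMap request URL.

    bbox is (minx, miny, maxx, maxy) in the axis order defined by `crs`.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty value requests each layer's default style
        "CRS": crs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    return f"{base_url}?{urlencode(params)}"


# Placeholder endpoint, layer, and extent.
print(build_getmap_url("https://example.org/wms", "development_plan",
                       (400000, 5500000, 401000, 5501000)))
```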

  14. Data from: PDD Graph: Bridging Electronic Medical Records and Biomedical...

    • springernature.figshare.com
    txt
    Updated May 31, 2023
    Cite
    Meng Wang; Jiaheng Zhang; Jun Liu; Wei Hu; Sen Wang; Xue Li; Wenqiang Liu (2023). PDD Graph: Bridging Electronic Medical Records and Biomedical Knowledge Graphs via Entity Linking [Dataset]. http://doi.org/10.6084/m9.figshare.5242138
    Available download formats: txt
    Dataset updated
    May 31, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Meng Wang; Jiaheng Zhang; Jun Liu; Wei Hu; Sen Wang; Xue Li; Wenqiang Liu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Patient-drug-disease (PDD) Graph dataset, built from electronic medical records (EMRs) and biomedical knowledge graphs. The novel framework used to construct the PDD graph is described in the associated publication.

    PDD is an RDF graph consisting of PDD facts, where each PDD fact is an RDF triple indicating that a patient takes a drug or that a patient is diagnosed with a disease, for instance (pdd:274671, pdd:diagnosed, sepsis).

    Data files are in .nt (N-Triples) format, a line-based syntax for RDF graphs, and can be opened with any text editor.

    • diagnose_icd_information.nt: RDF triples mapping patients to diagnoses. For example, (pdd:18740, pdd:diagnosed, icd99592), where pdd:18740 is a patient entity and icd99592 is the ICD-9 code of sepsis.
    • drug_patients.nt: RDF triples mapping patients to drugs. For example, (pdd:18740, pdd:prescribed, aspirin), where pdd:18740 is a patient entity and aspirin is the drug's name.

    Background

    Electronic medical records contain multi-format electronic medical data that embody an abundance of medical knowledge. Faced with a patient's symptoms, experienced caregivers make the right medical decisions based on their professional knowledge, which accurately captures the relationships between symptoms, diagnoses, and corresponding treatments. In the associated paper, we aim to capture these relationships by constructing a large, high-quality heterogeneous graph linking patients, diseases, and drugs (PDD) in EMRs. Specifically, we propose a novel framework to extract important medical entities from MIMIC-III (Medical Information Mart for Intensive Care III) and automatically link them with existing biomedical knowledge graphs, including the ICD-9 ontology and DrugBank. The PDD graph presented in the paper is accessible on the Web via a SPARQL endpoint as well as in .nt format in this repository, and it provides a pathway for medical discovery and applications such as effective treatment recommendations.

    De-identification

    MIMIC-III contains clinical information about patients. Although the protected health information was de-identified, researchers who want to use more clinical data should complete an online training course and then apply for permission to download the complete MIMIC-III dataset: https://mimic.physionet.org/
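    The example triples in the description are written informally; in an actual .nt file each line holds one full triple of IRIs and/or literals, terminated by a period. The Python sketch below parses simple N-Triples lines and groups each patient's diagnoses. The IRIs in the example are invented, since the description does not show the repository's real IRIs, and the parser deliberately skips blank nodes and does not unescape literals; for production use, an established RDF library such as rdflib is the safer choice.

```python
import re
from collections import defaultdict

# <subject> <predicate> (<object IRI> | "literal" with optional datatype/lang) .
TRIPLE = re.compile(
    r'^<([^>]*)>\s+<([^>]*)>\s+'
    r'(?:<([^>]*)>|"((?:[^"\\]|\\.)*)"(?:\^\^<[^>]*>|@[A-Za-z0-9-]+)?)'
    r'\s*\.$'
)


def parse_ntriples(lines):
    """Yield (subject, predicate, object) tuples; literals are returned unescaped-as-is."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = TRIPLE.match(line)
        if match is None:
            raise ValueError(f"unsupported N-Triples line: {line!r}")
        subject, predicate, obj_iri, obj_literal = match.groups()
        yield subject, predicate, obj_iri if obj_iri is not None else obj_literal


def group_objects(triples, predicate_suffix):
    """Collect objects per subject for predicates ending with predicate_suffix."""
    grouped = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate.endswith(predicate_suffix):
            grouped[subject].add(obj)
    return dict(grouped)


lines = [
    "<http://example.org/pdd/18740> <http://example.org/pdd/diagnosed> <http://example.org/icd/99592> .",
    '<http://example.org/pdd/18740> <http://example.org/pdd/prescribed> "aspirin" .',
    "# comments and blank lines are skipped",
    "",
]
triples = list(parse_ntriples(lines))
print(group_objects(triples, "/diagnosed"))
```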

  15. Synthetic Oral Cancer Prediction Dataset

    • opendatabay.com
    .csv
    Updated Jun 14, 2025
    Cite
    Opendatabay Labs (2025). Synthetic Oral Cancer Prediction Dataset [Dataset]. https://www.opendatabay.com/data/synthetic/09f348fc-a2e8-4132-9f1b-195765d80afc
    Available download formats: csv
    Dataset updated
    Jun 14, 2025
    Dataset authored and provided by
    Opendatabay Labs
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Patient Health Records & Digital Health
    Description

    The Synthetic Oral Cancer Prediction Dataset is designed for educational and research purposes to analyse factors associated with oral cancer risk, progression, and treatment outcomes. The dataset includes anonymised, synthetic data on various clinical, lifestyle, and demographic factors for individuals diagnosed with oral cancer.

    Dataset Features

    • ID: Unique identifier for each participant.
    • Country: Country of residence of the participant.
    • Age: Age of the participant (in years).
    • Gender: Gender of the participant (Male/Female).
    • Tobacco Use: History of tobacco use (Yes/No).
    • Alcohol Consumption: History of alcohol consumption (Yes/No).
    • HPV Infection: Presence of human papillomavirus infection (Yes/No).
    • Betel Quid Use: History of Betel quid use (Yes/No).
    • Chronic Sun Exposure: History of chronic sun exposure (Yes/No).
    • Poor Oral Hygiene: Poor oral hygiene habits (Yes/No).
    • Diet (Fruits & Vegetables Intake): Frequency of consuming fruits and vegetables (Yes/No).
    • Family History of Cancer: Family history of cancer (Yes/No).
    • Compromised Immune System: Whether the participant has a compromised immune system (Yes/No).
    • Oral Lesions: Presence of oral lesions (Yes/No).
    • Unexplained Bleeding: Presence of unexplained bleeding (Yes/No).
    • Difficulty Swallowing: Difficulty in swallowing (Yes/No).
    • White or Red Patches in Mouth: Presence of white or red patches in the mouth (Yes/No).
    • Tumor Size (cm): Size of the tumor in centimeters.
    • Cancer Stage: Stage of the oral cancer (1-4).
    • Treatment Type: Type of treatment received (e.g., Surgery, Radiation, Chemotherapy).
    • Survival Rate (5-Year, %): 5-year survival rate in percentage.
    • Cost of Treatment (USD): Total cost of treatment in USD.
    • Economic Burden (Lost Workdays per Year): Economic burden due to lost workdays each year.
    • Early Diagnosis: Whether early diagnosis was made (Yes/No).
    • Oral Cancer (Diagnosis): Diagnosis of oral cancer (Yes/No).
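A minimal sketch of turning the Yes/No features above into 0/1 integers with Python's standard `csv` module. It assumes the CSV headers match the feature names listed here and that binary values are the literal strings "Yes"/"No"; both are assumptions to check against the actual file. `BINARY_COLUMNS`, `encode_row`, and `load` are hypothetical names.

```python
import csv
import io
from typing import Dict, List, TextIO

# Assumed header names, copied from the feature list; verify against the real CSV.
BINARY_COLUMNS = [
    "Tobacco Use",
    "Alcohol Consumption",
    "HPV Infection",
    "Betel Quid Use",
    "Chronic Sun Exposure",
    "Poor Oral Hygiene",
    "Diet (Fruits & Vegetables Intake)",
    "Family History of Cancer",
    "Compromised Immune System",
    "Oral Lesions",
    "Unexplained Bleeding",
    "Difficulty Swallowing",
    "White or Red Patches in Mouth",
    "Early Diagnosis",
    "Oral Cancer (Diagnosis)",
]


def encode_row(row: Dict[str, str]) -> Dict[str, object]:
    """Copy a CSV row, mapping every Yes/No column that is present to 1/0."""
    encoded: Dict[str, object] = dict(row)
    for column in BINARY_COLUMNS:
        if column not in row:
            continue  # tolerate files that omit some columns
        value = row[column].strip().lower()
        if value not in ("yes", "no"):
            raise ValueError(f"unexpected value {row[column]!r} in column {column!r}")
        encoded[column] = 1 if value == "yes" else 0
    return encoded


def load(stream: TextIO) -> List[Dict[str, object]]:
    """Read every row from an open CSV stream and encode it."""
    return [encode_row(row) for row in csv.DictReader(stream)]


# Usage with a tiny in-memory sample (illustrative values, not real data).
sample = io.StringIO("ID,Age,Tobacco Use,Oral Cancer (Diagnosis)\n1,52,Yes,No\n2,40,No,Yes\n")
rows = load(sample)
```

Non-binary columns such as Age are left as strings, so numeric conversion stays an explicit, separate step.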

    Distribution

    [Image: Synthetic oral cancer dataset plot_output.png] https://storage.googleapis.com/opendatabay_public/09f348fc-a2e8-4132-9f1b-195765d80afc/622bf59174d1_plot_output.png

    Usage

    This dataset can be used for the following applications:

    • Cancer Research: Investigate the relationship between various lifestyle, clinical, and demographic factors and oral cancer risk and progression.
    • Predictive Modeling: Build machine learning models to predict cancer diagnosis, survival rate, or treatment outcomes based on participant data.
    • Healthcare and Public Health: Study the impact of lifestyle factors (e.g., tobacco, alcohol, diet) on the development and progression of oral cancer.
    • Educational Purposes: Provide a dataset for students and researchers in oncology, medical data science, and public health fields to analyze cancer risk factors and treatment outcomes.
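For the cancer-research and public-health uses above, one simple starting point is a risk ratio: the share of diagnosed participants among those with an exposure (for example Tobacco Use) divided by the share among those without it. The sketch assumes rows have already been encoded as 0/1 integers and uses a hypothetical `risk_ratio` helper. Because the data are synthetic, any ratio reflects how the generator was built, not real-world epidemiology.

```python
from typing import Dict, List


def risk_ratio(rows: List[Dict[str, int]], exposure: str, outcome: str) -> float:
    """Risk of `outcome` among exposed rows divided by risk among unexposed rows."""
    exposed = [r[outcome] for r in rows if r[exposure] == 1]
    unexposed = [r[outcome] for r in rows if r[exposure] == 0]
    if not exposed or not unexposed:
        raise ValueError("both exposed and unexposed groups must be non-empty")
    risk_exposed = sum(exposed) / len(exposed)
    risk_unexposed = sum(unexposed) / len(unexposed)
    if risk_unexposed == 0:
        raise ZeroDivisionError("no outcomes in the unexposed group")
    return risk_exposed / risk_unexposed


# Toy rows (illustrative only): 1 of 2 exposed and 1 of 4 unexposed are diagnosed.
toy_rows = [
    {"Tobacco Use": 1, "Oral Cancer (Diagnosis)": 1},
    {"Tobacco Use": 1, "Oral Cancer (Diagnosis)": 0},
    {"Tobacco Use": 0, "Oral Cancer (Diagnosis)": 1},
    {"Tobacco Use": 0, "Oral Cancer (Diagnosis)": 0},
    {"Tobacco Use": 0, "Oral Cancer (Diagnosis)": 0},
    {"Tobacco Use": 0, "Oral Cancer (Diagnosis)": 0},
]
rr = risk_ratio(toy_rows, "Tobacco Use", "Oral Cancer (Diagnosis)")  # 0.5 / 0.25
```

A ratio above 1 means the outcome is more common in the exposed group; it says nothing about causation.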

    Coverage

    This synthetic dataset is fully anonymized and complies with data privacy standards. It includes a wide array of factors that support diverse research and analysis in the oncology and public health domains.

    License

    CC0 (Public Domain)

    Who Can Use It

    • Cancer Researchers: To explore correlations between lifestyle factors, clinical features, and treatment outcomes in oral cancer.
    • Oncologists and Healthcare Providers: To analyze the effectiveness of different treatments and factors that affect prognosis and survival.
    • Public Health Professionals: To study the broader societal and economic impacts of oral cancer and develop preventive measures.
    • Data Scientists and Machine Learning Practitioners: To develop predictive models for diagnosing oral cancer and improving treatment planning.
    • Educators and Students: As a resource for studying cancer risk analysis, healthcare data science, and public health analytics.
  16. M

    Data from: OLD-INSPECT: A Multimodal Dataset for Pulmonary Embolism...

    • stanfordaimi.azurewebsites.net
    Updated May 30, 2024
    Microsoft Research (2024). INSPECT: A Multimodal Dataset for Pulmonary Embolism Diagnosis and Prognosis [Dataset]. https://stanfordaimi.azurewebsites.net/datasets/318f3464-c4b6-4006-9856-6f48ba40ad67
    Dataset updated
    May 30, 2024
    Dataset authored and provided by
    Microsoft Research
    License

    https://aimistanford-web-api.azurewebsites.net/licenses/f1f352a6-243f-4905-8e00-389edbca9e83/view

    Description

    Synthesizing information from various data sources plays a crucial role in the practice of modern medicine. Current applications of artificial intelligence in medicine often focus on single-modality data due to a lack of publicly available, multimodal medical datasets. To address this limitation, we introduce INSPECT, which contains de-identified longitudinal records from a large cohort of pulmonary embolism (PE) patients, along with ground truth labels for multiple outcomes. INSPECT contains data from 19,438 patients, including CT images, sections of radiology reports, and structured electronic health record (EHR) data (including demographics, diagnoses, procedures, and vitals). Using our provided dataset, we develop and release a benchmark for evaluating several baseline modeling approaches on a variety of important PE-related tasks. We evaluate image-only, EHR-only, and fused models. Trained models and the de-identified dataset are made available for non-commercial use under a data use agreement. To the best of our knowledge, INSPECT is the largest multimodal dataset for enabling reproducible research on strategies for integrating 3D medical imaging and EHR data.

    NOTE: this is the first part of the release, due to PHI review. This release has 20,078 CT scans and 21,266 impression sections; the EHR modality data will be uploaded to the Stanford Redivis website (https://redivis.com/Stanford).
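The description compares image-only, EHR-only, and fused models. One common way to fuse two modalities is late fusion: train a model per modality and combine their predicted probabilities. The sketch below illustrates that generic technique with hypothetical names (`late_fusion`, `weight`); it is not the INSPECT authors' fusion method, which this listing does not describe.

```python
import math


def sigmoid(logit: float) -> float:
    """Numerically stable map from a raw model score (logit) to a probability."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)  # safe: logit < 0, so z < 1
    return z / (1.0 + z)


def late_fusion(image_logit: float, ehr_logit: float, weight: float = 0.5) -> float:
    """Weighted average of an image model's and an EHR model's probabilities."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * sigmoid(image_logit) + (1.0 - weight) * sigmoid(ehr_logit)


# Usage: the two (hypothetical) models disagree; equal weighting splits the difference.
fused = late_fusion(image_logit=2.0, ehr_logit=-2.0)
```

Setting `weight` to 1.0 or 0.0 recovers the image-only or EHR-only prediction, which makes the fused model easy to compare against the single-modality baselines.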

  17. d

    Data from: Building the graph of medicine from millions of clinical...

    • datadryad.org
    • data.niaid.nih.gov
    • +1more
    zip
    Updated Aug 28, 2015
    Samuel G. Finlayson; Paea LePendu; Nigam H. Shah (2015). Building the graph of medicine from millions of clinical narratives [Dataset]. http://doi.org/10.5061/dryad.jp917
    Available download formats
    zip
    Dataset updated
    Aug 28, 2015
    Dataset provided by
    Dryad
    Authors
    Samuel G. Finlayson; Paea LePendu; Nigam H. Shah
    Time period covered
    2015
    Area covered
    California
    Description

    • 1_Cofrequency_Counts.tar.gz: See ReadMe.txt
    • 2_Singleton_Frequency_Counts.tar.gz: See ReadMe.txt provided with "1_Cofrequency_Counts.tar.gz"
    • 3_ID_Mappings.tar.gz: See ReadMe.txt provided with "1_Cofrequency_Counts.tar.gz"
    • 4_Scripts.tar.gz: See ReadMe.txt provided with "1_Cofrequency_Counts.tar.gz"

  18. o

    Synthetic Gastric Cancer Prediction Dataset

    • opendatabay.com
    Updated May 24, 2025
    Opendatabay Labs (2025). Synthetic Gastric Cancer Prediction Dataset [Dataset]. https://www.opendatabay.com/data/synthetic/00da1b54-b118-40d4-b429-c3e12dbb6fc3
    Dataset updated
    May 24, 2025
    Dataset authored and provided by
    Opendatabay Labs
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Patient Health Records & Digital Health
    Description

    The Synthetic Gastric Cancer Prediction Dataset has been generated for educational and research purposes to support the analysis of clinical, pathological, and demographic factors associated with gastric cancer. This synthetic, anonymized dataset covers staging, histological characteristics, and invasion patterns, all of which influence disease progression and prognosis.

    Dataset Features

    • Patient: Unique identifier for each individual.
    • Sex: Biological sex of the patient (Male/Female).
    • Age: Age of the patient (in years).
    • T staging: Tumor size and extent of invasion into nearby tissues (e.g., T1, T2, T3).
    • N staging: Lymph node involvement status (e.g., N0, N1, N2).
    • M staging: Metastasis presence (M0: No metastasis, M1: Distant metastasis).
    • Comprehensive Staging: Combined TNM stage classification.
    • Histological Type: Cellular classification of the tumor (e.g., adenocarcinoma).
    • Lauren Classification: Intestinal (0), Mixed (1), or Diffuse (2) type.
    • Lymphovascular Invasion: Presence of cancer cells in lymphatic or blood vessels (0 = Negative, 1 = Positive).
    • Venous Invasion: Tumor cells detected in veins (0 = Negative, 1 = Positive).
    • Perineural Invasion: Cancer spread along or around nerves (0 = Negative, 1 = Positive).
    • Stroma Quantity: Tumor stroma characteristics (Medullary = 0, Intermediate = 1, Scirrhous = 2).
    • Tumor Infiltration Pattern: Qualitative assessment of the tumor's infiltration behavior.
    • HER-2: Human Epidermal Growth Factor Receptor 2 status (0 = Negative, 1 = 1+, 2 = 2+, 3 = 3+).
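Several features above are stored as integer codes. A minimal sketch of decoding them back to labels, using only the code tables given in this feature list; the column names are assumed to match the file's headers, and `CODEBOOK` and `decode` are hypothetical names.

```python
from typing import Dict, Mapping

# Code tables copied from the feature descriptions above.
LAUREN = {0: "Intestinal", 1: "Mixed", 2: "Diffuse"}
STROMA = {0: "Medullary", 1: "Intermediate", 2: "Scirrhous"}
HER2 = {0: "Negative", 1: "1+", 2: "2+", 3: "3+"}
BINARY = {0: "Negative", 1: "Positive"}

CODEBOOK: Dict[str, Mapping[int, str]] = {
    "Lauren Classification": LAUREN,
    "Stroma Quantity": STROMA,
    "HER-2": HER2,
    "Lymphovascular Invasion": BINARY,
    "Venous Invasion": BINARY,
    "Perineural Invasion": BINARY,
}


def decode(record: Mapping[str, object]) -> Dict[str, object]:
    """Return a copy of `record` with every coded column replaced by its label."""
    decoded = dict(record)
    for column, table in CODEBOOK.items():
        if column not in record:
            continue  # other columns (Sex, Age, TNM stages) pass through unchanged
        code = int(record[column])  # accepts 2 or "2"
        if code not in table:
            raise ValueError(f"{column}: unknown code {code}")
        decoded[column] = table[code]
    return decoded


example = decode({"Lauren Classification": "2", "HER-2": 3, "Sex": "Male"})
```

Rejecting unknown codes, rather than passing them through, surfaces any mismatch between this listing and the file early.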

    Distribution

    [Image: Synthetic dataset gastric_cancer_visuals.png] https://storage.googleapis.com/opendatabay_public/00da1b54-b118-40d4-b429-c3e12dbb6fc3/4d500421b95e_gastric_cancer_visuals.png

    Usage

    This dataset can be used for the following applications:

    • Cancer Research: Study the relationship between staging, histological subtypes, and tumor invasiveness in gastric cancer.
    • Predictive Modeling: Train machine learning models to predict outcomes such as HER-2 status, metastatic potential, or overall stage.
    • Clinical Insight: Explore how combinations of clinical markers relate to patient prognosis and treatment planning.
    • Educational Purposes: Provide students and researchers with hands-on experience working with real-world-like oncology data.

    Coverage

    The dataset contains 100,000 synthetic entries with realistic variation in clinical and pathological features. It is fully anonymized and adheres to data privacy standards, supporting exploratory and predictive analysis across oncology and medical informatics.

    License

    CC0 (Public Domain)

    Who Can Use It

    • Medical Researchers and Oncologists: To evaluate gastric cancer risk factors and invasion markers.
    • Data Scientists: To develop and benchmark predictive algorithms for cancer staging and HER-2 expression.
    • Healthcare Educators and Students: As a comprehensive tool for teaching cancer data analysis and medical modeling.
  19. R

    Estacionamiento Dataset

    • universe.roboflow.com
    zip
    Updated Nov 1, 2024
    ehr (2024). Estacionamiento Dataset [Dataset]. https://universe.roboflow.com/ehr-ba1xl/estacionamiento/dataset/2
    Available download formats
    zip
    Dataset updated
    Nov 1, 2024
    Dataset authored and provided by
    ehr
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Cars Bounding Boxes
    Description

    Estacionamiento

    ## Overview
    
    Estacionamiento is a dataset for object detection tasks; it contains annotations for the Cars class across 1,025 images.
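Bounding-box annotations like these are usually compared with intersection over union (IoU). A minimal sketch, assuming boxes are given as pixel corner coordinates `(x_min, y_min, x_max, y_max)`; export formats differ (some use center, width, and height), so convert first if yours does not match. `Box`, `area`, and `iou` are hypothetical names.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned box in pixel corner coordinates."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def area(box: Box) -> float:
    """Area of a box; empty or inverted boxes count as zero."""
    return max(0.0, box.x_max - box.x_min) * max(0.0, box.y_max - box.y_min)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes, in [0, 1]."""
    overlap = Box(max(a.x_min, b.x_min), max(a.y_min, b.y_min),
                  min(a.x_max, b.x_max), min(a.y_max, b.y_max))
    inter = area(overlap)  # zero when the boxes do not overlap
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


# Two 2x2 boxes overlapping in a 1x1 square: IoU = 1 / (4 + 4 - 1) = 1/7.
score = iou(Box(0, 0, 2, 2), Box(1, 1, 3, 3))
```

A predicted car box is typically counted as a match when its IoU with a ground-truth box exceeds a chosen threshold, such as 0.5.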
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  20. g

    INSPIRE Download Service (predefined ATOM) for record field | gimi9.com

    • gimi9.com
    INSPIRE Download Service (predefined ATOM) for record field | gimi9.com [Dataset]. https://gimi9.com/dataset/eu_d74a5078-7687-0002-db75-d43244dbac97/
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Description of the INSPIRE Download Service (predefined Atom): development plan "Fields" of the municipality of Ehr. The download links for the datasets are generated dynamically from GetMap calls to a WMS interface.
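Since the download links come from WMS GetMap calls, a minimal sketch of building such a request URL with the standard library may help. It uses the standard WMS 1.3.0 GetMap parameters; the endpoint, layer name, bounding box, and CRS in the usage line are hypothetical placeholders, not the municipality's actual service values.

```python
from typing import Tuple
from urllib.parse import urlencode


def getmap_url(endpoint: str, layer: str, bbox: Tuple[float, float, float, float],
               crs: str, width: int = 800, height: int = 600,
               image_format: str = "image/png") -> str:
    """Build a WMS 1.3.0 GetMap request URL from caller-supplied values."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty value requests each layer's default style
        "CRS": crs,  # WMS 1.3.0 uses CRS; version 1.1.1 used SRS
        "BBOX": ",".join(str(v) for v in bbox),  # axis order depends on the CRS
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    return endpoint + "?" + urlencode(params)


# Hypothetical endpoint and layer, for illustration only.
url = getmap_url("https://example.org/wms", "fields", (0, 0, 10, 10), crs="EPSG:25832")
```

`urlencode` percent-encodes reserved characters such as `,`, `:`, and `/`, which WMS servers decode transparently.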

Mujiono Sadikin (2020). EHR Dataset for Patient Treatment Classification [Dataset]. http://doi.org/10.17632/7kv3rctx7m.1

13 scholarly articles cite this dataset (View in Google Scholar)
Dataset updated
May 10, 2020
Authors
Mujiono Sadikin
License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

The dataset is an Electronic Health Record (EHR) dataset collected from a private hospital in Indonesia. It contains patients' laboratory test results, which are used to determine each patient's next treatment: in care (inpatient) or out care (outpatient). The task embedded in the dataset is classification.
