đź‘‚đź’‰ EHRSHOT is a dataset for benchmarking the few-shot performance of foundation models for clinical prediction tasks. EHRSHOT contains de-identified structured data (e.g., diagnosis and procedure codes, medications, lab values) from the electronic health records (EHRs) of 6,739 Stanford Medicine patients and includes 15 prediction tasks. Unlike MIMIC-III/IV and other popular EHR datasets, EHRSHOT is longitudinal and includes data beyond ICU and emergency department patients.
⚡️Quickstart 1. To recreate the original EHRSHOT paper, download the EHRSHOT_ASSETS.zip file from the "Files" tab 2. To work with OMOP CDM formatted data, download all the tables in the "Tables" tab
⚙️ Please see the "Methodology" section below for details on the dataset and downloadable files.
1. đź“– Overview
EHRSHOT is a benchmark for evaluating models on few-shot learning for patient classification tasks. The dataset contains:
%3C!-- --%3E
2. đź’˝ Dataset
EHRSHOT is sourced from Stanford’s STARR-OMOP database.
%3C!-- --%3E
We provide two versions of the dataset:
%3C!-- --%3E
To access the raw data, please see the "Tables" and "Files"** **tabs above:
3. đź’˝ Data Files and Formats
We provide EHRSHOT in two file formats:
%3C!-- --%3E
Within the "Tables" tab...
1. %3Cu%3EEHRSHOT-OMOP%3C/u%3E
* Dataset Version: EHRSHOT-OMOP
* Notes: Contains all OMOP CDM tables for the EHRSHOT patients. Note that this dataset is slightly different than the original EHRSHOT dataset, as these tables contain the full OMOP schema rather than a filtered subset.
Within the "Files" tab...
1. %3Cu%3EEHRSHOT_ASSETS.zip%3C/u%3E
* Dataset Version: EHRSHOT-Original
* Data Format: FEMR 0.1.16
* Notes: The original EHRSHOT dataset as detailed in the paper. Also includes model weights.
2. %3Cu%3EEHRSHOT_MEDS.zip%3C/u%3E
* Dataset Version: EHRSHOT-Original
* Data Format: MEDS 0.3.3
* Notes: The original EHRSHOT dataset as detailed in the paper. It does not include any models.
3. %3Cu%3EEHRSHOT_OMOP_MEDS.zip%3C/u%3E
* Dataset Version: EHRSHOT-OMOP
* Data Format: MEDS 0.3.3 + MEDS-ETL 0.3.8
* Notes: Converts the dataset from EHRSHOT-OMOP into MEDS format via the `meds_etl_omop`command from MEDS-ETL.
4. %3Cu%3EEHRSHOT_OMOP_MEDS_Reader.zip%3C/u%3E
* Dataset Version: EHRSHOT-OMOP
* Data Format: MEDS Reader 0.1.9 + MEDS 0.3.3 + MEDS-ETL 0.3.8
* Notes: Same data as EHRSHOT_OMOP_MEDS.zip, but converted into a MEDS-Reader database for faster reads.
4. 🤖 Model
We also release the full weights of **CLMBR-T-base, **a 141M parameter clinical foundation model pretrained on the structured EHR data of 2.57M patients. Please download from https://huggingface.co/StanfordShahLab/clmbr-t-base
**5. 🧑‍💻 Code **
Please see our Github repo to obtain code for loading the dataset and running a set of pretrained baseline models: https://github.com/som-shahlab/ehrshot-benchmark/
**NOTE: You must authenticate to Redivis using your formal affiliation's email address. If you use gmail or other personal email addresses, you will not be granted access. **
Access to the EHRSHOT dataset requires the following:
Not seeing a result you expected?
Learn how you can add new datasets to our index.
đź‘‚đź’‰ EHRSHOT is a dataset for benchmarking the few-shot performance of foundation models for clinical prediction tasks. EHRSHOT contains de-identified structured data (e.g., diagnosis and procedure codes, medications, lab values) from the electronic health records (EHRs) of 6,739 Stanford Medicine patients and includes 15 prediction tasks. Unlike MIMIC-III/IV and other popular EHR datasets, EHRSHOT is longitudinal and includes data beyond ICU and emergency department patients.
⚡️Quickstart 1. To recreate the original EHRSHOT paper, download the EHRSHOT_ASSETS.zip file from the "Files" tab 2. To work with OMOP CDM formatted data, download all the tables in the "Tables" tab
⚙️ Please see the "Methodology" section below for details on the dataset and downloadable files.
1. đź“– Overview
EHRSHOT is a benchmark for evaluating models on few-shot learning for patient classification tasks. The dataset contains:
%3C!-- --%3E
2. đź’˝ Dataset
EHRSHOT is sourced from Stanford’s STARR-OMOP database.
%3C!-- --%3E
We provide two versions of the dataset:
%3C!-- --%3E
To access the raw data, please see the "Tables" and "Files"** **tabs above:
3. đź’˝ Data Files and Formats
We provide EHRSHOT in two file formats:
%3C!-- --%3E
Within the "Tables" tab...
1. %3Cu%3EEHRSHOT-OMOP%3C/u%3E
* Dataset Version: EHRSHOT-OMOP
* Notes: Contains all OMOP CDM tables for the EHRSHOT patients. Note that this dataset is slightly different than the original EHRSHOT dataset, as these tables contain the full OMOP schema rather than a filtered subset.
Within the "Files" tab...
1. %3Cu%3EEHRSHOT_ASSETS.zip%3C/u%3E
* Dataset Version: EHRSHOT-Original
* Data Format: FEMR 0.1.16
* Notes: The original EHRSHOT dataset as detailed in the paper. Also includes model weights.
2. %3Cu%3EEHRSHOT_MEDS.zip%3C/u%3E
* Dataset Version: EHRSHOT-Original
* Data Format: MEDS 0.3.3
* Notes: The original EHRSHOT dataset as detailed in the paper. It does not include any models.
3. %3Cu%3EEHRSHOT_OMOP_MEDS.zip%3C/u%3E
* Dataset Version: EHRSHOT-OMOP
* Data Format: MEDS 0.3.3 + MEDS-ETL 0.3.8
* Notes: Converts the dataset from EHRSHOT-OMOP into MEDS format via the `meds_etl_omop`command from MEDS-ETL.
4. %3Cu%3EEHRSHOT_OMOP_MEDS_Reader.zip%3C/u%3E
* Dataset Version: EHRSHOT-OMOP
* Data Format: MEDS Reader 0.1.9 + MEDS 0.3.3 + MEDS-ETL 0.3.8
* Notes: Same data as EHRSHOT_OMOP_MEDS.zip, but converted into a MEDS-Reader database for faster reads.
4. 🤖 Model
We also release the full weights of **CLMBR-T-base, **a 141M parameter clinical foundation model pretrained on the structured EHR data of 2.57M patients. Please download from https://huggingface.co/StanfordShahLab/clmbr-t-base
**5. 🧑‍💻 Code **
Please see our Github repo to obtain code for loading the dataset and running a set of pretrained baseline models: https://github.com/som-shahlab/ehrshot-benchmark/
**NOTE: You must authenticate to Redivis using your formal affiliation's email address. If you use gmail or other personal email addresses, you will not be granted access. **
Access to the EHRSHOT dataset requires the following: