EHRSHOT
redivis.com
Updated Feb 13, 2025
Cite
Shah Lab (2025). EHRSHOT [Dataset]. http://doi.org/10.57761/0gv9-nd83
Available download formats: avro, sas, parquet, spss, csv, stata, arrow, application/jsonl
Unique identifier: https://doi.org/10.57761/0gv9-nd83
Dataset updated: Feb 13, 2025
Dataset provided by: Redivis Inc.
Authors: Shah Lab
Description
Abstract

👂💉 EHRSHOT is a dataset for benchmarking the few-shot performance of foundation models for clinical prediction tasks. EHRSHOT contains de-identified structured data (e.g., diagnosis and procedure codes, medications, lab values) from the electronic health records (EHRs) of 6,739 Stanford Medicine patients and includes 15 prediction tasks. Unlike MIMIC-III/IV and other popular EHR datasets, EHRSHOT is longitudinal and includes data beyond ICU and emergency department patients.
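
To make "few-shot" concrete, here is a minimal sketch of drawing a balanced k-shot training set for one binary prediction task. It is an illustration only: the function name `sample_k_shot`, the toy labels, and the balanced sampling scheme are assumptions, not the benchmark's actual procedure, which lives in the authors' code repository.

```python
import random
from typing import Dict, List, Tuple


def sample_k_shot(
    labels: Dict[int, bool], k: int, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Pick k positive and k negative patient IDs for a binary task.

    `labels` maps a hypothetical patient ID to its binary label.
    """
    rng = random.Random(seed)
    # Sort first so the result depends only on the seed, not on dict order.
    positives = sorted(pid for pid, y in labels.items() if y)
    negatives = sorted(pid for pid, y in labels.items() if not y)
    if len(positives) < k or len(negatives) < k:
        raise ValueError("not enough examples of each class for this k")
    return rng.sample(positives, k), rng.sample(negatives, k)


# Toy labels (invented): patients 0-9, where multiples of 3 are positive.
toy_labels = {pid: pid % 3 == 0 for pid in range(10)}
pos, neg = sample_k_shot(toy_labels, k=2, seed=42)
print("positive shots:", pos)
print("negative shots:", neg)
```

A model would then be fit on these 2k labeled patients and evaluated on held-out patients; repeating the draw with different seeds gives a sense of the variance at small k.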

⚡️ Quickstart

1. To reproduce the results of the original EHRSHOT paper, download the EHRSHOT_ASSETS.zip file from the "Files" tab.
2. To work with OMOP CDM formatted data, download all the tables in the "Tables" tab.

⚙️ Please see the "Methodology" section below for details on the dataset and downloadable files.

Methodology

1. 📖 Overview

EHRSHOT is a benchmark for evaluating models on few-shot learning for patient classification tasks. The dataset contains:

* **6,739** patients
* 41.6 million clinical events
* 921,499 visits
* 15 prediction tasks
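
As a rough sense of how dense the records are, the totals above can be divided out. This is simple arithmetic on the rounded figures quoted in this section; the results are means only and say nothing about how skewed the per-patient distribution is. The events-per-visit ratio is crude, since it assumes nothing about whether every event is tied to a visit.

```python
n_patients = 6_739
n_events = 41_600_000  # "41.6 million", already rounded in the source
n_visits = 921_499

events_per_patient = n_events / n_patients
visits_per_patient = n_visits / n_patients
events_per_visit = n_events / n_visits

print(f"{events_per_patient:,.0f} events per patient")   # 6,173
print(f"{visits_per_patient:.1f} visits per patient")    # 136.7
print(f"{events_per_visit:.1f} events per visit")        # 45.1
```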

2. 💽 Dataset

EHRSHOT is sourced from Stanford’s STARR-OMOP database.

* Data follows the OMOP CDM and is fully de-identified.
* Unlike most other EHR research datasets, EHRSHOT is not restricted to ED/ICU visits and instead includes longitudinal patient data for all hospital encounter types.
* EHRSHOT does not contain clinical notes or images.

We provide two versions of the dataset:

* EHRSHOT-Original is the exact dataset used in the original EHRSHOT paper.
* EHRSHOT-OMOP is a more complete version of the EHRSHOT dataset that includes all OMOP CDM tables and additional OMOP metadata.

To access the raw data, please see the "Tables" and "Files" tabs above.

3. 💽 Data Files and Formats

We provide EHRSHOT in two file formats:

* OMOP CDM v5.4
* Medical Event Data Standard (MEDS)

Within the "Tables" tab...

1. EHRSHOT-OMOP

* Dataset Version: EHRSHOT-OMOP

* Notes: Contains all OMOP CDM tables for the EHRSHOT patients. This dataset differs slightly from the original EHRSHOT dataset because these tables contain the full OMOP schema rather than a filtered subset.
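
Because the tables follow the OMOP CDM, they can be joined on the standard `person_id` key. The sketch below uses an in-memory SQLite database with invented toy rows to count visits per person. It keeps only a few columns from the OMOP `person` and `visit_occurrence` tables, and it does not show how the downloaded tables are loaded, which depends on the export format you choose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Heavily reduced versions of two OMOP CDM tables.
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,
        year_of_birth INTEGER
    );
    CREATE TABLE visit_occurrence (
        visit_occurrence_id INTEGER PRIMARY KEY,
        person_id INTEGER REFERENCES person(person_id),
        visit_start_date TEXT
    );
    -- Toy rows, invented for illustration only.
    INSERT INTO person VALUES (1, 1950), (2, 1984);
    INSERT INTO visit_occurrence VALUES
        (10, 1, '2015-03-01'),
        (11, 1, '2016-07-19'),
        (12, 2, '2018-01-05');
    """
)

# One row per person: number of visits and date of the earliest visit.
rows = conn.execute(
    """
    SELECT p.person_id,
           COUNT(v.visit_occurrence_id) AS n_visits,
           MIN(v.visit_start_date) AS first_visit
    FROM person AS p
    LEFT JOIN visit_occurrence AS v ON v.person_id = p.person_id
    GROUP BY p.person_id
    ORDER BY p.person_id
    """
).fetchall()

for person_id, n_visits, first_visit in rows:
    print(person_id, n_visits, first_visit)
```

The `LEFT JOIN` keeps patients with no recorded visits in the result with a count of zero, which matters when building per-patient features.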

Within the "Files" tab...

1. EHRSHOT_ASSETS.zip

* Dataset Version: EHRSHOT-Original

* Data Format: FEMR 0.1.16

* Notes: The original EHRSHOT dataset as detailed in the paper. Also includes model weights.

2. EHRSHOT_MEDS.zip

* Dataset Version: EHRSHOT-Original

* Data Format: MEDS 0.3.3

* Notes: The original EHRSHOT dataset as detailed in the paper. It does not include any models.

3. EHRSHOT_OMOP_MEDS.zip

* Dataset Version: EHRSHOT-OMOP

* Data Format: MEDS 0.3.3 + MEDS-ETL 0.3.8

* Notes: The EHRSHOT-OMOP dataset converted into MEDS format via the `meds_etl_omop` command from MEDS-ETL.

4. EHRSHOT_OMOP_MEDS_Reader.zip

* Dataset Version: EHRSHOT-OMOP

* Data Format: MEDS Reader 0.1.9 + MEDS 0.3.3 + MEDS-ETL 0.3.8

* Notes: Same data as EHRSHOT_OMOP_MEDS.zip, but converted into a MEDS-Reader database for faster reads.
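
For readers new to MEDS, the core idea is that a patient record is a flat sequence of timestamped, coded events. The sketch below models that idea in plain Python and groups events into per-subject timelines. The field names `subject_id`, `time`, `code`, and `numeric_value` reflect my understanding of the MEDS schema and should be checked against the MEDS version listed above; the `Event` class, the `build_timelines` helper, and the `TOY/...` codes are hypothetical and exist only for this example.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Event:
    """One MEDS-style event (field names are an assumption, see above)."""
    subject_id: int
    time: datetime
    code: str
    numeric_value: Optional[float] = None


def build_timelines(events: List[Event]) -> Dict[int, List[Event]]:
    """Group events by subject and sort each subject's events by time."""
    timelines: Dict[int, List[Event]] = defaultdict(list)
    for event in events:
        timelines[event.subject_id].append(event)
    for subject_events in timelines.values():
        subject_events.sort(key=lambda e: e.time)
    return dict(timelines)


# Toy events with placeholder codes, deliberately out of order.
events = [
    Event(2, datetime(2020, 5, 1), "TOY/lab", 98.0),
    Event(1, datetime(2019, 1, 3), "TOY/diagnosis"),
    Event(1, datetime(2018, 6, 9), "TOY/prescription"),
]
timelines = build_timelines(events)
print([e.code for e in timelines[1]])  # ['TOY/prescription', 'TOY/diagnosis']
```

In practice the MEDS files are columnar and far too large to hold as Python objects, so a dataframe library or the MEDS-Reader database listed above is the realistic route; the grouping logic stays the same.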

4. 🤖 Model

We also release the full weights of **CLMBR-T-base**, a 141M-parameter clinical foundation model pretrained on the structured EHR data of 2.57M patients. The weights can be downloaded from https://huggingface.co/StanfordShahLab/clmbr-t-base

5. 🧑‍💻 Code

Please see our GitHub repo for code to load the dataset and run a set of pretrained baseline models: https://github.com/som-shahlab/ehrshot-benchmark/

Usage

**NOTE: You must authenticate to Redivis using the email address of your formal affiliation. If you use Gmail or another personal email address, you will not be granted access.**

Access to the EHRSHOT dataset requires the following:

Verified Affiliation with an **Academic, Government, **o

43 scholarly articles cite this dataset (View in Google Scholar)
