91 datasets found

MIMIC-III Clinical Database Demo
physionet.org
Updated Apr 24, 2019
+ more versions
Cite
Alistair Johnson; Tom Pollard; Roger Mark (2019). MIMIC-III Clinical Database Demo [Dataset]. http://doi.org/10.13026/C2HM2Q
Explore at:
Unique identifier
https://doi.org/10.13026/C2HM2Q
Dataset updated
Apr 24, 2019
Authors
Alistair Johnson; Tom Pollard; Roger Mark
License
Open Database License (ODbL) v1.0 (https://www.opendatacommons.org/licenses/odbl/1.0/)
License information was derived automatically
Description
MIMIC-III is a large, freely-available database comprising deidentified health-related data associated with over 40,000 patients who stayed in critical care units of the Beth Israel Deaconess Medical Center between 2001 and 2012 [1]. The MIMIC-III Clinical Database is available on PhysioNet (doi: 10.13026/C2XW26). Though deidentified, MIMIC-III contains detailed information regarding the care of real patients, and as such requires credentialing before access. To allow researchers to ascertain whether the database is suitable for their work, we have manually curated a demo subset, which contains information for 100 patients also present in the MIMIC-III Clinical Database. Notably, the demo dataset does not include free-text notes.
Clinical Admission Notes from MIMIC-III
opendatalab.com
paperswithcode.com
zip
Updated Sep 21, 2022
Cite
Charité – Berlin University of Medicine (2022). Clinical Admission Notes from MIMIC-III [Dataset]. https://opendatalab.com/OpenDataLab/Clinical_Admission_Notes_from_etc
Explore at:
Available download formats: zip (282276 bytes)
Dataset updated
Sep 21, 2022
Dataset provided by
Charité – Berlin University of Medicine
Beuth University of Applied Sciences Berlin
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
This dataset is created from MIMIC-III (Medical Information Mart for Intensive Care III) and contains simulated patient admission notes. The clinical notes contain information about a patient at admission time to the ICU and are labelled for four outcome prediction tasks: Diagnoses at discharge, procedures performed, in-hospital mortality and length-of-stay. To obtain the data one first has to gain access to the MIMIC-III dataset and then run the scripts introduced in the linked repository.
MIMIC-IV v2.2 Dataset
paperswithcode.com
Updated Feb 24, 2025
+ more versions
Cite
(2025). MIMIC-IV v2.2 Dataset [Dataset]. https://paperswithcode.com/dataset/mimic-iv-v2-2
Explore at:
Dataset updated
Feb 24, 2025
Description
Retrospectively collected medical data has the opportunity to improve patient care through knowledge discovery and algorithm development. Broad reuse of medical data is desirable for the greatest public good, but data sharing must be done in a manner which protects patient privacy. The Medical Information Mart for Intensive Care (MIMIC)-III database provided critical care data for over 40,000 patients admitted to intensive care units at the Beth Israel Deaconess Medical Center (BIDMC). Importantly, MIMIC-III was deidentified, and patient identifiers were removed according to the Health Insurance Portability and Accountability Act (HIPAA) Safe Harbor provision. MIMIC-III has been integral in driving large amounts of research in clinical informatics, epidemiology, and machine learning. Here we present MIMIC-IV, an update to MIMIC-III, which incorporates contemporary data and improves on numerous aspects of MIMIC-III. MIMIC-IV adopts a modular approach to data organization, highlighting data provenance and facilitating both individual and combined use of disparate data sources. MIMIC-IV is intended to carry on the success of MIMIC-III and support a broad set of applications within healthcare.
MIMIC-IV
physionet.org
Updated Oct 11, 2024
Cite
Alistair Johnson; Lucas Bulgarelli; Tom Pollard; Brian Gow; Benjamin Moody; Steven Horng; Leo Anthony Celi; Roger Mark (2024). MIMIC-IV [Dataset]. http://doi.org/10.13026/kpb9-mt58
Explore at:
Unique identifier
https://doi.org/10.13026/kpb9-mt58
Dataset updated
Oct 11, 2024
Authors
Alistair Johnson; Lucas Bulgarelli; Tom Pollard; Brian Gow; Benjamin Moody; Steven Horng; Leo Anthony Celi; Roger Mark
License
https://github.com/MIT-LCP/license-and-dua/tree/master/drafts
Description
Retrospectively collected medical data has the opportunity to improve patient care through knowledge discovery and algorithm development. Broad reuse of medical data is desirable for the greatest public good, but data sharing must be done in a manner which protects patient privacy. Here we present Medical Information Mart for Intensive Care (MIMIC)-IV, a large deidentified dataset of patients admitted to the emergency department or an intensive care unit at the Beth Israel Deaconess Medical Center in Boston, MA. MIMIC-IV contains data for over 65,000 patients admitted to an ICU and over 200,000 patients admitted to the emergency department. MIMIC-IV incorporates contemporary data and adopts a modular approach to data organization, highlighting data provenance and facilitating both individual and combined use of disparate data sources. MIMIC-IV is intended to carry on the success of MIMIC-III and support a broad set of applications within healthcare.
Annotated Question-Answer Pairs for Clinical Notes in the MIMIC-III Database...
physionet.org
Updated Jan 15, 2021
Cite
Xiang Yue; Xinliang Frederick Zhang; Huan Sun (2021). Annotated Question-Answer Pairs for Clinical Notes in the MIMIC-III Database [Dataset]. http://doi.org/10.13026/j0y6-bw05
Explore at:
Unique identifier
https://doi.org/10.13026/j0y6-bw05
Dataset updated
Jan 15, 2021
Authors
Xiang Yue; Xinliang Frederick Zhang; Huan Sun
License
https://github.com/MIT-LCP/license-and-dua/tree/master/drafts
Description
Clinical question answering (QA) (or reading comprehension) aims to automatically answer questions from medical professionals based on clinical texts. We release this dataset, which contains 1287 annotated QA pairs on 36 sampled discharge summaries from MIMIC-III Clinical Notes, to facilitate the clinical question answering task. Questions in our dataset are either verified or directly generated by clinical experts.

Note that the primary purpose of this dataset is to test the generalizability of a QA model, i.e., whether a QA model that is trained on other datasets can answer questions on this dataset (which may have a different distribution compared with the training data), rather than to train a QA model. Hence the scale of our annotations is relatively small compared to some existing QA datasets.
MIMIC_III_IPI - Discharge Summaries from MIMIC-III with Indirect Personal...
zenodo.org
Updated Mar 19, 2025
Cite
Ibrahim Baroud; Lisa Raithel; Sebastian Möller; Roland Roller (2025). MIMIC_III_IPI - Discharge Summaries from MIMIC-III with Indirect Personal Identifiers Annotations [Dataset]. http://doi.org/10.5281/zenodo.15044596
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.15044596
Dataset updated
Mar 19, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Ibrahim Baroud; Lisa Raithel; Sebastian Möller; Roland Roller
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
License information was derived automatically
Description
MIMIC_III_IPI - Discharge Summaries from Medical Information Mart for Intensive Care-III with Indirect Personal Identifiers Annotations

The discharge summaries we use for demonstrating our Indirect Personal Identifiers (IPI) schema are randomly sampled from the Medical Information Mart for Intensive Care (MIMIC-III) dataset. MIMIC-III comprises health-related data from over 40,000 patients who stayed in critical care units of the Beth Israel Deaconess Medical Center between 2001 and 2012. Among other types of data, such as patient demographics, the database also includes various types of textual data, such as diagnostic reports and discharge summaries. We chose discharge summaries for our study, since these are richer in information than other notes in MIMIC-III. Details:

Johnson, A., Pollard, T., & Mark, R. (2016). MIMIC-III Clinical Database (version 1.4). PhysioNet. https://doi.org/10.13026/C2XW26.

Johnson, A. E., Pollard, T. J., Shen, L., Lehman, L. W., Feng, M., Ghassemi, M., Moody, B., Szolovits, P., Celi, L. A., & Mark, R. G. (2016). MIMIC-III, a freely accessible critical care database. Scientific data, 3, 160035. https://doi.org/10.1038/sdata.2016.35

This dataset, Discharge Summaries from MIMIC-III with Indirect Personal Identifiers Annotations, is an external resource for the paper accepted at the PrivateNLP workshop at NAACL 2025; a preprint is available at:

Baroud, I., Raithel, L., Möller, S., & Roller, R. (2025). Beyond De-Identification: A Structured Approach for Defining and Detecting Indirect Identifiers in Medical Texts. arXiv preprint arXiv:2502.13342.

This repository contains the annotations in a CSV file and the annotation guidelines document. Inspecting the exact annotation texts requires access to the MIMIC-III Clinical Database, see https://physionet.org/content/mimiciii/1.4/. Each row in the CSV file has an ID together with a list of the IPI annotated spans, each in the format {"start": ,"end": ,"label": }. The ID in the ipi_annotations.csv table corresponds to the same ROW_ID in the MIMIC-III NOTEEVENTS.csv table and can be used for merging the tables to inspect the original documents and reconstruct the annotations using the offsets.
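
The merge-and-reconstruct step described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the span list is stored as a JSON string in a column called `annotations` (a hypothetical name), that `end` offsets are exclusive, and that note texts have already been loaded from NOTEEVENTS.csv into a dictionary keyed by ROW_ID. The note text and the `OCCUPATION` label below are invented placeholders, not MIMIC-III content or labels from the IPI schema.

```python
import json


def extract_spans(note_text, spans_json):
    """Return (label, covered_text) pairs for every annotated span.

    Assumes spans_json is a JSON list of {"start", "end", "label"} objects
    and that "end" is exclusive (an assumption, not stated in the source).
    """
    spans = json.loads(spans_json)
    return [(s["label"], note_text[s["start"]:s["end"]]) for s in spans]


def merge_annotations(annotation_rows, notes_by_row_id):
    """Join annotation rows to note texts on ID == ROW_ID."""
    merged = {}
    for row in annotation_rows:
        text = notes_by_row_id.get(row["ID"])
        if text is None:
            # No matching note loaded for this ID; skip it.
            continue
        merged[row["ID"]] = extract_spans(text, row["annotations"])
    return merged


# Toy stand-ins: a made-up note and a made-up label, for illustration only.
notes = {"101": "Patient is a retired teacher living alone."}
rows = [
    {"ID": "101",
     "annotations": '[{"start": 13, "end": 28, "label": "OCCUPATION"}]'},
]
result = merge_annotations(rows, notes)
print(result)
```

In practice the rows would come from reading ipi_annotations.csv and NOTEEVENTS.csv (for example with `csv.DictReader`), which requires credentialed access to MIMIC-III.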

Please note that only authenticated users can request access to review and download the annotations and guidelines. If you encounter any issues, feel free to reach out to the contact person.
Data from: MIMIC-III and eICU-CRD: Feature Representation by FIDDLE...
physionet.org
Updated Apr 28, 2021
Cite
Shengpu Tang; Parmida Davarmanesh; Yanmeng Song; Danai Koutra; Michael Sjoding; Jenna Wiens (2021). MIMIC-III and eICU-CRD: Feature Representation by FIDDLE Preprocessing [Dataset]. http://doi.org/10.13026/2qtg-k467
Explore at:
Unique identifier
https://doi.org/10.13026/2qtg-k467
Dataset updated
Apr 28, 2021
Authors
Shengpu Tang; Parmida Davarmanesh; Yanmeng Song; Danai Koutra; Michael Sjoding; Jenna Wiens
License
https://github.com/MIT-LCP/license-and-dua/tree/master/drafts
Description
This is a preprocessed dataset derived from patient records in MIMIC-III and eICU, two large-scale electronic health record (EHR) databases. It contains features and labels for 5 prediction tasks involving 3 adverse outcomes (prediction times listed in parentheses): in-hospital mortality (48h), acute respiratory failure (4h and 12h), and shock (4h and 12h). We extracted comprehensive, high-dimensional feature representations (up to ~8,000 features) using FIDDLE (FlexIble Data-Driven pipeLinE), an open-source preprocessing pipeline for structured clinical data. These 5 prediction tasks were designed in consultation with a critical care physician for their clinical importance, and were used as part of the proof-of-concept experiments in the original paper to demonstrate FIDDLE's utility in aiding the feature engineering step of machine learning model development. The intent of this release is to share preprocessed MIMIC-III and eICU datasets used in the experiments to support and enable reproducible machine learning research on EHR data.
Data_Sheet_1_Machine Learning Prediction Models for Mechanically Ventilated...
frontiersin.figshare.com
pdf
Updated Jun 10, 2023
Cite
Yibing Zhu; Jin Zhang; Guowei Wang; Renqi Yao; Chao Ren; Ge Chen; Xin Jin; Junyang Guo; Shi Liu; Hua Zheng; Yan Chen; Qianqian Guo; Lin Li; Bin Du; Xiuming Xi; Wei Li; Huibin Huang; Yang Li; Qian Yu (2023). Data_Sheet_1_Machine Learning Prediction Models for Mechanically Ventilated Patients: Analyses of the MIMIC-III Database.pdf [Dataset]. http://doi.org/10.3389/fmed.2021.662340.s001
Explore at:
Available download formats: pdf
Unique identifier
https://doi.org/10.3389/fmed.2021.662340.s001
Dataset updated
Jun 10, 2023
Dataset provided by
Frontiers
Authors
Yibing Zhu; Jin Zhang; Guowei Wang; Renqi Yao; Chao Ren; Ge Chen; Xin Jin; Junyang Guo; Shi Liu; Hua Zheng; Yan Chen; Qianqian Guo; Lin Li; Bin Du; Xiuming Xi; Wei Li; Huibin Huang; Yang Li; Qian Yu
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Background: Mechanically ventilated patients in the intensive care unit (ICU) have high mortality rates. There are multiple prediction scores, such as the Simplified Acute Physiology Score II (SAPS II), Oxford Acute Severity of Illness Score (OASIS), and Sequential Organ Failure Assessment (SOFA), widely used in the general ICU population. We aimed to establish prediction scores on mechanically ventilated patients with the combination of these disease severity scores and other features available on the first day of admission.

Methods: A retrospective administrative database study from the Medical Information Mart for Intensive Care (MIMIC-III) database was conducted. The exposures of interest consisted of the demographics, pre-ICU comorbidity, ICU diagnosis, disease severity scores, vital signs, and laboratory test results on the first day of ICU admission. Hospital mortality was used as the outcome. We used the machine learning methods of k-nearest neighbors (KNN), logistic regression, bagging, decision tree, random forest, Extreme Gradient Boosting (XGBoost), and neural network for model establishment. A sample of 70% of the cohort was used for the training set; the remaining 30% was applied for testing. Areas under the receiver operating characteristic curves (AUCs) and calibration plots would be constructed for the evaluation and comparison of the models' performance. The significance of the risk factors was identified through models and the top factors were reported.

Results: A total of 28,530 subjects were enrolled through the screening of the MIMIC-III database. After data preprocessing, 25,659 adult patients with 66 predictors were included in the model analyses. With the training set, the models of KNN, logistic regression, decision tree, random forest, neural network, bagging, and XGBoost were established and the testing set obtained AUCs of 0.806, 0.818, 0.743, 0.819, 0.780, 0.803, and 0.821, respectively. The calibration curves of all the models, except for the neural network, performed well. The XGBoost model performed best among the seven models. The top five predictors were age, respiratory dysfunction, SAPS II score, maximum hemoglobin, and minimum lactate.

Conclusion: The current study indicates that models with the risk of factors on the first day could be successfully established for predicting mortality in ventilated patients. The XGBoost model performs best among the seven machine learning models.
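
The evaluation protocol in the abstract above (a 70%/30% train/test split, then AUC on the held-out part) can be illustrated with a small standard-library sketch. It is not the study's code: the function names, the random seed, and the toy labels and scores are assumptions made for illustration, and AUC is computed here by direct pairwise comparison rather than with any particular library.

```python
import random


def train_test_split(items, train_fraction=0.7, seed=0):
    """Shuffle a copy of items and split it into train and test parts."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case receives
    a higher score than a randomly chosen negative case (ties count half).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: ten hypothetical patient indices and four made-up predictions.
train_ids, test_ids = train_test_split(range(10))
toy_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(len(train_ids), len(test_ids), toy_auc)
```

The pairwise form is quadratic in the number of cases, so it only suits small examples; a rank-based computation would be the usual choice for a cohort of this size.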
Additional file 1 of A novel nomogram to predict mortality in patients with...
figshare.com
springernature.figshare.com
txt
Updated Jun 5, 2023
Cite
Xiao-Dan Li; Min-Min Li (2023). Additional file 1 of A novel nomogram to predict mortality in patients with stroke: a survival analysis based on the MIMIC-III clinical database [Dataset]. http://doi.org/10.6084/m9.figshare.19533957.v1
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.19533957.v1
Dataset updated
Jun 5, 2023
Dataset provided by
figshare
Authors
Xiao-Dan Li; Min-Min Li
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Additional file 1: Raw data of relevant clinical data of stroke patients.
MIMIC PERform Datasets
data.niaid.nih.gov
explore.openaire.eu
+1 more
Updated Aug 8, 2022
Cite
Peter H Charlton (2022). MIMIC PERform Datasets [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6807402
Explore at:
Dataset updated
Aug 8, 2022
Dataset authored and provided by
Peter H Charlton
License
Open Database License (ODbL) v1.0 (https://www.opendatacommons.org/licenses/odbl/1.0/)
License information was derived automatically
Description
Overview

The MIMIC PERform datasets contain physiological signals recorded from critically-ill patients during routine clinical care. Specifically, the datasets contain the following signals:

electrocardiogram (ECG)

photoplethysmogram (PPG)

impedance pneumography (imp), also known as respiratory (resp)

The datasets were extracted from the MIMIC III Waveform Database. Further details of the datasets are provided in the documentation accompanying the ppg-beats project, which is available at: https://ppg-beats.readthedocs.io/en/latest/ .

Datasets

The following datasets are available:

MIMIC PERform AF Dataset: Recordings from 35 critically-ill adults during routine clinical care, categorised as either AF (atrial fibrillation, 19 subjects) or non-AF (16 subjects).

Matlab format (AF subjects, non-AF subjects)

WFDB format (AF subjects, non-AF subjects)

CSV format (AF subjects, non-AF subjects)

MIMIC PERform Training Dataset: Recordings from 200 patients during routine clinical care, who are categorised as either adults (100 subjects) or neonates (100 subjects).

Matlab format (all data, adults, neonates)

WFDB format (all data, adults, neonates)

CSV format (all data, adults, neonates)

MIMIC PERform Testing Dataset: Recordings from 200 patients during routine clinical care, who are categorised as either adults (100 subjects) or neonates (100 subjects).

Matlab format (all data, adults, neonates)

WFDB format (all data, adults, neonates)

CSV format (all data, adults, neonates)

Citation

When using these datasets, please cite the following publication:

Charlton PH et al. Detecting beats in the photoplethysmogram: benchmarking open-source algorithms. Physiological Measurement 2022. DOI: 10.1088/1361-6579/ac826d

Acknowledgments

Each dataset is accompanied by a licence which acknowledges the source(s) of the data - please see the individual licenses for these acknowledgements.
MIMIC-III-Ext-tPatchGNN
physionet.org
Updated Apr 9, 2025
Cite
Chenlong Yin; Weijia Zhang (2025). MIMIC-III-Ext-tPatchGNN [Dataset]. http://doi.org/10.13026/ckn0-3868
Explore at:
Unique identifier
https://doi.org/10.13026/ckn0-3868
Dataset updated
Apr 9, 2025
Authors
Chenlong Yin; Weijia Zhang
License
https://github.com/MIT-LCP/license-and-dua/tree/master/drafts
Description
This dataset is a curated subset of MIMIC-III (v1.4), specifically formatted to facilitate reproducibility of the experiments in the work t-PatchGNN. It serves as part of a benchmark designed for forecasting irregular multivariate clinical time series, that is, given a set of historical Irregular Multivariate Time Series (IMTS) observations and forecasting queries, the forecasting problem aims to accurately forecast the values in correspondence to these queries. This requires addressing key challenges such as missing data, variable sampling rates, and complex temporal dependencies. The dataset includes patient records with diverse physiological measurements, each sampled at irregular intervals, reflecting real-world clinical scenarios. It is structured to capture both short-term and long-term temporal patterns, making it well-suited for evaluating machine learning models in medical time series forecasting. By providing a standardized benchmark, this dataset aims to advance research in predictive modeling for healthcare, enabling the development of robust algorithms that can handle irregular and sparse clinical data. The dataset’s applications extend to critical areas such as early disease detection, patient risk stratification, and treatment outcome prediction, making it a valuable resource for the medical AI and machine learning communities.
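
To make the forecasting setup above concrete, the sketch below represents an irregular multivariate time series as a list of timestamped observations and answers each query with the last observed value of the queried variable. This is a naive baseline chosen only to show the shape of the inputs and outputs; it is not the t-PatchGNN model, and the class name, field names, variable names, and values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    time: float      # hours since admission (assumed unit)
    variable: str    # name of the measured variable
    value: float


def last_value_forecast(history, queries):
    """Answer each (time, variable) query with the most recent earlier value.

    Returns None for a query whose variable was never observed before the
    query time. Variables are sampled at different, irregular times, so
    each query is resolved independently.
    """
    answers = []
    for q_time, q_var in queries:
        past = [o for o in history
                if o.variable == q_var and o.time < q_time]
        if past:
            answers.append(max(past, key=lambda o: o.time).value)
        else:
            answers.append(None)
    return answers


# Made-up measurements, sampled at unequal intervals per variable.
history = [
    Observation(0.0, "heart_rate", 80.0),
    Observation(1.5, "heart_rate", 86.0),
    Observation(0.7, "lactate", 2.1),
]
queries = [(2.0, "heart_rate"), (2.0, "lactate"), (2.0, "glucose")]
forecasts = last_value_forecast(history, queries)
print(forecasts)
```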
MIMIC-III - SequenceExamples for TensorFlow modeling
physionet.org
Updated Sep 29, 2020
Cite
Jonas Kemp; Kun Zhang; Andrew Dai (2020). MIMIC-III - SequenceExamples for TensorFlow modeling [Dataset]. http://doi.org/10.13026/n2v5-5b32
Explore at:
Unique identifier
https://doi.org/10.13026/n2v5-5b32
Dataset updated
Sep 29, 2020
Authors
Jonas Kemp; Kun Zhang; Andrew Dai
License
https://github.com/MIT-LCP/license-and-dua/tree/master/drafts
Description
This dataset contains TensorFlow SequenceExamples derived from patient records in MIMIC-III, a freely available set of deidentified medical records from critical care patients at Beth Israel Deaconess Medical Center. Each SequenceExample converts data from an individual patient encounter and any previous encounters into a set of timestamped “feature lists” describing the patient history up to a certain time, beyond which predictions can be made. These data are suitable for direct input into TensorFlow modeling pipelines, and include labels for inpatient mortality and discharge diagnosis codes for each encounter. The intent of this release is to provide a preprocessed, ready-to-use version of MIMIC-III to support and enable reproducible machine learning research for electronic health records.
Multiparameter Intelligent Monitoring in Intensive Care II (MIMIC-II)
catalog.data.gov
healthdata.gov
+3 more
Updated Jul 26, 2023
Cite
National Institutes of Health (NIH) (2023). Multiparameter Intelligent Monitoring in Intensive Care II (MIMIC-II) [Dataset]. https://catalog.data.gov/dataset/multiparameter-intelligent-monitoring-in-intensive-care-ii-mimic-ii
Explore at:
Dataset updated
Jul 26, 2023
Dataset provided by
National Institutes of Health (NIH)
Description
The objective of this Bioengineering Research Partnership is to focus the resources of a powerful interdisciplinary team from academia (MIT), industry (Philips Medical Systems) and clinical medicine (Beth Israel Deaconess Medical Center) to develop and evaluate advanced ICU patient monitoring systems that will substantially improve the efficiency, accuracy and timeliness of clinical decision making in intensive care.
SQL code.
plos.figshare.com
7z
Updated Jun 21, 2023
Cite
Dengao Li; Jian Fu; Jumin Zhao; Junnan Qin; Lihui Zhang (2023). SQL code. [Dataset]. http://doi.org/10.1371/journal.pone.0276835.s001
Explore at:
Available download formats: 7z
Unique identifier
https://doi.org/10.1371/journal.pone.0276835.s001
Dataset updated
Jun 21, 2023
Dataset provided by
PLOS ONE
Authors
Dengao Li; Jian Fu; Jumin Zhao; Junnan Qin; Lihui Zhang
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
The code shows how to extract data from MIMIC-III. (7Z)
MIMIC-IV ICD-9 Dataset
paperswithcode.com
Updated Apr 20, 2023
+ more versions
Cite
Joakim Edin; Alexander Junge; Jakob D. Havtorn; Lasse Borgholt; Maria Maistro; Tuukka Ruotsalo; Lars Maaløe (2023). MIMIC-IV ICD-9 Dataset [Dataset]. https://paperswithcode.com/dataset/mimic-iv-icd-9
Explore at:
Dataset updated
Apr 20, 2023
Authors
Joakim Edin; Alexander Junge; Jakob D. Havtorn; Lasse Borgholt; Maria Maistro; Tuukka Ruotsalo; Lars Maaløe
Description
MIMIC-IV ICD-9 contains 209,326 discharge summaries (free-text medical documents) annotated with ICD-9 diagnosis and procedure codes. It contains data for patients admitted to the Beth Israel Deaconess Medical Center emergency department or ICU between 2008 and 2019. All codes with fewer than ten examples have been removed, and the train-val-test split was created using multi-label stratified sampling. The dataset is described further in Automated Medical Coding on MIMIC-III and MIMIC-IV: A Critical Review and Replicability Study, and the code to use the dataset is found here.

The dataset is intended for medical code prediction and was created using MIMIC-IV v2.2 and MIMIC-IV-NOTE v2.2. Using the two datasets requires a license obtained in Physionet; this can take a couple of days.
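
The rare-code filter mentioned in the description (dropping codes with fewer than ten examples) might look roughly like the sketch below. It is an illustration under assumptions, not the dataset authors' pipeline: the function name and the dictionary layout are hypothetical, the codes `A` and `B` are placeholders rather than real ICD-9 codes, and the source does not say what happens to a document left with no codes, so here it simply keeps an empty list. The multi-label stratified split is not shown.

```python
from collections import Counter


def drop_rare_codes(documents, min_count=10):
    """Remove codes that appear in fewer than min_count documents.

    documents maps a document id to its list of codes. A code is counted
    once per document, even if it is listed twice there.
    """
    counts = Counter(
        code for codes in documents.values() for code in set(codes)
    )
    keep = {code for code, n in counts.items() if n >= min_count}
    return {
        doc_id: [c for c in codes if c in keep]
        for doc_id, codes in documents.items()
    }


# Toy corpus: placeholder code "A" occurs in 12 documents, "B" in one.
docs = {i: ["A"] for i in range(12)}
docs[0] = ["A", "B"]
filtered = drop_rare_codes(docs)
print(filtered[0])
```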
Data from: MIMIC-III-Ext-VeriFact-BHC: Labeled Propositions From Brief...
physionet.org
Updated Apr 9, 2025
Cite
Philip Chung; Akshay Swaminathan; Alex Goodell; Yeasul Kim; Momsen Reincke; Lichy Han; Ben Deverett; Mohammad Amin Sadeghi; Abdel badih El Ariss; Marc Ghanem; David Seong; Andrew Lee; Caitlin Coombes; Brad Bradshaw; Mahir Sufian; Hyo Jung Hong; Teresa Nguyen; Mohammad Rasouli; Komal Kamra; Mark Burbridge; James McAvoy; Roya Saffary; Stephen Parnell Ma; Dev Dash; James Xie; Ellen Wang; Cliff Schmiesing; Nigam Shah; Nima Aghaeepour (2025). MIMIC-III-Ext-VeriFact-BHC: Labeled Propositions From Brief Hospital Course Summaries for Long-form Clinical Text Evaluation [Dataset]. http://doi.org/10.13026/abat-g475
Explore at:
Unique identifier
https://doi.org/10.13026/abat-g475
Dataset updated
Apr 9, 2025
Authors
Philip Chung; Akshay Swaminathan; Alex Goodell; Yeasul Kim; Momsen Reincke; Lichy Han; Ben Deverett; Mohammad Amin Sadeghi; Abdel badih El Ariss; Marc Ghanem; David Seong; Andrew Lee; Caitlin Coombes; Brad Bradshaw; Mahir Sufian; Hyo Jung Hong; Teresa Nguyen; Mohammad Rasouli; Komal Kamra; Mark Burbridge; James McAvoy; Roya Saffary; Stephen Parnell Ma; Dev Dash; James Xie; Ellen Wang; Cliff Schmiesing; Nigam Shah; Nima Aghaeepour
License
https://github.com/MIT-LCP/license-and-dua/tree/master/drafts
Description
The VeriFact-BHC dataset is designed to verify the factuality of long-form text written about a patient against their own electronic health record. There is increasing interest in using large language models (LLMs) to generate clinical text in patient care applications, yet this text needs to be evaluated for factual errors and hallucinations prior to committing text to a patient’s permanent medical record. Text written about a patient should be internally consistent with information already known about the patient, such as that stored in their medical records. VeriFact-BHC contains long-form Brief Hospital Course (BHC) clinical narratives typically found in a discharge summary that have been decomposed into text proposition statements. From 100 patients in the MIMIC-III Clinical Database v1.4, we consider two types of BHC text: a human-written BHC and an LLM-generated BHC. The original human clinician-written BHC is extracted from the discharge summary note. The LLM-generated BHC is composed by an LLM using the patient’s longitudinal clinical notes from the hospital admission. Each BHC is decomposed in two ways: sentence propositions and atomic claim propositions. The remaining electronic health record (EHR) notes for each patient serve as a patient-specific reference of facts that is used by clinicians and VeriFact to assign labels. A total of 13,070 propositions are annotated by multiple clinicians with a ground truth established via majority voting and manual adjudication. Also provided are labels assigned by the VeriFact artificial intelligence system and labels assessing whether propositions are valid from a first-order logic standpoint. The reference EHR for each patient is provided in both machine-readable and PDF formats. By offering this dataset, we hope to spur further investigation and creation of computational systems for automatic chart review and patient-specific fact verification. We invite the research community to utilize this dataset to develop better methods to guardrail patient-specific LLM-generated clinical text.
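
The description above says ground truth was established via majority voting and manual adjudication. A minimal sketch of the voting half is given below, assuming each proposition receives a list of clinician labels and that ties are the cases handed to adjudication (the source does not specify the exact rule). The function name and the label strings are hypothetical placeholders, not the dataset's actual label set.

```python
from collections import Counter


def majority_label(votes):
    """Return the strict-plurality label, or None when the top two tie.

    A None result marks a proposition that would need manual adjudication
    under the assumption stated above.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


# Placeholder labels, for illustration only.
clear_case = majority_label(["supported", "supported", "not_supported"])
tied_case = majority_label(["supported", "not_supported"])
print(clear_case, tied_case)
```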
MIMIC II
neuinfo.org
dknet.org
+1 more
Updated Sep 4, 2010
Cite
(2010). MIMIC II [Dataset]. http://identifiers.org/RRID:SCR_013237
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_013237
Dataset updated
Sep 4, 2010
Description
MIMIC II (Multiparameter Intelligent Monitoring in Intensive Care) Database contains comprehensive clinical data from tens of thousands of Intensive Care Unit (ICU) patients. Data were collected between 2001 and 2008 from a variety of ICUs (medical, surgical, coronary care, and neonatal) in a single tertiary teaching hospital. The database contains clinical data from bedside workstations as well as hospital archives. The database also includes thousands of records of continuous high-resolution physiologic waveforms and minute-by-minute numeric time series (trends) of physiologic measurements.
PDD Graph
kaggle.com
Updated Jun 15, 2017
Cite
xjtushilei (2017). PDD Graph [Dataset]. https://www.kaggle.com/xjtushilei/pdd-graph/code
Explore at:
Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jun 15, 2017
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
xjtushilei
Description
Online

Website

Github

DataHub

SPARQL endpoint

You can query some of the data online there. There is also the download link. Of course you can download it here.

Context

Electronic medical records (EMRs) contain multi-format electronic medical data that hold an abundance of medical knowledge. Faced with patients' symptoms, experienced caregivers make the right medical decisions based on professional knowledge that accurately grasps the relationships between symptoms, diagnoses, and treatments. We aim to capture these relationships by constructing a large and high-quality heterogeneous graph linking patients, diseases, and drugs (PDD) in EMRs.

Content

Specifically, we extract important medical entities from MIMIC-III (Medical Information Mart for Intensive Care III) and automatically link them with the existing biomedical knowledge graphs, including ICD-9 ontology and DrugBank. The PDD graph presented is accessible on the Web via the SPARQL endpoint, and provides a pathway for medical discovery and applications, such as effective treatment recommendations.

A subgraph of PDD is illustrated in the following figure to help readers better understand the PDD graph.

Figure: https://github.com/wangmengsd/pdd-graph/raw/master/example.png

Acknowledgements

Author

The data set belongs to Meng Wang, Jiaheng Zhang, Jun Liu, Wei Hu, Sen Wang, Wenqiang Liu and Lei Shi.

They come from: 1. MOEKLINNS lab, Xi’an Jiaotong University, Xi’an, China 2. State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China 3. Griffith University, Gold Coast Campus, Australia

Contact emails: - Meng Wang: wangmengsd@stu.xjtu.edu.cn - Lei Shi: xjtushilei@foxmail.com - Jun Liu: liukeen@xjtu.edu.cn

Research

The paper is under review and cannot be shared yet, so it is not linked here.

Inspiration

If you have any questions, please contact one of the email addresses above.

If you have any suggestions, please send them to one of the email addresses above.

License

This work is licensed under a Creative Commons Attribution 4.0 International License.

If your article needs to reference our work, you can reference our GitHub repository.
Structure Annotations of Assessment and Plan Sections from MIMIC-III
data.niaid.nih.gov
zenodo.org
Updated Apr 17, 2022
Cite
Rajkomar, Alvin (2022). Structure Annotations of Assessment and Plan Sections from MIMIC-III [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6413404
Dataset updated
Apr 17, 2022
Dataset provided by
Matias, Yossi
Hassidim, Avinatan
Benjamini, Ayelet
Barequet, Ronnie
Rajkomar, Alvin
Feder, Amir
Oren, Eyal
Ofek, Eran
Stupp, Doron
Lee, I-Ching
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Physicians record their detailed thought processes about diagnoses and treatments as unstructured text in a section of a clinical note called the "assessment and plan". This information is more clinically rich than the structured billing codes assigned for an encounter, but harder to extract reliably given the complexity of clinical language and documentation habits. To structure these sections, we collected a dataset of annotations over assessment and plan sections from the publicly available and de-identified MIMIC-III dataset, and developed deep-learning-based models to perform this task, described in the associated paper available as a pre-print at: https://www.medrxiv.org/content/10.1101/2022.04.13.22273438v1

When using this data please cite our paper:

@article{Stupp2022.04.13.22273438,
  author = {Stupp, Doron and Barequet, Ronnie and Lee, I-Ching and Oren, Eyal and Feder, Amir and Benjamini, Ayelet and Hassidim, Avinatan and Matias, Yossi and Ofek, Eran and Rajkomar, Alvin},
  title = {Structured Understanding of Assessment and Plans in Clinical Documentation},
  year = {2022},
  doi = {10.1101/2022.04.13.22273438},
  publisher = {Cold Spring Harbor Laboratory Press},
  URL = {https://www.medrxiv.org/content/early/2022/04/17/2022.04.13.22273438},
  journal = {medRxiv}
}

The dataset, presented here, contains annotations of assessment and plan sections of notes from the publicly available and de-identified MIMIC-III dataset, marking the active problems, their assessment description, and plan action items. Action items are additionally marked as one of 8 categories (listed below). The dataset contains over 30,000 annotations of 579 notes from distinct patients, annotated by 6 medical residents and students.

The dataset is divided into 4 partitions - a training set (481 notes), validation set (50 notes), test set (48 notes) and an inter-rater set. The inter-rater set contains the annotations of each of the raters over the test set. Rater 1 in the inter-rater set should be regarded as an intra-rater comparison (details in the paper). The labels underwent automatic normalization to capture entire word boundaries and remove flanking non-alphanumeric characters.
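
As a quick sanity check on the split sizes quoted above, the following is a minimal sketch that counts distinct notes per partition. It assumes the annotations have already been loaded as rows carrying the `partition` and `note_id` fields from the dataset's schema; the function name `notes_per_partition` and the sample rows are made up for illustration and are not part of the released code.

```python
from collections import defaultdict


def notes_per_partition(rows):
    """Count distinct note_ids in each partition."""
    seen = defaultdict(set)
    for row in rows:
        seen[row["partition"]].add(row["note_id"])
    return {part: len(ids) for part, ids in seen.items()}


# Made-up rows for illustration only; the real values come from the annotation CSV.
sample_rows = [
    {"partition": "train", "note_id": 1},
    {"partition": "train", "note_id": 1},  # several spans can share one note
    {"partition": "train", "note_id": 2},
    {"partition": "val", "note_id": 3},
    {"partition": "test", "note_id": 4},
    {"partition": "interrater", "note_id": 4},  # inter-rater spans reuse test notes
]

counts = notes_per_partition(sample_rows)
print(counts)

# The split sizes quoted in the description add up to the stated 579 notes.
assert 481 + 50 + 48 == 579
```

On the real file, the train, val, and test counts would be expected to come out as 481, 50, and 48 if the description is accurate.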

Code for transforming labels into TensorFlow examples and training models as described in the paper will be made available at GitHub: https://github.com/google-research/google-research/tree/master/assessment_plan_modeling

To use these annotations, the user additionally needs to obtain the text of the notes, which is found in the NOTE_EVENTS table from MIMIC-III; access to MIMIC-III must be acquired independently (https://mimic.mit.edu/).

Annotations are given as character spans in a CSV file with the following schema:

| Field | Type | Semantics |
| --- | --- | --- |
| partition | categorical (one of [train, val, test, interrater]) | The set of ratings the span belongs to. |
| rater_id | int | Unique id for each of the raters. |
| note_id | int | The note’s unique note_id, links to the MIMIC-III notes table (as ROW-ID). |
| span_type | categorical (one of [PROBLEM_TITLE, PROBLEM_DESCRIPTION, ACTION_ITEM]) | Type of the span as annotated by raters. |
| char_start | int | Character offsets from note start. |
| char_end | int | Character offsets from note start. |
| action_item_type | categorical (one of [MEDICATIONS, IMAGING, OBSERVATIONS_LABS, CONSULTS, NUTRITION, THERAPEUTIC_PROCEDURES, OTHER_DIAGNOSTIC_PROCEDURES, OTHER]) | Type of action item if the span is an action item (empty otherwise) as annotated by raters. |
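
Because the annotations are character spans, they only become readable once joined with the note text. The sketch below shows one way this join might look, using the column names from the schema. Everything else is an assumption made for illustration: the inline CSV and the note text are invented stand-ins (the real notes must come from MIMIC-III), the helper name `extract_spans` is hypothetical, and `char_end` is treated as an exclusive offset, which the description does not state and should be checked against the data.

```python
import csv
import io

# Invented stand-in for the annotation CSV, using the documented column names.
ANNOTATIONS_CSV = """partition,rater_id,note_id,span_type,char_start,char_end,action_item_type
train,1,42,PROBLEM_TITLE,0,3,
train,1,42,ACTION_ITEM,5,18,MEDICATIONS
"""

# Invented stand-in for note text keyed by the notes table row id.
NOTES = {42: "HTN: continue meds, follow up"}


def extract_spans(csv_text, notes):
    """Resolve each annotated character span to the text it covers."""
    spans = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        text = notes[int(row["note_id"])]
        start, end = int(row["char_start"]), int(row["char_end"])
        spans.append({
            "span_type": row["span_type"],
            # Empty for spans that are not action items.
            "action_item_type": row["action_item_type"] or None,
            # Assumption: char_end is exclusive, as in Python slicing.
            "text": text[start:end],
        })
    return spans


spans = extract_spans(ANNOTATIONS_CSV, NOTES)
for span in spans:
    print(span["span_type"], repr(span["text"]), span["action_item_type"])
```

If the real offsets turn out to be inclusive, the slice would need to be `text[start:end + 1]` instead.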
Medical Information Mart for Intensive Care-III
scicrunch.org
Cite
Medical Information Mart for Intensive Care-III [Dataset]. http://identifiers.org/RRID:SCR_017384
Unique identifier
https://identifiers.org/RRID:SCR_017384
Description
Collection comprising deidentified health-related data associated with patients who stayed in critical care units of Beth Israel Deaconess Medical Center between 2001 and 2012. The database includes information such as demographics, vital sign measurements made at the bedside (~1 data point per hour), laboratory test results, procedures, medications, caregiver notes, imaging reports, and mortality (both in and out of hospital).

Alistair Johnson; Tom Pollard; Roger Mark (2019). MIMIC-III Clinical Database Demo [Dataset]. http://doi.org/10.13026/C2HM2Q

MIMIC-III Clinical Database Demo

93 scholarly articles cite this dataset (View in Google Scholar)

