12 datasets found

Visual Question Answering evaluation dataset for MIMIC CXR
physionet.org
Updated Jan 28, 2025
Cite
Timo Kohlberger; Charles Lau; Tom Pollard; Andrew Sellergren; Atilla Kiraly; Fayaz Jamil (2025). Visual Question Answering evaluation dataset for MIMIC CXR [Dataset]. http://doi.org/10.13026/cvsk-ny21
Explore at:
Unique identifier
https://doi.org/10.13026/cvsk-ny21
Dataset updated
Jan 28, 2025
Authors
Timo Kohlberger; Charles Lau; Tom Pollard; Andrew Sellergren; Atilla Kiraly; Fayaz Jamil
License
https://github.com/MIT-LCP/license-and-dua/tree/master/drafts
Description
MIMIC CXR [1] is a large publicly available dataset of chest radiographs in DICOM format with free-text radiology reports. In addition, labels for the presence of 12 different chest-related pathologies, as well as of any support devices, and overall normal/abnormal status were made available via the MIMIC Chest X-ray JPG (MIMIC-CXR-JPG) [2] labels, which were generated using the CheXpert and NegBio algorithms.

Based on these labels, we created a visual question answering dataset comprising 224 questions for 48 cases from the official test set, and 111 questions for 23 validation cases. A majority (68%) of the questions are close-ended (answerable with yes or no), and focus on the presence of one out of 15 chest pathologies, or any support device, or generically on any abnormality, whereas the remaining open-ended questions inquire about the location, size, severity or type of a pathology/device, if present in the specific case, indicated by the MIMIC-CXR-JPG labels.

For each question and case we also provide a reference answer, which was authored by a board-certified radiologist (with 17 years of post-residency experience) based on the chest X-ray and original radiology report.
ReXPref-Prior: A MIMIC-CXR Preference Dataset for Reducing Hallucinated...
physionet.org
Updated Aug 14, 2024
Cite
Oishi Banerjee; Hong-Yu Zhou; Subathra Adithan; Stephen Kwak; Kay Wu; Pranav Rajpurkar (2024). ReXPref-Prior: A MIMIC-CXR Preference Dataset for Reducing Hallucinated Prior Exams in Radiology Report Generation [Dataset]. http://doi.org/10.13026/t13x-4r94
Explore at:
Unique identifier
https://doi.org/10.13026/t13x-4r94
Dataset updated
Aug 14, 2024
Authors
Oishi Banerjee; Hong-Yu Zhou; Subathra Adithan; Stephen Kwak; Kay Wu; Pranav Rajpurkar
License
https://github.com/MIT-LCP/license-and-dua/tree/master/drafts
Description
Generative vision-language models have exciting potential implications for radiology report generation, but unfortunately such models are also known to produce hallucinations and other nonsensical statements. For example, radiology report generation models regularly hallucinate prior exams, making statements such as “The lungs are hyperinflated with emphysematous changes as seen on prior CT” despite not having access to any prior exam. To address this shortcoming, we propose ReXPref-Prior, an adapted version of MIMIC-CXR where GPT-4 has removed references to prior exams from both findings and impression sections of chest X-ray reports. We expect ReXPref-Prior will be useful for training models that hallucinate prior exams less frequently, through techniques such as direct preference optimization. Additionally, ReXPref-Prior’s validation and test sets can be used as a new benchmark for evaluating report generation models.
Pulmonary Edema Severity Grades Based on MIMIC-CXR
physionet.org
Updated Feb 9, 2021
Cite
Ruizhi Liao; Geeticka Chauhan; Polina Golland; Seth Berkowitz; Steven Horng (2021). Pulmonary Edema Severity Grades Based on MIMIC-CXR [Dataset]. http://doi.org/10.13026/rz5p-rc64
Explore at:
Unique identifier
https://doi.org/10.13026/rz5p-rc64
Dataset updated
Feb 9, 2021
Authors
Ruizhi Liao; Geeticka Chauhan; Polina Golland; Seth Berkowitz; Steven Horng
License
https://github.com/MIT-LCP/license-and-dua/tree/master/drafts
Description
Clinical management decisions for patients with acutely decompensated heart failure and many other diseases are often based on grades of pulmonary edema severity, rather than its mere absence or presence. Chest radiographs are commonly performed to assess pulmonary edema. The MIMIC-CXR dataset, which consists of 377,110 chest radiographs with free-text radiology reports, offers a tremendous opportunity to study this subject.

This dataset is curated based on MIMIC-CXR, containing 3 metadata files that consist of pulmonary edema severity grades extracted from the MIMIC-CXR dataset through different means: 1) by regular expression (regex) from radiology reports, 2) by expert labeling from radiology reports, and 3) by consensus labeling from chest radiographs.

This dataset aims to support the algorithmic development of pulmonary edema assessment from chest x-ray images and benchmark its performance. The metadata files have subject IDs, study IDs, DICOM IDs, and the numerical grades of pulmonary edema severity. The IDs listed in this dataset have the same mapping structure as in MIMIC-CXR.
mimic-cxr-rad-dino
huggingface.co
Updated Mar 19, 2025
Cite
Ziao Wang (2025). mimic-cxr-rad-dino [Dataset]. https://huggingface.co/datasets/wza/mimic-cxr-rad-dino
Explore at:
Dataset updated
Mar 19, 2025
Authors
Ziao Wang
Description
wza/mimic-cxr-rad-dino dataset hosted on Hugging Face and contributed by the HF Datasets community
Data from: Image-derived cardiomegaly biomarker values for 96K chest X-rays...
physionet.org
Updated Aug 23, 2024
Cite
Benjamin Duvieusart; Felix Krones; Guy Parsons; Lionel Tarassenko; Bartlomiej W Papiez; Adam Mahdi (2024). Image-derived cardiomegaly biomarker values for 96K chest X-rays in MIMIC-CXR/MIMIC-CXR-JPG [Dataset]. http://doi.org/10.13026/kfpv-zm25
Explore at:
Unique identifier
https://doi.org/10.13026/kfpv-zm25
Dataset updated
Aug 23, 2024
Authors
Benjamin Duvieusart; Felix Krones; Guy Parsons; Lionel Tarassenko; Bartlomiej W Papiez; Adam Mahdi
License
Open Data Commons Attribution License (ODC-By) v1.0 (https://www.opendatacommons.org/licenses/by/1.0/)
License information was derived automatically
Description
Cardiomegaly is a condition characterized by an abnormal enlargement of the heart. Its identification is of paramount importance, as it is associated with a wide range of cardiac conditions. It is primarily identified via the cardiothoracic ratio (CTR); however, this metric can be inaccurate because it is affected by external factors such as breathing and body position. Multimodal approaches could mitigate these limitations by integrating non-imaging data, but reliable and explainable integration of imaging and non-imaging data remains a significant challenge. While this database does not directly use multimodal data, it aims to tackle this challenge by extracting cardiomegaly biomarkers (CTR and cardiopulmonary area ratio) from chest X-rays, thus encapsulating the relevant imaging information into individual data points and allowing easy integration of ‘imaging’ data with non-imaging data for more reliable diagnostic tools. The values were extracted from over 93,000 posterior-anterior MIMIC-CXR scans using detection and segmentation neural networks tuned for cardiac and pulmonary identification.
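As a rough illustration of the two biomarkers named above, the following sketch computes them from binary segmentation masks. It is not the authors' pipeline. It assumes that CTR is the widest horizontal extent of the heart divided by the widest horizontal extent of the thorax (approximated here by the extent of the lung mask), and that the cardiopulmonary area ratio is heart area divided by lung area. The `Mask` type and all function names are hypothetical.

```python
from typing import List, Tuple

# Hypothetical representation: a binary mask as rows of 0/1 pixel values.
Mask = List[List[int]]


def horizontal_extent(mask: Mask) -> int:
    """Pixel width between the leftmost and rightmost foreground columns."""
    cols = [x for row in mask for x, value in enumerate(row) if value]
    return (max(cols) - min(cols) + 1) if cols else 0


def area(mask: Mask) -> int:
    """Number of foreground pixels in the mask."""
    return sum(1 for row in mask for value in row if value)


def cardiomegaly_biomarkers(heart: Mask, lungs: Mask) -> Tuple[float, float]:
    """Return (CTR, cardiopulmonary area ratio) under the stated assumptions."""
    thoracic_width = horizontal_extent(lungs)  # assumption: lung extent ~ thoracic width
    lung_area = area(lungs)
    if thoracic_width == 0 or lung_area == 0:
        raise ValueError("lung mask is empty")
    ctr = horizontal_extent(heart) / thoracic_width
    area_ratio = area(heart) / lung_area
    return ctr, area_ratio
```

In practice the masks would come from a segmentation model, and the widths would need to be measured on a properly oriented posterior-anterior image.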
mimic_cxr_train_test
kaggle.com
Updated May 1, 2025
Cite
PARDHU KADAMBARI (2025). mimic_cxr_train_test [Dataset]. https://www.kaggle.com/datasets/pardhukadambari/mimic-cxr-train-test
Explore at:
Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
Dataset updated
May 1, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
PARDHU KADAMBARI
Description
Dataset

This dataset was created by PARDHU KADAMBARI

Contents
MIMIC-IV
physionet.org
Updated Oct 11, 2024
Cite
Alistair Johnson; Lucas Bulgarelli; Tom Pollard; Brian Gow; Benjamin Moody; Steven Horng; Leo Anthony Celi; Roger Mark (2024). MIMIC-IV [Dataset]. http://doi.org/10.13026/kpb9-mt58
Explore at:
Unique identifier
https://doi.org/10.13026/kpb9-mt58
Dataset updated
Oct 11, 2024
Authors
Alistair Johnson; Lucas Bulgarelli; Tom Pollard; Brian Gow; Benjamin Moody; Steven Horng; Leo Anthony Celi; Roger Mark
License
https://github.com/MIT-LCP/license-and-dua/tree/master/drafts
Description
Retrospectively collected medical data has the opportunity to improve patient care through knowledge discovery and algorithm development. Broad reuse of medical data is desirable for the greatest public good, but data sharing must be done in a manner which protects patient privacy. Here we present Medical Information Mart for Intensive Care (MIMIC)-IV, a large deidentified dataset of patients admitted to the emergency department or an intensive care unit at the Beth Israel Deaconess Medical Center in Boston, MA. MIMIC-IV contains data for over 65,000 patients admitted to an ICU and over 200,000 patients admitted to the emergency department. MIMIC-IV incorporates contemporary data and adopts a modular approach to data organization, highlighting data provenance and facilitating both individual and combined use of disparate data sources. MIMIC-IV is intended to carry on the success of MIMIC-III and support a broad set of applications within healthcare.
RadGraph (RadGraph: Extracting Clinical Entities and Relations from...
opendatalab.com
zip
Updated Jun 3, 2021
+ more versions
Cite
Stanford University (2021). RadGraph (RadGraph: Extracting Clinical Entities and Relations from Radiology Reports) [Dataset]. https://opendatalab.com/OpenDataLab/RadGraph
Explore at:
Available download formats: zip
Dataset updated
Jun 3, 2021
Dataset provided by
Harvard University
VinBrain
Stanford University
License
https://physionet.org/content/radgraph/view-license/1.0.0/
Description
RadGraph is a dataset of entities and relations in radiology reports based on our novel information extraction schema, consisting of 600 reports with 30K radiologist annotations and 221K reports with 10.5M automatically generated annotations. We release a development dataset, which contains board-certified radiologist annotations for 500 radiology reports from the MIMIC-CXR dataset (14,579 entities and 10,889 relations), and a test dataset, which contains two independent sets of board-certified radiologist annotations for 100 radiology reports split equally across the MIMIC-CXR and CheXpert datasets. We also release an inference dataset, which contains automatically generated annotations for 220,763 MIMIC-CXR reports (around 6 million entities and 4 million relations) and 500 CheXpert reports (13,783 entities and 9,908 relations) with mappings to associated chest radiographs.
mimic-iv-clinical-database-demo-2.2
kaggle.com
Updated Apr 1, 2025
+ more versions
Cite
Montassar bellah (2025). mimic-iv-clinical-database-demo-2.2 [Dataset]. https://www.kaggle.com/datasets/montassarba/mimic-iv-clinical-database-demo-2-2/data
Explore at:
Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
Dataset updated
Apr 1, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Montassar bellah
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
Abstract
The Medical Information Mart for Intensive Care (MIMIC)-IV database is comprised of deidentified electronic health records for patients admitted to the Beth Israel Deaconess Medical Center. Access to MIMIC-IV is limited to credentialed users. Here, we have provided an openly available demo of MIMIC-IV containing a subset of 100 patients. The dataset includes similar content to MIMIC-IV, but excludes free-text clinical notes. The demo may be useful for running workshops and for assessing whether MIMIC-IV is appropriate for a study before making an access request.

Background
The increasing adoption of digital electronic health records has led to the existence of large datasets that could be used to carry out important research across many areas of medicine. Research progress has been limited, however, due to limitations in the way that the datasets are curated and made available for research. The MIMIC datasets allow credentialed researchers around the world unprecedented access to real world clinical data, helping to reduce the barriers to conducting important medical research. The public availability of the data allows studies to be reproduced and collaboratively improved in ways that would not otherwise be possible.

Methods
First, the set of individuals to include in the demo was chosen. Each person in MIMIC-IV is assigned a unique subject_id. As the subject_id is randomly generated, ordering by subject_id results in a random subset of individuals. We only considered individuals with an anchor_year_group value of 2011 - 2013 or 2014 - 2016 to ensure overlap with MIMIC-CXR v2.0.0. The first 100 subject_id values that satisfied the anchor_year_group criteria were selected for the demo dataset.

All tables from MIMIC-IV were included in the demo dataset. Tables containing patient information, such as emar or labevents, were filtered using the list of selected subject_id. Tables which do not contain patient level information were included in their entirety (e.g. d_items or d_labitems). Note that all tables which do not contain patient level information are prefixed with the characters 'd_'.
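The selection and filtering steps described above can be sketched roughly as follows. This is an illustration only, not the code used to build the demo. The column names subject_id and anchor_year_group come from the description; the function names, the row-as-dictionary representation, and the exact string form of the year groups are assumptions.

```python
from typing import Dict, Iterable, List, Set

# Assumption: year groups are stored as the strings quoted in the description.
ELIGIBLE_GROUPS = {"2011 - 2013", "2014 - 2016"}


def select_demo_subjects(patients: Iterable[Dict[str, str]], n: int = 100) -> List[int]:
    """Order eligible subject_id values ascending and keep the first n."""
    eligible = sorted(
        int(row["subject_id"])
        for row in patients
        if row["anchor_year_group"] in ELIGIBLE_GROUPS
    )
    return eligible[:n]


def filter_table(rows: Iterable[Dict[str, str]], subject_ids: Set[int]) -> List[Dict[str, str]]:
    """Keep only the rows of a patient-level table that belong to selected subjects."""
    return [row for row in rows if int(row["subject_id"]) in subject_ids]
```

Dictionary tables such as d_items would bypass `filter_table` entirely, since they carry no subject_id column.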

Deidentification was performed following the same approach as the MIMIC-IV database. Protected health information (PHI) as listed in the HIPAA Safe Harbor provision was removed. Patient identifiers were replaced using a random cipher, resulting in deidentified integer identifiers for patients, hospitalizations, and ICU stays. Stringent rules were applied to structured columns based on the data type. Dates were shifted consistently using a random integer removing seasonality, day of the week, and year information. Text fields were filtered by manually curated allow and block lists, as well as context-specific regular expressions. For example, columns containing dose values were filtered to only contain numeric values. If necessary, a free-text deidentification algorithm was applied to remove PHI from free-text. Results of this algorithm were manually reviewed and verified to remove identified PHI.

Data Description
MIMIC-IV is a relational database consisting of 26 tables. For a detailed description of the database structure, see the MIMIC-IV Clinical Database page [1] or the MIMIC-IV online documentation [2]. The demo shares an identical schema and structure to the equivalent version of MIMIC-IV.

Data files are distributed in comma separated value (CSV) format following the RFC 4180 standard [3]. The dataset is also made available on Google BigQuery. Instructions for accessing the dataset on BigQuery are provided in the online MIMIC-IV documentation, under the cloud page [2].

An additional file is included: demo_subject_id.csv. This is a list of the subject_id used to filter MIMIC-IV to the demo subset.

Usage Notes
The MIMIC-IV demo provides researchers with the opportunity to better understand MIMIC-IV data.

CSV files can be opened natively using any text editor or spreadsheet program. However, as some tables are large, it may be preferable to navigate the data via a relational database. We suggest either working with the data in Google BigQuery (see the "Files" section for access details) or creating an SQLite database using the CSV files. SQLite is a lightweight database format which stores all constituent tables in a single file, and SQLite databases interoperate well with a number of software tools.
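One possible way to build such an SQLite database, using only the Python standard library, is sketched below. It assumes the CSV files sit uncompressed in a single directory and that each file starts with a header row. The function name `load_csv_dir` and the table-per-file naming are illustrative choices, not part of the MIMIC-IV distribution.

```python
import csv
import sqlite3
from pathlib import Path
from typing import List


def load_csv_dir(csv_dir: str, db_path: str) -> List[str]:
    """Load every *.csv file in csv_dir into a table named after the file stem."""
    conn = sqlite3.connect(db_path)
    loaded = []
    try:
        for csv_file in sorted(Path(csv_dir).glob("*.csv")):
            table = csv_file.stem
            with csv_file.open(newline="") as fh:
                reader = csv.reader(fh)
                header = next(reader, None)
                if header is None:  # skip empty files
                    continue
                cols = ", ".join(f'"{name}"' for name in header)
                marks = ", ".join("?" for _ in header)
                conn.execute(f'DROP TABLE IF EXISTS "{table}"')
                # Columns are created without declared types, so values are stored as text.
                conn.execute(f'CREATE TABLE "{table}" ({cols})')
                conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', reader)
            loaded.append(table)
        conn.commit()
    finally:
        conn.close()
    return loaded
```

Because every value is stored as text in this sketch, numeric comparisons need an explicit cast (for example `CAST(subject_id AS INTEGER)`), or the tables can be created with typed columns instead.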

Code is made available for use with MIMIC-IV on the MIMIC-IV code repository [4]. Code provided includes derivation of clinical concepts, tutorials, and reproducible analyses.

Release Notes
Release notes for the demo follow the release notes for the MIMIC-IV database.

Ethics
This project was approved by the Institutional Review Boards of Beth Israel Deaconess Medical Center (Boston, MA) and the Massachusetts Institute of Technology (Cambridge, MA). Requirement for individual patient consent was waived because the pr...
Data from: RadGraph: Extracting Clinical Entities and Relations from...
physionet.org
Updated Jun 3, 2021
Cite
Saahil Jain; Ashwin Agrawal; Adriel Saporta; Steven QH Truong; Du Nguyen Duong; Tan Bui; Pierre Chambon; Matthew Lungren; Andrew Ng; Curtis Langlotz; Pranav Rajpurkar (2021). RadGraph: Extracting Clinical Entities and Relations from Radiology Reports [Dataset]. http://doi.org/10.13026/hm87-5p47
Explore at:
Unique identifier
https://doi.org/10.13026/hm87-5p47
Dataset updated
Jun 3, 2021
Authors
Saahil Jain; Ashwin Agrawal; Adriel Saporta; Steven QH Truong; Du Nguyen Duong; Tan Bui; Pierre Chambon; Matthew Lungren; Andrew Ng; Curtis Langlotz; Pranav Rajpurkar
License
https://github.com/MIT-LCP/license-and-dua/tree/master/drafts
Description
RadGraph is a dataset of entities and relations in full-text radiology reports. We designed a novel information extraction (IE) schema to structure clinical information in a radiology report with four entities and three relations. Our train set consists of 500 MIMIC-CXR radiology reports annotated according to our schema by board-certified radiologists. Our test set consists of 50 MIMIC-CXR and 50 CheXpert reports, which are independently annotated by two board-certified radiologists. Additionally, we release annotations generated by a benchmark deep learning model that achieves a micro F1 of 0.82 (MIMIC-CXR test set) and 0.73 (CheXpert test set) on an evaluation metric for end-to-end relation extraction, where entity boundaries, entity types, and relation type must be correct. We use our model to automatically generate entity and relation labels across 220,763 MIMIC-CXR reports and 500 CheXpert reports, where annotations can be mapped to associated chest radiographs in the MIMIC-CXR and CheXpert datasets respectively. The dataset, which includes reports, entities, and relations, is de-identified according to the US Health Insurance Portability and Accountability Act (HIPAA). This dataset is intended to support the development of natural language processing (NLP) methods for entity and relation extraction in radiology as well as enable multi-modal use cases that can leverage entities, relations, and associated radiographs.
Data from: Eye Gaze Data for Chest X-rays
physionet.org
Updated Sep 12, 2020
Cite
Alexandros Karargyris; Satyananda Kashyap; Ismini Lourentzou; Joy Wu; Matthew Tong; Arjun Sharma; Shafiq Abedin; David Beymer; Vandana Mukherjee; Elizabeth Krupinski; Mehdi Moradi (2020). Eye Gaze Data for Chest X-rays [Dataset]. http://doi.org/10.13026/qfdz-zr67
Explore at:
Unique identifier
https://doi.org/10.13026/qfdz-zr67
Dataset updated
Sep 12, 2020
Authors
Alexandros Karargyris; Satyananda Kashyap; Ismini Lourentzou; Joy Wu; Matthew Tong; Arjun Sharma; Shafiq Abedin; David Beymer; Vandana Mukherjee; Elizabeth Krupinski; Mehdi Moradi
License
https://github.com/MIT-LCP/license-and-dua/tree/master/drafts
Description
We created a rich multimodal dataset for the Chest X-Ray (CXR) domain. The data was collected using an eye tracking system while a radiologist interpreted and read 1,083 public CXR images. The dataset contains the following aligned modalities: image, transcribed report text, dictation audio and eye gaze data. We hope this dataset can contribute to various fields of research with applications in machine learning such as deep learning explainability, multi-modal fusion, disease classification, and automated radiology report generation, to name a few. The images were selected from the MIMIC-CXR Database and were associated with studies from 1,038 subjects (female: 495, male: 543) aged 20 to 80 years.
Data from: RadCoref: Fine-tuning coreference resolution for different styles...
physionet.org
Updated Jan 30, 2024
Cite
Yuxiang Liao; Hantao Liu; Irena Spasic (2024). RadCoref: Fine-tuning coreference resolution for different styles of clinical narratives [Dataset]. http://doi.org/10.13026/z67q-xy65
Explore at:
Unique identifier
https://doi.org/10.13026/z67q-xy65
Dataset updated
Jan 30, 2024
Authors
Yuxiang Liao; Hantao Liu; Irena Spasic
License
https://github.com/MIT-LCP/license-and-dua/tree/master/drafts
Description
RadCoref is a small subset of MIMIC-CXR with manually annotated coreference mentions and clusters. The dataset is annotated by a panel of three cross-disciplinary experts with experience in clinical data processing, following the i2b2 annotation scheme with minimal modification. The dataset consists of Findings and Impression sections extracted from full radiology reports. The dataset has 950, 25 and 200 section documents for training, validation, and testing, respectively. The training and validation sets are annotated by one annotator. The test set is annotated by two human annotators independently, and their results are merged manually by the third annotator. The dataset aims to support the task of coreference resolution on radiology reports. Given that MIMIC-CXR has already been de-identified, no protected health information (PHI) is included.