11 datasets found

Data from: MIMIC-CXR-JPG - chest radiographs with structured labels
physionet.org
Updated Mar 12, 2024
Cite
Alistair Johnson; Matthew Lungren; Yifan Peng; Zhiyong Lu; Roger Mark; Seth Berkowitz; Steven Horng (2024). MIMIC-CXR-JPG - chest radiographs with structured labels [Dataset]. http://doi.org/10.13026/jsn5-t979
Unique identifier
https://doi.org/10.13026/jsn5-t979
Dataset updated
Mar 12, 2024
Authors
Alistair Johnson; Matthew Lungren; Yifan Peng; Zhiyong Lu; Roger Mark; Seth Berkowitz; Steven Horng
License
https://github.com/MIT-LCP/license-and-dua/tree/master/drafts
Description
The MIMIC Chest X-ray JPG (MIMIC-CXR-JPG) Database v2.0.0 is a large publicly available dataset of chest radiographs in JPG format with structured labels derived from free-text radiology reports. The MIMIC-CXR-JPG dataset is wholly derived from MIMIC-CXR, providing JPG format files derived from the DICOM images and structured labels derived from the free-text reports. The aim of MIMIC-CXR-JPG is to provide a convenient processed version of MIMIC-CXR, as well as to provide a standard reference for data splits and image labels. The dataset contains 377,110 JPG format images and structured labels derived from the 227,827 free-text radiology reports associated with these images. The dataset is de-identified to satisfy the US Health Insurance Portability and Accountability Act of 1996 (HIPAA) Safe Harbor requirements. Protected health information (PHI) has been removed. The dataset is intended to support a wide body of research in medicine including image understanding, natural language processing, and decision support.
Data from: MIMIC-CXR Database
physionet.org
Updated Jul 23, 2024
Cite
Alistair Johnson; Tom Pollard; Roger Mark; Seth Berkowitz; Steven Horng (2024). MIMIC-CXR Database [Dataset]. http://doi.org/10.13026/4jqj-jw95
Unique identifier
https://doi.org/10.13026/4jqj-jw95
Dataset updated
Jul 23, 2024
Authors
Alistair Johnson; Tom Pollard; Roger Mark; Seth Berkowitz; Steven Horng
License
https://github.com/MIT-LCP/license-and-dua/tree/master/drafts
Description
The MIMIC Chest X-ray (MIMIC-CXR) Database v2.0.0 is a large publicly available dataset of chest radiographs in DICOM format with free-text radiology reports. The dataset contains 377,110 images corresponding to 227,835 radiographic studies performed at the Beth Israel Deaconess Medical Center in Boston, MA. The dataset is de-identified to satisfy the US Health Insurance Portability and Accountability Act of 1996 (HIPAA) Safe Harbor requirements. Protected health information (PHI) has been removed. The dataset is intended to support a wide body of research in medicine including image understanding, natural language processing, and decision support.
MNIST and MIMIC-CXR-JPG datasets
service.tib.eu
Updated Jan 3, 2025
Cite
(2025). MNIST and MIMIC-CXR-JPG datasets [Dataset]. https://service.tib.eu/ldmservice/dataset/mnist-and-mimic-cxr-jpg-datasets
Dataset updated
Jan 3, 2025
Description
The MNIST dataset is a large dataset of handwritten digits, and the MIMIC-CXR-JPG dataset is a large dataset of chest x-ray images.
Visual Question Answering evaluation dataset for MIMIC CXR
physionet.org
Updated Jan 28, 2025
Cite
Timo Kohlberger; Charles Lau; Tom Pollard; Andrew Sellergren; Atilla Kiraly; Fayaz Jamil (2025). Visual Question Answering evaluation dataset for MIMIC CXR [Dataset]. http://doi.org/10.13026/cvsk-ny21
Unique identifier
https://doi.org/10.13026/cvsk-ny21
Dataset updated
Jan 28, 2025
Authors
Timo Kohlberger; Charles Lau; Tom Pollard; Andrew Sellergren; Atilla Kiraly; Fayaz Jamil
License
https://github.com/MIT-LCP/license-and-dua/tree/master/drafts
Description
MIMIC CXR [1] is a large publicly available dataset of chest radiographs in DICOM format with free-text radiology reports. In addition, labels for the presence of 12 different chest-related pathologies, as well as of any support devices, and overall normal/abnormal status were made available via the MIMIC Chest X-ray JPG (MIMIC-CXR-JPG) [2] labels, which were generated using the CheXpert and NegBio algorithms.

Based on these labels, we created a visual question answering dataset comprising 224 questions for 48 cases from the official test set, and 111 questions for 23 validation cases. A majority (68%) of the questions are close-ended (answerable with yes or no), and focus on the presence of one out of 15 chest pathologies, or any support device, or generically on any abnormality, whereas the remaining open-ended questions inquire about the location, size, severity or type of a pathology/device, if present in the specific case, indicated by the MIMIC-CXR-JPG labels.

For each question and case, we also provide a reference answer, which was authored by a board-certified radiologist (with 17 years of post-residency experience) based on the chest X-ray and the original radiology report.
Curated CXR report generation dataset
kaggle.com
Updated Feb 13, 2023
Cite
FinanceKim (2023). Curated CXR report generation dataset [Dataset]. https://www.kaggle.com/datasets/financekim/curated-cxr-report-generation-dataset
Dataset updated
Feb 13, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
FinanceKim
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Original Dataset Reference: OpenI, MIMIC-CXR

Curated by the authors of MedViLL (https://github.com/SuperSupermoon/MedViLL)

Image files are saved separately. The corresponding reports are saved in the MediVILL folder (in the dataset subdirectory) as .jsonl files, which can be read with pandas (https://stackoverflow.com/questions/50475635/loading-jsonl-file-as-json-objects).
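As a rough illustration of reading such report files, the sketch below parses a JSON Lines file with the Python standard library only. The file name in the usage note is hypothetical, and the field names inside each record are not documented here, so none are assumed. With pandas installed, `pandas.read_json(path, lines=True)` should give an equivalent table.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Return a list with one parsed JSON object per non-empty line."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines, which are not valid JSON
                records.append(json.loads(line))
    return records
```

For example, `read_jsonl("reports.jsonl")` (a placeholder path) would return one dictionary per report line.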
Data from: Image-derived cardiomegaly biomarker values for 96K chest X-rays...
physionet.org
Updated Aug 23, 2024
Cite
Benjamin Duvieusart; Felix Krones; Guy Parsons; Lionel Tarassenko; Bartlomiej W Papiez; Adam Mahdi (2024). Image-derived cardiomegaly biomarker values for 96K chest X-rays in MIMIC-CXR/MIMIC-CXR-JPG [Dataset]. http://doi.org/10.13026/kfpv-zm25
Unique identifier
https://doi.org/10.13026/kfpv-zm25
Dataset updated
Aug 23, 2024
Authors
Benjamin Duvieusart; Felix Krones; Guy Parsons; Lionel Tarassenko; Bartlomiej W Papiez; Adam Mahdi
License
Open Data Commons Attribution License (ODC-By) v1.0 (https://www.opendatacommons.org/licenses/by/1.0/)
License information was derived automatically
Description
Cardiomegaly is a condition characterized by an abnormal enlargement of the heart. Its identification is of paramount importance, as it is associated with a wide range of cardiac conditions. It is primarily identified via the cardiothoracic ratio (CTR); however, this metric can be inaccurate, as it is affected by external factors such as breathing and body position. Multimodal approaches could mitigate these limitations by integrating non-imaging data; however, reliable and explainable integration of imaging and non-imaging data remains a significant challenge. While this database does not directly use multimodal data, it aims to tackle this challenge by extracting cardiomegaly biomarkers (CTR and cardiopulmonary area ratio) from chest X-rays, thus encapsulating the relevant imaging information into individual data points and allowing easy integration of ‘imaging’ data with non-imaging data for more reliable diagnostic tools. The values were extracted from over 93,000 posterior-anterior MIMIC-CXR scans using detection and segmentation neural networks tuned for cardiac and pulmonary identification.
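To make the two biomarkers concrete, here is a minimal sketch that computes them from binary segmentation masks. It is not the authors' pipeline. It assumes the CTR is the widest horizontal extent of the heart divided by the widest horizontal extent of the thorax, approximates the thoracic width by the extent of the lung mask, and takes the cardiopulmonary area ratio to be heart area over lung area. The function names are hypothetical.

```python
def horizontal_extent(mask):
    """Width in pixels between the leftmost and rightmost foreground columns."""
    columns = [x for row in mask for x, value in enumerate(row) if value]
    return max(columns) - min(columns) + 1 if columns else 0


def area(mask):
    """Number of foreground pixels in a binary mask."""
    return sum(1 for row in mask for value in row if value)


def cardiomegaly_biomarkers(heart_mask, lung_mask):
    """Return (ctr, cpar) from two row-major binary masks of equal size.

    Assumption: the lung mask's horizontal extent stands in for the
    thoracic width, and CPAR is heart area divided by lung area.
    """
    thoracic_width = horizontal_extent(lung_mask)
    lung_area = area(lung_mask)
    if thoracic_width == 0 or lung_area == 0:
        raise ValueError("lung mask is empty")
    ctr = horizontal_extent(heart_mask) / thoracic_width
    cpar = area(heart_mask) / lung_area
    return ctr, cpar
```

Both values are plain floats per image, which is what makes them easy to join with tabular non-imaging data.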
Code for generating the HAIM multimodal dataset of MIMIC-IV clinical data...
physionet.org
Updated Aug 23, 2022
Cite
Luis R Soenksen; Yu Ma; Cynthia Zeng; Leonard David Jean Boussioux; Kimberly Villalobos Carballo; Liangyuan Na; Holly Wiberg; Michael Li; Ignacio Fuentes; Dimitris Bertsimas (2022). Code for generating the HAIM multimodal dataset of MIMIC-IV clinical data and x-rays [Dataset]. http://doi.org/10.13026/3f8d-qe93
Unique identifier
https://doi.org/10.13026/3f8d-qe93
Dataset updated
Aug 23, 2022
Authors
Luis R Soenksen; Yu Ma; Cynthia Zeng; Leonard David Jean Boussioux; Kimberly Villalobos Carballo; Liangyuan Na; Holly Wiberg; Michael Li; Ignacio Fuentes; Dimitris Bertsimas
License
https://github.com/MIT-LCP/license-and-dua/tree/master/drafts
Description
This resource provides code to generate a multimodal combination of the MIMIC-IV v1.0.0 and MIMIC Chest X-ray (MIMIC-CXR-JPG) v2.0.0 databases, filtered to include only patients who have had at least one chest X-ray, with the goal of validating multimodal predictive analytics in healthcare operations. The multimodal dataset generated by this code contains 34,540 individual patient files in the form of Python "pickle" object structures, covering a total of 7,279 hospitalization stays involving 6,485 unique patients. Code to extract feature embeddings and the list of pre-processed features are also included in this repository.
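A minimal sketch of iterating over such pickle files is shown below. The directory layout, the `*.pkl` file pattern, and the helper name are assumptions made for illustration; the structure of the unpickled objects is defined by the HAIM code and is not assumed here. Unpickling can execute arbitrary code, so this should only be run on files you generated yourself or otherwise trust.

```python
import pickle
from pathlib import Path


def iter_patient_files(directory, pattern="*.pkl"):
    """Yield (file name, unpickled object) for each matching file, sorted by name.

    The pattern is a hypothetical default; adjust it to the actual file names.
    """
    for path in sorted(Path(directory).glob(pattern)):
        with path.open("rb") as handle:
            yield path.name, pickle.load(handle)
```

Because the function is a generator, files are loaded one at a time, which helps when tens of thousands of patient files do not fit in memory together.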
GEMeX-CoT
huggingface.co
Updated Jun 1, 2025
Cite
Kelvin Liu (2025). GEMeX-CoT [Dataset]. https://huggingface.co/datasets/BoKelvin/GEMeX-CoT
Dataset updated
Jun 1, 2025
Authors
Kelvin Liu
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
License information was derived automatically
Description
For images, please refer to MIMIC-CXR-JPG (https://physionet.org/content/mimic-cxr-jpg/2.1.0/). After downloading, pad the shorter side with zeros and then resize the image to 336 × 336.
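The pad-then-resize step can be sketched in plain Python on a row-major grayscale image (a list of pixel rows). Two details are assumptions, since the dataset card does not state them: the zero padding is split evenly on both sides, and nearest-neighbour interpolation is used for the resize. A real pipeline would more likely use an imaging library with a smoother interpolation; this version only illustrates the geometry.

```python
def pad_to_square(pixels, fill=0):
    """Pad the shorter side of a row-major grayscale image with `fill`."""
    height, width = len(pixels), len(pixels[0])
    side = max(height, width)
    top = (side - height) // 2   # assumption: padding is centred
    left = (side - width) // 2
    padded = [[fill] * side for _ in range(side)]
    for y, row in enumerate(pixels):
        padded[top + y][left:left + width] = row
    return padded


def resize_nearest(pixels, size=336):
    """Nearest-neighbour resize of a square image to size x size."""
    side = len(pixels)
    return [
        [pixels[y * side // size][x * side // size] for x in range(size)]
        for y in range(size)
    ]


def preprocess(pixels, size=336):
    """Zero-pad to a square, then resize to size x size."""
    return resize_nearest(pad_to_square(pixels), size)
```

Padding before resizing preserves the aspect ratio of the radiograph, so anatomy is not stretched when the image is brought to a square input size.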
Data from: CheXmask Database: a large-scale dataset of anatomical...
physionet.org
Updated Jan 22, 2025
Cite
Nicolas Gaggion; Candelaria Mosquera; Martina Aineseder; Lucas Mansilla; Diego Milone; Enzo Ferrante (2025). CheXmask Database: a large-scale dataset of anatomical segmentation masks for chest x-ray images [Dataset]. http://doi.org/10.13026/3705-zg36
Unique identifier
https://doi.org/10.13026/3705-zg36
Dataset updated
Jan 22, 2025
Authors
Nicolas Gaggion; Candelaria Mosquera; Martina Aineseder; Lucas Mansilla; Diego Milone; Enzo Ferrante
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
The CheXmask Database presents a comprehensive, uniformly annotated collection of chest radiographs, constructed from five public databases: ChestX-ray8, CheXpert, MIMIC-CXR-JPG, PadChest and VinDr-CXR. The database aggregates 657,566 anatomical segmentation masks derived from images which have been processed using the HybridGNet model to ensure consistent, high-quality segmentation. To confirm the quality of the segmentations, we include in this database individual Reverse Classification Accuracy (RCA) scores for each of the segmentation masks. This dataset is intended to catalyze further innovation and refinement in the field of semantic chest X-ray analysis, offering a significant resource for researchers in the medical imaging domain.
GEMeX-ThinkVG
huggingface.co
Updated Jun 26, 2025
Cite
Kelvin Liu (2025). GEMeX-ThinkVG [Dataset]. https://huggingface.co/datasets/BoKelvin/GEMeX-ThinkVG
Dataset updated
Jun 26, 2025
Authors
Kelvin Liu
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
License information was derived automatically
Description
For images, please refer to MIMIC-CXR-JPG (https://physionet.org/content/mimic-cxr-jpg/2.1.0/). After downloading, pad the shorter side with zeros and then resize the image to 336 × 336. (Full data will be released soon)

Reference

If you find ThinkVG useful in your research, please consider citing the following paper: @misc{liu2025gemexthinkvg, title={GEMeX-ThinkVG: Towards Thinking with Visual Grounding in Medical VQA via Reinforcement Learning}, author={Bo Liu and Xiangyu… See the full description on the dataset page: https://huggingface.co/datasets/BoKelvin/GEMeX-ThinkVG.
Data from: RadGraph2: Tracking Findings Over Time in Radiology Reports
physionet.org
Updated Aug 8, 2024
Cite
Adam Dejl; Sameer Khanna; Patricia Therese Pile; Kibo Yoon; Steven QH Truong; Hanh Duong; Agustina Saenz; Pranav Rajpurkar (2024). RadGraph2: Tracking Findings Over Time in Radiology Reports [Dataset]. http://doi.org/10.13026/q65y-9688
Unique identifier
https://doi.org/10.13026/q65y-9688
Dataset updated
Aug 8, 2024
Authors
Adam Dejl; Sameer Khanna; Patricia Therese Pile; Kibo Yoon; Steven QH Truong; Hanh Duong; Agustina Saenz; Pranav Rajpurkar
License
https://github.com/MIT-LCP/license-and-dua/tree/master/drafts
Description
RadGraph2 is a dataset of 800 chest radiology reports annotated using a fine-grained entity-relationship schema, which is an expanded version of the previously introduced RadGraph dataset. In contrast with previous approaches and the original RadGraph, the new version of the information extraction schema is designed to capture not only the key findings and their context but also mentions of changes that occurred between prior radiology examinations and the more recent study. These changes may include the appearance of new conditions affecting the patient, their progression, or differences in the setup of the observed supporting devices. The information extracted from each report is represented in the form of a knowledge graph composed of clinically relevant entities and relations, which makes it easily amenable to automated processing. In addition to the dataset of manually labeled reports, we release more than 220,000 reports automatically annotated by our benchmark model. This model achieved a micro-averaged F1 score of 0.88 and 0.74 on two differently sourced withheld test sets (from MIMIC-CXR-JPG and CheXpert, respectively). We believe that RadGraph2 could facilitate the development of clinically useful systems for the automated processing of radiology reports, particularly those reasoning about the evolution of a patient’s state over time.
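For readers unfamiliar with the metric, the following sketch shows how a micro-averaged F1 score is commonly computed: true positives, false positives and false negatives are pooled over all reports before the score is taken. How RadGraph2 matches predicted entities and relations against the gold annotations is not described here, so the hashable items compared below are purely hypothetical.

```python
def micro_f1(gold, predicted):
    """Micro-averaged F1 over paired lists of sets (one set per report).

    Each set holds hashable annotation items, e.g. (text, label) tuples;
    that representation is an illustrative assumption.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    tp = fp = fn = 0
    for gold_items, predicted_items in zip(gold, predicted):
        tp += len(gold_items & predicted_items)   # correctly predicted
        fp += len(predicted_items - gold_items)   # predicted but not in gold
        fn += len(gold_items - predicted_items)   # in gold but missed
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0
```

Pooling the counts means that reports with many annotations weigh more than short ones, in contrast to a macro average, which would average per-report or per-class scores.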
108 scholarly articles cite the dataset "MIMIC-CXR-JPG - chest radiographs with structured labels" (https://doi.org/10.13026/jsn5-t979).
