
RAD-ChestCT Dataset
zenodo.org
data.niaid.nih.gov
Updated Apr 4, 2023
Rachel Lea Draelos; David Dov; Maciej A Mazurowski; Joseph Y. Lo; Ricardo Henao; Geoffrey D. Rubin; Lawrence Carin (2023). RAD-ChestCT Dataset [Dataset]. http://doi.org/10.5281/zenodo.6406114
18 scholarly articles cite this dataset (according to Google Scholar)
Unique identifier
https://doi.org/10.5281/zenodo.6406114
Dataset updated
Apr 4, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Rachel Lea Draelos; David Dov; Maciej A Mazurowski; Joseph Y. Lo; Ricardo Henao; Geoffrey D. Rubin; Lawrence Carin
Description
Overview

The RAD-ChestCT dataset is a large medical imaging dataset developed by Duke MD/PhD student Rachel Draelos during her Computer Science PhD supervised by Lawrence Carin. The full dataset includes 35,747 chest CT scans from 19,661 adult patients. This Zenodo repository contains an initial release of 3,630 chest CT scans, approximately 10% of the dataset. This dataset is of significant interest to the machine learning and medical imaging research communities.

Papers

The following published paper includes a description of how the RAD-ChestCT dataset was created: Draelos et al., "Machine-Learning-Based Multiple Abnormality Prediction with Large-Scale Chest Computed Tomography Volumes," Medical Image Analysis 2021. DOI: 10.1016/j.media.2020.101857 https://pubmed.ncbi.nlm.nih.gov/33129142/

Two additional papers leveraging the RAD-ChestCT dataset are available as preprints:

"Use HiResCAM instead of Grad-CAM for faithful explanations of convolutional neural networks" (https://arxiv.org/abs/2011.08891)

"Explainable multiple abnormality classification of chest CT volumes with deep learning" (https://arxiv.org/abs/2111.12215)

Details about the files included in this data release

Metadata Files (4)

CT_Scan_Metadata_Complete_35747.csv: includes metadata about the whole dataset, with information extracted from DICOM headers.

Extrema_5747.csv: includes coordinates for lung bounding boxes for the whole dataset. Coordinates were derived computationally using a morphological image processing lung segmentation pipeline.
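The column layout of Extrema_5747.csv is not described here, so the sketch below does not parse it. It only illustrates what lung bounding-box extrema are, assuming a binary lung mask is already available: the minimum and maximum voxel index of the lung along each axis, which can then be used to crop the volume. The helper names `lung_extrema` and `crop_to_extrema` are hypothetical, and this is not the morphological segmentation pipeline used to produce the file.

```python
import numpy as np


def lung_extrema(mask):
    """Return inclusive (min, max) indices of True voxels along each axis.

    `mask` is assumed to be a boolean 3D array marking lung voxels
    (a hypothetical input; the dataset's own masks are not distributed here).
    """
    coords = np.nonzero(mask)
    if coords[0].size == 0:
        raise ValueError("mask contains no lung voxels")
    return tuple((int(c.min()), int(c.max())) for c in coords)


def crop_to_extrema(volume, extrema):
    """Crop `volume` to the inclusive bounding box returned by lung_extrema."""
    index = tuple(slice(lo, hi + 1) for lo, hi in extrema)
    return volume[index]
```

Because the extrema are inclusive, one is added to each maximum when building the slices; otherwise the last lung slice along every axis would be cut off.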

Indications_35747.csv: includes scan indications for the whole dataset. Indications were extracted from the free-text reports.

Summary_3630.csv: includes a listing of the 3,630 scans that are part of this repository.

Label Files (3)

The label files contain abnormality × location labels for the 3,630 shared CT volumes. Each CT volume is annotated with a matrix of 84 abnormality labels × 52 location labels. Labels were extracted from the free-text reports using the Sentence Analysis for Radiology Label Extraction (SARLE) framework. For each CT scan, the label matrix has been flattened, and each CSV column header joins an abnormality and a location with an asterisk (e.g., "mass*liver"). The labels can be used as ground truth when training computer vision classifiers on the CT volumes. The label files are:

imgtrain_Abnormality_and_Location_Labels.csv (for the training set)

imgvalid_Abnormality_and_Location_Labels.csv (for the validation set)

imgtest_Abnormality_and_Location_Labels.csv (for the test set)
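The flattened headers can be turned back into a per-volume matrix by splitting each name on the asterisk. The sketch below assumes one CSV row has already been read (for example with Python's `csv` module) and that any non-label column, such as a volume identifier, has been dropped first. `unflatten_labels` is a hypothetical helper, and it orders rows and columns alphabetically, which may differ from the order the dataset authors used.

```python
import numpy as np


def unflatten_labels(columns, values):
    """Rebuild an abnormality x location matrix from flattened label columns.

    columns: header names such as "mass*liver" (abnormality*location).
    values:  label values for one CT volume, aligned with `columns`;
             strings like "1" or "1.0" are accepted.
    Returns (matrix, abnormalities, locations); rows are abnormalities.
    """
    pairs = [name.split("*", 1) for name in columns]
    abnormalities = sorted({ab for ab, _ in pairs})
    locations = sorted({loc for _, loc in pairs})
    row_of = {ab: i for i, ab in enumerate(abnormalities)}
    col_of = {loc: j for j, loc in enumerate(locations)}

    matrix = np.zeros((len(abnormalities), len(locations)), dtype=np.int8)
    for (ab, loc), value in zip(pairs, values):
        matrix[row_of[ab], col_of[loc]] = int(float(value))
    return matrix, abnormalities, locations
```

Applied to a full row of one of the label files, this would give an 84 × 52 matrix, provided every abnormality/location pair appears as a column.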

CT Volume Files (3,630)

Each CT scan is provided as a compressed 3D numpy array (npz format). The CT scans can be read using the Python package numpy, version 1.14.5 and above.
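A minimal loading sketch with numpy follows. The name of the array stored inside each .npz archive is not stated in this description, so rather than assume a key, the hypothetical helper `load_ct_volume` takes the archive's single entry and checks that it is three-dimensional.

```python
import numpy as np


def load_ct_volume(path):
    """Load the single 3D array stored in a compressed .npz file."""
    # np.load returns an NpzFile for .npz archives; the context manager
    # closes the underlying file once the array has been read into memory.
    with np.load(path) as archive:
        keys = archive.files
        if len(keys) != 1:
            raise ValueError(f"expected one array in {path}, found {keys}")
        volume = archive[keys[0]]
    if volume.ndim != 3:
        raise ValueError(f"expected a 3D array, got shape {volume.shape}")
    return volume
```

The axis order and intensity units of the stored arrays are not specified here, so they are worth confirming against the preprocessing code before training on the volumes.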

Related Code

Code related to RAD-ChestCT is publicly available on GitHub at https://github.com/rachellea.

Repositories of interest include:

https://github.com/rachellea/ct-net-models contains PyTorch code to load the RAD-ChestCT dataset and train convolutional neural network models for multiple abnormality prediction from whole CT volumes.

https://github.com/rachellea/ct-volume-preprocessing contains an end-to-end Python framework to convert CT scans from DICOM to numpy format. This code was used to prepare the RAD-ChestCT volumes.
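The repository's exact steps are not listed here. As a rough sketch of two operations a DICOM-to-numpy conversion typically needs, the code below orders slices by their position along the scanner's z-axis and converts stored pixel values to Hounsfield units (HU) with the standard DICOM RescaleSlope and RescaleIntercept attributes (HU = stored value × slope + intercept). It assumes the slices were already decoded by a DICOM reader such as pydicom; `SliceRecord` and `stack_slices_to_hu` are hypothetical names, and resampling, cropping, and any other steps the real framework may perform are omitted.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class SliceRecord:
    """Hypothetical stand-in for one decoded DICOM slice."""
    z_position: float         # e.g. the third value of ImagePositionPatient
    pixels: np.ndarray        # stored pixel values for this slice (2D)
    rescale_slope: float = 1.0
    rescale_intercept: float = 0.0


def stack_slices_to_hu(slices):
    """Sort slices along z and stack them into a 3D array in HU."""
    ordered = sorted(slices, key=lambda s: s.z_position)
    hu_slices = [
        s.pixels.astype(np.float32) * s.rescale_slope + s.rescale_intercept
        for s in ordered
    ]
    # Resulting shape: (number of slices, rows, columns).
    return np.stack(hu_slices, axis=0)
```

Sorting on the z position rather than on file order matters because DICOM files are not guaranteed to be stored or named in anatomical order.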

https://github.com/rachellea/sarle-labeler contains the Python implementation of the SARLE label extraction framework used to generate the abnormality and location label matrix from the free-text reports. SARLE has minimal dependencies, and its abnormality and location vocabularies can be easily modified to adapt SARLE to different radiologic modalities, abnormalities, and anatomical locations.
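To illustrate the kind of vocabulary-driven term search described above, here is a toy sketch, not the SARLE implementation. It assumes the input sentences have already been filtered down to those describing abnormal findings, then marks every abnormality/location pair whose vocabulary terms co-occur in a sentence. The vocabularies and the function `label_sentences` are hypothetical; editing the two dictionaries is the analogue of adapting the vocabulary to a new modality.

```python
import numpy as np

# Hypothetical toy vocabularies; each label maps to the terms that signal it.
ABNORMALITIES = {"nodule": ["nodule", "nodules"], "effusion": ["effusion"]}
LOCATIONS = {
    "right_lung": ["right lung", "right lower lobe"],
    "left_lung": ["left lung"],
}


def label_sentences(abnormal_sentences):
    """Mark (abnormality, location) pairs whose terms share a sentence."""
    ab_names = list(ABNORMALITIES)
    loc_names = list(LOCATIONS)
    matrix = np.zeros((len(ab_names), len(loc_names)), dtype=np.int8)
    for sentence in abnormal_sentences:
        text = sentence.lower()
        for i, ab in enumerate(ab_names):
            if not any(term in text for term in ABNORMALITIES[ab]):
                continue
            for j, loc in enumerate(loc_names):
                if any(term in text for term in LOCATIONS[loc]):
                    matrix[i, j] = 1
    return matrix, ab_names, loc_names
```

Plain substring matching is deliberately simple here; it ignores negation and phrasing subtleties, which is why the sketch relies on sentences having been pre-filtered to abnormal findings.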
Bibliography and analysis on studies of institutional DMP support services -...
rdm.inesctec.pt
Updated Feb 2, 2023
(2023). Bibliography and analysis on studies of institutional DMP support services - Dataset - CKAN [Dataset]. https://rdm.inesctec.pt/dataset/cs-2023-005
Dataset updated
Feb 2, 2023
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
The dataset is an .xls file that combines the bibliographic information selected to identify global studies on DMP support in different institutions and to understand how institutions provide, implement, and organize DMP support for researchers. The search was conducted on 14 May 2022 in the Scopus database, and the dataset contains the DMP-related bibliography, collected through the following steps: application of inclusion criteria; application of exclusion criteria; addition of other sources; application of exclusion criteria to the other sources; and dimensions of analysis. This dataset is one of the outputs of the PhD thesis "Research data description in multiple domains: supporting researchers with Data Management Plans", funded by national funds through the Portuguese funding agency FCT (Fundação para a Ciência e a Tecnologia), reference SFRH/BD/136332/2018, defended by Karimova Yulia on the X of X, 2023, published on X, and cited as: Karimova, Y. "Research data description in multiple domains: supporting researchers with Data Management Plans", PhD thesis, URL: XX. Specifically, this dataset includes data related to the systematic review described in Chapter 4 of the thesis.