10 datasets found
  1. A Large-Scale CT and PET/CT Dataset for Lung Cancer Diagnosis

    • cancerimagingarchive.net
    dicom, n/a, xlsx, xml
    Cite
    The Cancer Imaging Archive, A Large-Scale CT and PET/CT Dataset for Lung Cancer Diagnosis [Dataset]. http://doi.org/10.7937/TCIA.2020.NNC2-0461
    Available download formats: xml, n/a, xlsx, dicom
    Dataset authored and provided by
    The Cancer Imaging Archive
    License

    https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/

    Time period covered
    Dec 22, 2020
    Dataset funded by
    National Cancer Institute (http://www.cancer.gov/)
    Description

    This dataset consists of CT and PET-CT DICOM images of lung cancer subjects with XML Annotation files that indicate tumor location with bounding boxes. The images were retrospectively acquired from patients with suspicion of lung cancer, and who underwent standard-of-care lung biopsy and PET/CT. Subjects were grouped according to a tissue histopathological diagnosis. Patients with Names/IDs containing the letter 'A' were diagnosed with Adenocarcinoma, 'B' with Small Cell Carcinoma, 'E' with Large Cell Carcinoma, and 'G' with Squamous Cell Carcinoma.
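The letter-to-histology convention above can be expressed directly in code; a minimal sketch (the function name and the 'A0001'-style subject code format are illustrative, not part of the dataset documentation):

```python
# Histology classes encoded by the letter in each subject's Name/ID,
# per the grouping convention described in the dataset description.
HISTOLOGY_BY_LETTER = {
    "A": "Adenocarcinoma",
    "B": "Small Cell Carcinoma",
    "E": "Large Cell Carcinoma",
    "G": "Squamous Cell Carcinoma",
}

def diagnosis_from_subject_code(code: str) -> str:
    """Look up the diagnosis from a subject code whose leading letter
    encodes the histology (the code format here is hypothetical, e.g. 'A0001')."""
    letter = code[:1].upper()
    if letter not in HISTOLOGY_BY_LETTER:
        raise ValueError(f"unknown histology letter in subject code {code!r}")
    return HISTOLOGY_BY_LETTER[letter]
```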

    The images were analyzed on the mediastinum (window width, 350 HU; level, 40 HU) and lung (window width, 1,400 HU; level, –700 HU) settings. Reconstructions were made with a 2 mm slice thickness using the lung settings. The CT slice interval varies from 0.625 mm to 5 mm. Scanning modes include plain, contrast, and 3D reconstruction.
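The window settings quoted above define a linear mapping from HU values to display intensities; a minimal sketch of that mapping (pure Python; the function name is illustrative):

```python
def apply_window(hu_values, width, level, out_max=255):
    """Clip HU values to the [level - width/2, level + width/2] window and
    rescale them linearly to 0..out_max display intensities."""
    lo = level - width / 2.0
    hi = level + width / 2.0
    out = []
    for hu in hu_values:
        clipped = min(max(hu, lo), hi)
        out.append(round((clipped - lo) / (hi - lo) * out_max))
    return out

# Mediastinum window (width 350, level 40) and lung window (width 1400, level -700),
# applied to a few sample HU values.
mediastinum = apply_window([-1000, 40, 500], width=350, level=40)
lung = apply_window([-1000, -700, 0], width=1400, level=-700)
```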

    Before the examination, the patient underwent fasting for at least 6 hours, and the blood glucose of each patient was less than 11 mmol/L. Whole-body emission scans were acquired 60 minutes after the intravenous injection of 18F-FDG (4.44 MBq/kg, 0.12 mCi/kg), with patients in the supine position in the PET scanner. FDG doses and uptake times were 168.72-468.79 MBq (295.8 ± 64.8 MBq) and 27-171 min (70.4 ± 24.9 min), respectively. 18F-FDG with a radiochemical purity of 95% was provided. Patients were allowed to breathe normally during PET and CT acquisitions. Attenuation correction of PET images was performed using CT data with the hybrid segmentation method. Attenuation corrections were performed using a CT protocol (180 mAs, 120 kV, 1.0 pitch). Each study comprised one CT volume, one PET volume, and fused PET and CT images: the CT resolution was 512 × 512 pixels at 1 mm × 1 mm, the PET resolution was 200 × 200 pixels at 4.07 mm × 4.07 mm, with a slice thickness and an interslice distance of 1 mm. Both volumes were reconstructed with the same number of slices. Three-dimensional (3D) emission and transmission scans were acquired from the base of the skull to the mid femur. The PET images were reconstructed via the TrueX TOF method with a slice thickness of 1 mm.

    The location of each tumor was annotated by five academic thoracic radiologists with expertise in lung cancer to make this dataset a useful tool and resource for developing algorithms for medical diagnosis. Two of the radiologists had more than 15 years of experience and the others had more than 5 years of experience. After one of the radiologists labeled each subject, the other four performed a verification, so all five radiologists reviewed every annotation file in the dataset. Annotations were captured using LabelImg. The image annotations are saved as XML files in PASCAL VOC format, which can be parsed using the PASCAL Development Toolkit: https://pypi.org/project/pascal-voc-tools/. Python code to visualize the annotation boxes on top of the DICOM images can be downloaded here.
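Because the annotations are PASCAL VOC XML, they can also be read with the standard library alone; a minimal sketch (the sample XML mirrors the VOC layout; the filename and label values are invented for illustration):

```python
import xml.etree.ElementTree as ET

def read_voc_boxes(xml_text):
    """Parse PASCAL VOC annotation XML and return (label, xmin, ymin, xmax, ymax) tuples."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        label = obj.findtext("name")
        bb = obj.find("bndbox")
        boxes.append((
            label,
            int(bb.findtext("xmin")), int(bb.findtext("ymin")),
            int(bb.findtext("xmax")), int(bb.findtext("ymax")),
        ))
    return boxes

# Illustrative annotation in the PASCAL VOC layout.
SAMPLE = """
<annotation>
  <filename>A0001_slice042.dcm</filename>
  <object>
    <name>A</name>
    <bndbox><xmin>120</xmin><ymin>85</ymin><xmax>180</xmax><ymax>140</ymax></bndbox>
  </object>
</annotation>
"""
```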

    Two deep learning researchers used the images and the corresponding annotation files to train several well-known detection models, achieving a mean average precision (mAP) of around 0.87 on the validation set.

  2. Lung Pet Ct Dx Data Augmentation Dataset

    • universe.roboflow.com
    zip
    Updated Feb 23, 2025
    Cite
    yolo (2025). Lung Pet Ct Dx Data Augmentation Dataset [Dataset]. https://universe.roboflow.com/yolo-31k0y/lung-pet-ct-dx-data-augmentation
    Available download formats: zip
    Dataset updated
    Feb 23, 2025
    Dataset authored and provided by
    yolo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Tumors Bounding Boxes
    Description

    Lung PET CT Dx Data Augmentation

    ## Overview
    
    Lung PET CT Dx Data Augmentation is a dataset for object detection tasks - it contains Tumors annotations for 15,185 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  3. RIDER Lung PET-CT

    • cancerimagingarchive.net
    dicom, n/a
    Updated Dec 29, 2015
    Cite
    The Cancer Imaging Archive (2015). RIDER Lung PET-CT [Dataset]. http://doi.org/10.7937/k9/tcia.2015.ofip7tvm
    Available download formats: n/a, dicom
    Dataset updated
    Dec 29, 2015
    Dataset authored and provided by
    The Cancer Imaging Archive
    License

    https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/

    Time period covered
    Dec 29, 2015
    Dataset funded by
    National Cancer Institute (http://www.cancer.gov/)
    Description

    The RIDER Lung PET-CT collection was shared to facilitate the RIDER PET/CT subgroup activities. The PET/CT subgroup was responsible for: (1) archiving de-identified DICOM serial PET/CT phantom and lung cancer patient data in a public database to provide a resource for the testing and development of algorithms and imaging tools used for assessing response to therapy, (2) conducting multiple serial imaging studies of a long half-life phantom to assess systemic variance in serial PET/CT scans that is unrelated to response, and (3) identifying and recommending methods for quantifying sources of variance in PET/CT imaging with the goal of defining the change in PET measurements that may be unrelated to response to therapy, thus defining the absolute minimum effect size that should be used in the design of clinical trials using PET measurements as end points.

    About the RIDER project

    The Reference Image Database to Evaluate Therapy Response (RIDER) is a targeted data collection used to generate an initial consensus on how to harmonize data collection and analysis for quantitative imaging methods applied to measure the response to drug or radiation therapy. The National Cancer Institute (NCI) has exercised a series of contracts with specific academic sites for collection of repeat "coffee break," longitudinal phantom, and patient data for a range of imaging modalities (currently computed tomography [CT], positron emission tomography [PET]/CT, dynamic contrast-enhanced magnetic resonance imaging [DCE MRI], and diffusion-weighted [DW] MRI) and organ sites (currently lung, breast, and neuro). The methods for data collection, analysis, and results are described in the Combined RIDER White Paper Report (Sept 2008).

    The long-term goal is to provide a resource that permits harmonized methods for data collection and analysis across different commercial imaging platforms to support multi-site clinical trials, using imaging as a biomarker for therapy response. Thus, the database should permit an objective comparison of methods for data collection and analysis as a national and international resource, as described in the first RIDER white paper report (2006).

  4. Ct For Lung Cancer Diagnosis (lung Pet Ct Dx) Pascal Voc Annotions Dataset

    • universe.roboflow.com
    zip
    Updated Jun 26, 2021
    Cite
    Mehmet Fatih AKCA (2021). Ct For Lung Cancer Diagnosis (lung Pet Ct Dx) Pascal Voc Annotions Dataset [Dataset]. https://universe.roboflow.com/mehmet-fatih-akca/yolotransfer/dataset/1
    Available download formats: zip
    Dataset updated
    Jun 26, 2021
    Dataset authored and provided by
    Mehmet Fatih AKCA
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Variables measured
    Cancer Bounding Boxes
    Description

    This dataset consists of CT and PET-CT DICOM images of lung cancer subjects with XML Annotation files that indicate tumor location with bounding boxes. The images were retrospectively acquired from patients with suspicion of lung cancer, and who underwent standard-of-care lung biopsy and PET/CT. Subjects were grouped according to a tissue histopathological diagnosis. Patients with Names/IDs containing the letter 'A' were diagnosed with Adenocarcinoma, 'B' with Small Cell Carcinoma, 'E' with Large Cell Carcinoma, and 'G' with Squamous Cell Carcinoma.

    The images were analyzed on the mediastinum (window width, 350 HU; level, 40 HU) and lung (window width, 1,400 HU; level, –700 HU) settings. Reconstructions were made with a 2 mm slice thickness using the lung settings. The CT slice interval varies from 0.625 mm to 5 mm. Scanning modes include plain, contrast, and 3D reconstruction.

    Before the examination, the patient underwent fasting for at least 6 hours, and the blood glucose of each patient was less than 11 mmol/L. Whole-body emission scans were acquired 60 minutes after the intravenous injection of 18F-FDG (4.44 MBq/kg, 0.12 mCi/kg), with patients in the supine position in the PET scanner. FDG doses and uptake times were 168.72-468.79 MBq (295.8 ± 64.8 MBq) and 27-171 min (70.4 ± 24.9 min), respectively. 18F-FDG with a radiochemical purity of 95% was provided. Patients were allowed to breathe normally during PET and CT acquisitions. Attenuation correction of PET images was performed using CT data with the hybrid segmentation method. Attenuation corrections were performed using a CT protocol (180 mAs, 120 kV, 1.0 pitch). Each study comprised one CT volume, one PET volume, and fused PET and CT images: the CT resolution was 512 × 512 pixels at 1 mm × 1 mm, the PET resolution was 200 × 200 pixels at 4.07 mm × 4.07 mm, with a slice thickness and an interslice distance of 1 mm. Both volumes were reconstructed with the same number of slices. Three-dimensional (3D) emission and transmission scans were acquired from the base of the skull to the mid femur. The PET images were reconstructed via the TrueX TOF method with a slice thickness of 1 mm.

    The location of each tumor was annotated by five academic thoracic radiologists with expertise in lung cancer to make this dataset a useful tool and resource for developing algorithms for medical diagnosis. Two of the radiologists had more than 15 years of experience and the others had more than 5 years of experience. After one of the radiologists labeled each subject, the other four performed a verification, so all five radiologists reviewed every annotation file in the dataset. Annotations were captured using LabelImg. The image annotations are saved as XML files in PASCAL VOC format, which can be parsed using the PASCAL Development Toolkit: https://pypi.org/project/pascal-voc-tools/. Python code to visualize the annotation boxes on top of the DICOM images can be downloaded here.

    Two deep learning researchers used the images and the corresponding annotation files to train several well-known detection models, achieving a mean average precision (mAP) of around 0.87 on the validation set.

    Dataset link: https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=70224216

  5. An example of a multimodal lung tumor dataset

    • ieee-dataport.org
    Updated Jan 18, 2024
    Cite
    pei dang (2024). An example of a multimodal lung tumor dataset [Dataset]. https://ieee-dataport.org/documents/example-multimodal-lung-tumor-dataset
    Dataset updated
    Jan 18, 2024
    Authors
    pei dang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    CT

  6. Findings of performance evaluation.

    • plos.figshare.com
    xls
    Updated Dec 31, 2024
    + more versions
    Cite
    Abdul Rahaman Wahab Sait; Eid AlBalawi; Ramprasad Nagaraj (2024). Findings of performance evaluation. [Dataset]. http://doi.org/10.1371/journal.pone.0313386.t002
    Available download formats: xls
    Dataset updated
    Dec 31, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Abdul Rahaman Wahab Sait; Eid AlBalawi; Ramprasad Nagaraj
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Early Lung Cancer (LC) detection is essential for reducing the global mortality rate. The limitations of traditional diagnostic techniques cause challenges in identifying LC using medical imaging data. In this study, we aim to develop a robust LC detection model. Positron Emission Tomography / Computed Tomography (PET/CT) images are utilized to comprehend the metabolic and anatomical data, leading to optimal LC diagnosis. In order to extract multiple LC features, we enhance MobileNet V3 and LeViT models. The weighted sum feature fusion technique is used to generate unique LC features. The extracted features are classified using spline functions, including linear, cubic, and B-spline of Kolmogorov–Arnold Networks (KANs). We ensemble the outcomes using the soft-voting approach. The model is generalized using the Lung-PET-CT-Dx dataset. Five-fold cross-validation is used to evaluate the model. The proposed LC detection model achieves an impressive accuracy of 99.0% with a minimal loss of 0.07. In addition, limited resources are required to classify PET/CT images. The high performance underscores the potential of the proposed LC detection model in providing valuable and optimal results. The study findings can significantly improve clinical practice by presenting sophisticated and interpretable outcomes. The proposed model can be enhanced by integrating advanced feature fusion techniques.
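The soft-voting step named in the abstract averages class probabilities across the individual classifiers and picks the class with the highest mean; a minimal sketch of that ensembling rule (pure Python; the probability values and function name are invented for illustration):

```python
def soft_vote(prob_lists):
    """Average class-probability vectors from several classifiers and return
    (index of highest mean probability, list of mean probabilities)."""
    n_models = len(prob_lists)
    n_classes = len(prob_lists[0])
    means = [sum(p[c] for p in prob_lists) / n_models for c in range(n_classes)]
    return max(range(n_classes), key=means.__getitem__), means

# Three hypothetical classifiers scoring two classes, e.g. [benign, malignant]:
pred, means = soft_vote([[0.6, 0.4], [0.3, 0.7], [0.2, 0.8]])
```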

  7. SAROS - A large, heterogeneous, and sparsely annotated segmentation dataset...

    • cancerimagingarchive.net
    csv, n/a +1
    Updated Oct 29, 2023
    Cite
    The Cancer Imaging Archive (2023). SAROS - A large, heterogeneous, and sparsely annotated segmentation dataset on CT imaging data [Dataset]. http://doi.org/10.25737/SZ96-ZG60
    Available download formats: csv, n/a, nifti and zip
    Dataset updated
    Oct 29, 2023
    Dataset authored and provided by
    The Cancer Imaging Archive
    License

    https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/

    Time period covered
    Mar 7, 2024
    Dataset funded by
    National Cancer Institute (http://www.cancer.gov/)
    Description
    Sparsely Annotated Region and Organ Segmentation (SAROS) contributes a large heterogeneous semantic segmentation annotation dataset for existing CT imaging cases on TCIA. The goal of this dataset is to provide high-quality annotations for building body composition analysis tools (References: Koitka 2020 and Haubold 2023). Existing in-house segmentation models were employed to generate annotation candidates on randomly selected cases. All generated annotations were manually reviewed and corrected by medical residents and students on every fifth axial slice, while the other slices were set to an ignore label (numeric value 255).

    900 CT series from 882 patients were randomly selected from the following TCIA collections (number of CTs per collection in parentheses): ACRIN-FLT-Breast (32), ACRIN-HNSCC-FDG-PET/CT (48), ACRIN-NSCLC-FDG-PET (129), Anti-PD-1_Lung (12), Anti-PD-1_MELANOMA (2), C4KC-KiTS (175), COVID-19-NY-SBU (1), CPTAC-CM (1), CPTAC-LSCC (3), CPTAC-LUAD (1), CPTAC-PDA (8), CPTAC-UCEC (26), HNSCC (17), Head-Neck Cetuximab (12), LIDC-IDRI (133), Lung-PET-CT-Dx (17), NSCLC Radiogenomics (7), NSCLC-Radiomics (56), NSCLC-Radiomics-Genomics (20), Pancreas-CT (58), QIN-HEADNECK (94), Soft-tissue-Sarcoma (6), TCGA-HNSC (1), TCGA-LIHC (33), TCGA-LUAD (2), TCGA-LUSC (3), TCGA-STAD (2), TCGA-UCEC (1).

    A script to download and resample the images is provided in our GitHub repository: https://github.com/UMEssen/saros-dataset

    The annotations are provided in NIfTI format and were performed on 5 mm slice thickness. The annotation files define foreground labels on the same axial slices and match pixel-perfect. In total, 13 semantic body regions and 6 body part labels were annotated, each with an index that corresponds to a numeric value in the segmentation file.

    Body Regions

    1. Subcutaneous Tissue
    2. Muscle
    3. Abdominal Cavity
    4. Thoracic Cavity
    5. Bones
    6. Parotid Glands
    7. Pericardium
    8. Breast Implant
    9. Mediastinum
    10. Brain
    11. Spinal Cord
    12. Thyroid Glands
    13. Submandibular Glands

    Body Parts

    1. Torso
    2. Head
    3. Right Leg
    4. Left Leg
    5. Right Arm
    6. Left Arm
    The labels which were modified or require further commentary are listed and explained below:
    • Subcutaneous Adipose Tissue: The cutis was included in this label due to its limited differentiation in 5 mm CT.
    • Muscle: All muscular tissue was segmented contiguously and not separated into individual muscles; fascias and intermuscular fat were therefore included in the label. Inter- and intramuscular fat is subtracted automatically in the process.
    • Abdominal Cavity: This label includes the pelvis. The label does not distinguish the positional relationships of the peritoneum.
    • Mediastinum: The International Thymic Malignancy Interest Group (ITMIG) scheme was used for the segmentation guidelines.
    • Head + Neck: The neck is confined by the base of the trapezius muscle.
    • Right + Left Leg: The legs are separated from the torso by the line between the two lowest points of the Rami ossa pubis.
    • Right + Left Arm: The arms are separated from the torso by the diagonal between the most lateral point of the acromion and the tuberculum infraglenoidale.
    For reproducibility on downstream tasks, five cross-validation folds and a test set were pre-defined and are described in the provided spreadsheet. Segmentation was conducted strictly in accordance with anatomical guidelines and only modified if required for the gain of segmentation efficiency.
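Since SAROS annotates only every fifth axial slice and sets the rest to the ignore value 255, downstream training code must skip ignored slices; a minimal sketch (pure Python; the function name and toy volume are illustrative, and a label volume is assumed to be indexed as [slice][row][column]):

```python
IGNORE = 255  # numeric value of the ignore label, per the dataset description

def annotated_slices(label_volume):
    """Return indices of axial slices that carry real annotations,
    i.e. slices where not every voxel equals the ignore label."""
    keep = []
    for z, plane in enumerate(label_volume):
        if any(v != IGNORE for row in plane for v in row):
            keep.append(z)
    return keep

# Toy 5-slice volume: only slice 2 carries annotations; the rest are ignore-only.
vol = [[[255, 255], [255, 255]],
       [[255, 255], [255, 255]],
       [[0, 4], [3, 255]],  # e.g. background, thoracic cavity, abdominal cavity
       [[255, 255], [255, 255]],
       [[255, 255], [255, 255]]]
```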

  8. Image segmentations produced by the AIMI Annotations initiative

    • zenodo.org
    zip
    Updated Oct 11, 2023
    Cite
    Jeff Van Oss; Gowtham Krishnan Murugesan; Diana McCrumb; Rahul Soni (2023). Image segmentations produced by the AIMI Annotations initiative [Dataset]. http://doi.org/10.5281/zenodo.8400869
    Available download formats: zip
    Dataset updated
    Oct 11, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Jeff Van Oss; Gowtham Krishnan Murugesan; Diana McCrumb; Rahul Soni
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Imaging Data Commons (IDC) (https://imaging.datacommons.cancer.gov/) [1] connects researchers with publicly available cancer imaging data, often linked with other types of cancer data. Many of the collections have limited annotations due to the expense and effort required to create these manually. The increased capabilities of AI analysis of radiology images provide an opportunity to augment existing IDC collections with new annotation data. To further this goal, we trained several nnUNet [2] based models for a variety of radiology segmentation tasks from public datasets and used them to generate segmentations for IDC collections.

    To validate the models' performance, roughly 10% of the predictions were manually reviewed and corrected by both a board-certified radiologist and a medical student (non-expert). Additionally, the non-expert looked at all the AI predictions and rated them on a 5-point Likert scale.

    This record provides the AI segmentations, the manually corrected segmentations, and the manual scores for the inspected IDC collection images.

    List of all tasks and IDC collections analyzed.

    File: breast-fdg-pet-ct.zip
    Segmentation task: FDG-avid lesions in breast from FDG PET/CT scans
    IDC collections: QIN-Breast
    Links: model weights, github

    File: kidney-ct.zip
    Segmentation task: Kidney, Tumor, and Cysts from contrast-enhanced CT scans
    IDC collections: TCGA-KIRC
    Links: model weights, github

    File: liver-ct.zip
    Segmentation task: Liver from CT scans
    IDC collections: TCGA-LIHC
    Links: model weights, github

    File: liver-mr.zip
    Segmentation task: Liver from T1 MRI scans
    IDC collections: TCGA-LIHC
    Links: model weights, github

    File: lung-ct.zip
    Segmentation task: Lung and Nodules (3mm-30mm) from CT scans
    IDC collections: ACRIN-NSCLC-FDG-PET, Anti-PD-1-Lung, LUNG-PET-CT-Dx, NSCLC Radiogenomics, RIDER Lung PET-CT, TCGA-LUAD, TCGA-LUSC
    Links: model weights 1, model weights 2, github

    File: lung-fdg-pet-ct.zip
    Segmentation task: Lungs and FDG-avid lesions in the lung from FDG PET/CT scans
    IDC collections: ACRIN-NSCLC-FDG-PET, Anti-PD-1-Lung, LUNG-PET-CT-Dx, NSCLC Radiogenomics, RIDER Lung PET-CT, TCGA-LUAD, TCGA-LUSC
    Links: model weights, github

    File: prostate-mr.zip
    Segmentation task: Prostate from T2 MRI scans
    IDC collections: ProstateX
    Links: model weights, github

    Likert Score - Definition
    5 - Strongly agree: Use as-is (clinically acceptable; could be used for treatment without change)
    4 - Agree: Minor edits that are not necessary. Stylistic differences, but not clinically important; the current segmentation is acceptable
    3 - Neither agree nor disagree: Minor edits that are necessary. Minor edits are those the reviewer judges can be made in less time than starting from scratch, or that are expected to have minimal effect on treatment outcome
    2 - Disagree: Major edits. The necessary edits are required to ensure correctness and are significant enough that the user would prefer to start from scratch
    1 - Strongly disagree: Unusable. The quality of the automatic annotation is so poor that it is unusable

    Each zip file in the collection corresponds to a specific segmentation task. The common folder structure is:

    ai-segmentations-dcm: contains the AI model predictions in DICOM-SEG format for all analyzed IDC collection files.
    qa-segmentations-dcm: contains manually corrected segmentation files, based on the AI predictions, in DICOM-SEG format. Only a fraction (~10%) of the AI predictions were corrected. Corrections were performed by radiologists (rad*) and non-experts (ne*).
    qa-results.csv: CSV file linking the study/series UIDs with the AI segmentation file, the radiologist-corrected segmentation file, and radiologist ratings of AI performance.
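Filtering qa-results.csv down to the manually reviewed series can be done with the standard csv module; a minimal sketch (the column names follow the qa-results.csv description given for the companion Zenodo record later in this listing, and the sample rows are invented):

```python
import csv
import io

def reviewed_series(csv_text):
    """Yield (Collection, PatientID, Reviewer) for rows flagged as manually reviewed."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["Validation"].strip().lower() == "true":
            yield row["Collection"], row["PatientID"], row["Reviewer"]

# Invented sample rows in the documented column layout.
SAMPLE = """Collection,PatientID,StudyInstanceUID,SeriesInstanceUID,Validation,Reviewer
TCGA-LUAD,TCGA-05-4244,1.2.3,1.2.3.4,true,rad1
RIDER Lung PET-CT,RIDER-001,1.2.5,1.2.5.6,false,
"""
```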

  9. AI-derived and Manually corrected segmentations for various IDC Collections

    • zenodo.org
    zip
    Updated Oct 11, 2023
    Cite
    Jeff Van Oss; Gowtham Krishnan Murugesan; Diana McCrumb; Rahul Soni (2023). AI-derived and Manually corrected segmentations for various IDC Collections [Dataset]. http://doi.org/10.5281/zenodo.8350738
    Available download formats: zip
    Dataset updated
    Oct 11, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Jeff Van Oss; Gowtham Krishnan Murugesan; Diana McCrumb; Rahul Soni
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Imaging Data Commons (IDC) (https://imaging.datacommons.cancer.gov/) [1] connects researchers with publicly available cancer imaging data, often linked with other types of cancer data. Many of the collections have limited annotations due to the expense and effort required to create these manually. The increased capabilities of AI analysis of radiology images provide an opportunity to augment existing IDC collections with new annotation data. To further this goal, we trained several nnUNet [2] based models for a variety of radiology segmentation tasks from public datasets and used them to generate segmentations for IDC collections.

    To validate the models' performance, roughly 10% of the predictions were manually reviewed and corrected by both a board-certified radiologist and a medical student (non-expert). Additionally, the non-expert looked at all the AI predictions and rated them on a 5-point Likert scale.

    This record provides the AI segmentations, the manually corrected segmentations, and the manual scores for the inspected IDC collection images.

    List of all tasks and IDC collections analyzed.

    File: breast-fdg-pet-ct.zip
    Segmentation task: FDG-avid lesions in breast from FDG PET/CT scans
    IDC collections: QIN-Breast
    Links: model weights, github

    File: kidney-ct.zip
    Segmentation task: Kidney, Tumor, and Cysts from contrast-enhanced CT scans
    IDC collections: TCGA-KIRC
    Links: model weights, github

    File: liver-ct.zip
    Segmentation task: Liver from CT scans
    IDC collections: TCGA-LIHC
    Links: model weights, github

    File: liver-mr.zip
    Segmentation task: Liver from T1 MRI scans
    IDC collections: TCGA-LIHC
    Links: model weights, github

    File: lung-ct.zip
    Segmentation task: Lung and Nodules (3mm-30mm) from CT scans
    IDC collections: ACRIN-NSCLC-FDG-PET, Anti-PD-1-Lung, LUNG-PET-CT-Dx, NSCLC Radiogenomics, RIDER Lung PET-CT, TCGA-LUAD, TCGA-LUSC
    Links: model weights 1, model weights 2, github

    File: lung-fdg-pet-ct.zip
    Segmentation task: Lungs and FDG-avid lesions in the lung from FDG PET/CT scans
    IDC collections: ACRIN-NSCLC-FDG-PET, Anti-PD-1-Lung, LUNG-PET-CT-Dx, NSCLC Radiogenomics, RIDER Lung PET-CT, TCGA-LUAD, TCGA-LUSC
    Links: model weights, github

    File: prostate-mr.zip
    Segmentation task: Prostate from T2 MRI scans
    IDC collections: ProstateX
    Links: model weights, github

    Likert Score - Definition
    5 - Strongly agree: Use as-is (clinically acceptable; could be used for treatment without change)
    4 - Agree: Minor edits that are not necessary. Stylistic differences, but not clinically important; the current segmentation is acceptable
    3 - Neither agree nor disagree: Minor edits that are necessary. Minor edits are those the reviewer judges can be made in less time than starting from scratch, or that are expected to have minimal effect on treatment outcome
    2 - Disagree: Major edits. The necessary edits are required to ensure correctness and are significant enough that the user would prefer to start from scratch
    1 - Strongly disagree: Unusable. The quality of the automatic annotation is so poor that it is unusable

    Each zip file in the collection corresponds to a specific segmentation task. The common folder structure is:

    ai-segmentations-dcm: contains the AI model predictions in DICOM-SEG format for all analyzed IDC collection files.
    qa-segmentations-dcm: contains manually corrected segmentation files, based on the AI predictions, in DICOM-SEG format. Only a fraction (~10%) of the AI predictions were corrected. Corrections were performed by radiologists (rad*) and non-experts (ne*).
    qa-results.csv: CSV file linking the study/series UIDs with the AI segmentation file, the radiologist-corrected segmentation file, and radiologist ratings of AI performance.

  10. Data from: Image segmentations produced by BAMF under the AIMI Annotations...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Sep 27, 2024
    + more versions
    Cite
    Murugesan, Gowtham Krishnan (2024). Image segmentations produced by BAMF under the AIMI Annotations initiative [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8345959
    Dataset updated
    Sep 27, 2024
    Dataset provided by
    Soni, Rahul
    Murugesan, Gowtham Krishnan
    Van Oss, Jeff
    McCrumb, Diana
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Imaging Data Commons (IDC)(https://imaging.datacommons.cancer.gov/) [1] connects researchers with publicly available cancer imaging data, often linked with other types of cancer data. Many of the collections have limited annotations due to the expense and effort required to create these manually. The increased capabilities of AI analysis of radiology images provide an opportunity to augment existing IDC collections with new annotation data. To further this goal, we trained several nnUNet [2] based models for a variety of radiology segmentation tasks from public datasets and used them to generate segmentations for IDC collections.

    To validate the model's performance, roughly 10% of the AI predictions were assigned to a validation set. For this set, a board-certified radiologist graded the quality of AI predictions on a Likert scale. If they did not 'strongly agree' with the AI output, the reviewer corrected the segmentation.

    This record provides the AI segmentations, the manually corrected segmentations, and the manual scores for the inspected IDC collection images.

    Only 10% of the AI-derived annotations provided in this dataset are verified by expert radiologists. More details on model training and annotations are provided within the associated manuscript to ensure transparency and reproducibility.

    This work was done in two stages. Versions 1.x of this record are from the first stage; versions 2.x added additional records. In the version 1.x collections, a medical student (non-expert) reviewed all the AI predictions and rated them on a 5-point Likert scale; for any AI predictions in the validation set that they did not 'strongly agree' with, the non-expert provided corrected segmentations. This non-expert was not utilized for the version 2.x additional records.

    Likert Score Definition:

    Guidelines for reviewers to grade the quality of AI segmentations.

    5 Strongly Agree - Use as-is (i.e., clinically acceptable, and could be used for treatment without change)

    4 Agree - Minor edits that are not necessary. Stylistic differences, but not clinically important; the current segmentation is acceptable

    3 Neither agree nor disagree - Minor edits that are necessary. Minor edits are those the reviewer judges can be made in less time than starting from scratch, or that are expected to have minimal effect on treatment outcome

    2 Disagree - Major edits. The necessary edits are sufficiently significant that the user would prefer to start from scratch

    1 Strongly disagree - Unusable. The quality of the automatic annotation is so poor that it is unusable
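    The scale above can be expressed as a small lookup table, handy when decoding the numeric rating columns in qa-results.csv. This is only a sketch: the short labels abbreviate the definitions above, and the helper reflects the protocol that any rating below 'strongly agree' triggered a manual correction.

    ```python
    # Sketch: 5-point Likert scale from the review guidelines as a lookup table.
    # Labels are abbreviations of the full definitions in the record text.
    LIKERT_LABELS = {
        5: "Strongly Agree - use as-is",
        4: "Agree - minor, unnecessary edits",
        3: "Neither agree nor disagree - minor, necessary edits",
        2: "Disagree - major edits",
        1: "Strongly disagree - unusable",
    }

    def needs_correction(score: int) -> bool:
        """True when the review protocol called for a corrected segmentation."""
        return score < 5
    ```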

    Zip File Folder Structure

    Each zip file in the collection correlates to a specific segmentation task. The common folder structure is

    ai-segmentations-dcm: Contains the AI model predictions in DICOM-SEG format for all analyzed IDC collection files.

    qa-segmentations-dcm: Contains the manually corrected segmentation files, based on the AI predictions, in DICOM-SEG format. Only a fraction (~10%) of the AI predictions were corrected. Corrections were performed by radiologists (rad*) and non-experts (ne*).

    qa-results.csv: CSV file linking the study/series UIDs with the AI segmentation file, the reviewer-corrected segmentation file, and the reviewer ratings of AI performance.
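    As a sketch of how these pieces fit together, validated cases can be paired with their AI and corrected segmentation files using the stdlib csv module and the column names documented in the qa-results.csv Columns section. The sample rows below are invented for illustration.

    ```python
    import csv
    import io
    import os

    # Invented sample rows mimicking qa-results.csv; column names follow the
    # documented schema. An empty CorrectedSegmentation means the reviewer
    # strongly agreed with the AI output and provided no correction file.
    SAMPLE = """Collection,PatientID,StudyInstanceUID,SeriesInstanceUID,Validation,Reviewer,AISegmentation,CorrectedSegmentation
TCGA-LIHC,TCGA-XX-0001,1.2.3,1.2.3.4,true,rad1,pred_0001.dcm,corr_0001.dcm
TCGA-LIHC,TCGA-XX-0002,1.2.5,1.2.5.6,false,,pred_0002.dcm,
"""

    def load_validated(fh):
        """Yield (ai_path, corrected_path) for rows that were manually reviewed.

        Paths are resolved against the two segmentation directories described
        in the folder structure above.
        """
        for row in csv.DictReader(fh):
            if row["Validation"].lower() == "true":
                ai = os.path.join("ai-segmentations-dcm", row["AISegmentation"])
                corr = (os.path.join("qa-segmentations-dcm", row["CorrectedSegmentation"])
                        if row["CorrectedSegmentation"] else None)
                yield ai, corr

    pairs = list(load_validated(io.StringIO(SAMPLE)))
    ```

    Only the first sample row survives the filter, since the second was not part of the validation set.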

    qa-results.csv Columns

    The qa-results.csv file contains metadata about the segmentations and their related IDC case images, as well as the Likert ratings and comments by the reviewers.

    Collection: The name of the IDC collection for this case.

    PatientID: PatientID in the DICOM metadata of the scan; also called Case ID in the IDC.

    StudyInstanceUID: StudyInstanceUID in the DICOM metadata of the scan.

    SeriesInstanceUID: SeriesInstanceUID in the DICOM metadata of the scan.

    Validation: true/false; whether this scan was manually reviewed.

    Reviewer: Coded ID of the reviewer. Radiologist IDs start with 'rad'; non-expert IDs start with 'ne'.

    AimiProjectYear: 2023 or 2024. The work was split over two years; the main methodological difference is that in 2023 a non-expert also reviewed the AI output, while no non-expert was involved in 2024.

    AISegmentation: The filename of the AI prediction file in DICOM-SEG format, located in the ai-segmentations-dcm folder.

    CorrectedSegmentation: The filename of the reviewer-corrected prediction file in DICOM-SEG format, located in the qa-segmentations-dcm folder. If the reviewer strongly agreed with the AI for all segments, no correction file was provided.

    Was the AI predicted ROIs accurate?: Appears once per task for images from AimiProjectYear 2023. The reviewer rates segmentation quality on a Likert scale; in tasks with multiple output labels, a single rating covers them all.

    Was the AI predicted {SEGMENT_NAME} label accurate?: Appears once per segment for images from AimiProjectYear 2024. The reviewer rates each segment's quality on a Likert scale.

    Do you have any comments about the AI predicted ROIs?: Open-ended question for the reviewer.

    Do you have any comments about the findings from the study scans?: Open-ended question for the reviewer.
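    Because the rating column names differ by project year (one overall question in 2023, one question per segment in 2024) but share a common prefix, they can be discovered programmatically rather than hard-coded. A minimal sketch with invented sample rows:

    ```python
    import csv
    import io

    # The Likert rating columns all begin with this documented prefix, whether
    # they are the single 2023 question or the per-segment 2024 questions.
    PREFIX = "Was the AI predicted"

    # Invented sample mimicking a 2024-style qa-results.csv with two segments.
    SAMPLE = """PatientID,AimiProjectYear,Was the AI predicted Liver label accurate?,Was the AI predicted Lesion label accurate?
p1,2024,5,4
p2,2024,3,5
"""

    def rating_columns(fieldnames):
        """Return the subset of CSV columns that hold Likert ratings."""
        return [c for c in fieldnames if c.startswith(PREFIX)]

    reader = csv.DictReader(io.StringIO(SAMPLE))
    cols = rating_columns(reader.fieldnames)
    # One dict of {column: score} per reviewed scan, skipping blank cells.
    scores = [{c: int(row[c]) for c in cols if row[c]} for row in reader]
    ```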

    File Overview

    brain-mr.zip

    Segment Description: brain tumor regions: necrosis, edema, enhancing

    IDC Collection: UPENN-GBM

    Links: model weights, github

    breast-fdg-pet-ct.zip

    Segment Description: FDG-avid lesions in the breast from FDG PET/CT scans

    IDC Collection: QIN-Breast

    Links: model weights, github

    breast-mr.zip

    Segment Description: Breast, Fibroglandular tissue, structural tumor

    IDC Collection: duke-breast-cancer-mri

    Links: model weights, github

    kidney-ct.zip

    Segment Description: Kidney, Tumor, and Cysts from contrast enhanced CT scans

    IDC Collections: TCGA-KIRC, TCGA-KIRP, TCGA-KICH, CPTAC-CCRCC

    Links: model weights, github

    liver-ct.zip

    Segment Description: Liver from CT scans

    IDC Collection: TCGA-LIHC

    Links: model weights, github

    liver2-ct.zip

    Segment Description: Liver and Lesions from CT scans

    IDC Collections: HCC-TACE-SEG, COLORECTAL-LIVER-METASTASES

    Links: model weights, github

    liver-mr.zip

    Segment Description: Liver from T1 MRI scans

    IDC Collection: TCGA-LIHC

    Links: model weights, github

    lung-ct.zip

    Segment Description: Lung and Nodules (3mm-30mm) from CT scans

    IDC Collections:

    Anti-PD-1-Lung

    LUNG-PET-CT-Dx

    NSCLC Radiogenomics

    RIDER Lung PET-CT

    TCGA-LUAD

    TCGA-LUSC

    Links: model weights 1, model weights 2, github

    lung2-ct.zip

    Improved model version

    Segment Description: Lung and Nodules (3mm-30mm) from CT scans

    IDC Collections:

    QIN-LUNG-CT, SPIE-AAPM Lung CT Challenge

    Links: model weights, github

    lung-fdg-pet-ct.zip

    Segment Description: Lungs and FDG-avid lesions in the lung from FDG PET/CT scans

    IDC Collections:

    ACRIN-NSCLC-FDG-PET

    Anti-PD-1-Lung

    LUNG-PET-CT-Dx

    NSCLC Radiogenomics

    RIDER Lung PET-CT

    TCGA-LUAD

    TCGA-LUSC

    Links: model weights, github

    prostate-mr.zip

    Segment Description: Prostate from T2 MRI scans

    IDC Collections: ProstateX, Prostate-MRI-US-Biopsy

    Links: model weights, github

    Changelog

    2.0.2 - Fixed the brain-mr segmentations so they are transformed correctly

    2.0.1 - Added AIMI 2024 radiologist comments to qa-results.csv

    2.0.0 - Added AIMI 2024 segmentations

    1.X - AIMI 2023 segmentations and reviewer scores

A Large-Scale CT and PET/CT Dataset for Lung Cancer Diagnosis (Lung-PET-CT-Dx)


The location of each tumor was annotated by five academic thoracic radiologists with expertise in lung cancer, to make this dataset a useful tool and resource for developing algorithms for medical diagnosis. Two of the radiologists had more than 15 years of experience and the others had more than 5 years of experience. After one of the radiologists labeled each subject, the other four performed verification, so all five radiologists reviewed every annotation file in the dataset. Annotations were captured using LabelImg. The image annotations are saved as XML files in PASCAL VOC format, which can be parsed using the PASCAL Development Toolkit: https://pypi.org/project/pascal-voc-tools/. Python code to visualize the annotation boxes on top of the DICOM images can be downloaded here.
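As a sketch of reading one of these annotation files with Python's standard library: the element names (object, name, bndbox, xmin, ...) follow the PASCAL VOC convention produced by LabelImg, while the sample filename and coordinate values below are invented.

```python
import xml.etree.ElementTree as ET

# Invented example of a PASCAL VOC annotation file; the class label letter
# encodes the histopathological diagnosis (e.g. 'A' for Adenocarcinoma).
SAMPLE_XML = """<annotation>
  <filename>A0001_slice042.dcm</filename>
  <object>
    <name>A</name>
    <bndbox><xmin>120</xmin><ymin>88</ymin><xmax>190</xmax><ymax>150</ymax></bndbox>
  </object>
</annotation>"""

def parse_voc(xml_text):
    """Return a list of {label, bbox} dicts for every object in the file."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        bb = obj.find("bndbox")
        boxes.append({
            "label": obj.findtext("name"),
            "bbox": tuple(int(bb.findtext(t)) for t in ("xmin", "ymin", "xmax", "ymax")),
        })
    return boxes

boxes = parse_voc(SAMPLE_XML)
```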

Two deep learning researchers used the images and the corresponding annotation files to train several well-known detection models, achieving a mean average precision (mAP) of around 0.87 on the validation set.
