https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
This dataset consists of CT and PET-CT DICOM images of lung cancer subjects, with XML annotation files that indicate tumor location with bounding boxes. The images were retrospectively acquired from patients with suspected lung cancer who underwent standard-of-care lung biopsy and PET/CT. Subjects were grouped according to their tissue histopathological diagnosis: patients with Names/IDs containing the letter 'A' were diagnosed with Adenocarcinoma, 'B' with Small Cell Carcinoma, 'E' with Large Cell Carcinoma, and 'G' with Squamous Cell Carcinoma.
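As a quick illustration of this naming convention, a minimal Python sketch that maps the class letter in a patient Name/ID to its diagnosis might look like the following (the example ID format and helper name are illustrative assumptions, not part of the dataset documentation):

```python
# Mapping of the class letter embedded in each patient Name/ID to the
# histopathological diagnosis described above.
HISTOLOGY_BY_LETTER = {
    "A": "Adenocarcinoma",
    "B": "Small Cell Carcinoma",
    "E": "Large Cell Carcinoma",
    "G": "Squamous Cell Carcinoma",
}

def histology_from_patient_id(patient_id: str) -> str:
    """Assumes the class letter starts the subject code after the last '-'."""
    subject_code = patient_id.split("-")[-1]
    return HISTOLOGY_BY_LETTER.get(subject_code[:1].upper(), "Unknown")

print(histology_from_patient_id("Lung_Dx-A0001"))  # -> Adenocarcinoma (illustrative ID)
```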
The images were analyzed on the mediastinum (window width, 350 HU; level, 40 HU) and lung (window width, 1,400 HU; level, -700 HU) settings. The reconstructions were made with a 2 mm slice thickness under the lung setting. The CT slice interval varies from 0.625 mm to 5 mm. Scanning modes include plain, contrast-enhanced, and 3D reconstruction.
Before the examination, patients fasted for at least 6 hours, and each patient's blood glucose was below 11 mmol/L. Whole-body emission scans were acquired 60 minutes after intravenous injection of 18F-FDG (4.44 MBq/kg, 0.12 mCi/kg), with patients in the supine position in the PET scanner. FDG doses and uptake times were 168.72-468.79 MBq (295.8 ± 64.8 MBq) and 27-171 min (70.4 ± 24.9 minutes), respectively. The 18F-FDG had a radiochemical purity of 95%. Patients were allowed to breathe normally during the PET and CT acquisitions. Attenuation correction of the PET images was performed using CT data with the hybrid segmentation method, with a CT protocol of 180 mAs, 120 kV, and a pitch of 1.0. Each study comprised one CT volume, one PET volume, and fused PET-CT images: the CT resolution was 512 × 512 pixels at 1 mm × 1 mm, and the PET resolution was 200 × 200 pixels at 4.07 mm × 4.07 mm, with a slice thickness and interslice distance of 1 mm. Both volumes were reconstructed with the same number of slices. Three-dimensional (3D) emission and transmission scans were acquired from the base of the skull to the mid femur. The PET images were reconstructed with the TrueX TOF method at a 1 mm slice thickness.
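The injected activity is weight-based, so a small sanity check of the dosing figures above can be scripted (the 70 kg body weight is an illustrative value):

```python
# 18F-FDG dose is weight-based: 4.44 MBq/kg (0.12 mCi/kg).
DOSE_MBQ_PER_KG = 4.44

def fdg_dose_mbq(weight_kg: float) -> float:
    return DOSE_MBQ_PER_KG * weight_kg

# An illustrative 70 kg patient receives about 310.8 MBq, which falls inside
# the reported 168.72-468.79 MBq range.
print(fdg_dose_mbq(70))
```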
The location of each tumor was annotated by five academic thoracic radiologists with expertise in lung cancer to make this dataset a useful tool and resource for developing algorithms for medical diagnosis. Two of the radiologists had more than 15 years of experience and the others had more than 5 years of experience. After one of the radiologists labeled each subject, the other four radiologists performed a verification, so that all five radiologists reviewed each annotation file in the dataset. Annotations were captured using LabelImg. The image annotations are saved as XML files in PASCAL VOC format, which can be parsed using the PASCAL Development Toolkit: https://pypi.org/project/pascal-voc-tools/. Python code to visualize the annotation boxes on top of the DICOM images can be downloaded here.
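Because the annotations follow the standard PASCAL VOC XML layout, a minimal sketch of overlaying one annotation on its DICOM slice might look like the following (this is not the official visualization script referenced above; the file paths are placeholders, and pydicom plus matplotlib are assumed to be installed):

```python
import xml.etree.ElementTree as ET

import matplotlib.patches as patches
import matplotlib.pyplot as plt
import pydicom

# Placeholder paths: substitute a real DICOM slice and its matching VOC annotation.
dicom_path = "example_slice.dcm"
xml_path = "example_slice.xml"

slice_ds = pydicom.dcmread(dicom_path)
root = ET.parse(xml_path).getroot()

fig, ax = plt.subplots()
ax.imshow(slice_ds.pixel_array, cmap="gray")

# Each <object> element holds a class name and a <bndbox> with pixel coordinates.
for obj in root.findall("object"):
    label = obj.findtext("name")
    box = obj.find("bndbox")
    xmin, ymin = float(box.findtext("xmin")), float(box.findtext("ymin"))
    xmax, ymax = float(box.findtext("xmax")), float(box.findtext("ymax"))
    ax.add_patch(patches.Rectangle((xmin, ymin), xmax - xmin, ymax - ymin,
                                   edgecolor="red", facecolor="none", linewidth=1.5))
    ax.text(xmin, ymin - 2, label, color="red", fontsize=8)

plt.show()
```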
Two deep learning researchers used the images and the corresponding annotation files to train several well-known detection models, which achieved a mean average precision (mAP) of around 0.87 on the validation set.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Lung PET CT Dx Data Augmentation is a dataset for object detection tasks - it contains annotations of the Tumors class for 15,185 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
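If you prefer a programmatic download, the standard Roboflow Python client follows the pattern below; the API key, workspace, project slug, version number, and export format here are placeholders to adapt to your own account:

```python
from roboflow import Roboflow  # pip install roboflow

# Placeholder credentials and identifiers.
rf = Roboflow(api_key="YOUR_API_KEY")
project = rf.workspace("your-workspace").project("your-project-slug")
dataset = project.version(1).download("voc")  # VOC export to match the XML annotations

print(dataset.location)  # local folder containing the downloaded images and labels
```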
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
The RIDER Lung PET-CT collection was shared to facilitate the RIDER PET/CT subgroup activities. The PET/CT subgroup was responsible for: (1) archiving de-identified DICOM serial PET/CT phantom and lung cancer patient data in a public database to provide a resource for the testing and development of algorithms and imaging tools used for assessing response to therapy, (2) conducting multiple serial imaging studies of a long half-life phantom to assess systemic variance in serial PET/CT scans that is unrelated to response, and (3) identifying and recommending methods for quantifying sources of variance in PET/CT imaging with the goal of defining the change in PET measurements that may be unrelated to response to therapy, thus defining the absolute minimum effect size that should be used in the design of clinical trials using PET measurements as end points.
The Reference Image Database to Evaluate Therapy Response (RIDER) is a targeted data collection used to generate an initial consensus on how to harmonize data collection and analysis for quantitative imaging methods applied to measure the response to drug or radiation therapy. The National Cancer Institute (NCI) has exercised a series of contracts with specific academic sites for the collection of repeat "coffee break," longitudinal phantom, and patient data for a range of imaging modalities (currently computed tomography [CT], positron emission tomography [PET]/CT, dynamic contrast-enhanced magnetic resonance imaging [DCE-MRI], and diffusion-weighted [DW] MRI) and organ sites (currently lung, breast, and neuro). The methods for data collection, analysis, and results are described in the Combined RIDER White Paper Report (Sept 2008):
The long-term goal is to provide a resource that permits harmonized methods for data collection and analysis across different commercial imaging platforms, in order to support multi-site clinical trials that use imaging as a biomarker for therapy response. Thus, the database should permit an objective comparison of methods for data collection and analysis as a national and international resource, as described in the first RIDER white paper report (2006):
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset consists of CT and PET-CT DICOM images of lung cancer subjects, with XML annotation files that indicate tumor location with bounding boxes. The images were retrospectively acquired from patients with suspected lung cancer who underwent standard-of-care lung biopsy and PET/CT. Subjects were grouped according to their tissue histopathological diagnosis: patients with Names/IDs containing the letter 'A' were diagnosed with Adenocarcinoma, 'B' with Small Cell Carcinoma, 'E' with Large Cell Carcinoma, and 'G' with Squamous Cell Carcinoma.
The images were analyzed on the mediastinum (window width, 350 HU; level, 40 HU) and lung (window width, 1,400 HU; level, -700 HU) settings. The reconstructions were made with a 2 mm slice thickness under the lung setting. The CT slice interval varies from 0.625 mm to 5 mm. Scanning modes include plain, contrast-enhanced, and 3D reconstruction.
Before the examination, patients fasted for at least 6 hours, and each patient's blood glucose was below 11 mmol/L. Whole-body emission scans were acquired 60 minutes after intravenous injection of 18F-FDG (4.44 MBq/kg, 0.12 mCi/kg), with patients in the supine position in the PET scanner. FDG doses and uptake times were 168.72-468.79 MBq (295.8 ± 64.8 MBq) and 27-171 min (70.4 ± 24.9 minutes), respectively. The 18F-FDG had a radiochemical purity of 95%. Patients were allowed to breathe normally during the PET and CT acquisitions. Attenuation correction of the PET images was performed using CT data with the hybrid segmentation method, with a CT protocol of 180 mAs, 120 kV, and a pitch of 1.0. Each study comprised one CT volume, one PET volume, and fused PET-CT images: the CT resolution was 512 × 512 pixels at 1 mm × 1 mm, and the PET resolution was 200 × 200 pixels at 4.07 mm × 4.07 mm, with a slice thickness and interslice distance of 1 mm. Both volumes were reconstructed with the same number of slices. Three-dimensional (3D) emission and transmission scans were acquired from the base of the skull to the mid femur. The PET images were reconstructed with the TrueX TOF method at a 1 mm slice thickness.
The location of each tumor was annotated by five academic thoracic radiologists with expertise in lung cancer to make this dataset a useful tool and resource for developing algorithms for medical diagnosis. Two of the radiologists had more than 15 years of experience and the others had more than 5 years of experience. After one of the radiologists labeled each subject, the other four radiologists performed a verification, so that all five radiologists reviewed each annotation file in the dataset. Annotations were captured using LabelImg. The image annotations are saved as XML files in PASCAL VOC format, which can be parsed using the PASCAL Development Toolkit: https://pypi.org/project/pascal-voc-tools/. Python code to visualize the annotation boxes on top of the DICOM images can be downloaded here.
Two deep learning researchers used the images and the corresponding annotation files to train several well-known detection models, which achieved a mean average precision (mAP) of around 0.87 on the validation set.
Dataset link: https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=70224216
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
CT
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Early lung cancer (LC) detection is essential for reducing the global mortality rate. The limitations of traditional diagnostic techniques cause challenges in identifying LC using medical imaging data. In this study, we aim to develop a robust LC detection model. Positron emission tomography / computed tomography (PET/CT) images are utilized to capture both metabolic and anatomical data, leading to optimal LC diagnosis. In order to extract multiple LC features, we enhance MobileNet V3 and LeViT models. A weighted-sum feature fusion technique is used to generate unique LC features. The extracted features are classified using spline functions, including linear, cubic, and B-spline variants of Kolmogorov-Arnold Networks (KANs). We ensemble the outcomes using the soft-voting approach. The model is generalized using the Lung-PET-CT-Dx dataset. Five-fold cross-validation is used to evaluate the model. The proposed LC detection model achieves an impressive accuracy of 99.0% with a minimal loss of 0.07. In addition, limited resources are required to classify PET/CT images. The high performance underscores the potential of the proposed LC detection model in providing valuable and optimal results. The study findings can significantly improve clinical practice by presenting sophisticated and interpretable outcomes. The proposed model can be enhanced by integrating advanced feature fusion techniques.
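The fusion and ensembling steps mentioned in the abstract can be sketched in a few lines of NumPy; the shapes, weights, and stand-in classifiers below are illustrative assumptions rather than the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in feature vectors from two backbones (MobileNet V3- and LeViT-like
# extractors), projected to a common dimensionality for a batch of 8 scans.
features_a = rng.normal(size=(8, 256))
features_b = rng.normal(size=(8, 256))

# Weighted-sum feature fusion: one scalar weight per branch.
w_a, w_b = 0.6, 0.4
fused = w_a * features_a + w_b * features_b

def stand_in_classifier(x: np.ndarray, seed: int) -> np.ndarray:
    """Toy classifier producing softmax probabilities for 2 classes."""
    local_rng = np.random.default_rng(seed)
    logits = x @ local_rng.normal(size=(x.shape[1], 2))
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)

# Soft voting: average the class probabilities of several classifiers.
probs = [stand_in_classifier(fused, seed) for seed in range(3)]
avg_probs = np.mean(probs, axis=0)
predictions = avg_probs.argmax(axis=1)  # final class per scan
print(predictions)
```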
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Imaging Data Commons (IDC) (https://imaging.datacommons.cancer.gov/) [1] connects researchers with publicly available cancer imaging data, often linked with other types of cancer data. Many of the collections have limited annotations due to the expense and effort required to create these manually. The increased capabilities of AI analysis of radiology images provide an opportunity to augment existing IDC collections with new annotation data. To further this goal, we trained several nnUNet [2] based models for a variety of radiology segmentation tasks from public datasets and used them to generate segmentations for IDC collections.
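As a rough illustration of how such models are typically run at inference time (the record does not state the exact commands; the dataset ID, paths, and the assumption of an nnU-Net v2 installation are all placeholders):

```python
import subprocess

# Assumes `pip install nnunetv2`, a trained model in the nnU-Net results folder,
# and input images named according to nnU-Net conventions (e.g. case_0000.nii.gz).
subprocess.run(
    [
        "nnUNetv2_predict",
        "-i", "/data/imagesTs",      # placeholder input folder
        "-o", "/data/predictions",   # placeholder output folder
        "-d", "501",                 # hypothetical dataset ID of the trained model
        "-c", "3d_fullres",          # configuration used at training time
    ],
    check=True,
)
```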
To validate the models' performance, roughly 10% of the predictions were manually reviewed and corrected by both a board-certified radiologist and a medical student (non-expert). Additionally, this non-expert looked at all the AI predictions and rated them on a 5-point Likert scale.
This record provides the AI segmentations, manually corrected segmentations, and manual scores for the inspected IDC collection images.
List of all tasks and IDC collections analyzed.
File | Segmentation Task | IDC Collections | Links |
---|---|---|---|
breast-fdg-pet-ct.zip | FDG-avid lesions in breast from FDG PET/CT scans | QIN-Breast | |
kidney-ct.zip | Kidney, Tumor, and Cysts from contrast enhanced CT scans | TCGA-KIRC | |
liver-ct.zip | Liver from CT scans | TCGA-LIHC | |
liver-mr.zip | Liver from T1 MRI scans | TCGA-LIHC | |
lung-ct.zip | Lung and Nodules (3mm-30mm) from CT scans | ACRIN-NSCLC-FDG-PET, Anti-PD-1-Lung, LUNG-PET-CT-Dx, NSCLC Radiogenomics, RIDER Lung PET-CT, TCGA-LUAD, TCGA-LUSC | |
lung-fdg-pet-ct.zip | Lungs and FDG-avid lesions in the lung from FDG PET/CT scans | ACRIN-NSCLC-FDG-PET, Anti-PD-1-Lung, LUNG-PET-CT-Dx, NSCLC Radiogenomics, RIDER Lung PET-CT, TCGA-LUAD, TCGA-LUSC | |
prostate-mr.zip | Prostate from T2 MRI scans | ProstateX |
Likert Score | Definition |
---|---|
5 | Strongly Agree - Use-as-is (i.e., clinically acceptable, and could be used for treatment without change) |
4 | Agree - Minor edits that are not necessary. Stylistic differences, but not clinically important. The current segmentation is acceptable |
3 | Neither agree nor disagree - Minor edits that are necessary. Minor edits are those that the reviewer judges can be made in less time than starting from scratch or are expected to have minimal effect on treatment outcome |
2 | Disagree - Major edits. This category indicates that the necessary edits are required to ensure correctness and are sufficiently significant that the user would prefer to start from scratch |
1 | Strongly disagree - Unusable. This category indicates that the quality of the automatic annotations is so bad that they are unusable. |
Each zip file in the collection corresponds to a specific segmentation task. The common folder structure is:
ai-segmentations-dcm | This directory contains the AI model predictions in DICOM-SEG format for all analyzed IDC collection files |
qa-segmentations-dcm | This directory contains manually corrected segmentation files, based on the AI predictions, in DICOM-SEG format. Only a fraction, ~10%, of the AI predictions were corrected. Corrections were performed by radiologists (rad*) and non-experts (ne*) |
qa-results.csv | CSV file linking the study/series UIDs with the AI segmentation file, the reviewer-corrected segmentation file, and the reviewer ratings of AI performance |
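A minimal way to inspect one of the shipped predictions is shown below; the file name is a placeholder, and plain pydicom is used here, although dedicated libraries such as highdicom or pydicom-seg expose per-segment masks more conveniently:

```python
import pydicom

# Placeholder file from the ai-segmentations-dcm folder of an extracted task zip.
seg = pydicom.dcmread("ai-segmentations-dcm/example_prediction.dcm")

print(seg.Modality)  # expected: "SEG" for a DICOM Segmentation object
# Segments defined in this object: number and label of each structure.
for segment in seg.SegmentSequence:
    print(segment.SegmentNumber, segment.SegmentLabel)

# The binary frames are stacked in pixel_array, one frame per slice-and-segment
# combination described in PerFrameFunctionalGroupsSequence.
print(seg.pixel_array.shape)
```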
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Imaging Data Commons (IDC)(https://imaging.datacommons.cancer.gov/) [1] connects researchers with publicly available cancer imaging data, often linked with other types of cancer data. Many of the collections have limited annotations due to the expense and effort required to create these manually. The increased capabilities of AI analysis of radiology images provide an opportunity to augment existing IDC collections with new annotation data. To further this goal, we trained several nnUNet [2] based models for a variety of radiology segmentation tasks from public datasets and used them to generate segmentations for IDC collections.
To validate the model's performance, roughly 10% of the AI predictions were assigned to a validation set. For this set, a board-certified radiologist graded the quality of AI predictions on a Likert scale. If they did not 'strongly agree' with the AI output, the reviewer corrected the segmentation.
This record provides the AI segmentations, manually corrected segmentations, and manual scores for the inspected IDC collection images.
Only about 10% of the AI-derived annotations provided in this dataset were verified by expert radiologists. More details on model training and annotation are provided in the associated manuscript to ensure transparency and reproducibility.
This work was done in two stages: Versions 1.x of this record were from the first stage, and Versions 2.x added additional records. In the Version 1.x collections, a medical student (non-expert) reviewed all the AI predictions and rated them on a 5-point Likert scale; for any AI predictions in the validation set that they did not 'strongly agree' with, the non-expert provided corrected segmentations. This non-expert was not utilized for the Version 2.x additional records.
Likert Score Definitions
Guidelines for reviewers to grade the quality of AI segmentations:
Likert Score | Definition |
---|---|
5 | Strongly Agree - Use-as-is (i.e., clinically acceptable, and could be used for treatment without change) |
4 | Agree - Minor edits that are not necessary. Stylistic differences, but not clinically important. The current segmentation is acceptable |
3 | Neither agree nor disagree - Minor edits that are necessary. Minor edits are those that the reviewer judges can be made in less time than starting from scratch or are expected to have minimal effect on treatment outcome |
2 | Disagree - Major edits. This category indicates that the necessary edits are required to ensure correctness and are sufficiently significant that the user would prefer to start from scratch |
1 | Strongly disagree - Unusable. This category indicates that the quality of the automatic annotations is so poor that they are unusable. |
Zip File Folder Structure
Each zip file in the collection corresponds to a specific segmentation task. The common folder structure is:
ai-segmentations-dcm | This directory contains the AI model predictions in DICOM-SEG format for all analyzed IDC collection files |
qa-segmentations-dcm | This directory contains manually corrected segmentation files, based on the AI predictions, in DICOM-SEG format. Only a fraction, ~10%, of the AI predictions were corrected. Corrections were performed by radiologists (rad*) and non-experts (ne*) |
qa-results.csv | CSV file linking the study/series UIDs with the AI segmentation file, the reviewer-corrected segmentation file, and the reviewer ratings of AI performance |
qa-results.csv Columns
The qa-results.csv file contains metadata about the segmentations and their related IDC case images, as well as the Likert ratings and comments from the reviewers.
Column | Description |
---|---|
Collection | The name of the IDC collection for this case |
PatientID | PatientID in the DICOM metadata of the scan. Also called Case ID in the IDC |
StudyInstanceUID | StudyInstanceUID in the DICOM metadata of the scan |
SeriesInstanceUID | SeriesInstanceUID in the DICOM metadata of the scan |
Validation | true/false: whether this scan was manually reviewed |
Reviewer | Coded ID of the reviewer. Radiologist IDs start with 'rad'; non-expert IDs start with 'ne' |
AimiProjectYear | 2023 or 2024. This work was split over two years; the main methodological difference is that in 2023 a non-expert also reviewed the AI output, whereas no non-expert was utilized in 2024 |
AISegmentation | The filename of the AI prediction file in DICOM-SEG format. This file is in the ai-segmentations-dcm folder |
CorrectedSegmentation | The filename of the reviewer-corrected prediction file in DICOM-SEG format. This file is in the qa-segmentations-dcm folder. If the reviewer strongly agreed with the AI for all segments, they did not provide a correction file |
Was the AI predicted ROIs accurate? | Appears for images from AimiProjectYear 2023. The reviewer rates the segmentation quality on a Likert scale; in tasks that have multiple labels in the output, there is only one rating to cover them all |
Was the AI predicted {SEGMENT_NAME} label accurate? | Appears once for each segment in the task for images from AimiProjectYear 2024. The reviewer rates each segment's quality on a Likert scale |
Do you have any comments about the AI predicted ROIs? | Open-ended question for the reviewer |
Do you have any comments about the findings from the study scans? | Open-ended question for the reviewer |
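A short sketch of pairing the AI and reviewer-corrected files for the reviewed cases, assuming pandas and the column names listed above (the task zip is taken to be extracted into the current directory, and the truthiness check may need adjusting to how the Validation column is stored):

```python
import pandas as pd

df = pd.read_csv("qa-results.csv")

# Keep only the manually reviewed (validation) cases.
reviewed = df[df["Validation"].astype(str).str.lower() == "true"]

for _, row in reviewed.iterrows():
    ai_file = f"ai-segmentations-dcm/{row['AISegmentation']}"
    corrected = row["CorrectedSegmentation"]
    if pd.isna(corrected):
        # Reviewer strongly agreed with the AI output, so no correction exists.
        print(row["SeriesInstanceUID"], "AI only:", ai_file)
    else:
        print(row["SeriesInstanceUID"], "AI:", ai_file,
              "corrected:", f"qa-segmentations-dcm/{corrected}")
```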
File Overview
brain-mr.zip
Segment Description: brain tumor regions: necrosis, edema, enhancing
IDC Collection: UPENN-GBM
Links: model weights, github
breast-fdg-pet-ct.zip
Segment Description: FDG-avid lesions in breast from FDG PET/CT scans
IDC Collection: QIN-Breast
Links: model weights, github
breast-mr.zip
Segment Description: Breast, Fibroglandular tissue, structural tumor
IDC Collection: duke-breast-cancer-mri
Links: model weights, github
kidney-ct.zip
Segment Description: Kidney, Tumor, and Cysts from contrast enhanced CT scans
IDC Collections: TCGA-KIRC, TCGA-KIRP, TCGA-KICH, CPTAC-CCRCC
Links: model weights, github
liver-ct.zip
Segment Description: Liver from CT scans
IDC Collection: TCGA-LIHC
Links: model weights, github
liver2-ct.zip
Segment Description: Liver and Lesions from CT scans
IDC Collection: HCC-TACE-SEG, COLORECTAL-LIVER-METASTASES
Links: model weights, github
liver-mr.zip
Segment Description: Liver from T1 MRI scans
IDC Collection: TCGA-LIHC
Links: model weights, github
lung-ct.zip
Segment Description: Lung and Nodules (3mm-30mm) from CT scans
IDC Collections:
Anti-PD-1-Lung
LUNG-PET-CT-Dx
NSCLC Radiogenomics
RIDER Lung PET-CT
TCGA-LUAD
TCGA-LUSC
Links: model weights 1, model weights 2, github
lung2-ct.zip
Improved model version
Segment Description: Lung and Nodules (3mm-30mm) from CT scans
IDC Collections:
QIN-LUNG-CT, SPIE-AAPM Lung CT Challenge
Links: model weights, github
lung-fdg-pet-ct.zip
Segment Description: Lungs and FDG-avid lesions in the lung from FDG PET/CT scans
IDC Collections:
ACRIN-NSCLC-FDG-PET
Anti-PD-1-Lung
LUNG-PET-CT-Dx
NSCLC Radiogenomics
RIDER Lung PET-CT
TCGA-LUAD
TCGA-LUSC
Links: model weights, github
prostate-mr.zip
Segment Description: Prostate from T2 MRI scans
IDC Collection: ProstateX, Prostate-MRI-US-Biopsy
Links: model weights, github
Changelog
2.0.2 - Fix the brain-mr segmentations to be transformed correctly
2.0.1 - added AIMI 2024 radiologist comments to qa-results.csv
2.0.0 - added AIMI 2024 segmentations
1.X - AIMI 2023 segmentations and reviewer scores