10 datasets found
  1. A Large-Scale CT and PET/CT Dataset for Lung Cancer Diagnosis

    • cancerimagingarchive.net
    dicom, n/a, xlsx, xml
    Cite
    The Cancer Imaging Archive, A Large-Scale CT and PET/CT Dataset for Lung Cancer Diagnosis [Dataset]. http://doi.org/10.7937/TCIA.2020.NNC2-0461
    Available download formats: xml, n/a, xlsx, dicom
    Dataset authored and provided by
    The Cancer Imaging Archive
    License

    https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/

    Time period covered
    Dec 22, 2020
    Dataset funded by
    National Cancer Institute (http://www.cancer.gov/)
    Description

    This dataset consists of CT and PET-CT DICOM images of lung cancer subjects with XML Annotation files that indicate tumor location with bounding boxes. The images were retrospectively acquired from patients with suspicion of lung cancer, and who underwent standard-of-care lung biopsy and PET/CT. Subjects were grouped according to a tissue histopathological diagnosis. Patients with Names/IDs containing the letter 'A' were diagnosed with Adenocarcinoma, 'B' with Small Cell Carcinoma, 'E' with Large Cell Carcinoma, and 'G' with Squamous Cell Carcinoma.
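The letter-to-histology convention above can be expressed directly in code; a minimal sketch (the function name and the 'A0001'-style subject code format are illustrative, not part of the dataset documentation):

```python
# Histology classes encoded by the letter in each subject's Name/ID,
# per the grouping convention described in the dataset description.
HISTOLOGY_BY_LETTER = {
    "A": "Adenocarcinoma",
    "B": "Small Cell Carcinoma",
    "E": "Large Cell Carcinoma",
    "G": "Squamous Cell Carcinoma",
}

def diagnosis_from_subject_code(code: str) -> str:
    """Look up the diagnosis from a subject code whose leading letter
    encodes the histology (the code format here is hypothetical, e.g. 'A0001')."""
    letter = code[:1].upper()
    if letter not in HISTOLOGY_BY_LETTER:
        raise ValueError(f"unknown histology letter in subject code {code!r}")
    return HISTOLOGY_BY_LETTER[letter]
```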

    The images were analyzed on the mediastinum (window width, 350 HU; level, 40 HU) and lung (window width, 1,400 HU; level, –700 HU) settings. Reconstructions were made with a 2 mm slice thickness using the lung settings. The CT slice interval varies from 0.625 mm to 5 mm. Scanning modes include plain, contrast, and 3D reconstruction.
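The window settings quoted above define a linear mapping from HU values to display intensities; a minimal sketch of that mapping (pure Python; the function name is illustrative):

```python
def apply_window(hu_values, width, level, out_max=255):
    """Clip HU values to the [level - width/2, level + width/2] window and
    rescale them linearly to 0..out_max display intensities."""
    lo = level - width / 2.0
    hi = level + width / 2.0
    out = []
    for hu in hu_values:
        clipped = min(max(hu, lo), hi)
        out.append(round((clipped - lo) / (hi - lo) * out_max))
    return out

# Mediastinum window (width 350, level 40) and lung window (width 1400, level -700),
# applied to a few sample HU values.
mediastinum = apply_window([-1000, 40, 500], width=350, level=40)
lung = apply_window([-1000, -700, 0], width=1400, level=-700)
```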

    Before the examination, the patient underwent fasting for at least 6 hours, and the blood glucose of each patient was less than 11 mmol/L. Whole-body emission scans were acquired 60 minutes after the intravenous injection of 18F-FDG (4.44 MBq/kg, 0.12 mCi/kg), with patients in the supine position in the PET scanner. FDG doses and uptake times were 168.72-468.79 MBq (295.8 ± 64.8 MBq) and 27-171 min (70.4 ± 24.9 min), respectively. 18F-FDG with a radiochemical purity of 95% was provided. Patients were allowed to breathe normally during PET and CT acquisitions. Attenuation correction of PET images was performed using CT data with the hybrid segmentation method. Attenuation corrections were performed using a CT protocol (180 mAs, 120 kV, 1.0 pitch). Each study comprised one CT volume, one PET volume, and fused PET and CT images: the CT resolution was 512 × 512 pixels at 1 mm × 1 mm, the PET resolution was 200 × 200 pixels at 4.07 mm × 4.07 mm, with a slice thickness and an interslice distance of 1 mm. Both volumes were reconstructed with the same number of slices. Three-dimensional (3D) emission and transmission scans were acquired from the base of the skull to the mid femur. The PET images were reconstructed via the TrueX TOF method with a slice thickness of 1 mm.

    The location of each tumor was annotated by five academic thoracic radiologists with expertise in lung cancer to make this dataset a useful tool and resource for developing algorithms for medical diagnosis. Two of the radiologists had more than 15 years of experience and the others had more than 5 years of experience. After one of the radiologists labeled each subject, the other four performed a verification, so all five radiologists reviewed every annotation file in the dataset. Annotations were captured using LabelImg. The image annotations are saved as XML files in PASCAL VOC format, which can be parsed using the PASCAL Development Toolkit: https://pypi.org/project/pascal-voc-tools/. Python code to visualize the annotation boxes on top of the DICOM images can be downloaded here.
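Because the annotations are PASCAL VOC XML, they can also be read with the standard library alone; a minimal sketch (the sample XML mirrors the VOC layout; the filename and label values are invented for illustration):

```python
import xml.etree.ElementTree as ET

def read_voc_boxes(xml_text):
    """Parse PASCAL VOC annotation XML and return (label, xmin, ymin, xmax, ymax) tuples."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        label = obj.findtext("name")
        bb = obj.find("bndbox")
        boxes.append((
            label,
            int(bb.findtext("xmin")), int(bb.findtext("ymin")),
            int(bb.findtext("xmax")), int(bb.findtext("ymax")),
        ))
    return boxes

# Illustrative annotation in the PASCAL VOC layout.
SAMPLE = """
<annotation>
  <filename>A0001_slice042.dcm</filename>
  <object>
    <name>A</name>
    <bndbox><xmin>120</xmin><ymin>85</ymin><xmax>180</xmax><ymax>140</ymax></bndbox>
  </object>
</annotation>
"""
```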

    Two deep learning researchers used the images and the corresponding annotation files to train several well-known detection models, achieving a mean average precision (mAP) of around 0.87 on the validation set.

  2. Lung Pet Ct Dx Data Augmentation Dataset

    • universe.roboflow.com
    zip
    Updated Feb 23, 2025
    Cite
    yolo (2025). Lung Pet Ct Dx Data Augmentation Dataset [Dataset]. https://universe.roboflow.com/yolo-31k0y/lung-pet-ct-dx-data-augmentation
    Available download formats: zip
    Dataset updated
    Feb 23, 2025
    Dataset authored and provided by
    yolo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Tumors Bounding Boxes
    Description

    Lung PET CT Dx Data Augmentation

    ## Overview
    
    Lung PET CT Dx Data Augmentation is a dataset for object detection tasks - it contains Tumors annotations for 15,185 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  3. RIDER Lung PET-CT

    • cancerimagingarchive.net
    dicom, n/a
    Updated Dec 29, 2015
    Cite
    The Cancer Imaging Archive (2015). RIDER Lung PET-CT [Dataset]. http://doi.org/10.7937/k9/tcia.2015.ofip7tvm
    Available download formats: n/a, dicom
    Dataset updated
    Dec 29, 2015
    Dataset authored and provided by
    The Cancer Imaging Archive
    License

    https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/

    Time period covered
    Dec 29, 2015
    Dataset funded by
    National Cancer Institute (http://www.cancer.gov/)
    Description

    The RIDER Lung PET-CT collection was shared to facilitate the RIDER PET/CT subgroup activities. The PET/CT subgroup was responsible for: (1) archiving de-identified DICOM serial PET/CT phantom and lung cancer patient data in a public database to provide a resource for the testing and development of algorithms and imaging tools used for assessing response to therapy, (2) conducting multiple serial imaging studies of a long half-life phantom to assess systemic variance in serial PET/CT scans that is unrelated to response, and (3) identifying and recommending methods for quantifying sources of variance in PET/CT imaging with the goal of defining the change in PET measurements that may be unrelated to response to therapy, thus defining the absolute minimum effect size that should be used in the design of clinical trials using PET measurements as end points.

    About the RIDER project

    The Reference Image Database to Evaluate Therapy Response (RIDER) is a targeted data collection used to generate an initial consensus on how to harmonize data collection and analysis for quantitative imaging methods applied to measure the response to drug or radiation therapy. The National Cancer Institute (NCI) has exercised a series of contracts with specific academic sites for collection of repeat "coffee break," longitudinal phantom, and patient data for a range of imaging modalities (currently computed tomography [CT], positron emission tomography [PET]/CT, dynamic contrast-enhanced magnetic resonance imaging [DCE MRI], and diffusion-weighted [DW] MRI) and organ sites (currently lung, breast, and neuro). The methods for data collection, analysis, and results are described in the Combined RIDER White Paper Report (Sept 2008).

    The long-term goal is to provide a resource that permits harmonized methods for data collection and analysis across different commercial imaging platforms to support multi-site clinical trials, using imaging as a biomarker for therapy response. Thus, the database should permit an objective comparison of methods for data collection and analysis as a national and international resource, as described in the first RIDER white paper report (2006).

  4. Ct For Lung Cancer Diagnosis (lung Pet Ct Dx) Pascal Voc Annotions Dataset

    • universe.roboflow.com
    zip
    Updated Jun 26, 2021
    Cite
    Mehmet Fatih AKCA (2021). Ct For Lung Cancer Diagnosis (lung Pet Ct Dx) Pascal Voc Annotions Dataset [Dataset]. https://universe.roboflow.com/mehmet-fatih-akca/yolotransfer/dataset/1
    Available download formats: zip
    Dataset updated
    Jun 26, 2021
    Dataset authored and provided by
    Mehmet Fatih AKCA
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Variables measured
    Cancer Bounding Boxes
    Description

    This dataset consists of CT and PET-CT DICOM images of lung cancer subjects with XML Annotation files that indicate tumor location with bounding boxes. The images were retrospectively acquired from patients with suspicion of lung cancer, and who underwent standard-of-care lung biopsy and PET/CT. Subjects were grouped according to a tissue histopathological diagnosis. Patients with Names/IDs containing the letter 'A' were diagnosed with Adenocarcinoma, 'B' with Small Cell Carcinoma, 'E' with Large Cell Carcinoma, and 'G' with Squamous Cell Carcinoma.

    The images were analyzed on the mediastinum (window width, 350 HU; level, 40 HU) and lung (window width, 1,400 HU; level, –700 HU) settings. Reconstructions were made with a 2 mm slice thickness using the lung settings. The CT slice interval varies from 0.625 mm to 5 mm. Scanning modes include plain, contrast, and 3D reconstruction.

    Before the examination, the patient underwent fasting for at least 6 hours, and the blood glucose of each patient was less than 11 mmol/L. Whole-body emission scans were acquired 60 minutes after the intravenous injection of 18F-FDG (4.44 MBq/kg, 0.12 mCi/kg), with patients in the supine position in the PET scanner. FDG doses and uptake times were 168.72-468.79 MBq (295.8 ± 64.8 MBq) and 27-171 min (70.4 ± 24.9 min), respectively. 18F-FDG with a radiochemical purity of 95% was provided. Patients were allowed to breathe normally during PET and CT acquisitions. Attenuation correction of PET images was performed using CT data with the hybrid segmentation method. Attenuation corrections were performed using a CT protocol (180 mAs, 120 kV, 1.0 pitch). Each study comprised one CT volume, one PET volume, and fused PET and CT images: the CT resolution was 512 × 512 pixels at 1 mm × 1 mm, the PET resolution was 200 × 200 pixels at 4.07 mm × 4.07 mm, with a slice thickness and an interslice distance of 1 mm. Both volumes were reconstructed with the same number of slices. Three-dimensional (3D) emission and transmission scans were acquired from the base of the skull to the mid femur. The PET images were reconstructed via the TrueX TOF method with a slice thickness of 1 mm.

    The location of each tumor was annotated by five academic thoracic radiologists with expertise in lung cancer to make this dataset a useful tool and resource for developing algorithms for medical diagnosis. Two of the radiologists had more than 15 years of experience and the others had more than 5 years of experience. After one of the radiologists labeled each subject, the other four performed a verification, so all five radiologists reviewed every annotation file in the dataset. Annotations were captured using LabelImg. The image annotations are saved as XML files in PASCAL VOC format, which can be parsed using the PASCAL Development Toolkit: https://pypi.org/project/pascal-voc-tools/. Python code to visualize the annotation boxes on top of the DICOM images can be downloaded here.

    Two deep learning researchers used the images and the corresponding annotation files to train several well-known detection models, achieving a mean average precision (mAP) of around 0.87 on the validation set.

    Dataset link: https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=70224216

  5. An example of a multimodal lung tumor dataset

    • ieee-dataport.org
    Updated Jan 18, 2024
    Cite
    pei dang (2024). An example of a multimodal lung tumor dataset [Dataset]. https://ieee-dataport.org/documents/example-multimodal-lung-tumor-dataset
    Dataset updated
    Jan 18, 2024
    Authors
    pei dang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    CT

  6. Findings of performance evaluation.

    • plos.figshare.com
    xls
    Updated Dec 31, 2024
    + more versions
    Cite
    Abdul Rahaman Wahab Sait; Eid AlBalawi; Ramprasad Nagaraj (2024). Findings of performance evaluation. [Dataset]. http://doi.org/10.1371/journal.pone.0313386.t002
    Available download formats: xls
    Dataset updated
    Dec 31, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Abdul Rahaman Wahab Sait; Eid AlBalawi; Ramprasad Nagaraj
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Early Lung Cancer (LC) detection is essential for reducing the global mortality rate. The limitations of traditional diagnostic techniques cause challenges in identifying LC using medical imaging data. In this study, we aim to develop a robust LC detection model. Positron Emission Tomography / Computed Tomography (PET/CT) images are utilized to comprehend the metabolic and anatomical data, leading to optimal LC diagnosis. In order to extract multiple LC features, we enhance MobileNet V3 and LeViT models. The weighted sum feature fusion technique is used to generate unique LC features. The extracted features are classified using spline functions, including linear, cubic, and B-spline of Kolmogorov–Arnold Networks (KANs). We ensemble the outcomes using the soft-voting approach. The model is generalized using the Lung-PET-CT-Dx dataset. Five-fold cross-validation is used to evaluate the model. The proposed LC detection model achieves an impressive accuracy of 99.0% with a minimal loss of 0.07. In addition, limited resources are required to classify PET/CT images. The high performance underscores the potential of the proposed LC detection model in providing valuable and optimal results. The study findings can significantly improve clinical practice by presenting sophisticated and interpretable outcomes. The proposed model can be enhanced by integrating advanced feature fusion techniques.
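The soft-voting step named in the abstract averages class probabilities across the individual classifiers and picks the class with the highest mean; a minimal sketch of that ensembling rule (pure Python; the probability values and function name are invented for illustration):

```python
def soft_vote(prob_lists):
    """Average class-probability vectors from several classifiers and return
    (index of highest mean probability, list of mean probabilities)."""
    n_models = len(prob_lists)
    n_classes = len(prob_lists[0])
    means = [sum(p[c] for p in prob_lists) / n_models for c in range(n_classes)]
    return max(range(n_classes), key=means.__getitem__), means

# Three hypothetical classifiers scoring two classes, e.g. [benign, malignant]:
pred, means = soft_vote([[0.6, 0.4], [0.3, 0.7], [0.2, 0.8]])
```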

  7. SAROS - A large, heterogeneous, and sparsely annotated segmentation dataset...

    • cancerimagingarchive.net
    csv, n/a +1
    Updated Oct 29, 2023
    Cite
    The Cancer Imaging Archive (2023). SAROS - A large, heterogeneous, and sparsely annotated segmentation dataset on CT imaging data [Dataset]. http://doi.org/10.25737/SZ96-ZG60
    Available download formats: csv, n/a, nifti and zip
    Dataset updated
    Oct 29, 2023
    Dataset authored and provided by
    The Cancer Imaging Archive
    License

    https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/

    Time period covered
    Mar 7, 2024
    Dataset funded by
    National Cancer Institute (http://www.cancer.gov/)
    Description
    Sparsely Annotated Region and Organ Segmentation (SAROS) contributes a large heterogeneous semantic segmentation annotation dataset for existing CT imaging cases on TCIA. The goal of this dataset is to provide high-quality annotations for building body composition analysis tools (References: Koitka 2020 and Haubold 2023). Existing in-house segmentation models were employed to generate annotation candidates on randomly selected cases. All generated annotations were manually reviewed and corrected by medical residents and students on every fifth axial slice, while the other slices were set to an ignore label (numeric value 255).

    900 CT series from 882 patients were randomly selected from the following TCIA collections (number of CTs per collection in parentheses): ACRIN-FLT-Breast (32), ACRIN-HNSCC-FDG-PET/CT (48), ACRIN-NSCLC-FDG-PET (129), Anti-PD-1_Lung (12), Anti-PD-1_MELANOMA (2), C4KC-KiTS (175), COVID-19-NY-SBU (1), CPTAC-CM (1), CPTAC-LSCC (3), CPTAC-LUAD (1), CPTAC-PDA (8), CPTAC-UCEC (26), HNSCC (17), Head-Neck Cetuximab (12), LIDC-IDRI (133), Lung-PET-CT-Dx (17), NSCLC Radiogenomics (7), NSCLC-Radiomics (56), NSCLC-Radiomics-Genomics (20), Pancreas-CT (58), QIN-HEADNECK (94), Soft-tissue-Sarcoma (6), TCGA-HNSC (1), TCGA-LIHC (33), TCGA-LUAD (2), TCGA-LUSC (3), TCGA-STAD (2), TCGA-UCEC (1).

    A script to download and resample the images is provided in our GitHub repository: https://github.com/UMEssen/saros-dataset

    The annotations are provided in NIfTI format and were performed on 5 mm slice thickness. The annotation files define foreground labels on the same axial slices and match pixel-perfect. In total, 13 semantic body regions and 6 body part labels were annotated, each with an index that corresponds to a numeric value in the segmentation file.

    Body Regions

    1. Subcutaneous Tissue
    2. Muscle
    3. Abdominal Cavity
    4. Thoracic Cavity
    5. Bones
    6. Parotid Glands
    7. Pericardium
    8. Breast Implant
    9. Mediastinum
    10. Brain
    11. Spinal Cord
    12. Thyroid Glands
    13. Submandibular Glands

    Body Parts

    1. Torso
    2. Head
    3. Right Leg
    4. Left Leg
    5. Right Arm
    6. Left Arm
    The labels which were modified or require further commentary are listed and explained below:
    • Subcutaneous Adipose Tissue: The cutis was included in this label due to its limited differentiation in 5 mm CT.
    • Muscle: All muscular tissue was segmented contiguously and not separated into individual muscles; fascias and intermuscular fat were therefore included in the label. Inter- and intramuscular fat is subtracted automatically in the process.
    • Abdominal Cavity: This label includes the pelvis. The label does not distinguish the positional relationships of the peritoneum.
    • Mediastinum: The International Thymic Malignancy Interest Group (ITMIG) scheme was used for the segmentation guidelines.
    • Head + Neck: The neck is confined by the base of the trapezius muscle.
    • Right + Left Leg: The legs are separated from the torso by the line between the two lowest points of the Rami ossa pubis.
    • Right + Left Arm: The arms are separated from the torso by the diagonal between the most lateral point of the acromion and the tuberculum infraglenoidale.
    For reproducibility on downstream tasks, five cross-validation folds and a test set were pre-defined and are described in the provided spreadsheet. Segmentation was conducted strictly in accordance with anatomical guidelines and only modified if required for the gain of segmentation efficiency.
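Since SAROS annotates only every fifth axial slice and sets the rest to the ignore value 255, downstream training code must skip ignored slices; a minimal sketch (pure Python; the function name and toy volume are illustrative, and a label volume is assumed to be indexed as [slice][row][column]):

```python
IGNORE = 255  # numeric value of the ignore label, per the dataset description

def annotated_slices(label_volume):
    """Return indices of axial slices that carry real annotations,
    i.e. slices where not every voxel equals the ignore label."""
    keep = []
    for z, plane in enumerate(label_volume):
        if any(v != IGNORE for row in plane for v in row):
            keep.append(z)
    return keep

# Toy 5-slice volume: only slice 2 carries annotations; the rest are ignore-only.
vol = [[[255, 255], [255, 255]],
       [[255, 255], [255, 255]],
       [[0, 4], [3, 255]],  # e.g. background, thoracic cavity, abdominal cavity
       [[255, 255], [255, 255]],
       [[255, 255], [255, 255]]]
```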

  8. Image segmentations produced by the AIMI Annotations initiative

    • zenodo.org
    zip
    Updated Oct 11, 2023
    Cite
    Jeff Van Oss; Gowtham Krishnan Murugesan; Diana McCrumb; Rahul Soni (2023). Image segmentations produced by the AIMI Annotations initiative [Dataset]. http://doi.org/10.5281/zenodo.8400869
    Available download formats: zip
    Dataset updated
    Oct 11, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Jeff Van Oss; Gowtham Krishnan Murugesan; Diana McCrumb; Rahul Soni
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Imaging Data Commons (IDC) (https://imaging.datacommons.cancer.gov/) [1] connects researchers with publicly available cancer imaging data, often linked with other types of cancer data. Many of the collections have limited annotations due to the expense and effort required to create these manually. The increased capabilities of AI analysis of radiology images provide an opportunity to augment existing IDC collections with new annotation data. To further this goal, we trained several nnUNet [2] based models for a variety of radiology segmentation tasks from public datasets and used them to generate segmentations for IDC collections.

    To validate the models' performance, roughly 10% of the predictions were manually reviewed and corrected by both a board-certified radiologist and a medical student (non-expert). Additionally, the non-expert looked at all the AI predictions and rated them on a 5-point Likert scale.

    This record provides the AI segmentations, the manually corrected segmentations, and the manual scores for the inspected IDC collection images.

    List of all tasks and IDC collections analyzed.

    File: breast-fdg-pet-ct.zip
    Segmentation task: FDG-avid lesions in breast from FDG PET/CT scans
    IDC collections: QIN-Breast
    Links: model weights, github

    File: kidney-ct.zip
    Segmentation task: Kidney, Tumor, and Cysts from contrast-enhanced CT scans
    IDC collections: TCGA-KIRC
    Links: model weights, github

    File: liver-ct.zip
    Segmentation task: Liver from CT scans
    IDC collections: TCGA-LIHC
    Links: model weights, github

    File: liver-mr.zip
    Segmentation task: Liver from T1 MRI scans
    IDC collections: TCGA-LIHC
    Links: model weights, github

    File: lung-ct.zip
    Segmentation task: Lung and Nodules (3mm-30mm) from CT scans
    IDC collections: ACRIN-NSCLC-FDG-PET, Anti-PD-1-Lung, LUNG-PET-CT-Dx, NSCLC Radiogenomics, RIDER Lung PET-CT, TCGA-LUAD, TCGA-LUSC
    Links: model weights 1, model weights 2, github

    File: lung-fdg-pet-ct.zip
    Segmentation task: Lungs and FDG-avid lesions in the lung from FDG PET/CT scans
    IDC collections: ACRIN-NSCLC-FDG-PET, Anti-PD-1-Lung, LUNG-PET-CT-Dx, NSCLC Radiogenomics, RIDER Lung PET-CT, TCGA-LUAD, TCGA-LUSC
    Links: model weights, github

    File: prostate-mr.zip
    Segmentation task: Prostate from T2 MRI scans
    IDC collections: ProstateX
    Links: model weights, github

    Likert Score - Definition
    5 - Strongly agree: Use as-is (clinically acceptable; could be used for treatment without change)
    4 - Agree: Minor edits that are not necessary. Stylistic differences, but not clinically important; the current segmentation is acceptable
    3 - Neither agree nor disagree: Minor edits that are necessary. Minor edits are those the reviewer judges can be made in less time than starting from scratch, or that are expected to have minimal effect on treatment outcome
    2 - Disagree: Major edits. The necessary edits are required to ensure correctness and are significant enough that the user would prefer to start from scratch
    1 - Strongly disagree: Unusable. The quality of the automatic annotation is so poor that it is unusable

    Each zip file in the collection corresponds to a specific segmentation task. The common folder structure is:

    ai-segmentations-dcm: contains the AI model predictions in DICOM-SEG format for all analyzed IDC collection files.
    qa-segmentations-dcm: contains manually corrected segmentation files, based on the AI predictions, in DICOM-SEG format. Only a fraction (~10%) of the AI predictions were corrected. Corrections were performed by radiologists (rad*) and non-experts (ne*).
    qa-results.csv: CSV file linking the study/series UIDs with the AI segmentation file, the radiologist-corrected segmentation file, and radiologist ratings of AI performance.
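Filtering qa-results.csv down to the manually reviewed series can be done with the standard csv module; a minimal sketch (the column names follow the qa-results.csv description given for the companion Zenodo record later in this listing, and the sample rows are invented):

```python
import csv
import io

def reviewed_series(csv_text):
    """Yield (Collection, PatientID, Reviewer) for rows flagged as manually reviewed."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["Validation"].strip().lower() == "true":
            yield row["Collection"], row["PatientID"], row["Reviewer"]

# Invented sample rows in the documented column layout.
SAMPLE = """Collection,PatientID,StudyInstanceUID,SeriesInstanceUID,Validation,Reviewer
TCGA-LUAD,TCGA-05-4244,1.2.3,1.2.3.4,true,rad1
RIDER Lung PET-CT,RIDER-001,1.2.5,1.2.5.6,false,
"""
```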

  9. AI-derived and Manually corrected segmentations for various IDC Collections

    • zenodo.org
    zip
    Updated Oct 11, 2023
    Cite
    Jeff Van Oss; Gowtham Krishnan Murugesan; Diana McCrumb; Rahul Soni (2023). AI-derived and Manually corrected segmentations for various IDC Collections [Dataset]. http://doi.org/10.5281/zenodo.8350738
    Available download formats: zip
    Dataset updated
    Oct 11, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Jeff Van Oss; Gowtham Krishnan Murugesan; Diana McCrumb; Rahul Soni
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Imaging Data Commons (IDC) (https://imaging.datacommons.cancer.gov/) [1] connects researchers with publicly available cancer imaging data, often linked with other types of cancer data. Many of the collections have limited annotations due to the expense and effort required to create these manually. The increased capabilities of AI analysis of radiology images provide an opportunity to augment existing IDC collections with new annotation data. To further this goal, we trained several nnUNet [2] based models for a variety of radiology segmentation tasks from public datasets and used them to generate segmentations for IDC collections.

    To validate the models' performance, roughly 10% of the predictions were manually reviewed and corrected by both a board-certified radiologist and a medical student (non-expert). Additionally, the non-expert looked at all the AI predictions and rated them on a 5-point Likert scale.

    This record provides the AI segmentations, the manually corrected segmentations, and the manual scores for the inspected IDC collection images.

    List of all tasks and IDC collections analyzed.

    File: breast-fdg-pet-ct.zip
    Segmentation task: FDG-avid lesions in breast from FDG PET/CT scans
    IDC collections: QIN-Breast
    Links: model weights, github

    File: kidney-ct.zip
    Segmentation task: Kidney, Tumor, and Cysts from contrast-enhanced CT scans
    IDC collections: TCGA-KIRC
    Links: model weights, github

    File: liver-ct.zip
    Segmentation task: Liver from CT scans
    IDC collections: TCGA-LIHC
    Links: model weights, github

    File: liver-mr.zip
    Segmentation task: Liver from T1 MRI scans
    IDC collections: TCGA-LIHC
    Links: model weights, github

    File: lung-ct.zip
    Segmentation task: Lung and Nodules (3mm-30mm) from CT scans
    IDC collections: ACRIN-NSCLC-FDG-PET, Anti-PD-1-Lung, LUNG-PET-CT-Dx, NSCLC Radiogenomics, RIDER Lung PET-CT, TCGA-LUAD, TCGA-LUSC
    Links: model weights 1, model weights 2, github

    File: lung-fdg-pet-ct.zip
    Segmentation task: Lungs and FDG-avid lesions in the lung from FDG PET/CT scans
    IDC collections: ACRIN-NSCLC-FDG-PET, Anti-PD-1-Lung, LUNG-PET-CT-Dx, NSCLC Radiogenomics, RIDER Lung PET-CT, TCGA-LUAD, TCGA-LUSC
    Links: model weights, github

    File: prostate-mr.zip
    Segmentation task: Prostate from T2 MRI scans
    IDC collections: ProstateX
    Links: model weights, github

    Likert Score - Definition
    5 - Strongly agree: Use as-is (clinically acceptable; could be used for treatment without change)
    4 - Agree: Minor edits that are not necessary. Stylistic differences, but not clinically important; the current segmentation is acceptable
    3 - Neither agree nor disagree: Minor edits that are necessary. Minor edits are those the reviewer judges can be made in less time than starting from scratch, or that are expected to have minimal effect on treatment outcome
    2 - Disagree: Major edits. The necessary edits are required to ensure correctness and are significant enough that the user would prefer to start from scratch
    1 - Strongly disagree: Unusable. The quality of the automatic annotation is so poor that it is unusable

    Each zip file in the collection corresponds to a specific segmentation task. The common folder structure is:

    ai-segmentations-dcm: contains the AI model predictions in DICOM-SEG format for all analyzed IDC collection files.
    qa-segmentations-dcm: contains manually corrected segmentation files, based on the AI predictions, in DICOM-SEG format. Only a fraction (~10%) of the AI predictions were corrected. Corrections were performed by radiologists (rad*) and non-experts (ne*).
    qa-results.csv: CSV file linking the study/series UIDs with the AI segmentation file, the radiologist-corrected segmentation file, and radiologist ratings of AI performance.

  10. Data from: Image segmentations produced by BAMF under the AIMI Annotations...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Sep 27, 2024
    + more versions
    Cite
    Murugesan, Gowtham Krishnan (2024). Image segmentations produced by BAMF under the AIMI Annotations initiative [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8345959
    Dataset updated
    Sep 27, 2024
    Dataset provided by
    Soni, Rahul
    Murugesan, Gowtham Krishnan
    Van Oss, Jeff
    McCrumb, Diana
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Imaging Data Commons (IDC)(https://imaging.datacommons.cancer.gov/) [1] connects researchers with publicly available cancer imaging data, often linked with other types of cancer data. Many of the collections have limited annotations due to the expense and effort required to create these manually. The increased capabilities of AI analysis of radiology images provide an opportunity to augment existing IDC collections with new annotation data. To further this goal, we trained several nnUNet [2] based models for a variety of radiology segmentation tasks from public datasets and used them to generate segmentations for IDC collections.

    To validate the model's performance, roughly 10% of the AI predictions were assigned to a validation set. For this set, a board-certified radiologist graded the quality of AI predictions on a Likert scale. If they did not 'strongly agree' with the AI output, the reviewer corrected the segmentation.

    This record provides the AI segmentations, the manually corrected segmentations, and the manual scores for the inspected IDC collection images.

    Only 10% of the AI-derived annotations provided in this dataset are verified by expert radiologists. More details on model training and annotations are provided within the associated manuscript to ensure transparency and reproducibility.

    This work was done in two stages. Versions 1.x of this record are from the first stage; versions 2.x added additional records. In the version 1.x collections, a medical student (non-expert) reviewed all the AI predictions and rated them on a 5-point Likert scale; for any AI predictions in the validation set that they did not 'strongly agree' with, the non-expert provided corrected segmentations. This non-expert was not utilized for the version 2.x additional records.

    Likert Score Definition:

    Guidelines for reviewers to grade the quality of AI segmentations.

    5 Strongly Agree - Use as-is (i.e., clinically acceptable, and could be used for treatment without change)

    4 Agree - Minor edits that are not necessary. Stylistic differences, but not clinically important; the current segmentation is acceptable

    3 Neither agree nor disagree - Minor edits that are necessary. Minor edits are those the reviewer judges can be made in less time than starting from scratch, or that are expected to have minimal effect on treatment outcome

    2 Disagree - Major edits. The necessary edits are sufficiently significant that the user would prefer to start from scratch

    1 Strongly disagree - Unusable. The quality of the automatic annotation is so poor that it is unusable
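    The scale above can be expressed as a small lookup table, handy when decoding the numeric rating columns in qa-results.csv. This is only a sketch: the short labels abbreviate the definitions above, and the helper reflects the protocol that any rating below 'strongly agree' triggered a manual correction.

    ```python
    # Sketch: 5-point Likert scale from the review guidelines as a lookup table.
    # Labels are abbreviations of the full definitions in the record text.
    LIKERT_LABELS = {
        5: "Strongly Agree - use as-is",
        4: "Agree - minor, unnecessary edits",
        3: "Neither agree nor disagree - minor, necessary edits",
        2: "Disagree - major edits",
        1: "Strongly disagree - unusable",
    }

    def needs_correction(score: int) -> bool:
        """True when the review protocol called for a corrected segmentation."""
        return score < 5
    ```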

    Zip File Folder Structure

    Each zip file in the collection correlates to a specific segmentation task. The common folder structure is

    ai-segmentations-dcm: Contains the AI model predictions in DICOM-SEG format for all analyzed IDC collection files.

    qa-segmentations-dcm: Contains the manually corrected segmentation files, based on the AI predictions, in DICOM-SEG format. Only a fraction (~10%) of the AI predictions were corrected. Corrections were performed by radiologists (rad*) and non-experts (ne*).

    qa-results.csv: CSV file linking the study/series UIDs with the AI segmentation file, the reviewer-corrected segmentation file, and the reviewer ratings of AI performance.
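    As a sketch of how these pieces fit together, validated cases can be paired with their AI and corrected segmentation files using the stdlib csv module and the column names documented in the qa-results.csv Columns section. The sample rows below are invented for illustration.

    ```python
    import csv
    import io
    import os

    # Invented sample rows mimicking qa-results.csv; column names follow the
    # documented schema. An empty CorrectedSegmentation means the reviewer
    # strongly agreed with the AI output and provided no correction file.
    SAMPLE = """Collection,PatientID,StudyInstanceUID,SeriesInstanceUID,Validation,Reviewer,AISegmentation,CorrectedSegmentation
TCGA-LIHC,TCGA-XX-0001,1.2.3,1.2.3.4,true,rad1,pred_0001.dcm,corr_0001.dcm
TCGA-LIHC,TCGA-XX-0002,1.2.5,1.2.5.6,false,,pred_0002.dcm,
"""

    def load_validated(fh):
        """Yield (ai_path, corrected_path) for rows that were manually reviewed.

        Paths are resolved against the two segmentation directories described
        in the folder structure above.
        """
        for row in csv.DictReader(fh):
            if row["Validation"].lower() == "true":
                ai = os.path.join("ai-segmentations-dcm", row["AISegmentation"])
                corr = (os.path.join("qa-segmentations-dcm", row["CorrectedSegmentation"])
                        if row["CorrectedSegmentation"] else None)
                yield ai, corr

    pairs = list(load_validated(io.StringIO(SAMPLE)))
    ```

    Only the first sample row survives the filter, since the second was not part of the validation set.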

    qa-results.csv Columns

    The qa-results.csv file contains metadata about the segmentations and their related IDC case images, as well as the Likert ratings and comments by the reviewers.

    Collection: The name of the IDC collection for this case.

    PatientID: PatientID in the DICOM metadata of the scan; also called Case ID in the IDC.

    StudyInstanceUID: StudyInstanceUID in the DICOM metadata of the scan.

    SeriesInstanceUID: SeriesInstanceUID in the DICOM metadata of the scan.

    Validation: true/false; whether this scan was manually reviewed.

    Reviewer: Coded ID of the reviewer. Radiologist IDs start with 'rad'; non-expert IDs start with 'ne'.

    AimiProjectYear: 2023 or 2024. The work was split over two years; the main methodological difference is that in 2023 a non-expert also reviewed the AI output, while no non-expert was involved in 2024.

    AISegmentation: The filename of the AI prediction file in DICOM-SEG format, located in the ai-segmentations-dcm folder.

    CorrectedSegmentation: The filename of the reviewer-corrected prediction file in DICOM-SEG format, located in the qa-segmentations-dcm folder. If the reviewer strongly agreed with the AI for all segments, no correction file was provided.

    Was the AI predicted ROIs accurate?: Appears once per task for images from AimiProjectYear 2023. The reviewer rates segmentation quality on a Likert scale; in tasks with multiple output labels, a single rating covers them all.

    Was the AI predicted {SEGMENT_NAME} label accurate?: Appears once per segment for images from AimiProjectYear 2024. The reviewer rates each segment's quality on a Likert scale.

    Do you have any comments about the AI predicted ROIs?: Open-ended question for the reviewer.

    Do you have any comments about the findings from the study scans?: Open-ended question for the reviewer.
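    Because the rating column names differ by project year (one overall question in 2023, one question per segment in 2024) but share a common prefix, they can be discovered programmatically rather than hard-coded. A minimal sketch with invented sample rows:

    ```python
    import csv
    import io

    # The Likert rating columns all begin with this documented prefix, whether
    # they are the single 2023 question or the per-segment 2024 questions.
    PREFIX = "Was the AI predicted"

    # Invented sample mimicking a 2024-style qa-results.csv with two segments.
    SAMPLE = """PatientID,AimiProjectYear,Was the AI predicted Liver label accurate?,Was the AI predicted Lesion label accurate?
p1,2024,5,4
p2,2024,3,5
"""

    def rating_columns(fieldnames):
        """Return the subset of CSV columns that hold Likert ratings."""
        return [c for c in fieldnames if c.startswith(PREFIX)]

    reader = csv.DictReader(io.StringIO(SAMPLE))
    cols = rating_columns(reader.fieldnames)
    # One dict of {column: score} per reviewed scan, skipping blank cells.
    scores = [{c: int(row[c]) for c in cols if row[c]} for row in reader]
    ```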

    File Overview

    brain-mr.zip

    Segment Description: brain tumor regions: necrosis, edema, enhancing

    IDC Collection: UPENN-GBM

    Links: model weights, github

    breast-fdg-pet-ct.zip

    Segment Description: FDG-avid lesions in the breast from FDG PET/CT scans

    IDC Collection: QIN-Breast

    Links: model weights, github

    breast-mr.zip

    Segment Description: Breast, Fibroglandular tissue, structural tumor

    IDC Collection: duke-breast-cancer-mri

    Links: model weights, github

    kidney-ct.zip

    Segment Description: Kidney, Tumor, and Cysts from contrast enhanced CT scans

    IDC Collections: TCGA-KIRC, TCGA-KIRP, TCGA-KICH, CPTAC-CCRCC

    Links: model weights, github

    liver-ct.zip

    Segment Description: Liver from CT scans

    IDC Collection: TCGA-LIHC

    Links: model weights, github

    liver2-ct.zip

    Segment Description: Liver and Lesions from CT scans

    IDC Collections: HCC-TACE-SEG, COLORECTAL-LIVER-METASTASES

    Links: model weights, github

    liver-mr.zip

    Segment Description: Liver from T1 MRI scans

    IDC Collection: TCGA-LIHC

    Links: model weights, github

    lung-ct.zip

    Segment Description: Lung and Nodules (3mm-30mm) from CT scans

    IDC Collections:

    Anti-PD-1-Lung

    LUNG-PET-CT-Dx

    NSCLC Radiogenomics

    RIDER Lung PET-CT

    TCGA-LUAD

    TCGA-LUSC

    Links: model weights 1, model weights 2, github

    lung2-ct.zip

    Improved model version

    Segment Description: Lung and Nodules (3mm-30mm) from CT scans

    IDC Collections:

    QIN-LUNG-CT, SPIE-AAPM Lung CT Challenge

    Links: model weights, github

    lung-fdg-pet-ct.zip

    Segment Description: Lungs and FDG-avid lesions in the lung from FDG PET/CT scans

    IDC Collections:

    ACRIN-NSCLC-FDG-PET

    Anti-PD-1-Lung

    LUNG-PET-CT-Dx

    NSCLC Radiogenomics

    RIDER Lung PET-CT

    TCGA-LUAD

    TCGA-LUSC

    Links: model weights, github

    prostate-mr.zip

    Segment Description: Prostate from T2 MRI scans

    IDC Collections: ProstateX, Prostate-MRI-US-Biopsy

    Links: model weights, github

    Changelog

    2.0.2 - Fixed the brain-mr segmentations so they are transformed correctly

    2.0.1 - Added AIMI 2024 radiologist comments to qa-results.csv

    2.0.0 - Added AIMI 2024 segmentations

    1.X - AIMI 2023 segmentations and reviewer scores

A Large-Scale CT and PET/CT Dataset for Lung Cancer Diagnosis (Lung-PET-CT-Dx)


The location of each tumor was annotated by five academic thoracic radiologists with expertise in lung cancer, to make this dataset a useful tool and resource for developing algorithms for medical diagnosis. Two of the radiologists had more than 15 years of experience and the others had more than 5 years of experience. After one of the radiologists labeled each subject, the other four performed verification, so all five radiologists reviewed every annotation file in the dataset. Annotations were captured using LabelImg. The image annotations are saved as XML files in PASCAL VOC format, which can be parsed using the PASCAL Development Toolkit: https://pypi.org/project/pascal-voc-tools/. Python code to visualize the annotation boxes on top of the DICOM images can be downloaded here.
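As a sketch of reading one of these annotation files with Python's standard library: the element names (object, name, bndbox, xmin, ...) follow the PASCAL VOC convention produced by LabelImg, while the sample filename and coordinate values below are invented.

```python
import xml.etree.ElementTree as ET

# Invented example of a PASCAL VOC annotation file; the class label letter
# encodes the histopathological diagnosis (e.g. 'A' for Adenocarcinoma).
SAMPLE_XML = """<annotation>
  <filename>A0001_slice042.dcm</filename>
  <object>
    <name>A</name>
    <bndbox><xmin>120</xmin><ymin>88</ymin><xmax>190</xmax><ymax>150</ymax></bndbox>
  </object>
</annotation>"""

def parse_voc(xml_text):
    """Return a list of {label, bbox} dicts for every object in the file."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        bb = obj.find("bndbox")
        boxes.append({
            "label": obj.findtext("name"),
            "bbox": tuple(int(bb.findtext(t)) for t in ("xmin", "ymin", "xmax", "ymax")),
        })
    return boxes

boxes = parse_voc(SAMPLE_XML)
```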

Two deep learning researchers used the images and the corresponding annotation files to train several well-known detection models, achieving a mean average precision (mAP) of around 0.87 on the validation set.
