Computed tomography (CT) use has increased dramatically over the past several decades. The total number of CT examinations performed annually in the United States has risen from approximately 3 million in 1980 to nearly 70 million in 2007. Integrating CT into routine care has improved patient health care dramatically, and CT is widely considered among the most important advances in medicine.
However, CT delivers much higher radiation doses than conventional diagnostic x-rays. For example, a chest CT scan typically delivers more than 100 times the radiation dose of a routine frontal and lateral chest radiograph. Furthermore, radiation exposure from CT examinations has also increased, in part because faster image acquisition has enabled vascular, cardiac, and multiphase examinations, all associated with higher doses. Thus, greater use of CT has resulted in a concurrent increase in medical exposure to ionizing radiation.
Against this background, the present dataset contains 10 unique CT scans acquired at two slice thicknesses (1 mm and 3 mm) and reconstructed with two different kernels (B30 and D45). Slice thickness determines the depth of each image in a CT volume; thinner slices are generally better for diagnosis, prognosis, and other measurements. The reconstruction kernel is applied after the patient has already received the radiation dose and determines the smoothness or sharpness of the resulting image.
- Create a deep learning model (e.g., U-Net or a GAN) to denoise low-dose CT scans so that they resemble routine-dose scans.
If you use this dataset in your research, please credit the authors.
Citation
McCollough, C.H., Bartley, A.C., Carter, R.E., Chen, B., Drees, T.A., Edwards, P., Holmes, D.R., III, Huang, A.E., Khan, F., Leng, S., McMillan, K.L., Michalak, G.J., Nunez, K.M., Yu, L. and Fletcher, J.G. (2017), Low-dose CT for the detection and classification of metastatic liver lesions: Results of the 2016 Low Dose CT Grand Challenge. Med. Phys., 44: e339-e352. https://doi.org/10.1002/mp.12345
License
A license was not specified, but the data are public and available to download without any login.
Splash banner: Icon by Wanicon.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data comprise five datasets of medical images collected by the contributors, which can be used for classifying lung cancer, bone fracture, brain tumor, skin lesions, and renal malignancy, respectively. Each dataset also includes images of multiple diseases and malignancies. Classification can be performed with the ResNet50 CNN architecture and other deep CNN models, as sketched below. The data have also been used in a research article by the contributor.
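As a minimal sketch of the ResNet50 approach mentioned above, assuming PyTorch and torchvision are available (neither is named in the original description); the two-class head is illustrative and should be set per dataset:

import torch
import torchvision

# ResNet50 with an illustrative two-class head (e.g. disease vs. normal);
# adjust num_classes for each of the five datasets
model = torchvision.models.resnet50(weights=None, num_classes=2)
logits = model(torch.randn(1, 3, 224, 224))  # dummy RGB batch of one image
print(logits.shape)  # torch.Size([1, 2])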
License: Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This dataset contains a collection of multimodal medical images, specifically CT (Computed Tomography) and MRI (Magnetic Resonance Imaging) scans, for brain tumor detection and analysis. It is designed to assist researchers and healthcare professionals in developing AI models for the automatic detection, classification, and segmentation of brain tumors. The dataset features images from both modalities, providing comprehensive insight into the structural and functional variations in the brain associated with various types of tumors.
The dataset includes high-resolution CT and MRI images captured from multiple patients, with each image labeled with the corresponding tumor type (e.g., glioma, meningioma, etc.) and its location within the brain. This combination of CT and MRI images aims to leverage the strengths of both imaging techniques: CT scans for clear bone structure visualization and MRI for soft tissue details, enabling a more accurate analysis of brain tumors.
I collected these data from different sources and modified them for accuracy.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A data set consisting of axial sections of abdominal CT scans taken as part of a diagnostic procedure to detect stomach cancer.
Overview
The RAD-ChestCT dataset is a large medical imaging dataset developed by Duke MD/PhD student Rachel Draelos during her Computer Science PhD supervised by Lawrence Carin. The full dataset includes 35,747 chest CT scans from 19,661 adult patients. This Zenodo repository contains an initial release of 3,630 chest CT scans, approximately 10% of the dataset. This dataset is of significant interest to the machine learning and medical imaging research communities.
Papers
The following published paper includes a description of how the RAD-ChestCT dataset was created: Draelos et al., "Machine-Learning-Based Multiple Abnormality Prediction with Large-Scale Chest Computed Tomography Volumes," Medical Image Analysis 2021. DOI: 10.1016/j.media.2020.101857 https://pubmed.ncbi.nlm.nih.gov/33129142/
Two additional papers leveraging the RAD-ChestCT dataset are available as preprints:
"Use HiResCAM instead of Grad-CAM for faithful explanations of convolutional neural networks" (https://arxiv.org/abs/2011.08891)
"Explainable multiple abnormality classification of chest CT volumes with deep learning" (https://arxiv.org/abs/2111.12215)
Details about the files included in this data release
Metadata Files (4)
CT_Scan_Metadata_Complete_35747.csv: includes metadata about the whole dataset, with information extracted from DICOM headers.
Extrema_35747.csv: includes coordinates for lung bounding boxes for the whole dataset. Coordinates were derived computationally using a morphological image processing lung segmentation pipeline.
Indications_35747.csv: includes scan indications for the whole dataset. Indications were extracted from the free-text reports.
Summary_3630.csv: includes a listing of the 3,630 scans that are part of this repository.
Label Files (3)
The label files contain abnormality x location labels for the 3,630 shared CT volumes. Each CT volume is annotated with a matrix of 84 abnormality labels x 52 location labels. Labels were extracted from the free-text reports using the Sentence Analysis for Radiology Label Extraction (SARLE) framework. For each CT scan, the label matrix has been flattened, and the abnormalities and locations are separated by an asterisk in the CSV column headers (e.g. "mass*liver"). The labels can be used as the ground truth when training computer vision classifiers on the CT volumes. Label files include:
imgtrain_Abnormality_and_Location_Labels.csv (for the training set)
imgvalid_Abnormality_and_Location_Labels.csv (for the validation set)
imgtest_Abnormality_and_Location_Labels.csv (for the test set)
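Given that layout, the flattened matrix can be recovered per scan. A minimal sketch, assuming pandas is installed and that the first CSV column holds the volume identifier (an assumption worth checking against the actual file):

import pandas as pd

# Hypothetical read of the training label file; index_col=0 assumes the
# first column is the scan identifier
labels = pd.read_csv("imgtrain_Abnormality_and_Location_Labels.csv", index_col=0)

# Column headers encode "abnormality*location"; split them back apart
pairs = [tuple(col.split("*", 1)) for col in labels.columns]

row = labels.iloc[0]  # flattened label vector for one CT volume
positives = [(a, loc) for (a, loc), v in zip(pairs, row) if v == 1]
print(positives[:5])  # e.g. [("mass", "liver"), ...]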
CT Volume Files (3,630)
Each CT scan is provided as a compressed 3D numpy array (npz format). The CT scans can be read using the Python package numpy, version 1.14.5 and above.
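As a rough loading sketch (the file name below is hypothetical, and the key inside each npz archive is not documented here, so inspect it rather than assuming a name):

import numpy as np

# Hypothetical file name; one compressed 3D volume per scan
with np.load("some_scan.npz") as archive:
    key = archive.files[0]  # list the stored arrays instead of guessing
    volume = archive[key]

print(volume.shape, volume.dtype)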
Related Code
Code related to RAD-ChestCT is publicly available on GitHub at https://github.com/rachellea.
Repositories of interest include:
https://github.com/rachellea/ct-net-models contains PyTorch code to load the RAD-ChestCT dataset and train convolutional neural network models for multiple abnormality prediction from whole CT volumes.
https://github.com/rachellea/ct-volume-preprocessing contains an end-to-end Python framework to convert CT scans from DICOM to numpy format. This code was used to prepare the RAD-ChestCT volumes.
https://github.com/rachellea/sarle-labeler contains the Python implementation of the SARLE label extraction framework used to generate the abnormality and location label matrix from the free text reports. SARLE has minimal dependencies and the abnormality and location vocabulary terms can be easily modified to adapt SARLE to different radiologic modalities, abnormalities, and anatomical locations.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Lung Image Database Consortium image collection (LIDC-IDRI) consists of diagnostic and lung cancer screening thoracic computed tomography (CT) scans with marked-up annotated lesions. It is a web-accessible international resource for development, training, and evaluation of computer-assisted diagnostic (CAD) methods for lung cancer detection and diagnosis. Initiated by the National Cancer Institute (NCI), further advanced by the Foundation for the National Institutes of Health (FNIH), and accompanied by the Food and Drug Administration (FDA) through active participation, this public-private partnership demonstrates the success of a consortium founded on a consensus-based process.
Seven academic centers and eight medical imaging companies collaborated to create this data set, which contains 1018 cases. Each subject includes images from a clinical thoracic CT scan and an associated XML file that records the results of a two-phase image annotation process performed by four experienced thoracic radiologists. In the initial blinded-read phase, each radiologist independently reviewed each CT scan and marked lesions belonging to one of three categories ("nodule ≥ 3 mm", "nodule < 3 mm", and "non-nodule ≥ 3 mm"). In the subsequent unblinded-read phase, each radiologist independently reviewed their own marks along with the anonymized marks of the three other radiologists to render a final opinion. The goal of this process was to identify as completely as possible all lung nodules in each CT scan without requiring forced consensus.
Data usage policy: https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
This dataset consists of CT and PET-CT DICOM images of lung cancer subjects with XML Annotation files that indicate tumor location with bounding boxes. The images were retrospectively acquired from patients with suspicion of lung cancer, and who underwent standard-of-care lung biopsy and PET/CT. Subjects were grouped according to a tissue histopathological diagnosis. Patients with Names/IDs containing the letter 'A' were diagnosed with Adenocarcinoma, 'B' with Small Cell Carcinoma, 'E' with Large Cell Carcinoma, and 'G' with Squamous Cell Carcinoma.
The images were analyzed with mediastinum (window width, 350 HU; level, 40 HU) and lung (window width, 1,400 HU; level, −700 HU) window settings. Reconstructions were made at a 2 mm slice thickness using lung settings. The CT slice interval varies from 0.625 mm to 5 mm. Scanning modes include plain, contrast, and 3D reconstruction.
Before the examination, each patient fasted for at least 6 hours, and the blood glucose of each patient was less than 11 mmol/L. Whole-body emission scans were acquired 60 minutes after the intravenous injection of 18F-FDG (4.44 MBq/kg, 0.12 mCi/kg), with patients in the supine position in the PET scanner. FDG doses and uptake times were 168.72–468.79 MBq (295.8 ± 64.8 MBq) and 27–171 min (70.4 ± 24.9 min), respectively. 18F-FDG with a radiochemical purity of 95% was provided. Patients were allowed to breathe normally during PET and CT acquisitions. Attenuation correction of PET images was performed using CT data with the hybrid segmentation method. Attenuation corrections were performed using a CT protocol (180 mAs, 120 kV, pitch 1.0). Each study comprised one CT volume, one PET volume, and fused PET and CT images: the CT resolution was 512 × 512 pixels at 1 mm × 1 mm, and the PET resolution was 200 × 200 pixels at 4.07 mm × 4.07 mm, with a slice thickness and an interslice distance of 1 mm. Both volumes were reconstructed with the same number of slices. Three-dimensional (3D) emission and transmission scans were acquired from the base of the skull to mid femur. The PET images were reconstructed via the TrueX TOF method with a slice thickness of 1 mm.
The location of each tumor was annotated by five academic thoracic radiologists with expertise in lung cancer to make this dataset a useful tool and resource for developing algorithms for medical diagnosis. Two of the radiologists had more than 15 years of experience and the others had more than 5 years of experience. After one of the radiologists labeled each subject, the other four radiologists performed a verification, resulting in all five radiologists reviewing each annotation file in the dataset. Annotations were captured using LabelImg. The image annotations are saved as XML files in PASCAL VOC format, which can be parsed using the PASCAL Development Toolkit: https://pypi.org/project/pascal-voc-tools/. Python code to visualize the annotation boxes on top of the DICOM images can be downloaded here.
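As an illustration, a single PASCAL VOC annotation file can also be parsed with the Python standard library alone; the file name below is hypothetical:

import xml.etree.ElementTree as ET

tree = ET.parse("A0001_annotation.xml")  # hypothetical annotation file
for obj in tree.getroot().iter("object"):
    label = obj.findtext("name")  # tumor class label
    box = obj.find("bndbox")      # bounding box in pixel coordinates
    coords = tuple(int(float(box.findtext(t))) for t in ("xmin", "ymin", "xmax", "ymax"))
    print(label, coords)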
Two deep learning researchers used the images and the corresponding annotation files to train several well-known detection models, which achieved a mean average precision (mAP) of around 0.87 on the validation set.
License: Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
DATASET STRUCTURE
The dataset can be downloaded from https://doi.org/10.5281/zenodo.7260705 and a detailed description is offered at "synthRAD2023_dataset_description.pdf".
The training dataset for Task 1 is in Task1.zip, and that for Task 2 in Task2.zip. After unzipping, each task is organized according to the following folder structure:
Task1.zip/
├── Task1
│   ├── brain
│   │   ├── 1Bxxxx
│   │   │   ├── mr.nii.gz
│   │   │   ├── ct.nii.gz
│   │   │   └── mask.nii.gz
│   │   ├── ...
│   │   └── overview
│   │       ├── 1_brain_train.xlsx
│   │       ├── 1Bxxxx_train.png
│   │       └── ...
│   └── pelvis
│       ├── 1Pxxxx
│       │   ├── mr.nii.gz
│       │   ├── ct.nii.gz
│       │   └── mask.nii.gz
│       ├── ...
│       └── overview
│           ├── 1_pelvis_train.xlsx
│           ├── 1Pxxxx_train.png
│           └── ...

Task2.zip/
├── Task2
│   ├── brain
│   │   ├── 2Bxxxx
│   │   │   ├── cbct.nii.gz
│   │   │   ├── ct.nii.gz
│   │   │   └── mask.nii.gz
│   │   ├── ...
│   │   └── overview
│   │       ├── 2_brain_train.xlsx
│   │       ├── 2Bxxxx_train.png
│   │       └── ...
│   └── pelvis
│       ├── 2Pxxxx
│       │   ├── cbct.nii.gz
│       │   ├── ct.nii.gz
│       │   └── mask.nii.gz
│       ├── ...
│       └── overview
│           ├── 2_pelvis_train.xlsx
│           ├── 2Pxxxx_train.png
│           └── ...
Each patient folder has a unique name that contains information about the task, anatomy, center and a patient ID. The naming follows the convention below:
[Task][Anatomy][Center][PatientID]
For example, 1BA001 denotes Task 1, brain, center A, patient 001.
In each patient folder, three files can be found:
ct.nii.gz: CT image
mr.nii.gz or cbct.nii.gz (depending on the task): CBCT/MR image
mask.nii.gz: image containing a binary mask of the dilated patient outline
For each task and anatomy, an overview folder is provided which contains the following files:
[task]_[anatomy]_train.xlsx: This file contains information about the image acquisition protocol for each patient.
[task][anatomy][center][PatientID]_train.png: For each patient, a PNG showing axial, coronal, and sagittal slices of the CBCT/MR, CT, mask, and the difference between CBCT/MR and CT is provided. These images are meant to provide a quick visual overview of the data.
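As a minimal loading sketch, assuming the nibabel package (not part of the release) and a hypothetical Task 1 brain patient folder named 1BA001:

import nibabel as nib
import numpy as np

folder = "Task1/brain/1BA001"  # hypothetical patient folder
mr = nib.load(f"{folder}/mr.nii.gz").get_fdata()
ct = nib.load(f"{folder}/ct.nii.gz").get_fdata()
mask = nib.load(f"{folder}/mask.nii.gz").get_fdata().astype(bool)

# Restrict any comparison to the dilated patient outline
print(mr.shape, ct.shape, int(np.count_nonzero(mask)))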
DATASET DESCRIPTION
This challenge dataset contains imaging data of patients who underwent radiotherapy in the brain or pelvis region. Overall, the population is predominantly adult, and no gender restrictions were applied during data collection. For Task 1, the inclusion criteria were the acquisition of a CT and an MRI during treatment planning, while for Task 2, acquisitions of a CT and a CBCT used for patient positioning were required. The datasets for Tasks 1 and 2 do not necessarily contain the same patients, given the different image acquisitions for the two tasks.
Data was collected at 3 Dutch university medical centers:
Radboud University Medical Center
University Medical Center Utrecht
University Medical Center Groningen
For anonymization purposes, from here on, institution names are substituted with A, B and C, without specifying which institute each letter refers to.
The following numbers of patients are available in the training set.
Training

| | Brain: Center A | Brain: Center B | Brain: Center C | Brain: Total | Pelvis: Center A | Pelvis: Center B | Pelvis: Center C | Pelvis: Total |
|---|---|---|---|---|---|---|---|---|
| Task 1 | 60 | 60 | 60 | 180 | 120 | 0 | 60 | 180 |
| Task 2 | 60 | 60 | 60 | 180 | 60 | 60 | 60 | 180 |
Each subset generally contains equal numbers of patients from each center, except for Task 1 pelvis, where center B had no MR scans available. To compensate for this, center A provided twice the number of patients as in the other subsets.
Validation

| | Brain: Center A | Brain: Center B | Brain: Center C | Brain: Total | Pelvis: Center A | Pelvis: Center B | Pelvis: Center C | Pelvis: Total |
|---|---|---|---|---|---|---|---|---|
| Task 1 | 10 | 10 | 10 | 30 | 20 | 0 | 10 | 30 |
| Task 2 | 10 | 10 | 10 | 30 | 10 | 10 | 10 | 30 |
Testing

| | Brain: Center A | Brain: Center B | Brain: Center C | Brain: Total | Pelvis: Center A | Pelvis: Center B | Pelvis: Center C | Pelvis: Total |
|---|---|---|---|---|---|---|---|---|
| Task 1 | 20 | 20 | 20 | 60 | 40 | 0 | 20 | 60 |
| Task 2 | 20 | 20 | 20 | 60 | 20 | 20 | 20 | 60 |
In total, for all tasks and anatomies combined, 1080 image pairs (720 training, 120 validation, 240 testing) are available in this dataset. This repository only contains the training data.
All images were acquired with the clinically used scanners and imaging protocols of the respective centers and reflect typical images found in clinical routine. As a result, imaging protocols and scanners can vary between patients. A detailed description of the imaging protocol for each image can be found in spreadsheets that are part of the dataset release (see dataset structure).
Data was acquired with the following scanners:
Center A:
MRI: Philips Ingenia 1.5T/3.0T
CT: Philips Brilliance Big Bore or Siemens Biograph20 PET-CT
CBCT: Elekta XVI
Center B:
MRI: Siemens MAGNETOM Aera 1.5T or MAGNETOM Avanto_fit 1.5T
CT: Siemens SOMATOM Definition AS
CBCT: IBA Proteus+ or Elekta XVI
Center C:
MRI: Siemens Avanto fit 1.5T or Siemens MAGNETOM Vida fit 3.0T
CT: Philips Brilliance Big Bore
CBCT: Elekta XVI
For Task 1, MRIs were acquired with a T1-weighted gradient echo or an inversion-prepared turbo field echo (TFE) sequence and collected along with the corresponding planning CTs for all subjects. The exact acquisition parameters vary between patients and centers. For centers B and C, the selected MRIs were acquired with gadolinium contrast, while the selected MRIs of center A were acquired without contrast.
For Task 2, the CBCTs used for image-guided radiotherapy, ensuring accurate patient positioning, were selected for all subjects along with the corresponding planning CTs.
License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
NIH Visible Human Project Home page: https://www.nlm.nih.gov/research/visible/visible_human.html The NLM Visible Human Project has created publicly-available complete, anatomically detailed, three-dimensional representations of a human male body and a human female body. Specifically, the VHP provides a public-domain library of cross-sectional cryosection, CT, and MRI images obtained from one male cadaver and one female cadaver. The Visible Man data set was publicly released in 1994 and the Visible Woman in 1995. The data sets were designed to serve as (1) a reference for the study of human anatomy, (2) public-domain data for testing medical imaging algorithms, and (3) a test bed and model for the construction of network-accessible image libraries. The VHP data sets have been applied to a wide range of educational, diagnostic, treatment planning, virtual reality, artistic, mathematical, and industrial uses. About 4,000 licensees from 66 countries were authorized to access the datasets. As of 2019, a license is no longer required to access the VHP datasets.
Data usage policy: https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
This data set includes low-dose whole body CT images and tissue segmentations of thirty healthy adult research participants who underwent PET/CT imaging on the uEXPLORER total-body PET/CT system at UC Davis. Participants included in this study were healthy adults, 18 years of age or older, who were able to provide informed written consent. The participants' age, sex, weight, height, and body mass index are also provided.
Fifteen participants underwent PET/CT imaging at three timepoints during a 3-hour period (0 minutes, 90 minutes, and 180 minutes) after PET radiotracer injection, while the remaining 15 participants were imaged at six timepoints during a 12-hour period (additionally at 360 minutes, 540 minutes, and 720 minutes). The imaging timepoint is indicated in the Series Description DICOM tag, with a value of either 'dyn', '90min', '3hr', '6hr', '9hr', or '12hr', corresponding to the delay after PET tracer injection. CT images were acquired immediately before PET image acquisition. Currently, only CT images from either three or six timepoints are included in the data set. The tissue segmentations include 37 tissues, consisting of 13 abdominal organs, 20 different bones, subcutaneous and visceral fat, and skeletal and psoas muscle. Segmentations were automatically generated at the 90-minute timepoint for each participant using MOOSE, an AI segmentation tool for whole-body data. The segmentations are provided in NIFTI format and may need to be re-oriented to correctly match the CT image data in DICOM format.
The uEXPLORER CT scanner is an 80-row, 160-slice CT scanner typically used for anatomical imaging and attenuation correction for PET/CT. The CT scan obtained at 90 minutes was performed at 140 kVp with an average of 50 mAs for all subjects. At all other timepoints (0 minutes, 180 minutes, etc.), the CT scan was obtained at 140 kVp with an average of 5 mAs. CT images were reconstructed into a 512 × 512 × 828 image matrix with a 0.9766 × 0.9766 × 2.344 mm³ voxel size.
A key detailing the organ values is provided along with the segmentations download in the Data Access table.
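Because the NIFTI segmentations may not share the orientation of the DICOM CT volumes, a quick orientation check with nibabel is a reasonable first step; this is a sketch with a hypothetical file name, not part of the official release:

import nibabel as nib

seg = nib.load("segmentation.nii.gz")  # hypothetical file name
print(nib.aff2axcodes(seg.affine))     # current anatomical orientation, e.g. ('L', 'P', 'S')

# Reorient to a canonical RAS layout before comparing with the CT volume
seg_ras = nib.as_closest_canonical(seg)
print(nib.aff2axcodes(seg_ras.affine)) # ('R', 'A', 'S')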
Data usage policy: https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
The RADCURE dataset was collected clinically for radiation therapy treatment planning and retrospectively reconstructed for quantitative imaging research.
Inclusion: The dataset used for this study consists of 3,346 head and neck cancer CT image volumes from patients treated with definitive RT between 2005 and 2017 at the University Health Network (UHN) in Toronto, Canada.
Acquisition and Validation Methods: RADCURE contains computed tomography (CT) images with corresponding normal and non-normal tissue contours. CT scans were collected using systems from three different manufacturers. Standard clinical imaging protocols were followed, and contours were generated and reviewed at weekly quality assurance rounds. RADCURE imaging and structure-set data were extracted from our institution's radiation treatment planning and oncology systems using an in-house data mining and processing system. Furthermore, images are linked to clinical data for each patient, including demographic, clinical, and treatment information based on the 7th edition TNM staging system. The median patient age is 63, and the final dataset includes 80% males. Oropharyngeal cancer makes up 50% of the population, with larynx, nasopharynx, and hypopharynx cancers comprising 25%, 12%, and 5%, respectively. Median follow-up was 5 years, with 60% of the patients alive at last follow-up.
Data Format and Usage Notes: During extraction of images and contours from our institution's radiation treatment planning and oncology systems, the data were converted to DICOM and RTSTRUCT formats, respectively. To improve the usability of the RTSTRUCT files, individual contour names were standardized for primary tumor volumes and 19 organs-at-risk. Demographic, clinical, and treatment information is provided as a comma-separated values (CSV) file. This dataset is a superset of the Radiomic Biomarkers in Oropharyngeal Carcinoma (OPC-Radiomics) dataset and fully encapsulates all previous data; it replaces the OPC-Radiomics dataset. The RTSTRUCTs from OPC-Radiomics have been standardized to adhere to the TG263 nomenclature. Ages of 90 years or greater are considered PHI and were set to 90 years to minimize the impact on privacy. Dates in both radiological and clinical metadata were offset by an undisclosed number of days for anonymization, which should be noted for downstream analysis. The TG263-standardized RTSTRUCTs include only the GTVp (primary gross tumor volume) contours. Patients without corresponding GTVp contours will not have RTSTRUCTs.
Potential Applications: The availability of imaging, clinical, demographic and treatment data in RADCURE makes it a viable option for a variety of quantitative image analysis research initiatives. This includes the application of machine learning or artificial intelligence methods to expedite routine clinical practices, discover new non-invasive biomarkers, or develop prognostic models.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A Benchmark Dataset for Low-Dose CT Reconstruction Methods.
The following Data Descriptor article provides full documentation:
Leuschner, J., Schmidt, M., Baguer, D.O. et al. LoDoPaB-CT, a benchmark dataset for low-dose computed tomography reconstruction. Sci Data 8, 109 (2021). https://www.nature.com/articles/s41597-021-00893-z
The Python library DIVal (https://github.com/jleuschn/dival) can be used to download and access the dataset.
Reconstructions from the LIDC/IDRI dataset are used as a basis for this dataset.
The ZIP files included in the LoDoPaB dataset contain multiple HDF5 files. Each HDF5 file contains one HDF5 dataset named "data", that provides a number of samples (128 except for the last file in each ZIP file). For example, the n-th training sample pair is stored in the files "observation_train_%03d.hdf5" and "ground_truth_train_%03d.hdf5" where "%03d" is floor(n / 128), at row (n mod 128) of "data".
Note: each last ground truth file (i.e. ground_truth_train_279.hdf5, ground_truth_validation_027.hdf5 and ground_truth_test_027.hdf5) still contains a HDF5 dataset of shape (128, 362, 362), although it contains fewer than 128 valid samples. Thus, the number of valid samples needs to be determined from the total sample counts in the part (i.e. "train": 35820, "validation": 3522, "test": 3553), or from the corresponding observation file, for which the first dimension of the HDF5 dataset matches the number of valid samples in the file.
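Following that layout, the n-th training pair can be read with h5py (assumed installed; paths assume the ZIPs were extracted into the working directory):

import h5py

n = 1000  # index of the sample within the training part
file_idx, row = n // 128, n % 128

with h5py.File(f"observation_train_{file_idx:03d}.hdf5", "r") as f:
    observation = f["data"][row]
with h5py.File(f"ground_truth_train_{file_idx:03d}.hdf5", "r") as f:
    ground_truth = f["data"][row]  # 362 x 362 reconstruction slice

print(observation.shape, ground_truth.shape)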
The randomized patient IDs of the samples are provided as CSV files. The patient IDs of the train, validation and test parts are integers in the range of 0–631, 632–691 and 692–751, respectively. The ID of each sample is stored in a single row.
Acknowledgements
Johannes Leuschner, Maximilian Schmidt and Daniel Otero Baguer acknowledge the support of the Deutsche Forschungsgemeinschaft (DFG) within the framework of GRK 2224/1 "π3: Parameter Identification – Analysis, Algorithms, Applications". We thank Simon Arridge, Ozan Öktem, Carola-Bibiane Schönlieb and Christian Etmann for the fruitful discussion about the procedure, and Felix Lucka and Jonas Adler for their ideas and helpful feedback on the simulation setup. The authors acknowledge the National Cancer Institute and the Foundation for the National Institutes of Health, and their critical role in the creation of the free publicly available LIDC/IDRI Database used in this study.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains cough sound images (CSI), chest X-rays (CXR), and CT scan images of several chest diseases, namely COVID-19, lung cancer (LC), consolidation lung (COL), atelectasis (ATE), tuberculosis (TB), pneumothorax (PNEUTH), edema (EDE), pneumonia (PNEU), and normal (NOR). The dataset was collected from Al-Shafa Hospital, Multan, Pakistan, in collaboration with the Apple Pharma medicine dealer company. The statistics of the dataset are presented in Table 1. Meta-information about the patients is not provided; however, roughly 20% of the CSI, CXR, and CT scan images are from female patients and 80% from male patients. In addition, the CXR, CSI, and CT scans were collected from individuals aged 41 to 62 years.
Table 1. Statistics of the chest diseases dataset.
| Sr# | Chest Diseases | CSI | CXR | CT |
|---|---|---|---|---|
| 1 | COVID-19 | 28 | 5 | 17 |
| 2 | Lungs Cancer | 30 | 7 | 5 |
| 3 | Consolidation Lung | 18 | 5 | 3 |
| 4 | Atelectasis | 25 | 4 | 4 |
| 5 | Tuberculosis | 27 | 4 | 6 |
| 6 | Pneumothorax | 18 | 5 | 4 |
| 7 | Edema | 18 | 5 | 9 |
| 8 | Pneumonia | 16 | 10 | 15 |
| 9 | Normal | 24 | 6 | 8 |
Malik, Hassaan; Anees, Tayyaba (2023), “Chest Diseases Using Different Medical Imaging and Cough Sounds”, Mendeley Data, V1, doi: 10.17632/y6dvssx73b.1
License: Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0), https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Note - This is part 1 of the dataset.
Part 1 can be found at : https://zenodo.org/records/13799069
Part 2 can be found at : https://zenodo.org/records/12784601
Part 3 can be found at : https://zenodo.org/records/14659131
Background: Lung cancer risk classification is an increasingly important area of research as low-dose thoracic CT screening programs have become standard of care for patients at high risk for lung cancer. There is limited availability of large, annotated public databases for the training and testing of algorithms for lung nodule classification.
Methods: Screening chest CT scans done between January 1, 2015 and June 30, 2021 at Duke University Health System were considered for this study. Efficient nodule annotation was performed semi-automatically by using a publicly available deep learning nodule detection algorithm trained on the LUNA16 dataset to identify initial candidates, which were then accepted based on nodule location in the radiology text report or manually annotated by a medical student and a fellowship-trained cardiothoracic radiologist.
Results: The dataset contains 1613 CT volumes with 2487 annotated nodules, selected from a total dataset of 2061 patients, with the remaining data reserved for future testing. Radiologist spot-checking confirmed the semi-automated annotation had an accuracy rate of >90%.
Conclusions: The Duke Lung Cancer Screening Dataset 2024 is the first large dataset for CT lung cancer screening reflecting current CT technology. It represents a useful resource for lung cancer risk classification research, and the efficient annotation methods described for its creation may be used to generate similar databases in the future.
Dataset part Details:
Part 1: DLCS subsets 1 to 7, metadata, and annotations.
Part 2: DLCS subsets 8 and 9, and CT image info metadata.
Part 3: DLCS subset 10.
Updates and Versions:
Code Repository:
To support reproducible open-access research and benchmarking, we have shared several pre-trained models and baseline results in GitHub and GitLab repositories.
GitLab: https://gitlab.oit.duke.edu/cvit-public/ai_lung_health_benchmarking
GitHub: https://github.com/fitushar/AI-in-Lung-Health-Benchmarking-Detection-and-Diagnostic-Models-Across-Multiple-CT-Scan-Datasets
Funding:
This work was supported by the Duke Department of Radiology Charles E. Putman Vision Award, NIH/NIBIB P41-EB028744, and NIH/NCI R01-CA261457.
Data usage policy: https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
The National Institutes of Health Clinical Center performed 82 abdominal contrast-enhanced 3D CT scans (~70 seconds after intravenous contrast injection in the portal-venous phase) on 53 male and 27 female subjects. Seventeen of the subjects are healthy kidney donors scanned prior to nephrectomy. The remaining 65 patients were selected by a radiologist from patients who had neither major abdominal pathologies nor pancreatic cancer lesions. Subjects' ages range from 18 to 76 years, with a mean age of 46.8 ± 16.7. The CT scans have resolutions of 512 × 512 pixels with varying pixel sizes and slice thicknesses between 1.5 and 2.5 mm, acquired on Philips and Siemens MDCT scanners (120 kVp tube voltage).
A medical student manually performed slice-by-slice segmentations of the pancreas as ground-truth and these were verified/modified by an experienced radiologist.
Data usage policy: https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
Purpose: Expert-selected landmark points on clinical image pairs provide a basis for rigid registration validation. Using combinatorial rigid registration optimization (CORRO), we provide a statistically characterized reference data set for image registration of the pelvis by estimating the optimal ground truth.
Methods: Landmark points were identified for each CT/CBCT image pair for 58 pelvic cases. From the identified landmark pairs, combination subsets of k landmark pairs were generated without repeats to form a k-set for k = 4, 8, and 12. An affine registration between the image pairs was calculated for each k-combination set (2,000 to 8,000,000 sets). The mean and the standard deviation of the registrations were used as the final registration for each image pair. Joint entropy was employed to measure and compare the quality of CORRO against commercially available software.
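For a sense of why only a sample of combinations (2,000 to 8,000,000 sets) was evaluated, note how quickly the number of possible k-subsets grows; a quick check in Python 3.8+, using the average landmark count reported in the Results below:

from math import comb

# 154 is the average number of landmark pairs per case (see Results)
for k in (4, 8, 12):
    print(k, comb(154, k))
# k = 4 already yields 22,533,126 possible subsets; k = 8 and 12 are far larger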
Results: An average of 154 landmark pairs (range: 91–212) were selected for each CT/CBCT image pair. The mean standard deviation of the registration output decreased as the k-size increased for all cases. In general, the joint entropy was lower than the results from commercially available software. Across all 58 cases, CORRO produced the better registration in 58.3% of cases for k = 4, 15% for k = 8, and 18.3% for k = 12, compared with 8.3% for a commercial registration package. The minimum joint entropy was determined for one case and found to exist at the estimated registration mean, in agreement with the CORRO approach.
Conclusion: The results demonstrate that CORRO works even in the extreme case of the pelvic anatomy, where the CBCT suffers from reduced quality due to increased noise levels. The estimated ground truth using CORRO was found to be better than that of commercially available software for all k-sets tested. Additionally, the k-set of 4 resulted in the best overall outcomes when compared to k = 8 and 12, which is anticipated because k = 8 and 12 are more likely to include combinations that reduce the accuracy of the registration.
Figure 1. Content of planning CT shifted to the machine isocenter. Top left to right - axial, coronal and sagittal planning CT image with machine isocenter in red and image isocenter in blue. Middle left to right - axial, coronal and sagittal planning CT at machine isocenter. Bottom left to right - axial, coronal and sagittal CBCT at machine isocenter.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The PENGWIN segmentation challenge is designed to advance the development of automated pelvic fracture segmentation techniques in both 3D CT scans (Task 1) and 2D X-ray images (Task 2), aiming to enhance their accuracy and robustness. The full 3D dataset comprises CT scans from 150 patients scheduled for pelvic reduction surgery, collected from multiple institutions using a variety of scanning devices. This dataset represents a diverse range of patient cohorts and fracture types. Ground-truth segmentations for sacrum and hipbone fragments have been semi-automatically annotated and subsequently validated by medical experts, and are available here. From this 3D data, we have generated high-quality, realistic X-ray images and corresponding 2D labels from the CT data using DeepDRR, incorporating a range of virtual C-arm camera positions and surgical tools. This dataset contains the training set for fragment segmentation in synthetic X-rays (Task 2).
The training set is derived from 100 CTs, with 500 images each, for a total of 50,000 training images and segmentations. The C-arm geometry is randomly sampled for each CT within reasonable parameters for a full-size C-arm. The virtual patient is assumed to be in a head-first supine position. Imaging centers are randomly sampled within 50 mm of a fragment, ensuring good visibility. Viewing directions are sampled uniformly on the sphere within 45 degrees of vertical. Half of the images (IDs XXX_0250 - XXX_0500) contain up to 10 simulated K-wires and/or orthopaedic screws oriented randomly in the field of view.
The input images are raw intensity images without any windowing or normalization applied. It is standard practice to first apply the negative log transformation and then window each image appropriately before feeding it into a model. See the included augmentation pipeline in pengwin_utils.py for one approach. For viewing raw images, the FIJI image viewer is a viable option, but it is recommended to use the included visualization functions in pengwin_utils.py to first apply CLAHE normalization and save to a universally readable PNG (see example usage below).
Because the X-ray images feature overlapping segmentation masks, the segmentations have been encoded as multi-label uint32 images, where each pixel should be treated as a binary vector with bits 1–10 for SA fragments, 11–20 for LI, and 21–30 for RI. Thus, the raw segmentation files are not viewable with standard image viewing software. pengwin_utils.py includes functions for converting to and from this format and for visualizing masks overlaid onto the original image (see below).
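A rough numpy sketch of decoding that layout (bit positions are assumed here to be 1-based from the least significant bit, and LI/RI are presumably the left and right hipbone; verify both against pengwin_utils.py before relying on this):

import numpy as np
from PIL import Image

# PIL may load the 32-bit TIFF as int32; cast to uint32 before bit tests
seg = np.asarray(Image.open("train/output/images/x-ray/001_0000.tif")).astype(np.uint32)

def fragment_mask(seg: np.ndarray, bit: int) -> np.ndarray:
    # Binary mask for one fragment; bit is numbered from 1 (least significant)
    return (seg >> (bit - 1)) & 1

sa1 = fragment_mask(seg, 1)   # first sacrum (SA) fragment
li1 = fragment_mask(seg, 11)  # first LI fragment
ri1 = fragment_mask(seg, 21)  # first RI fragment
print(int(sa1.sum()), int(li1.sum()), int(ri1.sum()))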
To use the utilities, first install dependencies with pip install -r requirement.txt. Then, to visualize an image with its segmentation, you can do the following (assuming the training set has been downloaded and unzipped in the same folder):
import pengwin_utils
from PIL import Image

image_path = "train/input/images/x-ray/001_0000.tif"
seg_path = "train/output/images/x-ray/001_0000.tif"

image = pengwin_utils.load_image(image_path)  # raw intensity image
masks, category_ids, fragment_ids = pengwin_utils.load_masks(seg_path)

vis_image = pengwin_utils.visualize_sample(image, masks, category_ids, fragment_ids)
vis_path = "vis_image.png"
Image.fromarray(vis_image).save(vis_path)
print(f"Wrote visualization to {vis_path}")

# Replace with your model's predictions
pred_masks, pred_category_ids, pred_fragment_ids = masks, category_ids, fragment_ids

pred_seg = pengwin_utils.masks_to_seg(pred_masks, pred_category_ids, pred_fragment_ids)
pred_seg_path = "pred/train/output/images/x-ray/001_0000.tif"  # ensure the directory exists!
Image.fromarray(pred_seg).save(pred_seg_path)
print(f"Wrote segmentation to {pred_seg_path}")
The pengwin_utils.Dataset class is provided as an example of a PyTorch dataset, with strong domain randomization included to facilitate sim-to-real performance, but it is recommended to write your own as needed.
Data usage policy: https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
The National CT Colonography Trial (ACRIN 6664) collection contains 825 cases of CT colonography imaging with accompanying spreadsheets that provide polyp descriptions and their location within the colon segments. Additional information about the trial is available in the Study Protocol and Case Report Forms.
Main Objective: To clinically validate widespread use of computerized tomographic colonography (CTC) in a screening population for the detection of colorectal neoplasia.
Participants: Male and female outpatients, aged 50 years or older, scheduled for screening colonoscopy, who have not had a colonoscopy in the past five years.
Study Design Summary: The study addresses aspects of central importance to the clinical application of CTC in several interrelated but independent parts that will be conducted in parallel. In Part I, the clinical performance of the CTC examination will be prospectively compared in a blinded fashion to colonoscopy. In Part II, optimization of the CT technique will be performed in view of new technological advances in CT technology. In Part III, lesion detection will be optimized by studying the morphologic features of critical lesion types and in the development of a database for computer-assisted diagnosis. In Part IV, patient preferences and cost-effectiveness implications of observed performance outcomes will be evaluated using a predictive model.
Data usage policy: https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
This CBIS-DDSM (Curated Breast Imaging Subset of DDSM) is an updated and standardized version of the Digital Database for Screening Mammography (DDSM). The DDSM is a database of 2,620 scanned film mammography studies. It contains normal, benign, and malignant cases with verified pathology information. The scale of the database along with ground truth validation makes the DDSM a useful tool in the development and testing of decision support systems. The CBIS-DDSM collection includes a subset of the DDSM data selected and curated by a trained mammographer. The images have been decompressed and converted to DICOM format. Updated ROI segmentation and bounding boxes, and pathologic diagnosis for training data are also included. A manuscript describing how to use this dataset in detail is available at https://www.nature.com/articles/sdata2017177.
Published research results from work in developing decision support systems in mammography are difficult to replicate due to the lack of a standard evaluation data set; most computer-aided diagnosis (CADx) and detection (CADe) algorithms for breast cancer in mammography are evaluated on private data sets or on unspecified subsets of public databases. Few well-curated public datasets have been provided for the mammography community. These include the DDSM, the Mammographic Imaging Analysis Society (MIAS) database, and the Image Retrieval in Medical Applications (IRMA) project. Although these public data sets are useful, they are limited in terms of data set size and accessibility.
For example, most researchers using the DDSM do not leverage all its images for a variety of historical reasons. When the database was released in 1997, computational resources to process hundreds or thousands of images were not widely available. Additionally, the DDSM images are saved in non-standard compression files that require the use of decompression code that has not been updated or maintained for modern computers. Finally, the ROI annotations for the abnormalities in the DDSM were provided to indicate a general position of lesions, but not a precise segmentation for them. Therefore, many researchers must implement segmentation algorithms for accurate feature extraction. This causes an inability to directly compare the performance of methods or to replicate prior results. The CBIS-DDSM collection addresses that challenge by publicly releasing a curated and standardized version of the DDSM for the evaluation of future CADx and CADe (collectively referred to as CAD) research in mammography.
Please note that the image data for this collection is structured such that each participant has multiple patient IDs. For example, participant 00038 has 10 separate patient IDs which provide information about the scans within the IDs (e.g. Calc-Test_P_00038_LEFT_CC, Calc-Test_P_00038_RIGHT_CC_1). This makes it appear as though there are 6,671 patients according to the DICOM metadata, but there are only 1,566 actual participants in the cohort.
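A small sketch for regrouping scans by true participant, inferring the ID format from the examples above (the format is an assumption based on those two IDs):

def participant_of(patient_id: str) -> str:
    # "Calc-Test_P_00038_LEFT_CC_1" -> "00038"
    parts = patient_id.split("_")
    return parts[parts.index("P") + 1]

print(participant_of("Calc-Test_P_00038_LEFT_CC"))     # 00038
print(participant_of("Calc-Test_P_00038_RIGHT_CC_1"))  # 00038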
For scientific and other inquiries about this dataset, please contact TCIA's Helpdesk.