Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Links to code and bioRxiv pre-print:
1. Multi-lens Neural Machine (MLNM) Code
2. An AI-assisted Tool For Efficient Prostate Cancer Diagnosis (bioRxiv Pre-print)
Digitized hematoxylin and eosin (H&E)-stained whole-slide images (WSIs) of 40 prostatectomy and 59 core needle biopsy specimens were collected from 99 prostate cancer patients at Tan Tock Seng Hospital, Singapore. There were 99 WSIs in total, one per specimen. H&E-stained slides were scanned at 40× magnification (specimen-level pixel size 0·25 μm × 0·25 μm) using an Aperio AT2 Slide Scanner (Leica Biosystems). Institutional review board approval was obtained from the hospital for this study, and all the data were de-identified.
Prostate glandular structures in core needle biopsy slides were manually annotated and classified using the ASAP annotation tool. A senior pathologist reviewed 10% of the annotations in each slide, ensuring that some reference annotations were provided to the researcher at different regions of the core. Partial glands appearing at the edges of the biopsy cores were not annotated.
Patches of size 512 × 512 pixels were cropped from the whole-slide images at magnifications of 5×, 10×, 20×, and 40×, with an annotated gland centered in each patch. This dataset contains these cropped images.
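The patch geometry described above can be made concrete with a little arithmetic. The following is a minimal sketch, not the authors' cropping code: it assumes that the lower magnifications correspond to uniform downsampling of the 40× scan (so the pixel size doubles each time the magnification halves), and the function and class names are hypothetical.

```python
from dataclasses import dataclass

BASE_MAGNIFICATION = 40.0  # scan magnification stated in the description
BASE_PIXEL_UM = 0.25       # pixel size at 40x stated in the description
PATCH_SIZE = 512           # patch edge length in pixels


@dataclass(frozen=True)
class PatchRegion:
    magnification: float
    pixel_um: float  # pixel size at this magnification, in micrometres
    left: int        # top-left corner in 40x pixel coordinates
    top: int
    extent_px: int   # edge length of the covered region in 40x pixels


def patch_region(center_x: int, center_y: int, magnification: float) -> PatchRegion:
    """Region of the 40x slide covered by a 512 x 512 patch centred on a gland.

    Assumption: a patch at magnification m covers (40 / m) times more
    40x pixels along each axis than a patch cropped at 40x.
    """
    downsample = BASE_MAGNIFICATION / magnification
    extent = int(round(PATCH_SIZE * downsample))
    return PatchRegion(
        magnification=magnification,
        pixel_um=BASE_PIXEL_UM * downsample,
        left=center_x - extent // 2,
        top=center_y - extent // 2,
        extent_px=extent,
    )


if __name__ == "__main__":
    # Hypothetical gland centre, in 40x pixel coordinates.
    for mag in (5, 10, 20, 40):
        r = patch_region(center_x=20000, center_y=15000, magnification=mag)
        print(f"{mag:>2}x: {r.pixel_um:.2f} um/px, covers {r.extent_px} px at 40x, "
              f"origin=({r.left}, {r.top})")
```

Under this assumption, a 5× patch spans 4096 pixels of the 40× scan per side (about 1 mm of tissue), while a 40× patch spans 512 pixels (128 μm), so the four magnifications give progressively wider context around the same gland.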
This dataset is used to train two AI models, one for gland segmentation (99 patients) and one for gland classification (46 patients). Tables 1 and 2 summarize the gland segmentation and gland classification datasets, respectively. The two corresponding sub-datasets are provided as two separate zip files.
Table 1: The number of slides and patches in the training, validation, and test sets for the gland segmentation task. There is one H&E-stained WSI for each prostatectomy or core needle biopsy specimen.
| #Slides | Train | Valid | Test | Total |
|---|---|---|---|---|
| Prostatectomy | 17 | 8 | 15 | 40 |
| Biopsy | 26 | 13 | 20 | 59 |
| Total | 43 | 21 | 35 | 99 |

| #Patches | Train | Valid | Test | Total |
|---|---|---|---|---|
| Prostatectomy | 7795 | 3753 | 7224 | 18772 |
| Biopsy | 5559 | 4028 | 5981 | 15568 |
| Total | 13354 | 7781 | 13205 | 34340 |
Table 2: The number of slides and patches in the training, validation, and test sets for the gland classification task. There is one H&E-stained WSI for each prostatectomy or core needle biopsy specimen. The gland classification datasets are subsets of the gland segmentation datasets. GS: Gleason Score. B: Benign. M: Malignant.
| #Slides (GS 3+3:3+4:4+3) | Train | Valid | Test | Total |
|---|---|---|---|---|
| Biopsy | 10:9:1 | 3:7:0 | 6:10:0 | 19:26:1 |

| #Patches (B:M) | Train | Valid | Test | Total |
|---|---|---|---|---|
| Biopsy | 1557:2277 | 1216:1341 | 1543:2718 | 4316:6336 |
NB: The gland classification folder (gland_classification_dataset.zip) may contain extra patches whose labels could not be identified from the H&E slides. They were not used in the machine learning study.
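The benign-to-malignant patch counts in Table 2 differ between splits, which can matter when comparing validation and test metrics. As a small illustration (the counts are copied from Table 2; the variable and function names are hypothetical), the malignant fraction per split can be computed like this:

```python
# Benign and malignant patch counts per split, copied from Table 2.
PATCH_COUNTS = {
    "train": (1557, 2277),
    "valid": (1216, 1341),
    "test": (1543, 2718),
}


def malignant_fraction(benign: int, malignant: int) -> float:
    """Share of patches labelled malignant in a split."""
    return malignant / (benign + malignant)


if __name__ == "__main__":
    for split, (benign, malignant) in PATCH_COUNTS.items():
        print(f"{split:>5}: {malignant_fraction(benign, malignant):.3f} malignant")

    # Cross-check the per-split counts against the "Total" column of Table 2.
    total_benign = sum(b for b, _ in PATCH_COUNTS.values())
    total_malignant = sum(m for _, m in PATCH_COUNTS.values())
    print("totals:", total_benign, total_malignant)
```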
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
A large set of whole-slide images (WSIs) of prostatectomy specimens with various grades of prostate cancer (PCa). More information can be found in the corresponding paper: https://doi.org/10.1038/s41598-018-37257-4
The WSIs in this dataset can be viewed using the open-source software ASAP or OpenSlide.
Due to the large size of the complete dataset, the data has been split into multiple archives.
The data from the training set:
The data from the test set:
This study was financed by a grant from the Dutch Cancer Society (KWF), grant number KUN 2015-7970.
If you make use of this dataset, please cite both the dataset itself and the corresponding paper: https://doi.org/10.1038/s41598-018-37257-4
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
This collection provides public access to a 3D pathology dataset of prostate cancer, allowing researchers to further investigate various 3D tissue structures and their correlation with prostate cancer patient outcomes (biochemical recurrence). These 3D tissue structures are revealed through: (1) an H&E-analog stain, (2) synthetically generated immunofluorescence staining of CK8 (targeting the luminal epithelial cells of all prostate glands), and (3) 3D segmentation masks of the gland lumen, epithelium, and stromal regions of prostate biopsies. This data collection will promote research in the field of computational 3D pathology for clinical decision support.
In this TCIA collection, we provide the fused OTLS images (H&E-analog staining) at 2x down-sampled resolution, the synthetic cytokeratin-8 (CK8) immunofluorescence images at 2x down-sampled resolution, the 3D semantic segmentation masks of glands at 4x down-sampled resolution, the clinical data for patient outcomes (biochemical recurrence), and the coordinates of the cancer-enriched regions of each biopsy. All datasets are from the 50 patient cases studied in this publication: [W. Xie et al., Cancer Research, 2022]. The Python code for the deep-learning models, and for 3D glandular segmentation based on synthetic-CK8 datasets, is available on GitHub at https://github.com/WeisiX/ITAS3D.
Note that the 3D pathology datasets provided in this collection were generated in Dr. Jonathan Liu’s lab at the University of Washington with a custom open-top light-sheet (OTLS) microscope developed by the lab [A.K. Glaser et al., Nature Communications, 2019]. There is no clinical metadata within the imaging files and all patients are referred to with coded identifiers. All of the clinical outcomes data provided in this collection have already been published within the supplement of [W. Xie et al., Cancer Research, 2022].
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contributes DICOM-converted annotations to the publicly available National Cancer Institute Imaging Data Commons [1] Prostate-MRI-US-Biopsy collection (https://portal.imaging.datacommons.cancer.gov/explore/filters/?collection_id=Community&collection_id=prostate_mri_us_biopsy). The Prostate-MRI-US-Biopsy collection was initially released by The Cancer Imaging Archive (TCIA) [2,3,4]. While the images in this collection are stored in the standard DICOM format, the collection is also accompanied by 1017 semi-automatic segmentations of the prostate and 1317 manual segmentations of target lesions in the STL format. Although STL is a common and practical format for 3D printing, it is not interoperable with many visualization and analysis tools commonly used in medical imaging research and does not provide any standard means to communicate metadata, among other limitations.
This dataset contains segmentations of the prostate and target lesions harmonized into DICOM representation. Specifically, we created DICOM Encapsulated 3D Manufacturing Model objects (M3D modality) that include the original STL content enriched with the DICOM metadata. Furthermore, we created an alternative encoding of the surface segmentations by rasterizing them and saving the result as a DICOM Segmentation object (SEG modality). As a result, the contributed DICOM objects can be stored in any DICOM server that supports those objects (including Google Healthcare DICOM stores), and the DICOM Segmentations can be visualized using off-the-shelf tools, such as OHIF Viewer.
Conversion from STL to DICOM M3D modality was performed using the PixelMed toolkit (https://www.pixelmed.com/dicomtoolkit.html). Conversion from STL to DICOM SEG was done in two steps: we used Slicer (https://www.slicer.org/) to rasterize the surface segmentation to the matrix of the segmented image, and the rasterized result was then converted to DICOM SEG using dcmqi (https://github.com/QIICR/dcmqi) [5]. The resulting objects were validated using dicom3tools dciodvfy (https://www.dclunie.com/dicom3tools.html). Details of the conversion process, and of how to access the encapsulated STL content from the DICOM M3D files, are provided in this GitHub repository: https://github.com/ImagingDataCommons/prostate_mri_us_biopsy_dcm_conversion. Specific files included in the record are:
Prostate-MRI-US-Biopsy-DICOM-Annotations.zip: DICOM M3D and SEG files, organized into the folder hierarchy following this pattern: Prostate-MRI-US-Biopsy/%PatientID/%StudyInstanceUID/%SeriesNumber-%Modality-%SeriesDescription.dcm
referenced_images_sorted-idc_file_manifest.s5cmd: IDC manifest for downloading the T2W MRI images corresponding to the annotations. To download the files in this manifest, first install s5cmd (https://github.com/peak/s5cmd), and run the following command:
s5cmd --no-sign-request --endpoint-url https://s3.amazonaws.com run referenced_images_sorted-idc_file_manifest.s5cmd
Files will be organized in the Prostate-MRI-US-Biopsy/%PatientID/%StudyInstanceUID/ folder hierarchy upon download.
References:
[1] Fedorov, A., Longabaugh, W. J. R., Pot, D., Clunie, D. A., Pieper, S., Aerts, H. J. W. L., Homeyer, A., Lewis, R., Akbarzadeh, A., Bontempi, D., Clifford, W., Herrmann, M. D., Höfener, H., Octaviano, I., Osborne, C., Paquette, S., Petts, J., Punzo, D., Reyes, M., Schacherer, D. P., Tian, M., White, G., Ziegler, E., Shmulevich, I., Pihl, T., Wagner, U., Farahani, K. & Kikinis, R. NCI Imaging Data Commons. Cancer Res. 81, 4188–4193 (2021). DOI: 10.1158/0008-5472.CAN-21-0950
[2] Natarajan, S., Priester, A., Margolis, D., Huang, J., & Marks, L. (2020). Prostate MRI and Ultrasound With Pathology and Coordinates of Tracked Biopsy (Prostate-MRI-US-Biopsy) (version 2) [Data set]. The Cancer Imaging Archive. DOI: 10.7937/TCIA.2020.A61IOC1A
[3] Sonn GA, Natarajan S, Margolis DJ, MacAiran M, Lieu P, Huang J, Dorey FJ, Marks LS. Targeted biopsy in the detection of prostate cancer using an office based magnetic resonance ultrasound fusion device. Journal of Urology 189, no. 1 (2013): 86-91. DOI: 10.1016/j.juro.2012.08.095
[4] Clark K, Vendt B, Smith K, Freymann J, Kirby J, Koppel P, Moore S, Phillips S, Maffitt D, Pringle M, Tarbox L, Prior F. The Cancer Imaging Archive (TCIA): Maintaining and Operating a Public Information Repository, Journal of Digital Imaging, Volume 26, Number 6, December, 2013, pp 1045-1057. DOI: 10.1007/s10278-013-9622-7
[5] Herz, C., Fillion-Robin, J.-C., Onken, M., Riesmeier, J., Lasso, A., Pinter, C., Fichtinger, G., Pieper, S., Clunie, D., Kikinis, R. & Fedorov, A. dcmqi: An Open Source Library for Standardized Communication of Quantitative Image Analysis Results Using DICOM. Cancer Res. 77, e87–e90 (2017). DOI: 10.1158/0008-5472.CAN-17-0336
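The folder pattern given for Prostate-MRI-US-Biopsy-DICOM-Annotations.zip (Prostate-MRI-US-Biopsy/%PatientID/%StudyInstanceUID/%SeriesNumber-%Modality-%SeriesDescription.dcm) lends itself to simple indexing of the extracted files. The sketch below is an illustration only: the helper names and the example path values are hypothetical, and it assumes that the series number and modality contain no hyphens while the series description may.

```python
from pathlib import PurePosixPath
from typing import NamedTuple

COLLECTION_DIR = "Prostate-MRI-US-Biopsy"


class AnnotationFile(NamedTuple):
    patient_id: str
    study_instance_uid: str
    series_number: str
    modality: str
    series_description: str


def parse_annotation_path(path: str) -> AnnotationFile:
    """Split a path that follows the documented folder pattern into its fields."""
    parts = PurePosixPath(path).parts
    if len(parts) < 4 or parts[-4] != COLLECTION_DIR:
        raise ValueError(f"path does not follow the documented pattern: {path}")
    patient_id, study_uid, filename = parts[-3:]
    stem = filename[:-4] if filename.lower().endswith(".dcm") else filename
    # Assumption: only the series description may contain hyphens,
    # so split on the first two hyphens only.
    series_number, modality, description = stem.split("-", 2)
    return AnnotationFile(patient_id, study_uid, series_number, modality, description)


if __name__ == "__main__":
    # Hypothetical example values, not taken from the dataset.
    example = "Prostate-MRI-US-Biopsy/Patient-0001/1.2.3.4/100-SEG-Prostate-segmentation.dcm"
    print(parse_annotation_path(example))
```

Grouping the parsed records by modality is then a straightforward way to separate the M3D objects from the SEG objects within each study.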
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
Prostate cancer T1- and T2-weighted magnetic resonance images (MRIs) were acquired on a 1.5 T Philips Achieva using a combined surface and endorectal coil, including dynamic contrast-enhanced images obtained prior to, during and after I.V. administration of 0.1 mmol/kg body weight of Gadolinium-DTPA (pentetic acid). Corresponding clinical metadata (XLS format) and 3D segmentation files (NRRD format) are offered as a supplement to this image collection. The XLS file contains pathology biopsy and excised gland tissue reports and the MRI radiology report for most subjects.
The Multi-component NRRD Segmentations allow visualization and downstream analysis in 3D Slicer of the following prostate components: prostate gland boundary; internal capsule; central gland; peripheral zone; seminal vesicles; urethra; cancer – dominant nodule; neurovascular bundle; penile bulb; ejaculatory duct; verumontanum; and rectum. See our tutorial on Using 3D Slicer with the Prostate-Diagnosis data if you are not familiar with using this kind of data.
The Seminal vesicles (SV) and neurovascular bundle (NVB) Segmentations delineate the neurovascular bundle and seminal vesicles as MHA files. These were provided as part of a planned challenge competition that did not materialize.
The Third Party Analysis dataset mentioned beneath the Data Access table was added later as part of the NCI-ISBI 2013 Challenge - Automated Segmentation of Prostate Structures. It includes NRRD-format segmentations for 30 Prostate-Diagnosis subjects that mark the boundaries of the central gland and peripheral zone.
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
This is a dataset of multiparametric prostate MRI (mpMRI) acquired in a test-retest setting, allowing evaluation of the repeatability of MRI-based measurements in the prostate. Very limited data exist on the repeatability of prostate mpMRI, although such information is critical for establishing the technical characteristics of mpMRI as an imaging biomarker of prostate cancer.
Data was provided by the Brigham and Women's Hospital team. Data collection was supported by U01 CA151261 (PI Fiona Fennessy). Preparation of data for public sharing was supported by U24 CA180918 (http://qiicr.org) (MPI Andrey Fedorov and Ron Kikinis).
Type of cancer: Confirmed or suspected prostate cancer
Acquisition Protocol: The standard prostate mpMRI protocol implemented at Brigham and Women's Hospital was used in this study. For a given patient, we aimed to maintain similar protocol settings, and used the same scanner hardware and software configurations for both the baseline and repeat examinations, which were acquired within 2 weeks of each other. All of the imaging studies were acquired at 3 Tesla magnet strength. Due to a scanner hardware upgrade in the middle of the study, 6 of the patients had the baseline and repeat studies performed on a GE Signa HDxt platform, software release 15.0_M4A_097.a, while the remaining 7 patients were scanned on a GE Discovery MR750w, software release DV24.0_R01_1344 (General Electric Healthcare, Milwaukee, WI). A transrectal coil within an air-filled balloon (Medrad Inc., Warrendale, PA) was used in all imaging studies. The mpMRI protocol included T2-weighted, Diffusion Weighted (DW) (b-values of 0 and 1400 s/mm²) and Dynamic Contrast Enhanced (DCE) sequences. Detailed acquisition parameters are listed in Table 1 of [1]. DWI Apparent Diffusion Coefficient (ADC) maps and DCE subtract maps (further referred to as SUB; computed as the difference between the phase corresponding to the contrast bolus arrival and the baseline phase) were generated using the scanner software.
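For readers unfamiliar with these derived maps, the sketch below shows the conventional voxel-wise calculations. It is an illustration under stated assumptions, not the scanner's implementation (which is not described here): the ADC uses the standard two-point mono-exponential model with the two b-values from the protocol, the SUB map is taken as bolus-arrival phase minus baseline phase, and the function names are hypothetical.

```python
import math
from typing import Sequence

B_LOW = 0.0      # s/mm^2, lower b-value from the protocol description
B_HIGH = 1400.0  # s/mm^2, higher b-value from the protocol description


def adc_two_point(s_low: float, s_high: float,
                  b_low: float = B_LOW, b_high: float = B_HIGH) -> float:
    """Mono-exponential ADC (mm^2/s) from two DWI signal intensities.

    Assumed model: S(b) = S(0) * exp(-b * ADC).
    """
    if s_low <= 0 or s_high <= 0:
        return 0.0  # ADC is undefined here; treated as background in this sketch
    return math.log(s_low / s_high) / (b_high - b_low)


def sub_map(baseline: Sequence[float], bolus_arrival: Sequence[float]) -> list[float]:
    """Voxel-wise DCE subtraction: bolus-arrival phase minus baseline phase."""
    if len(baseline) != len(bolus_arrival):
        raise ValueError("both phases must have the same number of voxels")
    return [post - pre for pre, post in zip(baseline, bolus_arrival)]


if __name__ == "__main__":
    # Synthetic voxel whose signal decays as expected for ADC = 1.0e-3 mm^2/s.
    s0 = 1000.0
    s1400 = s0 * math.exp(-B_HIGH * 1.0e-3)
    print(f"ADC = {adc_two_point(s0, s1400):.2e} mm^2/s")
    print("SUB =", sub_map([100.0, 120.0], [150.0, 110.0]))
```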
The imaging data is accompanied by the following types of derived data:
Both segmentations and segmentation-based measurements are stored as DICOM objects (DICOM Segmentation images and DICOM Structured Reports that follow DICOM SR TID 1500). For the details about data representation and tools available to convert and visualize the data see [2].
In the future we plan to augment this dataset with the parametric maps obtained using that analysis (in DICOM), and potentially (pending IRB clearance) clinical data (demographics, PSA), pathology sampling data (biopsy Gleason score) and results of PI-RADS interpretation.
References:
[1] Fedorov A, Vangel MG, Tempany CM, Fennessy FM. Multiparametric Magnetic Resonance Imaging of the Prostate: Repeatability of Volume and Apparent Diffusion Coefficient Quantification. Investigative Radiology. 52, 538–546 (2017). DOI: 10.1097/RLI.0000000000000382
[2] Fedorov, A., Schwier, M., Clunie, D., Herz, C., Pieper, S., Kikinis, R., Tempany, C. & Fennessy, F. An annotated test-retest collection of prostate multiparametric MRI. Scientific Data 5, 180281 (2018). DOI: 10.1038/sdata.2018.281
The mission of the QIN is to improve the role of quantitative imaging for clinical decision making in oncology by developing and validating data acquisition, analysis methods, and tools to tailor treatment for individual patients and predict or monitor the response to drug or radiation therapy. More information is available on the Quantitative Imaging Network Collections page. Interested investigators can apply to the QIN at: Quantitative Imaging for Evaluation of Responses to Cancer Therapies (U01) PAR-11-150.