6 datasets found
  1. Digital Pathology Dataset for Prostate Cancer Diagnosis

    • zenodo.org
    zip
    Updated Dec 5, 2022
    Cite
    Mustafa Umit Oner; Mei Ying Ng; Danilo Medina Giron; Cecilia Ee Chen Xi; Louis Ang Yuan Xiang; Malay Singh; Weimiao Yu; Wing-Kin Sung; Chin Fong Wong; Hwee Kuan Lee (2022). Digital Pathology Dataset for Prostate Cancer Diagnosis [Dataset]. http://doi.org/10.5281/zenodo.5971764
    Available download formats: zip
    Dataset updated: Dec 5, 2022
    Dataset provided by: Zenodo (http://zenodo.org/)
    Authors: Mustafa Umit Oner; Mei Ying Ng; Danilo Medina Giron; Cecilia Ee Chen Xi; Louis Ang Yuan Xiang; Malay Singh; Weimiao Yu; Wing-Kin Sung; Chin Fong Wong; Hwee Kuan Lee
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically.

    Description

    Links to code and bioRxiv pre-print:

    1. Multi-lens Neural Machine (MLNM) Code

    2. An AI-assisted Tool For Efficient Prostate Cancer Diagnosis (bioRxiv Pre-print)

    Digitized hematoxylin and eosin (H&E)-stained whole-slide images (WSIs) of 40 prostatectomy and 59 core needle biopsy specimens were collected from 99 prostate cancer patients at Tan Tock Seng Hospital, Singapore. There were 99 WSIs in total, one per specimen. The H&E-stained slides were scanned at 40× magnification (specimen-level pixel size 0.25 μm × 0.25 μm) using an Aperio AT2 Slide Scanner (Leica Biosystems). Institutional review board approval was obtained from the hospital for this study, and all data were de-identified.

    Prostate glandular structures in core needle biopsy slides were manually annotated and classified using the ASAP annotation tool. A senior pathologist reviewed 10% of the annotations in each slide, ensuring that some reference annotations were provided to the researcher at different regions of the core. Partial glands appearing at the edges of the biopsy cores were not annotated.

    Patches of size 512 × 512 pixels were cropped from the whole-slide images at magnifications of 5×, 10×, 20×, and 40×, with an annotated gland centered in each patch. This dataset contains these cropped images.
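The multi-magnification cropping described above can be sketched as follows. This is only an illustration of the geometry, assuming the gland center is given in 40× pixel coordinates and that lower magnifications correspond to integer downsampling of the 40× scan. The function name `patch_region_at_40x` and the coordinate convention are hypothetical and are not taken from the dataset's own code.

```python
BASE_MAGNIFICATION = 40   # slides were scanned at 40x
PATCH_SIZE = 512          # patch edge length in pixels at every magnification


def patch_region_at_40x(center_x, center_y, magnification):
    """Return (left, top, width, height), in 40x pixels, of the area covered
    by a 512 x 512 patch centred on (center_x, center_y) at `magnification`."""
    if BASE_MAGNIFICATION % magnification != 0:
        raise ValueError("magnification must divide 40 evenly")
    downsample = BASE_MAGNIFICATION // magnification
    extent = PATCH_SIZE * downsample      # footprint edge length in 40x pixels
    left = center_x - extent // 2
    top = center_y - extent // 2
    return left, top, extent, extent


# Hypothetical gland centre, used only to show how the footprint grows.
for mag in (5, 10, 20, 40):
    print(mag, patch_region_at_40x(10_000, 8_000, mag))
```

Under these assumptions, a patch at 5× covers an area eight times wider than the patch at 40×, so the lower magnifications supply tissue context around the same gland.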

    This dataset is used to train two AI models, one for gland segmentation (99 patients) and one for gland classification (46 patients). Tables 1 and 2 summarize the gland segmentation and gland classification datasets. The two corresponding sub-datasets are provided as two zip files:

    1. gland_segmentation_dataset.zip
    2. gland_classification_dataset.zip

    Table 1: The number of slides and patches in training, validation, and test sets for gland segmentation task. There is one H&E stained WSI for each prostatectomy or core needle biopsy specimen.

    #Slides          Train   Valid    Test   Total
    Prostatectomy       17       8      15      40
    Biopsy              26      13      20      59
    Total               43      21      35      99

    #Patches         Train   Valid    Test   Total
    Prostatectomy     7795    3753    7224   18772
    Biopsy            5559    4028    5981   15568
    Total            13354    7781   13205   34340
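As a quick sanity check on Table 1, the row and column totals can be recomputed from the individual counts. The sketch below only transcribes the numbers from the table. The helper name `check_table` is illustrative.

```python
# Counts transcribed from Table 1 as (train, valid, test, total).
slides = {
    "Prostatectomy": (17, 8, 15, 40),
    "Biopsy": (26, 13, 20, 59),
    "Total": (43, 21, 35, 99),
}
patches = {
    "Prostatectomy": (7795, 3753, 7224, 18772),
    "Biopsy": (5559, 4028, 5981, 15568),
    "Total": (13354, 7781, 13205, 34340),
}


def check_table(table):
    """Verify that each row sums to its total and that the specimen rows
    add up column-wise to the 'Total' row."""
    for train, valid, test, total in table.values():
        assert train + valid + test == total
    rows = [counts for name, counts in table.items() if name != "Total"]
    column_sums = tuple(sum(column) for column in zip(*rows))
    assert column_sums == table["Total"]
    return True


print(check_table(slides), check_table(patches))
```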

    Table 2: The number of slides and patches in training, validation, and test sets for gland classification task. There is one H&E stained WSI for each prostatectomy or core needle biopsy specimen. The gland classification datasets are the subsets of the gland segmentation datasets. GS: Gleason Score. B: Benign. M: Malignant.

    #Slides (GS 3+3:3+4:4+3)   Train       Valid       Test        Total
    Biopsy                     10:9:1      3:7:0       6:10:0      19:26:1

    #Patches (B:M)             Train       Valid       Test        Total
    Biopsy                     1557:2277   1216:1341   1543:2718   4316:6336
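The benign-to-malignant ratios in Table 2 differ between splits, which matters when comparing metrics across them. The sketch below computes the malignant fraction per split from the table's counts. The function name `malignant_fraction` is illustrative, and nothing here reflects how the original study handled class imbalance.

```python
# Benign:malignant patch counts transcribed from Table 2.
patch_counts = {
    "train": (1557, 2277),
    "valid": (1216, 1341),
    "test": (1543, 2718),
}


def malignant_fraction(benign, malignant):
    """Fraction of patches in a split that are labelled malignant."""
    return malignant / (benign + malignant)


for split, (benign, malignant) in patch_counts.items():
    print(f"{split}: {malignant_fraction(benign, malignant):.3f}")

# The per-split counts should add up to the table's Total column.
total_benign = sum(benign for benign, _ in patch_counts.values())
total_malignant = sum(malignant for _, malignant in patch_counts.values())
print(total_benign, total_malignant)
```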

    NB: The gland classification folder (gland_classification_dataset.zip) may contain extra patches whose labels could not be identified from the H&E slides. They were not used in the machine learning study.

  2. PESO: Prostate Epithelium Segmentation on H&E-stained prostatectomy whole...

    • zenodo.org
    • wouterbulten.nl
    csv, zip
    Updated Jul 26, 2021
    Cite
    Wouter Bulten; Péter Bándi; Jeffrey Hoven; Rob van de Loo; Johannes Lotz; Nick Weiss; Jeroen van der Laak; Bram van Ginneken; Christina Hulsbergen-van de Kaa; Geert Litjens (2021). PESO: Prostate Epithelium Segmentation on H&E-stained prostatectomy whole slide images [Dataset]. http://doi.org/10.5281/zenodo.1485967
    Available download formats: zip, csv
    Dataset updated: Jul 26, 2021
    Dataset provided by: Zenodo (http://zenodo.org/)
    Authors: Wouter Bulten; Péter Bándi; Jeffrey Hoven; Rob van de Loo; Johannes Lotz; Nick Weiss; Jeroen van der Laak; Bram van Ginneken; Christina Hulsbergen-van de Kaa; Geert Litjens
    License: Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically.

    Description

    A large set of whole-slide images (WSIs) of prostatectomy specimens with various grades of prostate cancer (PCa). More information can be found in the corresponding paper: https://doi.org/10.1038/s41598-018-37257-4

    The WSIs in this dataset can be viewed using the open-source software ASAP or OpenSlide.

    Due to the large size of the complete dataset, the data has been split into multiple archives.

    The data from the training set:

    • peso_training_masks.zip: Training masks (N=62) that have been used to train the main network of our paper. These masks are generated by a trained U-Net on the corresponding IHC slides.
    • peso_training_masks_corrected.zip: A subset of the color deconvolution masks (N=25) on which manual annotations have been made. Within these regions, stain and other artifacts have been removed.
    • peso_training_colordeconvolution.zip: Mask files (N=62) containing the P63&CK8/18 channel of the color deconvolution operation. These masks mark all regions that are stained by either P63 or CK8/18 in the IHC version of the slides.
    • peso_training_wsi_{1-6}.zip: Zip files containing the whole slide images of the training set (N=62). Each archive contains 10 slides, except the last, which contains 12. These images are exported at a pixel resolution of 0.48 μm/pixel.

    The data from the test set:

    • peso_testset_regions.zip: Collection of annotation XML files with outlines of the test regions. These can be used to view the test regions in more detail using ASAP.
    • peso_testset_png.zip: Export of the test set regions in PNG format (2500x2500 pixels per region).
    • peso_testset_png_padded.zip: Export of the test regions in PNG format padded with a 500 pixel wide border (3500x3500 pixels per region). Useful for segmenting pixels at the border of the regions.
    • peso_testset_mapping.csv: A CSV file mapping files from the test set (numbered 1-160) to regions in the XML files. The CSV file also contains the label (benign or cancer) for each region.
    • peso_testset_wsi_{1-4}.zip: Zip files containing the whole slide images of the test set (N=40). Each archive contains 10 slides of the test set. These images are exported at a pixel resolution of 0.48 μm/pixel.
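The archive layout and region sizes listed above can be checked with a few lines of arithmetic. This sketch assumes that `{1-6}` and `{1-4}` denote consecutive integer suffixes and that the 500-pixel border of the padded PNGs is symmetric on all four sides. The helper names `expand_archives` and `inner_box` are hypothetical.

```python
def expand_archives(prefix, count):
    """Expand a 'prefix_{1-count}.zip' pattern into concrete file names."""
    return [f"{prefix}_{i}.zip" for i in range(1, count + 1)]


training_archives = expand_archives("peso_training_wsi", 6)
test_archives = expand_archives("peso_testset_wsi", 4)

# Slides per archive as described: ten each, with twelve in the last
# training archive.
training_slides = [10] * 5 + [12]
test_slides = [10] * 4


def inner_box(padded_size=3500, border=500):
    """Box (left, upper, right, lower) that removes the border from a padded
    test region, leaving the original 2500 x 2500 area."""
    return (border, border, padded_size - border, padded_size - border)


print(training_archives)
print(sum(training_slides), sum(test_slides), inner_box())
```

The tuple returned by `inner_box` follows the (left, upper, right, lower) convention, so it could be passed to an image library's crop routine after segmenting the padded region.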

    This study was financed by a grant from the Dutch Cancer Society (KWF), grant number KUN 2015-7970.

    If you make use of this dataset please cite both the dataset itself and the corresponding paper: https://doi.org/10.1038/s41598-018-37257-4

  3. 3D pathology of prostate biopsies with biochemical recurrence outcomes: raw...

    • cancerimagingarchive.net
    • stage.cancerimagingarchive.net
    csv +2
    Updated Mar 7, 2023
    Cite
    The Cancer Imaging Archive (2023). 3D pathology of prostate biopsies with biochemical recurrence outcomes: raw H&E-analog datasets and image translation-assisted segmentation in 3D (ITAS3D) datasets [Dataset]. http://doi.org/10.7937/44MA-GX21
    Available download formats: csv, n/a, hdf5, xml, and tiff
    Dataset updated: Mar 7, 2023
    Dataset authored and provided by: The Cancer Imaging Archive
    License: https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
    Time period covered: Mar 7, 2023
    Dataset funded by: National Cancer Institute (http://www.cancer.gov/)
    Description

    This collection provides public access to a 3D pathology dataset of prostate cancer, allowing researchers to further investigate various 3D tissue structures and their correlation with prostate cancer patient outcomes (biochemical recurrence). These 3D tissue structures are revealed through: (1) an H&E-analog stain, (2) synthetically generated immunofluorescence staining of CK8 (targeting the luminal epithelial cells of all prostate glands), and (3) 3D segmentation masks of the gland lumen, epithelium, and stromal regions of prostate biopsies. This data collection will promote research in the field of computational 3D pathology for clinical decision support.

    In this TCIA collection, we provide the fused OTLS images (H&E-analog staining) at 2x down-sampled resolution, the synthetic cytokeratin-8 (CK8) immunofluorescent images at 2x down-sampled resolution, the 3D semantic segmentation masks of glands at 4x down-sampled resolution, the clinical data for patient outcomes (biochemical recurrence), and the coordinates for the cancer-enriched regions of each biopsy. All datasets are from the 50 patient cases studied in this publication: [W. Xie et al., Cancer Research, 2022]. The Python code for the deep-learning models, and for 3D glandular segmentations based on synthetic-CK8 datasets, is available on GitHub at https://github.com/WeisiX/ITAS3D.
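Because the images are provided at 2x down-sampling and the masks at 4x, overlaying them requires mapping voxel indices between the two grids. The sketch below assumes that both factors are relative to the same raw resolution, apply equally along all three axes, and share a common origin. None of these points is stated in the description, and the function names are hypothetical.

```python
IMAGE_DOWNSAMPLE = 2   # H&E-analog and synthetic CK8 volumes
MASK_DOWNSAMPLE = 4    # 3D segmentation masks

# Relative factor between the two grids (assumed identical on every axis).
FACTOR = MASK_DOWNSAMPLE // IMAGE_DOWNSAMPLE


def image_to_mask_index(z, y, x):
    """Map a voxel index in the 2x image grid to the 4x mask grid."""
    return z // FACTOR, y // FACTOR, x // FACTOR


def mask_to_image_block(z, y, x):
    """Return (start, stop) image-grid indices covered by one mask voxel;
    start is inclusive and stop is exclusive."""
    start = (z * FACTOR, y * FACTOR, x * FACTOR)
    stop = tuple(s + FACTOR for s in start)
    return start, stop


print(image_to_mask_index(10, 21, 7))
print(mask_to_image_block(5, 10, 3))
```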

    Note that the 3D pathology datasets provided in this collection were generated in Dr. Jonathan Liu’s lab at the University of Washington with a custom open-top light-sheet (OTLS) microscope developed by the lab [A.K. Glaser et al., Nature Communications, 2019]. There is no clinical metadata within the imaging files and all patients are referred to with coded identifiers. All of the clinical outcomes data provided in this collection have already been published within the supplement of [W. Xie et al., Cancer Research, 2022].

  4. DICOM converted annotations for the Prostate-MRI-US-Biopsy collection

    • data.niaid.nih.gov
    Updated Nov 3, 2023
    Cite
    Ciausu, Cosmin (2023). DICOM converted annotations for the Prostate-MRI-US-Biopsy collection [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10069910
    Dataset updated: Nov 3, 2023
    Dataset provided by: Clunie, David; Fedorov, Andrey; Ciausu, Cosmin
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically.

    Description

    This dataset contributes DICOM-converted annotations to the publicly available National Cancer Institute Imaging Data Commons [1] Prostate-MRI-US-Biopsy collection (https://portal.imaging.datacommons.cancer.gov/explore/filters/?collection_id=Community&collection_id=prostate_mri_us_biopsy). The Prostate-MRI-US-Biopsy collection was initially released by The Cancer Imaging Archive (TCIA) [2,3,4]. While the images in this collection are stored in the standard DICOM format, the collection is also accompanied by 1017 semi-automatic segmentations of the prostate and 1317 manual segmentations of target lesions in the STL format. Although STL is a common and practical format for 3D printing, it is not interoperable with many visualization and analysis tools commonly used in medical imaging research and does not provide any standard means to communicate metadata, among other limitations.

    This dataset contains segmentations of the prostate and target lesions harmonized into DICOM representation. Specifically, we created DICOM Encapsulated 3D Manufacturing Model objects (M3D modality) that include the original STL content enriched with the DICOM metadata. Furthermore, we created an alternative encoding of the surface segmentations by rasterizing them and saving the result as a DICOM Segmentation object (SEG modality). As a result, the contributed DICOM objects can be stored in any DICOM server that supports those objects (including Google Healthcare DICOM stores), and the DICOM Segmentations can be visualized using off-the-shelf tools, such as OHIF Viewer.

    Conversion from STL to DICOM M3D modality was performed using the PixelMed toolkit (https://www.pixelmed.com/dicomtoolkit.html). Conversion from STL to DICOM SEG was done in 2 steps. We used Slicer (https://www.slicer.org/) to rasterize the surface segmentations to the matrix of the segmented image; the results were then converted to DICOM SEGs using dcmqi (https://github.com/QIICR/dcmqi) [5]. The resulting objects were validated using dicom3tools dciodvfy (https://www.dclunie.com/dicom3tools.html). Details describing the conversion process, as well as details on how to access the encapsulated STL content from the DICOM M3D files, are provided in this GitHub repository: https://github.com/ImagingDataCommons/prostate_mri_us_biopsy_dcm_conversion.

    Specific files included in the record are:

    • Prostate-MRI-US-Biopsy-DICOM-Annotations.zip: DICOM M3D and SEG files, organized into the folder hierarchy following this pattern: Prostate-MRI-US-Biopsy/%PatientID/%StudyInstanceUID/%SeriesNumber-%Modality-%SeriesDescription.dcm
    • referenced_images_sorted-idc_file_manifest.s5cmd: IDC manifest for downloading the T2W MRI images corresponding to the annotations. To download the files in this manifest, first install s5cmd (https://github.com/peak/s5cmd), and run the following command: s5cmd --no-sign-request --endpoint-url https://s3.amazonaws.com run referenced_images_sorted-idc_file_manifest.s5cmd. Files will be organized in the Prostate-MRI-US-Biopsy/%PatientID/%StudyInstanceUID/ folder hierarchy upon download.

    References

    [1] Fedorov, A., Longabaugh, W. J. R., Pot, D., Clunie, D. A., Pieper, S., Aerts, H. J. W. L., Homeyer, A., Lewis, R., Akbarzadeh, A., Bontempi, D., Clifford, W., Herrmann, M. D., Höfener, H., Octaviano, I., Osborne, C., Paquette, S., Petts, J., Punzo, D., Reyes, M., Schacherer, D. P., Tian, M., White, G., Ziegler, E., Shmulevich, I., Pihl, T., Wagner, U., Farahani, K. & Kikinis, R. NCI Imaging Data Commons. Cancer Res. 81, 4188–4193 (2021). DOI: 10.1158/0008-5472.CAN-21-0950.

    [2] Natarajan, S., Priester, A., Margolis, D., Huang, J., & Marks, L. (2020). Prostate MRI and Ultrasound With Pathology and Coordinates of Tracked Biopsy (Prostate-MRI-US-Biopsy) (version 2) [Data set]. The Cancer Imaging Archive. DOI: 10.7937/TCIA.2020.A61IOC1A

    [3] Sonn GA, Natarajan S, Margolis DJ, MacAiran M, Lieu P, Huang J, Dorey FJ, Marks LS. Targeted biopsy in the detection of prostate cancer using an office based magnetic resonance ultrasound fusion device. Journal of Urology 189, no. 1 (2013): 86-91. DOI: 10.1016/j.juro.2012.08.095

    [4] Clark K, Vendt B, Smith K, Freymann J, Kirby J, Koppel P, Moore S, Phillips S, Maffitt D, Pringle M, Tarbox L, Prior F. The Cancer Imaging Archive (TCIA): Maintaining and Operating a Public Information Repository, Journal of Digital Imaging, Volume 26, Number 6, December, 2013, pp 1045-1057. DOI: 10.1007/s10278-013-9622-7

    [5] Herz, C., Fillion-Robin, J.-C., Onken, M., Riesmeier, J., Lasso, A., Pinter, C., Fichtinger, G., Pieper, S., Clunie, D., Kikinis, R. & Fedorov, A. dcmqi: An Open Source Library for Standardized Communication of Quantitative Image Analysis Results Using DICOM. Cancer Res. 77, e87–e90 (2017). DOI: 10.1158/0008-5472.CAN-17-0336.
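The folder pattern given for the annotation archive (Prostate-MRI-US-Biopsy/%PatientID/%StudyInstanceUID/%SeriesNumber-%Modality-%SeriesDescription.dcm) can be built and parsed as sketched below. This assumes that the series number and modality contain no hyphens, while the series description may. The function names and all example values are hypothetical placeholders, not identifiers from the collection.

```python
from pathlib import PurePosixPath

ROOT = "Prostate-MRI-US-Biopsy"


def annotation_path(patient_id, study_uid, series_number, modality, description):
    """Build a relative path that follows the documented folder pattern."""
    file_name = f"{series_number}-{modality}-{description}.dcm"
    return PurePosixPath(ROOT) / patient_id / study_uid / file_name


def parse_annotation_path(path):
    """Split a path that follows the pattern back into its components."""
    _, patient_id, study_uid, file_name = PurePosixPath(path).parts[-4:]
    stem = file_name[: -len(".dcm")]
    # Split on the first two hyphens only; the description may contain more.
    series_number, modality, description = stem.split("-", 2)
    return {
        "PatientID": patient_id,
        "StudyInstanceUID": study_uid,
        "SeriesNumber": series_number,
        "Modality": modality,
        "SeriesDescription": description,
    }


# Placeholder values chosen for illustration only.
example = annotation_path("PATIENT-0001", "1.2.3.4", 5, "SEG", "Target-lesion 1")
print(example)
print(parse_annotation_path(example))
```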

  5. PROSTATE-DIAGNOSIS

    • cancerimagingarchive.net
    dicom, mha and zip +3
    Updated Aug 9, 2021
    Cite
    The Cancer Imaging Archive (2021). PROSTATE-DIAGNOSIS [Dataset]. http://doi.org/10.7937/K9/TCIA.2015.FOQEUJVT
    Available download formats: nrrd and zip, dicom, n/a, xls, mha and zip
    Dataset updated: Aug 9, 2021
    Dataset authored and provided by: The Cancer Imaging Archive
    License: https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
    Time period covered: Aug 9, 2021
    Dataset funded by: National Cancer Institute (http://www.cancer.gov/)
    Description

    Prostate cancer T1- and T2-weighted magnetic resonance images (MRIs) were acquired on a 1.5 T Philips Achieva by combined surface and endorectal coil, including dynamic contrast-enhanced images obtained prior to, during and after I.V. administration of 0.1 mmol/kg body weight of Gadolinium-DTPA (pentetic acid). Corresponding clinical metadata (XLS format) and 3D segmentation files (NRRD format) are offered as a supplement to this image collection. The XLS file contains pathology biopsy and excised gland tissue reports and the MRI radiology report for most subjects.

    The Multi-component NRRD Segmentations allow visualization and downstream analysis in 3D Slicer of the following prostate components: prostate gland boundary; internal capsule; central gland, peripheral zone; seminal vesicles; urethra; cancer – dominant nodule; neurovascular bundle; penile bulb; ejaculatory duct; verumontanum; and rectum. See our tutorial on Using 3D Slicer with the Prostate-Diagnosis data if you are not familiar with using this kind of data.

    The Seminal vesicles (SV) and neurovascular bundle (NVB) Segmentations delineate the neurovascular bundle and seminal vesicles as MHA files. These were provided as part of a planned challenge competition that did not materialize.

    The Third Party Analysis dataset mentioned beneath the Data Access table was added later as part of the NCI-ISBI 2013 Challenge - Automated Segmentation of Prostate Structures. It includes segmentations in NRRD format for 30 Prostate-Diagnosis subjects, marking the boundaries of the central gland and peripheral zone.

  6. QIN-PROSTATE-Repeatability

    • stage.cancerimagingarchive.net
    • cancerimagingarchive.net
    dicom, n/a
    Updated Nov 9, 2020
    Cite
    The Cancer Imaging Archive (2020). QIN-PROSTATE-Repeatability [Dataset]. http://doi.org/10.7937/K9/TCIA.2018.MR1CKGND
    Available download formats: n/a, dicom
    Dataset updated: Nov 9, 2020
    Dataset authored and provided by: The Cancer Imaging Archive
    License: https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
    Time period covered: Nov 9, 2020
    Dataset funded by: National Cancer Institute (http://www.cancer.gov/)
    Description

    This dataset contains multiparametric prostate MRI (mpMRI) acquired in a test-retest setting, allowing evaluation of the repeatability of MRI-based measurements in the prostate. Very limited data exist on the repeatability of prostate mpMRI, although such information is critical for establishing the technical characteristics of mpMRI as an imaging biomarker of prostate cancer.

    Data was provided by the Brigham and Women's Hospital team. Data collection was supported by U01 CA151261 (PI Fiona Fennessy). Preparation of data for public sharing was supported by U24 CA180918 (http://qiicr.org) (MPI Andrey Fedorov and Ron Kikinis).

    Type of cancer: Confirmed or suspected prostate cancer

    Acquisition Protocol: The standard prostate mpMRI protocol implemented at Brigham and Women's Hospital was used in this study. For a given patient, we aimed to maintain similar protocol settings, and used the same scanner hardware and software configurations for both the baseline and repeat examinations, which were acquired within 2 weeks of each other. All of the imaging studies were acquired at 3 Tesla magnet strength. Due to a scanner hardware upgrade in the middle of the study, 6 of the patients had the baseline and repeat studies performed on a GE Signa HDxt platform, software release 15.0_M4A_097.a, while the remaining 7 patients were scanned on a GE Discovery MR750w, software release DV24.0_R01_1344 (General Electric Healthcare, Milwaukee, WI). A transrectal coil within an air-filled balloon (Medrad Inc., Warrendale, PA) was used in all imaging studies. The mpMRI protocol included T2-weighted, Diffusion Weighted (DW) (b-values of 0 and 1400 s/mm²) and Dynamic Contrast Enhanced (DCE) sequences. Detailed acquisition parameters are listed in Table 1 of [1]. DWI Apparent Diffusion Coefficient (ADC) and DCE subtract maps (further referred to as SUB; computed as the difference between the phase corresponding to the contrast bolus arrival and the baseline phase) were generated using the scanner software.
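For orientation, the two derived maps can be illustrated with simple per-voxel arithmetic. The description does not say how the scanner software computes ADC, so the sketch below uses the standard two-point mono-exponential model, S(b) = S(0) · exp(−b · ADC), with b-values in s/mm², purely as an assumption. The function names and sample signal values are hypothetical.

```python
import math


def adc_two_point(signal_low_b, signal_high_b, b_low=0.0, b_high=1400.0):
    """Two-point mono-exponential ADC estimate in mm^2/s.

    Assumes S(b) = S(0) * exp(-b * ADC) with b-values given in s/mm^2.
    """
    if signal_low_b <= 0 or signal_high_b <= 0:
        raise ValueError("signals must be positive")
    return math.log(signal_low_b / signal_high_b) / (b_high - b_low)


def subtraction_map(bolus_arrival_phase, baseline_phase):
    """SUB: voxel-wise difference between the DCE phase at contrast bolus
    arrival and the baseline phase (both given as flat lists of voxels)."""
    return [arrival - base
            for arrival, base in zip(bolus_arrival_phase, baseline_phase)]


# Synthetic voxel whose true ADC is 1.0e-3 mm^2/s.
s0 = 1000.0
s1400 = s0 * math.exp(-1400.0 * 1.0e-3)
print(adc_two_point(s0, s1400))
print(subtraction_map([120, 150], [100, 100]))
```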

    The imaging data is accompanied by the following types of derived data:

    • manual segmentations of the total prostate gland, peripheral zone of the prostate gland, suspected tumor and normal regions (where applicable). Segmentations were done by a radiologist with expertise in prostate MRI.
    • volume measurements (for axial T2w images and ADC images) and mean ADC (for ADC images) corresponding to the segmented regions.
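The two measurement types listed above can be sketched from a binary segmentation mask. The description does not state how the published measurements were computed, so voxel counting multiplied by the voxel size is used here as a common approach and is an assumption. The nested-list layout, function names, and sample values are hypothetical.

```python
def region_volume_ml(mask, spacing_mm):
    """Volume in millilitres of a binary mask laid out as [slice][row][col];
    spacing_mm is the voxel size (dz, dy, dx) in millimetres."""
    voxel_count = sum(v for plane in mask for row in plane for v in row)
    dz, dy, dx = spacing_mm
    return voxel_count * dz * dy * dx / 1000.0   # mm^3 -> mL


def mean_in_region(values, mask):
    """Mean of `values` over the voxels where `mask` is non-zero."""
    selected = [
        value
        for value_plane, mask_plane in zip(values, mask)
        for value_row, mask_row in zip(value_plane, mask_plane)
        for value, inside in zip(value_row, mask_row)
        if inside
    ]
    if not selected:
        raise ValueError("mask is empty")
    return sum(selected) / len(selected)


# Tiny synthetic example: three voxels inside the region.
mask = [[[1, 1], [0, 0]], [[1, 0], [0, 0]]]
adc = [[[1000, 1200], [0, 0]], [[800, 0], [0, 0]]]
print(region_volume_ml(mask, (3.0, 0.5, 0.5)))
print(mean_in_region(adc, mask))
```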

    Both segmentations and segmentation-based measurements are stored as DICOM objects (DICOM Segmentation images and DICOM Structured Reports that follow DICOM SR TID 1500). For the details about data representation and tools available to convert and visualize the data see [2].

    In the future we plan to augment this dataset with the parametric maps obtained using that analysis (in DICOM), and potentially (pending IRB clearance) clinical data (demographics, PSA), pathology sampling data (biopsy Gleason score) and results of PI-RADS interpretation.

    References:

    [1] Fedorov A, Vangel MG, Tempany CM, Fennessy FM. Multiparametric Magnetic Resonance Imaging of the Prostate: Repeatability of Volume and Apparent Diffusion Coefficient Quantification. Investigative Radiology. 52, 538–546 (2017). DOI: 10.1097/RLI.0000000000000382

    [2] Fedorov, A., Schwier, M., Clunie, D., Herz, C., Pieper, S., Kikinis, R., Tempany, C. & Fennessy, F. An annotated test-retest collection of prostate multiparametric MRI. Scientific Data 5, 180281 (2018). DOI: 10.1038/sdata.2018.281

    About the NCI QIN

    The mission of the QIN is to improve the role of quantitative imaging for clinical decision making in oncology by developing and validating data acquisition, analysis methods, and tools to tailor treatment for individual patients and predict or monitor the response to drug or radiation therapy. More information is available on the Quantitative Imaging Network Collections page. Interested investigators can apply to the QIN at: Quantitative Imaging for Evaluation of Responses to Cancer Therapies (U01) PAR-11-150.


Digital Pathology Dataset for Prostate Cancer Diagnosis

6 scholarly articles cite this dataset (View in Google Scholar)