6 datasets found
  1. DICOM converted Slide Microscopy images for the Cancer Moonshot Biobank...

    • zenodo.org
    • explore.openaire.eu
    bin
    Updated Sep 11, 2024
    David Clunie; William Clifford; David Pot; Ulrike Wagner; Erika Kim; Granger Sutton; Andrey Fedorov (2024). DICOM converted Slide Microscopy images for the Cancer Moonshot Biobank initiative collections [Dataset]. http://doi.org/10.5281/zenodo.11099112
    Available download formats
    bin
    Dataset updated
    Sep 11, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    David Clunie; William Clifford; David Pot; Ulrike Wagner; Erika Kim; Granger Sutton; Andrey Fedorov
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset corresponds to a collection of images and/or image-derived data available from the National Cancer Institute Imaging Data Commons (IDC) [1]. This dataset was converted into DICOM representation and ingested by the IDC team. You can explore and visualize the corresponding images using the IDC Portal. You can use the manifests included in this Zenodo record to download the content of the collection following the Download instructions below.

    Collection description

    The Cancer Moonshot Biobank (CMB) is a National Cancer Institute initiative to support current and future investigations into drug resistance and sensitivity and other NCI-sponsored cancer research initiatives, with an aim of improving researchers' understanding of cancer and how to intervene in cancer initiation and progression. During the course of this study, biospecimens (blood and tissue removed during medical procedures) and associated data will be collected longitudinally from at least 1000 patients across at least 10 cancer types, who represent the demographic diversity of the U.S. and are receiving standard-of-care cancer treatment at multiple NCI Community Oncology Research Program (NCORP) sites.

    The CMB program is organized into multiple cancer-specific collections. Digital pathology images for each of those collections were converted into DICOM representation by the IDC team and are shared via IDC.

    1. CMB-AML (acute myeloid leukemia)
    2. CMB-CRC (colorectal cancer)
    3. CMB-GEC (gastroesophageal cancer)
    4. CMB-LCA (lung cancer)
    5. CMB-MEL (melanoma)
    6. CMB-MML (multiple myeloma)
    7. CMB-PCA (prostate cancer)

    Digital pathology images, augmented with the metadata describing their content, were converted into DICOM Whole Slide Microscopy (SM) representation [2,3] using custom open source scripts and tools as described in [4].

    Files included

    A manifest file's name indicates the IDC data release in which a version of collection data was first introduced. For example, collection_id-idc_v8-aws.s5cmd corresponds to the contents of the collection_id collection introduced in IDC data release v8. If there is a subsequent version of this Zenodo page, it will indicate when a subsequent version of the corresponding collection was introduced.

    For each of the collections, the following manifest files are provided:

    1. : manifest of files available for download from public IDC Amazon Web Services buckets
    2. : manifest of files available for download from public IDC Google Cloud Storage buckets
    3. : Gen3 manifest (for details see https://learn.canceridc.dev/data/organization-of-data/guids-and-uuids)

    Note that manifest files that end in -aws.s5cmd reference files stored in Amazon Web Services (AWS) buckets, while -gcs.s5cmd reference files in Google Cloud Storage. The actual files are identical and are mirrored between AWS and GCP.

    Download instructions

    Each of the manifests includes instructions in its header on how to download the included files.

    To download the files using .s5cmd manifests:

    1. install idc-index package: pip install --upgrade idc-index
    2. download the files referenced by manifests included in this dataset by passing the .s5cmd manifest file: idc download manifest.s5cmd

    To download the files using the .dcf manifest, see the manifest header.
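
The two download steps above can also be scripted. What follows is a minimal Python sketch, not part of the Zenodo record: it only wraps the two commands quoted in the instructions and assumes that the `idc` command installed by idc-index is on the PATH after step 1. The names `build_commands` and `download_manifest` and the `dry_run` flag are illustrative and are not part of idc-index.

```python
import subprocess
import sys


def build_commands(manifest_path):
    """Return the two commands from the download instructions as argument lists."""
    return [
        # Step 1: install or upgrade the idc-index package with the current interpreter.
        [sys.executable, "-m", "pip", "install", "--upgrade", "idc-index"],
        # Step 2: pass the .s5cmd manifest to the idc command-line tool.
        ["idc", "download", str(manifest_path)],
    ]


def download_manifest(manifest_path, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for command in build_commands(manifest_path):
        print(" ".join(command))
        if not dry_run:
            # check=True raises CalledProcessError if a step fails.
            subprocess.run(command, check=True)


if __name__ == "__main__":
    # Placeholder manifest name taken from the instructions; replace it with
    # one of the manifests included in this record.
    download_manifest("manifest.s5cmd", dry_run=True)
```

With `dry_run=True` the script only prints the commands, which makes it easy to check them before starting a large download.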

    Acknowledgments

    The Imaging Data Commons team has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under Task Order No. HHSN26110071 under Contract No. HHSN261201500003l.

    References

    [1] Fedorov, A., Longabaugh, W. J. R., Pot, D., Clunie, D. A., Pieper, S. D., Gibbs, D. L., Bridge, C., Herrmann, M. D., Homeyer, A., Lewis, R., Aerts, H. J. W. L., Krishnaswamy, D., Thiriveedhi, V. K., Ciausu, C., Schacherer, D. P., Bontempi, D., Pihl, T., Wagner, U., Farahani, K., Kim, E. & Kikinis, R. National Cancer Institute Imaging Data Commons: Toward transparency, reproducibility, and scalability in imaging artificial intelligence. RadioGraphics 43 (2023). https://doi.org/10.1148/rg.230180

    [2] National Electrical Manufacturers Association (NEMA). DICOM PS3.3 - Information Object Definitions: A.32.8 VL Whole Slide Microscopy Image IOD. at <https://dicom.nema.org/medical/dicom/current/output/html/part03.html#sect_A.32.8>

    [3] Herrmann, M. D., Clunie, D. A., Fedorov, A., Doyle, S. W., Pieper, S., Klepeis, V., Le, L. P., Mutter, G. L., Milstone, D. S., Schultz, T. J., Kikinis, R., Kotecha, G. K., Hwang, D. H., Andriole, K. P., Iafrate, A. J., Brink, J. A., Boland, G. W., Dreyer, K. J., Michalski, M., Golden, J. A., Louis, D. N. & Lennerz, J. K. Implementing the DICOM standard for digital pathology. J. Pathol. Inform. 9, 37 (2018).

    [4] Clunie, D., Fedorov, A. & Herrmann, M. D. ImagingDataCommons/idc-wsi-conversion: Initial release. (Zenodo, 2023). doi:10.5281/ZENODO.8240154

  2. ACRIN 6664

    • cancerimagingarchive.net
    • dev.cancerimagingarchive.net
    dicom, n/a, xls
    Updated Oct 15, 2015
    The Cancer Imaging Archive (2015). ACRIN 6664 [Dataset]. http://doi.org/10.7937/K9/TCIA.2015.NWTESAY1
    Available download formats
    n/a, xls, dicom
    Dataset updated
    Oct 15, 2015
    Dataset authored and provided by
    The Cancer Imaging Archive
    License

    https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/

    Time period covered
    Nov 15, 2013
    Dataset funded by
    National Cancer Institute (http://www.cancer.gov/)
    Description

    The National CT Colonography Trial (ACRIN 6664) collection contains 825 cases of CT colonography imaging with accompanying spreadsheets that provide polyp descriptions and their location within the colon segments. Additional information about the trial is available in the Study Protocol and Case Report Forms.

    Main Objective: To clinically validate widespread use of computerized tomographic colonography (CTC) in a screening population for the detection of colorectal neoplasia.

    Participants: Male and female outpatients, aged 50 years or older, scheduled for screening colonoscopy, who have not had a colonoscopy in the past five years.

    Study Design Summary: The study addresses aspects of central importance to the clinical application of CTC in several interrelated but independent parts that will be conducted in parallel. In Part I, the clinical performance of the CTC examination will be prospectively compared in a blinded fashion to colonoscopy. In Part II, optimization of the CT technique will be performed in view of new technological advances in CT technology. In Part III, lesion detection will be optimized by studying the morphologic features of critical lesion types and in the development of a database for computer-assisted diagnosis. In Part IV, patient preferences and cost-effectiveness implications of observed performance outcomes will be evaluated using a predictive model.

  3. DICOM converted Slide Microscopy images for the HTAN-VANDERBILT collection

    • zenodo.org
    bin
    Updated Aug 22, 2024
    David Clunie; William Clifford; David Pot; Ulrike Wagner; Keyvan Farahani; Erika Kim; Andrey Fedorov (2024). DICOM converted Slide Microscopy images for the HTAN-VANDERBILT collection [Dataset]. http://doi.org/10.5281/zenodo.12690007
    Available download formats
    bin
    Dataset updated
    Aug 22, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    David Clunie; William Clifford; David Pot; Ulrike Wagner; Keyvan Farahani; Erika Kim; Andrey Fedorov
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This dataset corresponds to a collection of images and/or image-derived data available from National Cancer Institute Imaging Data Commons (IDC) [1]. This dataset was converted into DICOM representation and ingested by the IDC team. You can explore and visualize the corresponding images using IDC Portal here: HTAN-VANDERBILT. You can use the manifests included in this Zenodo record to download the content of the collection following the Download instructions below.

    Collection description

    The Human Tumor Atlas Network (HTAN) [2], part of the National Cancer Institute (NCI) Cancer Moonshot Initiative, will establish a clinical, experimental, computational, and organizational framework to generate informative and accessible three-dimensional atlases of cancer transitions for a diverse set of tumor types.

    Colorectal cancer (CRC) is among the top three most prevalent cancers in global incidence and mortality. Most of these cancers develop from pre-cancerous adenomas. There is an unmet need to develop new preventive strategies and risk stratification models to decrease incidence, improve early detection, and prevent deaths from CRC.

    We believe that the ability to provide the most effective precision diagnostics and preventive strategies can only be achieved with single-cell analysis. As such, we will map spatial relationships across the spectrum of normal colon, early polyps, and late adenomas, including their unique stromal and microbial microenvironments to identify unique molecular phenotypes.

    Our goal will be accomplished through prospective, standardized collection and analysis of colorectal tissue, associated biospecimens, and related clinical and epidemiological data from participants undergoing colonoscopy or surgical resection. The biospecimens from these participants will be used for single-cell RNA sequencing, whole exome sequencing, multiplex immunofluorescence, species-specific bacterial fluorescence in situ hybridization, and other approaches. Finally, the information from these approaches will be integrated to develop a single-cell pre-cancer atlas with defined molecular phenotypes for dissemination to the broader scientific community.

    Please see the HTAN-Vanderbilt information page to learn more about the images and to obtain any supporting metadata for this collection.

    Citation guidelines can be found on the HTAN Publication Policy information page.

    Files included

    A manifest file's name indicates the IDC data release in which a version of collection data was first introduced. For example, collection_id-idc_v8-aws.s5cmd corresponds to the contents of the collection_id collection introduced in IDC data release v8. If there is a subsequent version of this Zenodo page, it will indicate when a subsequent version of the corresponding collection was introduced.

    1. htan_vanderbilt-idc_v15-aws.s5cmd: manifest of files available for download from public IDC Amazon Web Services buckets
    2. htan_vanderbilt-idc_v15-gcs.s5cmd: manifest of files available for download from public IDC Google Cloud Storage buckets
    3. htan_vanderbilt-idc_v15-dcf.dcf: Gen3 manifest (for details see https://learn.canceridc.dev/data/organization-of-data/guids-and-uuids)

    Note that manifest files that end in -aws.s5cmd reference files stored in Amazon Web Services (AWS) buckets, while -gcs.s5cmd reference files in Google Cloud Storage. The actual files are identical and are mirrored between AWS and GCP.
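
The naming convention described above can be checked mechanically. The following Python sketch is an illustration only: the regular expression is inferred from the three file names listed above, and the names `ManifestName` and `parse_manifest_name` are hypothetical helpers, not part of any IDC tool.

```python
import re
from typing import NamedTuple


class ManifestName(NamedTuple):
    collection_id: str
    idc_version: int  # IDC data release in which this version was introduced
    kind: str         # "aws", "gcs", or "dcf"


# Pattern inferred from the listed names (an assumption, not an official spec):
#   <collection_id>-idc_v<release>-aws.s5cmd
#   <collection_id>-idc_v<release>-gcs.s5cmd
#   <collection_id>-idc_v<release>-dcf.dcf
_PATTERN = re.compile(
    r"^(?P<collection>.+)-idc_v(?P<version>\d+)-(?:(?P<cloud>aws|gcs)\.s5cmd|dcf\.dcf)$"
)


def parse_manifest_name(filename: str) -> ManifestName:
    """Split a manifest file name into collection, release number, and kind."""
    match = _PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unrecognized manifest name: {filename!r}")
    # The cloud group is empty for the Gen3 (.dcf) manifest.
    kind = match.group("cloud") or "dcf"
    return ManifestName(match.group("collection"), int(match.group("version")), kind)


if __name__ == "__main__":
    for name in (
        "htan_vanderbilt-idc_v15-aws.s5cmd",
        "htan_vanderbilt-idc_v15-gcs.s5cmd",
        "htan_vanderbilt-idc_v15-dcf.dcf",
    ):
        print(parse_manifest_name(name))
```

Because the AWS and GCS manifests reference mirrored copies of the same files, the `kind` field only selects where the files are fetched from, not which files are fetched.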

    Download instructions

    Each of the manifests includes instructions in its header on how to download the included files.

    To download the files using .s5cmd manifests:

    1. install idc-index package: pip install --upgrade idc-index
    2. download the files referenced by manifests included in this dataset by passing the .s5cmd manifest file: idc download manifest.s5cmd

    To download the files using the .dcf manifest, see the manifest header.

    Acknowledgments

    Collection of the images that were converted by IDC was supported through the Human Tumor Atlas Network, grants 1U2CCA233291-01 "Integrative Single-Cell Atlas of Host and Microenvironment in Colorectal Neoplastic Transformation" and 1U24CA233243-01 "Human Tumor Atlas Network: Data Coordinating Center" from the National Cancer Institute.

    The Imaging Data Commons team has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under Task Order No. HHSN26110071 under Contract No. HHSN261201500003l.

    References

    [1] Fedorov, A., Longabaugh, W. J. R., Pot, D., Clunie, D. A., Pieper, S. D., Gibbs, D. L., Bridge, C., Herrmann, M. D., Homeyer, A., Lewis, R., Aerts, H. J. W., Krishnaswamy, D., Thiriveedhi, V. K., Ciausu, C., Schacherer, D. P., Bontempi, D., Pihl, T., Wagner, U., Farahani, K., Kim, E. & Kikinis, R. National Cancer Institute Imaging Data Commons: Toward Transparency, Reproducibility, and Scalability in Imaging Artificial Intelligence. RadioGraphics (2023). https://doi.org/10.1148/rg.230180

    [2] Rozenblatt-Rosen, O., Regev, A., Oberdoerffer, P., Nawy, T., Hupalowska, A., Rood, J. E., Ashenberg, O., Cerami, E., Coffey, R. J., Demir, E., Ding, L., Esplin, E. D., Ford, J. M., Goecks, J., Ghosh, S., Gray, J. W., Guinney, J., Hanlon, S. E., Hughes, S. K., Hwang, E. S., Iacobuzio-Donahue, C. A., Jané-Valbuena, J., Johnson, B. E., Lau, K. S., Lively, T., Mazzilli, S. A., Pe’er, D., Santagata, S., Shalek, A. K., Schapiro, D., Snyder, M. P., Sorger, P. K., Spira, A. E., Srivastava, S., Tan, K., West, R. B., Williams, E. H. & Human Tumor Atlas Network. The Human Tumor Atlas Network: Charting Tumor Transitions across Space and Time at Single-Cell Resolution. Cell 181, 236–249 (2020). http://dx.doi.org/10.1016/j.cell.2020.03.053

  4. DICOM converted Slide Microscopy images for the TCGA-COAD collection

    • zenodo.org
    bin
    Updated Aug 20, 2024
    David Clunie; William Clifford; David Pot; Ulrike Wagner; Keyvan Farahani; Erika Kim; Andrey Fedorov (2024). DICOM converted Slide Microscopy images for the TCGA-COAD collection [Dataset]. http://doi.org/10.5281/zenodo.13346249
    Available download formats
    bin
    Dataset updated
    Aug 20, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    David Clunie; William Clifford; David Pot; Ulrike Wagner; Keyvan Farahani; Erika Kim; Andrey Fedorov
    License

    Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Description

    This dataset corresponds to a collection of images and/or image-derived data available from National Cancer Institute Imaging Data Commons (IDC) [1]. This dataset was converted into DICOM representation and ingested by the IDC team. You can explore and visualize the corresponding images using IDC Portal here: TCGA-COAD. You can use the manifests included in this Zenodo record to download the content of the collection following the Download instructions below.

    Collection description

    The Cancer Genome Atlas-Colon Adenocarcinoma (TCGA-COAD) data collection is part of a larger effort to enhance the TCGA (http://cancergenome.nih.gov/) data set with characterized radiological images. The Cancer Imaging Program (CIP), with the cooperation of several of the TCGA tissue-contributing institutions, has archived a large portion of the radiological images of the COAD cases.

    Please see the TCGA-COAD page to learn more about the images and to obtain any supporting metadata for this collection.

    Files included

    A manifest file's name indicates the IDC data release in which a version of collection data was first introduced. For example, collection_id-idc_v8-aws.s5cmd corresponds to the contents of the collection_id collection introduced in IDC data release v8. If there is a subsequent version of this Zenodo page, it will indicate when a subsequent version of the corresponding collection was introduced.

    1. tcga_coad-idc_v18-aws.s5cmd: manifest of files available for download from public IDC Amazon Web Services buckets
    2. tcga_coad-idc_v18-gcs.s5cmd: manifest of files available for download from public IDC Google Cloud Storage buckets
    3. tcga_coad-idc_v18-dcf.dcf: Gen3 manifest (for details see https://learn.canceridc.dev/data/organization-of-data/guids-and-uuids)

    Note that manifest files that end in -aws.s5cmd reference files stored in Amazon Web Services (AWS) buckets, while -gcs.s5cmd reference files in Google Cloud Storage. The actual files are identical and are mirrored between AWS and GCP.

    Download instructions

    Each of the manifests includes instructions in its header on how to download the included files.

    To download the files using .s5cmd manifests:

    1. install idc-index package: pip install --upgrade idc-index
    2. download the files referenced by manifests included in this dataset by passing the .s5cmd manifest file: idc download manifest.s5cmd

    To download the files using the .dcf manifest, see the manifest header.
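
Before starting a download it can be useful to see what a manifest references. The sketch below is a hedged illustration in Python and rests on two assumptions that are not stated in this record: that header lines in an .s5cmd manifest begin with `#`, and that each remaining line is an s5cmd command of the form `cp <source> <destination>`. The sample text and bucket name are made up, and `manifest_sources` is a hypothetical helper.

```python
def manifest_sources(manifest_text):
    """Return the source URL of every `cp` command in an .s5cmd manifest.

    Assumes '#' starts a header/comment line and that the source is the
    second-to-last token of a `cp` command (the destination is the last).
    """
    sources = []
    for line in manifest_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the instruction header
        parts = line.split()
        if len(parts) >= 3 and parts[0] == "cp":
            sources.append(parts[-2])
    return sources


# Made-up sample in the assumed format; a real manifest will differ.
SAMPLE = """\
# Hypothetical header: download instructions would appear here
cp s3://example-bucket/series-0001/* .
cp s3://example-bucket/series-0002/* .
"""

if __name__ == "__main__":
    for url in manifest_sources(SAMPLE):
        print(url)
```

Counting the returned entries gives a quick estimate of how many items a manifest will fetch, under the same assumptions.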

    Acknowledgments

    The Imaging Data Commons team has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under Task Order No. HHSN26110071 under Contract No. HHSN261201500003l.

    References

    [1] Fedorov, A., Longabaugh, W. J. R., Pot, D., Clunie, D. A., Pieper, S. D., Gibbs, D. L., Bridge, C., Herrmann, M. D., Homeyer, A., Lewis, R., Aerts, H. J. W., Krishnaswamy, D., Thiriveedhi, V. K., Ciausu, C., Schacherer, D. P., Bontempi, D., Pihl, T., Wagner, U., Farahani, K., Kim, E. & Kikinis, R. National Cancer Institute Imaging Data Commons: Toward Transparency, Reproducibility, and Scalability in Imaging Artificial Intelligence. RadioGraphics (2023). https://doi.org/10.1148/rg.230180

  5. Data from: Health Information National Trends Survey (HINTS), 2007

    • explore.openaire.eu
    • icpsr.umich.edu
    Updated Jun 23, 2009
    Bradford Hesse; Richard Moser (2009). Health Information National Trends Survey (HINTS), 2007 [Dataset]. http://doi.org/10.3886/icpsr25262.v1
    Dataset updated
    Jun 23, 2009
    Authors
    Bradford Hesse; Richard Moser
    Description

    computer-assisted telephone interview (CATI); mail questionnaire

    The data available for download are not weighted and users will need to weight the data prior to analysis. Users who plan to do inferential statistical testing using the data should utilize a statistical program that can incorporate the replicate weights included in the dataset. Additional information about sampling, interviewing, sampling error, weighting, and the universe of each question may be found in the codebook.

    This data collection utilized a split frame where approximately half of the sample completed the survey by telephone through random digit dial (RDD) and half completed it through the mail as a paper and pencil questionnaire. Users can analyse the data with only the RDD respondents, only the mail respondents, or both, as indicated by the variable SAMPFLAG. For each type of analysis, users will need to supply the proper final weight to get population estimates and replicate weights to calculate the correct variance.

    Variable names containing more than 16 characters were truncated in order to be compatible with current statistical programs. Therefore, variable names may differ slightly from those in the original documentation.

    The formats of the weight and replicate weight variables were adjusted to fit the width of the values present in these variables, and the variables REGION and DIVISION were converted from character to numeric.

    To protect respondent confidentiality, open-ended responses containing information on respondent's occupation in variables HC03WHERESEE2_OS and HD05OCCUPATIO_OS were blanked.

    ICPSR created a unique sequential record identifier variable named CASEID.

    The Health Information National Trends Survey (HINTS) collects nationally representative data about the American public's access to and use of cancer-related information. The 2007 HINTS survey is the third in an ongoing biannual series and provides information on the changing patterns, needs, and behavior in seeking and supplying cancer information and explores how cancer risks are perceived. Respondents were asked about the ways in which they obtained health information, their use of health care services, their views about medical information and research, and their beliefs about cancer. A series of questions specifically addressed cervical cancer, colon cancer, and the Human Papillomavirus (HPV). Information was also collected on physical and mental health status, diet, physical activity, sun exposure, history of cancer, tobacco use, and whether respondents had health insurance. Demographic variables include sex, age, race, education level, employment status, marital status, household income, number of people living in the household, ownership of residence, and whether respondents were born in the United States.

    For the CATI data collection, the sample design was a list-assisted RDD sample and one adult in the household was sampled for the extended interview using an algorithm designed to minimize intrusiveness. The mail survey included a stratified sample selected from a list of addresses that oversampled for minorities. Sampled addresses were matched to a database of listed telephone numbers, with 50 percent of the cases successfully matched to a telephone number. Matches in which a telephone number was both appended to an address-sample address and included in the RDD sample were deleted from the address sample. Please refer to the codebook documentation for more information on sample design.

    Every sampled adult who completed a questionnaire in HINTS 2007 received three full-sample weights and three sets of replicate-sample weights. Two of the three types of weights correspond to the type of samples - the address-sample weight (MWGT0) and the RDD sample weight (RWGT0). The address-sample weight is missing for a case in the RDD sample and vice versa. The sample-specific weights are used to calculate estimates based on data from one of the two samples. The third type of weight is a composite weight (CWGT0) which is used to calculate estimates based on the data from both samples. Please refer to the codebook documentation for more information on weighting.

    Response Rates: The overall response rate for the RDD sample was 24.23 percent, while the overall response rate for the address-sample was 30.99 percent. Please refer to the codebook documentation for more information on response rates.

    The civilian, noninstitutionalized population of the United States aged 18 years and older.

    Datasets: DS1: Health Information National Trends Survey (HINTS), 2007
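
The weighting guidance above can be illustrated with a small Python sketch. It is a simplified example under stated assumptions, not ICPSR's or the HINTS team's procedure: the records and the indicator name `sought_info` are made up, the function `weighted_proportion` is hypothetical, and only a point estimate is computed. Correct variances require the replicate weights as described in the codebook, which this sketch does not attempt.

```python
# Final (full-sample) weight variables named in the description above.
FINAL_WEIGHT = {
    "mail": "MWGT0",  # address (mail) sample only
    "rdd": "RWGT0",   # random digit dial (telephone) sample only
    "both": "CWGT0",  # composite weight for the combined samples
}


def weighted_proportion(records, indicator, analysis="both"):
    """Weighted share of cases with indicator == 1 for the chosen analysis."""
    weight_var = FINAL_WEIGHT[analysis]
    numerator = 0.0
    denominator = 0.0
    for row in records:
        weight = row.get(weight_var)
        if weight is None:
            # The sample-specific weight is missing for cases from the other sample.
            continue
        numerator += weight * row[indicator]
        denominator += weight
    if denominator == 0:
        raise ValueError("no cases with a non-missing weight")
    return numerator / denominator


# Made-up toy records; real values come from the HINTS 2007 data file.
TOY_RECORDS = [
    {"MWGT0": 100.0, "RWGT0": None, "CWGT0": 60.0, "sought_info": 1},
    {"MWGT0": 300.0, "RWGT0": None, "CWGT0": 140.0, "sought_info": 0},
    {"MWGT0": None, "RWGT0": 200.0, "CWGT0": 100.0, "sought_info": 1},
    {"MWGT0": None, "RWGT0": 200.0, "CWGT0": 100.0, "sought_info": 0},
]

if __name__ == "__main__":
    for analysis in ("mail", "rdd", "both"):
        print(analysis, weighted_proportion(TOY_RECORDS, "sought_info", analysis))
```

Skipping cases with a missing weight restricts the estimate to the intended sample, which mirrors the statement that the address-sample weight is missing for RDD cases and vice versa.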

  6. Dataset of Segmented Nuclei in Hematoxylin and Eosin Stained Histopathology...

    • cancerimagingarchive.net
    • dev.cancerimagingarchive.net
    docx, n/a, svs, txt
    The Cancer Imaging Archive, Dataset of Segmented Nuclei in Hematoxylin and Eosin Stained Histopathology Images [Dataset]. http://doi.org/10.7937/TCIA.2019.4A4DKP9U
    Available download formats
    txt, docx, n/a, svs
    Dataset authored and provided by
    The Cancer Imaging Archive
    License

    https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/

    Time period covered
    Feb 8, 2020
    Dataset funded by
    National Cancer Institute
    Description

    Detection, segmentation and classification of nuclei are fundamental analysis operations in digital pathology. Existing state-of-the-art approaches demand extensive amounts of supervised training data from pathologists and may still perform poorly in images from unseen tissue types. We propose an unsupervised approach for histopathology image segmentation that synthesizes heterogeneous sets of training image patches, of every tissue type. Although our synthetic patches are not always of high quality, we harness the motley crew of generated samples through a generally applicable importance sampling method. This proposed approach, for the first time, re-weighs the training loss over synthetic data so that the ideal (unbiased) generalization loss over the true data distribution is minimized. This enables us to use a random polygon generator to synthesize approximate cellular structures (i.e., nuclear masks) for which no real examples are given in many tissue types, and hence, GAN-based methods are not suited. In addition, we propose a hybrid synthesis pipeline that utilizes textures in real histopathology patches and GAN models, to tackle heterogeneity in tissue textures. Compared with existing state-of-the-art supervised models, our approach generalizes significantly better on cancer types without training data. Even in cancer types with training data, our approach achieves the same performance without supervision cost. In this dataset we release code and nucleus segmentations in whole slide tissue images with quality control results for Whole Slide Images (WSI) in The Cancer Genome Atlas (TCGA) repository from 5,204 subjects (6,142 slide images). Within this total, there are two subsets of data: (1) automatic nucleus segmentation data of 5,060 whole slide tissue images of 10 cancer types, with quality control results, and (2) manual nucleus segmentation data of 1,356 image patches from the same 10 cancer types plus additional 4 cancer types.

    5,060 Whole Slide Images (WSIs) are from the following 10 cancer types:

    • BLCA: Bladder urothelial carcinoma
    • BRCA: Breast invasive carcinoma
    • CESC: Cervical squamous cell carcinoma and endocervical adenocarcinoma
    • GBM: Glioblastoma Multiforme
    • LUAD: Lung adenocarcinoma
    • LUSC: Lung squamous cell carcinoma
    • PAAD: Pancreatic adenocarcinoma
    • PRAD: Prostate adenocarcinoma
    • SKCM: Skin Cutaneous Melanoma
    • UCEC: Uterine Corpus Endometrial Carcinoma

    Note that you can also download segmentation data of the following 4 cancer types, although they are not officially verified.

    • COAD: Colon adenocarcinoma
    • READ: Rectal adenocarcinoma
    • STAD: Stomach adenocarcinoma
    • UVM: Uveal Melanoma

