Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset corresponds to a collection of images and/or image-derived data available from the National Cancer Institute Imaging Data Commons (IDC) [1]. This dataset was converted into DICOM representation and ingested by the IDC team. You can explore and visualize the corresponding images using the IDC Portal. You can use the manifests included in this Zenodo record to download the content of the collection following the Download instructions below.
The Cancer Moonshot Biobank (CMB) program is organized into multiple cancer-specific collections. Digital pathology images for each of those collections were converted into DICOM representation by the IDC team and are shared via IDC.
1. CMB-AML (acute myeloid leukemia)
2. CMB-CRC (colorectal cancer)
3. CMB-GEC (gastroesophageal cancer)
4. CMB-LCA (lung cancer)
5. CMB-MEL (melanoma)
6. CMB-MML (multiple myeloma)
7. CMB-PCA (prostate cancer)
Digital pathology images, augmented with the metadata describing their content, were converted into DICOM Whole Slide Microscopy (SM) representation [2,3] using custom open source scripts and tools as described in [4].
A manifest file's name indicates the IDC data release in which a version of collection data was first introduced. For example, collection_id-idc_v8-aws.s5cmd corresponds to the contents of the collection_id collection introduced in IDC data release v8. If there is a subsequent version of this Zenodo page, it will indicate when a subsequent version of the corresponding collection was introduced.
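The naming convention above can be unpacked programmatically. The following is a minimal sketch, assuming every manifest name follows the collection_id-idc_vN-target.extension pattern seen in the examples on this page; the function and class names are hypothetical and are not part of any IDC tool.

```python
import re
from typing import NamedTuple

# Pattern inferred from names such as collection_id-idc_v8-aws.s5cmd.
# Assumption: the storage target is one of aws, gcs or dcf.
_MANIFEST_RE = re.compile(
    r"^(?P<collection_id>.+)-idc_v(?P<release>\d+)"
    r"-(?P<target>aws|gcs|dcf)\.(?:s5cmd|dcf)$"
)


class ManifestName(NamedTuple):
    collection_id: str  # e.g. "tcga_coad"
    release: int        # IDC data release in which this version was introduced
    target: str         # "aws", "gcs" or "dcf"


def parse_manifest_name(filename: str) -> ManifestName:
    """Split a manifest file name into collection, release and target."""
    match = _MANIFEST_RE.match(filename)
    if match is None:
        raise ValueError(f"not a recognized manifest name: {filename!r}")
    return ManifestName(
        collection_id=match["collection_id"],
        release=int(match["release"]),
        target=match["target"],
    )
```

For example, parse_manifest_name("collection_id-idc_v8-aws.s5cmd") yields the collection collection_id, release 8 and target aws.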
For each of the collections, the following manifest files are provided:
*-aws.s5cmd: manifest of files available for download from public IDC Amazon Web Services buckets
*-gcs.s5cmd: manifest of files available for download from public IDC Google Cloud Storage buckets
*.dcf: Gen3 manifest (for details see https://learn.canceridc.dev/data/organization-of-data/guids-and-uuids)
Note that manifest files that end in -aws.s5cmd reference files stored in Amazon Web Services (AWS) buckets, while those that end in -gcs.s5cmd reference files in Google Cloud Storage. The actual files are identical and are mirrored between AWS and GCP.
Each of the manifests includes instructions in the header on how to download the included files.
To download the files using .s5cmd manifests:
1. Install the idc-index package: pip install --upgrade idc-index
2. Download the files by passing the .s5cmd manifest file: idc download manifest.s5cmd
To download the files using the .dcf manifest, see the manifest header.
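For scripted downloads, the command quoted above can be wrapped in a few lines of Python. This is only a sketch: it shells out to the idc command exactly as written in the instructions, and the helper names (build_download_command, download) are hypothetical rather than part of idc-index.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def build_download_command(manifest: str) -> List[str]:
    """Return the argument list for `idc download <manifest>`."""
    path = Path(manifest)
    if path.suffix != ".s5cmd":
        # .dcf manifests are handled differently; see the manifest header.
        raise ValueError(f"expected a .s5cmd manifest, got {manifest!r}")
    return ["idc", "download", str(path)]


def download(manifest: str) -> None:
    """Run the download, failing early if idc-index is not installed."""
    if shutil.which("idc") is None:
        raise RuntimeError(
            "idc command not found; run: pip install --upgrade idc-index"
        )
    # check=True raises CalledProcessError if the download command fails.
    subprocess.run(build_download_command(manifest), check=True)
```

Calling download("manifest.s5cmd") then runs the same command as step 2 above.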
Imaging Data Commons team has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under Task Order No. HHSN26110071 under Contract No. HHSN261201500003l.
[1] Fedorov, A., Longabaugh, W. J. R., Pot, D., Clunie, D. A., Pieper, S. D., Gibbs, D. L., Bridge, C., Herrmann, M. D., Homeyer, A., Lewis, R., Aerts, H. J. W. L., Krishnaswamy, D., Thiriveedhi, V. K., Ciausu, C., Schacherer, D. P., Bontempi, D., Pihl, T., Wagner, U., Farahani, K., Kim, E. & Kikinis, R. National Cancer Institute Imaging Data Commons: Toward Transparency, Reproducibility, and Scalability in Imaging Artificial Intelligence. RadioGraphics 43 (2023). https://doi.org/10.1148/rg.230180
[2] National Electrical Manufacturers Association (NEMA). DICOM PS3.3 - Information Object Definitions: A.32.8 VL Whole Slide Microscopy Image IOD. at <https://dicom.nema.org/medical/dicom/current/output/html/part03.html#sect_A.32.8>
[3] Herrmann, M. D., Clunie, D. A., Fedorov, A., Doyle, S. W., Pieper, S., Klepeis, V., Le, L. P., Mutter, G. L., Milstone, D. S., Schultz, T. J., Kikinis, R., Kotecha, G. K., Hwang, D. H., Andriole, K. P., Iafrate, A. J., Brink, J. A., Boland, G. W., Dreyer, K. J., Michalski, M., Golden, J. A., Louis, D. N. & Lennerz, J. K. Implementing the DICOM standard for digital pathology. J. Pathol. Inform. 9, 37 (2018).
[4] Clunie, D., Fedorov, A. & Herrmann, M. D. ImagingDataCommons/idc-wsi-conversion: Initial release. (Zenodo, 2023). doi:10.5281/ZENODO.8240154
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
The National CT Colonography Trial (ACRIN 6664) collection contains 825 cases of CT colonography imaging with accompanying spreadsheets that provide polyp descriptions and their location within the colon segments. Additional information about the trial is available in the Study Protocol and Case Report Forms.
Main Objective: To clinically validate widespread use of computerized tomographic colonography (CTC) in a screening population for the detection of colorectal neoplasia.
Participants: Male and female outpatients, aged 50 years or older, scheduled for screening colonoscopy, who have not had a colonoscopy in the past five years.
Study Design Summary: The study addresses aspects of central importance to the clinical application of CTC in several interrelated but independent parts that will be conducted in parallel. In Part I, the clinical performance of the CTC examination will be prospectively compared in a blinded fashion to colonoscopy. In Part II, optimization of the CT technique will be performed in view of new technological advances in CT technology. In Part III, lesion detection will be optimized by studying the morphologic features of critical lesion types and in the development of a database for computer-assisted diagnosis. In Part IV, patient preferences and cost-effectiveness implications of observed performance outcomes will be evaluated using a predictive model.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset corresponds to a collection of images and/or image-derived data available from National Cancer Institute Imaging Data Commons (IDC) [1]. This dataset was converted into DICOM representation and ingested by the IDC team. You can explore and visualize the corresponding images using IDC Portal here: HTAN-VANDERBILT. You can use the manifests included in this Zenodo record to download the content of the collection following the Download instructions below.
The Human Tumor Atlas Network (HTAN) [2], part of the National Cancer Institute (NCI) Cancer Moonshot Initiative, will establish a clinical, experimental, computational, and organizational framework to generate informative and accessible three-dimensional atlases of cancer transitions for a diverse set of tumor types.
Colorectal cancer (CRC) is among the top three most prevalent cancers in global incidence and mortality. Most of these cancers develop from pre-cancerous adenomas. There is an unmet need to develop new preventive strategies and risk stratification models to decrease incidence, improve early detection, and prevent deaths from CRC.
We believe that the ability to provide the most effective precision diagnostics and preventive strategies can only be achieved with single-cell analysis. As such, we will map spatial relationships across the spectrum of normal colon, early polyps, and late adenomas, including their unique stromal and microbial microenvironments to identify unique molecular phenotypes.
Our goal will be accomplished through prospective, standardized collection and analysis of colorectal tissue, associated biospecimens, and related clinical and epidemiological data from participants undergoing colonoscopy or surgical resection. The biospecimens from these participants will be used for single-cell RNA sequencing, whole exome sequencing, multiplex immunofluorescence, species-specific bacterial fluorescence in situ hybridization, and other approaches. Finally, the information from these approaches will be integrated to develop a single-cell pre-cancer atlas with defined molecular phenotypes for dissemination to the broader scientific community.
Please see the HTAN-Vanderbilt information page to learn more about the images and to obtain any supporting metadata for this collection.
Citation guidelines can be found on the HTAN Publication Policy information page.
A manifest file's name indicates the IDC data release in which a version of collection data was first introduced. For example, collection_id-idc_v8-aws.s5cmd corresponds to the contents of the collection_id collection introduced in IDC data release v8. If there is a subsequent version of this Zenodo page, it will indicate when a subsequent version of the corresponding collection was introduced.
htan_vanderbilt-idc_v15-aws.s5cmd: manifest of files available for download from public IDC Amazon Web Services buckets
htan_vanderbilt-idc_v15-gcs.s5cmd: manifest of files available for download from public IDC Google Cloud Storage buckets
htan_vanderbilt-idc_v15-dcf.dcf: Gen3 manifest (for details see https://learn.canceridc.dev/data/organization-of-data/guids-and-uuids)
Note that manifest files that end in -aws.s5cmd reference files stored in Amazon Web Services (AWS) buckets, while those that end in -gcs.s5cmd reference files in Google Cloud Storage. The actual files are identical and are mirrored between AWS and GCP.
Each of the manifests includes instructions in the header on how to download the included files.
To download the files using .s5cmd manifests:
1. Install the idc-index package: pip install --upgrade idc-index
2. Download the files by passing the .s5cmd manifest file: idc download manifest.s5cmd
To download the files using the .dcf manifest, see the manifest header.
Collection of the images that were converted by IDC was supported through the Human Tumor Atlas Network, grants 1U2CCA233291-01 "Integrative Single-Cell Atlas of Host and Microenvironment in Colorectal Neoplastic Transformation" and 1U24CA233243-01 "Human Tumor Atlas Network: Data Coordinating Center" from National Cancer Institute.
Imaging Data Commons team has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under Task Order No. HHSN26110071 under Contract No. HHSN261201500003l.
[1] Fedorov, A., Longabaugh, W. J. R., Pot, D., Clunie, D. A., Pieper, S. D., Gibbs, D. L., Bridge, C., Herrmann, M. D., Homeyer, A., Lewis, R., Aerts, H. J. W., Krishnaswamy, D., Thiriveedhi, V. K., Ciausu, C., Schacherer, D. P., Bontempi, D., Pihl, T., Wagner, U., Farahani, K., Kim, E. & Kikinis, R. National Cancer Institute Imaging Data Commons: Toward Transparency, Reproducibility, and Scalability in Imaging Artificial Intelligence. RadioGraphics (2023). https://doi.org/10.1148/rg.230180
[2] Rozenblatt-Rosen, O., Regev, A., Oberdoerffer, P., Nawy, T., Hupalowska, A., Rood, J. E., Ashenberg, O., Cerami, E., Coffey, R. J., Demir, E., Ding, L., Esplin, E. D., Ford, J. M., Goecks, J., Ghosh, S., Gray, J. W., Guinney, J., Hanlon, S. E., Hughes, S. K., Hwang, E. S., Iacobuzio-Donahue, C. A., Jané-Valbuena, J., Johnson, B. E., Lau, K. S., Lively, T., Mazzilli, S. A., Pe’er, D., Santagata, S., Shalek, A. K., Schapiro, D., Snyder, M. P., Sorger, P. K., Spira, A. E., Srivastava, S., Tan, K., West, R. B., Williams, E. H. & Human Tumor Atlas Network. The Human Tumor Atlas Network: Charting Tumor Transitions across Space and Time at Single-Cell Resolution. Cell 181, 236–249 (2020). http://dx.doi.org/10.1016/j.cell.2020.03.053
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
This dataset corresponds to a collection of images and/or image-derived data available from National Cancer Institute Imaging Data Commons (IDC) [1]. This dataset was converted into DICOM representation and ingested by the IDC team. You can explore and visualize the corresponding images using IDC Portal here: TCGA-COAD. You can use the manifests included in this Zenodo record to download the content of the collection following the Download instructions below.
The Cancer Genome Atlas-Colon Adenocarcinoma (TCGA-COAD) data collection is part of a larger effort to enhance the TCGA http://cancergenome.nih.gov/ data set with characterized radiological images. The Cancer Imaging Program (CIP), with the cooperation of several of the TCGA tissue-contributing institutions, has archived a large portion of the radiological images of the COAD cases.
Please see the TCGA-COAD page to learn more about the images and to obtain any supporting metadata for this collection.
A manifest file's name indicates the IDC data release in which a version of collection data was first introduced. For example, collection_id-idc_v8-aws.s5cmd corresponds to the contents of the collection_id collection introduced in IDC data release v8. If there is a subsequent version of this Zenodo page, it will indicate when a subsequent version of the corresponding collection was introduced.
tcga_coad-idc_v18-aws.s5cmd: manifest of files available for download from public IDC Amazon Web Services buckets
tcga_coad-idc_v18-gcs.s5cmd: manifest of files available for download from public IDC Google Cloud Storage buckets
tcga_coad-idc_v18-dcf.dcf: Gen3 manifest (for details see https://learn.canceridc.dev/data/organization-of-data/guids-and-uuids)
Note that manifest files that end in -aws.s5cmd reference files stored in Amazon Web Services (AWS) buckets, while those that end in -gcs.s5cmd reference files in Google Cloud Storage. The actual files are identical and are mirrored between AWS and GCP.
Each of the manifests includes instructions in the header on how to download the included files.
To download the files using .s5cmd manifests:
1. Install the idc-index package: pip install --upgrade idc-index
2. Download the files by passing the .s5cmd manifest file: idc download manifest.s5cmd
To download the files using the .dcf manifest, see the manifest header.
Imaging Data Commons team has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under Task Order No. HHSN26110071 under Contract No. HHSN261201500003l.
[1] Fedorov, A., Longabaugh, W. J. R., Pot, D., Clunie, D. A., Pieper, S. D., Gibbs, D. L., Bridge, C., Herrmann, M. D., Homeyer, A., Lewis, R., Aerts, H. J. W., Krishnaswamy, D., Thiriveedhi, V. K., Ciausu, C., Schacherer, D. P., Bontempi, D., Pihl, T., Wagner, U., Farahani, K., Kim, E. & Kikinis, R. National Cancer Institute Imaging Data Commons: Toward Transparency, Reproducibility, and Scalability in Imaging Artificial Intelligence. RadioGraphics (2023). https://doi.org/10.1148/rg.230180
computer-assisted telephone interview (CATI); mail questionnaire
The data available for download are not weighted and users will need to weight the data prior to analysis. Users who plan to do inferential statistical testing using the data should utilize a statistical program that can incorporate the replicate weights included in the dataset. Additional information about sampling, interviewing, sampling error, weighting, and the universe of each question may be found in the codebook.
This data collection utilized a split frame where approximately half of the sample completed the survey by telephone through random digit dial (RDD) and half completed it through the mail as a paper and pencil questionnaire. Users can analyse the data with only the RDD respondents, only the mail respondents, or both, as indicated by the variable SAMPFLAG. For each type of analysis, users will need to supply the proper final weight to get population estimates and replicate weights to calculate the correct variance.
Variable names containing more than 16 characters were truncated in order to be compatible with current statistical programs. Therefore, variable names may differ slightly from those in the original documentation.
The formats of the weight and replicate weight variables were adjusted to fit the width of the values present in these variables, and the variables REGION and DIVISION were converted from character to numeric.
To protect respondent confidentiality, open-ended responses containing information on respondent's occupation in variables HC03WHERESEE2_OS and HD05OCCUPATIO_OS were blanked.
ICPSR created a unique sequential record identifier variable named CASEID.
The Health Information National Trends Survey (HINTS) collects nationally representative data about the American public's access to and use of cancer-related information.
The 2007 HINTS survey is the third in an ongoing biannual series and provides information on the changing patterns, needs, and behavior in seeking and supplying cancer information and explores how cancer risks are perceived. Respondents were asked about the ways in which they obtained health information, their use of health care services, their views about medical information and research, and their beliefs about cancer. A series of questions specifically addressed cervical cancer, colon cancer, and the Human Papillomavirus (HPV). Information was also collected on physical and mental health status, diet, physical activity, sun exposure, history of cancer, tobacco use, and whether respondents had health insurance. Demographic variables include sex, age, race, education level, employment status, marital status, household income, number of people living in the household, ownership of residence, and whether respondents were born in the United States.
For the CATI data collection, the sample design was a list-assisted RDD sample and one adult in the household was sampled for the extended interview using an algorithm designed to minimize intrusiveness. The mail survey included a stratified sample selected from a list of addresses that oversampled for minorities. Sampled addresses were matched to a database of listed telephone numbers, with 50 percent of the cases successfully matched to a telephone number. Matches in which a telephone number was both appended to an address-sample address and included in the RDD sample were deleted from the address sample. Please refer to the codebook documentation for more information on sample design.
Every sampled adult who completed a questionnaire in HINTS 2007 received three full-sample weights and three sets of replicate-sample weights. Two of the three types of weights correspond to the type of samples - the address-sample weight (MWGT0) and the RDD sample weight (RWGT0).
The address-sample weight is missing for a case in the RDD sample and vice versa. The sample-specific weights are used to calculate estimates based on data from one of the two samples. The third type of weight is a composite weight (CWGT0) which is used to calculate estimates based on the data from both samples. Please refer to the codebook documentation for more information on weighting.
Response Rates: The overall response rate for the RDD sample was 24.23 percent, while the overall response rate for the address-sample was 30.99 percent. Please refer to the codebook documentation for more information on response rates.
Universe: The civilian, noninstitutionalized population of the United States aged 18 years and older.
Datasets: DS1: Health Information National Trends Survey (HINTS), 2007
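The weighting notes for HINTS 2007 can be illustrated with a short Python sketch. It assumes the data have already been loaded into plain lists; only the weight variable names (MWGT0, RWGT0, CWGT0) come from the description, while the function names are hypothetical. The variance scale factor depends on the replication method and must be taken from the codebook; it is left as a required parameter here rather than guessed.

```python
from typing import Optional, Sequence

# Full-sample weight to use for each type of analysis (names from the text).
WEIGHT_BY_SAMPLE = {"address": "MWGT0", "rdd": "RWGT0", "combined": "CWGT0"}


def weighted_mean(values: Sequence[float],
                  weights: Sequence[Optional[float]]) -> float:
    """Weighted mean, skipping cases whose weight is missing (None)."""
    pairs = [(v, w) for v, w in zip(values, weights) if w is not None]
    total = sum(w for _, w in pairs)
    if total <= 0:
        raise ValueError("sum of weights must be positive")
    return sum(v * w for v, w in pairs) / total


def replicate_variance(values: Sequence[float],
                       full_weights: Sequence[Optional[float]],
                       replicate_weights: Sequence[Sequence[Optional[float]]],
                       scale: float) -> float:
    """Variance of the weighted mean from replicate weights.

    Computes scale * sum((theta_r - theta)^2), where theta uses the
    full-sample weight and theta_r uses the r-th replicate weight.
    `scale` is an assumption to be replaced by the codebook's value.
    """
    theta = weighted_mean(values, full_weights)
    return scale * sum(
        (weighted_mean(values, rw) - theta) ** 2 for rw in replicate_weights
    )
```

Skipping missing weights mirrors the note that the address-sample weight is missing for RDD cases and vice versa, so a sample-specific estimate uses only the cases from that sample.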
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
Detection, segmentation and classification of nuclei are fundamental analysis operations in digital pathology. Existing state-of-the-art approaches demand extensive amounts of supervised training data from pathologists and may still perform poorly in images from unseen tissue types. We propose an unsupervised approach for histopathology image segmentation that synthesizes heterogeneous sets of training image patches, of every tissue type. Although our synthetic patches are not always of high quality, we harness the motley crew of generated samples through a generally applicable importance sampling method. This proposed approach, for the first time, re-weighs the training loss over synthetic data so that the ideal (unbiased) generalization loss over the true data distribution is minimized. This enables us to use a random polygon generator to synthesize approximate cellular structures (i.e., nuclear masks) for which no real examples are given in many tissue types, and hence, GAN-based methods are not suited. In addition, we propose a hybrid synthesis pipeline that utilizes textures in real histopathology patches and GAN models, to tackle heterogeneity in tissue textures. Compared with existing state-of-the-art supervised models, our approach generalizes significantly better on cancer types without training data. Even in cancer types with training data, our approach achieves the same performance without supervision cost. In this dataset we release code and nucleus segmentations in whole slide tissue images with quality control results for Whole Slide Images (WSI) in The Cancer Genome Atlas (TCGA) repository from 5,204 subjects (6,142 slide images). Within this total, there are two subsets of data: (1) automatic nucleus segmentation data of 5,060 whole slide tissue images of 10 cancer types, with quality control results, and (2) manual nucleus segmentation data of 1,356 image patches from the same 10 cancer types plus additional 4 cancer types.
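The idea of re-weighting the training loss over synthetic data can be sketched in Python as generic self-normalized importance sampling. This is not the authors' implementation: it assumes the densities of each synthetic sample under the real and synthetic distributions are already known, whereas in practice that ratio has to be estimated. All names are hypothetical.

```python
from typing import Sequence


def importance_weighted_loss(losses: Sequence[float],
                             target_density: Sequence[float],
                             proposal_density: Sequence[float]) -> float:
    """Estimate the expected loss under the target (real) distribution
    from samples drawn from the proposal (synthetic) distribution.

    Each sample's loss is weighted by target_density / proposal_density,
    and the weights are normalized to sum to one.
    """
    if any(q <= 0 for q in proposal_density):
        raise ValueError("proposal densities must be positive")
    weights = [p / q for p, q in zip(target_density, proposal_density)]
    total = sum(weights)
    if total <= 0:
        raise ValueError("importance weights must not all be zero")
    return sum(w * l for w, l in zip(weights, losses)) / total
```

When the two densities agree for every sample, the weights are equal and the result reduces to the ordinary mean loss; samples that are over-represented among the synthetic patches are weighted down.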