This dataset corresponds to a collection of images and/or image-derived data available from the National Cancer Institute Imaging Data Commons (IDC) [1]. This dataset was converted into DICOM representation and ingested by the IDC team. You can explore and visualize the corresponding images using IDC Portal here. You can use the manifests included in this Zenodo record to download the content of the collection following the Download instructions below.

Collection description
The Cancer Moonshot Biobank (CMB) is a National Cancer Institute initiative to support current and future investigations into drug resistance and sensitivity and other NCI-sponsored cancer research initiatives, with an aim of improving researchers' understanding of cancer and how to intervene in cancer initiation and progression. During the course of this study, biospecimens (blood and tissue removed during medical procedures) and associated data will be collected longitudinally from at least 1000 patients across at least 10 cancer types, who represent the demographic diversity of the U.S. and are receiving standard of care cancer treatment at multiple NCI Community Oncology Research Program (NCORP) sites.
The CMB program is organized into multiple cancer-specific collections. Digital pathology images for each of those collections were converted into DICOM representation by the IDC team and are shared via IDC. This entry corresponds to the CMB-BRCA collection (Invasive Breast Carcinoma).
Digital pathology images, augmented with the metadata describing their content, were converted into DICOM Whole Slide Microscopy (SM) representation [2,3] using custom open source scripts and tools as described in [4].

Files included
A manifest file's name indicates the IDC data release in which a version of collection data was first introduced. For example, collection_id-idc_v8-aws.s5cmd corresponds to the contents of the collection_id collection introduced in IDC data release v8. If there is a subsequent version of this Zenodo page, it will indicate when a subsequent version of the corresponding collection was introduced.
For each of the collections, the following manifest files are provided:
-idc_v20-aws.s5cmd: manifest of files available for download from public IDC Amazon Web Services buckets
-idc_v20-gcs.s5cmd: manifest of files available for download from public IDC Google Cloud Storage buckets
-idc_v20-dcf.dcf: Gen3 manifest (for details see https://learn.canceridc.dev/data/organization-of-data/guids-and-uuids)
Note that manifest files that end in -aws.s5cmd reference files stored in Amazon Web Services (AWS) buckets, while those that end in -gcs.s5cmd reference files in Google Cloud Storage. The actual files are identical and are mirrored between AWS and GCP.

Download instructions
Each of the manifests includes instructions in the header on how to download the included files.
To download the files using .s5cmd manifests:
install the idc-index package: pip install --upgrade idc-index
download the files referenced by manifests included in this dataset by passing the .s5cmd manifest file: idc download manifest.s5cmd
To download the files using the .dcf manifest, see the manifest header.

Acknowledgments
Imaging Data Commons team has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under Task Order No. HHSN26110071 under Contract No. HHSN261201500003l.

References
[1] Fedorov, A., Longabaugh, W. J. R., Pot, D., Clunie, D. A., Pieper, S. D., Gibbs, D. L., Bridge, C., Herrmann, M. D., Homeyer, A., Lewis, R., Aerts, H. J. W. L., Krishnaswamy, D., Thiriveedhi, V. K., Ciausu, C., Schacherer, D. P., Bontempi, D., Pihl, T., Wagner, U., Farahani, K., Kim, E. & Kikinis, R. National Cancer Institute Imaging Data Commons: Toward transparency, reproducibility, and scalability in imaging artificial intelligence. Radiographics 43, (2023).
[2] National Electrical Manufacturers Association (NEMA). DICOM PS3.3 - Information Object Definitions: A.32.8 VL Whole Slide Microscopy Image IOD.
[3] Herrmann, M. D., Clunie, D. A., Fedorov, A., Doyle, S. W., Pieper, S., Klepeis, V., Le, L. P., Mutter, G. L., Milstone, D. S., Schultz, T. J., Kikinis, R., Kotecha, G. K., Hwang, D. H., Andriole, K. P., Iafrate, A. J., Brink, J. A., Boland, G. W., Dreyer, K. J., Michalski, M., Golden, J. A., Louis, D. N. & Lennerz, J. K. Implementing the DICOM standard for digital pathology. J. Pathol. Inform. 9, 37 (2018).
[4] Clunie, D., Fedorov, A. & Herrmann, M. D. ImagingDataCommons/idc-wsi-conversion: Initial release. (Zenodo, 2023). doi:10.5281/ZENODO.8240154
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
This dataset corresponds to a collection of images and/or image-derived data available from National Cancer Institute Imaging Data Commons (IDC) [1]. This dataset was converted into DICOM representation and ingested by the IDC team. You can explore and visualize the corresponding images using IDC Portal here: TCGA-PAAD. You can use the manifests included in this Zenodo record to download the content of the collection following the Download instructions below.
Pancreatic ductal adenocarcinoma is the most common form of pancreatic cancer, making up more than 80% of cases. The disease begins in the cells of the pancreas's ducts, which transport juices containing digestive enzymes into the small intestine.
Pancreatic cancer is the fourth most common cause of global cancer-related deaths and is almost always fatal. In 2012, it was estimated that around 44,000 new cases of pancreatic cancer were diagnosed and more than 37,000 deaths from this disease occurred in the United States alone, affecting both men and women.
Please see the TCGA-PAAD information page to learn more about the images and to obtain any supporting metadata for this collection.
Citation guidelines can be found on the Citing TCGA in Publications and Presentations information page.
A manifest file's name indicates the IDC data release in which a version of collection data was first introduced. For example, collection_id-idc_v8-aws.s5cmd corresponds to the contents of the collection_id collection introduced in IDC data release v8. If there is a subsequent version of this Zenodo page, it will indicate when a subsequent version of the corresponding collection was introduced.
tcga_paad-idc_v8-aws.s5cmd: manifest of files available for download from public IDC Amazon Web Services buckets
tcga_paad-idc_v8-gcs.s5cmd: manifest of files available for download from public IDC Google Cloud Storage buckets
tcga_paad-idc_v8-dcf.dcf: Gen3 manifest (for details see https://learn.canceridc.dev/data/organization-of-data/guids-and-uuids)
Note that manifest files that end in -aws.s5cmd reference files stored in Amazon Web Services (AWS) buckets, while those that end in -gcs.s5cmd reference files in Google Cloud Storage. The actual files are identical and are mirrored between AWS and GCP.
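The naming convention described above can be checked mechanically. The following is a minimal sketch, assuming manifest names always follow the pattern collection_id-idc_vN-kind.ext shown in this record; the ManifestName type and the parse_manifest_name helper are hypothetical names introduced for illustration and are not part of idc-index or any IDC tool.

```python
import re
from typing import NamedTuple


class ManifestName(NamedTuple):
    """Parts of a manifest file name (hypothetical helper type)."""
    collection_id: str
    idc_release: int
    kind: str  # "aws", "gcs" or "dcf"


# Assumed pattern: <collection_id>-idc_v<release>-<kind>.<extension>
_PATTERN = re.compile(
    r"^(?P<collection>.+)-idc_v(?P<release>\d+)-(?P<kind>aws|gcs|dcf)\.(?P<ext>s5cmd|dcf)$"
)


def parse_manifest_name(filename: str) -> ManifestName:
    """Split a manifest file name into collection, release number and kind."""
    match = _PATTERN.match(filename)
    if match is None:
        raise ValueError(f"not a recognized manifest name: {filename!r}")
    kind = match["kind"]
    # AWS and GCS manifests use the .s5cmd extension; the Gen3 manifest uses .dcf.
    expected_ext = "dcf" if kind == "dcf" else "s5cmd"
    if match["ext"] != expected_ext:
        raise ValueError(f"unexpected extension for a {kind} manifest: {filename!r}")
    return ManifestName(match["collection"], int(match["release"]), kind)


if __name__ == "__main__":
    for name in ("tcga_paad-idc_v8-aws.s5cmd", "tcga_paad-idc_v8-dcf.dcf"):
        print(parse_manifest_name(name))
```

A helper like this is only useful for sorting or filtering manifests locally; the download itself still goes through the instructions in each manifest header.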
Each of the manifests includes instructions in the header on how to download the included files.
To download the files using .s5cmd manifests:
install the idc-index package: pip install --upgrade idc-index
download the files referenced by manifests included in this dataset by passing the .s5cmd manifest file: idc download manifest.s5cmd
To download the files using the .dcf manifest, see the manifest header.
Imaging Data Commons team has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under Task Order No. HHSN26110071 under Contract No. HHSN261201500003l.
[1] Fedorov, A., Longabaugh, W. J. R., Pot, D., Clunie, D. A., Pieper, S. D., Gibbs, D. L., Bridge, C., Herrmann, M. D., Homeyer, A., Lewis, R., Aerts, H. J. W., Krishnaswamy, D., Thiriveedhi, V. K., Ciausu, C., Schacherer, D. P., Bontempi, D., Pihl, T., Wagner, U., Farahani, K., Kim, E. & Kikinis, R. National Cancer Institute Imaging Data Commons: Toward Transparency, Reproducibility, and Scalability in Imaging Artificial Intelligence. RadioGraphics (2023). https://doi.org/10.1148/rg.230180
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
The Lung Image Database Consortium image collection (LIDC-IDRI) consists of diagnostic and lung cancer screening thoracic computed tomography (CT) scans with marked-up annotated lesions. It is a web-accessible international resource for development, training, and evaluation of computer-assisted diagnostic (CAD) methods for lung cancer detection and diagnosis. Initiated by the National Cancer Institute (NCI), further advanced by the Foundation for the National Institutes of Health (FNIH), and accompanied by the Food and Drug Administration (FDA) through active participation, this public-private partnership demonstrates the success of a consortium founded on a consensus-based process.
Seven academic centers and eight medical imaging companies collaborated to create this data set which contains 1018 cases. Each subject includes images from a clinical thoracic CT scan and an associated XML file that records the results of a two-phase image annotation process performed by four experienced thoracic radiologists. In the initial blinded-read phase, each radiologist independently reviewed each CT scan and marked lesions belonging to one of three categories ("nodule > or =3 mm," "nodule <3 mm," and "non-nodule > or =3 mm"). In the subsequent unblinded-read phase, each radiologist independently reviewed their own marks along with the anonymized marks of the three other radiologists to render a final opinion. The goal of this process was to identify as completely as possible all lung nodules in each CT scan without requiring forced consensus.
Note: The TCIA team strongly encourages users to review pylidc and the Standardized representation of the TCIA LIDC-IDRI annotations using DICOM (DICOM-LIDC-IDRI-Nodules) before developing custom tools to analyze the XML version of the annotations/segmentations included in this dataset.
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
Purpose: Expert selected landmark points on clinical image pairs provide a basis for rigid registration validation. Using combinatorial rigid registration optimization (CORRO) we provide a statistically characterized reference data set for image registration of the pelvis by estimating the optimal ground truth.
Methods: Landmark points for each CT/CBCT image pair for 58 pelvic cases were identified. From the identified landmark pairs, combination subsets of k landmark pairs were generated without repeats to form k-sets for k = 4, 8, and 12. An affine registration between the image pairs was calculated for each k-combination set (2,000-8,000,000). The mean and the standard deviation of the registration were used as the final registration for each image pair. Joint entropy was employed to measure and compare the quality of CORRO to commercially available software.
Results: An average of 154 (range: 91-212) landmark pairs were selected for each CT/CBCT image pair. The mean standard deviation of the registration output decreased as the k-size increased for all cases. In general, the joint entropy evaluated was found to be lower than results from commercially available software. Of all 58 cases, 58.3% of the k=4, 15% of the k=8, and 18.3% of the k=12 sets resulted in the better registration using CORRO, compared with 8.3% from commercial registration software. The minimum joint entropy was determined for one case and found to exist at the estimated registration mean, in agreement with the CORRO approach.
Conclusion: The results demonstrate that CORRO works even in the extreme case of the pelvic anatomy where the CBCT suffers from reduced quality due to increased noise levels. The estimated ground truth using CORRO was found to be better than commercially available software for all k-sets tested. Additionally, the k-set of 4 resulted in overall best outcomes when compared to k=8 and 12, which is anticipated because k=8 and 12 are more likely to have combinations that affected the accuracy of the registration.
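The joint entropy measure used above to compare registrations can be illustrated with a short sketch. It assumes two already resampled images of equal size, flattened to sequences of integer intensities, and a simple fixed-width binning; the function name, the bin_width parameter, and the binning scheme are illustrative assumptions, since the abstract does not state how the joint histogram was built.

```python
import math
from collections import Counter
from typing import Sequence


def joint_entropy(image_a: Sequence[int], image_b: Sequence[int], bin_width: int = 1) -> float:
    """Joint Shannon entropy (in bits) of two equally sized, flattened images.

    Lower values indicate that intensities in one image predict intensities
    in the other more consistently, which is why the measure is used to
    compare candidate registrations.
    """
    if len(image_a) != len(image_b):
        raise ValueError("images must contain the same number of voxels")
    if not image_a:
        raise ValueError("images must not be empty")
    if bin_width < 1:
        raise ValueError("bin_width must be a positive integer")

    # Joint histogram over pairs of binned intensities at the same voxel.
    counts = Counter((a // bin_width, b // bin_width) for a, b in zip(image_a, image_b))
    total = len(image_a)

    # H(A, B) = -sum p(a, b) * log2 p(a, b)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    fixed = [0, 0, 1, 1]
    print(joint_entropy(fixed, [0, 0, 1, 1]))  # intensities co-vary perfectly
    print(joint_entropy(fixed, [0, 1, 0, 1]))  # intensities are unrelated
```

In the toy example, the aligned pair yields a lower joint entropy than the unrelated pair, which matches the way the measure is interpreted in the Results.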
Figure 1. Content of planning CT shifted to the machine isocenter. Top left to right - axial, coronal and sagittal planning CT image with machine isocenter in red and image isocenter in blue. Middle left to right - axial, coronal and sagittal planning CT at machine isocenter. Bottom left to right - axial, coronal and sagittal CBCT at machine isocenter.
https://spdx.org/licenses/CC0-1.0.html
Abstract
In Italy, approximately 400,000 new cases of malignant tumors are recorded every year. The average number of annual deaths caused by tumors, according to the Italian Cancer Registers, is about 3.5 per 1,000 men and about 2.5 per 1,000 women, for a total of about 3 deaths per 1,000 people. Long-term (at least a decade) and spatially detailed data (up to the municipality scale) are neither easily accessible nor fully available for public consultation by citizens, scientists, research groups, and associations. Therefore, here we present a ten-year (2009–2018) database on cancer mortality rates (in the form of Standardized Mortality Ratios, SMR) for 23 cancer macro-types in Italy on municipal, provincial, and regional scales. We aim to make available a comprehensive, ready-to-use, and openly accessible source of data on the most updated status of cancer mortality in Italy for local and national stakeholders, researchers, and policymakers, and to provide researchers with ready-to-use data to perform specific studies.

Methods
For a given locality, year, and cause of death, the SMR is the ratio between the observed number of deaths (Om) and the number of expected deaths (Em):
SMR = Om / Em    (1)
where Om is taken from available observational data and Em is estimated as the weighted sum of the age-specific population sizes for the given locality (ni) multiplied by the age-specific death rates of the reference population (MRi):
Em = sum(MRi x ni)    (2)
MRi can be provided by a public health organization or estimated as the ratio of the age-specific number of deaths in the reference population (Mi) to the age-specific reference population size (Ni):
MRi = Mi / Ni    (3)
Thus, the value of Em is weighted by the age distribution of deaths and population size. The SMR takes the value 1 when the numbers of observed and expected deaths are equal.
Following Eqs. (1)-(3), the SMR was computed for each single year of the period 2009-2018 and for each single cause of death as defined by the ICD-10 classification system, using the following data:
age-specific number of deaths by cause in the reference population (i.e., Mi) from the Italian National Institute of Statistics (ISTAT, http://www.istat.it/en/, last access: 26/01/2022);
age-specific census data on the reference population (i.e., Ni) from ISTAT;
the observed number of deaths by cause (i.e., Om) from ISTAT;
the age-specific census data on population (ni).
The SMR was estimated at three levels of aggregation: municipal, provincial (equivalent to the European classification NUTS 3), and regional (i.e., NUTS 2). The SMR was also computed for the broad category of malignant tumors (i.e., C00-C979, hereinafter cancer macro-type C), and for the broad category of malignant plus non-malignant tumors (i.e., C00-C979 plus D0-D489, hereinafter cancer macro-type CD). Lower 90% and 95% confidence intervals of 10-year average values were computed according to the Byar method.
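Eqs. (1)-(3) and the Byar interval translate directly into a few lines of code. The sketch below is illustrative only: the function names and the toy numbers are assumptions, the interval uses the standard Byar approximation to the Poisson limits for a single observed count, and it does not reproduce how the authors applied the method to 10-year average values.

```python
import math
from typing import Sequence, Tuple


def expected_deaths(
    reference_deaths: Sequence[float],      # Mi, per age class
    reference_population: Sequence[float],  # Ni, per age class
    local_population: Sequence[float],      # ni, per age class
) -> float:
    """Em = sum(MRi * ni) with MRi = Mi / Ni (Eqs. 2 and 3)."""
    if not (len(reference_deaths) == len(reference_population) == len(local_population)):
        raise ValueError("all inputs must cover the same age classes")
    return sum(
        (m / n_ref) * n_loc
        for m, n_ref, n_loc in zip(reference_deaths, reference_population, local_population)
    )


def smr(observed: float, expected: float) -> float:
    """SMR = Om / Em (Eq. 1)."""
    if expected <= 0:
        raise ValueError("expected deaths must be positive")
    return observed / expected


def byar_interval(observed: int, expected: float, z: float = 1.96) -> Tuple[float, float]:
    """Byar's approximation to the confidence limits of an SMR.

    z = 1.96 gives a 95% interval and z = 1.645 a 90% interval.
    """
    if expected <= 0:
        raise ValueError("expected deaths must be positive")
    if observed == 0:
        lower_count = 0.0
    else:
        lower_term = 1 - 1 / (9 * observed) - z / (3 * math.sqrt(observed))
        lower_count = max(0.0, observed * lower_term ** 3)
    o1 = observed + 1
    upper_count = o1 * (1 - 1 / (9 * o1) + z / (3 * math.sqrt(o1))) ** 3
    return lower_count / expected, upper_count / expected


if __name__ == "__main__":
    # Toy example with two age classes (made-up numbers, not from the database).
    em = expected_deaths([10, 20], [1000, 2000], [100, 300])
    print(smr(8, em), byar_interval(8, em))
```

An SMR above 1 whose lower limit also exceeds 1 indicates more deaths than expected from the reference rates; when the interval spans 1, the excess is compatible with chance.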
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Additional file 4.
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
This dataset was used by the NCI's Quantitative Imaging Network (QIN) PET-CT Subgroup for their project titled: Multi-center Comparison of Radiomic Features from Different Software Packages on Digital Reference Objects and Patient Datasets. The purpose of this project was to assess the agreement among radiomic features when computed by several groups by using different software packages under very tightly controlled conditions, which included common image data sets and standardized feature definitions.
The image datasets (and Volumes of Interest – VOIs) provided here are the same ones used in that project and reported in the publication listed below (ISSN 2379-1381 https://doi.org/10.18383/j.tom.2019.00031). In addition, we have provided detailed information about the software packages used (Table 1 in that publication) as well as the individual feature value results for each image dataset and each software package that was used to create the summary tables (Tables 2, 3 and 4) in that publication.
For that project, nine common quantitative imaging features were selected for comparison, including features that describe morphology, intensity, shape, and texture. These features are described in detail by the Image Biomarker Standardisation Initiative (IBSI, https://arxiv.org/abs/1612.07003) and in the associated publication (Zwanenburg A, Vallières M, et al. The Image Biomarker Standardization Initiative: Standardized Quantitative Radiomics for High-Throughput Image-based Phenotyping. Radiology. 2020 May;295(2):328-338. doi: https://doi.org/10.1148/radiol.2020191145).
There are three datasets provided: two image datasets and one dataset consisting of four Excel spreadsheets containing feature values.
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
This dataset corresponds to a collection of images and/or image-derived data available from National Cancer Institute Imaging Data Commons (IDC) [1]. This dataset was converted into DICOM representation and ingested by the IDC team. You can explore and visualize the corresponding images using IDC Portal here: TCGA-THYM. You can use the manifests included in this Zenodo record to download the content of the collection following the Download instructions below.
This cancer develops in the outer surface of the thymus, a gland behind the breastbone that produces T-cells, a type of white blood cell. Thymoma is rare, but it is the most common tumor in adults affecting the mediastinum, which is the cavity between the lungs containing the heart, esophagus, and trachea. A tumor of the thymus tends to grow slowly and rarely spreads to other parts of the body. However, of the estimated 400 Americans who develop this cancer each year, half are diagnosed with metastatic thymoma. When the cancer metastasizes, only 45% of patients survive five years after their diagnosis.
Please see the TCGA-THYM information page to learn more about the images and to obtain any supporting metadata for this collection.
Citation guidelines can be found on the Citing TCGA in Publications and Presentations information page.
A manifest file's name indicates the IDC data release in which a version of collection data was first introduced. For example, collection_id-idc_v8-aws.s5cmd corresponds to the contents of the collection_id collection introduced in IDC data release v8. If there is a subsequent version of this Zenodo page, it will indicate when a subsequent version of the corresponding collection was introduced.
tcga_thym-idc_v10-aws.s5cmd: manifest of files available for download from public IDC Amazon Web Services buckets
tcga_thym-idc_v10-gcs.s5cmd: manifest of files available for download from public IDC Google Cloud Storage buckets
tcga_thym-idc_v10-dcf.dcf: Gen3 manifest (for details see https://learn.canceridc.dev/data/organization-of-data/guids-and-uuids)
Note that manifest files that end in -aws.s5cmd reference files stored in Amazon Web Services (AWS) buckets, while those that end in -gcs.s5cmd reference files in Google Cloud Storage. The actual files are identical and are mirrored between AWS and GCP.
Each of the manifests includes instructions in the header on how to download the included files.
To download the files using .s5cmd manifests:
install the idc-index package: pip install --upgrade idc-index
download the files referenced by manifests included in this dataset by passing the .s5cmd manifest file: idc download manifest.s5cmd
To download the files using the .dcf manifest, see the manifest header.
Imaging Data Commons team has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under Task Order No. HHSN26110071 under Contract No. HHSN261201500003l.
[1] Fedorov, A., Longabaugh, W. J. R., Pot, D., Clunie, D. A., Pieper, S. D., Gibbs, D. L., Bridge, C., Herrmann, M. D., Homeyer, A., Lewis, R., Aerts, H. J. W., Krishnaswamy, D., Thiriveedhi, V. K., Ciausu, C., Schacherer, D. P., Bontempi, D., Pihl, T., Wagner, U., Farahani, K., Kim, E. & Kikinis, R. National Cancer Institute Imaging Data Commons: Toward Transparency, Reproducibility, and Scalability in Imaging Artificial Intelligence. RadioGraphics (2023). https://doi.org/10.1148/rg.230180
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
This dataset corresponds to a collection of images and/or image-derived data available from National Cancer Institute Imaging Data Commons (IDC) [1]. This dataset was converted into DICOM representation and ingested by the IDC team. You can explore and visualize the corresponding images using IDC Portal here: TCGA-CHOL. You can use the manifests included in this Zenodo record to download the content of the collection following the Download instructions below.
Cholangiocarcinoma is a cancer that develops in the bile duct. The bile duct is a network of tubes that carry bile from the liver and gallbladder to the small intestine. Tumors that start in bile duct branches that lie inside the liver are called intrahepatic bile duct cancer, while those that form outside the liver are called extrahepatic bile duct cancer. About 10% of all cholangiocarcinomas are intrahepatic and 90% are extrahepatic. TCGA studied both subtypes of cholangiocarcinoma.
Although cholangiocarcinoma is a rare cancer, the incidence and mortality rates for the disease have been increasing worldwide in the last three decades. Between 2,000 and 3,000 Americans are diagnosed with cholangiocarcinoma each year, the majority of them with tumors at advanced stages. This cancer is more prevalent in Asia and the Middle East, where parasitic infection of the bile duct increases the risk of cholangiocarcinoma. Other diseases of the bile duct or liver, such as bile duct stones and liver disease, obesity, diabetes, and smoking are also risk factors. When intrahepatic and extrahepatic cholangiocarcinoma spread to other parts of the body, only 2% of patients survive five years after diagnosis.
Please see the TCGA-CHOL information page to learn more about the images and to obtain any supporting metadata for this collection.
Citation guidelines can be found on the Citing TCGA in Publications and Presentations information page.
A manifest file's name indicates the IDC data release in which a version of collection data was first introduced. For example, collection_id-idc_v8-aws.s5cmd corresponds to the contents of the collection_id collection introduced in IDC data release v8. If there is a subsequent version of this Zenodo page, it will indicate when a subsequent version of the corresponding collection was introduced.
tcga_chol-idc_v10-aws.s5cmd: manifest of files available for download from public IDC Amazon Web Services buckets
tcga_chol-idc_v10-gcs.s5cmd: manifest of files available for download from public IDC Google Cloud Storage buckets
tcga_chol-idc_v10-dcf.dcf: Gen3 manifest (for details see https://learn.canceridc.dev/data/organization-of-data/guids-and-uuids)
Note that manifest files that end in -aws.s5cmd reference files stored in Amazon Web Services (AWS) buckets, while those that end in -gcs.s5cmd reference files in Google Cloud Storage. The actual files are identical and are mirrored between AWS and GCP.
Each of the manifests includes instructions in the header on how to download the included files.
To download the files using .s5cmd manifests:
install the idc-index package: pip install --upgrade idc-index
download the files referenced by manifests included in this dataset by passing the .s5cmd manifest file: idc download manifest.s5cmd
To download the files using the .dcf manifest, see the manifest header.
Imaging Data Commons team has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under Task Order No. HHSN26110071 under Contract No. HHSN261201500003l.
[1] Fedorov, A., Longabaugh, W. J. R., Pot, D., Clunie, D. A., Pieper, S. D., Gibbs, D. L., Bridge, C., Herrmann, M. D., Homeyer, A., Lewis, R., Aerts, H. J. W., Krishnaswamy, D., Thiriveedhi, V. K., Ciausu, C., Schacherer, D. P., Bontempi, D., Pihl, T., Wagner, U., Farahani, K., Kim, E. & Kikinis, R. National Cancer Institute Imaging Data Commons: Toward Transparency, Reproducibility, and Scalability in Imaging Artificial Intelligence. RadioGraphics (2023). https://doi.org/10.1148/rg.230180
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This work introduces a novel Pt(II) based prodrug TTFA-Platin that integrates a β-diketonate ligand TTFA with a platinum scaffold to structurally resemble carboplatin and offers intermediate kinetic lability between cisplatin and carboplatin, striking a balance between therapeutic efficacy and safety. A comprehensive stability and speciation study was conducted in various biological media, mapping the therapeutic effects of TTFA-Platin. A control molecule, TMK-Platin, was synthesized to further validate the structural-stability relationship, which displayed poor activatable features in biological systems. In vitro studies against a panel of cancer cell lines revealed that TTFA-Platin exhibited significantly higher potency compared to TMK-Platin. In vivo studies revealed that TTFA-Platin exhibited significantly lower toxicity than the reference platinum compounds. Thus, leveraging ligands that fine-tune kinetic lability and offer therapeutic benefits can help develop more effective and safer cancer treatments, addressing the limitations of existing therapies.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset corresponds to a collection of images and/or image-derived data available from National Cancer Institute Imaging Data Commons (IDC) [1]. This dataset was converted into DICOM representation and ingested by the IDC team. You can explore and visualize the corresponding images using IDC Portal here: HTAN-VANDERBILT. You can use the manifests included in this Zenodo record to download the content of the collection following the Download instructions below.
The Human Tumor Atlas Network (HTAN) [2], part of the National Cancer Institute (NCI) Cancer Moonshot Initiative, will establish a clinical, experimental, computational, and organizational framework to generate informative and accessible three-dimensional atlases of cancer transitions for a diverse set of tumor types.
Colorectal cancer (CRC) is among the top three most prevalent cancers in global incidence and mortality. Most of these cancers develop from pre-cancerous adenomas. There is an unmet need to develop new preventive strategies and risk stratification models to decrease incidence, improve early detection, and prevent deaths from CRC.
We believe that the ability to provide the most effective precision diagnostics and preventive strategies can only be achieved with single-cell analysis. As such, we will map spatial relationships across the spectrum of normal colon, early polyps, and late adenomas, including their unique stromal and microbial microenvironments to identify unique molecular phenotypes.
Our goal will be accomplished through prospective, standardized collection and analysis of colorectal tissue, associated biospecimens, and related clinical and epidemiological data from participants undergoing colonoscopy or surgical resection. The biospecimens from these participants will be used for single-cell RNA sequencing, whole exome sequencing, multiplex immunofluorescence, species-specific bacterial fluorescence in situ hybridization, and other approaches. Finally, the information from these approaches will be integrated to develop a single-cell pre-cancer atlas with defined molecular phenotypes for dissemination to the broader scientific community.
Please see the HTAN-Vanderbilt information page to learn more about the images and to obtain any supporting metadata for this collection.
Citation guidelines can be found on the HTAN Publication Policy information page.
A manifest file's name indicates the IDC data release in which a version of collection data was first introduced. For example, collection_id-idc_v8-aws.s5cmd corresponds to the contents of the collection_id collection introduced in IDC data release v8. If there is a subsequent version of this Zenodo page, it will indicate when a subsequent version of the corresponding collection was introduced.
htan_vanderbilt-idc_v15-aws.s5cmd: manifest of files available for download from public IDC Amazon Web Services buckets
htan_vanderbilt-idc_v15-gcs.s5cmd: manifest of files available for download from public IDC Google Cloud Storage buckets
htan_vanderbilt-idc_v15-dcf.dcf: Gen3 manifest (for details see https://learn.canceridc.dev/data/organization-of-data/guids-and-uuids)
Note that manifest files that end in -aws.s5cmd reference files stored in Amazon Web Services (AWS) buckets, while those that end in -gcs.s5cmd reference files in Google Cloud Storage. The actual files are identical and are mirrored between AWS and GCP.
Each of the manifests includes instructions in the header on how to download the included files.
To download the files using .s5cmd manifests:
install the idc-index package: pip install --upgrade idc-index
download the files referenced by manifests included in this dataset by passing the .s5cmd manifest file: idc download manifest.s5cmd
To download the files using the .dcf manifest, see the manifest header.
Collection of the images that were converted by IDC was supported through the Human Tumor Atlas Network, grants 1U2CCA233291-01 "Integrative Single-Cell Atlas of Host and Microenvironment in Colorectal Neoplastic Transformation" and 1U24CA233243-01 "Human Tumor Atlas Network: Data Coordinating Center" from National Cancer Institute.
Imaging Data Commons team has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under Task Order No. HHSN26110071 under Contract No. HHSN261201500003l.
[1] Fedorov, A., Longabaugh, W. J. R., Pot, D., Clunie, D. A., Pieper, S. D., Gibbs, D. L., Bridge, C., Herrmann, M. D., Homeyer, A., Lewis, R., Aerts, H. J. W., Krishnaswamy, D., Thiriveedhi, V. K., Ciausu, C., Schacherer, D. P., Bontempi, D., Pihl, T., Wagner, U., Farahani, K., Kim, E. & Kikinis, R. National Cancer Institute Imaging Data Commons: Toward Transparency, Reproducibility, and Scalability in Imaging Artificial Intelligence. RadioGraphics (2023). https://doi.org/10.1148/rg.230180
[2] Rozenblatt-Rosen, O., Regev, A., Oberdoerffer, P., Nawy, T., Hupalowska, A., Rood, J. E., Ashenberg, O., Cerami, E., Coffey, R. J., Demir, E., Ding, L., Esplin, E. D., Ford, J. M., Goecks, J., Ghosh, S., Gray, J. W., Guinney, J., Hanlon, S. E., Hughes, S. K., Hwang, E. S., Iacobuzio-Donahue, C. A., Jané-Valbuena, J., Johnson, B. E., Lau, K. S., Lively, T., Mazzilli, S. A., Pe’er, D., Santagata, S., Shalek, A. K., Schapiro, D., Snyder, M. P., Sorger, P. K., Spira, A. E., Srivastava, S., Tan, K., West, R. B., Williams, E. H. & Human Tumor Atlas Network. The Human Tumor Atlas Network: Charting Tumor Transitions across Space and Time at Single-Cell Resolution. Cell 181, 236–249 (2020). http://dx.doi.org/10.1016/j.cell.2020.03.053
This dataset corresponds to a collection of images and/or image-derived data available from National Cancer Institute Imaging Data Commons (IDC) [1]. This dataset was converted into DICOM representation and ingested by the IDC team. You can explore and visualize the corresponding images using IDC Portal here. You can use the manifests included in this Zenodo record to download the content of the collection following the Download instructions below. Collection description The Cancer Moonshot Biobank (CMB) is a National Cancer Institute initiative to support current and future investigations into drug resistance and sensitivity and other NCI-sponsored cancer research initiatives, with an aim of improving researchers' understanding of cancer and how to intervene in cancer initiation and progression. During the course of this study, biospecimens (blood and tissue removed during medical procedures) and associated data will be collected longitudinally from at least 1000 patients across at least 10 cancer types, who represent the demographic diversity of the U.S. and receiving standard of care cancer treatment at multiple NCI Community Oncology Research Program (NCORP) sites. CMB program is organized into multiple cancer-specific collections. Digital pathology images for each of those collections were converted into DICOM representation by the IDC team and are shared via IDC. This entry corresponds to the CMB-BRCA collection (Invasive Breast Carcinoma). Digital pathology images, augmented with the metadata describing their content, were converted into DICOM Whole Slide Microscopy (SM) representation [2,3] using custom open source scripts and tools as described in [4]. Files included A manifest file's name indicates the IDC data release in which a version of collection data was first introduced. For example, collection_id-idc_v8-aws.s5cmd corresponds to the contents of the collection_id collection introduced in IDC data release v8. 
If there is a subsequent version of this Zenodo page, it will indicate when a subsequent version of the corresponding collection was introduced.

For each of the collections, the following manifest files are provided:

  -idc_v20-aws.s5cmd: manifest of files available for download from public IDC Amazon Web Services buckets
  -idc_v20-gcs.s5cmd: manifest of files available for download from public IDC Google Cloud Storage buckets
  -idc_v20-dcf.dcf: Gen3 manifest (for details see https://learn.canceridc.dev/data/organization-of-data/guids-and-uuids)

Note that manifest files that end in -aws.s5cmd reference files stored in Amazon Web Services (AWS) buckets, while those that end in -gcs.s5cmd reference files in Google Cloud Storage. The actual files are identical and are mirrored between AWS and GCP.

Download instructions

Each of the manifests includes instructions in its header on how to download the included files.

To download the files using .s5cmd manifests:

1. Install the idc-index package: pip install --upgrade idc-index
2. Download the files referenced by the manifests included in this dataset by passing the .s5cmd manifest file: idc download manifest.s5cmd

To download the files using the .dcf manifest, see the manifest header.

Acknowledgments

The Imaging Data Commons team has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under Task Order No. HHSN26110071 under Contract No. HHSN261201500003l.

References

[1] Fedorov, A., Longabaugh, W. J. R., Pot, D., Clunie, D. A., Pieper, S. D., Gibbs, D. L., Bridge, C., Herrmann, M. D., Homeyer, A., Lewis, R., Aerts, H. J. W. L., Krishnaswamy, D., Thiriveedhi, V. K., Ciausu, C., Schacherer, D. P., Bontempi, D., Pihl, T., Wagner, U., Farahani, K., Kim, E. & Kikinis, R. National cancer institute imaging data commons: Toward transparency, reproducibility, and scalability in imaging artificial intelligence. Radiographics 43, (2023).

[2] National Electrical Manufacturers Association (NEMA). DICOM PS3.3 - Information Object Definitions: A.32.8 VL Whole Slide Microscopy Image IOD.

[3] Herrmann, M. D., Clunie, D. A., Fedorov, A., Doyle, S. W., Pieper, S., Klepeis, V., Le, L. P., Mutter, G. L., Milstone, D. S., Schultz, T. J., Kikinis, R., Kotecha, G. K., Hwang, D. H., Andriole, K. P., Iafrate, A. J., Brink, J. A., Boland, G. W., Dreyer, K. J., Michalski, M., Golden, J. A., Louis, D. N. & Lennerz, J. K. Implementing the DICOM standard for digital pathology. J. Pathol. Inform. 9, 37 (2018).

[4] Clunie, D., Fedorov, A. & Herrmann, M. D. ImagingDataCommons/idc-wsi-conversion: Initial release. (Zenodo, 2023). doi:10.5281/ZENODO.8240154
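
Scripted download (illustrative sketch)

The two steps under Download instructions can be wrapped in a small Python script. The sketch below is only one possible way to do it: the helper names find_manifest, build_commands and download_collection are hypothetical and not part of idc-index, and the script assumes the manifests from this record have already been saved to a local directory. The only commands it runs are the two given in the instructions (pip install --upgrade idc-index and idc download).

```python
import subprocess
import sys
from pathlib import Path


def find_manifest(directory, provider="aws"):
    """Return the first manifest in `directory` ending in -<provider>.s5cmd, or None.

    `provider` is "aws" or "gcs", matching the two manifest suffixes in this record.
    """
    suffix = f"-{provider}.s5cmd"
    matches = sorted(p for p in Path(directory).iterdir() if p.name.endswith(suffix))
    return matches[0] if matches else None


def build_commands(manifest):
    """Build the two commands from the download instructions as argument lists."""
    # Equivalent to `pip install --upgrade idc-index`, run with the current interpreter.
    install = [sys.executable, "-m", "pip", "install", "--upgrade", "idc-index"]
    # Equivalent to `idc download manifest.s5cmd`.
    download = ["idc", "download", str(manifest)]
    return install, download


def download_collection(directory=".", provider="aws"):
    """Install idc-index, then download the files listed in the chosen manifest."""
    manifest = find_manifest(directory, provider)
    if manifest is None:
        raise FileNotFoundError(f"no *-{provider}.s5cmd manifest found in {directory}")
    for command in build_commands(manifest):
        # check=True stops at the first failing step instead of continuing silently.
        subprocess.run(command, check=True)
```

Calling download_collection("manifests", provider="gcs") would use the Google Cloud Storage manifest instead of the AWS one; since the files are mirrored, either choice should fetch the same content.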