52 datasets found

Genomic Data Commons Data Portal (GDC Data Portal)
rrid.site
scicrunch.org
+2 more
Updated May 24, 2025
Cite
(2025). Genomic Data Commons Data Portal (GDC Data Portal) [Dataset]. http://identifiers.org/RRID:SCR_014514
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_014514
Dataset updated
May 24, 2025
Description
A unified data repository of the National Cancer Institute (NCI)'s Genomic Data Commons (GDC) that enables data sharing across cancer genomic studies in support of precision medicine. The GDC supports several cancer genome programs at the NCI Center for Cancer Genomics (CCG), including The Cancer Genome Atlas (TCGA), Therapeutically Applicable Research to Generate Effective Treatments (TARGET), and the Cancer Genome Characterization Initiative (CGCI). The GDC Data Portal provides a platform for efficiently querying and downloading high quality and complete data. The GDC also provides a GDC Data Transfer Tool and a GDC API for programmatic access.
Data from: NCI Imaging Data Commons
neuinfo.org
Updated Jan 29, 2022
Cite
(2022). NCI Imaging Data Commons [Dataset]. http://identifiers.org/RRID:SCR_019127
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_019127
Dataset updated
Jan 29, 2022
Description
Portal for finding and analyzing cancer imaging data. Part of Cancer Research Data Commons to support cancer imaging research. Provides cloud based access to medical imaging data and library of analytical tools and workflows to share, analyze, and visualize multi modal imaging data from both clinical and basic cancer research studies.
Cancer Incidence - Surveillance, Epidemiology, and End Results (SEER)...
catalog.data.gov
healthdata.gov
+2 more
Updated Jul 26, 2023
Cite
National Cancer Institute (NCI), National Institutes of Health (NIH) (2023). Cancer Incidence - Surveillance, Epidemiology, and End Results (SEER) Registries Limited-Use [Dataset]. https://catalog.data.gov/dataset/cancer-incidence-surveillance-epidemiology-and-end-results-seer-registries-limited-use
Explore at:
Dataset updated
Jul 26, 2023
Dataset provided by
National Cancer Institute (http://www.cancer.gov/)
Description
SEER Limited-Use cancer incidence data with associated population data. Geographic areas available are county and SEER registry. The Surveillance, Epidemiology, and End Results (SEER) Program of the National Cancer Institute collects and distributes high quality, comprehensive cancer data from a number of population-based cancer registries. Data include patient demographics, primary tumor site, morphology, stage at diagnosis, first course of treatment, and follow-up for vital status. The SEER Program is the only comprehensive source of population-based information in the United States that includes stage of cancer at the time of diagnosis and survival rates within each stage.
The Cancer Genome Atlas Stomach Adenocarcinoma Collection
cancerimagingarchive.net
dicom, n/a
Updated Feb 2, 2014
Cite
The Cancer Imaging Archive (2014). The Cancer Genome Atlas Stomach Adenocarcinoma Collection [Dataset]. http://doi.org/10.7937/K9/TCIA.2016.GDHL9KIM
Explore at:
Available download formats: dicom, n/a
Unique identifier
https://doi.org/10.7937/K9/TCIA.2016.GDHL9KIM
Dataset updated
Feb 2, 2014
Dataset authored and provided by
The Cancer Imaging Archive
License
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
Time period covered
May 29, 2020
Dataset funded by
National Cancer Institute (http://www.cancer.gov/)
Description
The Cancer Genome Atlas Stomach Adenocarcinoma (TCGA-STAD) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA). Clinical, genetic, and pathological data resides in the Genomic Data Commons (GDC) Data Portal while the radiological data is stored on The Cancer Imaging Archive (TCIA).
Matched TCGA patient identifiers allow researchers to explore the TCGA/TCIA databases for correlations between tissue genotype, radiological phenotype and patient outcomes. Tissues for TCGA were collected from many sites all over the world in order to reach their accrual targets, usually around 500 specimens per cancer type. For this reason the image data sets are also extremely heterogeneous in terms of scanner modalities, manufacturers and acquisition protocols. In most cases the images were acquired as part of routine care and not as part of a controlled research study or clinical trial.
CIP TCGA Radiology Initiative
Imaging Source Site (ISS) Groups are being populated and governed by participants from institutions that have provided imaging data to the archive for a given cancer type. Modeled after TCGA analysis groups, ISS groups are given the opportunity to publish a marker paper for a given cancer type per the guidelines in the table above. This opportunity will generate increased participation in building these multi-institutional data sets as they become an open community resource. Learn more about the CIP TCGA Radiology Initiative.
CDC WONDER: Cancer Statistics
data.virginia.gov
healthdata.gov
+4 more
html
Updated Feb 21, 2025
+ more versions
Cite
Centers for Disease Control and Prevention, Department of Health & Human Services (2025). CDC WONDER: Cancer Statistics [Dataset]. https://data.virginia.gov/dataset/cdc-wonder-cancer-statistics
Explore at:
Available download formats: html
Dataset updated
Feb 21, 2025
Dataset provided by
Centers for Disease Control and Prevention (http://www.cdc.gov/)
United States Department of Health and Human Services (http://www.hhs.gov/)
Description
The United States Cancer Statistics (USCS) online databases in WONDER provide cancer incidence and mortality data for the United States for the years since 1999, by year, state and metropolitan area (MSA), age group, race, ethnicity, sex, childhood cancer classification and cancer site. Reports provide case counts, deaths, crude and age-adjusted incidence and death rates, and 95% confidence intervals for rates. The USCS data are the official federal statistics on cancer incidence from registries having high-quality data and cancer mortality statistics for 50 states and the District of Columbia. USCS are produced by the Centers for Disease Control and Prevention (CDC) and the National Cancer Institute (NCI), in collaboration with the North American Association of Central Cancer Registries (NAACCR). Mortality data are provided by the Centers for Disease Control and Prevention (CDC), National Center for Health Statistics (NCHS), National Vital Statistics System (NVSS).
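As a rough illustration of what a crude rate and its 95% confidence interval mean, the sketch below computes a rate per 100,000 population and a textbook normal-approximation interval. The description does not say how WONDER computes its intervals, so the formula here is an assumption chosen for simplicity, and age adjustment is not covered. The function names and the example numbers are hypothetical.

```python
import math
from typing import Tuple


def crude_rate(count: int, population: int, per: int = 100_000) -> float:
    """Cases (or deaths) per `per` persons, with no age adjustment."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count * per / population


def approx_ci95(count: int, population: int, per: int = 100_000) -> Tuple[float, float]:
    """Normal-approximation 95% interval for a crude rate.

    Assumption: the count is treated as Poisson, so the standard error of
    the rate is rate / sqrt(count). This is a common textbook shortcut and
    is not necessarily the method used by WONDER.
    """
    if count <= 0:
        raise ValueError("the approximation needs a positive count")
    rate = crude_rate(count, population, per)
    se = rate / math.sqrt(count)
    return rate - 1.96 * se, rate + 1.96 * se


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    rate = crude_rate(50, 100_000)
    low, high = approx_ci95(50, 100_000)
    print(f"rate={rate:.1f} per 100,000 (95% CI {low:.1f} to {high:.1f})")
```

The approximation becomes unreliable for small counts, which is one reason published statistics often use other interval methods.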
DICOM converted Slide Microscopy images for the TCGA-TGCT collection
zenodo.org
bin
Updated Aug 20, 2024
+ more versions
Cite
David Clunie; William Clifford; David Pot; Ulrike Wagner; Keyvan Farahani; Erika Kim; Andrey Fedorov (2024). DICOM converted Slide Microscopy images for the TCGA-TGCT collection [Dataset]. http://doi.org/10.5281/zenodo.13346196
Explore at:
Available download formats: bin
Unique identifier
https://doi.org/10.5281/zenodo.13346196
Dataset updated
Aug 20, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
David Clunie; William Clifford; David Pot; Ulrike Wagner; Keyvan Farahani; Erika Kim; Andrey Fedorov
License
Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
Description
This dataset corresponds to a collection of images and/or image-derived data available from National Cancer Institute Imaging Data Commons (IDC) [1]. This dataset was converted into DICOM representation and ingested by the IDC team. You can explore and visualize the corresponding images using IDC Portal here: TCGA-TGCT. You can use the manifests included in this Zenodo record to download the content of the collection following the Download instructions below.

Collection description

More than 90% of testicular cancers start in the germ cells, the cells in the testicles that develop into sperm. This type of cancer is known as testicular germ cell cancer. Testicular germ cell cancers can be classified as either seminomas or nonseminomas, which may be distinguished by microscopy. Nonseminomas typically grow and spread more quickly than seminomas. A testicular germ cell tumor that contains a mix of both subtypes is classified as a nonseminoma. TCGA studied both seminomas and nonseminomas.

Testicular germ cell cancer is rare, comprising 1-2% of all tumors in males. However, it is the most common cancer in men ages 15 to 35. The incidence of testicular germ cell cancer has been rising continuously in many regions, including Europe and the U.S. In 2013, about 8,000 American men were estimated to be diagnosed with the cancer, and 370 of them were predicted to die from the disease. Men who are Caucasian, have an undescended testicle, have abnormally developed testicles, or have a family history of testicular cancer are at greater risk of developing testicular cancer. Fortunately, testicular germ cell cancer is highly treatable.

Please see the TCGA-TGCT information page to learn more about the images and to obtain any supporting metadata for this collection.

Citation guidelines can be found on the Citing TCGA in Publications and Presentations information page.

Files included

A manifest file's name indicates the IDC data release in which a version of collection data was first introduced. For example, collection_id-idc_v8-aws.s5cmd corresponds to the contents of the collection_id collection introduced in IDC data release v8. If there is a subsequent version of this Zenodo page, it will indicate when a subsequent version of the corresponding collection was introduced.

tcga_tgct-idc_v10-aws.s5cmd: manifest of files available for download from public IDC Amazon Web Services buckets

tcga_tgct-idc_v10-gcs.s5cmd: manifest of files available for download from public IDC Google Cloud Storage buckets

tcga_tgct-idc_v10-dcf.dcf: Gen3 manifest (for details see https://learn.canceridc.dev/data/organization-of-data/guids-and-uuids)

Note that manifest files that end in -aws.s5cmd reference files stored in Amazon Web Services (AWS) buckets, while -gcs.s5cmd reference files in Google Cloud Storage. The actual files are identical and are mirrored between AWS and GCP.
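The naming convention described above can be sketched as a small parser. This is a minimal illustration, assuming every manifest name follows the `collection_id-idc_vN-kind.ext` shape seen in the three file names listed here; the regular expression is inferred from those names and is not an official IDC specification, and `parse_manifest_name` is a hypothetical helper.

```python
import re
from typing import NamedTuple


class ManifestName(NamedTuple):
    collection_id: str  # e.g. "tcga_tgct"
    idc_release: int    # IDC data release in which this version first appeared
    kind: str           # "aws", "gcs", or "dcf"


# Pattern inferred from the manifest names in this record (an assumption).
_MANIFEST_RE = re.compile(
    r"^(?P<collection>.+)-idc_v(?P<release>\d+)-(?P<kind>aws|gcs|dcf)\.(?:s5cmd|dcf)$"
)


def parse_manifest_name(filename: str) -> ManifestName:
    """Split a manifest file name into collection, release number, and kind."""
    match = _MANIFEST_RE.match(filename)
    if match is None:
        raise ValueError(f"unrecognized manifest name: {filename!r}")
    return ManifestName(
        collection_id=match["collection"],
        idc_release=int(match["release"]),
        kind=match["kind"],
    )


if __name__ == "__main__":
    print(parse_manifest_name("tcga_tgct-idc_v10-aws.s5cmd"))
```

Because the AWS and GCS manifests point at mirrored copies of the same files, the `kind` field only matters for choosing which cloud to download from.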

Download instructions

Each of the manifests includes instructions in the header on how to download the included files.

To download the files using .s5cmd manifests:

install idc-index package: pip install --upgrade idc-index

download the files referenced by manifests included in this dataset by passing the .s5cmd manifest file: idc download manifest.s5cmd.

To download the files using .dcf manifest, see manifest header.

Acknowledgments

Imaging Data Commons team has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under Task Order No. HHSN26110071 under Contract No. HHSN261201500003l.

References

[1] Fedorov, A., Longabaugh, W. J. R., Pot, D., Clunie, D. A., Pieper, S. D., Gibbs, D. L., Bridge, C., Herrmann, M. D., Homeyer, A., Lewis, R., Aerts, H. J. W., Krishnaswamy, D., Thiriveedhi, V. K., Ciausu, C., Schacherer, D. P., Bontempi, D., Pihl, T., Wagner, U., Farahani, K., Kim, E. & Kikinis, R. National Cancer Institute Imaging Data Commons: Toward Transparency, Reproducibility, and Scalability in Imaging Artificial Intelligence. RadioGraphics (2023). https://doi.org/10.1148/rg.230180
National Cancer Register
healthinformationportal.eu
www-acc.healthinformationportal.eu
html
Updated Jul 28, 2022
Cite
Institute of Health Information and Statistics of the Czech Republic (2022). National Cancer Register [Dataset]. https://www.healthinformationportal.eu/health-information-sources/national-cancer-register
Explore at:
Available download formats: html
Dataset updated
Jul 28, 2022
Dataset authored and provided by
Institute of Health Information and Statistics of the Czech Republic
Variables measured
sex, title, topics, acronym, country, language, data_owners, description, geo_coverage, contact_email, and 11 more
Measurement technique
Registry data
Description
The purpose of the National Oncology Register (hereinafter referred to as NOR) is the registration of oncological diseases and periodic monitoring of their further development, i.e. data collection, verification, storage, protection and processing. NOR provides summary data for statistical overviews at both national and international levels, as well as for epidemiological studies and health research. NOR is a nationwide population register that follows on from the monitoring of neoplasms in the population of the Czech Republic introduced in the 1950s. As a population register of records of individual neoplasms, it has been operated by the ÚZIS of the Czech Republic since 1976.

NOR data are also used to support early diagnosis and treatment of neoplasms and pre-cancerous conditions, to monitor trends in their occurrence, causative factors and social consequences. At the population level, the results of the treatment of neoplasms are also evaluated in the form of a survival analysis.
Chemical Carcinogenesis Research Information System (CCRIS)
data.virginia.gov
datadiscovery.nlm.nih.gov
+3 more
html
Updated May 16, 2025
Cite
National Library of Medicine (2025). Chemical Carcinogenesis Research Information System (CCRIS) [Dataset]. https://data.virginia.gov/dataset/chemical-carcinogenesis-research-information-system-ccris
Explore at:
Available download formats: html
Dataset updated
May 16, 2025
Dataset provided by
National Library of Medicine
Description
The Chemical Carcinogenesis Research Information System (CCRIS) database contains chemical records with carcinogenicity, mutagenicity, tumor promotion, and tumor inhibition test results. It was developed by the National Cancer Institute (NCI). Data are derived from studies cited in primary journals, current awareness tools, NCI reports, and other sources. Test results have been reviewed by experts in carcinogenesis and mutagenesis. CCRIS provides historical information from the years 1985 - 2011. It is no longer updated.
Blog | Stimulating Data-driven Innovation in Breast Cancer Research
data.virginia.gov
Updated Jun 18, 2015
Cite
Sandeep Patel (2015). Blog | Stimulating Data-driven Innovation in Breast Cancer Research [Dataset]. https://data.virginia.gov/dataset/blog-stimulating-data-driven-innovation-in-breast-cancer-research
Explore at:
Dataset updated
Jun 18, 2015
Dataset provided by
Sandeep Patel
Description
This blog post was posted by Sandeep Patel on June 18, 2015
The Cancer Genome Atlas Rectum Adenocarcinoma Collection
cancerimagingarchive.net
dicom, n/a
Updated Jan 5, 2016
Cite
The Cancer Imaging Archive (2016). The Cancer Genome Atlas Rectum Adenocarcinoma Collection [Dataset]. http://doi.org/10.7937/K9/TCIA.2016.F7PPNPNU
Explore at:
Available download formats: dicom, n/a
Unique identifier
https://doi.org/10.7937/K9/TCIA.2016.F7PPNPNU
Dataset updated
Jan 5, 2016
Dataset authored and provided by
The Cancer Imaging Archive
License
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
Time period covered
May 29, 2020
Dataset funded by
National Cancer Institute (http://www.cancer.gov/)
Description
The Cancer Genome Atlas Rectum Adenocarcinoma (TCGA-READ) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA). Clinical, genetic, and pathological data resides in the Genomic Data Commons (GDC) Data Portal while the radiological data is stored on The Cancer Imaging Archive (TCIA).
Matched TCGA patient identifiers allow researchers to explore the TCGA/TCIA databases for correlations between tissue genotype, radiological phenotype and patient outcomes. Tissues for TCGA were collected from many sites all over the world in order to reach their accrual targets, usually around 500 specimens per cancer type. For this reason the image data sets are also extremely heterogeneous in terms of scanner modalities, manufacturers and acquisition protocols. In most cases the images were acquired as part of routine care and not as part of a controlled research study or clinical trial.
CIP TCGA Radiology Initiative
Imaging Source Site (ISS) Groups are being populated and governed by participants from institutions that have provided imaging data to the archive for a given cancer type. Modeled after TCGA analysis groups, ISS groups are given the opportunity to publish a marker paper for a given cancer type per the guidelines in the table above. This opportunity will generate increased participation in building these multi-institutional data sets as they become an open community resource. Learn more about the CIP TCGA Radiology Initiative.
Lithuanian Cancer registry data
healthinformationportal.eu
www-acc.healthinformationportal.eu
html
Updated Sep 7, 2022
Cite
National Cancer Institute, Lithuania (2022). Lithuanian Cancer registry data [Dataset]. https://www.healthinformationportal.eu/health-information-sources/lithuanian-cancer-registry-data
Explore at:
Available download formats: html
Dataset updated
Sep 7, 2022
Dataset provided by
National Cancer Institute (http://www.cancer.gov/)
Authors
National Cancer Institute, Lithuania
Area covered
Lithuania
Variables measured
sex, title, topics, country, language, data_owners, description, geo_coverage, contact_email, free_keywords, and 7 more
Measurement technique
Registry data
Description
The National Cancer Institute’s Cancer Registry is a nationwide, population-based cancer registry that covers the entire territory of Lithuania and collects information about all new cancer cases (ICD-10-AM codes: C00-C96, D00-D09, D32-D33, D39.1, D42-D43, D45-D47) in all cancer patients.
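The code list above can be expressed as a simple membership check. This is a minimal sketch, assuming that each range applies to the three-character category (so C50.9 falls within C00-C96) and that D39.1 matches that subcategory and any finer subdivision of it; both readings are assumptions about how the list is meant to be applied, and `in_registry_scope` is a hypothetical helper name.

```python
import re

# Category ranges copied from the code list in the description.
_RANGES = [
    ("C", 0, 96),
    ("D", 0, 9),
    ("D", 32, 33),
    ("D", 42, 43),
    ("D", 45, 47),
]
# Subcategory listed on its own; matched as a prefix (an assumption).
_PREFIXES = ("D39.1",)

_CODE_RE = re.compile(r"^([A-Z])(\d{2})(?:\.\w+)?$")


def in_registry_scope(code: str) -> bool:
    """Return True if an ICD-10 style code falls inside the listed ranges."""
    code = code.strip().upper()
    match = _CODE_RE.match(code)
    if match is None:
        return False
    letter, category = match.group(1), int(match.group(2))
    if any(letter == l and lo <= category <= hi for l, lo, hi in _RANGES):
        return True
    return code.startswith(_PREFIXES)


if __name__ == "__main__":
    for example in ("C50.9", "D39.1", "D39.0", "D10"):
        print(example, in_registry_scope(example))
```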

The main task of the Cancer Registry is to ensure that the registration of incident cancer cases is as complete and reliable as possible.

In 1984 the Lithuanian Cancer Registry was established at the National Cancer Institute by the Order of the Minister of Health. The population-based Cancer Registry was set up in 1990.
CPIC California Cancer Registry
redivis.com
application/jsonl +6
Updated Sep 19, 2016
Cite
Stanford Center for Population Health Sciences (2016). CPIC California Cancer Registry [Dataset]. http://doi.org/10.57761/sq5d-1c97
Explore at:
Available download formats: csv, avro, arrow, spss, sas, stata, application/jsonl
Unique identifier
https://doi.org/10.57761/sq5d-1c97
Dataset updated
Sep 19, 2016
Dataset provided by
Redivis Inc.
Authors
Stanford Center for Population Health Sciences
Area covered
California
Description
Abstract

The Greater Bay Area Cancer Registry (GBACR), in compliance with California state law, gathers information about all cancers diagnosed or treated in a nine-county area (Alameda, Contra Costa, Marin, Monterey, San Benito, San Francisco, San Mateo, Santa...

Documentation

PHS does NOT host these data. This listing is information only.

The Greater Bay Area Cancer Registry (GBACR), in compliance with California state law, gathers information about all cancers diagnosed or treated in a nine-county area (Alameda, Contra Costa, Marin, Monterey, San Benito, San Francisco, San Mateo, Santa Clara and Santa Cruz). This information is obtained from medical records provided by hospitals, doctors’ offices, and other related facilities.

The information, stored under secure conditions with strict regulations that protect confidentiality, helps the GBACR understand cancer occurrence and survival in the Greater Bay Area. For each patient, the information includes basic demographic facts like age, gender, and race/ethnicity, as well as cancer type, extent of disease, treatment and survival. Combined over the diverse Bay Area population, this information gives the GBACR and all users an opportunity to learn how such characteristics may be related to cancer causes, mortality, care and prevention.

In addition to its local use, information collected by the GBACR becomes part of state and federal population-based registries whose mission is to monitor cancer occurrence at the state and national levels, respectively. Data from the GBACR have contributed to the National Cancer Institute’s Surveillance, Epidemiology and End Results (SEER) program since 1973. The nine counties are also part of the statewide California Cancer Registry (CCR), which conducts essential monitoring of cancer occurrence and survival in California.

GBACR data are of the highest quality, as recognized by national and international registry standard-setting organizations, including SEER, the National Program for Cancer Registries, and the North American Association for Central Cancer Registries (NAACCR).

The CPIC has also started collecting data on environmental factors. These data are available in the California Neighborhoods Data System. This new resource for examining the impact of neighborhood characteristics on cancer incidence and outcomes in populations includes a compilation of existing geospatial and other secondary data for characterizing contextual factors.

A summary and description of social and built environment data and measures in the California Neighborhoods Data System (2010) can be found here: Social and Built Environment Data and Measures

More information about this new data source can be found here: The California Neighborhoods Data System

Patient characteristics

All reported cancer cases in the state of California.

Data overview

Data categories:

Socioeconomic status
Racial/ethnic composition
Immigration/acculturation characteristics
Racial/ethnic residential segregation
Population density
Urbanicity (Rural/Urban)
Housing
Businesses
Commuting
Street connectivity
Parks
Farmers Markets
Traffic density
Crime
Tapestry Segmentation

Notes

To apply for these data, see the instructions here: https://www.ccrcal.org/retrieve-data/data-for-researchers/how-to-request-ccr-data/
The Cancer Genome Atlas Liver Hepatocellular Carcinoma Collection
cancerimagingarchive.net
dicom, n/a
Updated May 29, 2020
Cite
The Cancer Imaging Archive (2020). The Cancer Genome Atlas Liver Hepatocellular Carcinoma Collection [Dataset]. http://doi.org/10.7937/K9/TCIA.2016.IMMQW8UQ
Explore at:
Available download formats: n/a, dicom
Unique identifier
https://doi.org/10.7937/K9/TCIA.2016.IMMQW8UQ
Dataset updated
May 29, 2020
Dataset authored and provided by
The Cancer Imaging Archive
License
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
Time period covered
May 29, 2020
Dataset funded by
National Cancer Institute (http://www.cancer.gov/)
Description
The Cancer Genome Atlas Liver Hepatocellular Carcinoma (TCGA-LIHC) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA). Clinical, genetic, and pathological data resides in the Genomic Data Commons (GDC) Data Portal while the radiological data is stored on The Cancer Imaging Archive (TCIA).
Matched TCGA patient identifiers allow researchers to explore the TCGA/TCIA databases for correlations between tissue genotype, radiological phenotype and patient outcomes. Tissues for TCGA were collected from many sites all over the world in order to reach their accrual targets, usually around 500 specimens per cancer type. For this reason the image data sets are also extremely heterogeneous in terms of scanner modalities, manufacturers and acquisition protocols. In most cases the images were acquired as part of routine care and not as part of a controlled research study or clinical trial.
CIP TCGA Radiology Initiative
Imaging Source Site (ISS) Groups are being populated and governed by participants from institutions that have provided imaging data to the archive for a given cancer type. Modeled after TCGA analysis groups, ISS groups are given the opportunity to publish a marker paper for a given cancer type per the guidelines in the table above. This opportunity will generate increased participation in building these multi-institutional data sets as they become an open community resource. Learn more about the CIP TCGA Radiology Initiative.
The Cancer Genome Atlas Ovarian Cancer Collection
cancerimagingarchive.net
dicom, n/a
Updated May 29, 2020
Cite
The Cancer Imaging Archive (2020). The Cancer Genome Atlas Ovarian Cancer Collection [Dataset]. http://doi.org/10.7937/K9/TCIA.2016.NDO1MDFQ
Explore at:
Available download formats: n/a, dicom
Unique identifier
https://doi.org/10.7937/K9/TCIA.2016.NDO1MDFQ
Dataset updated
May 29, 2020
Dataset authored and provided by
The Cancer Imaging Archive
License
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
Time period covered
May 29, 2020
Dataset funded by
National Cancer Institute (http://www.cancer.gov/)
Description
The Cancer Genome Atlas Ovarian Cancer (TCGA-OV) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA). Clinical, genetic, and pathological data resides in the Genomic Data Commons (GDC) Data Portal while the radiological data is stored on The Cancer Imaging Archive (TCIA).
Matched TCGA patient identifiers allow researchers to explore the TCGA/TCIA databases for correlations between tissue genotype, radiological phenotype and patient outcomes. Tissues for TCGA were collected from many sites all over the world in order to reach their accrual targets, usually around 500 specimens per cancer type. For this reason the image data sets are also extremely heterogeneous in terms of scanner modalities, manufacturers and acquisition protocols. In most cases the images were acquired as part of routine care and not as part of a controlled research study or clinical trial.
CIP TCGA Radiology Initiative
Imaging Source Site (ISS) Groups are being populated and governed by participants from institutions that have provided imaging data to the archive for a given cancer type. Modeled after TCGA analysis groups, ISS groups are given the opportunity to publish a marker paper for a given cancer type per the guidelines in the table above. This opportunity will generate increased participation in building these multi-institutional data sets as they become an open community resource. Learn more about the TCGA Ovarian Phenotype Research Group.
DICOM converted Slide Microscopy images for the HTAN-OHSU collection
zenodo.org
bin
Updated Aug 22, 2024
Cite
David Clunie; William Clifford; David Pot; Ulrike Wagner; Keyvan Farahani; Erika Kim; Andrey Fedorov (2024). DICOM converted Slide Microscopy images for the HTAN-OHSU collection [Dataset]. http://doi.org/10.5281/zenodo.12689951
Explore at:
Available download formats: bin
Unique identifier
https://doi.org/10.5281/zenodo.12689951
Dataset updated
Aug 22, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
David Clunie; William Clifford; David Pot; Ulrike Wagner; Keyvan Farahani; Erika Kim; Andrey Fedorov
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset corresponds to a collection of images and/or image-derived data available from National Cancer Institute Imaging Data Commons (IDC) [1]. This dataset was converted into DICOM representation and ingested by the IDC team. You can explore and visualize the corresponding images using IDC Portal here: HTAN-OHSU. You can use the manifests included in this Zenodo record to download the content of the collection following the Download instructions below.

Collection description

The Human Tumor Atlas Network (HTAN) [2], part of the National Cancer Institute (NCI) Cancer Moonshot Initiative, will establish a clinical, experimental, computational, and organizational framework to generate informative and accessible three-dimensional atlases of cancer transitions for a diverse set of tumor types.

The overall goal of the HTAN OMS Atlas Center is to elucidate mechanisms by which metastatic breast cancers become resistant to current generation pathway- and immune checkpoint-targeted treatments. The OMS Atlas is motivated by the appreciation that these treatments are often effective in primary tumors but only transiently effective in the metastatic setting. Possible resistance mechanisms include tumor-intrinsic genomic instability and epigenomic plasticity, as well as events extrinsic to the cancer cells, including chemical and mechanical signals from the microenvironments, production of mechanical extracellular matrix barriers and/or changes in vasculature that reduce drug and/or immune cell access, nanoscale cancer cell-microenvironment interactions that reduce drug efficacy, and a plethora of immune resistance mechanisms, such as loss of HLA expression and antigen presentation, and immune exhaustion. These mechanisms likely vary between patients and within individual patients and change with time as tumors respond to therapeutic attack. The OMS Atlas will focus on elucidating resistance mechanisms in two specific current generation clinical trial scenarios: (a) hormone receptor-positive breast cancer (HRBC) undergoing treatment with a CDK4/6 inhibitor in combination with endocrine therapy and (b) triple negative breast cancer (TNBC) undergoing treatment with a PARP inhibitor and an immunomodulatory agent.

Please see the HTAN-OHSU information page to learn more about the images and to obtain any supporting metadata for this collection.

Citation guidelines can be found on the HTAN Publication Policy information page.

Files included

A manifest file's name indicates the IDC data release in which a version of collection data was first introduced. For example, collection_id-idc_v8-aws.s5cmd corresponds to the contents of the collection_id collection introduced in IDC data release v8. If there is a subsequent version of this Zenodo page, it will indicate when a subsequent version of the corresponding collection was introduced.

htan_ohsu-idc_v10-aws.s5cmd: manifest of files available for download from public IDC Amazon Web Services buckets

htan_ohsu-idc_v10-gcs.s5cmd: manifest of files available for download from public IDC Google Cloud Storage buckets

htan_ohsu-idc_v10-dcf.dcf: Gen3 manifest (for details see https://learn.canceridc.dev/data/organization-of-data/guids-and-uuids)

Note that manifest files that end in -aws.s5cmd reference files stored in Amazon Web Services (AWS) buckets, while those that end in -gcs.s5cmd reference files stored in Google Cloud Storage (GCS) buckets. The files themselves are identical and are mirrored between AWS and GCP.
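The naming convention described above can be sketched as a small parser. This is a minimal illustration, not an official IDC specification: the regular expression is inferred only from the file names listed in this record, and the names ManifestName and parse_manifest_name are hypothetical.

```python
import re
from typing import NamedTuple


class ManifestName(NamedTuple):
    collection_id: str   # e.g. "htan_ohsu"
    idc_version: int     # IDC data release in which this version was introduced
    kind: str            # "aws", "gcs" or "dcf"


# Pattern inferred from the file names shown in this record (an assumption).
_PATTERN = re.compile(
    r"^(?P<collection>.+)-idc_v(?P<version>\d+)-(?P<kind>aws|gcs|dcf)\.(?:s5cmd|dcf)$"
)


def parse_manifest_name(filename: str) -> ManifestName:
    """Split a manifest file name into collection, IDC release and kind."""
    match = _PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unrecognized manifest file name: {filename!r}")
    return ManifestName(match["collection"], int(match["version"]), match["kind"])


if __name__ == "__main__":
    print(parse_manifest_name("htan_ohsu-idc_v10-aws.s5cmd"))
```

Because the AWS and GCS manifests point at mirrored copies of the same files, the kind field only decides which cloud the download is served from.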

Download instructions

Each of the manifests includes instructions in its header on how to download the included files.

To download the files using .s5cmd manifests:

install the idc-index package: pip install --upgrade idc-index

download the files referenced by the manifests included in this dataset by passing the .s5cmd manifest file: idc download manifest.s5cmd

To download the files using .dcf manifest, see manifest header.
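For scripted use, the .s5cmd steps above could be wrapped as follows. This is a hedged sketch that only shells out to the idc download command quoted in the instructions; the helper names build_download_command and download_manifest are hypothetical, and the sketch assumes the idc-index package has already been installed with pip.

```python
import shutil
import subprocess
from pathlib import Path


def build_download_command(manifest: str) -> list[str]:
    """Return the command line given in the instructions: idc download <manifest>."""
    return ["idc", "download", manifest]


def download_manifest(manifest: str) -> int:
    """Run the download for one .s5cmd manifest and return the exit code."""
    if not Path(manifest).is_file():
        raise FileNotFoundError(f"manifest not found: {manifest}")
    if shutil.which("idc") is None:
        # The idc command is provided by the idc-index package.
        raise RuntimeError("idc not found; run: pip install --upgrade idc-index")
    # check=True raises CalledProcessError if the download fails.
    return subprocess.run(build_download_command(manifest), check=True).returncode


if __name__ == "__main__":
    print(" ".join(build_download_command("htan_ohsu-idc_v10-aws.s5cmd")))
```

Keeping the command construction separate from its execution makes it possible to inspect or log the command before any data transfer starts.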

Acknowledgments

Collection of the images that were converted by IDC was supported through the Human Tumor Atlas Network, grants 1U2CCA233280-01 "Omic and Multidimensional Spatial Atlas of Metastatic Breast and Prostate Cancers" and 1U24CA233243-01 "Human Tumor Atlas Network: Data Coordinating Center" from the National Cancer Institute.

The Imaging Data Commons team has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under Task Order No. HHSN26110071 under Contract No. HHSN261201500003l.

References

[1] Fedorov, A., Longabaugh, W. J. R., Pot, D., Clunie, D. A., Pieper, S. D., Gibbs, D. L., Bridge, C., Herrmann, M. D., Homeyer, A., Lewis, R., Aerts, H. J. W., Krishnaswamy, D., Thiriveedhi, V. K., Ciausu, C., Schacherer, D. P., Bontempi, D., Pihl, T., Wagner, U., Farahani, K., Kim, E. & Kikinis, R. National Cancer Institute Imaging Data Commons: Toward Transparency, Reproducibility, and Scalability in Imaging Artificial Intelligence. RadioGraphics (2023). https://doi.org/10.1148/rg.230180

[2] Rozenblatt-Rosen, O., Regev, A., Oberdoerffer, P., Nawy, T., Hupalowska, A., Rood, J. E., Ashenberg, O., Cerami, E., Coffey, R. J., Demir, E., Ding, L., Esplin, E. D., Ford, J. M., Goecks, J., Ghosh, S., Gray, J. W., Guinney, J., Hanlon, S. E., Hughes, S. K., Hwang, E. S., Iacobuzio-Donahue, C. A., Jané-Valbuena, J., Johnson, B. E., Lau, K. S., Lively, T., Mazzilli, S. A., Pe’er, D., Santagata, S., Shalek, A. K., Schapiro, D., Snyder, M. P., Sorger, P. K., Spira, A. E., Srivastava, S., Tan, K., West, R. B., Williams, E. H. & Human Tumor Atlas Network. The Human Tumor Atlas Network: Charting Tumor Transitions across Space and Time at Single-Cell Resolution. Cell 181, 236–249 (2020). http://dx.doi.org/10.1016/j.cell.2020.03.053
State Cancer Profiles Web site
data.virginia.gov
healthdata.gov
+3more
html
Updated Jul 25, 2023
+ more versions
Cite
Department of Health & Human Services (2023). State Cancer Profiles Web site [Dataset]. https://data.virginia.gov/dataset/state-cancer-profiles-web-site
Explore at:
Available download formats: html
Dataset updated
Jul 25, 2023
Dataset provided by
United States Department of Health and Human Services (http://www.hhs.gov/)
Description
The State Cancer Profiles (SCP) web site provides statistics to help guide and prioritize cancer control activities at the state and local levels. SCP is a collaborative effort using local and national level cancer data from the Centers for Disease Control and Prevention's National Program of Cancer Registries (NPCR) and the National Cancer Institute's Surveillance, Epidemiology and End Results Registries (SEER). SCP addresses select types of cancer and select behavioral risk factors for which there are evidence-based control interventions. The site provides incidence, mortality and prevalence comparison tables as well as interactive graphs and maps and support data. The graphs and maps provide visual support for deciding where to focus cancer control efforts.
The Cancer Genome Atlas Prostate Adenocarcinoma Collection
cancerimagingarchive.net
dicom, n/a
Updated Feb 2, 2014
+ more versions
Cite
The Cancer Imaging Archive (2014). The Cancer Genome Atlas Prostate Adenocarcinoma Collection [Dataset]. http://doi.org/10.7937/K9/TCIA.2016.YXOGLM4Y
Explore at:
Available download formats: dicom, n/a
Unique identifier
https://doi.org/10.7937/K9/TCIA.2016.YXOGLM4Y
Dataset updated
Feb 2, 2014
Dataset authored and provided by
The Cancer Imaging Archive
License
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
Time period covered
May 29, 2020
Dataset funded by
National Cancer Institute (http://www.cancer.gov/)
Description
The Cancer Genome Atlas Prostate Adenocarcinoma (TCGA-PRAD) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA). Clinical, genetic, and pathological data reside in the Genomic Data Commons (GDC) Data Portal, while the radiological data are stored on The Cancer Imaging Archive (TCIA).
Matched TCGA patient identifiers allow researchers to explore the TCGA/TCIA databases for correlations between tissue genotype, radiological phenotype and patient outcomes. Tissues for TCGA were collected from many sites all over the world in order to reach their accrual targets, usually around 500 specimens per cancer type. For this reason the image data sets are also extremely heterogeneous in terms of scanner modalities, manufacturers and acquisition protocols. In most cases the images were acquired as part of routine care and not as part of a controlled research study or clinical trial.
CIP TCGA Radiology Initiative
Imaging Source Site (ISS) Groups are being populated and governed by participants from institutions that have provided imaging data to the archive for a given cancer type. Modeled after TCGA analysis groups, ISS groups are given the opportunity to publish a marker paper for a given cancer type per the guidelines in the table above. This opportunity will generate increased participation in building these multi-institutional data sets as they become an open community resource. Learn more about the CIP TCGA Radiology Initiative.
TCGA Breast Phenotype Research Group Data sets
cancerimagingarchive.net
stage.cancerimagingarchive.net
n/a, xls, zip
Updated Sep 4, 2018
Cite
The Cancer Imaging Archive (2018). TCGA Breast Phenotype Research Group Data sets [Dataset]. http://doi.org/10.7937/K9/TCIA.2014.8SIPIY6G
Explore at:
Available download formats: xls, n/a, zip
Unique identifier
https://doi.org/10.7937/K9/TCIA.2014.8SIPIY6G
Dataset updated
Sep 4, 2018
Dataset authored and provided by
The Cancer Imaging Archive
License
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
Time period covered
Sep 4, 2018
Dataset funded by
National Cancer Institute (http://www.cancer.gov/)
Description
At the time of our study, 108 cases with breast MRI data were available in The Cancer Genome Atlas Breast Invasive Carcinoma (TCGA-BRCA) collection. To minimize variations in image quality across the multi-institutional cases, we included only breast MRI studies acquired on GE 1.5 Tesla scanners (GE Medical Systems, Milwaukee, Wisconsin, USA), yielding a total of 93 cases. We then excluded cases that had missing images in the dynamic sequence (1 patient) or that did not, at the time, have gene expression analysis available in the TCGA Data Portal (8 patients). Applying these criteria resulted in a dataset of 84 breast cancer patients, with MRIs from four institutions: Memorial Sloan Kettering Cancer Center, the Mayo Clinic, the University of Pittsburgh Medical Center, and the Roswell Park Cancer Institute. The numbers of cases contributed by each institution were 9 (date range 1999-2002), 5 (1999-2003), 46 (1999-2004), and 24 (1999-2002), respectively. The dataset of biopsy-proven invasive breast cancers included 74 (88%) ductal, 8 (10%) lobular, and 2 (2%) mixed. Of these, 73 (87%) were ER+, 67 (80%) were PR+, and 19 (23%) were HER2+. Various types of analyses were conducted using the combined imaging, genomic, and clinical data. Those analyses are described in several manuscripts created by the group (cited below). Additional information about the methodology for the Radiologist Annotations file can be found on the TCGA Breast Image Feature Scoring Project page.
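The cohort arithmetic in this description can be checked with a few lines of Python. This is only a consistency sketch using the counts quoted in the description; the variable names and the percent helper are illustrative and not part of the dataset.

```python
# Counts quoted in the description above.
available_cases = 108          # cases with breast MRI data in TCGA-BRCA
ge_15t_cases = 93              # acquired on GE 1.5 Tesla scanners
missing_dynamic_images = 1     # excluded: missing images in the dynamic sequence
no_gene_expression = 8         # excluded: no gene expression analysis available

final_cohort = ge_15t_cases - missing_dynamic_images - no_gene_expression

cases_per_institution = {
    "Memorial Sloan Kettering Cancer Center": 9,
    "Mayo Clinic": 5,
    "University of Pittsburgh Medical Center": 46,
    "Roswell Park Cancer Institute": 24,
}
histology = {"ductal": 74, "lobular": 8, "mixed": 2}
receptor_positive = {"ER+": 73, "PR+": 67, "HER2+": 19}


def percent(count: int, total: int) -> int:
    """Percentage of total, rounded to the nearest whole number."""
    return round(100 * count / total)


if __name__ == "__main__":
    print("final cohort:", final_cohort)
    print("institution total:", sum(cases_per_institution.values()))
    for label, count in {**histology, **receptor_positive}.items():
        print(label, count, f"({percent(count, final_cohort)}%)")
```

The institution counts and the histology counts each sum to the final cohort size, and the rounded percentages match those reported.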
Supplementary Materials for A Linked Data Representation for Summary...
dataverse.harvard.edu
Updated Aug 28, 2019
Cite
James McCusker (2019). Supplementary Materials for A Linked Data Representation for Summary Statistics and Grouping Criteria [Dataset]. http://doi.org/10.7910/DVN/OK0BUG
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Unique identifier
https://doi.org/10.7910/DVN/OK0BUG
Dataset updated
Aug 28, 2019
Dataset provided by
Harvard Dataverse
Authors
James McCusker
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
Summary statistics are fundamental to data science and are the building blocks of statistical reasoning. Most of the data and statistics made available on government web sites are aggregate; however, until now, we have not had a suitable linked data representation available. We propose a way to express summary statistics across aggregate groups as linked data using Web Ontology Language (OWL) class-based sets, where members of the set contribute to the overall aggregate value. Additionally, many clinical studies in the biomedical field rely on demographic summaries of their study cohorts and the patients assigned to each arm. While most data query languages, including SPARQL, allow for computation of summary statistics, they do not provide a way to integrate those values back into the RDF graphs they were computed from. We represent this knowledge, which would otherwise be lost, through the use of OWL 2 punning semantics, the expression of aggregate grouping criteria as OWL classes with variables, and constructs from the Semanticscience Integrated Ontology (SIO) and the World Wide Web Consortium's provenance ontology, PROV-O, providing interoperable representations that are well supported across the web of Linked Data. We evaluate these semantics using a Resource Description Framework (RDF) representation of patient case information from the Genomic Data Commons, a data portal from the National Cancer Institute.
CMB-OV: DICOM converted Slide Microscopy images for the Cancer Moonshot...
zenodo.org
bin
Updated Nov 25, 2024
Cite
David Clunie; David Clunie (2024). CMB-OV: DICOM converted Slide Microscopy images for the Cancer Moonshot Biobank initiative Ovarian Cancer collection [Dataset]. http://doi.org/10.5281/zenodo.13993797
Explore at:
Available download formats: bin
Unique identifier
https://doi.org/10.5281/zenodo.13993797
Dataset updated
Nov 25, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
David Clunie; David Clunie
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
This dataset corresponds to a collection of images and/or image-derived data available from National Cancer Institute Imaging Data Commons (IDC) [1]. This dataset was converted into DICOM representation and ingested by the IDC team. You can explore and visualize the corresponding images using IDC Portal here. You can use the manifests included in this Zenodo record to download the content of the collection following the Download instructions below.

Collection description

The Cancer Moonshot Biobank (CMB) is a National Cancer Institute initiative to support current and future investigations into drug resistance and sensitivity and other NCI-sponsored cancer research initiatives, with the aim of improving researchers' understanding of cancer and how to intervene in cancer initiation and progression. During the course of this study, biospecimens (blood and tissue removed during medical procedures) and associated data will be collected longitudinally from at least 1000 patients across at least 10 cancer types, who represent the demographic diversity of the U.S. and are receiving standard of care cancer treatment at multiple NCI Community Oncology Research Program (NCORP) sites.

The CMB program is organized into multiple cancer-specific collections. Digital pathology images for each of those collections were converted into DICOM representation by the IDC team and are shared via IDC. This entry corresponds to the CMB-OV collection (Ovarian cancer).

Digital pathology images, augmented with the metadata describing their content, were converted into DICOM Whole Slide Microscopy (SM) representation [2,3] using custom open source scripts and tools as described in [4].

Files included

A manifest file's name indicates the IDC data release in which a version of collection data was first introduced. For example, collection_id-idc_v8-aws.s5cmd corresponds to the contents of the collection_id collection introduced in IDC data release v8. If there is a subsequent version of this Zenodo page, it will indicate when a subsequent version of the corresponding collection was introduced.

For each of the collections, the following manifest files are provided:

: manifest of files available for download from public IDC Amazon Web Services buckets

: manifest of files available for download from public IDC Google Cloud Storage buckets

: Gen3 manifest (for details see https://learn.canceridc.dev/data/organization-of-data/guids-and-uuids)

Note that manifest files that end in -aws.s5cmd reference files stored in Amazon Web Services (AWS) buckets, while -gcs.s5cmd reference files in Google Cloud Storage. The actual files are identical and are mirrored between AWS and GCP.

Download instructions

Each of the manifests includes instructions in its header on how to download the included files.

To download the files using .s5cmd manifests:

install the idc-index package: pip install --upgrade idc-index

download the files referenced by the manifests included in this dataset by passing the .s5cmd manifest file: idc download manifest.s5cmd

To download the files using .dcf manifest, see manifest header.
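Since each manifest carries its download instructions in its header, a short helper can print that header before anything is downloaded. This is a sketch under an assumption not stated in this record: that header lines are comment lines starting with "#" at the top of the file. The function name read_manifest_header is hypothetical.

```python
def read_manifest_header(path: str, comment_prefix: str = "#") -> list[str]:
    """Return the leading comment lines of a manifest, without the prefix.

    Assumes (not confirmed by this record) that the header consists of
    consecutive lines starting with comment_prefix at the top of the file.
    """
    header: list[str] = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            stripped = line.strip()
            if not stripped.startswith(comment_prefix):
                break  # first non-comment line ends the header
            header.append(stripped.lstrip(comment_prefix).strip())
    return header


if __name__ == "__main__":
    import sys

    # Usage: python read_header.py <manifest file>
    for text in read_manifest_header(sys.argv[1]):
        print(text)
```

The same helper would apply to the .dcf manifest if its header follows the same comment convention, which should be verified against the actual file.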

Acknowledgments

The Imaging Data Commons team has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under Task Order No. HHSN26110071 under Contract No. HHSN261201500003l.

References

[1] Fedorov, A., Longabaugh, W. J. R., Pot, D., Clunie, D. A., Pieper, S. D., Gibbs, D. L., Bridge, C., Herrmann, M. D., Homeyer, A., Lewis, R., Aerts, H. J. W. L., Krishnaswamy, D., Thiriveedhi, V. K., Ciausu, C., Schacherer, D. P., Bontempi, D., Pihl, T., Wagner, U., Farahani, K., Kim, E. & Kikinis, R. National cancer institute imaging data commons: Toward transparency, reproducibility, and scalability in imaging artificial intelligence. Radiographics 43, (2023).

[2] National Electrical Manufacturers Association (NEMA). DICOM PS3.3 - Information Object Definitions: A.32.8 VL Whole Slide Microscopy Image IOD. at <https://dicom.nema.org/medical/dicom/current/output/html/part03.html#sect_A.32.8>

[3] Herrmann, M. D., Clunie, D. A., Fedorov, A., Doyle, S. W., Pieper, S., Klepeis, V., Le, L. P., Mutter, G. L., Milstone, D. S., Schultz, T. J., Kikinis, R., Kotecha, G. K., Hwang, D. H., Andriole, K. P., Iafrate, A. J., Brink, J. A., Boland, G. W., Dreyer, K. J., Michalski, M., Golden, J. A., Louis, D. N. & Lennerz, J. K. Implementing the DICOM standard for digital pathology. J. Pathol. Inform. 9, 37 (2018).

[4] Clunie, D., Fedorov, A. & Herrmann, M. D. ImagingDataCommons/idc-wsi-conversion: Initial release. (Zenodo, 2023). doi:10.5281/ZENODO.8240154

Cite

(2025). Genomic Data Commons Data Portal (GDC Data Portal) [Dataset]. http://identifiers.org/RRID:SCR_014514

Genomic Data Commons Data Portal (GDC Data Portal)

RRID:SCR_014514, Genomic Data Commons Data Portal (GDC Data Portal) (RRID:SCR_014514), Genomic Data Commons Data Portal, GDC Data Portal

Explore at:

71 scholarly articles cite this dataset (View in Google Scholar)

Unique identifier

https://identifiers.org/RRID:SCR_014514

Dataset updated

May 24, 2025

Description

A unified data repository of the National Cancer Institute (NCI)'s Genomic Data Commons (GDC) that enables data sharing across cancer genomic studies in support of precision medicine. The GDC supports several cancer genome programs at the NCI Center for Cancer Genomics (CCG), including The Cancer Genome Atlas (TCGA), Therapeutically Applicable Research to Generate Effective Treatments (TARGET), and the Cancer Genome Characterization Initiative (CGCI). The GDC Data Portal provides a platform for efficiently querying and downloading high quality and complete data. The GDC also provides a GDC Data Transfer Tool and a GDC API for programmatic access.
