52 datasets found
  1. Genomic Data Commons Data Portal (GDC Data Portal)

    • rrid.site
    • scicrunch.org
    • +2 more
    Updated May 24, 2025
    Cite
    (2025). Genomic Data Commons Data Portal (GDC Data Portal) [Dataset]. http://identifiers.org/RRID:SCR_014514
    Dataset updated
    May 24, 2025
    Description

    A unified data repository of the National Cancer Institute (NCI)'s Genomic Data Commons (GDC) that enables data sharing across cancer genomic studies in support of precision medicine. The GDC supports several cancer genome programs at the NCI Center for Cancer Genomics (CCG), including The Cancer Genome Atlas (TCGA), Therapeutically Applicable Research to Generate Effective Treatments (TARGET), and the Cancer Genome Characterization Initiative (CGCI). The GDC Data Portal provides a platform for efficiently querying and downloading high quality and complete data. The GDC also provides a GDC Data Transfer Tool and a GDC API for programmatic access.
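
    For the programmatic access mentioned above, a request to the GDC API might be assembled as in the following minimal sketch. The base URL `https://api.gdc.cancer.gov`, the `projects` endpoint, the `program.name` field, and the helper name `build_projects_query` are assumptions that do not appear in this listing; check the GDC API documentation before relying on them. The sketch only builds the URL and does not send a request.

```python
import json
from urllib.parse import urlencode, urlparse, parse_qs

# Assumed base URL of the GDC API; not given in the listing above.
GDC_API_BASE = "https://api.gdc.cancer.gov"


def build_projects_query(program: str, size: int = 10) -> str:
    """Return a URL that asks the (assumed) projects endpoint for one program."""
    # The filter is a small JSON document: an operator plus field/value content.
    filters = {
        "op": "in",
        "content": {"field": "program.name", "value": [program]},
    }
    params = {
        "filters": json.dumps(filters),
        "fields": "project_id,name",
        "format": "JSON",
        "size": str(size),
    }
    return f"{GDC_API_BASE}/projects?{urlencode(params)}"


url = build_projects_query("TCGA")
print(url)
```

    Keeping URL construction separate from the network call makes the query easy to inspect or log before any data is requested.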

  2. Data from: NCI Imaging Data Commons

    • neuinfo.org
    Updated Jan 29, 2022
    Cite
    (2022). NCI Imaging Data Commons [Dataset]. http://identifiers.org/RRID:SCR_019127
    Dataset updated
    Jan 29, 2022
    Description

    Portal for finding and analyzing cancer imaging data. Part of Cancer Research Data Commons to support cancer imaging research. Provides cloud based access to medical imaging data and library of analytical tools and workflows to share, analyze, and visualize multi modal imaging data from both clinical and basic cancer research studies.

  3. Cancer Incidence - Surveillance, Epidemiology, and End Results (SEER)...

    • catalog.data.gov
    • healthdata.gov
    • +2 more
    Updated Jul 26, 2023
    Cite
    National Cancer Institute (NCI), National Institutes of Health (NIH) (2023). Cancer Incidence - Surveillance, Epidemiology, and End Results (SEER) Registries Limited-Use [Dataset]. https://catalog.data.gov/dataset/cancer-incidence-surveillance-epidemiology-and-end-results-seer-registries-limited-use
    Dataset updated
    Jul 26, 2023
    Dataset provided by
    National Cancer Institute (http://www.cancer.gov/)
    Description

    SEER Limited-Use cancer incidence data with associated population data. Geographic areas available are county and SEER registry. The Surveillance, Epidemiology, and End Results (SEER) Program of the National Cancer Institute collects and distributes high quality, comprehensive cancer data from a number of population-based cancer registries. Data include patient demographics, primary tumor site, morphology, stage at diagnosis, first course of treatment, and follow-up for vital status. The SEER Program is the only comprehensive source of population-based information in the United States that includes stage of cancer at the time of diagnosis and survival rates within each stage.

  4. The Cancer Genome Atlas Stomach Adenocarcinoma Collection

    • cancerimagingarchive.net
    dicom, n/a
    Updated Feb 2, 2014
    Cite
    The Cancer Imaging Archive (2014). The Cancer Genome Atlas Stomach Adenocarcinoma Collection [Dataset]. http://doi.org/10.7937/K9/TCIA.2016.GDHL9KIM
    Explore at:
    Available download formats: dicom, n/a
    Dataset updated
    Feb 2, 2014
    Dataset authored and provided by
    The Cancer Imaging Archive
    License

    https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/

    Time period covered
    May 29, 2020
    Dataset funded by
    National Cancer Institute (http://www.cancer.gov/)
    Description

    The Cancer Genome Atlas Stomach Adenocarcinoma (TCGA-STAD) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA). Clinical, genetic, and pathological data resides in the Genomic Data Commons (GDC) Data Portal while the radiological data is stored on The Cancer Imaging Archive (TCIA).

    Matched TCGA patient identifiers allow researchers to explore the TCGA/TCIA databases for correlations between tissue genotype, radiological phenotype and patient outcomes. Tissues for TCGA were collected from many sites all over the world in order to reach their accrual targets, usually around 500 specimens per cancer type. For this reason the image data sets are also extremely heterogeneous in terms of scanner modalities, manufacturers and acquisition protocols. In most cases the images were acquired as part of routine care and not as part of a controlled research study or clinical trial.
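
    The matching described above can be sketched as a join on the patient identifier. This is a minimal illustration with made-up records; the field names (`patient_id`, `stage`, `modality`) and the helper `match_by_patient` are hypothetical and do not reflect the actual GDC or TCIA schemas.

```python
from collections import defaultdict

# Hypothetical clinical rows (as might be exported from GDC) and imaging
# series rows (as might be exported from TCIA). All values are invented.
clinical = [
    {"patient_id": "TCGA-AA-0001", "stage": "II"},
    {"patient_id": "TCGA-AA-0002", "stage": "III"},
]
imaging = [
    {"patient_id": "TCGA-AA-0001", "modality": "CT"},
    {"patient_id": "TCGA-AA-0001", "modality": "MR"},
    {"patient_id": "TCGA-AA-0003", "modality": "CT"},
]


def match_by_patient(clinical_rows, imaging_rows):
    """Pair each clinical row with all imaging rows sharing its patient ID."""
    series_by_patient = defaultdict(list)
    for row in imaging_rows:
        series_by_patient[row["patient_id"]].append(row["modality"])
    # Keep only patients that appear in both sources.
    return {
        row["patient_id"]: {
            "stage": row["stage"],
            "modalities": series_by_patient[row["patient_id"]],
        }
        for row in clinical_rows
        if row["patient_id"] in series_by_patient
    }


matched = match_by_patient(clinical, imaging)
print(matched)
```

    Patients found in only one of the two sources are dropped, which mirrors the idea that correlations can only be explored where both genomic and imaging data exist.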

    CIP TCGA Radiology Initiative

    Imaging Source Site (ISS) Groups are being populated and governed by participants from institutions that have provided imaging data to the archive for a given cancer type. Modeled after TCGA analysis groups, ISS groups are given the opportunity to publish a marker paper for a given cancer type per the guidelines in the table above. This opportunity will generate increased participation in building these multi-institutional data sets as they become an open community resource. Learn more about the CIP TCGA Radiology Initiative.

  5. CDC WONDER: Cancer Statistics

    • data.virginia.gov
    • healthdata.gov
    • +4 more
    html
    Updated Feb 21, 2025
    + more versions
    Cite
    Centers for Disease Control and Prevention, Department of Health & Human Services (2025). CDC WONDER: Cancer Statistics [Dataset]. https://data.virginia.gov/dataset/cdc-wonder-cancer-statistics
    Explore at:
    Available download formats: html
    Dataset updated
    Feb 21, 2025
    Description

    The United States Cancer Statistics (USCS) online databases in WONDER provide cancer incidence and mortality data for the United States for the years since 1999, by year, state and metropolitan areas (MSA), age group, race, ethnicity, sex, childhood cancer classifications and cancer site. Reports include case counts, deaths, crude and age-adjusted incidence and death rates, and 95% confidence intervals for rates. The USCS data are the official federal statistics on cancer incidence from registries having high-quality data and cancer mortality statistics for 50 states and the District of Columbia. USCS are produced by the Centers for Disease Control and Prevention (CDC) and the National Cancer Institute (NCI), in collaboration with the North American Association of Central Cancer Registries (NAACCR). Mortality data are provided by the Centers for Disease Control and Prevention (CDC), National Center for Health Statistics (NCHS), National Vital Statistics System (NVSS).
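
    To make the rate terminology concrete, the sketch below computes a crude rate and a directly age-adjusted rate per 100,000. The counts, populations, and standard-population weights are invented for illustration, and direct standardization is shown as one common method, not necessarily the exact procedure used by WONDER.

```python
def crude_rate(cases: int, population: int, per: int = 100_000) -> float:
    """Cases divided by population, scaled to a rate per `per` persons."""
    return cases / population * per


def age_adjusted_rate(strata, per: int = 100_000) -> float:
    """Direct standardization: weight each age-specific rate by the
    share of a standard population in that age group.

    `strata` is a list of (cases, population, standard_weight) tuples;
    the weights are expected to sum to 1.
    """
    total_weight = sum(w for _, _, w in strata)
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError("standard weights must sum to 1")
    return sum(crude_rate(c, p, per) * w for c, p, w in strata)


# Invented example with two age groups of equal size.
strata = [
    (10, 100_000, 0.6),  # younger group, weighted more heavily in the standard
    (90, 100_000, 0.4),  # older group
]
total_cases = sum(c for c, _, _ in strata)
total_pop = sum(p for _, p, _ in strata)

print(crude_rate(total_cases, total_pop))  # crude rate per 100,000
print(age_adjusted_rate(strata))           # age-adjusted rate per 100,000
```

    The two figures differ because the standard population here gives more weight to the younger, lower-rate group than the observed population does.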

  6. DICOM converted Slide Microscopy images for the TCGA-TGCT collection

    • zenodo.org
    bin
    Updated Aug 20, 2024
    + more versions
    Cite
    David Clunie; William Clifford; David Pot; Ulrike Wagner; Keyvan Farahani; Erika Kim; Andrey Fedorov (2024). DICOM converted Slide Microscopy images for the TCGA-TGCT collection [Dataset]. http://doi.org/10.5281/zenodo.13346196
    Explore at:
    Available download formats: bin
    Dataset updated
    Aug 20, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    David Clunie; William Clifford; David Pot; Ulrike Wagner; Keyvan Farahani; Erika Kim; Andrey Fedorov
    License

    Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Description

    This dataset corresponds to a collection of images and/or image-derived data available from National Cancer Institute Imaging Data Commons (IDC) [1]. This dataset was converted into DICOM representation and ingested by the IDC team. You can explore and visualize the corresponding images using IDC Portal here: TCGA-TGCT. You can use the manifests included in this Zenodo record to download the content of the collection following the Download instructions below.

    Collection description

    More than 90% of testicular cancers start in the germ cells, the cells in the testicles that develop into sperm. This type of cancer is known as testicular germ cell cancer. Testicular germ cell cancer can be classified as either seminomas or nonseminomas, which may be identified by microscopy. Nonseminomas typically grow and spread more quickly than seminomas. A testicular germ cell tumor that contains a mix of both these subtypes is classified as a nonseminoma. TCGA studied both seminomas and nonseminomas.

    Testicular germ cell cancer is rare, comprising 1-2% of all tumors in males. However, it is the most common cancer in men ages 15 to 35. The incidence of testicular germ cell cancer has been rising continuously in many regions, including Europe and the U.S. In 2013, about 8,000 American men were estimated to be diagnosed with the cancer, and 370 of them were predicted to die from the disease. Men who are Caucasian or who have an undescended testicle, abnormally developed testicles, or a family history of testicular cancer have a greater risk of developing testicular cancer. Fortunately, testicular germ cell cancer is highly treatable.

    Please see the TCGA-TGCT information page to learn more about the images and to obtain any supporting metadata for this collection.

    Citation guidelines can be found on the Citing TCGA in Publications and Presentations information page.

    Files included

    A manifest file's name indicates the IDC data release in which a version of collection data was first introduced. For example, collection_id-idc_v8-aws.s5cmd corresponds to the contents of the collection_id collection introduced in IDC data release v8. If there is a subsequent version of this Zenodo page, it will indicate when a subsequent version of the corresponding collection was introduced.

    1. tcga_tgct-idc_v10-aws.s5cmd: manifest of files available for download from public IDC Amazon Web Services buckets
    2. tcga_tgct-idc_v10-gcs.s5cmd: manifest of files available for download from public IDC Google Cloud Storage buckets
    3. tcga_tgct-idc_v10-dcf.dcf: Gen3 manifest (for details see https://learn.canceridc.dev/data/organization-of-data/guids-and-uuids)

    Note that manifest files that end in -aws.s5cmd reference files stored in Amazon Web Services (AWS) buckets, while -gcs.s5cmd reference files in Google Cloud Storage. The actual files are identical and are mirrored between AWS and GCP.
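
    The naming convention above can be made explicit with a small parser. This is a minimal sketch: the regular expression is inferred only from the example file names in this record, and `parse_manifest_name` and `ManifestInfo` are hypothetical names, not part of any IDC tool.

```python
import re
from typing import NamedTuple

# Pattern inferred from the naming examples above:
#   <collection_id>-idc_v<release>-<kind>.<extension>
MANIFEST_NAME = re.compile(
    r"^(?P<collection>.+)-idc_v(?P<release>\d+)-(?P<kind>aws|gcs|dcf)"
    r"\.(?P<ext>s5cmd|dcf)$"
)


class ManifestInfo(NamedTuple):
    collection_id: str
    idc_release: int
    kind: str  # "aws", "gcs", or "dcf"


def parse_manifest_name(filename: str) -> ManifestInfo:
    """Split a manifest file name into collection, release, and kind."""
    match = MANIFEST_NAME.match(filename)
    if match is None:
        raise ValueError(f"unrecognized manifest name: {filename!r}")
    return ManifestInfo(
        collection_id=match["collection"],
        idc_release=int(match["release"]),
        kind=match["kind"],
    )


info = parse_manifest_name("tcga_tgct-idc_v10-aws.s5cmd")
print(info)
```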

    Download instructions

    Each of the manifests includes instructions in the header on how to download the included files.

    To download the files using .s5cmd manifests:

    1. install idc-index package: pip install --upgrade idc-index
    2. download the files referenced by manifests included in this dataset by passing the .s5cmd manifest file: idc download manifest.s5cmd.

    To download the files using .dcf manifest, see manifest header.
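
    The two steps for .s5cmd manifests can also be driven from a script. The sketch below wraps the `idc download` command quoted above using Python's standard library; the helper names `build_download_command` and `download` are hypothetical, and the wrapper assumes the `idc` command is on the PATH after installing idc-index.

```python
import shutil
import subprocess
from pathlib import Path


def build_download_command(manifest: str) -> list[str]:
    """Return the argument list for `idc download <manifest>`."""
    if not manifest.endswith(".s5cmd"):
        raise ValueError("expected a .s5cmd manifest")
    return ["idc", "download", manifest]


def download(manifest: str) -> None:
    """Run the download, failing early if a prerequisite is missing."""
    if shutil.which("idc") is None:
        raise RuntimeError("`idc` not found; run: pip install --upgrade idc-index")
    if not Path(manifest).is_file():
        raise FileNotFoundError(manifest)
    # check=True raises CalledProcessError if the command exits with an error.
    subprocess.run(build_download_command(manifest), check=True)


# Only the command is printed here; call download(...) to actually fetch files.
print(build_download_command("tcga_tgct-idc_v10-aws.s5cmd"))
```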

    Acknowledgments

    Imaging Data Commons team has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under Task Order No. HHSN26110071 under Contract No. HHSN261201500003l.

    References

    [1] Fedorov, A., Longabaugh, W. J. R., Pot, D., Clunie, D. A., Pieper, S. D., Gibbs, D. L., Bridge, C., Herrmann, M. D., Homeyer, A., Lewis, R., Aerts, H. J. W., Krishnaswamy, D., Thiriveedhi, V. K., Ciausu, C., Schacherer, D. P., Bontempi, D., Pihl, T., Wagner, U., Farahani, K., Kim, E. & Kikinis, R. National Cancer Institute Imaging Data Commons: Toward Transparency, Reproducibility, and Scalability in Imaging Artificial Intelligence. RadioGraphics (2023). https://doi.org/10.1148/rg.230180

  7. National Cancer Register

    • healthinformationportal.eu
    • www-acc.healthinformationportal.eu
    html
    Updated Jul 28, 2022
    Cite
    Institute of Health Information and Statistics of the Czech Republic (2022). National Cancer Register [Dataset]. https://www.healthinformationportal.eu/health-information-sources/national-cancer-register
    Explore at:
    Available download formats: html
    Dataset updated
    Jul 28, 2022
    Dataset authored and provided by
    Institute of Health Information and Statistics of the Czech Republic
    Variables measured
    sex, title, topics, acronym, country, language, data_owners, description, geo_coverage, contact_email, and 11 more
    Measurement technique
    Registry data
    Description

    The purpose of the National Oncology Register (hereinafter referred to as NOR) is the registration of oncological diseases and periodic monitoring of their further development, i.e. data collection, verification, storage, protection and processing. NOR provides summary data for statistical overviews at both national and international levels, as well as for epidemiological studies and health research. NOR is a nationwide population register that follows on from the monitoring of neoplasms in the population of the Czech Republic introduced in the 1950s; ÚZIS of the Czech Republic has operated it as a population register of individual neoplasm records since 1976.

    NOR data are also used to support early diagnosis and treatment of neoplasms and pre-cancerous conditions, to monitor trends in their occurrence, causative factors and social consequences. At the population level, the results of the treatment of neoplasms are also evaluated in the form of a survival analysis.

  8. Chemical Carcinogenesis Research Information System (CCRIS)

    • data.virginia.gov
    • datadiscovery.nlm.nih.gov
    • +3 more
    html
    Updated May 16, 2025
    Cite
    National Library of Medicine (2025). Chemical Carcinogenesis Research Information System (CCRIS) [Dataset]. https://data.virginia.gov/dataset/chemical-carcinogenesis-research-information-system-ccris
    Explore at:
    Available download formats: html
    Dataset updated
    May 16, 2025
    Dataset provided by
    National Library of Medicine
    Description

    The Chemical Carcinogenesis Research Information System (CCRIS) database contains chemical records with carcinogenicity, mutagenicity, tumor promotion, and tumor inhibition test results. It was developed by the National Cancer Institute (NCI). Data are derived from studies cited in primary journals, current awareness tools, NCI reports, and other sources. Test results have been reviewed by experts in carcinogenesis and mutagenesis. CCRIS provides historical information from the years 1985 - 2011. It is no longer updated.

  9. Blog | Stimulating Data-driven Innovation in Breast Cancer Research

    • data.virginia.gov
    Updated Jun 18, 2015
    Cite
    Sandeep Patel (2015). Blog | Stimulating Data-driven Innovation in Breast Cancer Research [Dataset]. https://data.virginia.gov/dataset/blog-stimulating-data-driven-innovation-in-breast-cancer-research
    Dataset updated
    Jun 18, 2015
    Dataset provided by
    Sandeep Patel
    Description

    This blog post was posted by Sandeep Patel on June 18, 2015

  10. The Cancer Genome Atlas Rectum Adenocarcinoma Collection

    • cancerimagingarchive.net
    dicom, n/a
    Updated Jan 5, 2016
    Cite
    The Cancer Imaging Archive (2016). The Cancer Genome Atlas Rectum Adenocarcinoma Collection [Dataset]. http://doi.org/10.7937/K9/TCIA.2016.F7PPNPNU
    Explore at:
    Available download formats: dicom, n/a
    Dataset updated
    Jan 5, 2016
    Dataset authored and provided by
    The Cancer Imaging Archive
    License

    https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/

    Time period covered
    May 29, 2020
    Dataset funded by
    National Cancer Institute (http://www.cancer.gov/)
    Description

    The Cancer Genome Atlas Rectum Adenocarcinoma (TCGA-READ) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA). Clinical, genetic, and pathological data resides in the Genomic Data Commons (GDC) Data Portal while the radiological data is stored on The Cancer Imaging Archive (TCIA).

    Matched TCGA patient identifiers allow researchers to explore the TCGA/TCIA databases for correlations between tissue genotype, radiological phenotype and patient outcomes. Tissues for TCGA were collected from many sites all over the world in order to reach their accrual targets, usually around 500 specimens per cancer type. For this reason the image data sets are also extremely heterogeneous in terms of scanner modalities, manufacturers and acquisition protocols. In most cases the images were acquired as part of routine care and not as part of a controlled research study or clinical trial.

    CIP TCGA Radiology Initiative

    Imaging Source Site (ISS) Groups are being populated and governed by participants from institutions that have provided imaging data to the archive for a given cancer type. Modeled after TCGA analysis groups, ISS groups are given the opportunity to publish a marker paper for a given cancer type per the guidelines in the table above. This opportunity will generate increased participation in building these multi-institutional data sets as they become an open community resource. Learn more about the CIP TCGA Radiology Initiative.

  11. Lithuanian Cancer registry data

    • healthinformationportal.eu
    • www-acc.healthinformationportal.eu
    html
    Updated Sep 7, 2022
    Cite
    National Cancer Institute, Lithuania (2022). Lithuanian Cancer registry data [Dataset]. https://www.healthinformationportal.eu/health-information-sources/lithuanian-cancer-registry-data
    Explore at:
    Available download formats: html
    Dataset updated
    Sep 7, 2022
    Dataset provided by
    National Cancer Institute (http://www.cancer.gov/)
    Authors
    National Cancer Institute, Lithuania
    Area covered
    Lithuania
    Variables measured
    sex, title, topics, country, language, data_owners, description, geo_coverage, contact_email, free_keywords, and 7 more
    Measurement technique
    Registry data
    Description

    The National Cancer Institute’s Cancer Registry is a nationwide, population-based cancer registry that covers the entire territory of Lithuania and collects information about all new cancer cases (ICD-10-AM codes: C00-C96, D00-D09, D32-D33, D39.1, D42-D43, D45-D47).
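
    The code list above can be read as a simple inclusion rule. The following sketch checks whether a diagnosis code falls inside the listed ranges; the helper `is_registered` and the simplified code format (one letter, two digits, optional decimal part) are assumptions for illustration and do not describe the registry's actual software.

```python
import re

# Category ranges quoted in the description above, as (letter, low, high).
CATEGORY_RANGES = [
    ("C", 0, 96),
    ("D", 0, 9),
    ("D", 32, 33),
    ("D", 42, 43),
    ("D", 45, 47),
]
# D39.1 is listed as a single subcategory rather than a range.
EXACT_CODES = {"D39.1"}

# Simplified shape of a code: letter, two digits, optional ".digits".
CODE_PATTERN = re.compile(r"^([A-Z])(\d{2})(?:\.(\d+))?$")


def is_registered(code: str) -> bool:
    """Return True if `code` falls within the listed ICD-10-AM ranges."""
    code = code.strip().upper()
    match = CODE_PATTERN.match(code)
    if match is None:
        raise ValueError(f"not an ICD-10-style code: {code!r}")
    if code in EXACT_CODES:
        return True
    letter, category = match[1], int(match[2])
    return any(
        letter == range_letter and low <= category <= high
        for range_letter, low, high in CATEGORY_RANGES
    )


for example in ["C50.9", "D39.1", "D39.0", "D10"]:
    print(example, is_registered(example))
```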

    The main task of the Cancer Registry is to guarantee as complete and reliable registration of incident cancer cases as possible.

    In 1984 the Lithuanian Cancer Registry was established at the National Cancer Institute by the Order of the Minister of Health. The population-based Cancer Registry was set up in 1990.

  12. CPIC California Cancer Registry

    • redivis.com
    application/jsonl +6
    Updated Sep 19, 2016
    Cite
    Stanford Center for Population Health Sciences (2016). CPIC California Cancer Registry [Dataset]. http://doi.org/10.57761/sq5d-1c97
    Explore at:
    Available download formats: csv, avro, arrow, spss, sas, stata, application/jsonl
    Dataset updated
    Sep 19, 2016
    Dataset provided by
    Redivis Inc.
    Authors
    Stanford Center for Population Health Sciences
    Area covered
    California
    Description

    Abstract

    The Greater Bay Area Cancer Registry (GBACR), in compliance with California state law, gathers information about all cancers diagnosed or treated in a nine-county area (Alameda, Contra Costa, Marin, Monterey, San Benito, San Francisco, San Mateo, Santa...

    Documentation

    PHS does NOT host these data. This listing is information only.

    The Greater Bay Area Cancer Registry (GBACR), in compliance with California state law, gathers information about all cancers diagnosed or treated in a nine-county area (Alameda, Contra Costa, Marin, Monterey, San Benito, San Francisco, San Mateo, Santa Clara and Santa Cruz). This information is obtained from medical records provided by hospitals, doctors’ offices, and other related facilities.

    The information, stored under secure conditions with strict regulations that protect confidentiality, helps the GBACR understand cancer occurrence and survival in the Greater Bay Area. For each patient, the information includes basic demographic facts like age, gender, and race/ethnicity, as well as cancer type, extent of disease, treatment and survival. Combined over the diverse Bay Area population, this information gives the GBACR and all users an opportunity to learn how such characteristics may be related to cancer causes, mortality, care and prevention.

    In addition to its local use, information collected by the GBACR becomes part of state and federal population-based registries whose mission is to monitor cancer occurrence at the state and national levels, respectively. Data from the GBACR have contributed to the National Cancer Institute’s Surveillance, Epidemiology and End Results (SEER) program since 1973. The nine counties are also part of the statewide California Cancer Registry (CCR), which conducts essential monitoring of cancer occurrence and survival in California.

    GBACR data are of the highest quality, as recognized by national and international registry standard-setting organizations, including SEER, the National Program for Cancer Registries, and the North American Association for Central Cancer Registries (NAACCR).

    The CPIC has also started collecting data on environmental factors. These data are available in the California Neighborhoods Data System. This new resource for examining the impact of neighborhood characteristics on cancer incidence and outcomes in populations includes a compilation of existing geospatial and other secondary data for characterizing contextual factors.

    A summary and description of social and built environment data and measures in the California Neighborhoods Data System (2010) can be found here: Social and Built Environment Data and Measures

    More information about this new data source can be found here: The California Neighborhoods Data System

    Patient characteristics

    All reported cancer cases in the state of California.

    Data overview

    Data categories

    • Socioeconomic status
    • Racial/ethnic composition
    • Immigration/acculturation characteristics
    • Racial/ethnic residential segregation
    • Population density
    • Urbanicity (Rural/Urban)
    • Housing
    • Businesses
    • Commuting
    • Street connectivity
    • Parks
    • Farmers Markets
    • Traffic density
    • Crime
    • Tapestry Segmentation

    Notes

    To apply for these data, see the instructions here: https://www.ccrcal.org/retrieve-data/data-for-researchers/how-to-request-ccr-data/

  13. The Cancer Genome Atlas Liver Hepatocellular Carcinoma Collection

    • cancerimagingarchive.net
    dicom, n/a
    Updated May 29, 2020
    Cite
    The Cancer Imaging Archive (2020). The Cancer Genome Atlas Liver Hepatocellular Carcinoma Collection [Dataset]. http://doi.org/10.7937/K9/TCIA.2016.IMMQW8UQ
    Explore at:
    Available download formats: n/a, dicom
    Dataset updated
    May 29, 2020
    Dataset authored and provided by
    The Cancer Imaging Archive
    License

    https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/

    Time period covered
    May 29, 2020
    Dataset funded by
    National Cancer Institute (http://www.cancer.gov/)
    Description

    The Cancer Genome Atlas Liver Hepatocellular Carcinoma (TCGA-LIHC) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA). Clinical, genetic, and pathological data resides in the Genomic Data Commons (GDC) Data Portal while the radiological data is stored on The Cancer Imaging Archive (TCIA).

    Matched TCGA patient identifiers allow researchers to explore the TCGA/TCIA databases for correlations between tissue genotype, radiological phenotype and patient outcomes. Tissues for TCGA were collected from many sites all over the world in order to reach their accrual targets, usually around 500 specimens per cancer type. For this reason the image data sets are also extremely heterogeneous in terms of scanner modalities, manufacturers and acquisition protocols. In most cases the images were acquired as part of routine care and not as part of a controlled research study or clinical trial.

    CIP TCGA Radiology Initiative

    Imaging Source Site (ISS) Groups are being populated and governed by participants from institutions that have provided imaging data to the archive for a given cancer type. Modeled after TCGA analysis groups, ISS groups are given the opportunity to publish a marker paper for a given cancer type per the guidelines in the table above. This opportunity will generate increased participation in building these multi-institutional data sets as they become an open community resource. Learn more about the CIP TCGA Radiology Initiative.

  14. The Cancer Genome Atlas Ovarian Cancer Collection

    • cancerimagingarchive.net
    dicom, n/a
    Updated May 29, 2020
    Cite
    The Cancer Imaging Archive (2020). The Cancer Genome Atlas Ovarian Cancer Collection [Dataset]. http://doi.org/10.7937/K9/TCIA.2016.NDO1MDFQ
    Explore at:
    Available download formats: n/a, dicom
    Dataset updated
    May 29, 2020
    Dataset authored and provided by
    The Cancer Imaging Archive
    License

    https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/

    Time period covered
    May 29, 2020
    Dataset funded by
    National Cancer Institute (http://www.cancer.gov/)
    Description

    The Cancer Genome Atlas Ovarian Cancer (TCGA-OV) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA). Clinical, genetic, and pathological data resides in the Genomic Data Commons (GDC) Data Portal while the radiological data is stored on The Cancer Imaging Archive (TCIA).

    Matched TCGA patient identifiers allow researchers to explore the TCGA/TCIA databases for correlations between tissue genotype, radiological phenotype and patient outcomes. Tissues for TCGA were collected from many sites all over the world in order to reach their accrual targets, usually around 500 specimens per cancer type. For this reason the image data sets are also extremely heterogeneous in terms of scanner modalities, manufacturers and acquisition protocols. In most cases the images were acquired as part of routine care and not as part of a controlled research study or clinical trial.

    CIP TCGA Radiology Initiative

    Imaging Source Site (ISS) Groups are being populated and governed by participants from institutions that have provided imaging data to the archive for a given cancer type. Modeled after TCGA analysis groups, ISS groups are given the opportunity to publish a marker paper for a given cancer type per the guidelines in the table above. This opportunity will generate increased participation in building these multi-institutional data sets as they become an open community resource. Learn more about the TCGA Ovarian Phenotype Research Group.

  15. DICOM converted Slide Microscopy images for the HTAN-OHSU collection

    • zenodo.org
    bin
    Updated Aug 22, 2024
    Cite
    David Clunie; William Clifford; David Pot; Ulrike Wagner; Keyvan Farahani; Erika Kim; Andrey Fedorov (2024). DICOM converted Slide Microscopy images for the HTAN-OHSU collection [Dataset]. http://doi.org/10.5281/zenodo.12689951
    Explore at:
    Available download formats: bin
    Dataset updated
    Aug 22, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    David Clunie; William Clifford; David Pot; Ulrike Wagner; Keyvan Farahani; Erika Kim; Andrey Fedorov
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset corresponds to a collection of images and/or image-derived data available from National Cancer Institute Imaging Data Commons (IDC) [1]. This dataset was converted into DICOM representation and ingested by the IDC team. You can explore and visualize the corresponding images using IDC Portal here: HTAN-OHSU. You can use the manifests included in this Zenodo record to download the content of the collection following the Download instructions below.

    Collection description

    The Human Tumor Atlas Network (HTAN) [2], part of the National Cancer Institute (NCI) Cancer Moonshot Initiative, will establish a clinical, experimental, computational, and organizational framework to generate informative and accessible three-dimensional atlases of cancer transitions for a diverse set of tumor types.

    The overall goal of the HTAN OMS Atlas Center is to elucidate mechanisms by which metastatic breast cancers become resistant to current generation pathway- and immune checkpoint-targeted treatments. The OMS Atlas is motivated by the appreciation that these treatments are often effective in primary tumors but only transiently effective in the metastatic setting. Possible resistance mechanisms include tumor-intrinsic genomic instability and epigenomic plasticity, as well as events extrinsic to the cancer cells, including chemical and mechanical signals from the microenvironments, production of mechanical extracellular matrix barriers and/or changes in vasculature that reduce drug and/or immune cell access, nanoscale cancer cell-microenvironment interactions that reduce drug efficacy, and a plethora of immune resistance mechanisms, such as loss of HLA expression and antigen presentation, and immune exhaustion. These mechanisms likely vary between patients and within individual patients and change with time as tumors respond to therapeutic attack. The OMS Atlas will focus on elucidating resistance mechanisms in two specific current generation clinical trial scenarios: (a) hormone receptor-positive breast cancer (HRBC) undergoing treatment with a CDK4/6 inhibitor in combination with endocrine therapy and (b) triple negative breast cancer (TNBC) undergoing treatment with a PARP inhibitor and an immunomodulatory agent.

    Please see the HTAN-OHSU information page to learn more about the images and to obtain any supporting metadata for this collection.

    Citation guidelines can be found on the HTAN Publication Policy information page.

    Files included

    A manifest file's name indicates the IDC data release in which a version of collection data was first introduced. For example, collection_id-idc_v8-aws.s5cmd corresponds to the contents of the collection_id collection introduced in IDC data release v8. If there is a subsequent version of this Zenodo page, it will indicate when a subsequent version of the corresponding collection was introduced.

    1. htan_ohsu-idc_v10-aws.s5cmd: manifest of files available for download from public IDC Amazon Web Services buckets
    2. htan_ohsu-idc_v10-gcs.s5cmd: manifest of files available for download from public IDC Google Cloud Storage buckets
    3. htan_ohsu-idc_v10-dcf.dcf: Gen3 manifest (for details see https://learn.canceridc.dev/data/organization-of-data/guids-and-uuids)

    Note that manifest files that end in -aws.s5cmd reference files stored in Amazon Web Services (AWS) buckets, while -gcs.s5cmd reference files in Google Cloud Storage. The actual files are identical and are mirrored between AWS and GCP.
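    The naming convention described above can be sketched as a small parser. This is only an illustration: the regular expression is inferred from the example file names in this record, not taken from any IDC specification, and the names parse_manifest_name and ManifestInfo are hypothetical helpers.

```python
import re
from typing import NamedTuple

# Pattern inferred from the example names in this record, such as
# "htan_ohsu-idc_v10-aws.s5cmd". It is an assumption, not an IDC specification.
MANIFEST_NAME = re.compile(
    r"^(?P<collection>.+)-idc_v(?P<release>\d+)-(?P<target>aws|gcs|dcf)\.(?:s5cmd|dcf)$"
)


class ManifestInfo(NamedTuple):
    """Hypothetical container for the parts of a manifest file name."""
    collection_id: str
    idc_release: int
    target: str  # "aws", "gcs", or "dcf"


def parse_manifest_name(filename: str) -> ManifestInfo:
    """Split a manifest file name into collection, IDC release, and target."""
    match = MANIFEST_NAME.match(filename)
    if match is None:
        raise ValueError(f"unrecognized manifest name: {filename!r}")
    return ManifestInfo(match["collection"], int(match["release"]), match["target"])


if __name__ == "__main__":
    for name in (
        "htan_ohsu-idc_v10-aws.s5cmd",
        "htan_ohsu-idc_v10-gcs.s5cmd",
        "htan_ohsu-idc_v10-dcf.dcf",
    ):
        print(parse_manifest_name(name))
```

    Parsing the release number as an integer makes it easy to compare manifest versions, for example to pick the manifest from the most recent IDC release when several are present.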

    Download instructions

    Each of the manifests includes instructions in the header on how to download the included files.

    To download the files using .s5cmd manifests:

    1. Install the idc-index package: pip install --upgrade idc-index
    2. Download the files referenced by the manifests included in this dataset by passing the .s5cmd manifest file: idc download manifest.s5cmd

    To download the files using .dcf manifest, see manifest header.
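    For scripted use, the two steps above can be wrapped as follows. This is a minimal sketch that only assembles and runs the idc download command quoted in the instructions; the helper names build_download_command and download are hypothetical, and the example assumes idc-index has already been installed with pip.

```python
import shutil
import subprocess
from pathlib import Path


def build_download_command(manifest: str) -> list[str]:
    """Return the `idc download <manifest>` command as an argument list."""
    path = Path(manifest)
    if path.suffix != ".s5cmd":
        # .dcf manifests are handled differently; see the manifest header.
        raise ValueError("expected a .s5cmd manifest")
    return ["idc", "download", str(path)]


def download(manifest: str) -> None:
    """Run the download; requires `pip install --upgrade idc-index` beforehand."""
    if shutil.which("idc") is None:
        raise RuntimeError("the 'idc' command was not found; install idc-index first")
    # check=True raises CalledProcessError if the command exits with an error.
    subprocess.run(build_download_command(manifest), check=True)


if __name__ == "__main__":
    # Only prints the command; call download(...) to actually fetch the files.
    print(build_download_command("htan_ohsu-idc_v10-aws.s5cmd"))
```

    Keeping command construction separate from execution lets the command be inspected or logged before any data transfer starts.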

    Acknowledgments

    Collection of the images that were converted by IDC was supported through the Human Tumor Atlas Network, grants 1U2CCA233280-01 "Omic and Multidimensional Spatial Atlas of Metastatic Breast and Prostate Cancers" and 1U24CA233243-01 "Human Tumor Atlas Network: Data Coordinating Center" from National Cancer Institute.

    Imaging Data Commons team has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under Task Order No. HHSN26110071 under Contract No. HHSN261201500003l.

    References

    [1] Fedorov, A., Longabaugh, W. J. R., Pot, D., Clunie, D. A., Pieper, S. D., Gibbs, D. L., Bridge, C., Herrmann, M. D., Homeyer, A., Lewis, R., Aerts, H. J. W. L., Krishnaswamy, D., Thiriveedhi, V. K., Ciausu, C., Schacherer, D. P., Bontempi, D., Pihl, T., Wagner, U., Farahani, K., Kim, E. & Kikinis, R. National Cancer Institute Imaging Data Commons: Toward Transparency, Reproducibility, and Scalability in Imaging Artificial Intelligence. RadioGraphics (2023). https://doi.org/10.1148/rg.230180

    [2] Rozenblatt-Rosen, O., Regev, A., Oberdoerffer, P., Nawy, T., Hupalowska, A., Rood, J. E., Ashenberg, O., Cerami, E., Coffey, R. J., Demir, E., Ding, L., Esplin, E. D., Ford, J. M., Goecks, J., Ghosh, S., Gray, J. W., Guinney, J., Hanlon, S. E., Hughes, S. K., Hwang, E. S., Iacobuzio-Donahue, C. A., Jané-Valbuena, J., Johnson, B. E., Lau, K. S., Lively, T., Mazzilli, S. A., Pe’er, D., Santagata, S., Shalek, A. K., Schapiro, D., Snyder, M. P., Sorger, P. K., Spira, A. E., Srivastava, S., Tan, K., West, R. B., Williams, E. H. & Human Tumor Atlas Network. The Human Tumor Atlas Network: Charting Tumor Transitions across Space and Time at Single-Cell Resolution. Cell 181, 236–249 (2020). http://dx.doi.org/10.1016/j.cell.2020.03.053

  16. State Cancer Profiles Web site

    • data.virginia.gov
    • healthdata.gov
    • +3more
    html
    Updated Jul 25, 2023
    + more versions
    Cite
    Department of Health & Human Services (2023). State Cancer Profiles Web site [Dataset]. https://data.virginia.gov/dataset/state-cancer-profiles-web-site
    Explore at:
    Available download formats: html
    Dataset updated
    Jul 25, 2023
    Dataset provided by
    United States Department of Health and Human Services (http://www.hhs.gov/)
    Description

    The State Cancer Profiles (SCP) web site provides statistics to help guide and prioritize cancer control activities at the state and local levels. SCP is a collaborative effort using local and national level cancer data from the Centers for Disease Control and Prevention's National Program of Cancer Registries (NPCR) and the National Cancer Institute's Surveillance, Epidemiology and End Results Registries (SEER). SCP addresses select types of cancer and select behavioral risk factors for which there are evidence-based control interventions. The site provides incidence, mortality and prevalence comparison tables as well as interactive graphs and maps and support data. The graphs and maps provide visual support for deciding where to focus cancer control efforts.

  17. The Cancer Genome Atlas Prostate Adenocarcinoma Collection

    • cancerimagingarchive.net
    dicom, n/a
    Updated Feb 2, 2014
    + more versions
    Cite
    The Cancer Imaging Archive (2014). The Cancer Genome Atlas Prostate Adenocarcinoma Collection [Dataset]. http://doi.org/10.7937/K9/TCIA.2016.YXOGLM4Y
    Explore at:
    Available download formats: dicom, n/a
    Dataset updated
    Feb 2, 2014
    Dataset authored and provided by
    The Cancer Imaging Archive
    License

    https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/

    Time period covered
    May 29, 2020
    Dataset funded by
    National Cancer Institute (http://www.cancer.gov/)
    Description

    The Cancer Genome Atlas Prostate Adenocarcinoma (TCGA-PRAD) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA). Clinical, genetic, and pathological data reside in the Genomic Data Commons (GDC) Data Portal, while the radiological data is stored on The Cancer Imaging Archive (TCIA).

    Matched TCGA patient identifiers allow researchers to explore the TCGA/TCIA databases for correlations between tissue genotype, radiological phenotype and patient outcomes. Tissues for TCGA were collected from many sites all over the world in order to reach their accrual targets, usually around 500 specimens per cancer type. For this reason the image data sets are also extremely heterogeneous in terms of scanner modalities, manufacturers and acquisition protocols. In most cases the images were acquired as part of routine care and not as part of a controlled research study or clinical trial.

    CIP TCGA Radiology Initiative

    Imaging Source Site (ISS) Groups are being populated and governed by participants from institutions that have provided imaging data to the archive for a given cancer type. Modeled after TCGA analysis groups, ISS groups are given the opportunity to publish a marker paper for a given cancer type per the guidelines in the table above. This opportunity will generate increased participation in building these multi-institutional data sets as they become an open community resource. Learn more about the CIP TCGA Radiology Initiative.

  18. TCGA Breast Phenotype Research Group Data sets

    • cancerimagingarchive.net
    • stage.cancerimagingarchive.net
    n/a, xls, zip
    Updated Sep 4, 2018
    Cite
    The Cancer Imaging Archive (2018). TCGA Breast Phenotype Research Group Data sets [Dataset]. http://doi.org/10.7937/K9/TCIA.2014.8SIPIY6G
    Explore at:
    Available download formats: xls, n/a, zip
    Dataset updated
    Sep 4, 2018
    Dataset authored and provided by
    The Cancer Imaging Archive
    License

    https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/

    Time period covered
    Sep 4, 2018
    Dataset funded by
    National Cancer Institute (http://www.cancer.gov/)
    Description

    At the time of our study, 108 cases with breast MRI data were available in The Cancer Genome Atlas Breast Invasive Carcinoma Collection (TCGA-BRCA). In order to minimize variations in image quality across the multi-institutional cases, we included only breast MRI studies acquired on GE 1.5 Tesla scanners (GE Medical Systems, Milwaukee, Wisconsin, USA), yielding a total of 93 cases. We then excluded cases that had missing images in the dynamic sequence (1 patient) or that, at the time, did not have gene expression analysis available in the TCGA Data Portal (8 patients). Applying these criteria resulted in a dataset of 84 breast cancer patients, with MRIs from four institutions: Memorial Sloan Kettering Cancer Center, the Mayo Clinic, the University of Pittsburgh Medical Center, and the Roswell Park Cancer Institute. The cases contributed by each institution numbered 9 (date range 1999-2002), 5 (1999-2003), 46 (1999-2004), and 24 (1999-2002), respectively. The dataset of biopsy-proven invasive breast cancers included 74 (88%) ductal, 8 (10%) lobular, and 2 (2%) mixed. Of these, 73 (87%) were ER+, 67 (80%) were PR+, and 19 (23%) were HER2+. Various types of analyses were conducted using the combined imaging, genomic, and clinical data. Those analyses are described within several manuscripts created by the group (cited below). Additional information about the methodology behind the Radiologist Annotations file can be found on the TCGA Breast Image Feature Scoring Project page.
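    The cohort arithmetic in this description can be checked with a few lines of Python. This is only a consistency check on the counts quoted above; the variable names are illustrative, and the percentages are assumed to be rounded to the nearest whole number.

```python
# Counts copied from the description above.
ge_1_5t_cases = 93           # studies acquired on GE 1.5 Tesla scanners
missing_dynamic_images = 1   # excluded: missing images in the dynamic sequence
no_gene_expression = 8       # excluded: no gene expression analysis available

final_cohort = ge_1_5t_cases - missing_dynamic_images - no_gene_expression

# Cases per institution, in the order listed in the description.
cases_by_institution = {
    "Memorial Sloan Kettering Cancer Center": 9,
    "Mayo Clinic": 5,
    "University of Pittsburgh Medical Center": 46,
    "Roswell Park Cancer Institute": 24,
}


def percent(count: int, total: int) -> int:
    """Percentage rounded to the nearest whole number (assumed rounding rule)."""
    return round(100 * count / total)


histology = {"ductal": 74, "lobular": 8, "mixed": 2}
receptors = {"ER+": 73, "PR+": 67, "HER2+": 19}

# The exclusions and the per-institution counts both give the same cohort size.
assert final_cohort == 84
assert sum(cases_by_institution.values()) == final_cohort
assert sum(histology.values()) == final_cohort

if __name__ == "__main__":
    for label, count in {**histology, **receptors}.items():
        print(f"{label}: {count} ({percent(count, final_cohort)}%)")
```

    The exclusions (93 - 1 - 8), the per-institution counts (9 + 5 + 46 + 24), and the histology counts (74 + 8 + 2) all give 84, and the rounded percentages match those reported.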

  19. Supplementary Materials for A Linked Data Representation for Summary...

    • dataverse.harvard.edu
    Updated Aug 28, 2019
    Cite
    James McCusker (2019). Supplementary Materials for A Linked Data Representation for Summary Statistics and Grouping Criteria [Dataset]. http://doi.org/10.7910/DVN/OK0BUG
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 28, 2019
    Dataset provided by
    Harvard Dataverse
    Authors
    James McCusker
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Summary statistics are fundamental to data science, and are the building blocks of statistical reasoning. Most of the data and statistics made available on government web sites are aggregate; however, until now, we have not had a suitable linked data representation available. We propose a way to express summary statistics across aggregate groups as linked data using Web Ontology Language (OWL) Class based sets, where members of the set contribute to the overall aggregate value. Additionally, many clinical studies in the biomedical field rely on demographic summaries of their study cohorts and the patients assigned to each arm. While most data query languages, including SPARQL, allow for computation of summary statistics, they do not provide a way to integrate those values back into the RDF graphs they were computed from. We represent this knowledge, which would otherwise be lost, through the use of OWL 2 punning semantics, the expression of aggregate grouping criteria as OWL classes with variables, and constructs from the Semanticscience Integrated Ontology (SIO) and the World Wide Web Consortium's provenance ontology, PROV-O, providing interoperable representations that are well supported across the web of Linked Data. We evaluate these semantics using a Resource Description Framework (RDF) representation of patient case information from the Genomic Data Commons, a data portal from the National Cancer Institute.

  20. CMB-OV: DICOM converted Slide Microscopy images for the Cancer Moonshot...

    • zenodo.org
    bin
    Updated Nov 25, 2024
    Cite
    David Clunie (2024). CMB-OV: DICOM converted Slide Microscopy images for the Cancer Moonshot Biobank initiative Ovarian Cancer collection [Dataset]. http://doi.org/10.5281/zenodo.13993797
    Explore at:
    Available download formats: bin
    Dataset updated
    Nov 25, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    David Clunie
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This dataset corresponds to a collection of images and/or image-derived data available from National Cancer Institute Imaging Data Commons (IDC) [1]. This dataset was converted into DICOM representation and ingested by the IDC team. You can explore and visualize the corresponding images using IDC Portal here. You can use the manifests included in this Zenodo record to download the content of the collection following the Download instructions below.

    Collection description

    The Cancer Moonshot Biobank (CMB) is a National Cancer Institute initiative to support current and future investigations into drug resistance and sensitivity and other NCI-sponsored cancer research initiatives, with an aim of improving researchers' understanding of cancer and how to intervene in cancer initiation and progression. During the course of this study, biospecimens (blood and tissue removed during medical procedures) and associated data will be collected longitudinally from at least 1000 patients across at least 10 cancer types, who represent the demographic diversity of the U.S. and are receiving standard of care cancer treatment at multiple NCI Community Oncology Research Program (NCORP) sites.

    The CMB program is organized into multiple cancer-specific collections. Digital pathology images for each of those collections were converted into DICOM representation by the IDC team and are shared via IDC. This entry corresponds to the CMB-OV collection (Ovarian cancer).

    Digital pathology images, augmented with the metadata describing their content, were converted into DICOM Whole Slide Microscopy (SM) representation [2,3] using custom open source scripts and tools as described in [4].

    Files included

    A manifest file's name indicates the IDC data release in which a version of collection data was first introduced. For example, collection_id-idc_v8-aws.s5cmd corresponds to the contents of the collection_id collection introduced in IDC data release v8. If there is a subsequent version of this Zenodo page, it will indicate when a subsequent version of the corresponding collection was introduced.

    For each of the collections, the following manifest files are provided:

    1. : manifest of files available for download from public IDC Amazon Web Services buckets
    2. : manifest of files available for download from public IDC Google Cloud Storage buckets
    3. : Gen3 manifest (for details see https://learn.canceridc.dev/data/organization-of-data/guids-and-uuids)

    Note that manifest files that end in -aws.s5cmd reference files stored in Amazon Web Services (AWS) buckets, while -gcs.s5cmd reference files in Google Cloud Storage. The actual files are identical and are mirrored between AWS and GCP.

    Download instructions

    Each of the manifests includes instructions in the header on how to download the included files.

    To download the files using .s5cmd manifests:

    1. Install the idc-index package: pip install --upgrade idc-index
    2. Download the files referenced by the manifests included in this dataset by passing the .s5cmd manifest file: idc download manifest.s5cmd

    To download the files using .dcf manifest, see manifest header.

    Acknowledgments

    Imaging Data Commons team has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under Task Order No. HHSN26110071 under Contract No. HHSN261201500003l.

    References

    [1] Fedorov, A., Longabaugh, W. J. R., Pot, D., Clunie, D. A., Pieper, S. D., Gibbs, D. L., Bridge, C., Herrmann, M. D., Homeyer, A., Lewis, R., Aerts, H. J. W. L., Krishnaswamy, D., Thiriveedhi, V. K., Ciausu, C., Schacherer, D. P., Bontempi, D., Pihl, T., Wagner, U., Farahani, K., Kim, E. & Kikinis, R. National cancer institute imaging data commons: Toward transparency, reproducibility, and scalability in imaging artificial intelligence. Radiographics 43, (2023).

    [2] National Electrical Manufacturers Association (NEMA). DICOM PS3.3 - Information Object Definitions: A.32.8 VL Whole Slide Microscopy Image IOD. at <https://dicom.nema.org/medical/dicom/current/output/html/part03.html#sect_A.32.8>

    [3] Herrmann, M. D., Clunie, D. A., Fedorov, A., Doyle, S. W., Pieper, S., Klepeis, V., Le, L. P., Mutter, G. L., Milstone, D. S., Schultz, T. J., Kikinis, R., Kotecha, G. K., Hwang, D. H., Andriole, K. P., Iafrate, A. J., Brink, J. A., Boland, G. W., Dreyer, K. J., Michalski, M., Golden, J. A., Louis, D. N. & Lennerz, J. K. Implementing the DICOM standard for digital pathology. J. Pathol. Inform. 9, 37 (2018).

    [4] Clunie, D., Fedorov, A. & Herrmann, M. D. ImagingDataCommons/idc-wsi-conversion: Initial release. (Zenodo, 2023). doi:10.5281/ZENODO.8240154

Cite
(2025). Genomic Data Commons Data Portal (GDC Data Portal) [Dataset]. http://identifiers.org/RRID:SCR_014514

Genomic Data Commons Data Portal (GDC Data Portal)

RRID:SCR_014514, Genomic Data Commons Data Portal (GDC Data Portal) (RRID:SCR_014514), Genomic Data Commons Data Portal, GDC Data Portal

Explore at:
71 scholarly articles cite this dataset (View in Google Scholar)
Dataset updated
May 24, 2025
Description

A unified data repository of the National Cancer Institute (NCI)'s Genomic Data Commons (GDC) that enables data sharing across cancer genomic studies in support of precision medicine. The GDC supports several cancer genome programs at the NCI Center for Cancer Genomics (CCG), including The Cancer Genome Atlas (TCGA), Therapeutically Applicable Research to Generate Effective Treatments (TARGET), and the Cancer Genome Characterization Initiative (CGCI). The GDC Data Portal provides a platform for efficiently querying and downloading high quality and complete data. The GDC also provides a GDC Data Transfer Tool and a GDC API for programmatic access.
