100+ datasets found
  1. Historical NCI Genomic Data Commons data (09-14-2017)

    • zenodo.org
    • data-staging.niaid.nih.gov
    tsv
    Updated Jan 24, 2020
    Cite
    Inge Seim (2020). Historical NCI Genomic Data Commons data (09-14-2017) [Dataset]. http://doi.org/10.5281/zenodo.1186945
    Available download formats: tsv
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Inge Seim
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Historical NCI Genomic Data Commons data (v09-14-2017). Clinical ('phenotype') and gene expression (HTSeq FPKM-UQ).

    TCGA-COAD.GDC_phenotype.tsv

    dataset: phenotype - Phenotype

    cohort: GDC TCGA Colon Cancer (COAD)
    dataset ID: TCGA-COAD/Xena_Matrices/TCGA-COAD.GDC_phenotype.tsv
    download: https://gdc.xenahubs.net/download/TCGA-COAD/Xena_Matrices/TCGA-COAD.GDC_phenotype.tsv.gz
    samples: 570
    version: 11-27-2017
    hub: https://gdc.xenahubs.net
    type of data: phenotype
    author: Genomic Data Commons
    raw data: https://docs.gdc.cancer.gov/Data/Release_Notes/Data_Release_Notes/#data-release-90
    raw data: https://api.gdc.cancer.gov/data/
    input data format: ROWs (samples) x COLUMNs (identifiers) (i.e. clinicalMatrix)
    570 samples x 151 identifiers
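
    As a minimal sketch of the clinicalMatrix layout described above (rows are samples, columns are identifiers), the following reads a gzip-compressed TSV into a dictionary keyed by sample. The column names and sample IDs are made up for illustration, and the assumption that the first column holds the sample ID is not stated in the listing.

```python
import csv
import gzip
import io


def read_clinical_matrix(gz_bytes):
    """Parse a gzip-compressed, tab-separated clinicalMatrix.

    Rows are samples and columns are identifiers. The first column is
    assumed (not confirmed by the listing) to hold the sample ID.
    Returns {sample_id: {identifier: value}}.
    """
    text = gzip.decompress(gz_bytes).decode("utf-8")
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    identifiers = header[1:]
    table = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        table[row[0]] = dict(zip(identifiers, row[1:]))
    return table


# Tiny hypothetical stand-in for TCGA-COAD.GDC_phenotype.tsv.gz.
example = gzip.compress(
    b"sample_id\tage\tstage\n"
    b"SAMPLE-A\t61\tII\n"
    b"SAMPLE-B\t74\tIII\n"
)
matrix = read_clinical_matrix(example)
n_identifiers = len(next(iter(matrix.values())))
print(len(matrix), "samples x", n_identifiers, "identifiers")
```

    For the real file, the same function could be fed the bytes of the downloaded .tsv.gz; the shape reported should then match the "570 samples x 151 identifiers" line if the first-column assumption holds.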

    TCGA-COAD.htseq_fpkm-uq.tsv

    dataset: gene expression RNAseq - HTSeq - FPKM-UQ

    cohort: GDC TCGA Colon Cancer (COAD)
    dataset ID: TCGA-COAD/Xena_Matrices/TCGA-COAD.htseq_fpkm-uq.tsv
    download: https://gdc.xenahubs.net/download/TCGA-COAD/Xena_Matrices/TCGA-COAD.htseq_fpkm-uq.tsv.gz
    samples: 512
    version: 09-14-2017
    hub: https://gdc.xenahubs.net
    type of data: gene expression RNAseq
    unit: log2(fpkm-uq+1)
    platform: Illumina
    ID/Gene Mapping: https://gdc.xenahubs.net/download/probeMaps/gencode.v22.annotation.gene.probeMap.gz
    author: Genomic Data Commons
    raw data: https://docs.gdc.cancer.gov/Data/Release_Notes/Data_Release_Notes/#data-release-80
    raw data: https://api.gdc.cancer.gov/data/
    wrangling: Data from the same sample but from different vials/portions/analytes/aliquots is averaged; data from different samples is combined into genomicMatrix; all data is then log2(x+1) transformed.
    input data format: ROWs (identifiers) x COLUMNs (samples) (i.e. genomicMatrix)
    60,484 identifiers x 512 samples
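
    The wrangling note above (average repeated measurements of a sample, then apply log2(x+1)) can be sketched as follows. This is an illustration under stated assumptions, not the hub's actual pipeline: the sample IDs and values are hypothetical, and the inverse transform is included only to show how to get back to the FPKM-UQ scale.

```python
import math
from collections import defaultdict


def average_by_sample(measurements):
    """Average values that share a sample ID.

    `measurements` is an iterable of (sample_id, value) pairs; the same
    sample may appear several times (e.g. different aliquots).
    """
    grouped = defaultdict(list)
    for sample_id, value in measurements:
        grouped[sample_id].append(value)
    return {s: sum(v) / len(v) for s, v in grouped.items()}


def log2_plus_one(x):
    """Forward transform matching the listed unit, log2(fpkm-uq+1)."""
    return math.log2(x + 1.0)


def undo_log2_plus_one(y):
    """Inverse transform: recover the value on the original scale."""
    return 2.0 ** y - 1.0


# Hypothetical values for one gene across two samples.
raw = [("SAMPLE-A", 3.0), ("SAMPLE-A", 5.0), ("SAMPLE-B", 7.0)]
averaged = average_by_sample(raw)
transformed = {s: log2_plus_one(v) for s, v in averaged.items()}
print(transformed)
```

    Note the order of operations in this sketch follows the note: averaging happens on the untransformed values, and the log transform is applied last.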

  2. Colorectal Adenocarcinoma (TCGA, PanCancer Atlas) data

    • datacatalog.mskcc.org
    Updated Nov 20, 2019
    Cite
    The Cancer Genome Atlas (TCGA) (2019). Colorectal Adenocarcinoma (TCGA, PanCancer Atlas) data [Dataset]. https://datacatalog.mskcc.org/dataset/10411
    Dataset updated
    Nov 20, 2019
    Dataset provided by
    The Cancer Genome Atlas (TCGA)
    MSK Library
    Description

    This dataset contains summary data visualizations and clinical data from a broad sampling of 594 colorectal adenocarcinomas from 594 patients. The data was gathered as part of the PanCancer Atlas initiative, which aims to answer big, overarching questions about cancer by examining the full set of tumors characterized in the robust TCGA dataset. The clinical data includes mutation count, information about mutated genes, patient demographics, disease status, tumor typing, and chromosomal gain or loss. The data set also includes copy-number segment data downloadable as .seg files and viewable via the Integrative Genomics Viewer.

  3. Formatted TCGA clinical and RNA-Seq data for colon adenocarcinoma (COAD) and...

    • data.niaid.nih.gov
    Updated Nov 23, 2021
    Cite
    Liu, Tong; Wang, Zi-Jing; Qi, Shao-Chong; Xia, Bi-Han; Zhang, Xiao-Shuang; Yang, Jin-Lin (2021). Formatted TCGA clinical and RNA-Seq data for colon adenocarcinoma (COAD) and rectum adenocarcinoma (READ) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5717484
    Dataset updated
    Nov 23, 2021
    Dataset provided by
    Department of Gastroenterology and Hepatology, Sichuan University-University of Oxford Huaxi Joint Centre for Gastrointestinal Cancer, West China Hospital, Sichuan University
    Authors
    Liu, Tong; Wang, Zi-Jing; Qi, Shao-Chong; Xia, Bi-Han; Zhang, Xiao-Shuang; Yang, Jin-Lin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    COAD/READ/COADREAD_rnaseq_fpkm.txt files contain TCGA RNA-Seq data in FPKM normalisation for colon adenocarcinoma (COAD), rectum adenocarcinoma (READ) or both combined (COADREAD).

    COAD/READ/COADREAD_rnaseq_tpm.txt files contain TCGA RNA-Seq data in TPM normalisation for colon adenocarcinoma (COAD), rectum adenocarcinoma (READ) or both combined (COADREAD).

    COAD/READ/COADREAD_clinical_raw.xlsx files contain TCGA clinical data for patients with colon adenocarcinoma (COAD), rectum adenocarcinoma (READ) or both combined (COADREAD).

    COAD/READ/COADREAD_rnaseq_clinical_raw.xlsx files contain the matched TCGA clinical and RNA-Seq data for patients with colon adenocarcinoma (COAD), rectum adenocarcinoma (READ) or both combined (COADREAD).

  4. Colorectal Adenocarcinoma (TCGA, Firehose Legacy)

    • datacatalog.mskcc.org
    Updated Sep 15, 2020
    Cite
    Broad Institute (2020). Colorectal Adenocarcinoma (TCGA, Firehose Legacy) [Dataset]. https://datacatalog.mskcc.org/dataset/10467
    Dataset updated
    Sep 15, 2020
    Dataset provided by
    Broad Institute
    MSK Library
    Description

    TCGA Colorectal Adenocarcinoma. Source data from GDAC Firehose. Previously known as TCGA Provisional.
    This dataset contains summary data visualizations and clinical data from a broad sampling of 640 carcinomas from 636 patients. The data was gathered as part of the Broad Institute of MIT and Harvard Firehose initiative, a cancer analysis pipeline. The clinical data includes mutation count, information about mutated genes, patient demographics, sample type, disease code, Adjuvant Postoperative Pharmaceutical Therapy Administered Indicator, American Joint Committee on Cancer Metastasis Stage Code, American Joint Committee on Cancer Publication Version Type, American Joint Committee on Cancer Tumor Stage Code, BRAF Gene Analysis Indicator, BRAF Gene Analysis Result, and Days to Sample Collection. The dataset includes Next-Generation Clustered Heat Maps (NG-CHM) viewable via an embedded NG-CHM Heat Map Viewer, provided by MD Anderson Cancer Center, which offers a graphical environment for exploring clustered or non-clustered heat map data. The data set also includes copy-number segment data downloadable as .seg files and viewable via the Integrative Genomics Viewer.

  5. DICOM converted Slide Microscopy images for the TCGA-COAD collection

    • zenodo.org
    bin
    Updated Aug 20, 2024
    Cite
    David Clunie; William Clifford; David Pot; Ulrike Wagner; Keyvan Farahani; Erika Kim; Andrey Fedorov (2024). DICOM converted Slide Microscopy images for the TCGA-COAD collection [Dataset]. http://doi.org/10.5281/zenodo.13346249
    Available download formats: bin
    Dataset updated
    Aug 20, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    David Clunie; William Clifford; David Pot; Ulrike Wagner; Keyvan Farahani; Erika Kim; Andrey Fedorov
    License

    Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Description

    This dataset corresponds to a collection of images and/or image-derived data available from National Cancer Institute Imaging Data Commons (IDC) [1]. This dataset was converted into DICOM representation and ingested by the IDC team. You can explore and visualize the corresponding images using IDC Portal here: TCGA-COAD. You can use the manifests included in this Zenodo record to download the content of the collection following the Download instructions below.

    Collection description

    The Cancer Genome Atlas-Colon Adenocarcinoma (TCGA-COAD) data collection is part of a larger effort to enhance the TCGA http://cancergenome.nih.gov/ data set with characterized radiological images. The Cancer Imaging Program (CIP), with the cooperation of several of the TCGA tissue-contributing institutions, has archived a large portion of the radiological images of the COAD cases.

    Please see the TCGA-COAD page to learn more about the images and to obtain any supporting metadata for this collection.

    Files included

    A manifest file's name indicates the IDC data release in which a version of collection data was first introduced. For example, collection_id-idc_v8-aws.s5cmd corresponds to the contents of the collection_id collection introduced in IDC data release v8. If there is a subsequent version of this Zenodo page, it will indicate when a subsequent version of the corresponding collection was introduced.

    1. tcga_coad-idc_v18-aws.s5cmd: manifest of files available for download from public IDC Amazon Web Services buckets
    2. tcga_coad-idc_v18-gcs.s5cmd: manifest of files available for download from public IDC Google Cloud Storage buckets
    3. tcga_coad-idc_v18-dcf.dcf: Gen3 manifest (for details see https://learn.canceridc.dev/data/organization-of-data/guids-and-uuids)

    Note that manifest files that end in -aws.s5cmd reference files stored in Amazon Web Services (AWS) buckets, while -gcs.s5cmd reference files in Google Cloud Storage. The actual files are identical and are mirrored between AWS and GCP.
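
    The naming convention described above can be checked mechanically. The sketch below splits a manifest file name into collection ID, IDC release and storage target; the regular expression is inferred from the three file names listed here and is an assumption, not an official IDC specification.

```python
import re
from typing import NamedTuple


class ManifestName(NamedTuple):
    collection_id: str
    idc_release: int
    target: str  # "aws", "gcs" or "dcf"


# Pattern inferred from names such as tcga_coad-idc_v18-aws.s5cmd.
_MANIFEST_PATTERN = re.compile(
    r"^(?P<collection>.+)-idc_v(?P<release>\d+)-(?P<target>aws|gcs|dcf)\.(?:s5cmd|dcf)$"
)


def parse_manifest_name(filename):
    """Split a manifest file name into its three components."""
    match = _MANIFEST_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"not a recognised manifest name: {filename!r}")
    return ManifestName(
        collection_id=match["collection"],
        idc_release=int(match["release"]),
        target=match["target"],
    )


parsed = parse_manifest_name("tcga_coad-idc_v18-aws.s5cmd")
print(parsed)
```

    Because the AWS and GCS manifests reference mirrored copies of the same files, only the `target` field should differ between the first two manifests of a given release.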

    Download instructions

    Each of the manifests includes instructions in the header on how to download the included files.

    To download the files using .s5cmd manifests:

    1. install idc-index package: pip install --upgrade idc-index
    2. download the files referenced by manifests included in this dataset by passing the .s5cmd manifest file: idc download manifest.s5cmd.

    To download the files using .dcf manifest, see manifest header.

    Acknowledgments

    Imaging Data Commons team has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under Task Order No. HHSN26110071 under Contract No. HHSN261201500003l.

    References

    [1] Fedorov, A., Longabaugh, W. J. R., Pot, D., Clunie, D. A., Pieper, S. D., Gibbs, D. L., Bridge, C., Herrmann, M. D., Homeyer, A., Lewis, R., Aerts, H. J. W., Krishnaswamy, D., Thiriveedhi, V. K., Ciausu, C., Schacherer, D. P., Bontempi, D., Pihl, T., Wagner, U., Farahani, K., Kim, E. & Kikinis, R. National Cancer Institute Imaging Data Commons: Toward Transparency, Reproducibility, and Scalability in Imaging Artificial Intelligence. RadioGraphics (2023). https://doi.org/10.1148/rg.230180

  6. Data from: TCGA

    • huggingface.co
    Updated May 13, 2024
    Cite
    Lab-Rasool (2024). TCGA [Dataset]. https://huggingface.co/datasets/Lab-Rasool/TCGA
    Dataset updated
    May 13, 2024
    Dataset authored and provided by
    Lab-Rasool
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    Dataset Card for The Cancer Genome Atlas (TCGA) Multimodal Dataset

    The Cancer Genome Atlas (TCGA) Multimodal Dataset is a comprehensive collection of clinical data, pathology reports, slide images, molecular data, and radiology images for cancer patients. This dataset aims to facilitate research in multimodal machine learning for oncology by providing embeddings generated using state-of-the-art models including GatorTron, MedGemma, Qwen, Llama, UNI, SeNMo, REMEDIS, and… See the full description on the dataset page: https://huggingface.co/datasets/Lab-Rasool/TCGA.

  7. TCGA-Cancer-Variant-and-Clinical-Data

    • huggingface.co
    Updated Oct 10, 2024
    + more versions
    Cite
    Seq-to-Pheno (2024). TCGA-Cancer-Variant-and-Clinical-Data [Dataset]. https://huggingface.co/datasets/seq-to-pheno/TCGA-Cancer-Variant-and-Clinical-Data
    Dataset updated
    Oct 10, 2024
    Dataset authored and provided by
    Seq-to-Pheno
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    TCGA Cancer Variant and Clinical Data

      Dataset Description
    

    This dataset combines genetic variant information at the protein level with clinical data from The Cancer Genome Atlas (TCGA) project, curated by the International Cancer Genome Consortium (ICGC). It provides a comprehensive view of protein-altering mutations and clinical characteristics across various cancer types.

      Dataset Summary
    

    The dataset includes:

    Protein sequence data for both mutated and… See the full description on the dataset page: https://huggingface.co/datasets/seq-to-pheno/TCGA-Cancer-Variant-and-Clinical-Data.

  8. TCGA-COAD.star_counts

    • kaggle.com
    zip
    Updated May 14, 2025
    Cite
    Zeynep Sonkaya (2025). TCGA-COAD.star_counts [Dataset]. https://www.kaggle.com/datasets/zzzz07/tcga-coad-star-counts
    Available download formats: zip (52939214 bytes)
    Dataset updated
    May 14, 2025
    Authors
    Zeynep Sonkaya
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Zeynep Sonkaya

    Released under Apache 2.0

    Contents

  9. TCGA COAD MSI vs MSS Prediction (JPG)

    • kaggle.com
    zip
    Updated Aug 23, 2019
    Cite
    Joan Gibert (2019). TCGA COAD MSI vs MSS Prediction (JPG) [Dataset]. https://www.kaggle.com/joangibert/tcga_coad_msi_mss_jpg
    Available download formats: zip (11756515042 bytes)
    Dataset updated
    Aug 23, 2019
    Authors
    Joan Gibert
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Context

    This dataset comes from here: Kather, Jakob Nikolas. (2019). Histological images for MSI vs. MSS classification in gastrointestinal cancer, FFPE samples [Data set]. Zenodo. http://doi.org/10.5281/zenodo.2530835

    Much of the information in this description comes either from the original dataset description or from the scientific article that uses it to predict MSI:

    Microsatellite instability determines whether patients with gastrointestinal cancer respond exceptionally well to immunotherapy. However, in clinical practice, not every patient is tested for MSI, because this requires additional genetic or immunohistochemical tests.

    Content

    This repository contains 192312 unique image patches derived from histological images of colorectal cancer and gastric cancer patients in the TCGA cohort (original whole slide SVS images are freely available at https://portal.gdc.cancer.gov/). All images in this repository are derived from formalin-fixed paraffin-embedded (FFPE) diagnostic slides ("DX" at the GDC data portal). This is explained well in this blog: http://www.andrewjanowczyk.com/download-tcga-digital-pathology-images-ffpe/

    Preprocessing: all SVS slides were preprocessed as follows:

    1. Automatic detection of tumor

    2. Resizing to 224 px x 224 px at a resolution of 0.5 µm/px

    3. Color normalization with the Macenko method (Macenko et al., 2009, http://wwwx.cs.unc.edu/~mn/sites/default/files/macenko2009.pdf)

    4. Assignment of patients to either "MSS" (microsatellite stable) or "MSIMUT" (microsatellite instable or highly mutated)

    5. Reformat the original images to JPG format (using bash command mogrify)
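
    To make step 2 concrete, the arithmetic below works out how much tissue a 224 px tile covers at 0.5 µm/px and how many pixels that corresponds to on a slide scanned at a different resolution. The native resolution of 0.25 µm/px in the usage line is a hypothetical example, not a value taken from the dataset description.

```python
TILE_PX = 224      # tile edge length in pixels, from step 2
TARGET_MPP = 0.5   # target resolution in micrometres per pixel, from step 2


def tile_edge_um():
    """Physical edge length of one tile in micrometres."""
    return TILE_PX * TARGET_MPP


def source_tile_px(native_mpp):
    """Edge length, in native-resolution pixels, of the region that
    becomes one 224 px tile after resampling to TARGET_MPP."""
    if native_mpp <= 0:
        raise ValueError("native_mpp must be positive")
    return round(TILE_PX * TARGET_MPP / native_mpp)


print(tile_edge_um(), "micrometres per tile edge")
# Hypothetical slide scanned at 0.25 micrometres per pixel.
print(source_tile_px(0.25), "native pixels per tile edge")
```

    In other words, each tile spans the same physical area regardless of scanner; only the number of source pixels that get resampled into it changes.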

    Acknowledgements

    Thanks to Jakob Nikolas Kather for the paper and the github page

    Inspiration

    This dataset targets a feature that cannot be identified by the human eye. Additional tests are currently needed to identify this set of patients, which delays the start of treatment. High sensitivity on this kind of task could greatly improve patient diagnosis and treatment.

  10. Table1_ECM–Receptor Regulatory Network and Its Prognostic Role in Colorectal...

    • frontiersin.figshare.com
    xlsx
    Updated Jun 8, 2023
    + more versions
    Cite
    Stepan Nersisyan; Victor Novosad; Narek Engibaryan; Yuri Ushkaryov; Sergey Nikulin; Alexander Tonevitsky (2023). Table1_ECM–Receptor Regulatory Network and Its Prognostic Role in Colorectal Cancer.XLSX [Dataset]. http://doi.org/10.3389/fgene.2021.782699.s003
    Available download formats: xlsx
    Dataset updated
    Jun 8, 2023
    Dataset provided by
    Frontiers Media (http://www.frontiersin.org/)
    Authors
    Stepan Nersisyan; Victor Novosad; Narek Engibaryan; Yuri Ushkaryov; Sergey Nikulin; Alexander Tonevitsky
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Interactions of the extracellular matrix (ECM) and cellular receptors constitute one of the crucial pathways involved in colorectal cancer progression and metastasis. With the use of bioinformatics analysis, we comprehensively evaluated the prognostic information concentrated in the genes from this pathway. First, we constructed an ECM–receptor regulatory network by integrating the transcription factor (TF) and 5’-isomiR interaction databases with mRNA/miRNA-seq data from The Cancer Genome Atlas Colon Adenocarcinoma (TCGA-COAD). Notably, one-third of interactions mediated by 5’-isomiRs was represented by noncanonical isomiRs (isomiRs whose 5’-end sequence did not match the canonical miRBase version). Then, exhaustive search-based feature selection was used to fit prognostic signatures composed of nodes from the network for overall survival prediction. Two reliable prognostic signatures were identified and validated on the independent The Cancer Genome Atlas Rectum Adenocarcinoma (TCGA-READ) cohort. The first signature was made up of six genes directly involved in ECM–receptor interaction: AGRN, DAG1, FN1, ITGA5, THBS3, and TNC (concordance index 0.61, logrank test p = 0.0164, 3-year ROC AUC = 0.68). The second hybrid signature was composed of three regulators: hsa-miR-32-5p, NR1H2, and SNAI1 (concordance index 0.64, logrank test p = 0.0229, 3-year ROC AUC = 0.71). While hsa-miR-32-5p exclusively regulated ECM-related genes (COL1A2 and ITGA5), NR1H2 and SNAI1 also targeted other pathways (adhesion, cell cycle, and cell division). Concordant distributions of the respective risk scores across four stages of colorectal cancer and adjacent normal mucosa additionally confirmed the reliability of the models.
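
    For readers unfamiliar with the concordance index quoted above, here is a minimal sketch of Harrell's C-index for right-censored survival data. It is a textbook formulation chosen for illustration; the description does not say which variant or software the authors used, and the toy inputs are invented.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index.

    times  : observed follow-up times
    events : 1 if the event (death) was observed, 0 if censored
    risks  : model risk scores (higher means worse predicted outcome)

    A pair (i, j) is comparable when subject i has the shorter time and
    an observed event. It is concordant when subject i also has the
    higher risk score; tied risk scores count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Invented toy data: risk ordering perfectly matches survival ordering.
c_perfect = concordance_index([1, 2, 3], [1, 1, 1], [3.0, 2.0, 1.0])
# Same data with the risk scores reversed.
c_reversed = concordance_index([1, 2, 3], [1, 1, 1], [1.0, 2.0, 3.0])
print(c_perfect, c_reversed)
```

    A value of 0.5 corresponds to random ordering, so the reported 0.61 and 0.64 indicate modest but better-than-chance discrimination.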

  11. COAD samples somatic mutation data

    • figshare.com
    • search.datacite.org
    application/gzip
    Updated Jan 19, 2016
    Cite
    Endre Sebestyén (2016). COAD samples somatic mutation data [Dataset]. http://doi.org/10.6084/m9.figshare.1061910.v1
    Available download formats: application/gzip
    Dataset updated
    Jan 19, 2016
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Endre Sebestyén
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    TCGA COAD samples somatic mutation data in BED format.
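
    Since the file is in BED format, a reader mainly needs to remember that BED coordinates are 0-based with an exclusive end. The sketch below parses one BED line and converts it to 1-based inclusive coordinates; the example line and the contents of its fourth column are hypothetical, as the description does not list the columns actually present.

```python
from typing import NamedTuple, Tuple


class BedInterval(NamedTuple):
    chrom: str
    start: int               # 0-based, inclusive
    end: int                 # 0-based, exclusive
    extra: Tuple[str, ...]   # any optional columns after the first three


def parse_bed_line(line):
    """Parse one tab-separated BED line into a BedInterval."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError("a BED line needs at least chrom, start and end")
    return BedInterval(fields[0], int(fields[1]), int(fields[2]), tuple(fields[3:]))


def to_one_based(interval):
    """Convert to 1-based, inclusive (start, end) coordinates."""
    return interval.start + 1, interval.end


# Hypothetical single-base mutation record.
record = parse_bed_line("chr12\t25245349\t25245350\tSAMPLE-A\n")
print(record.chrom, to_one_based(record), record.end - record.start, "bp")
```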

  12. TCGA-PAAD

    • huggingface.co
    Updated Dec 3, 2025
    Cite
    HLMCC (2025). TCGA-PAAD [Dataset]. https://huggingface.co/datasets/HLMCC/TCGA-PAAD
    Dataset updated
    Dec 3, 2025
    Authors
    HLMCC
    Description

    Dataset Card for TCGA-PAAD Clinical Data

      Dataset Summary
    

    The TCGA-PAAD (The Cancer Genome Atlas - Pancreatic Adenocarcinoma) clinical dataset contains clinical data related to pancreatic adenocarcinoma patients. This dataset is part of the broader TCGA project, aimed at providing comprehensive genomic and clinical data for various types of cancer. The clinical data includes information such as patient demographics, treatment history, survival data, and other clinical… See the full description on the dataset page: https://huggingface.co/datasets/HLMCC/TCGA-PAAD.

  13. DICOM converted Slide Microscopy images for the TCGA-READ collection

    • zenodo.org
    bin
    Updated Aug 20, 2024
    + more versions
    Cite
    David Clunie; William Clifford; David Pot; Ulrike Wagner; Keyvan Farahani; Erika Kim; Andrey Fedorov (2024). DICOM converted Slide Microscopy images for the TCGA-READ collection [Dataset]. http://doi.org/10.5281/zenodo.12689999
    Available download formats: bin
    Dataset updated
    Aug 20, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    David Clunie; William Clifford; David Pot; Ulrike Wagner; Keyvan Farahani; Erika Kim; Andrey Fedorov
    License

    Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Description

    This dataset corresponds to a collection of images and/or image-derived data available from National Cancer Institute Imaging Data Commons (IDC) [1]. This dataset was converted into DICOM representation and ingested by the IDC team. You can explore and visualize the corresponding images using IDC Portal here: TCGA-READ. You can use the manifests included in this Zenodo record to download the content of the collection following the Download instructions below.

    Collection description

    The Cancer Genome Atlas-Rectum Adenocarcinoma (TCGA-READ) data collection is part of a larger effort to enhance the TCGA http://cancergenome.nih.gov/ data set with characterized radiological images. The Cancer Imaging Program (CIP), with the cooperation of several TCGA tissue-contributing institutions, has archived a large portion of the radiological images of the genetically-analyzed READ cases.


    Please see the TCGA-READ wiki page to learn more about the images and to obtain any supporting metadata for this collection.

    Files included

    A manifest file's name indicates the IDC data release in which a version of collection data was first introduced. For example, collection_id-idc_v8-aws.s5cmd corresponds to the contents of the collection_id collection introduced in IDC data release v8. If there is a subsequent version of this Zenodo page, it will indicate when a subsequent version of the corresponding collection was introduced.

    1. tcga_read-idc_v8-aws.s5cmd: manifest of files available for download from public IDC Amazon Web Services buckets
    2. tcga_read-idc_v8-gcs.s5cmd: manifest of files available for download from public IDC Google Cloud Storage buckets
    3. tcga_read-idc_v8-dcf.dcf: Gen3 manifest (for details see https://learn.canceridc.dev/data/organization-of-data/guids-and-uuids)

    Note that manifest files that end in -aws.s5cmd reference files stored in Amazon Web Services (AWS) buckets, while -gcs.s5cmd reference files in Google Cloud Storage. The actual files are identical and are mirrored between AWS and GCP.

    Download instructions

    Each of the manifests includes instructions in the header on how to download the included files.

    To download the files using .s5cmd manifests:

    1. install idc-index package: pip install --upgrade idc-index
    2. download the files referenced by manifests included in this dataset by passing the .s5cmd manifest file: idc download manifest.s5cmd.

    To download the files using .dcf manifest, see manifest header.
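
    The two download steps above can also be driven from a script. The sketch below only wraps the `idc download <manifest>` command quoted in the instructions; the helper names are hypothetical, and it assumes the `idc` command-line tool becomes available on the PATH once idc-index is installed.

```python
import shutil
import subprocess
from pathlib import Path


def build_idc_download_command(manifest_path):
    """Return the argument list for `idc download <manifest>`."""
    return ["idc", "download", str(Path(manifest_path))]


def download_manifest(manifest_path):
    """Run the download, failing early if the idc CLI is missing."""
    if shutil.which("idc") is None:
        raise RuntimeError(
            "idc CLI not found; install it with: pip install --upgrade idc-index"
        )
    # check=True raises CalledProcessError if the download fails.
    subprocess.run(build_idc_download_command(manifest_path), check=True)


command = build_idc_download_command("tcga_read-idc_v8-aws.s5cmd")
print(" ".join(command))
```

    Calling `download_manifest("tcga_read-idc_v8-aws.s5cmd")` would start the actual transfer, so it is left out of the example above.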

    Acknowledgments

    Imaging Data Commons team has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under Task Order No. HHSN26110071 under Contract No. HHSN261201500003l.

    References

    [1] Fedorov, A., Longabaugh, W. J. R., Pot, D., Clunie, D. A., Pieper, S. D., Gibbs, D. L., Bridge, C., Herrmann, M. D., Homeyer, A., Lewis, R., Aerts, H. J. W., Krishnaswamy, D., Thiriveedhi, V. K., Ciausu, C., Schacherer, D. P., Bontempi, D., Pihl, T., Wagner, U., Farahani, K., Kim, E. & Kikinis, R. National Cancer Institute Imaging Data Commons: Toward Transparency, Reproducibility, and Scalability in Imaging Artificial Intelligence. RadioGraphics (2023). https://doi.org/10.1148/rg.230180

  14. Table1_Identification of Hub Genes in Colorectal Adenocarcinoma by...

    • datasetcatalog.nlm.nih.gov
    • frontiersin.figshare.com
    Updated May 27, 2022
    + more versions
    Cite
    Ye, Shujun; Ma, Lianjun; Chen, Lanlan; Liu, Yang; Meng, Xiangbo (2022). Table1_Identification of Hub Genes in Colorectal Adenocarcinoma by Integrated Bioinformatics.XLSX [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000400732
    Dataset updated
    May 27, 2022
    Authors
    Ye, Shujun; Ma, Lianjun; Chen, Lanlan; Liu, Yang; Meng, Xiangbo
    Description

    An improved understanding of the molecular mechanism of colorectal adenocarcinoma is necessary to predict the prognosis and develop new target gene therapy strategies. This study aims to identify hub genes associated with colorectal adenocarcinoma and further analyze their prognostic significance. In this study, The Cancer Genome Atlas (TCGA) COAD-READ database and the gene expression profiles of GSE25070 from the Gene Expression Omnibus were collected to explore the differentially expressed genes between colorectal adenocarcinoma and normal tissues. The weighted gene co-expression network analysis (WGCNA) and differential expression analysis identified 82 differentially co-expressed genes in the collected datasets. Enrichment analysis was applied to explore the regulated signaling pathway in colorectal adenocarcinoma. In addition, 10 hub genes were identified in the protein–protein interaction (PPI) network by using the cytoHubba plug-in of Cytoscape, of which five genes were further shown to be significantly related to the survival rate. Compared with normal tissues, the expression of all five genes was downregulated in the GSE110224 dataset. Subsequently, the expression of the five hub genes was confirmed by the Human Protein Atlas database. Finally, we used Cox regression analysis to identify genes associated with prognosis, and a 3-gene signature (CLCA1–CLCA4–GUCA2A) was constructed to predict the prognosis of patients with colorectal cancer. In conclusion, our study revealed that the five hub genes and the CLCA1–CLCA4–GUCA2A signature are highly correlated with the development of colorectal adenocarcinoma and can serve as promising prognostic factors to predict the overall survival rate of patients.

  15. Results of GSVA for TCGA-COAD.

    • plos.figshare.com
    xls
    Updated Jul 18, 2025
    Cite
    Yongling Wang; Zan Yuan; Yi Lao; Jiangtao He; Shufen Mo; Kangbiao Chen; Yanyan Ye; Lu Huang (2025). Results of GSVA for TCGA-COAD. [Dataset]. http://doi.org/10.1371/journal.pone.0328560.t005
    Available download formats: xls
    Dataset updated
    Jul 18, 2025
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Yongling Wang; Zan Yuan; Yi Lao; Jiangtao He; Shufen Mo; Kangbiao Chen; Yanyan Ye; Lu Huang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: The exact mechanisms driving colorectal cancer (CRC) are yet to be fully elucidated. This study aims to confirm the reliability of a prognostic model for colon adenocarcinoma (COAD) by analyzing the varied expression levels of Glycolysis & Pyroptosis-Related Differentially Expressed Genes (G&PRDEGs) in COAD using bioinformatics tools.

    Methods: We retrieved gene expression data and clinical details for COAD patients from the Cancer Genome Atlas (TCGA) database. These data were analyzed to categorize the samples into pyroptosis-positive and pyroptosis-negative groups based on their expression of G&PRDEGs. A prognostic model for COAD was then developed using LASSO Cox regression analysis, focusing on these differentially expressed genes (DEGs). Kaplan-Meier curves were plotted to assess the differences in survival between the two groups. Furthermore, we conducted multivariate Cox regression analyses to evaluate the influence of clinical parameters and model-derived risk scores. Analyses of pathway enrichment were performed using R software, alongside single-sample gene-set enrichment analysis (ssGSEA) to explore the role of immune cells and functions associated with G&PRDEGs.

    Results: A predictive model was developed using 53 G&PRDEGs that were expressed differentially. An examination of survival rates revealed that the high-risk groups exhibited a noticeably diminished overall survival (OS) in comparison to the low-risk groups in the TCGA database (P

  16. Manual tumor annotations in TCGA

    • zenodo.org
    zip
    Updated Oct 11, 2021
    Cite
    Chiara Loeffler; Jakob Nikolas Kather (2021). Manual tumor annotations in TCGA [Dataset]. http://doi.org/10.5281/zenodo.5320076
    Available download formats: zip
    Dataset updated
    Oct 11, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Chiara Loeffler; Jakob Nikolas Kather
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    What is this

    These are manual annotations of tumor tissue on TCGA diagnostic whole slide images in major solid tumor types. The aim of this project was to enrich for regions with invasive tumor tissue for subsequent molecular prediction studies, excluding whitespace, artifacts and non-tumor tissue as efficiently as possible. The aim was not to create a perfect tumor annotation on the pixel level. Annotations were done by trained observers using QuPath v0.1.2 and were converted to CSV. "COAD" and "READ" were merged to "CRC".

    More resources

    Legal

    No guarantees, no liability.

  17. caArray_EXP-620: TCGA (Coad): Analysis of DNA Methylation for COAD using...

    • ebi.ac.uk
    Updated May 13, 2015
    Cite
    Mervi Heiskanen; Peter Laird (2015). caArray_EXP-620: TCGA (Coad): Analysis of DNA Methylation for COAD using Illumina Infinium HumanMethylation450 platform (Jhu-usc) [Dataset]. https://www.ebi.ac.uk/biostudies/studies/E-GEOD-68838
    Explore at:
    Dataset updated
    May 13, 2015
    Authors
    Mervi Heiskanen; Peter Laird
    Description

    TCGA Analysis of DNA Methylation for COAD using Illumina Infinium HumanMethylation450 platform (EXP-620). Assay Type: Methylation. Provider: Illumina. Array Designs: jhu-usc.edu_TCGA_HumanMethylation450. Organism: Homo sapiens (ncbitax). Tissue Sites: Colon. Material Types: Control Analyte, Solid normal_tissue, organism_part, Primary solid_tumor.

  18. TCGA Stomach histological images

    • kaggle.com
    zip
    Updated Jan 14, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ahmed Jaber Abdelaziz (2025). TCGA Stomach histological images [Dataset]. https://www.kaggle.com/datasets/ahmedaboenaba/tcga-stomach-histological-images
    Explore at:
    Available download formats: zip (601366221 bytes)
    Dataset updated
    Jan 14, 2025
    Authors
    Ahmed Jaber Abdelaziz
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    The dataset comprises histology images of stomach cancer sourced from The Cancer Genome Atlas (TCGA).

    Image Specifications

    • Original Resolution: 512 × 512 pixel images, extracted at a resolution of 0.5 microns per pixel.
    • Processed Size: Images are resized to 224 × 224 pixels and saved as JPEG files.
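The effect of the resize on physical scale can be worked out from the two figures above. The sketch below assumes the patches are square and that the 0.5 microns-per-pixel figure applies to the 512 × 512 originals; the function names are illustrative and not part of the dataset.

```python
def field_of_view_um(mpp: float, pixels: int) -> float:
    """Side length of a square patch in microns."""
    return mpp * pixels


def effective_mpp(source_mpp: float, source_px: int, target_px: int) -> float:
    """Microns per pixel after resizing a square patch from source_px to target_px."""
    return source_mpp * source_px / target_px


SOURCE_MPP = 0.5   # stated extraction resolution (microns per pixel)
SOURCE_PX = 512    # stated original patch size
TARGET_PX = 224    # stated processed patch size

# Each patch covers the same tissue area before and after resizing.
fov = field_of_view_um(SOURCE_MPP, SOURCE_PX)
# After downsampling, each pixel covers more tissue.
resized_mpp = effective_mpp(SOURCE_MPP, SOURCE_PX, TARGET_PX)

print(f"field of view: {fov} microns per side")
print(f"effective resolution after resize: {resized_mpp:.3f} microns per pixel")
```

Under these assumptions each patch spans 256 microns per side, and the resized images sit at roughly 1.14 microns per pixel.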

    The dataset is provided as a zipped file. Within the zip file, images are organized into two subfolders:

    • tumour
    • non-tumour

    Each image filename encodes the originating slide and the patch position within the slide, following this naming convention:

  19. PIVOT - COAD (light)

    • zenodo.org
    application/gzip
    Updated Jan 25, 2022
    Cite
    Malvika Sudhakar; Raghunathan Rengaswamy; Karthik Raman (2022). PIVOT - COAD (light) [Dataset]. http://doi.org/10.5281/zenodo.5898163
    Explore at:
    Available download formats: application/gzip
    Dataset updated
    Jan 25, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Malvika Sudhakar; Raghunathan Rengaswamy; Karthik Raman
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Pre-processed TCGA COAD data used for PIVOT analysis.

  20. DataSheet3_Based on cuproptosis-related lncRNAs, a novel prognostic...

    • frontiersin.figshare.com
    pdf
    Updated Jun 12, 2023
    + more versions
    Cite
    Chong Li; Keqian Zhang; Yuzhu Gong; Qinan Wu; Yanyan Zhang; Yan Dong; Dejia Li; Zhe Wang (2023). DataSheet3_Based on cuproptosis-related lncRNAs, a novel prognostic signature for colon adenocarcinoma prognosis, immunotherapy, and chemotherapy response.PDF [Dataset]. http://doi.org/10.3389/fphar.2023.1200054.s003
    Explore at:
    Available download formats: pdf
    Dataset updated
    Jun 12, 2023
    Dataset provided by
    Frontiers Media (http://www.frontiersin.org/)
    Authors
    Chong Li; Keqian Zhang; Yuzhu Gong; Qinan Wu; Yanyan Zhang; Yan Dong; Dejia Li; Zhe Wang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction: Colon adenocarcinoma (COAD) is a pathological subtype of colorectal cancer (CRC) characterized by highly heterogeneous solid tumors and poor prognosis, and novel biomarkers are urgently required to guide its prognosis.
    Material and methods: RNA-Seq data of COAD were downloaded from The Cancer Genome Atlas (TCGA) database to determine cuproptosis-related lncRNAs (CRLs) using weighted gene co-expression network analysis (WGCNA). Pathway scores were calculated by single-sample gene set enrichment analysis (ssGSEA). CRLs that affected prognosis were identified via univariate Cox regression analysis, and a prognostic model was developed using multivariate Cox regression analysis and LASSO regression analysis. The model was assessed with Kaplan–Meier (K-M) survival analysis and receiver operating characteristic (ROC) curves and validated in GSE39582 and GSE17538. The tumor microenvironment (TME), single nucleotide variants (SNV), and immunotherapy response/chemotherapy sensitivity were assessed in high- and low-score subgroups. Finally, a nomogram was constructed to predict survival rates of COAD patients at 1, 3, and 5 years.
    Results: We found that a high cuproptosis score reduced the survival rates of COAD significantly. A total of five CRLs affecting prognosis were identified: AC008494.3, EIF3J-DT, AC016027.1, AL731533.2, and ZEB1-AS1. The ROC curve showed that RiskScore could perform well in predicting the prognosis of COAD. Meanwhile, we found that RiskScore showed good ability in assessing immunotherapy and chemotherapy sensitivity. Finally, the nomogram and decision curves showed that RiskScore would be a powerful predictor for COAD.
    Conclusion: A novel prognostic model was constructed using CRLs in COAD, and the CRLs in the model are potential therapeutic targets. Based on this study, RiskScore was an independent predictor of prognosis, immunotherapy response, and chemotherapy sensitivity in COAD, providing a new scientific basis for COAD prognosis management.
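
A risk score of the kind described in this abstract is commonly computed as a weighted sum of expression values, with patients then split into high- and low-score subgroups. The sketch below illustrates that general pattern only: the coefficients and expression values are made up for illustration and are NOT the study's fitted values, and the median cutoff is an assumption, since the abstract does not state how the subgroups were defined.

```python
from statistics import median

# HYPOTHETICAL coefficients for the five CRLs named in the abstract.
# These numbers are placeholders, not values reported by the study.
COEFFICIENTS = {
    "AC008494.3": 0.4,
    "EIF3J-DT": -0.3,
    "AC016027.1": 0.2,
    "AL731533.2": 0.5,
    "ZEB1-AS1": 0.1,
}


def risk_score(expression: dict, coefficients: dict = COEFFICIENTS) -> float:
    """Weighted sum of expression values over the genes in the model."""
    return sum(coef * expression[gene] for gene, coef in coefficients.items())


def split_by_median(scores: dict) -> dict:
    """Label each patient 'high' if strictly above the median score, else 'low'."""
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Made-up expression profiles for three hypothetical patients.
patients = {
    "P1": {gene: 1.0 for gene in COEFFICIENTS},
    "P2": {gene: 0.0 for gene in COEFFICIENTS},
    "P3": {gene: 2.0 for gene in COEFFICIENTS},
}

scores = {pid: risk_score(expr) for pid, expr in patients.items()}
groups = split_by_median(scores)
print(scores)
print(groups)
```

In a real analysis the coefficients would come from the fitted Cox model, and the cutoff would follow whatever rule the authors specify.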

Cite
Inge Seim (2020). Historical NCI Genomic Data Commons data (09-14-2017) [Dataset]. http://doi.org/10.5281/zenodo.1186945

Historical NCI Genomic Data Commons data (09-14-2017)

Explore at:
Available download formats: tsv
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Inge Seim
License

Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically

Description

Historical NCI Genomic Data Commons data (v09-14-2017). Clinical ('phenotype') and gene expression (HTSeq FPKM-UQ).

TCGA-COAD.GDC_phenotype.tsv

dataset: phenotype - Phenotype

cohort: GDC TCGA Colon Cancer (COAD)
dataset ID: TCGA-COAD/Xena_Matrices/TCGA-COAD.GDC_phenotype.tsv
download: https://gdc.xenahubs.net/download/TCGA-COAD/Xena_Matrices/TCGA-COAD.GDC_phenotype.tsv.gz; Full metadata
samples: 570
version: 11-27-2017
hub: https://gdc.xenahubs.net
type of data: phenotype
author: Genomic Data Commons
raw data: https://docs.gdc.cancer.gov/Data/Release_Notes/Data_Release_Notes/#data-release-90
raw data: https://api.gdc.cancer.gov/data/
input data format: ROWs (samples) x COLUMNs (identifiers) (i.e. clinicalMatrix)
570 samples X 151 identifiers
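
A clinicalMatrix in the layout described above (one row per sample, one column per identifier) can be read with the Python standard library. This is a minimal sketch on an in-memory stand-in: the column names `sample_id`, `gender`, and `age` and the sample IDs are hypothetical and are not taken from the actual phenotype file.

```python
import csv
import io

# Tiny stand-in for a clinicalMatrix TSV: rows are samples, columns are identifiers.
# Column names and values are HYPOTHETICAL.
SAMPLE_TSV = "sample_id\tgender\tage\nS1\tmale\t61\nS2\tfemale\t54\n"


def read_clinical_matrix(handle, id_column: str) -> dict:
    """Map each sample ID to a dict of its remaining identifier columns."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {
        row[id_column]: {k: v for k, v in row.items() if k != id_column}
        for row in reader
    }


phenotype = read_clinical_matrix(io.StringIO(SAMPLE_TSV), "sample_id")
print(phenotype["S1"])
```

For the gzipped download, the same function should work on a handle from `gzip.open(path, "rt")`, once the real name of the sample ID column has been checked in the file header.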

TCGA-COAD.htseq_fpkm-uq.tsv

dataset: gene expression RNAseq - HTSeq - FPKM-UQ

cohort: GDC TCGA Colon Cancer (COAD)
dataset ID: TCGA-COAD/Xena_Matrices/TCGA-COAD.htseq_fpkm-uq.tsv
download: https://gdc.xenahubs.net/download/TCGA-COAD/Xena_Matrices/TCGA-COAD.htseq_fpkm-uq.tsv.gz; Full metadata
samples: 512
version: 09-14-2017
hub: https://gdc.xenahubs.net
type of data: gene expression RNAseq
unit: log2(fpkm-uq+1)
platform: Illumina
ID/Gene Mapping: https://gdc.xenahubs.net/download/probeMaps/gencode.v22.annotation.gene.probeMap.gz; Full metadata
author: Genomic Data Commons
raw data: https://docs.gdc.cancer.gov/Data/Release_Notes/Data_Release_Notes/#data-release-80
raw data: https://api.gdc.cancer.gov/data/
wrangling: Data from the same sample but from different vials/portions/analytes/aliquots is averaged; data from different samples is combined into genomicMatrix; all data is then log2(x+1) transformed.
input data format: ROWs (identifiers) x COLUMNs (samples) (i.e. genomicMatrix)
60,484 identifiers X 512 samples
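
The wrangling steps listed for the expression matrix (average values from the same sample, then apply log2(x+1)) can be sketched for a single gene as follows. The sample IDs and FPKM-UQ values are made up, and the helper names are illustrative; this is not the pipeline the hub actually runs.

```python
import math
from collections import defaultdict


def average_by_sample(aliquot_values):
    """Average values that share a sample ID (e.g. several aliquots of one sample)."""
    grouped = defaultdict(list)
    for sample_id, value in aliquot_values:
        grouped[sample_id].append(value)
    return {sample_id: sum(vals) / len(vals) for sample_id, vals in grouped.items()}


def log2_plus_one(x: float) -> float:
    """Forward transform used for the stored unit, log2(fpkm-uq+1)."""
    return math.log2(x + 1)


def inverse_log2_plus_one(y: float) -> float:
    """Recover the untransformed value from a stored log2(x+1) value."""
    return 2 ** y - 1


# HYPOTHETICAL raw FPKM-UQ values for one gene: sample S1 has two aliquots.
raw = [("S1", 6.0), ("S1", 8.0), ("S2", 0.0)]

averaged = average_by_sample(raw)   # S1 averages to 7.0
transformed = {s: log2_plus_one(v) for s, v in averaged.items()}
print(transformed)
```

The inverse helper is useful when a downstream tool expects linear-scale values rather than the log-transformed matrix.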
