100+ datasets found

Historical NCI Genomic Data Commons data (09-14-2017)
zenodo.org
data-staging.niaid.nih.gov
tsv
Updated Jan 24, 2020
Cite
Inge Seim (2020). Historical NCI Genomic Data Commons data (09-14-2017) [Dataset]. http://doi.org/10.5281/zenodo.1186945
Explore at:
Available download formats: tsv
Unique identifier
https://doi.org/10.5281/zenodo.1186945
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Inge Seim
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Historical NCI Genomic Data Commons data (v09-14-2017). Clinical ('phenotype') and gene expression (HTSeq FPKM-UQ).

TCGA-COAD.GDC_phenotype.tsv

dataset: phenotype - Phenotype

cohort: GDC TCGA Colon Cancer (COAD)
dataset ID: TCGA-COAD/Xena_Matrices/TCGA-COAD.GDC_phenotype.tsv
download: https://gdc.xenahubs.net/download/TCGA-COAD/Xena_Matrices/TCGA-COAD.GDC_phenotype.tsv.gz
samples: 570
version: 11-27-2017
hub: https://gdc.xenahubs.net
type of data: phenotype
author: Genomic Data Commons
raw data: https://docs.gdc.cancer.gov/Data/Release_Notes/Data_Release_Notes/#data-release-90
raw data: https://api.gdc.cancer.gov/data/
input data format: ROWs (samples) x COLUMNs (identifiers) (i.e. clinicalMatrix)
570 samples X 151 identifiers

TCGA-COAD.htseq_fpkm-uq.tsv

dataset: gene expression RNAseq - HTSeq - FPKM-UQ

cohort: GDC TCGA Colon Cancer (COAD)
dataset ID: TCGA-COAD/Xena_Matrices/TCGA-COAD.htseq_fpkm-uq.tsv
download: https://gdc.xenahubs.net/download/TCGA-COAD/Xena_Matrices/TCGA-COAD.htseq_fpkm-uq.tsv.gz
samples: 512
version: 09-14-2017
hub: https://gdc.xenahubs.net
type of data: gene expression RNAseq
unit: log2(fpkm-uq+1)
platform: Illumina
ID/Gene Mapping: https://gdc.xenahubs.net/download/probeMaps/gencode.v22.annotation.gene.probeMap.gz
author: Genomic Data Commons
raw data: https://docs.gdc.cancer.gov/Data/Release_Notes/Data_Release_Notes/#data-release-80
raw data: https://api.gdc.cancer.gov/data/
wrangling: Data from the same sample but from different vials/portions/analytes/aliquots is averaged; data from different samples is combined into genomicMatrix; all data is then log2(x+1) transformed.
input data format: ROWs (identifiers) x COLUMNs (samples) (i.e. genomicMatrix)
60,484 identifiers X 512 samples
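The wrangling described for this expression matrix (average values that belong to the same sample, then apply log2(x+1)) can be sketched roughly as below. This is only an illustration of the arithmetic, not the pipeline used to build the file; the function names, the sample IDs, and the input layout (one list of `(sample_id, value)` pairs per gene) are assumptions made for the example.

```python
import math
from collections import defaultdict


def wrangle(aliquot_values):
    """Average aliquot-level values per sample, then apply log2(x + 1).

    aliquot_values: iterable of (sample_id, fpkm_uq) pairs for one gene.
    Returns a dict mapping sample_id to log2(mean + 1).
    """
    grouped = defaultdict(list)
    for sample_id, value in aliquot_values:
        grouped[sample_id].append(value)
    return {
        sample_id: math.log2(sum(values) / len(values) + 1)
        for sample_id, values in grouped.items()
    }


def to_linear(log_value):
    """Invert the transform: recover the linear value from log2(x + 1)."""
    return 2 ** log_value - 1


# Hypothetical values for a single gene; the sample IDs are made up.
example = [
    ("TCGA-XX-0001-01A", 3.0),
    ("TCGA-XX-0001-01A", 5.0),
    ("TCGA-XX-0002-01A", 0.0),
]
result = wrangle(example)
```

Because the stored unit is log2(fpkm-uq+1), a helper like `to_linear` is needed before any analysis that expects linear-scale values.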
Colorectal Adenocarcinoma (TCGA, PanCancer Atlas) data
datacatalog.mskcc.org
Updated Nov 20, 2019
Cite
The Cancer Genome Atlas (TCGA) (2019). Colorectal Adenocarcinoma (TCGA, PanCancer Atlas) data [Dataset]. https://datacatalog.mskcc.org/dataset/10411
Explore at:
Dataset updated
Nov 20, 2019
Dataset provided by
The Cancer Genome Atlas (TCGA)
MSK Library
Description
This dataset contains summary data visualizations and clinical data from a broad sampling of 594 colorectal adenocarcinomas from 594 patients. The data was gathered as part of the PanCancer Atlas initiative, which aims to answer big, overarching questions about cancer by examining the full set of tumors characterized in the robust TCGA dataset. The clinical data includes mutation count, information about mutated genes, patient demographics, disease status, tumor typing, and chromosomal gain or loss. The data set also includes copy-number segment data downloadable as .seg files and viewable via the Integrative Genomics Viewer.
Formatted TCGA clinical and RNA-Seq data for colon adenocarcinoma (COAD) and...
data.niaid.nih.gov
Updated Nov 23, 2021
Cite
Liu, Tong; Wang, Zi-Jing; Qi, Shao-Chong; Xia, Bi-Han; Zhang, Xiao-Shuang; Yang, Jin-Lin (2021). Formatted TCGA clinical and RNA-Seq data for colon adenocarcinoma (COAD) and rectum adenocarcinoma (READ) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5717484
Explore at:
Dataset updated
Nov 23, 2021
Dataset provided by
Department of Gastroenterology and Hepatology, Sichuan University-University of Oxford Huaxi Joint Centre for Gastrointestinal Cancer, West China Hospital, Sichuan University
Authors
Liu, Tong; Wang, Zi-Jing; Qi, Shao-Chong; Xia, Bi-Han; Zhang, Xiao-Shuang; Yang, Jin-Lin
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
COAD/READ/COADREAD_rnaseq_fpkm.txt files contain TCGA RNA-Seq data in FPKM normalisation for colon adenocarcinoma (COAD), rectum adenocarcinoma (READ) or the two combined (COADREAD).

COAD/READ/COADREAD_rnaseq_tpm.txt files contain TCGA RNA-Seq data in TPM normalisation for colon adenocarcinoma (COAD), rectum adenocarcinoma (READ) or the two combined (COADREAD).

COAD/READ/COADREAD_clinical_raw.xlsx files contain TCGA clinical data for patients with colon adenocarcinoma (COAD), rectum adenocarcinoma (READ) or the two combined (COADREAD).

COAD/READ/COADREAD_rnaseq_clinical_raw.xlsx files contain the matched TCGA clinical and RNA-Seq data for patients with colon adenocarcinoma (COAD), rectum adenocarcinoma (READ) or the two combined (COADREAD).
Colorectal Adenocarcinoma (TCGA, Firehose Legacy)
datacatalog.mskcc.org
Updated Sep 15, 2020
Cite
Broad Institute (2020). Colorectal Adenocarcinoma (TCGA, Firehose Legacy) [Dataset]. https://datacatalog.mskcc.org/dataset/10467
Explore at:
Dataset updated
Sep 15, 2020
Dataset provided by
Broad Institute
MSK Library
Description
TCGA Colorectal Adenocarcinoma. Source data from GDAC Firehose. Previously known as TCGA Provisional.
This dataset contains summary data visualizations and clinical data from a broad sampling of 640 carcinomas from 636 patients. The data was gathered as part of the Broad Institute of MIT and Harvard Firehose initiative, a cancer analysis pipeline. The clinical data includes mutation count, information about mutated genes, patient demographics, sample type, disease code, Adjuvant Postoperative Pharmaceutical Therapy Administered Indicator, American Joint Committee on Cancer Metastasis Stage Code, American Joint Committee on Cancer Publication Version Type, American Joint Committee on Cancer Tumor Stage Code, BRAF Gene Analysis Indicator, BRAF Gene Analysis Result, and Days to Sample Collection. The dataset includes Next-Generation Clustered Heat Maps (NG-CHM) viewable via an embedded NG-CHM Heat Map Viewer, provided by MD Anderson Cancer Center, which provides a graphical environment for exploration of clustered or non-clustered heat map data. The data set also includes copy-number segment data downloadable as .seg files and viewable via the Integrative Genomics Viewer.
DICOM converted Slide Microscopy images for the TCGA-COAD collection
zenodo.org
bin
Updated Aug 20, 2024
Cite
David Clunie; William Clifford; David Pot; Ulrike Wagner; Keyvan Farahani; Erika Kim; Andrey Fedorov (2024). DICOM converted Slide Microscopy images for the TCGA-COAD collection [Dataset]. http://doi.org/10.5281/zenodo.13346249
Explore at:
Available download formats: bin
Unique identifier
https://doi.org/10.5281/zenodo.13346249
Dataset updated
Aug 20, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
David Clunie; William Clifford; David Pot; Ulrike Wagner; Keyvan Farahani; Erika Kim; Andrey Fedorov
License
Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
Description
This dataset corresponds to a collection of images and/or image-derived data available from National Cancer Institute Imaging Data Commons (IDC) [1]. This dataset was converted into DICOM representation and ingested by the IDC team. You can explore and visualize the corresponding images using IDC Portal here: TCGA-COAD. You can use the manifests included in this Zenodo record to download the content of the collection following the Download instructions below.

Collection description

The Cancer Genome Atlas-Colon Adenocarcinoma (TCGA-COAD) data collection is part of a larger effort to enhance the TCGA http://cancergenome.nih.gov/ data set with characterized radiological images. The Cancer Imaging Program (CIP), with the cooperation of several of the TCGA tissue-contributing institutions, has archived a large portion of the radiological images of the COAD cases.

Please see the TCGA-COAD page to learn more about the images and to obtain any supporting metadata for this collection.

Files included

A manifest file's name indicates the IDC data release in which a version of collection data was first introduced. For example, collection_id-idc_v8-aws.s5cmd corresponds to the contents of the collection_id collection introduced in IDC data release v8. If there is a subsequent version of this Zenodo page, it will indicate when a subsequent version of the corresponding collection was introduced.

tcga_coad-idc_v18-aws.s5cmd: manifest of files available for download from public IDC Amazon Web Services buckets

tcga_coad-idc_v18-gcs.s5cmd: manifest of files available for download from public IDC Google Cloud Storage buckets

tcga_coad-idc_v18-dcf.dcf: Gen3 manifest (for details see https://learn.canceridc.dev/data/organization-of-data/guids-and-uuids)

Note that manifest files that end in -aws.s5cmd reference files stored in Amazon Web Services (AWS) buckets, while -gcs.s5cmd reference files in Google Cloud Storage. The actual files are identical and are mirrored between AWS and GCP.
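The naming convention described above can be unpacked programmatically. The sketch below is a guess at a suitable pattern based only on the file names listed in this record; the regular expression and the function name `parse_manifest_name` are assumptions for illustration, not part of any IDC tooling.

```python
import re

# Pattern inferred from names such as "tcga_coad-idc_v18-aws.s5cmd";
# this is an assumption, not an official IDC specification.
MANIFEST_RE = re.compile(
    r"^(?P<collection>.+)-idc_v(?P<release>\d+)-(?P<target>aws|gcs|dcf)"
    r"\.(?P<ext>s5cmd|dcf)$"
)


def parse_manifest_name(filename):
    """Split a manifest file name into collection, release, target and extension."""
    match = MANIFEST_RE.match(filename)
    if match is None:
        raise ValueError(f"unrecognised manifest name: {filename!r}")
    info = match.groupdict()
    info["release"] = int(info["release"])  # e.g. 18 for idc_v18
    return info


aws_manifest = parse_manifest_name("tcga_coad-idc_v18-aws.s5cmd")
gen3_manifest = parse_manifest_name("tcga_coad-idc_v18-dcf.dcf")
```

Here `target` distinguishes the AWS and Google Cloud Storage mirrors from the Gen3 manifest, and `release` is the IDC data release in which that version of the collection first appeared.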

Download instructions

Each of the manifests includes instructions in the header on how to download the included files.

To download the files using .s5cmd manifests:

install idc-index package: pip install --upgrade idc-index

download the files referenced by manifests included in this dataset by passing the .s5cmd manifest file: idc download manifest.s5cmd.

To download the files using .dcf manifest, see manifest header.

Acknowledgments

Imaging Data Commons team has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under Task Order No. HHSN26110071 under Contract No. HHSN261201500003l.

References

[1] Fedorov, A., Longabaugh, W. J. R., Pot, D., Clunie, D. A., Pieper, S. D., Gibbs, D. L., Bridge, C., Herrmann, M. D., Homeyer, A., Lewis, R., Aerts, H. J. W., Krishnaswamy, D., Thiriveedhi, V. K., Ciausu, C., Schacherer, D. P., Bontempi, D., Pihl, T., Wagner, U., Farahani, K., Kim, E. & Kikinis, R. National Cancer Institute Imaging Data Commons: Toward Transparency, Reproducibility, and Scalability in Imaging Artificial Intelligence. RadioGraphics (2023). https://doi.org/10.1148/rg.230180
Data from: TCGA
huggingface.co
Updated May 13, 2024
Cite
Lab-Rasool (2024). TCGA [Dataset]. https://huggingface.co/datasets/Lab-Rasool/TCGA
Explore at:
Dataset updated
May 13, 2024
Dataset authored and provided by
Lab-Rasool
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Description
Dataset Card for The Cancer Genome Atlas (TCGA) Multimodal Dataset

The Cancer Genome Atlas (TCGA) Multimodal Dataset is a comprehensive collection of clinical data, pathology reports, slide images, molecular data, and radiology images for cancer patients. This dataset aims to facilitate research in multimodal machine learning for oncology by providing embeddings generated using state-of-the-art models including GatorTron, MedGemma, Qwen, Llama, UNI, SeNMo, REMEDIS, and… See the full description on the dataset page: https://huggingface.co/datasets/Lab-Rasool/TCGA.
TCGA-Cancer-Variant-and-Clinical-Data
huggingface.co
Updated Oct 10, 2024
Cite
Seq-to-Pheno (2024). TCGA-Cancer-Variant-and-Clinical-Data [Dataset]. https://huggingface.co/datasets/seq-to-pheno/TCGA-Cancer-Variant-and-Clinical-Data
Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Oct 10, 2024
Dataset authored and provided by
Seq-to-Pheno
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
TCGA Cancer Variant and Clinical Data

Dataset Description

This dataset combines genetic variant information at the protein level with clinical data from The Cancer Genome Atlas (TCGA) project, curated by the International Cancer Genome Consortium (ICGC). It provides a comprehensive view of protein-altering mutations and clinical characteristics across various cancer types.

Dataset Summary

The dataset includes:

Protein sequence data for both mutated and… See the full description on the dataset page: https://huggingface.co/datasets/seq-to-pheno/TCGA-Cancer-Variant-and-Clinical-Data.
TCGA-COAD.star_counts
kaggle.com
zip
Updated May 14, 2025
Cite
Zeynep Sonkaya (2025). TCGA-COAD.star_counts [Dataset]. https://www.kaggle.com/datasets/zzzz07/tcga-coad-star-counts
Explore at:
Available download formats: zip (52939214 bytes)
Dataset updated
May 14, 2025
Authors
Zeynep Sonkaya
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Dataset

This dataset was created by Zeynep Sonkaya

Released under Apache 2.0

Contents
TCGA COAD MSI vs MSS Prediction (JPG)
kaggle.com
zip
Updated Aug 23, 2019
Cite
Joan Gibert (2019). TCGA COAD MSI vs MSS Prediction (JPG) [Dataset]. https://www.kaggle.com/joangibert/tcga_coad_msi_mss_jpg
Explore at:
Available download formats: zip (11756515042 bytes)
Dataset updated
Aug 23, 2019
Authors
Joan Gibert
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Context

This dataset comes from here: Kather, Jakob Nikolas. (2019). Histological images for MSI vs. MSS classification in gastrointestinal cancer, FFPE samples [Data set]. Zenodo. http://doi.org/10.5281/zenodo.2530835

Much of the information in this description comes either from the original dataset description or from the scientific article that uses it to predict MSI:

Microsatellite instability determines whether patients with gastrointestinal cancer respond exceptionally well to immunotherapy. However, in clinical practice, not every patient is tested for MSI, because this requires additional genetic or immunohistochemical tests.

Content

This repository contains 192312 unique image patches derived from histological images of colorectal cancer and gastric cancer patients in the TCGA cohort (original whole slide SVS images are freely available at https://portal.gdc.cancer.gov/). All images in this repository are derived from formalin-fixed paraffin-embedded (FFPE) diagnostic slides ("DX" at the GDC data portal). This is explained well in this blog: http://www.andrewjanowczyk.com/download-tcga-digital-pathology-images-ffpe/

Preprocessing

All SVS slides were preprocessed as follows:

1. Automatic detection of tumor

2. Resizing to 224 px x 224 px at a resolution of 0.5 µm/px

3. Color normalization with the Macenko method (Macenko et al., 2009, http://wwwx.cs.unc.edu/~mn/sites/default/files/macenko2009.pdf)

4. Assignment of patients to either "MSS" (microsatellite stable) or "MSIMUT" (microsatellite instable or highly mutated)

5. Reformat the original images to JPG format (using bash command mogrify)

Acknowledgements

Thanks to Jakob Nikolas Kather for the paper and the GitHub page.

Inspiration

This dataset tries to analyze a feature that is actually impossible to identify using the human eye. Additional tests are needed to identify this set of patients, which delays the start of treatment. High sensitivity on this kind of task could greatly improve patient diagnosis and treatment.
Table1_ECM–Receptor Regulatory Network and Its Prognostic Role in Colorectal...
frontiersin.figshare.com
xlsx
Updated Jun 8, 2023
Cite
Stepan Nersisyan; Victor Novosad; Narek Engibaryan; Yuri Ushkaryov; Sergey Nikulin; Alexander Tonevitsky (2023). Table1_ECM–Receptor Regulatory Network and Its Prognostic Role in Colorectal Cancer.XLSX [Dataset]. http://doi.org/10.3389/fgene.2021.782699.s003
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.3389/fgene.2021.782699.s003
Dataset updated
Jun 8, 2023
Dataset provided by
Frontiers Media (http://www.frontiersin.org/)
Authors
Stepan Nersisyan; Victor Novosad; Narek Engibaryan; Yuri Ushkaryov; Sergey Nikulin; Alexander Tonevitsky
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Interactions of the extracellular matrix (ECM) and cellular receptors constitute one of the crucial pathways involved in colorectal cancer progression and metastasis. With the use of bioinformatics analysis, we comprehensively evaluated the prognostic information concentrated in the genes from this pathway. First, we constructed an ECM–receptor regulatory network by integrating the transcription factor (TF) and 5’-isomiR interaction databases with mRNA/miRNA-seq data from The Cancer Genome Atlas Colon Adenocarcinoma (TCGA-COAD). Notably, one-third of the interactions mediated by 5’-isomiRs were represented by noncanonical isomiRs (isomiRs whose 5’-end sequence did not match the canonical miRBase version). Then, exhaustive search-based feature selection was used to fit prognostic signatures composed of nodes from the network for overall survival prediction. Two reliable prognostic signatures were identified and validated on the independent The Cancer Genome Atlas Rectum Adenocarcinoma (TCGA-READ) cohort. The first signature was made up of six genes directly involved in ECM–receptor interaction: AGRN, DAG1, FN1, ITGA5, THBS3, and TNC (concordance index 0.61, logrank test p = 0.0164, 3-year ROC AUC = 0.68). The second, hybrid signature was composed of three regulators: hsa-miR-32-5p, NR1H2, and SNAI1 (concordance index 0.64, logrank test p = 0.0229, 3-year ROC AUC = 0.71). While hsa-miR-32-5p exclusively regulated ECM-related genes (COL1A2 and ITGA5), NR1H2 and SNAI1 also targeted other pathways (adhesion, cell cycle, and cell division). Concordant distributions of the respective risk scores across four stages of colorectal cancer and adjacent normal mucosa additionally confirmed the reliability of the models.
COAD samples somatic mutation data
figshare.com
search.datacite.org
application/gzip
Updated Jan 19, 2016
Cite
Endre Sebestyén (2016). COAD samples somatic mutation data [Dataset]. http://doi.org/10.6084/m9.figshare.1061910.v1
Explore at:
Available download formats: application/gzip
Unique identifier
https://doi.org/10.6084/m9.figshare.1061910.v1
Dataset updated
Jan 19, 2016
Dataset provided by
Figshare (http://figshare.com/)
Authors
Endre Sebestyén
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
TCGA COAD samples somatic mutation data in BED format.
TCGA-PAAD
huggingface.co
Updated Dec 3, 2025
Cite
HLMCC (2025). TCGA-PAAD [Dataset]. https://huggingface.co/datasets/HLMCC/TCGA-PAAD
Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Dec 3, 2025
Authors
HLMCC
Description
Dataset Card for TCGA-PAAD Clinical Data

Dataset Summary

The TCGA-PAAD (The Cancer Genome Atlas - Pancreatic Adenocarcinoma) clinical dataset contains clinical data related to pancreatic adenocarcinoma patients. This dataset is part of the broader TCGA project, aimed at providing comprehensive genomic and clinical data for various types of cancer. The clinical data includes information such as patient demographics, treatment history, survival data, and other clinical… See the full description on the dataset page: https://huggingface.co/datasets/HLMCC/TCGA-PAAD.
DICOM converted Slide Microscopy images for the TCGA-READ collection
zenodo.org
bin
Updated Aug 20, 2024
Cite
David Clunie; William Clifford; David Pot; Ulrike Wagner; Keyvan Farahani; Erika Kim; Andrey Fedorov (2024). DICOM converted Slide Microscopy images for the TCGA-READ collection [Dataset]. http://doi.org/10.5281/zenodo.12689999
Explore at:
Available download formats: bin
Unique identifier
https://doi.org/10.5281/zenodo.12689999
Dataset updated
Aug 20, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
David Clunie; William Clifford; David Pot; Ulrike Wagner; Keyvan Farahani; Erika Kim; Andrey Fedorov
License
Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
Description
This dataset corresponds to a collection of images and/or image-derived data available from National Cancer Institute Imaging Data Commons (IDC) [1]. This dataset was converted into DICOM representation and ingested by the IDC team. You can explore and visualize the corresponding images using IDC Portal here: TCGA-READ. You can use the manifests included in this Zenodo record to download the content of the collection following the Download instructions below.

Collection description

The Cancer Genome Atlas-Rectum Adenocarcinoma (TCGA-READ) data collection is part of a larger effort to enhance the TCGA http://cancergenome.nih.gov/ data set with characterized radiological images. The Cancer Imaging Program (CIP), with the cooperation of several TCGA tissue-contributing institutions, has archived a large portion of the radiological images of the genetically-analyzed READ cases.

Please see the TCGA-READ wiki page to learn more about the images and to obtain any supporting metadata for this collection.

Files included

A manifest file's name indicates the IDC data release in which a version of collection data was first introduced. For example, collection_id-idc_v8-aws.s5cmd corresponds to the contents of the collection_id collection introduced in IDC data release v8. If there is a subsequent version of this Zenodo page, it will indicate when a subsequent version of the corresponding collection was introduced.

tcga_read-idc_v8-aws.s5cmd: manifest of files available for download from public IDC Amazon Web Services buckets

tcga_read-idc_v8-gcs.s5cmd: manifest of files available for download from public IDC Google Cloud Storage buckets

tcga_read-idc_v8-dcf.dcf: Gen3 manifest (for details see https://learn.canceridc.dev/data/organization-of-data/guids-and-uuids)

Note that manifest files that end in -aws.s5cmd reference files stored in Amazon Web Services (AWS) buckets, while -gcs.s5cmd reference files in Google Cloud Storage. The actual files are identical and are mirrored between AWS and GCP.

Download instructions

Each of the manifests includes instructions in the header on how to download the included files.

To download the files using .s5cmd manifests:

install idc-index package: pip install --upgrade idc-index

download the files referenced by manifests included in this dataset by passing the .s5cmd manifest file: idc download manifest.s5cmd.

To download the files using .dcf manifest, see manifest header.

Acknowledgments

Imaging Data Commons team has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under Task Order No. HHSN26110071 under Contract No. HHSN261201500003l.

References

[1] Fedorov, A., Longabaugh, W. J. R., Pot, D., Clunie, D. A., Pieper, S. D., Gibbs, D. L., Bridge, C., Herrmann, M. D., Homeyer, A., Lewis, R., Aerts, H. J. W., Krishnaswamy, D., Thiriveedhi, V. K., Ciausu, C., Schacherer, D. P., Bontempi, D., Pihl, T., Wagner, U., Farahani, K., Kim, E. & Kikinis, R. National Cancer Institute Imaging Data Commons: Toward Transparency, Reproducibility, and Scalability in Imaging Artificial Intelligence. RadioGraphics (2023). https://doi.org/10.1148/rg.230180
Table1_Identification of Hub Genes in Colorectal Adenocarcinoma by...
datasetcatalog.nlm.nih.gov
frontiersin.figshare.com
Updated May 27, 2022
Cite
Ye, Shujun; Ma, Lianjun; Chen, Lanlan; Liu, Yang; Meng, Xiangbo (2022). Table1_Identification of Hub Genes in Colorectal Adenocarcinoma by Integrated Bioinformatics.XLSX [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000400732
Explore at:
Dataset updated
May 27, 2022
Authors
Ye, Shujun; Ma, Lianjun; Chen, Lanlan; Liu, Yang; Meng, Xiangbo
Description
An improved understanding of the molecular mechanism of colorectal adenocarcinoma is necessary to predict the prognosis and develop new target gene therapy strategies. This study aims to identify hub genes associated with colorectal adenocarcinoma and further analyze their prognostic significance. In this study, The Cancer Genome Atlas (TCGA) COAD-READ database and the gene expression profiles of GSE25070 from the Gene Expression Omnibus were collected to explore the differentially expressed genes between colorectal adenocarcinoma and normal tissues. The weighted gene co-expression network analysis (WGCNA) and differential expression analysis identified 82 differentially co-expressed genes in the collected datasets. Enrichment analysis was applied to explore the regulated signaling pathway in colorectal adenocarcinoma. In addition, 10 hub genes were identified in the protein–protein interaction (PPI) network by using the cytoHubba plug-in of Cytoscape, where five genes were further proven to be significantly related to the survival rate. Compared with normal tissues, the expressions of the five genes were all downregulated in the GSE110224 dataset. Subsequently, the expression of the five hub genes was confirmed by the Human Protein Atlas database. Finally, we used Cox regression analysis to identify genes associated with prognosis, and a 3-gene signature (CLCA1–CLCA4–GUCA2A) was constructed to predict the prognosis of patients with colorectal cancer. In conclusion, our study revealed that the five hub genes and CLCA1–CLCA4–GUCA2A signature are highly correlated with the development of colorectal adenocarcinoma and can serve as promising prognosis factors to predict the overall survival rate of patients.
Results of GSVA for TCGA-COAD.
plos.figshare.com
xls
Updated Jul 18, 2025
Cite
Yongling Wang; Zan Yuan; Yi Lao; Jiangtao He; Shufen Mo; Kangbiao Chen; Yanyan Ye; Lu Huang (2025). Results of GSVA for TCGA-COAD. [Dataset]. http://doi.org/10.1371/journal.pone.0328560.t005
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0328560.t005
Dataset updated
Jul 18, 2025
Dataset provided by
PLOS (http://plos.org/)
Authors
Yongling Wang; Zan Yuan; Yi Lao; Jiangtao He; Shufen Mo; Kangbiao Chen; Yanyan Ye; Lu Huang
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Background: The exact mechanisms driving colorectal cancer (CRC) are yet to be fully elucidated. This study aims to confirm the reliability of a prognostic model for colon adenocarcinoma (COAD) by analyzing the varied expression levels of Glycolysis & Pyroptosis-Related Differentially Expressed Genes (G&PRDEGs) in COAD using bioinformatics tools.
Methods: We retrieved gene expression data and clinical details for COAD patients from the Cancer Genome Atlas (TCGA) database. These data were analyzed to categorize the samples into pyroptosis-positive and pyroptosis-negative groups based on their expression of G&PRDEGs. A prognostic model for COAD was then developed using LASSO Cox regression analysis, focusing on these differentially expressed genes (DEGs). Kaplan-Meier curves were plotted to assess the differences in survival between the two groups. Furthermore, we conducted multivariate Cox regression analyses to evaluate the influence of clinical parameters and model-derived risk scores. Analyses of pathway enrichment were performed using R software, alongside single-sample gene-set enrichment analysis (ssGSEA) to explore the role of immune cells and functions associated with G&PRDEGs.
Results: A predictive model was developed using 53 G&PRDEGs that were expressed differentially. An examination of survival rates revealed that the high-risk groups exhibited a noticeably diminished overall survival (OS) in comparison to the low-risk groups in the TCGA database (P
Manual tumor annotations in TCGA
zenodo.org
zip
Updated Oct 11, 2021
Cite
Chiara Loeffler; Jakob Nikolas Kather (2021). Manual tumor annotations in TCGA [Dataset]. http://doi.org/10.5281/zenodo.5320076
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.5320076
Dataset updated
Oct 11, 2021
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Chiara Loeffler; Jakob Nikolas Kather
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
What is this

These are manual annotations of tumor tissue on TCGA diagnostic whole slide images in major solid tumor types. The aim of this project was to enrich for regions with invasive tumor tissue for subsequent molecular prediction studies, excluding whitespace, artifacts and non-tumor tissue as efficiently as possible. The aim was not to create a perfect tumor annotation on the pixel level. Annotations were done by trained observers using QuPath v0.1.2 and were converted to CSV. "COAD" and "READ" were merged to "CRC".

More resources

For the difference between diagnostic and frozen slides, please see: http://www.andrewjanowczyk.com/download-tcga-digital-pathology-images-ffpe/

For a list of all tumor types in TCGA, see: https://gdc.cancer.gov/resources-tcga-users/tcga-code-tables/tcga-study-abbreviations

Legal

No guarantees, no liability.
caArray_EXP-620: TCGA (Coad): Analysis of DNA Methylation for COAD using...
ebi.ac.uk
Updated May 13, 2015
Cite
Mervi Heiskanen; Peter Laird (2015). caArray_EXP-620: TCGA (Coad): Analysis of DNA Methylation for COAD using Illumina Infinium HumanMethylation450 platform (Jhu-usc) [Dataset]. https://www.ebi.ac.uk/biostudies/studies/E-GEOD-68838
Explore at:
Dataset updated
May 13, 2015
Authors
Mervi Heiskanen; Peter Laird
Description
TCGA Analysis of DNA Methylation for COAD using Illumina Infinium HumanMethylation450 platform EXP-620
Assay Type: Methylation
Provider: Illumina
Array Designs: jhu-usc.edu_TCGA_HumanMethylation450
Organism: Homo sapiens (ncbitax)
Tissue Sites: Colon
Material Types: Control Analyte, Solid normal_tissue, organism_part, Primary solid_tumor
TCGA Stomach histological images
kaggle.com
zip
Updated Jan 14, 2025
Cite
Ahmed Jaber Abdelaziz (2025). TCGA Stomach histological images [Dataset]. https://www.kaggle.com/datasets/ahmedaboenaba/tcga-stomach-histological-images
Explore at:
Available download formats: zip (601366221 bytes)
Dataset updated
Jan 14, 2025
Authors
Ahmed Jaber Abdelaziz
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
The dataset comprises histology images of stomach cancer sourced from The Cancer Genome Atlas (TCGA).

Image Specifications

Original Resolution: 512 × 512 pixel patches extracted at a resolution of 0.5 microns per pixel.

Processed Size: Images are resized to 224 × 224 pixels and saved as JPEG files.
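
As a rough illustration of what the resize implies, the sketch below computes the effective pixel spacing after downsampling. It assumes the 0.5 micron-per-pixel figure applies to the 512 × 512 patch and that resizing keeps the full field of view; the function name is hypothetical and not part of the dataset.

```python
def effective_microns_per_pixel(source_mpp, source_px, target_px):
    """Return the pixel spacing of a square patch after resizing it.

    source_mpp: microns per pixel of the original patch
    source_px:  side length of the original patch, in pixels
    target_px:  side length of the resized patch, in pixels
    """
    # Physical width covered by the patch, in microns.
    field_of_view_um = source_mpp * source_px
    # The same physical width is now spread over fewer pixels.
    return field_of_view_um / target_px


# Values taken from the description: 512 px at 0.5 microns per pixel, resized to 224 px.
mpp = effective_microns_per_pixel(0.5, 512, 224)
print(f"{mpp:.3f} microns per pixel")  # prints "1.143 microns per pixel"
```

Under these assumptions each patch covers 256 microns per side, so the resized images are about 2.3 times coarser than the source patches.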

The dataset is provided as a zip file. Within the zip file, images are organized into two subfolders:

* tumour
* non-tumour
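
The folder layout above lends itself to deriving labels from the subfolder name. The following is a minimal sketch, assuming the archive has already been unpacked to a local directory and that the images carry a `.jpg` or `.jpeg` extension; `index_patches` and the root path are hypothetical names, not part of the dataset.

```python
from pathlib import Path

# Subfolder names as given in the dataset description.
LABELS = ("tumour", "non-tumour")


def index_patches(root):
    """Return (path, label) pairs for every JPEG in the two class subfolders."""
    root = Path(root)
    pairs = []
    for label in LABELS:
        folder = root / label
        if not folder.is_dir():
            # Skip a class folder that is missing rather than failing.
            continue
        for path in sorted(folder.iterdir()):
            if path.suffix.lower() in {".jpg", ".jpeg"}:
                pairs.append((path, label))
    return pairs
```

Calling `index_patches("tcga_stomach")` (a placeholder directory name) would return a list that can be passed to whatever image loader is in use.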

Each image filename encodes the originating slide and the patch position within the slide, following this naming convention:
PIVOT - COAD (light)
zenodo.org
application/gzip
Updated Jan 25, 2022
Cite
Malvika Sudhakar; Raghunathan Rengaswamy; Karthik Raman (2022). PIVOT - COAD (light) [Dataset]. http://doi.org/10.5281/zenodo.5898163
Explore at:
Available download formats: application/gzip
Unique identifier
https://doi.org/10.5281/zenodo.5898163
Dataset updated
Jan 25, 2022
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Malvika Sudhakar; Raghunathan Rengaswamy; Karthik Raman
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Pre-processed TCGA COAD data used for PIVOT analysis.
DataSheet3_Based on cuproptosis-related lncRNAs, a novel prognostic...
frontiersin.figshare.com
pdf
Updated Jun 12, 2023
+ more versions
Cite
Chong Li; Keqian Zhang; Yuzhu Gong; Qinan Wu; Yanyan Zhang; Yan Dong; Dejia Li; Zhe Wang (2023). DataSheet3_Based on cuproptosis-related lncRNAs, a novel prognostic signature for colon adenocarcinoma prognosis, immunotherapy, and chemotherapy response.PDF [Dataset]. http://doi.org/10.3389/fphar.2023.1200054.s003
Explore at:
Available download formats: pdf
Unique identifier
https://doi.org/10.3389/fphar.2023.1200054.s003
Dataset updated
Jun 12, 2023
Dataset provided by
Frontiers Media (http://www.frontiersin.org/)
Authors
Chong Li; Keqian Zhang; Yuzhu Gong; Qinan Wu; Yanyan Zhang; Yan Dong; Dejia Li; Zhe Wang
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Introduction: Colon adenocarcinoma (COAD) is a pathological subtype of colorectal cancer (CRC) characterized by highly heterogeneous solid tumors and poor prognosis, and novel biomarkers are urgently required to guide its prognosis.

Material and methods: RNA-Seq data of COAD were downloaded from The Cancer Genome Atlas (TCGA) database to determine cuproptosis-related lncRNAs (CRLs) using weighted gene co-expression network analysis (WGCNA). Pathway scores were calculated by single-sample gene set enrichment analysis (ssGSEA). CRLs that affected prognosis were identified by univariate Cox regression analysis, and a prognostic model was developed using multivariate Cox regression analysis and LASSO regression analysis. The model was assessed with Kaplan–Meier (K-M) survival analysis and receiver operating characteristic (ROC) curves and validated in GSE39582 and GSE17538. The tumor microenvironment (TME), single nucleotide variants (SNV), and immunotherapy response/chemotherapy sensitivity were assessed in high- and low-score subgroups. Finally, a nomogram was constructed to predict survival rates of COAD patients at 1, 3, and 5 years.

Results: A high cuproptosis score significantly reduced the survival rates of COAD patients. Five CRLs affecting prognosis were identified: AC008494.3, EIF3J-DT, AC016027.1, AL731533.2, and ZEB1-AS1. The ROC curve showed that RiskScore performed well in predicting the prognosis of COAD. RiskScore also showed good ability in assessing immunotherapy and chemotherapy sensitivity. Finally, the nomogram and decision curves showed that RiskScore would be a powerful predictor for COAD.

Conclusion: A novel prognostic model was constructed using CRLs in COAD, and the CRLs in the model may be potential therapeutic targets. Based on this study, RiskScore was an independent predictor of prognosis, immunotherapy response, and chemotherapy sensitivity in COAD, providing a new scientific basis for COAD prognosis management.


