100+ datasets found

Historical NCI Genomic Data Commons data (09-14-2017)
zenodo.org
data-staging.niaid.nih.gov
tsv
Updated Jan 24, 2020
Cite
Inge Seim (2020). Historical NCI Genomic Data Commons data (09-14-2017) [Dataset]. http://doi.org/10.5281/zenodo.1186945
Available download formats: tsv
Unique identifier
https://doi.org/10.5281/zenodo.1186945
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Inge Seim
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Historical NCI Genomic Data Commons data (v09-14-2017). Clinical ('phenotype') and gene expression (HTSeq FPKM-UQ).

TCGA-COAD.GDC_phenotype.tsv

dataset: phenotype - Phenotype

cohort: GDC TCGA Colon Cancer (COAD)
dataset ID: TCGA-COAD/Xena_Matrices/TCGA-COAD.GDC_phenotype.tsv
download: https://gdc.xenahubs.net/download/TCGA-COAD/Xena_Matrices/TCGA-COAD.GDC_phenotype.tsv.gz
samples: 570
version: 11-27-2017
hub: https://gdc.xenahubs.net
type of data: phenotype
author: Genomic Data Commons
raw data: https://docs.gdc.cancer.gov/Data/Release_Notes/Data_Release_Notes/#data-release-90
raw data: https://api.gdc.cancer.gov/data/
input data format: ROWs (samples) x COLUMNs (identifiers) (i.e. clinicalMatrix)
570 samples x 151 identifiers

TCGA-COAD.htseq_fpkm-uq.tsv

dataset: gene expression RNAseq - HTSeq - FPKM-UQ

cohort: GDC TCGA Colon Cancer (COAD)
dataset ID: TCGA-COAD/Xena_Matrices/TCGA-COAD.htseq_fpkm-uq.tsv
download: https://gdc.xenahubs.net/download/TCGA-COAD/Xena_Matrices/TCGA-COAD.htseq_fpkm-uq.tsv.gz
samples: 512
version: 09-14-2017
hub: https://gdc.xenahubs.net
type of data: gene expression RNAseq
unit: log2(fpkm-uq+1)
platform: Illumina
ID/Gene Mapping: https://gdc.xenahubs.net/download/probeMaps/gencode.v22.annotation.gene.probeMap.gz
author: Genomic Data Commons
raw data: https://docs.gdc.cancer.gov/Data/Release_Notes/Data_Release_Notes/#data-release-80
raw data: https://api.gdc.cancer.gov/data/
wrangling: Data from the same sample but from different vials/portions/analytes/aliquots is averaged; data from different samples is combined into genomicMatrix; all data is then log2(x+1) transformed.
input data format: ROWs (identifiers) x COLUMNs (samples) (i.e. genomicMatrix)
60,484 identifiers x 512 samples
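The wrangling described for the expression matrix (average values from the same sample, then apply log2(x+1)) can be illustrated with a minimal sketch. It is not the hub's actual pipeline. The grouping key (the first 15 characters of a TCGA barcode, which drops the vial letter), the barcodes, the gene identifier and the values are all assumptions made for illustration.

```python
import math
from collections import defaultdict
from statistics import mean


def sample_key(barcode):
    """Collapse an aliquot barcode to a sample-level key.

    Assumption: the first 15 characters (project-TSS-participant-sample type)
    identify the sample; the real hub may group differently.
    """
    return barcode[:15]


def wrangle(aliquot_values):
    """Average aliquots of the same sample, then apply log2(x + 1).

    aliquot_values maps an aliquot barcode to {gene_id: FPKM-UQ value}.
    Returns {sample_key: {gene_id: log2(mean + 1)}}.
    """
    grouped = defaultdict(list)
    for barcode, profile in aliquot_values.items():
        grouped[sample_key(barcode)].append(profile)

    matrix = {}
    for sample, profiles in grouped.items():
        genes = profiles[0].keys()
        matrix[sample] = {
            gene: math.log2(mean(p[gene] for p in profiles) + 1)
            for gene in genes
        }
    return matrix


# Hypothetical input: two aliquots of one sample and one aliquot of another.
example = {
    "TCGA-AA-0001-01A-01R-0001-07": {"GENE_X": 3.0},
    "TCGA-AA-0001-01B-01R-0002-07": {"GENE_X": 5.0},
    "TCGA-AA-0002-01A-01R-0003-07": {"GENE_X": 7.0},
}
wrangled = wrangle(example)
```

Here the first sample averages to 4.0 before the transform, so its stored value is log2(5); the second sample is stored as log2(8) = 3.0.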
Colorectal Adenocarcinoma (TCGA, PanCancer Atlas) data
datacatalog.mskcc.org
Updated Nov 20, 2019
Cite
The Cancer Genome Atlas (TCGA) (2019). Colorectal Adenocarcinoma (TCGA, PanCancer Atlas) data [Dataset]. https://datacatalog.mskcc.org/dataset/10411
Dataset updated
Nov 20, 2019
Dataset provided by
MSK Library
The Cancer Genome Atlas (TCGA)
Description
This dataset contains summary data visualizations and clinical data from a broad sampling of 594 colorectal adenocarcinomas from 594 patients. The data was gathered as part of the PanCancer Atlas initiative, which aims to answer big, overarching questions about cancer by examining the full set of tumors characterized in the robust TCGA dataset. The clinical data includes mutation count, information about mutated genes, patient demographics, disease status, tumor typing, and chromosomal gain or loss. The data set also includes copy-number segment data downloadable as .seg files and viewable via the Integrative Genomics Viewer.
Formatted TCGA clinical and RNA-Seq data for colon adenocarcinoma (COAD) and...
data.niaid.nih.gov
Updated Nov 23, 2021
Cite
Liu, Tong; Wang, Zi-Jing; Qi, Shao-Chong; Xia, Bi-Han; Zhang, Xiao-Shuang; Yang, Jin-Lin (2021). Formatted TCGA clinical and RNA-Seq data for colon adenocarcinoma (COAD) and rectum adenocarcinoma (READ) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5717484
Dataset updated
Nov 23, 2021
Dataset provided by
Department of Gastroenterology and Hepatology, Sichuan University-University of Oxford Huaxi Joint Centre for Gastrointestinal Cancer, West China Hospital, Sichuan University
Authors
Liu, Tong; Wang, Zi-Jing; Qi, Shao-Chong; Xia, Bi-Han; Zhang, Xiao-Shuang; Yang, Jin-Lin
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
COAD/READ/COADREAD_rnaseq_fpkm.txt files contain TCGA RNA-Seq data in FPKM normalisation for colon adenocarcinoma (COAD), rectum adenocarcinoma (READ) or the two combined (COADREAD).

COAD/READ/COADREAD_rnaseq_tpm.txt files contain TCGA RNA-Seq data in TPM normalisation for colon adenocarcinoma (COAD), rectum adenocarcinoma (READ) or the two combined (COADREAD).

COAD/READ/COADREAD_clinical_raw.xlsx files contain TCGA clinical data for patients with colon adenocarcinoma (COAD), rectum adenocarcinoma (READ) or the two combined (COADREAD).

COAD/READ/COADREAD_rnaseq_clinical_raw.xlsx files contain the matched TCGA clinical and RNA-Seq data for patients with colon adenocarcinoma (COAD), rectum adenocarcinoma (READ) or the two combined (COADREAD).
Colorectal Adenocarcinoma (TCGA, Firehose Legacy)
datacatalog.mskcc.org
Updated Sep 15, 2020
Cite
Broad Institute (2020). Colorectal Adenocarcinoma (TCGA, Firehose Legacy) [Dataset]. https://datacatalog.mskcc.org/dataset/10467
Dataset updated
Sep 15, 2020
Dataset provided by
Broad Institute
MSK Library
Description
TCGA Colorectal Adenocarcinoma. Source data from GDAC Firehose. Previously known as TCGA Provisional.
This dataset contains summary data visualizations and clinical data from a broad sampling of 640 carcinomas from 636 patients. The data was gathered as part of the Broad Institute of MIT and Harvard Firehose initiative, a cancer analysis pipeline. The clinical data includes mutation count, information about mutated genes, patient demographics, sample type, disease code, Adjuvant Postoperative Pharmaceutical Therapy Administered Indicator, American Joint Committee on Cancer Metastasis Stage Code, American Joint Committee on Cancer Publication Version Type, American Joint Committee on Cancer Tumor Stage Code, BRAF Gene Analysis Indicator, BRAF Gene Analysis Result, and Days to Sample Collection. The dataset includes Next-Generation Clustered Heat Maps (NG-CHM) viewable via an embedded NG-CHM Heat Map Viewer, provided by MD Anderson Cancer Center, which offers a graphical environment for exploring clustered or non-clustered heat map data. The data set also includes copy-number segment data downloadable as .seg files and viewable via the Integrative Genomics Viewer.
TCGA-Cancer-Variant-and-Clinical-Data
huggingface.co
Updated Oct 10, 2024
+ more versions
Cite
Seq-to-Pheno (2024). TCGA-Cancer-Variant-and-Clinical-Data [Dataset]. https://huggingface.co/datasets/seq-to-pheno/TCGA-Cancer-Variant-and-Clinical-Data
Dataset updated
Oct 10, 2024
Dataset authored and provided by
Seq-to-Pheno
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
TCGA Cancer Variant and Clinical Data

Dataset Description

This dataset combines genetic variant information at the protein level with clinical data from The Cancer Genome Atlas (TCGA) project, curated by the International Cancer Genome Consortium (ICGC). It provides a comprehensive view of protein-altering mutations and clinical characteristics across various cancer types.

Dataset Summary

The dataset includes:

Protein sequence data for both mutated and… See the full description on the dataset page: https://huggingface.co/datasets/seq-to-pheno/TCGA-Cancer-Variant-and-Clinical-Data.
TCGA COAD MSI vs MSS Prediction (JPG)
kaggle.com
zip
Updated Aug 23, 2019
Cite
Joan Gibert (2019). TCGA COAD MSI vs MSS Prediction (JPG) [Dataset]. https://www.kaggle.com/joangibert/tcga_coad_msi_mss_jpg
Available download formats: zip (11756515042 bytes)
Dataset updated
Aug 23, 2019
Authors
Joan Gibert
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Context

This dataset comes from here: Kather, Jakob Nikolas. (2019). Histological images for MSI vs. MSS classification in gastrointestinal cancer, FFPE samples [Data set]. Zenodo. http://doi.org/10.5281/zenodo.2530835

Much of the information in this description comes either from the original dataset description or from the scientific article that uses the dataset to predict MSI:

Microsatellite instability determines whether patients with gastrointestinal cancer respond exceptionally well to immunotherapy. However, in clinical practice, not every patient is tested for MSI, because this requires additional genetic or immunohistochemical tests.

Content

This repository contains 192312 unique image patches derived from histological images of colorectal cancer and gastric cancer patients in the TCGA cohort (original whole slide SVS images are freely available at https://portal.gdc.cancer.gov/). All images in this repository are derived from formalin-fixed paraffin-embedded (FFPE) diagnostic slides ("DX" at the GDC data portal). This is explained well in this blog: http://www.andrewjanowczyk.com/download-tcga-digital-pathology-images-ffpe/

Preprocessing

All SVS slides were preprocessed as follows:

1. Automatic detection of tumor

2. Resizing to 224 px x 224 px at a resolution of 0.5 µm/px

3. Color normalization with the Macenko method (Macenko et al., 2009, http://wwwx.cs.unc.edu/~mn/sites/default/files/macenko2009.pdf)

4. Assignment of patients to either "MSS" (microsatellite stable) or "MSIMUT" (microsatellite instable or highly mutated)

5. Reformatting of the original images to JPG format (using the bash command mogrify)

Acknowledgements

Thanks to Jakob Nikolas Kather for the paper and the GitHub page.

Inspiration

This dataset targets a feature that is effectively impossible to identify with the human eye. Additional tests are needed to identify this set of patients, and those tests delay the start of treatment. High sensitivity on this kind of task could greatly improve patient diagnosis and treatment.
Table1_ECM–Receptor Regulatory Network and Its Prognostic Role in Colorectal...
frontiersin.figshare.com
xlsx
Updated Jun 8, 2023
+ more versions
Cite
Stepan Nersisyan; Victor Novosad; Narek Engibaryan; Yuri Ushkaryov; Sergey Nikulin; Alexander Tonevitsky (2023). Table1_ECM–Receptor Regulatory Network and Its Prognostic Role in Colorectal Cancer.XLSX [Dataset]. http://doi.org/10.3389/fgene.2021.782699.s003
Available download formats: xlsx
Unique identifier
https://doi.org/10.3389/fgene.2021.782699.s003
Dataset updated
Jun 8, 2023
Dataset provided by
Frontiers Media (http://www.frontiersin.org/)
Authors
Stepan Nersisyan; Victor Novosad; Narek Engibaryan; Yuri Ushkaryov; Sergey Nikulin; Alexander Tonevitsky
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Interactions of the extracellular matrix (ECM) and cellular receptors constitute one of the crucial pathways involved in colorectal cancer progression and metastasis. With the use of bioinformatics analysis, we comprehensively evaluated the prognostic information concentrated in the genes from this pathway. First, we constructed an ECM–receptor regulatory network by integrating the transcription factor (TF) and 5’-isomiR interaction databases with mRNA/miRNA-seq data from The Cancer Genome Atlas Colon Adenocarcinoma (TCGA-COAD). Notably, one-third of the interactions mediated by 5’-isomiRs were represented by noncanonical isomiRs (isomiRs whose 5’-end sequence did not match the canonical miRBase version). Then, exhaustive search-based feature selection was used to fit prognostic signatures composed of nodes from the network for overall survival prediction. Two reliable prognostic signatures were identified and validated on the independent The Cancer Genome Atlas Rectum Adenocarcinoma (TCGA-READ) cohort. The first signature was made up of six genes directly involved in ECM–receptor interaction: AGRN, DAG1, FN1, ITGA5, THBS3, and TNC (concordance index 0.61, logrank test p = 0.0164, 3-year ROC AUC = 0.68). The second, hybrid signature was composed of three regulators: hsa-miR-32-5p, NR1H2, and SNAI1 (concordance index 0.64, logrank test p = 0.0229, 3-year ROC AUC = 0.71). While hsa-miR-32-5p exclusively regulated ECM-related genes (COL1A2 and ITGA5), NR1H2 and SNAI1 also targeted other pathways (adhesion, cell cycle, and cell division). Concordant distributions of the respective risk scores across four stages of colorectal cancer and adjacent normal mucosa additionally confirmed the reliability of the models.
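The description reports a concordance index for each signature but does not say how it was computed. As a rough illustration only, the sketch below implements Harrell's C, one common definition, on made-up survival data. The function name, the tie handling and the example values are assumptions, not the authors' method.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs ranked consistently by risk.

    A pair (i, j) is comparable when patient i has an observed event
    strictly before patient j's follow-up time. The pair is concordant
    when the patient who failed earlier has the higher risk score;
    tied risk scores count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical follow-up times, event flags (1 = death observed) and scores.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risk = [0.9, 0.4, 0.6, 0.1]
c_index = concordance_index(times, events, risk)
```

In this toy input five pairs are comparable and four are ordered correctly, giving 0.8. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported 0.61 and 0.64 should be read.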
COAD samples somatic mutation data
figshare.com
search.datacite.org
application/gzip
Updated Jan 19, 2016
Cite
Endre Sebestyén (2016). COAD samples somatic mutation data [Dataset]. http://doi.org/10.6084/m9.figshare.1061910.v1
Available download formats: application/gzip
Unique identifier
https://doi.org/10.6084/m9.figshare.1061910.v1
Dataset updated
Jan 19, 2016
Dataset provided by
Figshare (http://figshare.com/)
Authors
Endre Sebestyén
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
TCGA COAD samples somatic mutation data in BED format.
DICOM converted Slide Microscopy images for the TCGA-COAD collection
zenodo.org
bin
Updated Aug 20, 2024
Cite
David Clunie; William Clifford; David Pot; Ulrike Wagner; Keyvan Farahani; Erika Kim; Andrey Fedorov (2024). DICOM converted Slide Microscopy images for the TCGA-COAD collection [Dataset]. http://doi.org/10.5281/zenodo.13346249
Available download formats: bin
Unique identifier
https://doi.org/10.5281/zenodo.13346249
Dataset updated
Aug 20, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
David Clunie; William Clifford; David Pot; Ulrike Wagner; Keyvan Farahani; Erika Kim; Andrey Fedorov
License
Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
Description
This dataset corresponds to a collection of images and/or image-derived data available from National Cancer Institute Imaging Data Commons (IDC) [1]. This dataset was converted into DICOM representation and ingested by the IDC team. You can explore and visualize the corresponding images using IDC Portal here: TCGA-COAD. You can use the manifests included in this Zenodo record to download the content of the collection following the Download instructions below.

Collection description

The Cancer Genome Atlas-Colon Adenocarcinoma (TCGA-COAD) data collection is part of a larger effort to enhance the TCGA http://cancergenome.nih.gov/ data set with characterized radiological images. The Cancer Imaging Program (CIP), with the cooperation of several of the TCGA tissue-contributing institutions, has archived a large portion of the radiological images of the COAD cases.

Please see the TCGA-COAD page to learn more about the images and to obtain any supporting metadata for this collection.

Files included

A manifest file's name indicates the IDC data release in which a version of collection data was first introduced. For example, collection_id-idc_v8-aws.s5cmd corresponds to the contents of the collection_id collection introduced in IDC data release v8. If there is a subsequent version of this Zenodo page, it will indicate when a subsequent version of the corresponding collection was introduced.

tcga_coad-idc_v18-aws.s5cmd: manifest of files available for download from public IDC Amazon Web Services buckets

tcga_coad-idc_v18-gcs.s5cmd: manifest of files available for download from public IDC Google Cloud Storage buckets

tcga_coad-idc_v18-dcf.dcf: Gen3 manifest (for details see https://learn.canceridc.dev/data/organization-of-data/guids-and-uuids)

Note that manifest files that end in -aws.s5cmd reference files stored in Amazon Web Services (AWS) buckets, while -gcs.s5cmd reference files in Google Cloud Storage. The actual files are identical and are mirrored between AWS and GCP.
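The naming convention above (collection ID, IDC release in which the version was introduced, and storage target) can be unpacked mechanically. The following is a minimal sketch assuming the pattern `<collection_id>-idc_v<N>-<aws|gcs|dcf>.<s5cmd|dcf>` inferred from the file names listed here. The `Manifest` type and `parse_manifest_name` helper are hypothetical names, not part of any IDC tool.

```python
import re
from typing import NamedTuple


class Manifest(NamedTuple):
    collection_id: str
    idc_version: int
    target: str  # "aws", "gcs" or "dcf"


# Pattern inferred from names such as tcga_coad-idc_v18-aws.s5cmd.
_MANIFEST_RE = re.compile(
    r"^(?P<collection>[a-z0-9_]+)-idc_v(?P<version>\d+)-"
    r"(?P<target>aws|gcs|dcf)\.(?:s5cmd|dcf)$"
)


def parse_manifest_name(name):
    """Split a manifest file name into collection, IDC release and target."""
    match = _MANIFEST_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognised manifest name: {name!r}")
    return Manifest(
        collection_id=match["collection"],
        idc_version=int(match["version"]),
        target=match["target"],
    )


aws_manifest = parse_manifest_name("tcga_coad-idc_v18-aws.s5cmd")
dcf_manifest = parse_manifest_name("tcga_coad-idc_v18-dcf.dcf")
```

Because the AWS and GCS manifests point at mirrored copies of the same files, a script would normally pick one target rather than download both.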

Download instructions

Each manifest includes instructions in its header on how to download the files it references.

To download the files using .s5cmd manifests:

install idc-index package: pip install --upgrade idc-index

download the files referenced by manifests included in this dataset by passing the .s5cmd manifest file: idc download manifest.s5cmd.

To download the files using .dcf manifest, see manifest header.

Acknowledgments

Imaging Data Commons team has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under Task Order No. HHSN26110071 under Contract No. HHSN261201500003l.

References

[1] Fedorov, A., Longabaugh, W. J. R., Pot, D., Clunie, D. A., Pieper, S. D., Gibbs, D. L., Bridge, C., Herrmann, M. D., Homeyer, A., Lewis, R., Aerts, H. J. W., Krishnaswamy, D., Thiriveedhi, V. K., Ciausu, C., Schacherer, D. P., Bontempi, D., Pihl, T., Wagner, U., Farahani, K., Kim, E. & Kikinis, R. National Cancer Institute Imaging Data Commons: Toward Transparency, Reproducibility, and Scalability in Imaging Artificial Intelligence. RadioGraphics (2023). https://doi.org/10.1148/rg.230180
DICOM converted Slide Microscopy images for the CPTAC-COAD collection
zenodo.org
bin
Updated Aug 20, 2024
+ more versions
Cite
David Clunie; William Clifford; David Pot; Ulrike Wagner; Keyvan Farahani; Erika Kim; Andrey Fedorov (2024). DICOM converted Slide Microscopy images for the CPTAC-COAD collection [Dataset]. http://doi.org/10.5281/zenodo.12666785
Available download formats: bin
Unique identifier
https://doi.org/10.5281/zenodo.12666785
Dataset updated
Aug 20, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
David Clunie; William Clifford; David Pot; Ulrike Wagner; Keyvan Farahani; Erika Kim; Andrey Fedorov
License
Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
Description
This dataset corresponds to a collection of images and/or image-derived data available from National Cancer Institute Imaging Data Commons (IDC) [1]. This dataset was converted into DICOM representation and ingested by the IDC team. You can explore and visualize the corresponding images using IDC Portal here: CPTAC-COAD. You can use the manifests included in this Zenodo record to download the content of the collection following the Download instructions below.

Collection description

This collection contains subjects from the National Cancer Institute’s Clinical Proteomic Tumor Analysis Consortium CPTAC Colon Adenocarcinoma cohort. CPTAC is a national effort to accelerate the understanding of the molecular basis of cancer through the application of large-scale proteome and genome analysis, or proteogenomics.

Please see the CPTAC-COAD wiki page to learn more about the images and to obtain any supporting metadata for this collection.

Files included

A manifest file's name indicates the IDC data release in which a version of collection data was first introduced. For example, collection_id-idc_v8-aws.s5cmd corresponds to the contents of the collection_id collection introduced in IDC data release v8. If there is a subsequent version of this Zenodo page, it will indicate when a subsequent version of the corresponding collection was introduced.

cptac_coad-idc_v7-aws.s5cmd: manifest of files available for download from public IDC Amazon Web Services buckets

cptac_coad-idc_v7-gcs.s5cmd: manifest of files available for download from public IDC Google Cloud Storage buckets

cptac_coad-idc_v7-dcf.dcf: Gen3 manifest (for details see https://learn.canceridc.dev/data/organization-of-data/guids-and-uuids)

Note that manifest files that end in -aws.s5cmd reference files stored in Amazon Web Services (AWS) buckets, while -gcs.s5cmd reference files in Google Cloud Storage. The actual files are identical and are mirrored between AWS and GCP.

Download instructions

Each manifest includes instructions in its header on how to download the files it references.

To download the files using .s5cmd manifests:

install idc-index package: pip install --upgrade idc-index

download the files referenced by manifests included in this dataset by passing the .s5cmd manifest file: idc download manifest.s5cmd.

To download the files using .dcf manifest, see manifest header.

Acknowledgments

Imaging Data Commons team has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under Task Order No. HHSN26110071 under Contract No. HHSN261201500003l.

References

[1] Fedorov, A., Longabaugh, W. J. R., Pot, D., Clunie, D. A., Pieper, S. D., Gibbs, D. L., Bridge, C., Herrmann, M. D., Homeyer, A., Lewis, R., Aerts, H. J. W., Krishnaswamy, D., Thiriveedhi, V. K., Ciausu, C., Schacherer, D. P., Bontempi, D., Pihl, T., Wagner, U., Farahani, K., Kim, E. & Kikinis, R. National Cancer Institute Imaging Data Commons: Toward Transparency, Reproducibility, and Scalability in Imaging Artificial Intelligence. RadioGraphics (2023). https://doi.org/10.1148/rg.230180
The Cancer Genome Atlas (TCGA) RNA-seq meta-analysis
figshare.com
xlsx
Updated Feb 2, 2018
Cite
Namshik Han (2018). The Cancer Genome Atlas (TCGA) RNA-seq meta-analysis [Dataset]. http://doi.org/10.6084/m9.figshare.5851743.v1
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.5851743.v1
Dataset updated
Feb 2, 2018
Dataset provided by
Figshare (http://figshare.com/)
Authors
Namshik Han
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
TCGA RNA-seq V2 Level3 data were downloaded from TCGA Genomic Data Commons Data Portal (https://gdc-portal.nci.nih.gov), consisting of 11,303 samples in 34 cancer projects (33 cancer types). Nine cancer types that do not have corresponding non-tumour samples were filtered out, and the analysis was focused on tumour versus non-tumour comparison. 24 cancer types were used in this meta-analysis: BLCA, BRCA, CESC, CHOL, COAD, ESCA, GBM, HNSC, KICH, KIRC, KIRP, LIHC, LUAD, LUSC, PAAD, PCPG, PRAD, READ, SARC, SKCM, STAD, THCA, THYM, UCEC (https://gdc-portal.nci.nih.gov). The nine filtered cancer types were ACC, DLBC, LAML, LGG, MESO, OV, TGCT, UCS and UVM.

To extract expression values from TCGA RNA-seq data, we used genomic coordinates to retrieve UCSC Transcript IDs that correspond to the identifiers in TCGA RNA-seq V2 Level3 data (isoform level). The GAF (General Annotation Format) file was used to map the coordinates to UCSC Transcript IDs, and it was downloaded from https://tcga-data.nci.nih.gov/docs/GAF/GAF.hg19.June2011.bundle/outputs/TCGA.hg19.June2011.gaf. This file contains genomic annotations shared by all TCGA projects. More details of the GAF file format can be found at https://tcga-data.nci.nih.gov/docs/GAF/GAF3.0/GAF_v3_file_description.docx. We filtered out any coding exons overlapping UCSC Transcript IDs to eliminate the expression values of coding genes and evaluate lncRNA expression.

We could find the expression values of 443 pcRNAs and 203 tapRNAs in TCGA data, as many non-coding regions are not yet fully annotated in the TCGA RNA-seq V2 Level3 data. The expression values of pcRNAs and tapRNAs were extracted and clustered by an unsupervised Pearson correlation method (Supplementary Figure 18A). The expression values of tapRNA-associated coding genes were also extracted and used to generate the heat-map (Supplementary Figure 18B), which shows a pattern of expression similar to that of tapRNAs across the cancer types.

To show that tapRNAs and associated coding genes have similar expression profiles in cancers, we generated a Spearman's Rank-Order Correlation heatmap (Figure 6A) between tapRNAs and their associated coding genes based on the TCGA RNA-seq data. We used the MATLAB function corr to calculate Spearman's rho. This function takes two matrices X (197-by-8,850 expression profiling matrix of tapRNA) and Y (197-by-8,850 expression profiling matrix of tapRNA-associated coding gene) and returns an 8,850-by-8,850 matrix containing the pairwise correlation coefficient between each pair of 8,850 columns (TCGA cancer samples in Supplementary Figure 18A and B). Thus, the rank-order correlation matrix that we computed from the matrices of expression profiling data (Supplementary Figure 18A and B) allowed us to compare the correlation between two column vectors, i.e. cancer samples. This function also returns a matrix of p-values for testing the hypothesis of no correlation against the alternative that there is a nonzero correlation. Each element of the matrix of p-values is the p-value for the corresponding element of Spearman's rho. The p-values for Spearman's rho are calculated using large-sample approximations. To check the significance level of the correlation between a tapRNA and its associated coding gene, the diagonal of the p-value matrix was extracted and used. The median is 1.31 × 10^-11 and the mean is 1.03 × 10^-4 with standard deviation 0.0029.

To identify cancer-specific tapRNAs, we considered not only the global expression pattern of a given tapRNA in each cancer type, but also the expression pattern of any specific sub-group that is significantly distinct, to take into account cancer sample heterogeneity. Thus, two conditions were applied: (1) the average expression level of a tapRNA in a given cancer type is in the top 10% or bottom 10% and (2) a tapRNA has at least 10% of samples in a given cancer type that are significantly up-regulated (Z-score > 2) or down-regulated (Z-score < -2).
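The two conditions for calling a tapRNA cancer-specific can be sketched in code, with the caveat that the description leaves several details open. The sketch assumes that the "top 10% or bottom 10%" ranking is taken over per-cancer-type mean expression, that Z-scores are computed against all samples pooled across cancer types, and that up- and down-regulated samples are counted together. The function names and data are hypothetical.

```python
from statistics import mean, pstdev


def in_extreme_decile(type_mean, all_type_means):
    """Condition 1 (assumed reading): the mean ranks in the top or bottom 10%."""
    ranked = sorted(all_type_means)
    k = max(1, round(0.10 * len(ranked)))  # number of types in each tail
    return type_mean <= ranked[k - 1] or type_mean >= ranked[-k]


def fraction_extreme(values_in_type, all_values, z_cut=2.0):
    """Condition 2 helper: share of samples with |Z| > z_cut (pooled Z-score)."""
    mu = mean(all_values)
    sd = pstdev(all_values)
    if sd == 0:
        return 0.0
    hits = sum(1 for v in values_in_type if abs((v - mu) / sd) > z_cut)
    return hits / len(values_in_type)


def is_cancer_specific(expr_by_type, cancer_type):
    """Apply both conditions to one tapRNA's expression values."""
    all_values = [v for values in expr_by_type.values() for v in values]
    type_means = {t: mean(values) for t, values in expr_by_type.items()}
    cond1 = in_extreme_decile(type_means[cancer_type], list(type_means.values()))
    cond2 = fraction_extreme(expr_by_type[cancer_type], all_values) >= 0.10
    return cond1 and cond2


# Hypothetical expression of one tapRNA: high in COAD, flat elsewhere.
expr = {
    "COAD": [10.0] * 10,
    "BRCA": [1.0] * 45,
    "LUAD": [1.0] * 45,
}
coad_specific = is_cancer_specific(expr, "COAD")
brca_specific = is_cancer_specific(expr, "BRCA")
```

With only three cancer types the decile check is coarse (each tail holds a single type, and ties pass), so condition 2 is what separates COAD from BRCA in this toy example.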
Table_2_Identification of Synergistic Drug Combinations to Target...
datasetcatalog.nlm.nih.gov
figshare.com
Updated May 18, 2022
+ more versions
Cite
Mackeyev, Yuri; De Araujo Farias, Virginea; Singh, Pankaj K.; Krishnan, Sunil; Gupta, Kshama; Jones, Jeremy C.; Quiñones-Hinojosa, Alfredo (2022). Table_2_Identification of Synergistic Drug Combinations to Target KRAS-Driven Chemoradioresistant Cancers Utilizing Tumoroid Models of Colorectal Adenocarcinoma and Recurrent Glioblastoma.xlsx [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000367110
Explore at:
Dataset updated
May 18, 2022
Authors
Mackeyev, Yuri; De Araujo Farias, Virginea; Singh, Pankaj K.; Krishnan, Sunil; Gupta, Kshama; Jones, Jeremy C.; Quiñones-Hinojosa, Alfredo
Description
Treatment resistance is observed in all advanced cancers. Colorectal cancer (CRC) presenting as colorectal adenocarcinoma (COAD) is the second leading cause of cancer deaths worldwide. Multimodality treatment includes surgery, chemotherapy, and targeted therapies with selective utilization of immunotherapy and radiation therapy. Despite the early success of anti-epidermal growth factor receptor (anti-EGFR) therapy, treatment resistance is common and often driven by mutations in APC, KRAS, RAF, and PI3K/mTOR and positive feedback between activated KRAS and WNT effectors. Challenges in the direct targeting of WNT regulators and KRAS have caused alternative actionable targets to gain recent attention. Utilizing an unbiased drug screen, we identified combinatorial targeting of DDR1/BCR-ABL signaling axis with small-molecule inhibitors of EGFR-ERBB2 to be potentially cytotoxic against multicellular spheroids obtained from WNT-activated and KRAS-mutant COAD lines (HCT116, DLD1, and SW480) independent of their KRAS mutation type. Based on the data-driven approach using available patient datasets (The Cancer Genome Atlas (TCGA)), we constructed transcriptomic correlations between gene DDR1, with an expression of genes for EGFR, ERBB2-4, mitogen-activated protein kinase (MAPK) pathway intermediates, BCR, and ABL and genes for cancer stem cell reactivation, cell polarity, and adhesion; we identified a positive association of DDR1 with EGFR, ERBB2, BRAF, SOX9, and VANGL2 in Pan-Cancer. The evaluation of the pathway network using the STRING database and Pathway Commons database revealed DDR1 protein to relay its signaling via adaptor proteins (SHC1, GRB2, and SOS1) and BCR axis to contribute to the KRAS-PI3K-AKT signaling cascade, which was confirmed by Western blotting. 
We further confirmed the cytotoxic potential of our lead combination involving EGFR/ERBB2 inhibitor (lapatinib) with DDR1/BCR-ABL inhibitor (nilotinib) in radioresistant spheroids of HCT116 (COAD) and, in an additional devastating primary cancer model, glioblastoma (GBM). GBMs overexpress DDR1 and share some common genomic features with COAD like EGFR amplification and WNT activation. Moreover, genetic alterations in genes like NF1 make GBMs have an intrinsically high KRAS activity. We show the combination of nilotinib plus lapatinib to exhibit more potent cytotoxic efficacy than either of the drugs administered alone in tumoroids of patient-derived recurrent GBMs. Collectively, our findings suggest that combinatorial targeting of DDR1/BCR-ABL with EGFR-ERBB2 signaling may offer a therapeutic strategy against stem-like KRAS-driven chemoradioresistant tumors of COAD and GBM, widening the window for its applications in mainstream cancer therapeutics.
Dr (Colon Cancer)
search.dataone.org
datadryad.org
Updated Jun 20, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
QunGuang Jiang; Xiaorui Fu; Jinzhong Duanmu; Taiyuan Li (2025). Dr (Colon Cancer) [Dataset]. http://doi.org/10.5061/dryad.7pvmcvdpc
Unique identifier
https://doi.org/10.5061/dryad.7pvmcvdpc
Dataset updated
Jun 20, 2025
Dataset provided by
Dryad Digital Repository
Authors
QunGuang Jiang; Xiaorui Fu; Jinzhong Duanmu; Taiyuan Li
Time period covered
Jan 1, 2019
Description
Colon adenocarcinoma (COAD) is the most common colon cancer and exhibits high mortality. Because of their association with cancer progression, long noncoding RNAs (lncRNAs) have become prognostic biomarkers. This study, using relevant clinical information and lncRNA expression profiles from The Cancer Genome Atlas database, aims to construct a prognostic lncRNA signature to estimate the prognosis for patients. In the training cohort, prognosis-related lncRNAs were selected from differentially expressed lncRNAs by univariate Cox analysis. Furthermore, least absolute shrinkage and selection operator (LASSO) regression and multivariate Cox analysis were employed to identify prognostic lncRNAs. The prognostic signature was constructed from those lncRNAs. The prognostic model was able to calculate each COAD patient's risk score and split the patients into low-risk and high-risk groups. Compared to the low-risk group, the high-risk group had a significantly poorer prognosis. Then, the prognostic signature was...
DataSheet_1_Identification of necroptosis-related genes for predicting...
datasetcatalog.nlm.nih.gov
Updated Nov 24, 2022
Cite
Ying, Song-cheng; Meng, Lei; Xu, Aman; Wei, Zhi-jian; Wang, Ye; Chen, Zhang-ming; Lin, Ming-gui (2022). DataSheet_1_Identification of necroptosis-related genes for predicting prognosis and exploring immune infiltration landscape in colon adenocarcinoma.csv [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000244540
Explore at:
Dataset updated
Nov 24, 2022
Authors
Ying, Song-cheng; Meng, Lei; Xu, Aman; Wei, Zhi-jian; Wang, Ye; Chen, Zhang-ming; Lin, Ming-gui
Description
Background: Necroptosis is a recently discovered form of cell death that plays an important role in the occurrence and development of colon adenocarcinoma (COAD). Our study aimed to construct a risk score model to predict the prognosis of patients with COAD based on necroptosis-related genes.

Methods: The gene expression data of COAD and normal colon samples were obtained from the Cancer Genome Atlas (TCGA) and Genotype-Tissue Expression (GTEx). The least absolute shrinkage and selection operator (LASSO) Cox regression analysis was used to calculate the risk score based on prognostic necroptosis-related differentially expressed genes (DEGs). Based on the risk score, patients were classified into high- and low-risk groups. Then, nomogram models were built based on the risk score and clinicopathological features. In addition, the model was verified in the Gene Expression Omnibus (GEO) database. The tumor microenvironment (TME) and the level of immune infiltration were evaluated by “ESTIMATE” and single-sample gene set enrichment analysis (ssGSEA). Functional enrichment analysis was carried out to explore the potential mechanism of necroptosis in COAD. Finally, the effect of necroptosis on colon cancer cells was explored through CCK8 and transwell assays. The expression of necroptosis-related genes in colon tissues and cells treated with necroptotic inducers (TNFα) and inhibitors (NEC-1) was evaluated by quantitative real-time polymerase chain reaction (qRT-PCR).

Results: The risk score was an independent prognostic risk factor in COAD. The predictive value of the nomogram based on the risk score and clinicopathological features was superior to TNM staging. The effectiveness of the model was well validated in GSE152430. Immune and stromal scores were significantly elevated in the high-risk group. Moreover, necroptosis may influence the prognosis of COAD by influencing the cancer immune response. In in vitro experiments, the inhibition of necroptosis can promote proliferation and invasion ability. Finally, differential expression of necroptosis-related genes was found in 16 paired colon tissues and in colon cancer cells.

Conclusion: A novel necroptosis-related gene signature for forecasting the prognosis of COAD has been constructed, which possesses favorable predictive ability and offers ideas for the necroptosis-associated development of COAD.
COAD paired sample gene level read counts
commons.datacite.org
Updated Jan 19, 2016
Cite
Endre Sebestyén (2016). COAD paired sample gene level read counts [Dataset]. http://doi.org/10.6084/m9.figshare.1061501.v1
Explore at:
Unique identifier
https://doi.org/10.6084/m9.figshare.1061501.v1
Dataset updated
Jan 19, 2016
Dataset provided by
DataCite
Figshare (http://figshare.com/)
Authors
Endre Sebestyén
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
TCGA COAD paired sample gene level read counts from Level 3 RNASeq-v2 data.
DICOM converted Slide Microscopy images for the TCGA-LIHC collection
zenodo.org
bin
Updated Aug 20, 2024
Cite
David Clunie; William Clifford; David Pot; Ulrike Wagner; Keyvan Farahani; Erika Kim; Andrey Fedorov (2024). DICOM converted Slide Microscopy images for the TCGA-LIHC collection [Dataset]. http://doi.org/10.5281/zenodo.12690003
Explore at:
Available download formats: bin
Unique identifier
https://doi.org/10.5281/zenodo.12690003
Dataset updated
Aug 20, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
David Clunie; William Clifford; David Pot; Ulrike Wagner; Keyvan Farahani; Erika Kim; Andrey Fedorov
License
Attribution 3.0 (CC BY 3.0), https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
Description
This dataset corresponds to a collection of images and/or image-derived data available from National Cancer Institute Imaging Data Commons (IDC) [1]. This dataset was converted into DICOM representation and ingested by the IDC team. You can explore and visualize the corresponding images using IDC Portal here: TCGA-LIHC. You can use the manifests included in this Zenodo record to download the content of the collection following the Download instructions below.

Collection description

The Cancer Imaging Program (CIP) is working directly with primary investigators from institutes participating in TCGA to obtain and load images relating to the genomic, clinical, and pathological data being stored within the TCGA Data Portal. Currently this CT and MR multi-sequence image collection of liver hepatocellular carcinoma (LIHC) patients can be matched by each unique case identifier with the extensive gene and expression data of the same case from The Cancer Genome Atlas Data Portal to research the link between clinical phenome and tissue genome.

See the TCGA-LIHC page to learn more about the images and to obtain any supporting metadata for this collection.

Files included

A manifest file's name indicates the IDC data release in which a version of collection data was first introduced. For example, collection_id-idc_v8-aws.s5cmd corresponds to the contents of the collection_id collection introduced in IDC data release v8. If there is a subsequent version of this Zenodo page, it will indicate when a subsequent version of the corresponding collection was introduced.

tcga_lihc-idc_v8-aws.s5cmd: manifest of files available for download from public IDC Amazon Web Services buckets

tcga_lihc-idc_v8-gcs.s5cmd: manifest of files available for download from public IDC Google Cloud Storage buckets

tcga_lihc-idc_v8-dcf.dcf: Gen3 manifest (for details see https://learn.canceridc.dev/data/organization-of-data/guids-and-uuids)

Note that manifest files that end in -aws.s5cmd reference files stored in Amazon Web Services (AWS) buckets, while -gcs.s5cmd reference files in Google Cloud Storage. The actual files are identical and are mirrored between AWS and GCP.
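The naming convention described above can be made concrete with a small parser. This is a sketch under the assumption that every manifest name follows the pattern shown in the examples (`<collection_id>-idc_v<release>-aws.s5cmd`, `-gcs.s5cmd`, or `-dcf.dcf`); the function and type names are hypothetical and are not part of any IDC tool.

```python
import re
from typing import NamedTuple, Optional


class ManifestName(NamedTuple):
    collection_id: str
    idc_release: int   # IDC data release in which this version was introduced
    storage: str       # "aws", "gcs", or "dcf"


# Pattern inferred from the example file names in the text (an assumption).
_MANIFEST_RE = re.compile(
    r"^(?P<collection>.+)-idc_v(?P<release>\d+)-"
    r"(?:(?P<cloud>aws|gcs)\.s5cmd|dcf\.dcf)$"
)


def parse_manifest_name(filename: str) -> Optional[ManifestName]:
    """Split a manifest file name into collection, release, and storage kind.

    Returns None when the name does not follow the assumed convention.
    """
    match = _MANIFEST_RE.match(filename)
    if match is None:
        return None
    storage = match["cloud"] or "dcf"
    return ManifestName(match["collection"], int(match["release"]), storage)


aws_manifest = parse_manifest_name("tcga_lihc-idc_v8-aws.s5cmd")
dcf_manifest = parse_manifest_name("tcga_lihc-idc_v8-dcf.dcf")
unrecognized = parse_manifest_name("manifest.s5cmd")
```

Returning `None` for names that do not match keeps the caller in control of how to handle unexpected files.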

Download instructions

Each of the manifests includes instructions in the header on how to download the included files.

To download the files using .s5cmd manifests:

1. Install the idc-index package: pip install --upgrade idc-index

2. Download the files referenced by the manifests included in this dataset by passing the .s5cmd manifest file: idc download manifest.s5cmd

To download the files using .dcf manifest, see manifest header.

Acknowledgments

Imaging Data Commons team has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under Task Order No. HHSN26110071 under Contract No. HHSN261201500003l.

References

[1] Fedorov, A., Longabaugh, W. J. R., Pot, D., Clunie, D. A., Pieper, S. D., Gibbs, D. L., Bridge, C., Herrmann, M. D., Homeyer, A., Lewis, R., Aerts, H. J. W., Krishnaswamy, D., Thiriveedhi, V. K., Ciausu, C., Schacherer, D. P., Bontempi, D., Pihl, T., Wagner, U., Farahani, K., Kim, E. & Kikinis, R. National Cancer Institute Imaging Data Commons: Toward Transparency, Reproducibility, and Scalability in Imaging Artificial Intelligence. RadioGraphics (2023). https://doi.org/10.1148/rg.230180
PIVOT - COAD (light)
zenodo.org
application/gzip
Updated Jan 25, 2022
Cite
Malvika Sudhakar; Raghunathan Rengaswamy; Karthik Raman (2022). PIVOT - COAD (light) [Dataset]. http://doi.org/10.5281/zenodo.5898163
Explore at:
Available download formats: application/gzip
Unique identifier
https://doi.org/10.5281/zenodo.5898163
Dataset updated
Jan 25, 2022
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Malvika Sudhakar; Raghunathan Rengaswamy; Karthik Raman
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Pre-processed TCGA COAD data used for PIVOT analysis.
Esophageal Cancer Dataset
kaggle.com
zip
Updated Oct 14, 2024
Cite
Abhinaba Biswas (2024). Esophageal Cancer Dataset [Dataset]. https://www.kaggle.com/datasets/abhinaba1biswas/esophageal-cancer-dataset/code
Explore at:
Available download formats: zip (407333 bytes)
Dataset updated
Oct 14, 2024
Authors
Abhinaba Biswas
License
Open Database License (ODbL) v1.0, https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
Description
Esophageal Cancer Dataset

Introduction:

Esophageal cancer remains one of the most aggressive cancers with a high mortality rate worldwide, presenting significant challenges for early detection and effective treatment. To support the global fight against this disease, we introduce a comprehensive clinical dataset on esophageal cancer, available on Kaggle. This dataset includes patient demographics, clinical data, and cancer-specific attributes that can be leveraged to develop AI models for detection, prognosis, and treatment planning.

Scientific Overview:

This dataset is a valuable resource for healthcare professionals and researchers working on cancer detection, personalized treatments, and prognosis models. It includes:

- Patient demographics (e.g., age, gender)
- Tumor histology and staging information
- Treatment history
- Lymph node examination results

These real-world clinical attributes provide a robust foundation for AI-driven solutions in the diagnosis and treatment of esophageal cancer.

Dataset Composition:

1. Patient Demographics:

Patient Barcode: Unique patient identifier.

Tissue Source Site: Code indicating the site from which the tissue sample was sourced.

Age at Diagnosis: Facilitates age-based studies on incidence and outcomes.

Gender: Enables gender-specific analysis of disease progression.

Informed Consent Verified: Indicates whether informed consent was obtained.

2. Medical and Clinical History:

ICD-10 and ICD-O-3 Codes: Provides International Classification of Diseases codes for the site and histology, essential for understanding tumor characteristics (e.g., squamous cell carcinoma, adenocarcinoma).

Comorbidities: Includes information on the presence of other chronic diseases like Gastroesophageal Reflux Disease (GERD) that could impact treatment outcomes.

Smoking Status: Critical for evaluating the impact of smoking on esophageal cancer risk and prognosis.

3. Cancer-Specific Data:

Tumor Location: Identifies the part of the esophagus affected (e.g., upper, middle, or lower).

Histology: Details the type of cancer (e.g., squamous cell carcinoma, adenocarcinoma).

Cancer Stage: Describes the stage of cancer at diagnosis (Stages 0 to IV).

Residual Tumor Status: Indicates whether any tumors remained post-surgery (e.g., R0, R1).

Lymph Node Examination: Information such as the number of lymph nodes examined and those positive for metastasis.

Radiation Therapy and Postoperative Treatment: Indicates whether the patient received radiation therapy and additional postoperative treatments.

4. Clinical Outcome Data:

Karnofsky Performance Score: Assesses the patient's ability to perform daily activities.

Eastern Cooperative Oncology Group (ECOG) Performance Status: Evaluates the functional status of cancer patients.

Implementation Guide:

1. Data Preprocessing:

Data Cleaning: Remove irrelevant or redundant entries and ensure consistency across the dataset (e.g., handling missing values in performance scores and treatment history).

Normalization: Standardize clinical data for model input, especially for numerical variables like age, lymph node count, and performance scores.

2. Model Training:

Frameworks: Use machine learning or deep learning frameworks such as TensorFlow, PyTorch, or scikit-learn.

Model Selection: Depending on dataset complexity, models like Decision Trees, Random Forests, or Neural Networks can be used.

Evaluation: Measure model performance using metrics like accuracy, precision, recall, and F1-score.

3. Deployment:

Clinical Decision Support: Integrate the trained model into tools for medical professionals, offering predictions or insights to support diagnosis and treatment planning for esophageal cancer.

Testing and Feedback: Test the model for accuracy and usability, incorporating a feedback loop to continuously improve model performance.
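
Two steps of the implementation guide above, normalization and evaluation, can be sketched in plain Python. This is an illustration rather than the dataset authors' pipeline: it assumes z-score standardization for numerical variables and a binary classification task, and the sample values are hypothetical.

```python
from statistics import mean, pstdev


def standardize(values):
    """Z-score standardization: subtract the mean, divide by the population std."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1-score for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical ages at diagnosis and model predictions, for illustration only.
ages_scaled = standardize([50, 60, 70])
metrics = binary_metrics(y_true=[1, 0, 1, 1, 0, 0], y_pred=[1, 0, 0, 1, 1, 0])
```

Missing values would need to be imputed or dropped before calling `standardize`; that data-cleaning choice is left open here.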

Potential Applications:

1. Machine Learning Models:

Ideal for developing algorithms for early detection, personalized treatment plans, and prognosis prediction.

2. Healthcare Insights:

Assists clinicians in optimizing patient care strategies and treatment protocols.

3. Academic Research:

Facilitates studies on the pathophysiology of esophageal cancer, risk factor assessment, and the effectiveness of various treatments.

Conclusion:

The Esophageal Cancer Dataset provides high-quality, comprehensive clinical data, essential for advancing research in esophageal cancer detection, treatment, and prognosis. We encourage the research community to utilize this dataset to drive innovation and improve patient outcomes.

Team:

Mr. Abhinaba Biswas, Student/Aspiring Data Analyst/ML Developer, JIS College of Engineering, Kalyani, Wes...
Colon cancer
kaggle.com
zip
Updated May 19, 2023
Cite
AngeValli (2023). Colon cancer [Dataset]. https://www.kaggle.com/datasets/angevalli/colon-cancer/code
Explore at:
Available download formats: zip (607690 bytes)
Dataset updated
May 19, 2023
Authors
AngeValli
License
CC0 1.0, https://creativecommons.org/publicdomain/zero/1.0/
Description
High-dimensional colon cancer dataset with many null values, intended for the study of dimension reduction techniques. Useful for random projection techniques and for comparing computation time on logistic regression. Intended for comparison with the sector scale dataset.

Gene Expression Cancer RNA-Seq

kaggle.com

zip

Updated May 27, 2025

Cite

Alban NYANTUDRE (2025). Gene Expression Cancer RNA-Seq [Dataset]. https://www.kaggle.com/datasets/waalbannyantudre/gene-expression-cancer-rna-seq-donated-on-682016

Explore at:

Available download formats: zip (73984306 bytes)

Dataset updated

May 27, 2025

Authors

Alban NYANTUDRE

License

Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically

Description

This collection of data is part of the RNA-Seq (HiSeq) PANCAN dataset. It is a random extraction of gene expressions of patients having different types of tumor: BRCA, KIRC, COAD, LUAD, and PRAD. Each sample contains the expression of 20,531 genes for a patient diagnosed with one of the following cancers:

Code	Tumor Name
BRCA	Breast invasive carcinoma (breast cancer)
KIRC	Kidney renal clear cell carcinoma (kidney)
COAD	Colon adenocarcinoma (colon)
LUAD	Lung adenocarcinoma (lung)
PRAD	Prostate adenocarcinoma (prostate)

Files:

data.csv: Gene expression matrix X (881 samples × 20,531 genes)

label.csv: True class label for each sample y (881 labels)

Source: UCI ML Repository – Gene Expression Cancer RNA-Seq Data
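
As a rough sketch of how the two files might be read together, the snippet below aligns the expression matrix with its labels by sample ID using only the standard library. The column layout is an assumption (a header row, then one row per sample with the sample ID in the first column); the sample IDs, gene names, and values in the demo are hypothetical, and the real files should be inspected before relying on this.

```python
import csv
import io


def read_table(handle):
    """Read a CSV with a header row; return (column names, {row_id: values})."""
    reader = csv.reader(handle)
    header = next(reader)
    rows = {row[0]: row[1:] for row in reader if row}
    return header[1:], rows


def load_dataset(data_handle, label_handle):
    """Return (sample_ids, gene_names, X, y) with X and y aligned by sample ID."""
    genes, data_rows = read_table(data_handle)
    _, label_rows = read_table(label_handle)
    # Keep only samples present in both files, in data-file order.
    sample_ids = [s for s in data_rows if s in label_rows]
    X = [[float(v) for v in data_rows[s]] for s in sample_ids]
    y = [label_rows[s][0] for s in sample_ids]
    return sample_ids, genes, X, y


# Tiny in-memory stand-ins for data.csv and label.csv (hypothetical content).
demo_data = io.StringIO(",gene_0,gene_1\nsample_0,0.0,2.5\nsample_1,1.0,0.0\n")
demo_labels = io.StringIO(",Class\nsample_0,BRCA\nsample_1,COAD\n")

sample_ids, genes, X, y = load_dataset(demo_data, demo_labels)
```

With the real files, the handles would come from `open("data.csv", newline="")` and `open("label.csv", newline="")` instead of `io.StringIO`.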
