100+ datasets found

The Cancer Genome Atlas Rectum Adenocarcinoma Collection
cancerimagingarchive.net
dicom, n/a
Updated Jan 5, 2016
Cite
The Cancer Imaging Archive (2016). The Cancer Genome Atlas Rectum Adenocarcinoma Collection [Dataset]. http://doi.org/10.7937/K9/TCIA.2016.F7PPNPNU
Available download formats: dicom, n/a
Unique identifier
https://doi.org/10.7937/K9/TCIA.2016.F7PPNPNU
Dataset updated
Jan 5, 2016
Dataset authored and provided by
The Cancer Imaging Archive
License
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
Time period covered
May 29, 2020
Dataset funded by
National Cancer Institute (http://www.cancer.gov/)
Description
The Cancer Genome Atlas Rectum Adenocarcinoma (TCGA-READ) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA). Clinical, genetic, and pathological data resides in the Genomic Data Commons (GDC) Data Portal while the radiological data is stored on The Cancer Imaging Archive (TCIA).
Matched TCGA patient identifiers allow researchers to explore the TCGA/TCIA databases for correlations between tissue genotype, radiological phenotype and patient outcomes. Tissues for TCGA were collected from many sites all over the world in order to reach their accrual targets, usually around 500 specimens per cancer type. For this reason the image data sets are also extremely heterogeneous in terms of scanner modalities, manufacturers and acquisition protocols. In most cases the images were acquired as part of routine care and not as part of a controlled research study or clinical trial.
CIP TCGA Radiology Initiative
Imaging Source Site (ISS) Groups are being populated and governed by participants from institutions that have provided imaging data to the archive for a given cancer type. Modeled after TCGA analysis groups, ISS groups are given the opportunity to publish a marker paper for a given cancer type per the guidelines in the table above. This opportunity will generate increased participation in building these multi-institutional data sets as they become an open community resource. Learn more about the CIP TCGA Radiology Initiative.
TCGA Clinical Datasets
data.niaid.nih.gov
Updated Jul 29, 2023
Cite
Swati Baskiyar (2023). TCGA Clinical Datasets [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8193637
Dataset updated
Jul 29, 2023
Authors
Swati Baskiyar
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Abstract:

The Cancer Genome Atlas (TCGA) was a large-scale collaborative project initiated by the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI). It aimed to comprehensively characterize the genomic and molecular landscape of various cancer types. This dataset includes curated survival data from the Pan-cancer Atlas paper titled "An Integrated TCGA Pan-Cancer Clinical Data Resource (TCGA-CDR) to drive high quality survival outcome analytics". The paper highlights four types of carefully curated survival endpoints, and recommends the use of the endpoints of OS, PFI, DFI, and DSS for each TCGA cancer type. These datasets include phenotypic information about BLCA, CESC, GBM, HNSC, KIRC, and LGG. The Sample IDs are unique identifiers, which can be paired with the gene expression dataset.

Inspiration:

This dataset was uploaded to U-BRITE for the GTKB project.

Instruction:

The survival and phenotype data were merged into one file.
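The merge described above can be sketched as a join on the sample identifier. This is a minimal illustration only: the column names (`sample`, `OS`, `OS.time`, `age`, `stage`), the sample IDs, and the choice of an inner join are assumptions, not details taken from the dataset.

```python
import csv
import io


def merge_by_sample(survival_rows, phenotype_rows, key="sample"):
    """Inner-join two lists of dict rows on a shared sample-ID column."""
    phenotype_by_id = {row[key]: row for row in phenotype_rows}
    merged = []
    for row in survival_rows:
        match = phenotype_by_id.get(row[key])
        if match is not None:
            # Keep survival columns and add the phenotype columns.
            merged.append({**row, **match})
    return merged


# Hypothetical toy tables; real files and column names may differ.
survival_csv = "sample,OS,OS.time\nTCGA-XX-0001-01,1,420\nTCGA-XX-0002-01,0,1310\n"
phenotype_csv = "sample,age,stage\nTCGA-XX-0001-01,61,II\nTCGA-XX-0003-01,55,I\n"

survival = list(csv.DictReader(io.StringIO(survival_csv)))
phenotype = list(csv.DictReader(io.StringIO(phenotype_csv)))
merged = merge_by_sample(survival, phenotype)
print(merged)
```

Only samples present in both tables survive an inner join; an outer join would be the alternative if unmatched samples should be retained with empty fields.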

Acknowledgments:

Goldman, M.J., Craft, B., Hastie, M. et al. Visualizing and interpreting cancer genomics data via the Xena platform. Nat Biotechnol (2020). https://doi.org/10.1038/s41587-020-0546-8

Liu, Jianfang, Caesar-Johnson, Samantha J. et al. An Integrated TCGA Pan-Cancer Clinical Data Resource to Drive High-Quality Survival Outcome Analytics. Cell, Volume 173, Issue 2, 400 - 416.e11. https://doi.org/10.1016/j.cell.2018.02.052

The Cancer Genome Atlas Research Network., Weinstein, J., Collisson, E. et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat Genet 45, 1113–1120 (2013). https://doi.org/10.1038/ng.2764

U-BRITE last update: 07/13/2023
The Cancer Genome Atlas (TCGA) RNA-seq meta-analysis
figshare.com
xlsx
Updated Feb 2, 2018
Cite
Namshik Han (2018). The Cancer Genome Atlas (TCGA) RNA-seq meta-analysis [Dataset]. http://doi.org/10.6084/m9.figshare.5851743.v1
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.5851743.v1
Dataset updated
Feb 2, 2018
Dataset provided by
Figshare (http://figshare.com/)
Authors
Namshik Han
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
TCGA RNA-seq V2 Level3 data were downloaded from the TCGA Genomic Data Commons Data Portal (https://gdc-portal.nci.nih.gov), consisting of 11,303 samples in 34 cancer projects (33 cancer types). Nine cancer types that do not have corresponding non-tumour samples were filtered out, and the analysis focused on the tumour versus non-tumour comparison. 24 cancer types were used in this meta-analysis: BLCA, BRCA, CESC, CHOL, COAD, ESCA, GBM, HNSC, KICH, KIRC, KIRP, LIHC, LUAD, LUSC, PAAD, PCPG, PRAD, READ, SARC, SKCM, STAD, THCA, THYM, UCEC. The nine filtered cancer types were ACC, DLBC, LAML, LGG, MESO, OV, TGCT, UCS and UVM.
To extract expression values from TCGA RNA-seq data, we used genomic coordinates to retrieve UCSC Transcript IDs that correspond to the identifiers in TCGA RNA-seq V2 Level3 data (isoform level). The GAF (General Annotation Format) file was used to map the coordinates to UCSC Transcript IDs, and it was downloaded from https://tcga-data.nci.nih.gov/docs/GAF/GAF.hg19.June2011.bundle/outputs/TCGA.hg19.June2011.gaf. This file contains genomic annotations shared by all TCGA projects. More details of the GAF file format can be found at https://tcga-data.nci.nih.gov/docs/GAF/GAF3.0/GAF_v3_file_description.docx. We filtered out any coding exons overlapping UCSC Transcript IDs to eliminate the expression values of coding genes and evaluate lncRNA expression. We could find the expression values of 443 pcRNAs and 203 tapRNAs in TCGA data, as many non-coding regions are not yet fully annotated in the TCGA RNA-seq V2 Level3 data. The expression values of pcRNAs and tapRNAs were extracted and clustered by an unsupervised Pearson correlation method (Supplementary Figure 18A). The expression values of tapRNA-associated coding genes were also extracted and used to generate the heat-map (Supplementary Figure 18B), which shows a pattern of expression similar to that of tapRNAs across the cancer types.
To show that tapRNAs and associated coding genes have similar expression profiles in cancers, we generated a Spearman's Rank-Order Correlation heatmap (Figure 6A) between tapRNAs and their associated coding genes based on the TCGA RNA-seq data. We used the MATLAB function corr to calculate Spearman's rho. This function takes two matrices X (197-by-8,850 expression profiling matrix of tapRNA) and Y (197-by-8,850 expression profiling matrix of tapRNA-associated coding gene) and returns an 8,850-by-8,850 matrix containing the pairwise correlation coefficient between each pair of 8,850 columns (TCGA cancer samples in Supplementary Figure 18A and B). Thus, the rank-order correlation matrix that we computed from the matrices of expression profiling data (Supplementary Figure S18A and B) allowed us to compare the correlation between two column vectors, i.e. cancer samples. This function also returns a matrix of p-values for testing the hypothesis of no correlation against the alternative that there is a nonzero correlation. Each element of the p-value matrix is the p-value for the corresponding element of Spearman's rho. The p-values for Spearman's rho are calculated using large-sample approximations. To check the significance level of the correlation between a tapRNA and its associated coding gene, the diagonal of the p-value matrix was extracted and used. The median is 1.31 × 10^-11 and the mean is 1.03 × 10^-4, with standard deviation 0.0029.
To identify cancer-specific tapRNAs, we considered not only the global expression pattern of a given tapRNA in each cancer type, but also the expression pattern of any specific sub-group that is significantly distinct, to take into account cancer sample heterogeneity. Thus, two conditions were applied: (1) the average expression level of a tapRNA in a given cancer type is in the top 10% or bottom 10% and (2) a tapRNA has at least 10% of samples in a given cancer type that are significantly up-regulated (Z-score > 2) or down-regulated (Z-score < -2).
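The two conditions can be sketched roughly as follows. The description leaves several details open, so this sketch makes assumptions that may not match the authors' procedure: the top/bottom 10% is taken among the mean expression levels of all tapRNAs within the cancer type, Z-scores are computed per tapRNA across all samples, and condition (2) is met when either direction alone reaches 10% of samples. All names and the toy data are hypothetical.

```python
from statistics import mean, pstdev


def zscores(values):
    """Z-score each value against the mean and population SD of `values`."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma if sigma > 0 else 0.0 for v in values]


def extreme_decile(value, population, fraction=0.10):
    """True if `value` lies in the top or bottom `fraction` of `population`."""
    ranked = sorted(population)
    k = max(1, int(len(ranked) * fraction))
    return value <= ranked[k - 1] or value >= ranked[-k]


def is_cancer_specific(tap, cancer, expr, labels, fraction=0.10, z_cut=2.0):
    """expr: {tapRNA: [value per sample]}; labels: [cancer type per sample]."""
    idx = [i for i, c in enumerate(labels) if c == cancer]
    # Condition 1 (assumed reading): mean expression of this tapRNA in the
    # cancer type is extreme relative to the other tapRNAs' means.
    means = {t: mean(vals[i] for i in idx) for t, vals in expr.items()}
    cond1 = extreme_decile(means[tap], list(means.values()), fraction)
    # Condition 2 (assumed reading): at least `fraction` of the samples in
    # this cancer type have Z > z_cut, or at least that many have Z < -z_cut.
    z = zscores(expr[tap])
    up = sum(1 for i in idx if z[i] > z_cut)
    down = sum(1 for i in idx if z[i] < -z_cut)
    cond2 = max(up, down) >= fraction * len(idx)
    return cond1 and cond2


# Toy data: 10 samples of hypothetical type "A", 10 of type "B".
labels = ["A"] * 10 + ["B"] * 10
expr = {
    "t1": [10.0, 10.0] + [1.0] * 18,  # two strongly up-regulated "A" samples
    "t2": [2.0] * 20,
    "t3": [0.5] * 20,
}
print(is_cancer_specific("t1", "A", expr, labels))
print(is_cancer_specific("t1", "B", expr, labels))
```

With only three tapRNAs, the decile cut degenerates to "highest or lowest mean"; on real data with hundreds of tapRNAs the same function selects the outer 10% on each side.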
The Cancer Genome Atlas Breast Invasive Carcinoma Collection
cancerimagingarchive.net
dicom, n/a
Updated Feb 2, 2014
Cite
The Cancer Imaging Archive (2014). The Cancer Genome Atlas Breast Invasive Carcinoma Collection [Dataset]. http://doi.org/10.7937/K9/TCIA.2016.AB2NAZRP
Available download formats: n/a, dicom
Unique identifier
https://doi.org/10.7937/K9/TCIA.2016.AB2NAZRP
Dataset updated
Feb 2, 2014
Dataset authored and provided by
The Cancer Imaging Archive
License
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
Time period covered
May 29, 2020
Dataset funded by
National Cancer Institute (http://www.cancer.gov/)
Description
The Cancer Genome Atlas Breast Invasive Carcinoma (TCGA-BRCA) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA). Clinical, genetic, and pathological data resides in the Genomic Data Commons (GDC) Data Portal while the radiological data is stored on The Cancer Imaging Archive (TCIA).
Matched TCGA patient identifiers allow researchers to explore the TCGA/TCIA databases for correlations between tissue genotype, radiological phenotype and patient outcomes. Tissues for TCGA were collected from many sites all over the world in order to reach their accrual targets, usually around 500 specimens per cancer type. For this reason the image data sets are also extremely heterogeneous in terms of scanner modalities, manufacturers and acquisition protocols. In most cases the images were acquired as part of routine care and not as part of a controlled research study or clinical trial.
CIP TCGA Radiology Initiative
Imaging Source Site (ISS) Groups are being populated and governed by participants from institutions that have provided imaging data to the archive for a given cancer type. Modeled after TCGA analysis groups, ISS groups are given the opportunity to publish a marker paper for a given cancer type per the guidelines in the table above. This opportunity will generate increased participation in building these multi-institutional data sets as they become an open community resource. Learn more about the TCGA Breast Phenotype Research Group.
Bladder Cancer (TCGA, Cell 2017): Comprehensive analysis of muscle-invasive...
datacatalog.mskcc.org
Updated Nov 18, 2019
Cite
Robertson, A. Gordon; Kim, Jaegil; Al-Ahmadie, Hikmat; Bellmunt, Joaquim; Guo, Guangwu; Cherniack, Andrew D.; Hinoue, Toshinori; Laird, Peter W.; Hoadley, Katherine A.; Akbani, Rehan; Castro, Mauro A. A.; Gibb, Ewan A.; Kanchi, Rupa S.; Gordenin, Dmitry A.; Shukla, Sachet A.; Sanchez-Vega, Francisco; Hansel, Donna E.; Czerniak, Bogdan A.; Reuter, Victor; Su, Xiaoping; Carvalho, Benilton; Chagas, Vinicius S.; Mungall, Karen L.; Sadeghi, Sara; Pedamallu, Chandra Sekhar; Lu, Yiling; Klimczak, Leszek J.; Zhang, Jiexin; Choo, Caleb; Ojesina, Akinyemi I.; Bullman, Susan; Leraas, Kristen M.; Lichtenberg, Tara M.; Wu, Catherine J.; Schultz, Nikolaus D.; Getz, Gad; Meyerson, Matthew; Mills, Gordon B.; McConkey, David J.; TCGA Research Network; Weinstein, John N.; Kwiatkowski, David J.; Lerner, Seth P. (2019). Bladder Cancer (TCGA, Cell 2017): Comprehensive analysis of muscle-invasive bladder cancers characterized by multiple TCGA analytical platforms. [Dataset]. https://datacatalog.mskcc.org/dataset/10400
Dataset updated
Nov 18, 2019
Dataset provided by
MSK Library
The Cancer Genome Atlas (TCGA)
Authors
Robertson, A. Gordon; Kim, Jaegil; Al-Ahmadie, Hikmat; Bellmunt, Joaquim; Guo, Guangwu; Cherniack, Andrew D.; Hinoue, Toshinori; Laird, Peter W.; Hoadley, Katherine A.; Akbani, Rehan; Castro, Mauro A. A.; Gibb, Ewan A.; Kanchi, Rupa S.; Gordenin, Dmitry A.; Shukla, Sachet A.; Sanchez-Vega, Francisco; Hansel, Donna E.; Czerniak, Bogdan A.; Reuter, Victor; Su, Xiaoping; Carvalho, Benilton; Chagas, Vinicius S.; Mungall, Karen L.; Sadeghi, Sara; Pedamallu, Chandra Sekhar; Lu, Yiling; Klimczak, Leszek J.; Zhang, Jiexin; Choo, Caleb; Ojesina, Akinyemi I.; Bullman, Susan; Leraas, Kristen M.; Lichtenberg, Tara M.; Wu, Catherine J.; Schultz, Nikolaus D.; Getz, Gad; Meyerson, Matthew; Mills, Gordon B.; McConkey, David J.; TCGA Research Network; Weinstein, John N.; Kwiatkowski, David J.; Lerner, Seth P.
Description
This dataset contains summary data visualizations and clinical data for a list of oncogenic and likely oncogenic alterations in 413 samples in tumors of the bladder from 412 patients, gathered as part of a comprehensive molecular characterization of muscle-invasive bladder cancer. Clinical data includes mutation count, information about mutated genes, patient demographics, and American Joint Committee on Cancer classification codes among other relevant data points.
The Cancer Genome Atlas Ovarian Cancer Collection
cancerimagingarchive.net
dicom, n/a
Updated May 29, 2020
Cite
The Cancer Imaging Archive (2020). The Cancer Genome Atlas Ovarian Cancer Collection [Dataset]. http://doi.org/10.7937/K9/TCIA.2016.NDO1MDFQ
Available download formats: n/a, dicom
Unique identifier
https://doi.org/10.7937/K9/TCIA.2016.NDO1MDFQ
Dataset updated
May 29, 2020
Dataset authored and provided by
The Cancer Imaging Archive
License
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
Time period covered
May 29, 2020
Dataset funded by
National Cancer Institute (http://www.cancer.gov/)
Description
The Cancer Genome Atlas Ovarian Cancer (TCGA-OV) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA). Clinical, genetic, and pathological data resides in the Genomic Data Commons (GDC) Data Portal while the radiological data is stored on The Cancer Imaging Archive (TCIA).
Matched TCGA patient identifiers allow researchers to explore the TCGA/TCIA databases for correlations between tissue genotype, radiological phenotype and patient outcomes. Tissues for TCGA were collected from many sites all over the world in order to reach their accrual targets, usually around 500 specimens per cancer type. For this reason the image data sets are also extremely heterogeneous in terms of scanner modalities, manufacturers and acquisition protocols. In most cases the images were acquired as part of routine care and not as part of a controlled research study or clinical trial.
CIP TCGA Radiology Initiative
Imaging Source Site (ISS) Groups are being populated and governed by participants from institutions that have provided imaging data to the archive for a given cancer type. Modeled after TCGA analysis groups, ISS groups are given the opportunity to publish a marker paper for a given cancer type per the guidelines in the table above. This opportunity will generate increased participation in building these multi-institutional data sets as they become an open community resource. Learn more about the TCGA Ovarian Phenotype Research Group.
Analysis dataset for the paper "Large-scale analysis of genome and...
figshare.com
search.datacite.org
application/gzip
Updated Jun 1, 2023
Cite
Endre Sebestyén; Babita Singh; Belén Miñana; Amadís Pagès; Francesca Mateo; Miguel Angel Pujana; Juan Valcárcel; Eduardo Eyras (2023). Analysis dataset for the paper "Large-scale analysis of genome and transcriptome alterations in multiple tumors unveils novel cancer-relevant splicing networks" [Dataset]. http://doi.org/10.6084/m9.figshare.3466025.v1
Available download formats: application/gzip
Unique identifier
https://doi.org/10.6084/m9.figshare.3466025.v1
Dataset updated
Jun 1, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Endre Sebestyén; Babita Singh; Belén Miñana; Amadís Pagès; Francesca Mateo; Miguel Angel Pujana; Juan Valcárcel; Eduardo Eyras
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains additional files related to the paper:
E. Sebestyén*, B. Singh*, B. Miñana, A. Pagès, F. Mateo, M. A. Pujana, J. Valcárcel, E. Eyras (2016) Large-scale analysis of genome and transcriptome alterations in multiple tumors unveils novel cancer-relevant splicing networks. Genome Res, 26: 732-744, doi:10.1101/gr.199935.115
It contains the following tar.gz archives:
GR-Sebestyen-2016-TCGA-correlations.tgz contains the Spearman correlation of the alternative splicing event PSI values with the expression z-score of the differentially expressed RBPs in a particular tumor type.
GR-Sebestyen-2016-TCGA-deltapsi.tgz contains the differential splicing analysis results of the events between the tumor and normal conditions in a particular tumor type.
GR-Sebestyen-2016-TCGA-diffexp.tgz contains the differential expression analysis results of all genes between the tumor and normal conditions in a particular tumor type.
GR-Sebestyen-2016-TCGA-motif.tgz contains the fasta sequence of all event types, and the number of RNAcompete motifs found in the events using FIMO.
GR-Sebestyen-2016-TCGA-psi.tgz contains the PSI values of all events in all samples processed in a particular tumor type.
GR-Sebestyen-2016-TCGA-tpm.tgz contains the TPM values of all isoforms in all samples processed in a particular tumor type.
GR-Sebestyen-2016-TCGA-zscore.tgz contains the expression z-score of all genes in all samples processed in a particular tumor type.
GR-Sebestyen-2016-TCGA-fimo.tgz contains the original RNAcompete FIMO results for all event types.
For details on data generation, see the Genome Research paper. The data presented here are based upon data generated by the TCGA Research Network: http://cancergenome.nih.gov
If you reuse the data, please cite the Genome Research paper.
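For readers unfamiliar with the statistic stored in the correlations archive, Spearman's rho between one event's PSI values and one RBP's expression z-scores can be sketched as Pearson's correlation on average ranks. This is a generic illustration, not the authors' pipeline; the function names and the toy vectors are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed inputs."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical values for five samples of one tumor type.
psi = [0.10, 0.35, 0.20, 0.80, 0.55]
rbp_z = [-1.2, 0.1, -0.4, 1.9, 0.7]
rho = spearman_rho(psi, rbp_z)
print(round(rho, 6))
```

The sketch divides by zero when either input is constant, so a real analysis would need to skip or flag such events.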
The Cancer Genome Atlas Lung Adenocarcinoma Collection
cancerimagingarchive.net
dicom, n/a
Updated Jan 30, 2017
Cite
The Cancer Imaging Archive (2017). The Cancer Genome Atlas Lung Adenocarcinoma Collection [Dataset]. http://doi.org/10.7937/K9/TCIA.2016.JGNIHEP5
Available download formats: n/a, dicom
Unique identifier
https://doi.org/10.7937/K9/TCIA.2016.JGNIHEP5
Dataset updated
Jan 30, 2017
Dataset authored and provided by
The Cancer Imaging Archive
License
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
Time period covered
May 29, 2020
Dataset funded by
National Cancer Institute (http://www.cancer.gov/)
Description
The Cancer Genome Atlas Lung Adenocarcinoma (TCGA-LUAD) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA). Clinical, genetic, and pathological data resides in the Genomic Data Commons (GDC) Data Portal while the radiological data is stored on The Cancer Imaging Archive (TCIA).
Matched TCGA patient identifiers allow researchers to explore the TCGA/TCIA databases for correlations between tissue genotype, radiological phenotype and patient outcomes. Tissues for TCGA were collected from many sites all over the world in order to reach their accrual targets, usually around 500 specimens per cancer type. For this reason the image data sets are also extremely heterogeneous in terms of scanner modalities, manufacturers and acquisition protocols. In most cases the images were acquired as part of routine care and not as part of a controlled research study or clinical trial.
CIP TCGA Radiology Initiative
Imaging Source Site (ISS) Groups are being populated and governed by participants from institutions that have provided imaging data to the archive for a given cancer type. Modeled after TCGA analysis groups, ISS groups are given the opportunity to publish a marker paper for a given cancer type per the guidelines in the table above. This opportunity will generate increased participation in building these multi-institutional data sets as they become an open community resource. Learn more about the TCGA Lung Phenotype Research Group.
TCGA Lower Grade Glioma (LGG) Clinical Data
zenodo.org
data-staging.niaid.nih.gov
csv
Updated Jul 29, 2023
Cite
Swati Baskiyar (2023). TCGA Lower Grade Glioma (LGG) Clinical Data [Dataset]. http://doi.org/10.5281/zenodo.8190154
Available download formats: csv
Unique identifier
https://doi.org/10.5281/zenodo.8190154
Dataset updated
Jul 29, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Swati Baskiyar
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Abstract:

The Cancer Genome Atlas (TCGA) was a large-scale collaborative project initiated by the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI). It aimed to comprehensively characterize the genomic and molecular landscape of various cancer types. This dataset includes curated survival data from the Pan-cancer Atlas paper titled "An Integrated TCGA Pan-Cancer Clinical Data Resource (TCGA-CDR) to drive high quality survival outcome analytics". The paper highlights four types of carefully curated survival endpoints, and recommends the use of the endpoints of OS, PFI, DFI, and DSS for each TCGA cancer type. The dataset also includes phenotypic information about LGG. The Sample IDs are unique identifiers, which can be paired with the gene expression dataset.

Inspiration:

This dataset was uploaded to U-BRITE for the GTKB project.

Instruction:

The survival and phenotype data were merged into one file. Empty columns were removed. Columns with the same value for every sample were also removed.
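The column clean-up described in the instruction can be sketched as below. It is an illustration only: treating the empty string and `None` as "empty", and the toy column names, are assumptions rather than details from the dataset.

```python
def drop_uninformative_columns(rows):
    """Remove columns that are entirely empty or hold one value for every row."""
    if not rows:
        return rows
    keep = []
    for col in rows[0]:
        values = {row[col] for row in rows}
        if not (values - {"", None}):
            continue  # empty column
        if len(values) == 1:
            continue  # same value for every sample
        keep.append(col)
    return [{col: row[col] for col in keep} for row in rows]


# Hypothetical merged table with one constant and one empty column.
rows = [
    {"sample": "S1", "OS": "1", "project": "TCGA-LGG", "notes": ""},
    {"sample": "S2", "OS": "0", "project": "TCGA-LGG", "notes": ""},
]
cleaned = drop_uninformative_columns(rows)
print(cleaned)
```

A column with a mix of empty and non-empty entries is kept, since it still distinguishes samples.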

Acknowledgments:

Goldman, M.J., Craft, B., Hastie, M. et al. Visualizing and interpreting cancer genomics data via the Xena platform. Nat Biotechnol (2020). https://doi.org/10.1038/s41587-020-0546-8

Liu, Jianfang, Caesar-Johnson, Samantha J. et al. An Integrated TCGA Pan-Cancer Clinical Data Resource to Drive High-Quality Survival Outcome Analytics. Cell, Volume 173, Issue 2, 400 - 416.e11. https://doi.org/10.1016/j.cell.2018.02.052

The Cancer Genome Atlas Research Network., Weinstein, J., Collisson, E. et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat Genet 45, 1113–1120 (2013). https://doi.org/10.1038/ng.2764

U-BRITE last update: 07/13/2023
clustering and survival analysis on multi-omics datasets
figshare.com
zip
Updated Nov 8, 2024
Cite
Shuting Lin (2024). clustering and survival analysis on multi-omics datasets [Dataset]. http://doi.org/10.6084/m9.figshare.27613242.v4
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.27613242.v4
Dataset updated
Nov 8, 2024
Dataset provided by
Figshare (http://figshare.com/)
Authors
Shuting Lin
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
multi-omics data: the input data of the analysis, including miRNA, gene expression data, DNA methylation data, and survival outcome data. All the data were downloaded from TCGA.
code:
1. data preprocessing.
2. clustering patients in each omics layer and performing Kaplan-Meier survival analysis to determine the association between patient clusters and survival outcomes.
3. differential expression analysis to identify features that are associated with patients with consistent survival outcomes.
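As background for step 2, the Kaplan-Meier estimate for one patient cluster can be sketched in a few lines. This is a generic textbook version and not the code shipped in the archive; the function name and the toy follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times: follow-up durations; events: 1 if the event was observed,
    0 if the observation was censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        # Count everything that happens at time t.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= deaths + censored
    return curve


# Hypothetical follow-up (in months) for the patients of one cluster.
times = [5, 8, 8, 12, 15]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
print(curve)
```

Calling the function once per cluster gives one curve per group; comparing the curves formally would still need a test such as the log-rank test, which is not shown here.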
The Cancer Genome Atlas (TCGA) Analysis
figshare.com
xlsx
Updated Jul 16, 2025
Cite
Robert Zach (2025). The Cancer Genome Atlas (TCGA) Analysis [Dataset]. http://doi.org/10.6084/m9.figshare.28848074.v1
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.28848074.v1
Dataset updated
Jul 16, 2025
Dataset provided by
Figshare (http://figshare.com/)
Authors
Robert Zach
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
GWL and B55α RNA expression levels in tumour and matching normal tissues represented in The Cancer Genome Atlas (TCGA) repository, generated by the TCGA Research Network.
Figure S1 Functional analysis based on the DEGs between the two-risk groups...
scidb.cn
Updated May 14, 2024
Cite
吴猛 (2024). Figure S1 Functional analysis based on the DEGs between the two-risk groups in the TCGA-SKCM cohort. [Dataset]. http://doi.org/10.57760/sciencedb.17560
Unique identifier
https://doi.org/10.57760/sciencedb.17560
Dataset updated
May 14, 2024
Dataset provided by
Science Data Bank
Authors
吴猛
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Figure S1 Functional analysis based on the DEGs between the two-risk groups in the TCGA-SKCM cohort. (A) The volcano plot shows the differentially expressed genes between the high-risk and low-risk groups. Red, upregulated in high-risk group; blue, upregulated in low-risk group; grey, no significant change. Bubble graphs for GO enrichment (B) and KEGG pathways (C) (a bigger bubble means more genes enriched, and a deeper red means the differences were more pronounced; q-value: the adjusted p-value). (D) Barplot shows the differences in enrichment in the cancer hallmark pathways between the high-risk and low-risk groups in the TCGA SKCM dataset. TCGA: The Cancer Genome Atlas; SKCM: Skin cutaneous melanoma; DEGs: Differentially expressed genes; KEGG: Kyoto Encyclopedia of Genes and Genomes
TCGA Kidney Renal Clear Cell Carcinoma (KIRC) Clinical Data
data.niaid.nih.gov
data-staging.niaid.nih.gov
Updated Jul 29, 2023
Cite
Swati Baskiyar (2023). TCGA Kidney Renal Clear Cell Carcinoma (KIRC) Clinical Data [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8190145
Dataset updated
Jul 29, 2023
Authors
Swati Baskiyar
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Abstract:

The Cancer Genome Atlas (TCGA) was a large-scale collaborative project initiated by the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI). It aimed to comprehensively characterize the genomic and molecular landscape of various cancer types. This dataset includes curated survival data from the Pan-cancer Atlas paper titled "An Integrated TCGA Pan-Cancer Clinical Data Resource (TCGA-CDR) to drive high quality survival outcome analytics". The paper highlights four types of carefully curated survival endpoints, and recommends the use of the endpoints of OS, PFI, DFI, and DSS for each TCGA cancer type. The dataset also includes phenotypic information about KIRC. The Sample IDs are unique identifiers, which can be paired with the gene expression dataset.

Inspiration:

This dataset was uploaded to U-BRITE for the GTKB project.

Instruction:

The survival and phenotype data were merged into one file. Empty columns were removed. Columns with the same value for every sample were also removed.

Acknowledgments:

Goldman, M.J., Craft, B., Hastie, M. et al. Visualizing and interpreting cancer genomics data via the Xena platform. Nat Biotechnol (2020). https://doi.org/10.1038/s41587-020-0546-8

Liu, Jianfang, Caesar-Johnson, Samantha J. et al. An Integrated TCGA Pan-Cancer Clinical Data Resource to Drive High-Quality Survival Outcome Analytics. Cell, Volume 173, Issue 2, 400 - 416.e11. https://doi.org/10.1016/j.cell.2018.02.052

The Cancer Genome Atlas Research Network., Weinstein, J., Collisson, E. et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat Genet 45, 1113–1120 (2013). https://doi.org/10.1038/ng.2764

U-BRITE last update: 07/13/2023
TCGA Head & Neck Squamous Cell Carcinoma (HNSC) Clinical Data
zenodo.org
data.niaid.nih.gov
csv
Updated Jul 29, 2023
Cite
Swati Baskiyar (2023). TCGA Head & Neck Squamous Cell Carcinoma (HNSC) Clinical Data [Dataset]. http://doi.org/10.5281/zenodo.8190127
Available download formats: csv
Unique identifier
https://doi.org/10.5281/zenodo.8190127
Dataset updated
Jul 29, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Swati Baskiyar
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Abstract:

The Cancer Genome Atlas (TCGA) was a large-scale collaborative project initiated by the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI). It aimed to comprehensively characterize the genomic and molecular landscape of various cancer types. This dataset includes curated survival data from the Pan-cancer Atlas paper titled "An Integrated TCGA Pan-Cancer Clinical Data Resource (TCGA-CDR) to drive high quality survival outcome analytics". The paper highlights four types of carefully curated survival endpoints, and recommends the use of the endpoints of OS, PFI, DFI, and DSS for each TCGA cancer type. The dataset also includes phenotypic information about HNSC. The Sample IDs are unique identifiers, which can be paired with the gene expression dataset.

Inspiration:

This dataset was uploaded to U-BRITE for the GTKB project.

Instruction:

The survival and phenotype data were merged into one file. Empty columns were removed. Columns with the same value for every sample were also removed.

Acknowledgments:

Goldman, M.J., Craft, B., Hastie, M. et al. Visualizing and interpreting cancer genomics data via the Xena platform. Nat Biotechnol (2020). https://doi.org/10.1038/s41587-020-0546-8

Liu, Jianfang, Caesar-Johnson, Samantha J. et al. An Integrated TCGA Pan-Cancer Clinical Data Resource to Drive High-Quality Survival Outcome Analytics. Cell, Volume 173, Issue 2, 400 - 416.e11. https://doi.org/10.1016/j.cell.2018.02.052

The Cancer Genome Atlas Research Network., Weinstein, J., Collisson, E. et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat Genet 45, 1113–1120 (2013). https://doi.org/10.1038/ng.2764

U-BRITE last update: 07/13/2023
The Cancer Genome Atlas Sarcoma Collection
cancerimagingarchive.net
dicom, n/a
Updated Jan 5, 2016
Cite
The Cancer Imaging Archive (2016). The Cancer Genome Atlas Sarcoma Collection [Dataset]. http://doi.org/10.7937/K9/TCIA.2016.CX6YLSUX
Explore at:
Available download formats: dicom, n/a
Unique identifier
https://doi.org/10.7937/K9/TCIA.2016.CX6YLSUX
Dataset updated
Jan 5, 2016
Dataset authored and provided by
The Cancer Imaging Archive
License
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
Time period covered
May 29, 2020
Dataset funded by
National Cancer Institute (http://www.cancer.gov/)
Description
The Cancer Genome Atlas Sarcoma (TCGA-SARC) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA). Clinical, genetic, and pathological data resides in the Genomic Data Commons (GDC) Data Portal while the radiological data is stored on The Cancer Imaging Archive (TCIA).
Matched TCGA patient identifiers allow researchers to explore the TCGA/TCIA databases for correlations between tissue genotype, radiological phenotype and patient outcomes. Tissues for TCGA were collected from many sites all over the world in order to reach their accrual targets, usually around 500 specimens per cancer type. For this reason the image data sets are also extremely heterogeneous in terms of scanner modalities, manufacturers and acquisition protocols. In most cases the images were acquired as part of routine care and not as part of a controlled research study or clinical trial.
CIP TCGA Radiology Initiative
Imaging Source Site (ISS) Groups are being populated and governed by participants from institutions that have provided imaging data to the archive for a given cancer type. Modeled after TCGA analysis groups, ISS groups are given the opportunity to publish a marker paper for a given cancer type per the guidelines in the table above. This opportunity will generate increased participation in building these multi-institutional data sets as they become an open community resource. Learn more about the CIP TCGA Radiology Initiative.
TCGA BioBombe Results
resodate.org
data.niaid.nih.gov
Updated Jan 21, 2020
Cite
Gregory Way (2020). TCGA BioBombe Results [Dataset]. https://resodate.org/resources/aHR0cHM6Ly96ZW5vZG8ub3JnL3JlY29yZHMvMjExMDc1Mg==
Explore at:
Dataset updated
Jan 21, 2020
Dataset provided by
Zenodo
Authors
Gregory Way
Description
BioBombe analysis applied to gene expression data from The Cancer Genome Atlas (TCGA) PanCanAtlas. Method and results described in https://github.com/greenelab/BioBombe
Data Sheet 1_Integrative analysis of DNA methylation, RNA sequencing, and...
datasetcatalog.nlm.nih.gov
Updated Apr 28, 2025
Cite
Lee, Jae Kwan; Kim, Eun Na; Ouh, Yung Taek; Hong, Jin Hwa; Cho, Hyun Woong; Chun, Yikyeong; Oh, Yoonji; Roh, Sanghyun; Kim, Hayeon; Kim, Chungyeul; Jeong, Sohyeon; Gim, Jeong-An (2025). Data Sheet 1_Integrative analysis of DNA methylation, RNA sequencing, and genomic variants in the cancer genome atlas (TCGA) to predict endometrial cancer recurrence.zip [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0002057531
Explore at:
Dataset updated
Apr 28, 2025
Authors
Lee, Jae Kwan; Kim, Eun Na; Ouh, Yung Taek; Hong, Jin Hwa; Cho, Hyun Woong; Chun, Yikyeong; Oh, Yoonji; Roh, Sanghyun; Kim, Hayeon; Kim, Chungyeul; Jeong, Sohyeon; Gim, Jeong-An
Description
Introduction: The prognosis within each subtype varies due to histological and molecular factors. This study leverages omics datasets and machine learning to identify biomarkers associated with EC recurrence in different molecular subtypes.
Methods: Utilizing DNA methylation, RNA-sequencing, and common variant data from 116 EC samples in The Cancer Genome Atlas (TCGA), differentially expressed genes (DEGs) and differentially methylated regions (DMRs) were identified using t-tests between recurrence and non-recurrence groups. These were visualized through volcano plots and heat maps, while decision trees and random forests classified and stratified the samples.
Results: A machine learning analysis combined with box plots showed that in the copy number-high (CN-H) recurrence group, PARD6G-AS1 had decreased methylation, CSMD1 had increased methylation, and TESC expression was higher than the non-recurrence group. In the copy number-low (CN-L) recurrence group, CD44 expression was elevated. Further validation using TCGA clinical data confirmed PARD6G-AS1 hypomethylation and CD44 overexpression as significant indicators of recurrence (p=0.006 and p=0.02, respectively), and both were linked to advanced stage and lymph node metastasis.
Conclusion: The study concludes that PARD6G-AS1 hypomethylation and CD44 overexpression are potential predictors of recurrence in CN-H and CN-L EC patients, respectively.
TCGA Head & Neck Squamous Cell Carcinoma (HNSC) Gene Expression
explore.openaire.eu
data.niaid.nih.gov
Updated Jul 26, 2023
Cite
Swati Baskiyar (2023). TCGA Head & Neck Squamous Cell Carcinoma (HNSC) Gene Expression [Dataset]. http://doi.org/10.5281/zenodo.8187719
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.8187719
Dataset updated
Jul 26, 2023
Authors
Swati Baskiyar
Description
Abstract:

The Cancer Genome Atlas (TCGA) was a large-scale collaborative project initiated by the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI). It aimed to comprehensively characterize the genomic and molecular landscape of various cancer types. This dataset contains information about HNSC, a type of cancer that originates in the squamous cells lining the mucosal surfaces of the head and neck region, including the oral cavity, throat, and larynx. The gene expression profile was measured experimentally using the Illumina HiSeq 2000 RNA Sequencing platform by the University of North Carolina TCGA genome characterization center. The Sample IDs serve as unique identifiers for each sample.

Inspiration:

This dataset was uploaded to U-BRITE for the GTKB project.

Instruction:

The log2(x+1) normalization was removed, and z-normalization was performed on the dataset using a Python script.

Acknowledgments:

Goldman, M.J., Craft, B., Hastie, M. et al. Visualizing and interpreting cancer genomics data via the Xena platform. Nat Biotechnol (2020). https://doi.org/10.1038/s41587-020-0546-8

The Cancer Genome Atlas Research Network., Weinstein, J., Collisson, E. et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat Genet 45, 1113–1120 (2013). https://doi.org/10.1038/ng.2764

U-BRITE last update: 07/13/2023

U-BRITE location: /data/project/ubrite/gtkb/TCGA/GeneExp
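As a rough illustration of the Instruction note above, the sketch below undoes a log2(x+1) transform and then z-normalizes the values. The uploader's Python script is not shown in this listing, so several details here are assumptions: that z-normalization is applied per gene across samples, that the population standard deviation is used, and the gene names and values themselves, which are made up for the example.

```python
import statistics


def undo_log2_plus_one(value):
    """Invert y = log2(x + 1), giving x = 2**y - 1."""
    return 2.0 ** value - 1.0


def z_normalize(values):
    """Center to mean 0 and scale to unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # A constant row has no spread; return zeros instead of dividing by 0.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# Hypothetical log2(x+1) expression values: one list per gene, one entry per sample.
log_expr = {
    "GENE_A": [0.0, 1.0, 2.0],
    "GENE_B": [3.0, 3.0, 3.0],
}

raw_expr = {g: [undo_log2_plus_one(v) for v in vals] for g, vals in log_expr.items()}
z_expr = {g: z_normalize(vals) for g, vals in raw_expr.items()}
```

Whether to normalize per gene or per sample changes the interpretation of the resulting values, so that choice should be checked against the dataset's own documentation before reuse.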
Data from: Cyclin-Dependent Kinase 4 is expected to be a therapeutic target...
tandf.figshare.com
tiff
Updated Feb 14, 2024
Cite
Jia-Ning Zhang; Feng Wei; Lin-Han Lei; Yang Yang; Yuan Yang; Wei-Ping Zhou (2024). Cyclin-Dependent Kinase 4 is expected to be a therapeutic target for hepatocellular carcinoma metastasis using integrated bioinformatic analysis [Dataset]. http://doi.org/10.6084/m9.figshare.17031708.v2
Explore at:
Available download formats: tiff
Unique identifier
https://doi.org/10.6084/m9.figshare.17031708.v2
Dataset updated
Feb 14, 2024
Dataset provided by
Taylor & Francis
Authors
Jia-Ning Zhang; Feng Wei; Lin-Han Lei; Yang Yang; Yuan Yang; Wei-Ping Zhou
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Hepatocellular carcinoma (HCC) is the third leading cause of cancer-related mortality worldwide. HCC cells possess biological characteristics of high invasion and metastasis. To support early intervention against cancer cell invasion and metastasis, we screened the TCGA database for further prognostic analysis, including overall survival and disease-free survival. The Kaplan-Meier curve suggested that Cyclin-Dependent Kinase 4 (CDK4) might be an independent prognostic factor for HCC. Moreover, we performed mRNA expression analysis to measure CDK4 levels in normal liver tissues and HCC tissues, and immunohistochemistry analysis to detect the protein level of CDK4 in non-tumor tissue and HCC tissues. Our findings indicated that the expression of CDK4 was significantly higher in tumor tissues than in non-tumor tissue in HCC, and that it increased from HCC stage 1 to 3. Furthermore, the results of the transwell assay indicated that knocking down CDK4 significantly suppresses the invasion and migration of HCC cells, and the results of bioinformatics analysis revealed that genes closely associated with CDK4 are potentially worthy of further investigation. Additionally, the results of Western blot indicated that CDK4 regulates epithelial mesenchymal transition in HCC, and CDK4 appears to regulate EMT and HCC progression via the Wnt/β-catenin pathway. Collectively, this study found the key target gene through bioinformatic analysis and further functional validation through cell experiments. In particular, CDK4 is anticipated to become a crucial hub gene to snipe the metastasis of cancer cells in HCC. Abbreviations: Hepatocellular carcinoma (HCC); Cyclin-Dependent Kinase 4 (CDK4); Genomic Data Commons (GDC); genes; EC, Endometrial cancer; GEO, gene expression omnibus; GO, Gene Ontology; GSEA, Gene set enrichment analysis; KEGG, Database; TCGA, The Cancer Genome Atlas; TSGs, tumor suppressor genes; epithelial mesenchymal transition (EMT).
Preprocessed TCGA Breast Invasive Carcinoma Multi-Omics Dataset with...
dataverse.harvard.edu
search.dataone.org
Updated Dec 1, 2025
Cite
Varad Pai; Yash Gawhale; Vinay E Palled; Nagathejas M S; Bhaskarjyoti Das (2025). Preprocessed TCGA Breast Invasive Carcinoma Multi-Omics Dataset with Survival Annotations [Dataset]. http://doi.org/10.7910/DVN/G2XQPI
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Unique identifier
https://doi.org/10.7910/DVN/G2XQPI
Dataset updated
Dec 1, 2025
Dataset provided by
Harvard Dataverse
Authors
Varad Pai; Yash Gawhale; Vinay E Palled; Nagathejas M S; Bhaskarjyoti Das
License
CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Preprocessed multi-omics dataset from TCGA Breast Invasive Carcinoma (BRCA), comprising RNA-seq gene expression, DNA methylation, and copy number variation data for 710 patients across 16,163 genes. The dataset underwent comprehensive preprocessing and quality control, including ComBat batch correction (55% reduction in technical variance), quantile normalization and log-transformation for expression data, β-value to M-value transformation for methylation data, and KNN-based imputation for missing values. All three omics layers are gene-aligned and biologically validated through expected cross-omics correlations. The dataset is fully analysis-ready and suitable for downstream machine learning tasks such as survival prediction, molecular subtyping, and integrative multi-omics studies.
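For readers unfamiliar with the β-value to M-value step mentioned in the description, a minimal sketch follows. It uses the commonly cited logit-style relation M = log2(β / (1 − β)); the clipping constant `eps` and the function name are assumptions for this example and are not taken from the dataset's own preprocessing code.

```python
import math


def beta_to_m(beta, eps=1e-6):
    """Convert a methylation beta-value in [0, 1] to an M-value.

    Values are clipped away from 0 and 1 (assumed eps) so the log stays finite.
    """
    clipped = min(max(beta, eps), 1.0 - eps)
    return math.log2(clipped / (1.0 - clipped))


# Hypothetical beta-values for a handful of probes.
betas = [0.2, 0.5, 0.8]
m_values = [beta_to_m(b) for b in betas]
```

A β-value of 0.5 maps to an M-value of 0, and values symmetric around 0.5 map to M-values of equal magnitude and opposite sign, which is one reason M-values are often preferred for statistical testing.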

