100+ datasets found
  1. The Cancer Genome Atlas Rectum Adenocarcinoma Collection

    • cancerimagingarchive.net
    dicom, n/a
    Updated Jan 5, 2016
    Cite
    The Cancer Imaging Archive (2016). The Cancer Genome Atlas Rectum Adenocarcinoma Collection [Dataset]. http://doi.org/10.7937/K9/TCIA.2016.F7PPNPNU
    Explore at:
    Available download formats: dicom, n/a
    Dataset updated
    Jan 5, 2016
    Dataset authored and provided by
    The Cancer Imaging Archive
    License

    https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/

    Time period covered
    May 29, 2020
    Dataset funded by
    National Cancer Institute (http://www.cancer.gov/)
    Description

    The Cancer Genome Atlas Rectum Adenocarcinoma (TCGA-READ) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA). Clinical, genetic, and pathological data resides in the Genomic Data Commons (GDC) Data Portal while the radiological data is stored on The Cancer Imaging Archive (TCIA).

    Matched TCGA patient identifiers allow researchers to explore the TCGA/TCIA databases for correlations between tissue genotype, radiological phenotype and patient outcomes. Tissues for TCGA were collected from many sites all over the world in order to reach their accrual targets, usually around 500 specimens per cancer type. For this reason the image data sets are also extremely heterogeneous in terms of scanner modalities, manufacturers and acquisition protocols. In most cases the images were acquired as part of routine care and not as part of a controlled research study or clinical trial.

    CIP TCGA Radiology Initiative

    Imaging Source Site (ISS) Groups are being populated and governed by participants from institutions that have provided imaging data to the archive for a given cancer type. Modeled after TCGA analysis groups, ISS groups are given the opportunity to publish a marker paper for a given cancer type per the guidelines in the table above. This opportunity will generate increased participation in building these multi-institutional data sets as they become an open community resource. Learn more about the CIP TCGA Radiology Initiative.

  2. The Cancer Genome Atlas (TCGA) RNA-seq meta-analysis

    • figshare.com
    xlsx
    Updated Feb 2, 2018
    Cite
    Namshik Han (2018). The Cancer Genome Atlas (TCGA) RNA-seq meta-analysis [Dataset]. http://doi.org/10.6084/m9.figshare.5851743.v1
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Feb 2, 2018
    Dataset provided by
    figshare
    Authors
    Namshik Han
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    TCGA RNA-seq V2 Level3 data were downloaded from the TCGA Genomic Data Commons Data Portal (https://gdc-portal.nci.nih.gov), consisting of 11,303 samples in 34 cancer projects (33 cancer types). Nine cancer types that do not have corresponding non-tumour samples were filtered out, and the analysis focused on the tumour versus non-tumour comparison. 24 cancer types were used in this meta-analysis: BLCA, BRCA, CESC, CHOL, COAD, ESCA, GBM, HNSC, KICH, KIRC, KIRP, LIHC, LUAD, LUSC, PAAD, PCPG, PRAD, READ, SARC, SKCM, STAD, THCA, THYM, UCEC (https://gdc-portal.nci.nih.gov). The nine filtered cancer types were ACC, DLBC, LAML, LGG, MESO, OV, TGCT, UCS and UVM.

    To extract expression values from the TCGA RNA-seq data, we used genomic coordinates to retrieve UCSC Transcript IDs that correspond to the identifiers in TCGA RNA-seq V2 Level3 data (isoform level). The GAF (General Annotation Format) file was used to map the coordinates to UCSC Transcript IDs, and it was downloaded from https://tcga-data.nci.nih.gov/docs/GAF/GAF.hg19.June2011.bundle/outputs/TCGA.hg19.June2011.gaf. This file contains genomic annotations shared by all TCGA projects. More details of the GAF file format can be found at https://tcga-data.nci.nih.gov/docs/GAF/GAF3.0/GAF_v3_file_description.docx. We filtered out any coding exons overlapping UCSC Transcript IDs to eliminate the expression values of coding genes and evaluate lncRNA expression. We could find the expression values of 443 pcRNAs and 203 tapRNAs in TCGA data, as many non-coding regions are not yet fully annotated in the TCGA RNA-seq V2 Level3 data. The expression values of pcRNAs and tapRNAs were extracted and clustered by an unsupervised Pearson correlation method (Supplementary Figure 18A). The expression values of tapRNA-associated coding genes were also extracted and used to generate the heat-map (Supplementary Figure 18B), which shows a pattern of expression similar to that of tapRNAs across the cancer types.

    To show that tapRNAs and associated coding genes have similar expression profiles in cancers, we generated a Spearman's rank-order correlation heatmap (Figure 6A) between tapRNAs and their associated coding genes based on the TCGA RNA-seq data. We used the MATLAB function corr to calculate Spearman's rho. This function takes two matrices X (197-by-8,850 expression profiling matrix of tapRNAs) and Y (197-by-8,850 expression profiling matrix of tapRNA-associated coding genes) and returns an 8,850-by-8,850 matrix containing the pairwise correlation coefficient between each pair of the 8,850 columns (TCGA cancer samples in Supplementary Figure 18A and B). Thus, the rank-order correlation matrix that we computed from the matrices of expression profiling data (Supplementary Figure 18A and B) allowed us to compare the correlation between two column vectors, i.e. cancer samples. This function also returns a matrix of p-values for testing the hypothesis of no correlation against the alternative that there is a nonzero correlation. Each element of the p-value matrix is the p-value for the corresponding element of Spearman's rho. The p-values for Spearman's rho are calculated using large-sample approximations. To check the significance level of the correlation between a tapRNA and its associated coding gene, the diagonal of the p-value matrix was extracted and used. The median is 1.31 × 10^-11 and the mean is 1.03 × 10^-4, with standard deviation 0.0029.

    To identify cancer-specific tapRNAs, we considered not only the global expression pattern of a given tapRNA in each cancer type, but also the expression pattern of any specific sub-group that is significantly distinct, to take cancer sample heterogeneity into account. Thus, two conditions were applied: (1) the average expression level of a tapRNA in a given cancer type is in the top 10% or bottom 10%, and (2) a tapRNA has at least 10% of samples in a given cancer type that are significantly up-regulated (Z-score > 2) or down-regulated (Z-score < -2).
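
    The two computations described above can be sketched in plain Python. This is an illustrative sketch only, not the authors' MATLAB code: the function names are hypothetical, Spearman's rho is computed as the Pearson correlation of average ranks for a single pair of vectors (no p-values), and condition (2) is read as "at least 10% of samples up-regulated, or at least 10% down-regulated", which is an assumption about how the threshold was applied. The Z-scores are taken as given, since the description does not say how they were derived.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def meets_condition_two(z_scores, min_fraction=0.10, cutoff=2.0):
    """Hypothetical check for condition (2): enough samples beyond the cutoff."""
    n = len(z_scores)
    up = sum(1 for z in z_scores if z > cutoff) / n
    down = sum(1 for z in z_scores if z < -cutoff) / n
    return up >= min_fraction or down >= min_fraction
```

    Applying spearman_rho to every pair of columns would reproduce the shape of the pairwise matrix described above, although for 8,850 columns a vectorised library routine would be the practical choice.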

  3. The Cancer Genome Atlas Ovarian Cancer Collection

    • cancerimagingarchive.net
    dicom, n/a
    Updated May 29, 2020
    Cite
    The Cancer Imaging Archive (2020). The Cancer Genome Atlas Ovarian Cancer Collection [Dataset]. http://doi.org/10.7937/K9/TCIA.2016.NDO1MDFQ
    Explore at:
    Available download formats: n/a, dicom
    Dataset updated
    May 29, 2020
    Dataset authored and provided by
    The Cancer Imaging Archive
    License

    https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/

    Time period covered
    May 29, 2020
    Dataset funded by
    National Cancer Institute (http://www.cancer.gov/)
    Description

    The Cancer Genome Atlas Ovarian Cancer (TCGA-OV) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA). Clinical, genetic, and pathological data resides in the Genomic Data Commons (GDC) Data Portal while the radiological data is stored on The Cancer Imaging Archive (TCIA).

    Matched TCGA patient identifiers allow researchers to explore the TCGA/TCIA databases for correlations between tissue genotype, radiological phenotype and patient outcomes. Tissues for TCGA were collected from many sites all over the world in order to reach their accrual targets, usually around 500 specimens per cancer type. For this reason the image data sets are also extremely heterogeneous in terms of scanner modalities, manufacturers and acquisition protocols. In most cases the images were acquired as part of routine care and not as part of a controlled research study or clinical trial.

    CIP TCGA Radiology Initiative

    Imaging Source Site (ISS) Groups are being populated and governed by participants from institutions that have provided imaging data to the archive for a given cancer type. Modeled after TCGA analysis groups, ISS groups are given the opportunity to publish a marker paper for a given cancer type per the guidelines in the table above. This opportunity will generate increased participation in building these multi-institutional data sets as they become an open community resource. Learn more about the TCGA Ovarian Phenotype Research Group.

  4. TCGA DATA

    • figshare.com
    zip
    Updated Jun 23, 2023
    Cite
    Songtao (2023). TCGA DATA [Dataset]. http://doi.org/10.6084/m9.figshare.23566341.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 23, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Songtao
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    RNA-sequencing expression (level 3) profiles and corresponding clinical information for several tumors were downloaded from the TCGA dataset (https://portal.gdc.com).

  5. TCGA-Cancer-Variant-and-Clinical-Data

    • huggingface.co
    Updated Oct 10, 2024
    Cite
    Seq-to-Pheno (2024). TCGA-Cancer-Variant-and-Clinical-Data [Dataset]. https://huggingface.co/datasets/seq-to-pheno/TCGA-Cancer-Variant-and-Clinical-Data
    Explore at:
    Dataset updated
    Oct 10, 2024
    Dataset authored and provided by
    Seq-to-Pheno
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    TCGA Cancer Variant and Clinical Data

      Dataset Description
    

    This dataset combines genetic variant information at the protein level with clinical data from The Cancer Genome Atlas (TCGA) project, curated by the International Cancer Genome Consortium (ICGC). It provides a comprehensive view of protein-altering mutations and clinical characteristics across various cancer types.

      Dataset Summary
    

    The dataset includes:

    Protein sequence data for both mutated and… See the full description on the dataset page: https://huggingface.co/datasets/seq-to-pheno/TCGA-Cancer-Variant-and-Clinical-Data.

  6. Data from: TCGA

    • huggingface.co
    Updated Aug 15, 2024
    + more versions
    Cite
    Lab-Rasool (2024). TCGA [Dataset]. https://huggingface.co/datasets/Lab-Rasool/TCGA
    Explore at:
    Dataset updated
    Aug 15, 2024
    Dataset authored and provided by
    Lab-Rasool
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    Dataset Card for The Cancer Genome Atlas (TCGA) Multimodal Dataset

    The Cancer Genome Atlas (TCGA) Multimodal Dataset is a comprehensive collection of clinical data, pathology reports, and slide images for cancer patients. This dataset aims to facilitate research in multimodal machine learning for oncology by providing embeddings generated using state-of-the-art models such as GatorTron and UNI.

    Curated by: Lab Rasool
    Language(s) (NLP): English

      Uses
    

    from… See the full description on the dataset page: https://huggingface.co/datasets/Lab-Rasool/TCGA.

  7. TCGA Chemotherapy Response Dataset

    • data.niaid.nih.gov
    • zenodo.org
    Updated Nov 1, 2021
    Cite
    Lukas A. Huber (2021). TCGA Chemotherapy Response Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3719290
    Explore at:
    Dataset updated
    Nov 1, 2021
    Dataset provided by
    Balthasar Huber
    Dalibor Hrg
    Lukas A. Huber
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset on chemotherapeutic drug responses in TCGA cancer patients, cross-referenced for a hit in the TCIA.at database, consisting of clinical (TCGA), cancer tissue gene-expression (TCGA) and tumor-immunome (TCIA) features. The dataset covers 5 common chemotherapy agents: 3 CRC agents (FOLFOX, 5FU, Oxaliplatin) and 2 lung agents (Carboplatin, Cisplatin). FOLFOX, as a combination therapy or regimen, was compiled from the timings of monotherapies given to patients and as such is a novel dataset derived from TCGA data. The FOLFOX dataset is primarily first-line treatment, while the other drugs are not to be interpreted as first-line treatments. Each drug dataset is available in its own CSV file.

    Citation Dalibor Hrg, Balthasar Huber, Lukas A. Huber. (2020). TCGA Chemotherapy Response Dataset. Zenodo. http://doi.org/10.5281/zenodo.3719291

    The results here are in whole or part based upon data generated by the TCGA Research Network: https://www.cancer.gov/tcga.

    License CC BY-SA 4.0 International https://creativecommons.org/licenses/by-sa/4.0. Authors take no liability for any use of this data.

    Contributions D. Hrg and B. Huber acknowledge major and equal work effort: data understanding, data science and dataset preparation (monotherapies and FOLFOX); L. A. Huber: help with the dictionary of drug names and curation/cleaning of FOLFOX entries, clinical validation.

    Contact & Maintenance dalibor.hrg@gmail.com dalibor.hrg@i-med.ac.at

  8. The clinical information and mRNA expression data from the TCGA database of...

    • plos.figshare.com
    xlsx
    Updated Nov 29, 2023
    Cite
    Hong Gyu Yoon; Jin Hwan Cheong; Je Il Ryu; Yu Deok Won; Kyueng-Whan Min; Myung-Hoon Han (2023). The clinical information and mRNA expression data from the TCGA database of 525 GBM cases. [Dataset]. http://doi.org/10.1371/journal.pone.0295061.s001
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Nov 29, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Hong Gyu Yoon; Jin Hwan Cheong; Je Il Ryu; Yu Deok Won; Kyueng-Whan Min; Myung-Hoon Han
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The clinical information and mRNA expression data from the TCGA database of 525 GBM cases.

  9. TCGA Glioblastoma Data

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jan 24, 2020
    Cite
    Way, Gregory (2020). TCGA Glioblastoma Data [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_60304
    Explore at:
    Dataset updated
    Jan 24, 2020
    Dataset authored and provided by
    Way, Gregory
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    All data was downloaded from UCSC Xena on 16 August 2016. The compressed folder includes the following files:

    GBM_clinicalMatrix - tab separated file with 629 samples measured by 139 variables. Refer to the source for more details.

    SOURCE: https://genome-cancer.soe.ucsc.edu/proj/site/xena/datapages/?dataset=TCGA.GBM.sampleMap/GBM_clinicalMatrix&host=https://tcga.xenahubs.net

    HT_HG-U133A - tab separated gene expression (Affy U133A microarray) file with 539 samples measured by 12,043 genes. Refer to the source for more details.

    SOURCE: https://genome-cancer.soe.ucsc.edu/proj/site/xena/datapages/?dataset=TCGA.GBM.sampleMap/HT_HG-U133A&host=https://tcga.xenahubs.net
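
    As a rough illustration of working with tab-separated files such as those listed above, the sketch below reads a table with Python's standard csv module. The helper name read_tsv_matrix, the in-memory demo table and its column names are invented for the example; the real files should be checked against the source pages for their actual layout.

```python
import csv
import io


def read_tsv_matrix(handle):
    """Read a tab-separated table into (column names, list of row dicts)."""
    reader = csv.DictReader(handle, delimiter="\t")
    rows = list(reader)
    return reader.fieldnames, rows


# Tiny in-memory stand-in for a clinical matrix; these columns are made up.
demo = io.StringIO("sampleID\tage\tgender\nS1\t60\tMALE\nS2\t55\tFEMALE\n")
columns, rows = read_tsv_matrix(demo)
print(len(rows), "samples x", len(columns) - 1, "variables")
```

    For a file on disk, the same helper can be called with open(path, newline="") in place of the StringIO object.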

  10. TCGA_PanCancerRNAseq

    • zenodo.org
    • data.niaid.nih.gov
    Updated Mar 10, 2022
    Cite
    Ramyar Molania (2022). TCGA_PanCancerRNAseq [Dataset]. http://doi.org/10.1101/2021.11.01.466731
    Explore at:
    Dataset updated
    Mar 10, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Ramyar Molania
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset includes the harmonised version of all The Cancer Genome Atlas (TCGA) RNA-Seq data (33 cancer types, ~11,000 samples). Each file is a "SummarizedExperiment" object that contains: 1) three assays (raw gene-level counts, FPKM and FPKM.UQ), 2) sample and batch information from different resources, and 3) several details for individual genes. The data can also be explored with an R Shiny app published in our pre-print paper, doi: https://doi.org/10.1101/2021.11.01.466731.

  11. TCGA-PRAD

    • dataportal.asia
    zip
    Updated Sep 17, 2021
    + more versions
    Cite
    scidm.nchc.org.tw (2021). TCGA-PRAD [Dataset]. https://dataportal.asia/ko_KR/dataset/212601019_tcgaprad
    Explore at:
    Available download formats: zip
    Dataset updated
    Sep 17, 2021
    Dataset provided by
    scidm.nchc.org.tw
    Description

    The Cancer Genome Atlas Prostate Adenocarcinoma (TCGA-PRAD) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA). Clinical, genetic, and pathological data resides in the Genomic Data Commons (GDC) Data Portal while the radiological data is stored on The Cancer Imaging Archive (TCIA). Matched TCGA patient identifiers allow researchers to explore the TCGA/TCIA databases for correlations between tissue genotype, radiological phenotype and patient outcomes. Tissues for TCGA were collected from many sites all over the world in order to reach their accrual targets, usually around 500 specimens per cancer type. For this reason the image data sets are also extremely heterogeneous in terms of scanner modalities, manufacturers and acquisition protocols. In most cases the images were acquired as part of routine care and not as part of a controlled research study or clinical trial.

    CIP TCGA Radiology Initiative

    Imaging Source Site (ISS) Groups are being populated and governed by participants from institutions that have provided imaging data to the archive for a given cancer type. Modeled after TCGA analysis groups, ISS groups are given the opportunity to publish a marker paper for a given cancer type per the guidelines in the table above. This opportunity will generate increased participation in building these multi-institutional data sets as they become an open community resource. Learn more about the CIP TCGA Radiology Initiative.

  12. TCGA Glioblastoma Multiforme (GBM) Clinical Data

    • data.niaid.nih.gov
    Updated Jul 29, 2023
    Cite
    Swati Baskiyar (2023). TCGA Glioblastoma Multiforme (GBM) Clinical Data [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8190040
    Explore at:
    Dataset updated
    Jul 29, 2023
    Dataset authored and provided by
    Swati Baskiyar
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract:

    The Cancer Genome Atlas (TCGA) was a large-scale collaborative project initiated by the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI). It aimed to comprehensively characterize the genomic and molecular landscape of various cancer types. This dataset includes curated survival data from the Pan-cancer Atlas paper titled "An Integrated TCGA Pan-Cancer Clinical Data Resource (TCGA-CDR) to drive high quality survival outcome analytics". The paper highlights four types of carefully curated survival endpoints, and recommends the use of the endpoints of OS, PFI, DFI, and DSS for each TCGA cancer type. The dataset also includes phenotypic information about GBM. The Sample IDs are unique identifiers, which can be paired with the gene expression dataset.

    Inspiration:

    This dataset was uploaded to U-BRITE for the GTKB project.

    Instruction:

    The survival and phenotype data were merged into one file. Empty columns were removed. Columns with the same value for every sample were also removed.
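
    The preparation steps above (merge, then drop empty and constant columns) might look roughly like the following in Python. This is only a sketch under stated assumptions: the function name merge_and_prune, the join column name "sample" and the choice of an inner join are hypothetical, and the uploader's actual procedure is not described beyond the three sentences above.

```python
def merge_and_prune(survival, phenotype, key="sample"):
    """Join two lists of row dicts on `key`, then drop uninformative columns."""
    pheno_by_id = {row[key]: row for row in phenotype}
    merged = []
    for row in survival:
        match = pheno_by_id.get(row[key])
        if match is not None:  # inner join: keep samples present in both tables
            merged.append({**match, **row})

    # Collect column names in first-seen order across all merged rows.
    columns = []
    for row in merged:
        for col in row:
            if col not in columns:
                columns.append(col)

    keep = []
    for col in columns:
        values = [row.get(col) for row in merged]
        non_empty = [v for v in values if v not in (None, "")]
        if col == key:
            keep.append(col)  # always keep the identifier column
        elif non_empty and len(set(values)) > 1:
            keep.append(col)  # neither empty nor constant across samples
    return [{col: row.get(col) for col in keep} for row in merged]
```

    Keeping the identifier column unconditionally matters here because the description notes that the sample IDs are what pair this table with the gene expression dataset.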

    Acknowledgments:

    Goldman, M.J., Craft, B., Hastie, M. et al. Visualizing and interpreting cancer genomics data via the Xena platform. Nat Biotechnol (2020). https://doi.org/10.1038/s41587-020-0546-8

    Liu, Jianfang, Caesar-Johnson, Samantha J. et al. An Integrated TCGA Pan-Cancer Clinical Data Resource to Drive High-Quality Survival Outcome Analytics. Cell, Volume 173, Issue 2, 400 - 416.e11. https://doi.org/10.1016/j.cell.2018.02.052

    The Cancer Genome Atlas Research Network., Weinstein, J., Collisson, E. et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat Genet 45, 1113–1120 (2013). https://doi.org/10.1038/ng.2764

    U-BRITE last update: 07/13/2023

  13. Information on molecular subtypes for TCGA cancer studies as provided by the...

    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Mohamed Mounir; Marta Lucchetta; Tiago C. Silva; Catharina Olsen; Gianluca Bontempi; Xi Chen; Houtan Noushmehr; Antonio Colaprico; Elena Papaleo (2023). Information on molecular subtypes for TCGA cancer studies as provided by the TCGA_MolecularSubtype function. [Dataset]. http://doi.org/10.1371/journal.pcbi.1006701.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Mohamed Mounir; Marta Lucchetta; Tiago C. Silva; Catharina Olsen; Gianluca Bontempi; Xi Chen; Houtan Noushmehr; Antonio Colaprico; Elena Papaleo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Information on molecular subtypes for TCGA cancer studies as provided by the TCGA_MolecularSubtype function.

  14. Multi-Omic Pan-Cancer data from TCGA.

    • narcis.nl
    • data.mendeley.com
    Updated Jan 11, 2021
    Cite
    gonzalez reymundez, A (via Mendeley Data) (2021). Multi-Omic Pan-Cancer data from TCGA. [Dataset]. http://doi.org/10.17632/r8p67nfjc8.3
    Explore at:
    Dataset updated
    Jan 11, 2021
    Dataset provided by
    Data Archiving and Networked Services (DANS)
    Authors
    gonzalez reymundez, A (via Mendeley Data)
    Description

    Data consist of three omic blocks from The Cancer Genome Atlas (TCGA), containing whole-genome profiles of:

    - Gene expression (file GE.RData)
    - DNA methylation (file METH.RData)
    - Copy number variants (file CNV.RData)

    Omic profiles consist of information from 5,408 tumor samples across 33 cancer types (as matrix rows), and 60,112 features (expression of 20,319 genes, methylation of 28,241 CpG islands, and copy number variant intensity for 11,552 genes). GE profiles by sample correspond to the logarithm of RNA-Seq counts by gene (Illumina HiSeq RNA V2 platform). METH profiles correspond to CpG-site β-values from the Illumina HM450 platform, summarized at the CpG island level using the maximum connectivity approach from the WGCNA R package (Langfelder and Horvath 2008), and further transformed into M-values (M = log2(beta/(1-beta)); Du et al. 2010). Omic blocks were adjusted for batch and tissue-specific effects (see Gonzalez-Reymundez and Vazquez (2020) and references therein for further details on quality controls and data editing).
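
    For readers unfamiliar with the beta-to-M transformation, a minimal Python sketch follows. It assumes the standard logit form from Du et al. (2010), M = log2(beta / (1 - beta)); the function names and the clamping constant eps, which guards against beta values of exactly 0 or 1, are illustrative choices and not part of the dataset description.

```python
import math


def beta_to_m(beta, eps=1e-6):
    """Convert a methylation beta-value in [0, 1] to an M-value."""
    b = min(max(beta, eps), 1.0 - eps)  # clamp to avoid log of 0 or division by 0
    return math.log2(b / (1.0 - b))


def m_to_beta(m):
    """Inverse transform: recover the beta-value from an M-value."""
    return 2.0 ** m / (2.0 ** m + 1.0)
```

    A beta-value of 0.5 maps to an M-value of 0, and values above or below 0.5 map to positive or negative M-values respectively.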

  15. TCGA Kidney Renal Clear Cell Carcinoma (KIRC) Clinical Data

    • data.niaid.nih.gov
    Updated Jul 29, 2023
    + more versions
    Cite
    Swati Baskiyar (2023). TCGA Kidney Renal Clear Cell Carcinoma (KIRC) Clinical Data [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8190145
    Explore at:
    Dataset updated
    Jul 29, 2023
    Dataset authored and provided by
    Swati Baskiyar
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract:

    The Cancer Genome Atlas (TCGA) was a large-scale collaborative project initiated by the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI). It aimed to comprehensively characterize the genomic and molecular landscape of various cancer types. This dataset includes curated survival data from the Pan-cancer Atlas paper titled "An Integrated TCGA Pan-Cancer Clinical Data Resource (TCGA-CDR) to drive high quality survival outcome analytics". The paper highlights four types of carefully curated survival endpoints, and recommends the use of the endpoints of OS, PFI, DFI, and DSS for each TCGA cancer type. The dataset also includes phenotypic information about KIRC. The Sample IDs are unique identifiers, which can be paired with the gene expression dataset.

    Inspiration:

    This dataset was uploaded to U-BRITE for the GTKB project.

    Instruction:

    The survival and phenotype data were merged into one file. Empty columns were removed. Columns with the same value for every sample were also removed.

    Acknowledgments:

    Goldman, M.J., Craft, B., Hastie, M. et al. Visualizing and interpreting cancer genomics data via the Xena platform. Nat Biotechnol (2020). https://doi.org/10.1038/s41587-020-0546-8

    Liu, Jianfang, Caesar-Johnson, Samantha J. et al. An Integrated TCGA Pan-Cancer Clinical Data Resource to Drive High-Quality Survival Outcome Analytics. Cell, Volume 173, Issue 2, 400 - 416.e11. https://doi.org/10.1016/j.cell.2018.02.052

    The Cancer Genome Atlas Research Network., Weinstein, J., Collisson, E. et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat Genet 45, 1113–1120 (2013). https://doi.org/10.1038/ng.2764

    U-BRITE last update: 07/13/2023

  16. Additional file 2 of A novel risk score system of immune genes associated...

    • springernature.figshare.com
    xlsx
    Updated Jun 2, 2023
    Cite
    Hongyu Zhou; Chufan Zhang; Haoran Li; Lihua Chen; Xi Cheng (2023). Additional file 2 of A novel risk score system of immune genes associated with prognosis in endometrial cancer [Dataset]. http://doi.org/10.6084/m9.figshare.12487490.v1
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    figshare
    Authors
    Hongyu Zhou; Chufan Zhang; Haoran Li; Lihua Chen; Xi Cheng
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Additional file 2: Table S2. Differentially expressed immune genes between endometrial cancer and normal tissues.

  17. Study on the Expression of CXCR4 Gene in Gastric Cancer and the Impact of...

    • scidb.cn
    Updated Feb 18, 2025
    Cite
    zhang xin zhe (2025). Study on the Expression of CXCR4 Gene in Gastric Cancer and the Impact of Tumor Microenvironment Based on TCGA Database [Dataset]. http://doi.org/10.57760/sciencedb.lcbl.00070
    Explore at:
    Dataset updated
    Feb 18, 2025
    Dataset provided by
    Science Data Bank
    Authors
    zhang xin zhe
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Objective: To explore the expression of the CXCR4 gene in the tumor microenvironment of gastric cancer tissues and its clinical significance.

    Methods: The transcriptome data and clinical data of gastric cancer were downloaded from The Cancer Genome Atlas (TCGA) database, the expression difference of the CXCR4 gene was analyzed using the ESTIMATE algorithm and DESeq2, and the clinicopathological parameters of the patients were also analyzed. Surgical tissue specimens of patients diagnosed with gastric cancer by surgical treatment and pathological examination at Hefei First People's Hospital from August 2022 to August 2023 were retrospectively collected for reverse transcription polymerase chain reaction (RT-PCR) and immunohistochemistry to verify its expression. The CIBERSORT algorithm was used to evaluate the correlation between CXCR4 and immune cell infiltration.

    Results: The immune score might be more suitable for indicating the prognosis of STAD patients. TCGA data showed that the expression level of the CXCR4 gene in gastric cancer tissues was significantly higher than that in adjacent tissues (P < 0.001), and the expression of CXCR4 had significant differences in tumor invasion depth and distant metastasis (all P < 0.05). The experimental results showed that the expression of CXCR4 was positively correlated with tumor distant metastasis and differentiation degree (all P < 0.05). Kaplan-Meier survival analysis of both TCGA data and clinical data showed that the survival time of gastric cancer patients in the high-expression group was significantly shortened (P = 0.003, P < 0.001). Immune cell infiltration analysis: two types of TICs, memory B cells and CD8+ T cells, were positively correlated with the expression of CXCR4; six types of TICs (resting CD4 memory T cells, activated dendritic cells, plasma cells, activated NK cells, macrophages M0, and activated mast cells) were negatively correlated with the expression of CXCR4.

    Conclusion: The high expression of CXCR4 in gastric cancer indicates a poor prognosis and is closely related to the progression and metastasis of the tumor. It is also related to immune cell infiltration and may become a potential target for immunotherapy.
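
    The Kaplan-Meier analysis mentioned in the results can be illustrated with a small product-limit estimator. The sketch below is generic and is not the study's code: the function name kaplan_meier and the toy inputs are hypothetical, and it produces only the survival curve, without the log-rank test that a comparison between high- and low-expression groups would also need.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times  -- follow-up time for each patient
    events -- 1 if the event (e.g. death) was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # events and censored patients both leave the risk set
    return curve
```

    Running the estimator separately on the high-expression and low-expression patients would give the two curves that such an analysis compares.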

  18. TCGA BRCA cancer dataset

    • data.niaid.nih.gov
    • portalinvestigacion.udc.gal
    Updated Dec 11, 2020
    Carlos Fernandez-Lozano (2020). TCGA BRCA cancer dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4309167
    Explore at:
    Dataset updated
    Dec 11, 2020
    Dataset authored and provided by
    Carlos Fernandez-Lozano
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Following the same steps that we used in the previous course, we downloaded the TCGA-BRCA data using R and Bioconductor, in particular the TCGAbiolinks package. We downloaded transcriptome profiling data (gene expression quantification) where the experimental strategy is RNA-Seq and the workflow type is HTSeq-FPKM-UQ, restricted to primary solid tumor data of the Affymetrix GPL86 profile, together with clinical data.

  19. TCGA HDF file pipeline

    • zenodo.org
    application/gzip
    Updated Sep 27, 2022
    David Merrell (2022). TCGA HDF file pipeline [Dataset]. http://doi.org/10.5281/zenodo.6977490
    Explore at:
    Available download formats: application/gzip
    Dataset updated
    Sep 27, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    David Merrell
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    HDF files containing data from The Cancer Genome Atlas (TCGA).

    https://www.cancer.gov/about-nci/organization/ccg/research/structural-genomics/tcga

    Please see the Broad Institute's TCGA data usage policy: https://broadinstitute.atlassian.net/wiki/spaces/GDAC/pages/844333156/Data+Usage+Policy

    The HDF files were generated by the code in this repository: https://github.com/dpmerrell/tcga-pipeline

    * tcga_omic.tar.gz contains multi-omic data for 10,000+ patients. This includes copy number variation, somatic mutation, methylation, gene expression, and RPPA data.

    * tcga_clinical.tar.gz contains clinical annotations for those same patients, e.g., age, sex, survival, smoking.

    See https://github.com/dpmerrell/tcga-pipeline/blob/main/README.md for more information about the data and its layout in the HDF5 files.


    Version notes:

    2022-08-09: Fixed some bugs in string formatting. (Pipeline updated on this date; data uploaded on 2022-09-26 due to Zenodo technical issues.)

    2021-12-06: **Significant changes**. `tcga_omic.hdf` is organized very differently. It also includes more kinds of data: (a) somatic mutation data and (b) full TCGA barcodes for each patient and omic type (useful for extracting batch information).

    2021-03-17: improved the naming convention for RPPA data features: {GENE}_{ANTIBODY}_rppa

    2021-02-28: improved HDF file format. We provide one big matrix of data, rather than one matrix per cancer type. Cancer type is indicated by a vector (key="cancer_types"). Updated the Omic and Clinical HDFs accordingly.

    2021-02-01: added mutation annotation scores. removed GRSN from RPPA pipeline.

    2021-01-24: removed redundant/combination datasets (COADREAD, STES, GBMLGG, KIPAN). Applied Global Rank-Invariant Set normalization (GRSN) to RPPA data.
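The layout described in the version notes (one large matrix plus a `cancer_types` vector, and RPPA feature names of the form {GENE}_{ANTIBODY}_rppa) can be illustrated with a minimal sketch. It uses plain Python lists as stand-ins for the HDF5 datasets and assumes one cancer-type label per matrix row; that orientation, the function names, and the toy values are assumptions for illustration, not the pipeline's actual API. The repository README remains the reference for the real file layout.

```python
from typing import List, Sequence


def rows_for_cancer_type(matrix: Sequence[Sequence[float]],
                         cancer_types: Sequence[str],
                         wanted: str) -> List[Sequence[float]]:
    """Return the rows of `matrix` whose cancer-type label equals `wanted`.

    Assumption: `cancer_types` holds exactly one label per row of `matrix`.
    """
    if len(matrix) != len(cancer_types):
        raise ValueError("cancer_types must have one label per matrix row")
    return [row for row, label in zip(matrix, cancer_types) if label == wanted]


def rppa_feature_name(gene: str, antibody: str) -> str:
    """Build a feature name following the {GENE}_{ANTIBODY}_rppa convention."""
    return f"{gene}_{antibody}_rppa"


if __name__ == "__main__":
    # Toy stand-ins for the data matrix and the "cancer_types" vector.
    toy_matrix = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    toy_types = ["BRCA", "STAD", "BRCA"]
    print(rows_for_cancer_type(toy_matrix, toy_types, "BRCA"))
    # "GENE1" and "AB1" are made-up placeholder names.
    print(rppa_feature_name("GENE1", "AB1"))
```

Keeping a single matrix with a parallel label vector means a per-cancer-type subset is just a filter over row labels, rather than a lookup across separate per-type matrices.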

  20. The Cancer Genome Atlas Stomach Adenocarcinoma Collection

    • cancerimagingarchive.net
    dicom, n/a
    Updated Feb 2, 2014
    The Cancer Imaging Archive (2014). The Cancer Genome Atlas Stomach Adenocarcinoma Collection [Dataset]. http://doi.org/10.7937/K9/TCIA.2016.GDHL9KIM
    Explore at:
    Available download formats: dicom, n/a
    Dataset updated
    Feb 2, 2014
    Dataset authored and provided by
    The Cancer Imaging Archive
    License

    https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/

    Time period covered
    May 29, 2020
    Dataset funded by
    National Cancer Institute (http://www.cancer.gov/)
    Description

    The Cancer Genome Atlas Stomach Adenocarcinoma (TCGA-STAD) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA). Clinical, genetic, and pathological data resides in the Genomic Data Commons (GDC) Data Portal while the radiological data is stored on The Cancer Imaging Archive (TCIA).

    Matched TCGA patient identifiers allow researchers to explore the TCGA/TCIA databases for correlations between tissue genotype, radiological phenotype and patient outcomes. Tissues for TCGA were collected from many sites all over the world in order to reach their accrual targets, usually around 500 specimens per cancer type. For this reason the image data sets are also extremely heterogeneous in terms of scanner modalities, manufacturers and acquisition protocols. In most cases the images were acquired as part of routine care and not as part of a controlled research study or clinical trial.
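As a rough illustration of matching on patient identifiers, the sketch below reduces full TCGA barcodes to their patient-level prefix (the first three hyphen-separated fields, e.g. TCGA-XX-XXXX) and keeps only the patients that also appear in an imaging list. It assumes the imaging records are keyed by that patient-level identifier; the function names and the sample identifiers are hypothetical.

```python
from typing import Dict, Iterable, List


def patient_id(barcode: str) -> str:
    """Reduce a TCGA barcode to its patient-level prefix (TCGA-XX-XXXX)."""
    parts = barcode.strip().upper().split("-")
    if len(parts) < 3 or parts[0] != "TCGA":
        raise ValueError(f"not a TCGA barcode: {barcode!r}")
    return "-".join(parts[:3])


def match_patients(genomic_barcodes: Iterable[str],
                   imaging_patient_ids: Iterable[str]) -> Dict[str, List[str]]:
    """Group genomic barcodes by patient, keeping only patients with imaging."""
    imaging = {patient_id(p) for p in imaging_patient_ids}
    matched: Dict[str, List[str]] = {}
    for barcode in genomic_barcodes:
        pid = patient_id(barcode)
        if pid in imaging:
            matched.setdefault(pid, []).append(barcode)
    return matched


if __name__ == "__main__":
    # Made-up identifiers, for illustration only.
    genomic = [
        "TCGA-AA-0001-01A-01R-0001-07",
        "TCGA-AA-0002-01A-01R-0001-07",
        "TCGA-AA-0001-10A-01D-0001-01",
    ]
    imaging = ["TCGA-AA-0001", "TCGA-AA-0003"]
    print(match_patients(genomic, imaging))
```

Grouping by the patient-level prefix lets several samples or aliquots from the same patient map onto a single imaging subject.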

    CIP TCGA Radiology Initiative

    Imaging Source Site (ISS) Groups are being populated and governed by participants from institutions that have provided imaging data to the archive for a given cancer type. Modeled after TCGA analysis groups, ISS groups are given the opportunity to publish a marker paper for a given cancer type per the guidelines in the table above. This opportunity will generate increased participation in building these multi-institutional data sets as they become an open community resource. Learn more about the CIP TCGA Radiology Initiative.

The Cancer Imaging Archive (2016). The Cancer Genome Atlas Rectum Adenocarcinoma Collection [Dataset]. http://doi.org/10.7937/K9/TCIA.2016.F7PPNPNU

The Cancer Genome Atlas Rectum Adenocarcinoma Collection

TCGA-READ

Explore at:
4 scholarly articles cite this dataset (View in Google Scholar)
Available download formats: dicom, n/a
Dataset updated
Jan 5, 2016
Dataset authored and provided by
The Cancer Imaging Archive
License

https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/

Time period covered
May 29, 2020
Dataset funded by
National Cancer Institute (http://www.cancer.gov/)