100+ datasets found
  1. Advanced Data Analysis TCGA Data

    • data.mendeley.com
    Updated Oct 9, 2023
    Cite
    Neil Jairath (2023). Advanced Data Analysis TCGA Data [Dataset]. http://doi.org/10.17632/zdkh23d9y6.1
    Authors
    Neil Jairath
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Publicly available data from The Cancer Genome Atlas used to generate insights about genomic influence on clinical outcomes.

  2. clustering and survival analysis on multi-omics datasets

    • figshare.com
    zip
    Updated Nov 8, 2024
    Cite
    Shuting Lin (2024). clustering and survival analysis on multi-omics datasets [Dataset]. http://doi.org/10.6084/m9.figshare.27613242.v4
    Dataset provided by
    figshare (http://figshare.com/)
    Authors
    Shuting Lin
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    multi-omics data: the input data for the analysis, including miRNA, gene expression, DNA methylation, and survival outcome data. All data were downloaded from TCGA.

    code:
    1. Data preprocessing.
    2. Clustering patients within each omics layer and performing Kaplan-Meier survival analysis to determine the association between patient clusters and survival outcomes.
    3. Differential expression analysis to identify features associated with patients who have consistent survival outcomes.
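    A minimal sketch of the survival comparison in step 2, assuming each patient already has a follow-up time, a death/censoring indicator, and a cluster label. It computes the Kaplan-Meier curve per cluster and, as one common way to test whether two clusters differ in survival, a two-group log-rank test. The log-rank test is an assumption here (the description only names Kaplan-Meier analysis), and the function names and toy data are hypothetical.

```python
import math
import numpy as np


def kaplan_meier(times, events):
    """Kaplan-Meier estimate S(t) at each distinct time with an observed death.

    times  : follow-up time per patient
    events : 1 if death was observed, 0 if the patient was censored
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    event_times = np.unique(times[events == 1])
    surv, s = [], 1.0
    for t in event_times:
        n_at_risk = np.sum(times >= t)                 # still under follow-up just before t
        deaths = np.sum((times == t) & (events == 1))  # deaths exactly at t
        s *= 1.0 - deaths / n_at_risk
        surv.append(s)
    return event_times, np.array(surv)


def logrank_two_groups(times, events, in_group1):
    """Two-sample log-rank test. Returns (chi-square statistic with 1 df, p-value)."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    g1 = np.asarray(in_group1, dtype=bool)
    o_minus_e, var = 0.0, 0.0
    for t in np.unique(times[events == 1]):
        at_risk = times >= t
        n, n1 = at_risk.sum(), (at_risk & g1).sum()
        died = (times == t) & (events == 1)
        d, d1 = died.sum(), (died & g1).sum()
        o_minus_e += d1 - d * n1 / n               # observed minus expected deaths in group 1
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var == 0:
        return 0.0, 1.0                            # no information to compare the groups
    stat = o_minus_e ** 2 / var
    return stat, math.erfc(math.sqrt(stat / 2))    # upper tail of chi-square(1)


# Hypothetical usage: compare two clusters found in one omics layer.
times = [1, 2, 3, 10, 11, 12]
events = [1, 1, 1, 0, 0, 0]
cluster = np.array([0, 0, 0, 1, 1, 1])
for k in (0, 1):
    print(k, kaplan_meier(np.array(times)[cluster == k], np.array(events)[cluster == k]))
print(logrank_two_groups(times, events, cluster == 0))
```

    With more than two clusters, the log-rank test generalizes to a k-group version; the two-group form is kept here for brevity.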

  3. TCGA Clinical Datasets

    • data.niaid.nih.gov
    Updated Jul 29, 2023
    Cite
    Swati Baskiyar (2023). TCGA Clinical Datasets [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8193637
    Authors
    Swati Baskiyar
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract:

    The Cancer Genome Atlas (TCGA) was a large-scale collaborative project initiated by the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI). It aimed to comprehensively characterize the genomic and molecular landscape of various cancer types. This dataset includes curated survival data from the Pan-Cancer Atlas paper titled "An Integrated TCGA Pan-Cancer Clinical Data Resource (TCGA-CDR) to drive high quality survival outcome analytics". The paper describes four carefully curated survival endpoints: overall survival (OS), progression-free interval (PFI), disease-free interval (DFI), and disease-specific survival (DSS), and recommends the use of these endpoints for each TCGA cancer type. These datasets include phenotypic information for BLCA, CESC, GBM, HNSC, KIRC, and LGG. The Sample IDs are unique identifiers that can be used to pair these records with the gene expression dataset.
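    One way the Sample IDs might be used to pair clinical records with an expression matrix is an inner join in pandas. This is a sketch only: the column names ("sample", "OS", "OS.time"), the genes-by-samples layout of the expression table, and the TCGA-XX sample IDs are hypothetical placeholders, not the actual file layout.

```python
import pandas as pd

# Hypothetical merged survival/phenotype table: one row per sample.
clinical = pd.DataFrame({
    "sample": ["TCGA-XX-0001-01", "TCGA-XX-0002-01", "TCGA-XX-0003-01"],
    "OS": [1, 0, 1],              # 1 = death observed, 0 = censored
    "OS.time": [400, 1200, 250],  # follow-up time (units as given in the source file)
})

# Hypothetical expression table stored genes x samples; transpose to samples x genes.
expression = pd.DataFrame(
    {"TCGA-XX-0001-01": [5.1, 2.0],
     "TCGA-XX-0003-01": [4.2, 3.3],
     "TCGA-XX-0009-01": [1.0, 0.5]},
    index=["GENE_A", "GENE_B"],
).T.rename_axis("sample").reset_index()

# Keep only samples present in both tables; fail loudly on duplicated IDs.
paired = clinical.merge(expression, on="sample", how="inner", validate="one_to_one")
print(paired)
```

    An inner join silently drops samples that lack either clinical or expression data, so it is worth comparing row counts before and after the merge.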

    Inspiration:

    This dataset was uploaded to U-BRITE for the GTKB project.

    Instruction:

    The survival and phenotype data were merged into one file.

    Acknowledgments:

    Goldman, M.J., Craft, B., Hastie, M. et al. Visualizing and interpreting cancer genomics data via the Xena platform. Nat Biotechnol (2020). https://doi.org/10.1038/s41587-020-0546-8

    Liu, J., Caesar-Johnson, S.J. et al. An Integrated TCGA Pan-Cancer Clinical Data Resource to Drive High-Quality Survival Outcome Analytics. Cell 173(2), 400–416.e11 (2018). https://doi.org/10.1016/j.cell.2018.02.052

    The Cancer Genome Atlas Research Network., Weinstein, J., Collisson, E. et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat Genet 45, 1113–1120 (2013). https://doi.org/10.1038/ng.2764

    U-BRITE last update: 07/13/2023

  4. Data from: Discovery Analysis of TCGA Data Reveals Association between...

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Mar 22, 2013
    Cite
    Yan, Chunhua; Finney, Richard; Braun, Rosemary; Edmonson, Michael; Chen, Qing-Rong; Buetow, Kenneth; Meerzaman, Daoud; Hu, Ying (2013). Discovery Analysis of TCGA Data Reveals Association between Germline Genotype and Survival in Ovarian Cancer Patients [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001710647
    Authors
    Yan, Chunhua; Finney, Richard; Braun, Rosemary; Edmonson, Michael; Chen, Qing-Rong; Buetow, Kenneth; Meerzaman, Daoud; Hu, Ying
    Description

    Background: Ovarian cancer remains a significant public health burden, with the highest mortality rate of all gynecological cancers. This is attributable to the late stage at which most ovarian cancers are diagnosed, coupled with the low and variable response of advanced tumors to standard chemotherapies. To date, clinically useful predictors of treatment response are lacking. Identifying the genetic determinants of ovarian cancer survival and treatment response is crucial to developing prognostic biomarkers and personalized therapies that may improve outcomes for the late-stage patients who make up the majority of cases.

    Methods: To identify constitutional genetic variations contributing to ovarian cancer mortality, we systematically investigated associations between germline polymorphisms and ovarian cancer survival using data from The Cancer Genome Atlas Project (TCGA). Using stage-stratified Cox proportional hazards regression, we examined 650,000 SNP loci for association with survival. We additionally examined whether the association of significant SNPs with survival was modified by somatic alterations.

    Results: Germline polymorphisms at rs4934282 (AGAP11/C10orf116) and rs1857623 (DNAH14) were associated with stage-adjusted survival (p = 1.12e-07 and 1.80e-07; FDR = 1.2e-04 and 2.4e-04, respectively). A third SNP, rs4869 (C10orf116), was additionally identified as significant in the exome sequencing data; it is in near-perfect linkage disequilibrium (LD) with rs4934282. The associations with survival remained significant after accounting for somatic alterations.

    Conclusions: Discovery analysis of TCGA data reveals germline genetic variations that may play a role in ovarian cancer survival, even among late-stage cases. The significant loci are located near genes previously reported as possibly related to platinum and taxol response. Because the variant alleles at the significant loci are common (allele frequencies: rs4934282 A/C = 0.54/0.46; rs1857623 A/G = 0.55/0.45) and germline variants can be assayed noninvasively, our findings provide potential targets for further exploration as prognostic biomarkers and individualized therapies.

  5. The Cancer Genome Atlas (TCGA) RNA-seq meta-analysis

    • figshare.com
    xlsx
    Updated Feb 2, 2018
    Cite
    Namshik Han (2018). The Cancer Genome Atlas (TCGA) RNA-seq meta-analysis [Dataset]. http://doi.org/10.6084/m9.figshare.5851743.v1
    Dataset provided by
    figshare
    Authors
    Namshik Han
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    TCGA RNA-seq V2 Level 3 data were downloaded from the TCGA Genomic Data Commons Data Portal (https://gdc-portal.nci.nih.gov), comprising 11,303 samples in 34 cancer projects (33 cancer types). Because the analysis focused on tumour versus non-tumour comparisons, the nine cancer types without corresponding non-tumour samples (ACC, DLBC, LAML, LGG, MESO, OV, TGCT, UCS and UVM) were filtered out. The remaining 24 cancer types were used in this meta-analysis: BLCA, BRCA, CESC, CHOL, COAD, ESCA, GBM, HNSC, KICH, KIRC, KIRP, LIHC, LUAD, LUSC, PAAD, PCPG, PRAD, READ, SARC, SKCM, STAD, THCA, THYM, UCEC (https://gdc-portal.nci.nih.gov).

    To extract expression values from the TCGA RNA-seq data, we used genomic coordinates to retrieve the UCSC Transcript IDs that correspond to the identifiers in the TCGA RNA-seq V2 Level 3 data (isoform level). The GAF (General Annotation Format) file was used to map coordinates to UCSC Transcript IDs; it was downloaded from https://tcga-data.nci.nih.gov/docs/GAF/GAF.hg19.June2011.bundle/outputs/TCGA.hg19.June2011.gaf. This file contains genomic annotations shared by all TCGA projects; more details of the GAF file format can be found at https://tcga-data.nci.nih.gov/docs/GAF/GAF3.0/GAF_v3_file_description.docx. We filtered out UCSC Transcript IDs overlapping coding exons to eliminate the expression values of coding genes and evaluate lncRNA expression.

    Expression values could be found for 443 pcRNAs and 203 tapRNAs in the TCGA data, since many non-coding regions are not yet fully annotated in the TCGA RNA-seq V2 Level 3 data. The expression values of the pcRNAs and tapRNAs were extracted and grouped by unsupervised clustering based on Pearson correlation (Supplementary Figure 18A).
    The expression values of the tapRNA-associated coding genes were also extracted and used to generate a heatmap (Supplementary Figure 18B), which shows an expression pattern across cancer types similar to that of the tapRNAs.

    To show that tapRNAs and their associated coding genes have similar expression profiles in cancers, we generated a Spearman's rank-order correlation heatmap (Figure 6A) between tapRNAs and their associated coding genes based on the TCGA RNA-seq data. We used the MATLAB function corr to calculate Spearman's rho. This function takes two matrices, X (the 197-by-8,850 expression profiling matrix of tapRNAs) and Y (the 197-by-8,850 expression profiling matrix of tapRNA-associated coding genes), and returns an 8,850-by-8,850 matrix whose (i, j) entry is the correlation coefficient between column i of X and column j of Y; the 8,850 columns are the TCGA cancer samples in Supplementary Figure 18A and B. The rank-order correlation matrix computed from these expression profiling matrices therefore allowed us to compare the correlation between two column vectors, i.e., cancer samples. The function also returns a matrix of p-values for testing the hypothesis of no correlation against the alternative of a nonzero correlation; each element is the p-value for the corresponding element of Spearman's rho and is calculated using large-sample approximations. To check the significance of the correlation between each tapRNA and its associated coding gene, the diagonal of the p-value matrix was extracted. Its median is 1.31 × 10⁻¹¹ and its mean is 1.03 × 10⁻⁴, with a standard deviation of 0.0029.

    To identify cancer-specific tapRNAs, we considered not only the global expression pattern of a given tapRNA in each cancer type but also the expression pattern of specific, significantly distinct subgroups, to take into account the heterogeneity of cancer samples.
    Thus, two conditions were applied: (1) the average expression level of the tapRNA in a given cancer type is in the top 10% or bottom 10%; and (2) at least 10% of the samples of that cancer type show significant up-regulation (Z-score > 2) or down-regulation (Z-score < -2) of the tapRNA.
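    The per-sample correlation check and the two-condition filter can be sketched in Python. This mirrors the MATLAB description only loosely and rests on stated assumptions: scipy.stats.spearmanr stands in for MATLAB's corr; only the diagonal (sample j of X against sample j of Y) is computed rather than the full 8,850-by-8,850 matrix; Z-scores are taken per tapRNA across all samples; and "top/bottom 10%" is read as a ranking among tapRNAs within each cancer type. All function names are hypothetical.

```python
import numpy as np
from scipy.stats import spearmanr


def per_sample_spearman(X, Y):
    """X, Y: (n_pairs, n_samples) arrays; row i of Y is the coding gene paired
    with tapRNA i in X. Returns Spearman rho and p-value per sample, i.e. the
    diagonal of the full sample-by-sample correlation matrix."""
    rhos, ps = [], []
    for j in range(X.shape[1]):
        rho, p = spearmanr(X[:, j], Y[:, j])
        rhos.append(rho)
        ps.append(p)
    return np.array(rhos), np.array(ps)


def cancer_specific(expr, cancer, frac=0.10, z_cut=2.0):
    """expr: (n_tapRNAs, n_samples); cancer: cancer-type label per sample.
    Returns {cancer_type: boolean mask of tapRNAs passing both conditions}."""
    cancer = np.asarray(cancer)
    # Assumption: Z-scores computed per tapRNA across all samples pooled.
    z = (expr - expr.mean(axis=1, keepdims=True)) / expr.std(axis=1, keepdims=True)
    flags = {}
    for c in np.unique(cancer):
        cols = cancer == c
        mean_c = expr[:, cols].mean(axis=1)
        # Condition 1 (assumed reading): mean in the top or bottom 10% of tapRNAs for this cancer.
        lo, hi = np.quantile(mean_c, [frac, 1 - frac])
        cond1 = (mean_c >= hi) | (mean_c <= lo)
        # Condition 2: at least 10% of this cancer's samples have Z > 2, or at least 10% have Z < -2.
        up = (z[:, cols] > z_cut).mean(axis=1) >= frac
        down = (z[:, cols] < -z_cut).mean(axis=1) >= frac
        flags[c] = cond1 & (up | down)
    return flags


# Toy data: tapRNA 0 is strongly up-regulated in the small cancer type "A".
rng = np.random.default_rng(1)
expr = rng.normal(size=(10, 50))
cancer = np.array(["A"] * 5 + ["B"] * 45)
expr[0, :5] += 20
print(cancer_specific(expr, cancer))
```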

  6. Data from: A Course-Embedded "Plug and Play" Research Project for Teaching...

    • qubeshub.org
    Updated Jun 26, 2025
    Cite
    Jennifer Hurst-Kennedy* (2025). A Course-Embedded "Plug and Play" Research Project for Teaching Cancer Genomics [Dataset]. https://qubeshub.org/publications/5377/?v=1
    Dataset provided by
    QUBES
    Authors
    Jennifer Hurst-Kennedy*
    Description

    In silico genomics research provides students with opportunities to conduct authentic, course-embedded biomedical research. However, the tools for this type of research can be challenging to learn for both students and faculty. This project curriculum guides students through the analysis of human cancer patient genomic data from The Cancer Genome Atlas (TCGA) using the analysis tool cBioPortal. The project follows a “plug and play” style guide: students “plug” variables into template research questions and follow steps to collect and analyze the appropriate data (“play”) in cBioPortal. The project is designed to run over a semester, with five checkpoints to maintain student progress and provide feedback:

    Checkpoint 1. Selection of Research Question and cBioPortal Activity
    Checkpoint 2. Annotated Bibliography and Abstract
    Checkpoint 3. Draft of Scientific Poster
    Checkpoint 4. Peer Review of Scientific Posters
    Checkpoint 5. Poster Presentations

    The project culminates in a formal poster presentation, allowing students to share their work and gain experience in scientific communication. Here, the project curriculum, student guidelines, and assessments are presented.

    Primary Image: This image was created using cBioPortal. The image shows partial OncoPrints for the five most commonly mutated genes in gliomas (top), pancreatic cancers (middle), and breast cancers (bottom). The OncoPrints were generated using data from the Pan-cancer analysis of whole genomes (ICGC/TCGA, Nature 2020).

  7. TCGA-BRCA:survival analysis

    • kaggle.com
    zip
    Updated Mar 31, 2025
    Cite
    Juan Malagón (2025). TCGA-BRCA:survival analysis [Dataset]. https://www.kaggle.com/datasets/jmalagontorres/tcga-brca-survival-analysis
    Available download formats: zip (133,161,552,490 bytes)
    Authors
    Juan Malagón
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Introduction

    This dataset consists of 1097 breast cancer patient cases and is designed for survival analysis using both histopathological and clinical information. The combination of these data sources allows for the exploration of disease progression patterns and the development of predictive models.

    Histopathological Data

    The dataset includes a folder containing histopathological image patches extracted from whole-slide imaging (WSI) scans.

    Optical magnification: x20

    Patch size: 1000 x 1000 pixels

    Region selection: Only patches containing tissue are included, discarding areas without relevant information

    Image-Derived Data

    For each patient, a CSV file is provided with extracted information from the histopathological patches:

    Histograms: Representation of the pixel intensity distribution in each image

    Cell count: Number of cells present in the selected patches
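    The description does not say how the per-patch histograms were computed. As a hedged illustration only, the sketch below converts an RGB patch to grayscale with ITU-R BT.601 luminance weights and returns a normalized 16-bin intensity histogram; the weights, bin count and function name are assumptions.

```python
import numpy as np


def intensity_histogram(patch_rgb, bins=16):
    """patch_rgb: (H, W, 3) uint8 array. Returns a normalized grayscale histogram."""
    rgb = patch_rgb.astype(np.float64)
    # Assumed grayscale conversion (BT.601 luminance weights).
    gray = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
    counts, _ = np.histogram(gray, bins=bins, range=(0, 256))
    return counts / counts.sum()   # fractions of pixels per intensity bin


# Hypothetical usage on a random 1000 x 1000 patch.
patch = np.random.default_rng(0).integers(0, 256, size=(1000, 1000, 3), dtype=np.uint8)
print(intensity_histogram(patch))
```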

    Clinical Data

    A second CSV file contains clinical information about the patients, which is essential for survival analysis. The included variables are:

    Time until death: The time elapsed until the patient’s death

    Vital status: Indicates whether the patient is deceased or still alive

    Other clinical variables: Factors that may influence survival and help contextualize the histopathological data

    Dataset Objective

    The primary objective of this dataset is to facilitate the development of survival models that integrate histopathological and clinical information. This will help identify patterns in breast cancer progression and enhance predictive capabilities for estimating patient survival time.

    This dataset is ideal for exploring machine learning methods applied to digital pathology and survival analysis in oncology.
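    Survival models built on data like this are commonly compared with Harrell's concordance index (C-index). The sketch below assumes each patient has a model-predicted risk score (higher means worse prognosis), derives the event indicator from a vital-status column whose labels ("Dead"/"Deceased") are hypothetical, and computes C with a straightforward O(n²) loop.

```python
import numpy as np


def event_indicator(vital_status):
    """Hypothetical label mapping: 1 if the patient is recorded as deceased, else 0."""
    return np.array([1 if str(v).strip().lower() in ("dead", "deceased") else 0
                     for v in vital_status])


def concordance_index(times, events, risk):
    """Harrell's C: among comparable pairs, the fraction ordered correctly by risk.

    A pair (i, j) is comparable when times[i] < times[j] and patient i died;
    it is concordant when risk[i] > risk[j]; tied risks count as 0.5.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    risk = np.asarray(risk, dtype=float)
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if events[i] != 1:
            continue
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable


events = event_indicator(["Dead", "Dead", "Dead", "Alive"])
print(concordance_index([2, 4, 6, 8], events, risk=[4, 3, 2, 1]))
```

    A C-index of 0.5 corresponds to random ordering and 1.0 to perfect ranking of patients by survival time.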

  8. TCGA TPM Expression Summary: Processed Transcriptomic Data from 10 Cancer...

    • figshare.com
    xlsx
    Updated Mar 17, 2025
    Cite
    Jirapat Techachakrit (2025). TCGA TPM Expression Summary: Processed Transcriptomic Data from 10 Cancer Types [Dataset]. http://doi.org/10.6084/m9.figshare.28324271.v1
    Dataset provided by
    figshare (http://figshare.com/)
    Authors
    Jirapat Techachakrit
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains TPM (Transcripts Per Million) expression values derived from The Cancer Genome Atlas (TCGA) for ten cancer types: BRCA (breast cancer), COAD (colon adenocarcinoma), DLBC (diffuse large B-cell lymphoma), KIRC (kidney renal clear cell carcinoma), KIRP (kidney renal papillary cell carcinoma), LUAD (lung adenocarcinoma), LUSC (lung squamous cell carcinoma), MESO (mesothelioma), READ (rectum adenocarcinoma), and SKCM (skin cutaneous melanoma). The data have been processed and curated for downstream bioinformatics analyses.
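    For reference, TPM values are usually derived from read counts by dividing each gene's count by its length and then scaling each sample so that the values sum to one million. The sketch below assumes a genes-by-samples count matrix and per-gene lengths in base pairs; it does not reproduce this dataset's own (unspecified) processing pipeline, and the function name is hypothetical.

```python
import numpy as np


def counts_to_tpm(counts, lengths_bp):
    """counts: (n_genes, n_samples) read counts; lengths_bp: (n_genes,) gene lengths in bp."""
    counts = np.asarray(counts, dtype=float)
    lengths_kb = np.asarray(lengths_bp, dtype=float)[:, None] / 1e3
    rate = counts / lengths_kb                             # reads per kilobase
    return rate / rate.sum(axis=0, keepdims=True) * 1e6    # each column sums to 1e6


print(counts_to_tpm([[10, 5], [20, 0], [30, 15]], [1000, 2000, 3000]))
```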

  9. Table_1_Pan-Cancer Analysis of TCGA Data Revealed Promising Reference Genes...

    • datasetcatalog.nlm.nih.gov
    • frontiersin.figshare.com
    Updated Mar 1, 2019
    Cite
    Beniaminov, Artemy D.; Dmitriev, Alexey A.; Krasnov, George S.; Melnikova, Nataliya V.; Lakunina, Valentina A.; Snezhkina, Anastasiya V.; Kudryavtseva, Anna V. (2019). Table_1_Pan-Cancer Analysis of TCGA Data Revealed Promising Reference Genes for qPCR Normalization.XLSX [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000188236
    Authors
    Beniaminov, Artemy D.; Dmitriev, Alexey A.; Krasnov, George S.; Melnikova, Nataliya V.; Lakunina, Valentina A.; Snezhkina, Anastasiya V.; Kudryavtseva, Anna V.
    Description

    Quantitative PCR (qPCR) remains the most widely used technique for gene expression evaluation. Obtaining reliable data with this method requires reference genes (RGs) whose mRNA levels are stable under the experimental conditions. This issue is especially crucial in cancer studies because each tumor has a unique molecular portrait. The Cancer Genome Atlas (TCGA) project provides RNA-Seq data for thousands of samples corresponding to dozens of cancers and provides a basis for assessing whether genes are suitable as reference genes for qPCR data normalization. Using TCGA RNA-Seq data and the previously developed CrossHub tool, we evaluated the mRNA levels of 32 traditionally used RGs in 12 cancer types, including those of the lung, breast, prostate, kidney, and colon. We developed an 11-component scoring system for assessing gene expression stability. Among the 32 genes, PUM1 was one of the most stably expressed in the majority of the examined cancers, whereas GAPDH, which is widely used as an RG, showed significant mRNA level alterations in more than half of the cases. For each of the 12 cancer types, we suggested the pair of genes most suitable for use as reference genes. These genes are characterized by high expression stability and an absence of correlation between their mRNA levels. Next, the scoring system was expanded with several features of a gene: mutation rate, number of transcript isoforms and pseudogenes, participation in cancer-related processes on the basis of Gene Ontology, and mentions in PubMed-indexed articles. All genes covered by RNA-Seq data in TCGA were analyzed using the expanded scoring system, which allowed us to reveal novel promising RGs for each examined cancer type and to identify several “universal” pan-cancer RG candidates, including SF3A1, CIAO1, and SFRS4. The choice of RGs is the basis for precise gene expression evaluation by qPCR.
Here, we suggested optimal pairs of traditionally used RGs for 12 cancer types and identified novel promising RGs that demonstrate high expression stability and other features of reliable and convenient RGs (high expression level, low mutation rate, non-involvement in cancer-related processes, single transcript isoform, and absence of pseudogenes).
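    The 11-component scoring system is not detailed here, so the following is only a simplified stand-in for choosing a reference-gene pair: it ranks candidate pairs by the sum of their coefficients of variation (lower means more stable) and skips pairs whose expression levels are correlated. The CV criterion, the |r| < 0.5 cut-off and the function name are assumptions, not the authors' method.

```python
import itertools
import numpy as np


def pick_reference_pair(expr, max_abs_r=0.5):
    """expr: dict mapping gene name -> 1-D array of expression across samples.

    Returns the gene pair with the lowest summed coefficient of variation
    among pairs whose Pearson |r| is below max_abs_r, or None if none qualify.
    """
    cv = {g: np.std(v) / np.mean(v) for g, v in expr.items()}  # lower = more stable
    best, best_score = None, np.inf
    for a, b in itertools.combinations(sorted(expr), 2):
        r = np.corrcoef(expr[a], expr[b])[0, 1]
        if abs(r) >= max_abs_r:        # skip pairs whose levels move together
            continue
        score = cv[a] + cv[b]
        if score < best_score:
            best, best_score = (a, b), score
    return best


# Toy data: B is perfectly correlated with A, so the otherwise best pair (A, B) is rejected.
rng = np.random.default_rng(0)
A = 100 + rng.normal(0, 1, 200)
expr = {"A": A, "B": 2 * A + 10, "C": 100 + rng.normal(0, 5, 200)}
print(pick_reference_pair(expr))
```

    Requiring uncorrelated partners reflects the idea stated above that a useful pair should not share the same regulation, so that one gene can buffer a shift in the other.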

  10. The Cancer Genome Atlas Rectum Adenocarcinoma Collection

    • cancerimagingarchive.net
    • stage.cancerimagingarchive.net
    dicom, n/a
    Updated Jan 5, 2016
    Cite
    The Cancer Imaging Archive (2016). The Cancer Genome Atlas Rectum Adenocarcinoma Collection [Dataset]. http://doi.org/10.7937/K9/TCIA.2016.F7PPNPNU
    Dataset authored and provided by
    The Cancer Imaging Archive
    License

    https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/

    Time period covered
    May 29, 2020
    Dataset funded by
    National Cancer Institute (http://www.cancer.gov/)
    Description

    The Cancer Genome Atlas Rectum Adenocarcinoma (TCGA-READ) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA). Clinical, genetic, and pathological data resides in the Genomic Data Commons (GDC) Data Portal while the radiological data is stored on The Cancer Imaging Archive (TCIA).

    Matched TCGA patient identifiers allow researchers to explore the TCGA/TCIA databases for correlations between tissue genotype, radiological phenotype and patient outcomes. Tissues for TCGA were collected from many sites all over the world in order to reach their accrual targets, usually around 500 specimens per cancer type. For this reason the image data sets are also extremely heterogeneous in terms of scanner modalities, manufacturers and acquisition protocols. In most cases the images were acquired as part of routine care and not as part of a controlled research study or clinical trial.
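    Matching on patient identifiers usually means reducing TCGA sample or aliquot barcodes to the participant-level prefix (project, tissue source site, participant). A minimal sketch, assuming the imaging side lists patient IDs in that three-field form; the example barcodes and function names are hypothetical.

```python
def patient_barcode(barcode: str) -> str:
    """Reduce a TCGA sample/aliquot barcode to its participant part, e.g.
    'TCGA-XX-0001-01A-11R-A000-07' -> 'TCGA-XX-0001' (hypothetical IDs)."""
    parts = barcode.strip().upper().split("-")
    if len(parts) < 3 or parts[0] != "TCGA":
        raise ValueError(f"not a TCGA barcode: {barcode!r}")
    return "-".join(parts[:3])


def match_to_imaging(genomic_barcodes, imaging_patient_ids):
    """Map each genomic barcode to its patient ID when that patient has images."""
    with_images = {patient_barcode(p) for p in imaging_patient_ids}
    return {b: patient_barcode(b) for b in genomic_barcodes
            if patient_barcode(b) in with_images}


print(match_to_imaging(["TCGA-XX-0001-01A", "TCGA-XX-0002-01A"], ["TCGA-XX-0001"]))
```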

    CIP TCGA Radiology Initiative

    Imaging Source Site (ISS) Groups are being populated and governed by participants from institutions that have provided imaging data to the archive for a given cancer type. Modeled after TCGA analysis groups, ISS groups are given the opportunity to publish a marker paper for a given cancer type per the guidelines in the table above. This opportunity will generate increased participation in building these multi-institutional data sets as they become an open community resource. Learn more about the CIP TCGA Radiology Initiative.

  11. The Cancer Genome Atlas Ovarian Cancer Collection

    • cancerimagingarchive.net
    dicom, n/a
    Updated May 29, 2020
    Cite
    The Cancer Imaging Archive (2020). The Cancer Genome Atlas Ovarian Cancer Collection [Dataset]. http://doi.org/10.7937/K9/TCIA.2016.NDO1MDFQ
    Dataset authored and provided by
    The Cancer Imaging Archive
    License

    https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/

    Time period covered
    May 29, 2020
    Dataset funded by
    National Cancer Institute (http://www.cancer.gov/)
    Description

    The Cancer Genome Atlas Ovarian Cancer (TCGA-OV) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA). Clinical, genetic, and pathological data resides in the Genomic Data Commons (GDC) Data Portal while the radiological data is stored on The Cancer Imaging Archive (TCIA).

    Matched TCGA patient identifiers allow researchers to explore the TCGA/TCIA databases for correlations between tissue genotype, radiological phenotype and patient outcomes. Tissues for TCGA were collected from many sites all over the world in order to reach their accrual targets, usually around 500 specimens per cancer type. For this reason the image data sets are also extremely heterogeneous in terms of scanner modalities, manufacturers and acquisition protocols. In most cases the images were acquired as part of routine care and not as part of a controlled research study or clinical trial.

    CIP TCGA Radiology Initiative

    Imaging Source Site (ISS) Groups are being populated and governed by participants from institutions that have provided imaging data to the archive for a given cancer type. Modeled after TCGA analysis groups, ISS groups are given the opportunity to publish a marker paper for a given cancer type per the guidelines in the table above. This opportunity will generate increased participation in building these multi-institutional data sets as they become an open community resource. Learn more about the TCGA Ovarian Phenotype Research Group.

  12. TCGA Breast Phenotype Research Group Data sets

    • stage.cancerimagingarchive.net
    • cancerimagingarchive.net
    n/a, xls, zip
    Updated Sep 4, 2018
    Cite
    The Cancer Imaging Archive (2018). TCGA Breast Phenotype Research Group Data sets [Dataset]. http://doi.org/10.7937/K9/TCIA.2014.8SIPIY6G
    Dataset authored and provided by
    The Cancer Imaging Archive
    License

    https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/

    Time period covered
    Sep 4, 2018
    Dataset funded by
    National Cancer Institute (http://www.cancer.gov/)
    Description

    At the time of our study, 108 cases with breast MRI data were available in The Cancer Genome Atlas Breast Invasive Carcinoma (TCGA-BRCA) collection. To minimize variations in image quality across the multi-institutional cases, we included only breast MRI studies acquired on GE 1.5 T scanners (GE Medical Systems, Milwaukee, Wisconsin, USA), yielding a total of 93 cases. We then excluded cases that had missing images in the dynamic sequence (1 patient) or that, at the time, did not have gene expression analysis available in the TCGA Data Portal (8 patients). After applying these criteria, the dataset comprised 84 breast cancer patients, with MRIs from four institutions: Memorial Sloan Kettering Cancer Center, the Mayo Clinic, the University of Pittsburgh Medical Center, and the Roswell Park Cancer Institute. These institutions contributed 9 (date range 1999-2002), 5 (1999-2003), 46 (1999-2004), and 24 (1999-2002) cases, respectively. The dataset of biopsy-proven invasive breast cancers included 74 (88%) ductal, 8 (10%) lobular, and 2 (2%) mixed cancers. Of these, 73 (87%) were ER+, 67 (80%) were PR+, and 19 (23%) were HER2+. Various types of analyses were conducted using the combined imaging, genomic, and clinical data; these analyses are described in several manuscripts by the group (cited below). Additional information about the methodology used to create the Radiologist Annotations file can be found on the TCGA Breast Image Feature Scoring Project page.
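    The case-selection steps above amount to sequential filters. This pandas sketch applies the criteria in the stated order and records the number of cases remaining after each step; the column names and the toy table are hypothetical, since the underlying case table is not part of this description.

```python
import pandas as pd


def select_cohort(cases: pd.DataFrame):
    """Apply the selection criteria in order; return (final cases, counts per step)."""
    counts = {"all": len(cases)}
    ge = cases[(cases["manufacturer"] == "GE") & (cases["field_strength_T"] == 1.5)]
    counts["GE 1.5 T"] = len(ge)
    complete = ge[ge["dynamic_complete"]]            # no missing dynamic-sequence images
    counts["complete dynamic series"] = len(complete)
    final = complete[complete["has_expression"]]     # gene expression data available
    counts["gene expression available"] = len(final)
    return final, counts


# Hypothetical toy case table.
cases = pd.DataFrame({
    "manufacturer": ["GE", "GE", "Siemens", "GE", "GE"],
    "field_strength_T": [1.5, 1.5, 1.5, 3.0, 1.5],
    "dynamic_complete": [True, False, True, True, True],
    "has_expression": [True, True, True, True, False],
})
final, counts = select_cohort(cases)
print(counts)
```

    Applied to the study's cases, the same three steps correspond to the reported counts of 108, 93, and 84 (after removing 1 + 8 cases).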

  13. The Cancer Genome Atlas Breast Invasive Carcinoma Collection

    • cancerimagingarchive.net
    dicom, n/a
    Updated May 29, 2020
    Cite
    The Cancer Imaging Archive (2020). The Cancer Genome Atlas Breast Invasive Carcinoma Collection [Dataset]. http://doi.org/10.7937/K9/TCIA.2016.AB2NAZRP
    Dataset authored and provided by
    The Cancer Imaging Archive
    License

    https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/

    Time period covered
    May 29, 2020
    Dataset funded by
    National Cancer Institute (http://www.cancer.gov/)
    Description

    The Cancer Genome Atlas Breast Invasive Carcinoma (TCGA-BRCA) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA). Clinical, genetic, and pathological data resides in the Genomic Data Commons (GDC) Data Portal while the radiological data is stored on The Cancer Imaging Archive (TCIA).

    Matched TCGA patient identifiers allow researchers to explore the TCGA/TCIA databases for correlations between tissue genotype, radiological phenotype and patient outcomes. Tissues for TCGA were collected from many sites all over the world in order to reach their accrual targets, usually around 500 specimens per cancer type. For this reason the image data sets are also extremely heterogeneous in terms of scanner modalities, manufacturers and acquisition protocols. In most cases the images were acquired as part of routine care and not as part of a controlled research study or clinical trial.

    CIP TCGA Radiology Initiative

    Imaging Source Site (ISS) Groups are being populated and governed by participants from institutions that have provided imaging data to the archive for a given cancer type. Modeled after TCGA analysis groups, ISS groups are given the opportunity to publish a marker paper for a given cancer type per the guidelines in the table above. This opportunity will generate increased participation in building these multi-institutional data sets as they become an open community resource. Learn more about the TCGA Breast Phenotype Research Group.

  14. The Cancer Genome Atlas Low Grade Glioma Collection

    • cancerimagingarchive.net
    • stage.cancerimagingarchive.net
    dicom, n/a
    Updated Jan 5, 2016
    Cite
    The Cancer Imaging Archive (2016). The Cancer Genome Atlas Low Grade Glioma Collection [Dataset]. http://doi.org/10.7937/K9/TCIA.2016.L4LTD3TK
    Dataset authored and provided by
    The Cancer Imaging Archive
    License

    https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/

    Time period covered
    May 29, 2020
    Dataset funded by
    National Cancer Institute (http://www.cancer.gov/)
    Description

    The Cancer Genome Atlas Low Grade Glioma (TCGA-LGG) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA). Clinical, genetic, and pathological data resides in the Genomic Data Commons (GDC) Data Portal while the radiological data is stored on The Cancer Imaging Archive (TCIA).

    Matched TCGA patient identifiers allow researchers to explore the TCGA/TCIA databases for correlations between tissue genotype, radiological phenotype and patient outcomes. Tissues for TCGA were collected from many sites all over the world in order to reach their accrual targets, usually around 500 specimens per cancer type. For this reason the image data sets are also extremely heterogeneous in terms of scanner modalities, manufacturers and acquisition protocols. In most cases the images were acquired as part of routine care and not as part of a controlled research study or clinical trial.

    CIP TCGA Radiology Initiative

    Imaging Source Site (ISS) Groups are being populated and governed by participants from institutions that have provided imaging data to the archive for a given cancer type. Modeled after TCGA analysis groups, ISS groups are given the opportunity to publish a marker paper for a given cancer type per the guidelines in the table above. This opportunity will generate increased participation in building these multi-institutional data sets as they become an open community resource. Learn more about the TCGA Glioma Phenotype Research Group.

  15. Data from: Benchmark study of feature selection strategies for multi-omics...

    • figshare.com
    txt
    Updated Jun 4, 2023
    Cite
    Yingxia Li; Roman Hornung (2023). Benchmark study of feature selection strategies for multi-omics data [Dataset]. http://doi.org/10.6084/m9.figshare.20060201.v1
    Available download formats: txt
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    figshare (http://figshare.com/)
    Authors
    Yingxia Li; Roman Hornung
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    These data sets are the pre-processed versions of the multi-omics data sets used in the benchmark study presented in the paper "Benchmark study of feature selection strategies for multi-omics data" by Yingxia Li, Ulrich Mansmann, Shangming Du, and Roman Hornung. In each data set, the outcome feature is "TP53_mutation", where "1" / "0" indicates the presence / absence of a TP53 mutation in the respective patient. The remaining features are clinical and omics features, identified by suffix: "_clinical" for clinical features, "_cnv" for copy number variation features, "_mirna" for miRNA features, "_mutation" for mutation features, and "_rna" for RNA features. Predicting TP53 mutation status (yes vs. no) is not meaningful in itself, but TP53 mutations have been found to be associated with poor clinical outcomes in cancer patients [1]. Against this background, TP53 mutation status can serve as a surrogate for a phenotypic outcome. These data sets are therefore intended for testing machine learning or statistical procedures and may not be useful for biological analysis.
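    The feature groups are encoded only in column-name suffixes, so splitting a table into its clinical and omics blocks is a string operation. The minimal sketch below assumes the column names have already been read from a data set's header line. The file delimiter is not stated here, so the reading step is left out. Note that the outcome "TP53_mutation" also ends in "_mutation" and has to be set aside before suffix matching. The example column names are hypothetical:

```python
OUTCOME = "TP53_mutation"
SUFFIXES = ("_clinical", "_cnv", "_mirna", "_mutation", "_rna")


def group_columns(columns):
    """Split column names into feature groups by suffix.

    Returns (groups, unmatched), where groups maps 'clinical', 'cnv', 'mirna',
    'mutation', and 'rna' to lists of column names.
    """
    groups = {suffix[1:]: [] for suffix in SUFFIXES}
    unmatched = []
    for col in columns:
        if col == OUTCOME:
            continue  # the outcome also ends in "_mutation"; keep it out of the features
        for suffix in SUFFIXES:
            if col.endswith(suffix):
                groups[suffix[1:]].append(col)
                break
        else:
            unmatched.append(col)  # no known suffix
    return groups, unmatched


# Hypothetical column names for illustration.
header = ["TP53_mutation", "age_clinical", "EGFR_cnv", "hsa-mir-21_mirna",
          "KRAS_mutation", "GAPDH_rna"]
groups, unmatched = group_columns(header)
print({name: len(cols) for name, cols in groups.items()}, unmatched)
```

    Reporting unmatched names separately, rather than silently dropping them, makes it easy to notice columns that do not follow the documented naming scheme.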

  16. TCGA Lower Grade Glioma (LGG) Clinical Data

    • zenodo.org
    • data.niaid.nih.gov
    csv
    Updated Jul 29, 2023
    Cite
    Swati Baskiyar (2023). TCGA Lower Grade Glioma (LGG) Clinical Data [Dataset]. http://doi.org/10.5281/zenodo.8190154
    Available download formats: csv
    Dataset updated
    Jul 29, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Swati Baskiyar
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract:

    The Cancer Genome Atlas (TCGA) was a large-scale collaborative project initiated by the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI) to comprehensively characterize the genomic and molecular landscape of many cancer types. This dataset includes curated survival data from the Pan-Cancer Atlas paper "An Integrated TCGA Pan-Cancer Clinical Data Resource to Drive High-Quality Survival Outcome Analytics" (TCGA-CDR). The paper describes four carefully curated survival endpoints (overall survival, OS; progression-free interval, PFI; disease-free interval, DFI; and disease-specific survival, DSS) and recommends which of them to use for each TCGA cancer type. The dataset also includes phenotypic information about LGG. The sample IDs are unique identifiers that can be used to join these records with the corresponding gene expression dataset.
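    Curated endpoints like these are typically analyzed with survival curves. The minimal Kaplan-Meier sketch below assumes each endpoint is stored as an event indicator (1 = event, 0 = censored) plus a follow-up time column, for example OS and OS.time as in the UCSC Xena survival tables. Check the actual column names in the file, and drop rows with missing values first. The values below are hypothetical:

```python
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times: follow-up times (e.g. days); events: 1 = event observed, 0 = censored.
    Returns a list of (time, survival probability) at each distinct event time.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)  # every subject leaves the risk set at its own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths[t]
        if d:
            # Subjects censored at t are still counted as at risk at t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


# Hypothetical OS.time (days) and OS (event) values.
os_time = [100, 200, 200, 300, 400]
os_event = [1, 1, 0, 1, 0]
for t, s in kaplan_meier(os_time, os_event):
    print(t, round(s, 3))
```

    The same function applies unchanged to PFI, DFI, or DSS by passing the corresponding event and time columns.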

    Inspiration:

    This dataset was uploaded to U-BRITE for the GTKB project.

    Instruction:

    The survival and phenotype data were merged into one file. Empty columns were removed. Columns with the same value for every sample were also removed.
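    The three cleaning steps can be sketched in plain Python. This sketch assumes both tables are lists of row dictionaries sharing a "sample" key. It also assumes that samples missing from either table are dropped (an inner join). The original preparation may have differed on both points, and the example rows are hypothetical:

```python
MISSING = (None, "")


def merge_and_prune(survival_rows, phenotype_rows, key="sample"):
    """Inner-join two lists of row dicts on `key`, then drop empty and constant columns."""
    phenotype_by_id = {row[key]: row for row in phenotype_rows}
    merged = [
        {**phenotype_by_id[row[key]], **row}  # survival values win on name clashes
        for row in survival_rows
        if row[key] in phenotype_by_id
    ]
    columns = list(dict.fromkeys(col for row in merged for col in row))
    keep = [key]
    for col in columns:
        if col == key:
            continue
        values = [row.get(col) for row in merged]
        if all(v in MISSING for v in values):
            continue  # empty column
        if len(set(values)) == 1:
            continue  # same value for every sample
        keep.append(col)
    return [{col: row.get(col) for col in keep} for row in merged]


# Hypothetical rows for illustration.
surv = [
    {"sample": "S1", "OS": 1, "OS.time": 100, "note": ""},
    {"sample": "S2", "OS": 0, "OS.time": 250, "note": ""},
    {"sample": "S3", "OS": 1, "OS.time": 30, "note": ""},
]
pheno = [
    {"sample": "S1", "project": "LGG", "grade": "G2"},
    {"sample": "S2", "project": "LGG", "grade": "G3"},
]
print(merge_and_prune(surv, pheno))
```

    In this example, "note" is dropped as empty and "project" as constant. Sample S3 is dropped because it has no phenotype row.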

    Acknowledgments:

    Goldman, M.J., Craft, B., Hastie, M. et al. Visualizing and interpreting cancer genomics data via the Xena platform. Nat Biotechnol (2020). https://doi.org/10.1038/s41587-020-0546-8

    Liu, J., Caesar-Johnson, S.J. et al. An Integrated TCGA Pan-Cancer Clinical Data Resource to Drive High-Quality Survival Outcome Analytics. Cell 173(2), 400–416.e11 (2018). https://doi.org/10.1016/j.cell.2018.02.052

    The Cancer Genome Atlas Research Network, Weinstein, J., Collisson, E. et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat Genet 45, 1113–1120 (2013). https://doi.org/10.1038/ng.2764

    U-BRITE last update: 07/13/2023

  17. DataSheet_1_Integration of scRNA-Seq and TCGA RNA-Seq to Analyze the...

    • datasetcatalog.nlm.nih.gov
    • frontiersin.figshare.com
    Updated Jun 1, 2022
    Cite
    French, Lars E.; Wei, Erdong; Reinholz, Markus; Li, Jiahua; Clanner-Engelshofen, Benjamin; Reisinger, Amin (2022). DataSheet_1_Integration of scRNA-Seq and TCGA RNA-Seq to Analyze the Heterogeneity of HPV+ and HPV- Cervical Cancer Immune Cells and Establish Molecular Risk Models.docx [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000311198
    Dataset updated
    Jun 1, 2022
    Authors
    French, Lars E.; Wei, Erdong; Reinholz, Markus; Li, Jiahua; Clanner-Engelshofen, Benjamin; Reisinger, Amin
    Description

    Background: Numerous studies support that human papillomavirus (HPV) can cause cervical cancer. However, few studies have surveyed the heterogeneity of HPV-infected and uninfected (HPV+ and HPV-) cervical cancer (CESC) patients. Integrating scRNA-seq and TCGA data to analyze this heterogeneity at the single-cell level could improve understanding of the cellular mechanisms of HPV-induced cervical cancer.

    Methods: CESC scRNA-seq data were obtained from the Gene Expression Omnibus (GEO) database and analyzed with the Seurat and Monocle3 packages. The ESTIMATE package was used to compute single-sample immune scores, the CIBERSORT package to estimate immune cell scores, and the WGCNA package for weighted correlation network analysis. Univariate Cox and LASSO regression were performed to establish survival and relapse signatures. KEGG and GO analyses were performed for the signature genes. Gene Expression Profiling Interactive Analysis (GEPIA) was used for pan-cancer analysis.

    Results: In the HPV+ CESC group, scRNA-seq data showed that CD8+ T cells and B cells were down-regulated, whereas T reg cells, CD4+ T cells, and epithelial cells were up-regulated. Survival analysis of TCGA-CESC revealed that higher levels of naive B cells or CD8+ T cells favor the survival of CESC patients. WGCNA, univariate Cox, and LASSO Cox regression established a 9-gene survival signature and a 7-gene relapse model. Pan-cancer analysis showed that IKZF3, FOXP3, and JAK3 had similar distributions and effects in HPV-associated head and neck squamous cell carcinoma (HNSC).

    Conclusion: Analysis of scRNA-seq and bulk RNA-seq of HPV+ and HPV- CESC samples revealed heterogeneity from transcriptional state to immune infiltration. Survival and relapse models were adjusted according to the heterogeneity of HPV+ and HPV- CESC immune cells to assess prognostic risk accurately. The hub genes showed a similar protective effect in HPV-associated HNSC but appeared unrelated to other potentially HPV-related cancers.
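    Signatures built with univariate Cox and LASSO Cox regression are commonly applied as a linear risk score: the sum of each signature gene's coefficient times its expression. Patients are then split into high- and low-risk groups at the median score. The sketch below shows that common approach. The gene names and coefficients are hypothetical, because the abstract does not list the fitted values:

```python
from statistics import median


def risk_scores(expression, coefficients):
    """Linear risk score per patient: sum of coefficient * expression over signature genes.

    expression: {patient: {gene: value}}; coefficients: {gene: fitted coefficient}.
    """
    return {
        patient: sum(beta * values[gene] for gene, beta in coefficients.items())
        for patient, values in expression.items()
    }


def stratify(scores):
    """Label patients 'high' or 'low' risk relative to the median score."""
    cutoff = median(scores.values())
    return {patient: ("high" if s > cutoff else "low") for patient, s in scores.items()}


# Hypothetical signature and expression values for illustration only.
coefficients = {"GENE_A": 0.5, "GENE_B": -0.2}
expression = {
    "P1": {"GENE_A": 2.0, "GENE_B": 1.0},
    "P2": {"GENE_A": 1.0, "GENE_B": 0.0},
    "P3": {"GENE_A": 4.0, "GENE_B": 5.0},
}
scores = risk_scores(expression, coefficients)
print(stratify(scores))
```

    With an odd number of patients, the patient at the median falls into the low-risk group under this cutoff.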

  18. The Cancer Genome Atlas Stomach Adenocarcinoma Collection

    • cancerimagingarchive.net
    dicom, n/a
    Updated Jan 5, 2016
    Cite
    The Cancer Imaging Archive (2016). The Cancer Genome Atlas Stomach Adenocarcinoma Collection [Dataset]. http://doi.org/10.7937/K9/TCIA.2016.GDHL9KIM
    Available download formats: dicom, n/a
    Dataset updated
    Jan 5, 2016
    Dataset authored and provided by
    The Cancer Imaging Archive
    License

    https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/

    Time period covered
    May 29, 2020
    Dataset funded by
    National Cancer Institute (http://www.cancer.gov/)
    Description

    The Cancer Genome Atlas Stomach Adenocarcinoma (TCGA-STAD) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA). Clinical, genetic, and pathological data resides in the Genomic Data Commons (GDC) Data Portal while the radiological data is stored on The Cancer Imaging Archive (TCIA).

    Matched TCGA patient identifiers allow researchers to explore the TCGA/TCIA databases for correlations between tissue genotype, radiological phenotype and patient outcomes. Tissues for TCGA were collected from many sites all over the world in order to reach their accrual targets, usually around 500 specimens per cancer type. For this reason the image data sets are also extremely heterogeneous in terms of scanner modalities, manufacturers and acquisition protocols. In most cases the images were acquired as part of routine care and not as part of a controlled research study or clinical trial.

    CIP TCGA Radiology Initiative

    Imaging Source Site (ISS) Groups are being populated and governed by participants from institutions that have provided imaging data to the archive for a given cancer type. Modeled after TCGA analysis groups, ISS groups are given the opportunity to publish a marker paper for a given cancer type per the guidelines in the table above. This opportunity will generate increased participation in building these multi-institutional data sets as they become an open community resource. Learn more about the CIP TCGA Radiology Initiative.

  19. Data_Sheet_1_Quantitative Estimation of Oxidative Stress in Cancer Tissue...

    • frontiersin.figshare.com
    pdf
    Updated Jun 1, 2023
    Cite
    Liyang Liu; Haining Cui; Ying Xu (2023). Data_Sheet_1_Quantitative Estimation of Oxidative Stress in Cancer Tissue Cells Through Gene Expression Data Analyses.PDF [Dataset]. http://doi.org/10.3389/fgene.2020.00494.s001
    Available download formats: pdf
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Frontiers Media (http://www.frontiersin.org/)
    Authors
    Liyang Liu; Haining Cui; Ying Xu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Quantitative assessment of the intracellular oxidative stress level is an important problem because it underlies the elucidation of the fundamental causes of metabolic changes in diseased human cells, particularly cancer cells. However, the problem is very challenging to solve in vivo because of its complex nature. Here, a computational method is presented for predicting the quantitative level of intracellular oxidative stress in cancer tissue cells. The basic premise of the predictor is that the genomic mutation level is strongly associated with the intracellular oxidative stress level. Based on this premise, a statistical analysis is conducted to identify a set of enzyme-encoding genes whose combined expression levels explain well the mutation rates of individual cancer tissues in the TCGA database. We validated the predictor against genes known to have anti-oxidative functions for specific types of oxidative stressors, and then applied it to illustrate its utility.
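    The core idea is that a combined expression level of selected genes should track per-sample mutation rates. A simple linear fit can illustrate this. This is a sketch under assumptions, not the authors' procedure. The combined level is taken as the plain sum of the selected genes' expression, and the fit is ordinary least squares, with R^2 as the measure of how well the combined level explains mutation rate. All gene names, sample names, and values are hypothetical:

```python
def combined_expression(expression, genes):
    """Sum the expression of the selected genes in each sample (assumed combination rule)."""
    return {sample: sum(values[g] for g in genes) for sample, values in expression.items()}


def fit_line(x, y):
    """Ordinary least squares y = intercept + slope * x; returns (slope, intercept, r_squared)."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need two or more paired observations")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("x has no variance")
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Hypothetical expression values and mutation rates.
expression = {
    "S1": {"ENZ1": 1.0, "ENZ2": 0.5},
    "S2": {"ENZ1": 2.0, "ENZ2": 1.0},
    "S3": {"ENZ1": 3.0, "ENZ2": 2.5},
    "S4": {"ENZ1": 4.0, "ENZ2": 3.0},
}
mutation_rate = {"S1": 1.2, "S2": 2.1, "S3": 3.9, "S4": 4.8}
combined = combined_expression(expression, ["ENZ1", "ENZ2"])
samples = sorted(combined)
slope, intercept, r2 = fit_line([combined[s] for s in samples],
                                [mutation_rate[s] for s in samples])
print(f"slope={slope:.3f} intercept={intercept:.3f} R^2={r2:.3f}")
```

    Selecting which genes enter the combination is the substantive step in the original work. This sketch only shows how a candidate set could be scored once chosen.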

  20. Data Sheet 1_Integrative analysis of DNA methylation, RNA sequencing, and...

    • datasetcatalog.nlm.nih.gov
    • frontiersin.figshare.com
    Updated Apr 28, 2025
    Cite
    Lee, Jae Kwan; Kim, Eun Na; Ouh, Yung Taek; Hong, Jin Hwa; Cho, Hyun Woong; Chun, Yikyeong; Oh, Yoonji; Roh, Sanghyun; Kim, Hayeon; Kim, Chungyeul; Jeong, Sohyeon; Gim, Jeong-An (2025). Data Sheet 1_Integrative analysis of DNA methylation, RNA sequencing, and genomic variants in the cancer genome atlas (TCGA) to predict endometrial cancer recurrence.zip [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0002057531
    Dataset updated
    Apr 28, 2025
    Authors
    Lee, Jae Kwan; Kim, Eun Na; Ouh, Yung Taek; Hong, Jin Hwa; Cho, Hyun Woong; Chun, Yikyeong; Oh, Yoonji; Roh, Sanghyun; Kim, Hayeon; Kim, Chungyeul; Jeong, Sohyeon; Gim, Jeong-An
    Description

    Introduction: The prognosis of endometrial cancer (EC) varies within each molecular subtype because of histological and molecular factors. This study leverages omics datasets and machine learning to identify biomarkers associated with EC recurrence in different molecular subtypes.

    Methods: Using DNA methylation, RNA-sequencing, and common variant data from 116 EC samples in The Cancer Genome Atlas (TCGA), differentially expressed genes (DEGs) and differentially methylated regions (DMRs) were identified with t-tests between the recurrence and non-recurrence groups. These were visualized with volcano plots and heat maps, while decision trees and random forests were used to classify and stratify the samples.

    Results: Machine learning analysis combined with box plots showed that in the copy number-high (CN-H) recurrence group, PARD6G-AS1 had decreased methylation, CSMD1 had increased methylation, and TESC expression was higher than in the non-recurrence group. In the copy number-low (CN-L) recurrence group, CD44 expression was elevated. Further validation using TCGA clinical data confirmed PARD6G-AS1 hypomethylation and CD44 overexpression as significant indicators of recurrence (p=0.006 and p=0.02, respectively), and both were linked to advanced stage and lymph node metastasis.

    Conclusion: PARD6G-AS1 hypomethylation and CD44 overexpression are potential predictors of recurrence in CN-H and CN-L EC patients, respectively.
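    The differential-analysis step compares each feature between the recurrence and non-recurrence groups. The minimal sketch below assumes Welch's unequal-variance t-test, since the abstract does not say which t-test variant was used. It computes only the t statistic. For p-values, scipy.stats.ttest_ind(a, b, equal_var=False) performs the same test. Feature names and values are hypothetical:

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic for the difference in means (unequal variances)."""
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    if se == 0:
        raise ValueError("both groups have zero variance")
    return (mean(group_a) - mean(group_b)) / se


def rank_features(values_by_feature, recurred):
    """Rank features by |t| between recurrence (True) and non-recurrence (False) samples.

    values_by_feature: {feature: [value per sample]}; recurred: [bool per sample].
    """
    results = {}
    for feature, values in values_by_feature.items():
        rec = [v for v, r in zip(values, recurred) if r]
        non = [v for v, r in zip(values, recurred) if not r]
        results[feature] = welch_t(rec, non)
    return sorted(results.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical methylation beta values for two features across six samples.
recurred = [True, False, True, False, True, False]
features = {
    "feature_1": [0.20, 0.10, 0.40, 0.20, 0.60, 0.30],
    "feature_2": [0.10, 0.10, 0.20, 0.20, 0.30, 0.30],
}
for name, t in rank_features(features, recurred):
    print(name, round(t, 3))
```

    In practice, the per-feature p-values from such tests would also need multiple-testing correction before features are called differential.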
