The National Cancer Institute's (NCI) Genomic Data Commons (GDC) is a unified data repository that enables data sharing across cancer genomic studies in support of precision medicine. The GDC supports several cancer genome programs at the NCI Center for Cancer Genomics (CCG), including The Cancer Genome Atlas (TCGA), Therapeutically Applicable Research to Generate Effective Treatments (TARGET), and the Cancer Genome Characterization Initiative (CGCI). The GDC Data Portal provides a platform for efficiently querying and downloading high-quality, complete data. The GDC also provides the GDC Data Transfer Tool and the GDC API for programmatic access.
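As an illustration of programmatic access, here is a minimal sketch of building a GDC API query for the files of one project. It uses only the Python standard library.

The following details are assumptions taken from the public GDC API documentation, not from this listing:
- the `files` endpoint;
- the JSON `filters` syntax;
- the `cases.project.project_id` field;
- the `fields`, `format`, and `size` parameters;
- the `data.hits` response layout.

The function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public GDC API base URL (the same host as the "raw data" links in this listing).
GDC_API = "https://api.gdc.cancer.gov"


def build_files_query_url(project_id, fields, size=10):
    """Build a GET URL for the GDC `files` endpoint, filtered to one project.

    The filter field `cases.project.project_id` and the `filters`, `fields`,
    `format`, and `size` parameters are assumptions based on the GDC API docs.
    """
    filters = {
        "op": "in",
        "content": {"field": "cases.project.project_id", "value": [project_id]},
    }
    params = {
        "filters": json.dumps(filters),
        "fields": ",".join(fields),
        "format": "JSON",
        "size": str(size),
    }
    return f"{GDC_API}/files?{urlencode(params)}"


def fetch_hits(url):
    """Download a query result and return its list of hits (needs network access)."""
    with urlopen(url) as resp:
        payload = json.load(resp)
    return payload["data"]["hits"]


if __name__ == "__main__":
    url = build_files_query_url("TCGA-COAD", ["file_id", "file_name", "data_type"], size=5)
    print(url)
    # print(fetch_hits(url))  # uncomment to run the query against the live API
```

Building the URL separately from fetching it keeps the query inspectable and testable offline. For bulk downloads, the GDC Data Transfer Tool is the route the listing mentions.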
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Historical NCI Genomic Data Commons data (v09-14-2017). Clinical ('phenotype') and gene expression (HTSeq FPKM-UQ).
TCGA-COAD.GDC_phenotype.tsv
dataset: phenotype - Phenotype
cohort: GDC TCGA Colon Cancer (COAD)
dataset ID: TCGA-COAD/Xena_Matrices/TCGA-COAD.GDC_phenotype.tsv
download: https://gdc.xenahubs.net/download/TCGA-COAD/Xena_Matrices/TCGA-COAD.GDC_phenotype.tsv.gz
samples: 570
version: 11-27-2017
hub: https://gdc.xenahubs.net
type of data: phenotype
author: Genomic Data Commons
raw data: https://docs.gdc.cancer.gov/Data/Release_Notes/Data_Release_Notes/#data-release-90
raw data: https://api.gdc.cancer.gov/data/
input data format: ROWs (samples) x COLUMNs (identifiers) (i.e. clinicalMatrix), 570 samples x 151 identifiers
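Because this phenotype file is a clinicalMatrix (rows are samples, columns are identifiers), it can be read into a per-sample dictionary. The sketch below is minimal and uses only the standard library. It assumes:
- the first header cell names the sample column;
- every other header cell is a phenotype identifier;
- cells are tab-separated.

The function names and the inline demo values are hypothetical.

```python
import csv
import gzip
import io


def read_clinical_matrix(handle):
    """Parse a clinicalMatrix text stream into {sample: {identifier: value}}.

    Assumes the first header cell labels the sample column; empty cells are
    kept as empty strings rather than converted to missing values.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    fields = header[1:]
    table = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        table[row[0]] = dict(zip(fields, row[1:]))
    return table


def load_clinical_matrix_gz(path):
    """Open a downloaded .tsv.gz file in text mode and parse it."""
    with gzip.open(path, mode="rt", newline="") as handle:
        return read_clinical_matrix(handle)


if __name__ == "__main__":
    # Hypothetical sample IDs and columns, for illustration only.
    demo = "sample\tgender\tage\nS1\tfemale\t61\nS2\tmale\t\n"
    table = read_clinical_matrix(io.StringIO(demo))
    print(len(table), table["S1"]["gender"])  # 2 female
```

After loading the real file, one sanity check is to compare the number of rows and columns against the 570 samples x 151 identifiers stated in the metadata.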
TCGA-COAD.htseq_fpkm-uq.tsv
dataset: gene expression RNAseq - HTSeq - FPKM-UQ
cohort: GDC TCGA Colon Cancer (COAD)
dataset ID: TCGA-COAD/Xena_Matrices/TCGA-COAD.htseq_fpkm-uq.tsv
download: https://gdc.xenahubs.net/download/TCGA-COAD/Xena_Matrices/TCGA-COAD.htseq_fpkm-uq.tsv.gz
samples: 512
version: 09-14-2017
hub: https://gdc.xenahubs.net
type of data: gene expression RNAseq
unit: log2(fpkm-uq+1)
platform: Illumina
ID/Gene Mapping: https://gdc.xenahubs.net/download/probeMaps/gencode.v22.annotation.gene.probeMap.gz
author: Genomic Data Commons
raw data: https://docs.gdc.cancer.gov/Data/Release_Notes/Data_Release_Notes/#data-release-80
raw data: https://api.gdc.cancer.gov/data/
wrangling: Data from the same sample but from different vials/portions/analytes/aliquots is averaged; data from different samples is combined into a genomicMatrix; all data is then log2(x+1) transformed.
input data format: ROWs (identifiers) x COLUMNs (samples) (i.e. genomicMatrix), 60,484 identifiers x 512 samples
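The wrangling step described above can be sketched as follows. Values are averaged per sample across aliquots first, then log2(x+1) is applied, producing a gene-by-sample genomicMatrix. This is a minimal sketch, not the Xena pipeline itself. The nested-dict input layout and the function names are assumptions, and the grouping of aliquots into samples is taken as already done.

```python
import math
from statistics import mean


def build_genomic_matrix(raw):
    """Average aliquot-level values per sample, then apply log2(x + 1).

    `raw` maps gene ID -> sample ID -> list of FPKM-UQ values, one per
    vial/portion/analyte/aliquot of that sample (this input layout is assumed).
    Returns gene ID -> sample ID -> log2(mean + 1), i.e. rows = identifiers,
    columns = samples, as in a genomicMatrix.
    """
    return {
        gene: {
            sample: math.log2(mean(values) + 1)
            for sample, values in per_sample.items()
        }
        for gene, per_sample in raw.items()
    }


def to_fpkm_uq(log_value):
    """Undo the log2(x + 1) transform to recover the averaged FPKM-UQ value."""
    return 2 ** log_value - 1


if __name__ == "__main__":
    # Hypothetical gene and sample IDs; two aliquots for S1, one for S2.
    raw = {"GENE_A": {"S1": [1.0, 5.0], "S2": [0.0]}}
    matrix = build_genomic_matrix(raw)
    print(matrix)                              # {'GENE_A': {'S1': 2.0, 'S2': 0.0}}
    print(to_fpkm_uq(matrix["GENE_A"]["S1"]))  # 3.0
```

The order matters: averaging raw values and then taking log2(x+1) generally gives a different number than averaging already-logged values. The sketch follows the order stated in the wrangling note. `to_fpkm_uq` is useful when a downstream tool expects linear-scale values rather than the stored log2(fpkm-uq+1) unit.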
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Reprocessed counts were generated using our GDC RNA-seq workflow implementation. An NA rank change indicates that the differentially expressed gene (DEG) does not appear in the other DEG list. (CSV)
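A rank change with NA for missing genes can be computed from two ordered DEG lists, as in the sketch below. It assumes:
- ranks are 1-based;
- each list contains a gene at most once;
- a positive change means the gene sits lower in the second list.

That sign convention is an assumption, since the file's own convention is not stated. `None` stands in for NA, and the function name is hypothetical.

```python
def rank_changes(list_a, list_b):
    """For each gene in list_a, return rank_in_b - rank_in_a (1-based ranks).

    A value of None plays the role of "NA": the gene is absent from list_b.
    Positive values mean the gene is ranked lower in list_b (assumed convention).
    """
    rank_b = {gene: i for i, gene in enumerate(list_b, start=1)}
    changes = {}
    for rank_a, gene in enumerate(list_a, start=1):
        changes[gene] = rank_b[gene] - rank_a if gene in rank_b else None
    return changes


if __name__ == "__main__":
    # Hypothetical gene IDs, for illustration only.
    published = ["G1", "G2", "G3"]
    reprocessed = ["G2", "G1", "G4"]
    print(rank_changes(published, reprocessed))  # {'G1': 1, 'G2': -1, 'G3': None}
```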
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comparison of the top 10 differentially expressed genes inferred from concatenation of published counts (“published vs published”) versus those inferred from harmonized uniform GDC re-processing (“reprocessed vs reprocessed”).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This spreadsheet contains three sheets, one for each of the three samples (TCGA-AB-2821, TCGA-AB-2828, TCGA-AB-2839). Each sheet includes the correlation and root-mean-square deviation (RMSD) between the reprocessed counts and the published counts. (XLSX)
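For one sample, the correlation and RMSD between published and reprocessed per-gene counts can be sketched as below. The sketch uses Pearson correlation over the genes present in both versions. Whether the spreadsheet used Pearson or another coefficient, and raw or log-transformed counts, is not stated, so both choices are assumptions; apply a log2(x+1) transform before calling if that is the intended scale. The function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def rmsd(x, y):
    """Root-mean-square deviation between paired values."""
    if len(x) != len(y) or not x:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


def compare_counts(published, reprocessed):
    """Correlation and RMSD over genes present in both {gene: count} dicts."""
    genes = sorted(published.keys() & reprocessed.keys())
    p = [published[g] for g in genes]
    r = [reprocessed[g] for g in genes]
    return pearson(p, r), rmsd(p, r)


if __name__ == "__main__":
    # Hypothetical counts for one sample; gene D exists only in the reprocessed set.
    corr, err = compare_counts({"A": 1, "B": 2, "C": 3}, {"A": 2, "B": 3, "C": 4, "D": 9})
    print(corr, err)
```

The two metrics are complementary. A constant offset between pipelines leaves the correlation at 1 but shows up in the RMSD, which is why reporting both is informative.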
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This file lists the 42 patient IDs and the zip file names of the 84 genomic and proteomic datasets used in the paper "Gil, Y., Garijo, D., Ratnakar, V., Mayani, R., Adusumilli, A., Srivastava, R., Boyce, H., Mallick, P. Towards Continuous Scientific Data Analysis and Hypothesis Evolution", accepted at AAAI 2017.
The datasets themselves are not published due to their size and access conditions. They can be retrieved using the provided IDs from the TCGA (https://gdc-portal.nci.nih.gov/legacy-archive/search/f) and CPTAC (https://cptac-data-portal.georgetown.edu/cptac/s/S022) archives.
These patient IDs are a subset of the nearly 90 samples used in "Zhang, B., Wang, J., Wang, X., Zhu, J., Liu, Q., et al. Proteogenomic characterization of human colon and rectal cancer. Nature 513, 382–387", selected to test the system described in the AAAI 2017 paper. Additional samples were not included in the analysis due to time constraints.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
List of 625 false-positive genes resulting from comparing GTEx published counts with GTEx reprocessed counts.
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
The Cancer Genome Atlas Liver Hepatocellular Carcinoma (TCGA-LIHC) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA). Clinical, genetic, and pathological data resides in the Genomic Data Commons (GDC) Data Portal while the radiological data is stored on The Cancer Imaging Archive (TCIA).
Matched TCGA patient identifiers allow researchers to explore the TCGA/TCIA databases for correlations between tissue genotype, radiological phenotype and patient outcomes. Tissues for TCGA were collected from many sites all over the world in order to reach their accrual targets, usually around 500 specimens per cancer type. For this reason the image data sets are also extremely heterogeneous in terms of scanner modalities, manufacturers and acquisition protocols. In most cases the images were acquired as part of routine care and not as part of a controlled research study or clinical trial.
Imaging Source Site (ISS) Groups are being populated and governed by participants from institutions that have provided imaging data to the archive for a given cancer type. Modeled after TCGA analysis groups, ISS groups are given the opportunity to publish a marker paper for a given cancer type per the guidelines in the table above. This opportunity will generate increased participation in building these multi-institutional data sets as they become an open community resource. Learn more about the CIP TCGA Radiology Initiative.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: Lung cancer has an inherited susceptibility and shows familial aggregation, and the characteristics of familial lung cancer exhibit population heterogeneity. Despite previous studies, familial lung cancer in China's Yunnan-Guizhou plateau remains understudied.
Methods: Between 2015 and 2017, 1,023 lung cancer patients (residents of the Yunnan-Guizhou plateau) were enrolled with no limitation on other parameters; 152 of these subjects had familial lung cancer. Clinicopathologic parameters were analyzed and compared, with 4,754 lung cancer patients from NCI-GDC used to represent the general population.
Results: Familial lung cancer (FLC) subjects showed unique characteristics: early onset; increased rates of female sex, adenocarcinoma, stage IV disease, and history of other cancers; and an imbalance in anatomic sites, all with no significant difference in smoking status. An unbalanced distribution of co-existing diseases or symptoms was also observed. FLC patients were more likely to develop benign lesions (polyps, nodules, cysts) early in life, especially early growth of multiple pulmonary nodules at higher frequency. Diseases that typically run in families, such as diabetes and hypertension, were also more frequent in the FLC population. Compared with GDC data, our subject population was younger. The age peak of our FLC group was 50–59, that of our sporadic group was around 60, and that of GDC patients was 60–69. Importantly, the largest difference occurred in the 40–49 age group, where the proportions in our FLC and sporadic groups were 3 times and 2 times that of the GDC population, respectively. Moreover, FLC males and FLC females both had their age peak in 50–59. Among sporadic cases, females peaked in 50–59, much earlier than males (around 60–69), reflecting gender-specific or age-specific characteristics in our subject population.
Conclusions: Familial lung cancer in China's Yunnan-Guizhou plateau showed unique clinicopathologic characteristics, with differences in gender, age, histologic type, TNM stage, and co-existing diseases or symptoms. Identifying the hereditary factors that increase lung cancer risk will be a challenge of both scientific and clinical significance.
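The age comparisons above (peak decade per group, and how many times larger the 40–49 share is in one group than in the GDC reference) can be sketched with 10-year bins. Reading "3 times higher ratio" as a ratio of bin proportions is an interpretation, not a method stated in the abstract. The ages in the demo are synthetic, not study data, and the function names are hypothetical.

```python
from collections import Counter


def age_bin(age, width=10):
    """Label an integer age with its bin, e.g. 47 -> '40-49' for width 10."""
    lo = (age // width) * width
    return f"{lo}-{lo + width - 1}"


def bin_proportions(ages):
    """Share of patients falling into each age bin."""
    if not ages:
        raise ValueError("no ages given")
    counts = Counter(age_bin(a) for a in ages)
    return {label: c / len(ages) for label, c in counts.items()}


def peak_bin(ages):
    """Age bin holding the largest share of patients (ties go to the bin seen first)."""
    props = bin_proportions(ages)
    return max(props, key=props.get)


def proportion_ratio(group_ages, reference_ages, label):
    """How many times larger a bin's share is in group_ages than in the reference."""
    ref_share = bin_proportions(reference_ages).get(label, 0.0)
    if ref_share == 0:
        raise ValueError(f"reference group has no patients in bin {label}")
    return bin_proportions(group_ages).get(label, 0.0) / ref_share


if __name__ == "__main__":
    # Synthetic ages for illustration only, not data from the study.
    familial = [45, 47, 55, 62]
    reference = [41, 60, 61, 65]
    print(peak_bin(familial), peak_bin(reference))         # 40-49 60-69
    print(proportion_ratio(familial, reference, "40-49"))  # 2.0
```

Comparing proportions rather than raw counts is what makes groups of very different sizes (1,023 local patients vs. 4,754 GDC patients) comparable.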
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comparison of precision values for the top 15 ranked genes related to each cancer type, for each centrality measure and each approach, evaluated against NCI's GDC.
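One common way to score a top-15 gene ranking against a reference is precision at k: the fraction of the top k ranked genes that appear in the reference set. The sketch below assumes that this is what "precision" means here, and treats the GDC-derived gene list as a plain set. Both are assumptions, as are the hypothetical cancer type, measure names, and gene IDs in the demo.

```python
def precision_at_k(ranked_genes, reference_genes, k=15):
    """Fraction of the top-k ranked genes that also appear in the reference set.

    If fewer than k genes are ranked, the denominator is the number available.
    """
    top = list(ranked_genes)[:k]
    if not top:
        raise ValueError("ranked gene list is empty")
    reference = set(reference_genes)
    return sum(gene in reference for gene in top) / len(top)


if __name__ == "__main__":
    # Hypothetical rankings per (cancer type, centrality measure) and a hypothetical reference set.
    ranked = {
        ("CANCER_X", "degree"): ["G1", "G2", "G3", "G4"],
        ("CANCER_X", "betweenness"): ["G2", "G5", "G1", "G6"],
    }
    reference = {"G1", "G3"}
    for (cancer, measure), genes in ranked.items():
        print(cancer, measure, precision_at_k(genes, reference, k=4))
```

Keying the rankings by (cancer type, measure) mirrors the table's layout, so each cell of the comparison corresponds to one `precision_at_k` call.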