47 datasets found

Genomic Data Commons Data Portal (GDC Data Portal)
rrid.site
scicrunch.org
Updated May 24, 2025
Cite
(2025). Genomic Data Commons Data Portal (GDC Data Portal) [Dataset]. http://identifiers.org/RRID:SCR_014514
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_014514
Dataset updated
May 24, 2025
Description
A unified data repository of the National Cancer Institute (NCI)'s Genomic Data Commons (GDC) that enables data sharing across cancer genomic studies in support of precision medicine. The GDC supports several cancer genome programs at the NCI Center for Cancer Genomics (CCG), including The Cancer Genome Atlas (TCGA), Therapeutically Applicable Research to Generate Effective Treatments (TARGET), and the Cancer Genome Characterization Initiative (CGCI). The GDC Data Portal provides a platform for efficiently querying and downloading high quality and complete data. The GDC also provides a GDC Data Transfer Tool and a GDC API for programmatic access.
Historical NCI Genomic Data Commons data (09-14-2017)
zenodo.org
data.niaid.nih.gov
tsv
Updated Jan 24, 2020
Cite
Inge Seim; Inge Seim (2020). Historical NCI Genomic Data Commons data (09-14-2017) [Dataset]. http://doi.org/10.5281/zenodo.1186945
Explore at:
Available download formats: tsv
Unique identifier
https://doi.org/10.5281/zenodo.1186945
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Inge Seim; Inge Seim
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Historical NCI Genomic Data Commons data (v09-14-2017). Clinical ('phenotype') and gene expression (HTSeq FPKM-UQ).

TCGA-COAD.GDC_phenotype.tsv

dataset: phenotype - Phenotype

cohort: GDC TCGA Colon Cancer (COAD)
dataset ID: TCGA-COAD/Xena_Matrices/TCGA-COAD.GDC_phenotype.tsv
download: https://gdc.xenahubs.net/download/TCGA-COAD/Xena_Matrices/TCGA-COAD.GDC_phenotype.tsv.gz
samples: 570
version: 11-27-2017
hub: https://gdc.xenahubs.net
type of data: phenotype
author: Genomic Data Commons
raw data: https://docs.gdc.cancer.gov/Data/Release_Notes/Data_Release_Notes/#data-release-90
raw data: https://api.gdc.cancer.gov/data/
input data format: ROWs (samples) x COLUMNs (identifiers) (i.e. clinicalMatrix)
570 samples x 151 identifiers

TCGA-COAD.htseq_fpkm-uq.tsv

dataset: gene expression RNAseq - HTSeq - FPKM-UQ

cohort: GDC TCGA Colon Cancer (COAD)
dataset ID: TCGA-COAD/Xena_Matrices/TCGA-COAD.htseq_fpkm-uq.tsv
download: https://gdc.xenahubs.net/download/TCGA-COAD/Xena_Matrices/TCGA-COAD.htseq_fpkm-uq.tsv.gz
samples: 512
version: 09-14-2017
hub: https://gdc.xenahubs.net
type of data: gene expression RNAseq
unit: log2(fpkm-uq+1)
platform: Illumina
ID/Gene Mapping: https://gdc.xenahubs.net/download/probeMaps/gencode.v22.annotation.gene.probeMap.gz
author: Genomic Data Commons
raw data: https://docs.gdc.cancer.gov/Data/Release_Notes/Data_Release_Notes/#data-release-80
raw data: https://api.gdc.cancer.gov/data/
wrangling: Data from the same sample but from different vials/portions/analytes/aliquots are averaged; data from different samples are combined into a genomicMatrix; all data are then log2(x+1) transformed.
input data format: ROWs (identifiers) x COLUMNs (samples) (i.e. genomicMatrix)
60,484 identifiers x 512 samples
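The wrangling step described for the expression matrix (average values from aliquots of the same sample, then apply log2(x+1)) can be sketched for a single gene as below. This is only an illustration under stated assumptions: the function name `wrangle`, the aliquot and sample labels, and the input values are all hypothetical and are not taken from the dataset or from the hub's own processing code.

```python
import math
from collections import defaultdict


def wrangle(aliquot_values, aliquot_to_sample):
    """Average aliquot-level values per sample, then apply log2(x + 1).

    aliquot_values:    {aliquot_label: expression value for one gene}
    aliquot_to_sample: {aliquot_label: sample_label} (hypothetical mapping)
    """
    grouped = defaultdict(list)
    for aliquot, value in aliquot_values.items():
        grouped[aliquot_to_sample[aliquot]].append(value)
    # Mean per sample first, then the log transform, as the entry describes.
    return {
        sample: math.log2(sum(values) / len(values) + 1)
        for sample, values in grouped.items()
    }


# Made-up example: two aliquots of sample S1, one aliquot of sample S2.
example = wrangle(
    {"a1": 3.0, "a2": 5.0, "b1": 0.0},
    {"a1": "S1", "a2": "S1", "b1": "S2"},
)
```

Averaging before the log transform follows the order given in the wrangling note; reversing the two steps would generally give different values.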
List of all reprocessed vs. reprocessed differentially expressed genes...
plos.figshare.com
csv
Updated Mar 4, 2025
Cite
Ling-Hong Hung; Bryce Fukuda; Robert Schmitz; Varik Hoang; Wes Lloyd; Ka Yee Yeung (2025). List of all reprocessed vs. reprocessed differentially expressed genes (DEGs) comparing tumor data from the GDC and normal data from the GTEx. [Dataset]. http://doi.org/10.1371/journal.pone.0318676.s004
Explore at:
Available download formats: csv
Unique identifier
https://doi.org/10.1371/journal.pone.0318676.s004
Dataset updated
Mar 4, 2025
Dataset provided by
PLOS (http://plos.org/)
Authors
Ling-Hong Hung; Bryce Fukuda; Robert Schmitz; Varik Hoang; Wes Lloyd; Ka Yee Yeung
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Reprocessed counts were generated using our GDC RNA-seq workflow implementation. NA rank changes indicate the DEG cannot be found in the other DEG list. (CSV)
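A rank change that is NA when a gene is missing from the other list can be sketched as follows. The function name, the gene labels, and the sign convention (rank in the second list minus rank in the first) are assumptions for illustration; the description does not say how the authors computed the column.

```python
def rank_changes(list_a, list_b):
    """Return {gene: rank change} for every gene in list_a.

    Ranks are 1-based positions in each list. The change is
    (rank in list_b) - (rank in list_a); None stands in for "NA"
    when the gene does not appear in list_b.
    """
    rank_b = {gene: rank for rank, gene in enumerate(list_b, start=1)}
    changes = {}
    for rank_a, gene in enumerate(list_a, start=1):
        changes[gene] = rank_b[gene] - rank_a if gene in rank_b else None
    return changes


# Hypothetical gene labels, ordered from most to least significant.
result = rank_changes(["G1", "G2", "G3"], ["G2", "G1", "G4"])
```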
Genomic Data Commons Data Portal
bioregistry.io
Updated Apr 23, 2021
Cite
(2021). Genomic Data Commons Data Portal [Dataset]. https://bioregistry.io/gdc
Explore at:
Dataset updated
Apr 23, 2021
Description
The GDC Data Portal is a robust data-driven platform that allows cancer researchers and bioinformaticians to search and download cancer data for analysis.
Information on molecular subtypes for TCGA cancer studies as provided by the...
plos.figshare.com
xls
Updated Jun 1, 2023
Cite
Mohamed Mounir; Marta Lucchetta; Tiago C. Silva; Catharina Olsen; Gianluca Bontempi; Xi Chen; Houtan Noushmehr; Antonio Colaprico; Elena Papaleo (2023). Information on molecular subtypes for TCGA cancer studies as provided by the TCGA_MolecularSubtype function. [Dataset]. http://doi.org/10.1371/journal.pcbi.1006701.t001
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pcbi.1006701.t001
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Mohamed Mounir; Marta Lucchetta; Tiago C. Silva; Catharina Olsen; Gianluca Bontempi; Xi Chen; Houtan Noushmehr; Antonio Colaprico; Elena Papaleo
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Information on molecular subtypes for TCGA cancer studies as provided by the TCGA_MolecularSubtype function.
Table 1_TCGADownloadHelper: simplifying TCGA data extraction and...
frontiersin.figshare.com
pdf
Updated May 2, 2025
Cite
Alexandra Anke Baumann; Olaf Wolkenhauer; Markus Wolfien (2025). Table 1_TCGADownloadHelper: simplifying TCGA data extraction and preprocessing.pdf [Dataset]. http://doi.org/10.3389/fgene.2025.1569290.s001
Explore at:
Available download formats: pdf
Unique identifier
https://doi.org/10.3389/fgene.2025.1569290.s001
Dataset updated
May 2, 2025
Dataset provided by
Frontiers
Authors
Alexandra Anke Baumann; Olaf Wolkenhauer; Markus Wolfien
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Cancer Genome Atlas (TCGA) provides comprehensive genomic data across various cancer types. However, complex file naming conventions and the necessity of linking disparate data types to individual case IDs can be challenging for first-time users. While other tools have been introduced to facilitate TCGA data handling, they lack a straightforward combination of all required steps. To address this, we developed a streamlined pipeline using the Genomic Data Commons (GDC) portal’s cart system for file selection and the GDC Data Transfer Tool for data downloads. We use the Sample Sheet provided by the GDC portal to replace the default 36-character opaque file IDs and filenames with human-readable case IDs. We developed a pipeline integrating customizable Python scripts in a Jupyter Notebook and a Snakemake pipeline for ID mapping along with automating data preprocessing tasks (https://github.com/alex-baumann-ur/TCGADownloadHelper). Our pipeline simplifies the data download process by modifying manifest files to focus on specific subsets, facilitating the handling of multimodal data sets related to single patients. The pipeline essentially reduced the effort required to preprocess data. Overall, this pipeline enables researchers to efficiently navigate the complexities of TCGA data extraction and preprocessing. By establishing a clear step-by-step approach, we provide a streamlined methodology that minimizes errors, enhances data usability, and supports the broader utilization of TCGA data in cancer research. It is particularly beneficial for researchers new to genomic data analysis, offering them a practical framework prior to conducting their TCGA studies.
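The ID-mapping idea described above (use the Sample Sheet to swap opaque file IDs for human-readable case IDs) might look roughly like the sketch below. It is not the TCGADownloadHelper code. The column names `File ID`, `File Name`, and `Case ID` are assumed to match the sample sheet header, and the IDs, file names, and the `<case>_<file name>` naming scheme are invented for illustration.

```python
import csv
import io

# Made-up sample sheet content; real sheets are tab-separated files
# exported from the portal and contain more columns than shown here.
SAMPLE_SHEET = (
    "File ID\tFile Name\tCase ID\n"
    "00000000-0000-0000-0000-000000000001\tcounts_a.tsv\tTCGA-XX-0001\n"
    "00000000-0000-0000-0000-000000000002\tcounts_b.tsv\tTCGA-XX-0002\n"
)


def build_rename_map(sample_sheet_text):
    """Map each 36-character file ID to a case-based file name."""
    reader = csv.DictReader(io.StringIO(sample_sheet_text), delimiter="\t")
    mapping = {}
    for row in reader:
        # Hypothetical naming scheme: prefix the file name with the case ID.
        mapping[row["File ID"]] = f'{row["Case ID"]}_{row["File Name"]}'
    return mapping


rename_map = build_rename_map(SAMPLE_SHEET)
```

A mapping like this could then drive a rename or symlink step; how the actual pipeline applies it is described in the authors' repository.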
The Cancer Genome Atlas Rectum Adenocarcinoma Collection
dev.cancerimagingarchive.net
cancerimagingarchive.net
dicom, n/a
Updated May 29, 2020
Cite
The Cancer Imaging Archive (2020). The Cancer Genome Atlas Rectum Adenocarcinoma Collection [Dataset]. http://doi.org/10.7937/K9/TCIA.2016.F7PPNPNU
Explore at:
Available download formats: n/a, dicom
Unique identifier
https://doi.org/10.7937/K9/TCIA.2016.F7PPNPNU
Dataset updated
May 29, 2020
Dataset authored and provided by
The Cancer Imaging Archive
License
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
Time period covered
May 29, 2020
Dataset funded by
National Cancer Institute (http://www.cancer.gov/)
Description
The Cancer Genome Atlas Rectum Adenocarcinoma (TCGA-READ) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA). Clinical, genetic, and pathological data resides in the Genomic Data Commons (GDC) Data Portal while the radiological data is stored on The Cancer Imaging Archive (TCIA).
Matched TCGA patient identifiers allow researchers to explore the TCGA/TCIA databases for correlations between tissue genotype, radiological phenotype and patient outcomes. Tissues for TCGA were collected from many sites all over the world in order to reach their accrual targets, usually around 500 specimens per cancer type. For this reason the image data sets are also extremely heterogeneous in terms of scanner modalities, manufacturers and acquisition protocols. In most cases the images were acquired as part of routine care and not as part of a controlled research study or clinical trial.
CIP TCGA Radiology Initiative
Imaging Source Site (ISS) Groups are being populated and governed by participants from institutions that have provided imaging data to the archive for a given cancer type. Modeled after TCGA analysis groups, ISS groups are given the opportunity to publish a marker paper for a given cancer type per the guidelines in the table above. This opportunity will generate increased participation in building these multi-institutional data sets as they become an open community resource. Learn more about the CIP TCGA Radiology Initiative.
The Cancer Genome Atlas Breast Invasive Carcinoma Collection
cancerimagingarchive.net
dicom, n/a
Updated May 29, 2020
Cite
The Cancer Imaging Archive (2020). The Cancer Genome Atlas Breast Invasive Carcinoma Collection [Dataset]. http://doi.org/10.7937/K9/TCIA.2016.AB2NAZRP
Explore at:
Available download formats: n/a, dicom
Unique identifier
https://doi.org/10.7937/K9/TCIA.2016.AB2NAZRP
Dataset updated
May 29, 2020
Dataset authored and provided by
The Cancer Imaging Archive
License
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
Time period covered
May 29, 2020
Dataset funded by
National Cancer Institute (http://www.cancer.gov/)
Description
The Cancer Genome Atlas Breast Invasive Carcinoma (TCGA-BRCA) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA). Clinical, genetic, and pathological data resides in the Genomic Data Commons (GDC) Data Portal while the radiological data is stored on The Cancer Imaging Archive (TCIA).
Matched TCGA patient identifiers allow researchers to explore the TCGA/TCIA databases for correlations between tissue genotype, radiological phenotype and patient outcomes. Tissues for TCGA were collected from many sites all over the world in order to reach their accrual targets, usually around 500 specimens per cancer type. For this reason the image data sets are also extremely heterogeneous in terms of scanner modalities, manufacturers and acquisition protocols. In most cases the images were acquired as part of routine care and not as part of a controlled research study or clinical trial.
CIP TCGA Radiology Initiative
Imaging Source Site (ISS) Groups are being populated and governed by participants from institutions that have provided imaging data to the archive for a given cancer type. Modeled after TCGA analysis groups, ISS groups are given the opportunity to publish a marker paper for a given cancer type per the guidelines in the table above. This opportunity will generate increased participation in building these multi-institutional data sets as they become an open community resource. Learn more about the TCGA Breast Phenotype Research Group.
Genes with relative change ((v32 - v15)/v15) over 100 when comparing the GDC...
plos.figshare.com
xls
Updated Mar 4, 2025
Cite
Ling-Hong Hung; Bryce Fukuda; Robert Schmitz; Varik Hoang; Wes Lloyd; Ka Yee Yeung (2025). Genes with relative change ((v32 - v15)/v15) over 100 when comparing the GDC Data Release version 15 to version 32. [Dataset]. http://doi.org/10.1371/journal.pone.0318676.t001
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0318676.t001
Dataset updated
Mar 4, 2025
Dataset provided by
PLOS (http://plos.org/)
Authors
Ling-Hong Hung; Bryce Fukuda; Robert Schmitz; Varik Hoang; Wes Lloyd; Ka Yee Yeung
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Genes with relative change ((v32 - v15)/v15) over 100 when comparing the GDC Data Release version 15 to version 32.
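The relative change named in this entry, (v32 - v15)/v15, can be computed as in the short sketch below. The gene labels and values are invented, the threshold of 100 is taken literally from the title, and returning `None` when the v15 value is zero is an assumption; the entry does not say how such genes were handled.

```python
def relative_change(v15, v32):
    """Return (v32 - v15) / v15, or None when v15 is zero (undefined)."""
    if v15 == 0:
        return None
    return (v32 - v15) / v15


def flag_large_changes(values, threshold=100):
    """Keep genes whose relative change exceeds the threshold.

    values: {gene: (v15 value, v32 value)}
    """
    flagged = {}
    for gene, (v15, v32) in values.items():
        change = relative_change(v15, v32)
        if change is not None and change > threshold:
            flagged[gene] = change
    return flagged


# Hypothetical genes and values.
flagged = flag_large_changes({"G1": (1, 202), "G2": (10, 20), "G3": (0, 5)})
```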
Metadata and data files supporting the published article: The therapeutic...
springernature.figshare.com
txt
Updated Jun 1, 2023
Cite
François BERTUCCI; Pascal Finetti; Anthony Goncalves; Daniel Birnbaum (2023). Metadata and data files supporting the published article: The therapeutic response of ER+/HER2- breast cancers differs according to the molecular Basal or Luminal subtype [Dataset]. http://doi.org/10.6084/m9.figshare.11558676.v1
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.11558676.v1
Dataset updated
Jun 1, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
François BERTUCCI; Pascal Finetti; Anthony Goncalves; Daniel Birnbaum
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Here, the authors performed an in-silico analysis on a meta-dataset including gene-expression data from 5,342 clinically defined estrogen receptor-positive/human epidermal growth factor receptor 2-negative (ER+/HER2-) breast cancers (BC), and DNA copy number/mutational and proteomic data, to determine whether the therapeutic response of ER+/HER2- breast cancers differs according to the molecular basal or luminal subtype.
Data access: The dataset Breast_cancer_classifications.csv supporting figure 1, table 1, and supplementary tables 1-3 is publicly available in the figshare repository as part of this data record. This study used and analysed 36 publicly available datasets that are all listed in Supplementary table 8 and are cited from the data availability statement of the published article.
Study aims and methodology: To evaluate the response and/or potential vulnerability to hormone treatment (HT) and other systemic therapies of BC, and to assess the degree of difference between basal and luminal breast cancer subtypes, the authors performed an in-silico analysis of a meta-dataset including gene expression data from 8,982 non-redundant BCs and DNA copy number/mutational and proteomic data from TCGA. The aim was to compare the Basal versus Luminal samples. Out of the 8,982 samples of the database, 6,563 were defined as ER+ (5,342 according to immunohistochemistry (IHC) and 1,221 according to inferred status).
The authors analysed breast cancer gene expression data pooled from 36 public datasets (the publicly available datasets are listed in supplementary table 8), comprising 8,982 invasive primary BCs. The pre-analytic data processing was done as described previously in https://doi.org/10.1038/s41416-018-0309-1. Please refer to the published article for more details on the methodology and statistical analysis.
Data supporting the figures, tables and supplementary tables in the published article:
Data supporting figure 1, table 1, and supplementary tables 1-3: Dataset Breast_cancer_classifications.csv is in .csv file format. The dataset includes histo-clinical and molecular data of the tumors analysed in the study, and is part of this data record.
Data supporting supplementary table 4: Dataset genome.wustl.edu_BRCA.IlluminaGA_DNASeq.Level_2.3.2.0.tar.gz.1 is a gzip-compressed tar archive of MAF-format files. This dataset was accessed through the Genomic Data Commons (GDC) Data Portal and can be downloaded directly here: https://api.gdc.cancer.gov/data/afaf2790-04d4-453a-8c1b-75cf42ffd35f.
Data supporting supplementary table 5: Dataset gdc_manifest.txt consists of gz archives of txt-format files. The file was accessed through the GDC Data Portal here: https://portal.gdc.cancer.gov/repository?facetTab=files&filters={"op":"and","content":[{"op":"in","content":{"field":"cases.project.project_id","value":["TCGA-BRCA"]}},{"op":"in","content":{"field":"files.access","value":["open"]}},{"op":"in","content":{"field":"files.analysis.workflow_type","value":["HTSeq - Counts"]}},{"op":"in","content":{"field":"files.experimental_strategy","value":["RNA-Seq"]}}]}&searchTableTab=files
Data supporting supplementary table 6: Dataset Table S5_Revised.xlsx is in .xlsx file format and is part of the supplementary information files of the published article.
Data supporting supplementary table 7: Dataset BRCA.RPPA.Level_3.tar is a tar archive of txt-format files. The file was accessed through the GDC Data Portal and can be downloaded directly here: https://api.gdc.cancer.gov/data/85988e1b-4f7d-493e-96ae-9eee61ac2833.
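The portal link given for supplementary table 5 embeds a JSON filter in its query string. The sketch below rebuilds that filter as a Python structure and URL-encodes it, which can be easier to read and modify than the raw link. The helper name `in_filter` is hypothetical; the field names and values are copied from the link in the description. The code only builds a string: it does not contact the portal, and whether the portal still serves this query is not checked here.

```python
import json
from urllib.parse import urlencode


def in_filter(field, values):
    """Build one {"op": "in", ...} clause of the filter."""
    return {"op": "in", "content": {"field": field, "value": values}}


# Same four clauses as in the link from the dataset description.
filters = {
    "op": "and",
    "content": [
        in_filter("cases.project.project_id", ["TCGA-BRCA"]),
        in_filter("files.access", ["open"]),
        in_filter("files.analysis.workflow_type", ["HTSeq - Counts"]),
        in_filter("files.experimental_strategy", ["RNA-Seq"]),
    ],
}

query = urlencode(
    {
        "facetTab": "files",
        # Compact separators keep the JSON close to the form used in the link.
        "filters": json.dumps(filters, separators=(",", ":")),
        "searchTableTab": "files",
    }
)
url = "https://portal.gdc.cancer.gov/repository?" + query
```

Changing the project or workflow is then a matter of editing one list in `filters` rather than hand-editing an encoded URL.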
The Cancer Genome Atlas Stomach Adenocarcinoma Collection
cancerimagingarchive.net
dicom, n/a
Updated Jan 5, 2016
Cite
The Cancer Imaging Archive (2016). The Cancer Genome Atlas Stomach Adenocarcinoma Collection [Dataset]. http://doi.org/10.7937/K9/TCIA.2016.GDHL9KIM
Explore at:
Available download formats: dicom, n/a
Unique identifier
https://doi.org/10.7937/K9/TCIA.2016.GDHL9KIM
Dataset updated
Jan 5, 2016
Dataset authored and provided by
The Cancer Imaging Archive
License
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
Time period covered
May 29, 2020
Dataset funded by
National Cancer Institute (http://www.cancer.gov/)
Description
The Cancer Genome Atlas Stomach Adenocarcinoma (TCGA-STAD) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA). Clinical, genetic, and pathological data resides in the Genomic Data Commons (GDC) Data Portal while the radiological data is stored on The Cancer Imaging Archive (TCIA).
Matched TCGA patient identifiers allow researchers to explore the TCGA/TCIA databases for correlations between tissue genotype, radiological phenotype and patient outcomes. Tissues for TCGA were collected from many sites all over the world in order to reach their accrual targets, usually around 500 specimens per cancer type. For this reason the image data sets are also extremely heterogeneous in terms of scanner modalities, manufacturers and acquisition protocols. In most cases the images were acquired as part of routine care and not as part of a controlled research study or clinical trial.
CIP TCGA Radiology Initiative
Imaging Source Site (ISS) Groups are being populated and governed by participants from institutions that have provided imaging data to the archive for a given cancer type. Modeled after TCGA analysis groups, ISS groups are given the opportunity to publish a marker paper for a given cancer type per the guidelines in the table above. This opportunity will generate increased participation in building these multi-institutional data sets as they become an open community resource. Learn more about the CIP TCGA Radiology Initiative.
Comparison of the top 10 differentially expressed genes inferred from...
plos.figshare.com
xls
Updated Mar 4, 2025
Cite
Ling-Hong Hung; Bryce Fukuda; Robert Schmitz; Varik Hoang; Wes Lloyd; Ka Yee Yeung (2025). Comparison of the top 10 differentially expressed genes inferred from concatenation of published counts (“published vs published”) versus those inferred from harmonized uniform GDC re-processing (“reprocessed vs reprocessed”). [Dataset]. http://doi.org/10.1371/journal.pone.0318676.t002
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0318676.t002
Dataset updated
Mar 4, 2025
Dataset provided by
PLOS ONE
Authors
Ling-Hong Hung; Bryce Fukuda; Robert Schmitz; Varik Hoang; Wes Lloyd; Ka Yee Yeung
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Comparison of the top 10 differentially expressed genes inferred from concatenation of published counts (“published vs published”) versus those inferred from harmonized uniform GDC re-processing (“reprocessed vs reprocessed”).
TCGA-LUAD
kaggle.com
opendatalab.com
zip
Updated Jul 28, 2021
Cite
Nahin Kumar Dey (2021). TCGA-LUAD [Dataset]. https://www.kaggle.com/nahin333/tcgaluad
Explore at:
Available download formats: zip (10283785426 bytes)
Dataset updated
Jul 28, 2021
Authors
Nahin Kumar Dey
Description
The Cancer Genome Atlas Lung Adenocarcinoma (TCGA-LUAD) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA). Clinical, genetic, and pathological data resides in the Genomic Data Commons (GDC) Data Portal while the radiological data is stored on The Cancer Imaging Archive (TCIA).

Matched TCGA patient identifiers allow researchers to explore the TCGA/TCIA databases for correlations between tissue genotype, radiological phenotype and patient outcomes. Tissues for TCGA were collected from many sites all over the world in order to reach their accrual targets, usually around 500 specimens per cancer type. For this reason the image data sets are also extremely heterogeneous in terms of scanner modalities, manufacturers and acquisition protocols. In most cases the images were acquired as part of routine care and not as part of a controlled research study or clinical trial.

https://wiki.cancerimagingarchive.net/display/Public/TCGA-LUAD
The Cancer Genome Atlas (TCGA) RNA-seq meta-analysis
figshare.com
xlsx
Updated Feb 2, 2018
Cite
Namshik Han (2018). The Cancer Genome Atlas (TCGA) RNA-seq meta-analysis [Dataset]. http://doi.org/10.6084/m9.figshare.5851743.v1
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.5851743.v1
Dataset updated
Feb 2, 2018
Dataset provided by
figshare
Authors
Namshik Han
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
TCGA RNA-seq V2 Level3 data were downloaded from the TCGA Genomic Data Commons Data Portal (https://gdc-portal.nci.nih.gov), consisting of 11,303 samples in 34 cancer projects (33 cancer types). Nine cancer types that do not have corresponding non-tumour samples were filtered out, and the analysis was focused on tumour versus non-tumour comparison. 24 cancer types were used in this meta-analysis: BLCA, BRCA, CESC, CHOL, COAD, ESCA, GBM, HNSC, KICH, KIRC, KIRP, LIHC, LUAD, LUSC, PAAD, PCPG, PRAD, READ, SARC, SKCM, STAD, THCA, THYM, UCEC (https://gdc-portal.nci.nih.gov). The nine filtered cancer types were ACC, DLBC, LAML, LGG, MESO, OV, TGCT, UCS and UVM.
To extract expression values from TCGA RNA-seq data, we used genomic coordinates to retrieve UCSC Transcript IDs that correspond to the identifiers in TCGA RNA-seq V2 Level3 data (isoform level). The GAF (General Annotation Format) file was used to map the coordinates to UCSC Transcript IDs, and it was downloaded from https://tcga-data.nci.nih.gov/docs/GAF/GAF.hg19.June2011.bundle/outputs/TCGA.hg19.June2011.gaf. This file contains genomic annotations shared by all TCGA projects. More details of the GAF file format can be found at https://tcga-data.nci.nih.gov/docs/GAF/GAF3.0/GAF_v3_file_description.docx. We filtered out any coding exons overlapping UCSC Transcript IDs to eliminate expression values of coding genes and evaluate lncRNA expression.
We could find the expression values of 443 pcRNAs and 203 tapRNAs in TCGA data, as many non-coding regions are not yet fully annotated in the TCGA RNA-seq V2 Level3 data. The expression values of pcRNAs and tapRNAs were extracted and clustered by an unsupervised Pearson correlation method (Supplementary Figure 18A). The expression values of tapRNA-associated coding genes were also extracted and used to generate the heat-map (Supplementary Figure 18B), which shows a similar pattern of expression to tapRNAs across the cancer types.
To show that tapRNAs and associated coding genes have similar expression profiles in cancers, we generated a Spearman's Rank-Order Correlation heatmap (Figure 6A) between tapRNAs and their associated coding genes based on the TCGA RNA-seq data. We used the MatLab function corr to calculate Spearman's rho. This function takes two matrices X (197-by-8,850 expression profiling matrix of tapRNA) and Y (197-by-8,850 expression profiling matrix of tapRNA-associated coding gene) and returns an 8,850-by-8,850 matrix containing the pairwise correlation coefficient between each pair of 8,850 columns (TCGA cancer samples in Supplementary Figure 18A and B). Thus, the rank-order correlation matrix that we computed from the matrices of expression profiling data (Supplementary Figure 18A and B) allowed us to compare the correlation between two column vectors, i.e. cancer samples. This function also returns a matrix of p-values for testing the hypothesis of no correlation against the alternative that there is a nonzero correlation. Each element of the matrix of p-values is the p-value for the corresponding element of Spearman's rho. The p-values for Spearman's rho are calculated using large-sample approximations. To check the significance level of the correlation between a tapRNA and its associated coding gene, the diagonal of the p-value matrix was extracted and used. The median is 1.31 x 10^-11 and the mean is 1.03 x 10^-4 with standard deviation 0.0029.
To identify cancer-specific tapRNAs, we considered not only the global expression pattern of a given tapRNA in each cancer type, but also the expression pattern of specific sub-groups that are significantly distinct, to take into account cancer sample heterogeneity. Thus, two conditions were applied: (1) the average expression level of a tapRNA in a given cancer type is in the top 10% or bottom 10%, and (2) a tapRNA has at least 10% of samples in a given cancer type that are significantly up-regulated (Z-score > 2) or down-regulated (Z-score < -2).
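Two of the computations described above can be sketched in plain Python: Spearman's rho between matching columns of two matrices (the diagonal of the full pairwise matrix), and condition (2) on Z-scores. This is an illustration, not the authors' MatLab code. All function names are hypothetical, p-values are not computed, Z-scores are taken as given because the text does not say how they were derived, and reading condition (2) as "at least 10% up-regulated or at least 10% down-regulated" is an assumption.

```python
import math


def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def diagonal_spearman(X, Y):
    """Rho between column j of X and column j of Y, for every sample j.

    X and Y are lists of rows (genes x samples) with the same shape.
    """
    n_samples = len(X[0])
    return [
        spearman([row[j] for row in X], [row[j] for row in Y])
        for j in range(n_samples)
    ]


def meets_condition_2(z_scores, min_fraction=0.10, cutoff=2.0):
    """True if at least min_fraction of samples have Z > cutoff or Z < -cutoff."""
    n = len(z_scores)
    up = sum(1 for z in z_scores if z > cutoff) / n
    down = sum(1 for z in z_scores if z < -cutoff) / n
    return up >= min_fraction or down >= min_fraction


# Tiny made-up matrices: 3 genes (rows) x 2 samples (columns).
rhos = diagonal_spearman([[1, 4], [2, 3], [3, 1]], [[2, 1], [4, 2], [6, 5]])
```

Computing only the diagonal avoids building the full 8,850-by-8,850 matrix when only the per-sample pairing is of interest.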
The Cancer Genome Atlas Liver Hepatocellular Carcinoma Collection
cancerimagingarchive.net
dicom, n/a
Updated May 29, 2020
Cite
The Cancer Imaging Archive (2020). The Cancer Genome Atlas Liver Hepatocellular Carcinoma Collection [Dataset]. http://doi.org/10.7937/K9/TCIA.2016.IMMQW8UQ
Explore at:
Available download formats: n/a, dicom
Unique identifier
https://doi.org/10.7937/K9/TCIA.2016.IMMQW8UQ
Dataset updated
May 29, 2020
Dataset authored and provided by
The Cancer Imaging Archive
License
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
Time period covered
May 29, 2020
Dataset funded by
National Cancer Institute (http://www.cancer.gov/)
Description
The Cancer Genome Atlas Liver Hepatocellular Carcinoma (TCGA-LIHC) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA). Clinical, genetic, and pathological data resides in the Genomic Data Commons (GDC) Data Portal while the radiological data is stored on The Cancer Imaging Archive (TCIA).
Matched TCGA patient identifiers allow researchers to explore the TCGA/TCIA databases for correlations between tissue genotype, radiological phenotype and patient outcomes. Tissues for TCGA were collected from many sites all over the world in order to reach their accrual targets, usually around 500 specimens per cancer type. For this reason the image data sets are also extremely heterogeneous in terms of scanner modalities, manufacturers and acquisition protocols. In most cases the images were acquired as part of routine care and not as part of a controlled research study or clinical trial.
CIP TCGA Radiology Initiative
Imaging Source Site (ISS) Groups are being populated and governed by participants from institutions that have provided imaging data to the archive for a given cancer type. Modeled after TCGA analysis groups, ISS groups are given the opportunity to publish a marker paper for a given cancer type per the guidelines in the table above. This opportunity will generate increased participation in building these multi-institutional data sets as they become an open community resource. Learn more about the CIP TCGA Radiology Initiative.
List of 625 false positive genes resulted from comparing GTEx published...
figshare.com
csv
Updated Mar 4, 2025
Cite
Ling-Hong Hung; Bryce Fukuda; Robert Schmitz; Varik Hoang; Wes Lloyd; Ka Yee Yeung (2025). List of 625 false positive genes resulted from comparing GTEx published counts versus GTEx reprocessed counts. [Dataset]. http://doi.org/10.1371/journal.pone.0318676.s002
Explore at:
Available download formats: csv
Unique identifier
https://doi.org/10.1371/journal.pone.0318676.s002
Dataset updated
Mar 4, 2025
Dataset provided by
PLOS ONE
Authors
Ling-Hong Hung; Bryce Fukuda; Robert Schmitz; Varik Hoang; Wes Lloyd; Ka Yee Yeung
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
List of 625 false positive genes resulted from comparing GTEx published counts versus GTEx reprocessed counts.
The Cancer Genome Atlas Colon Adenocarcinoma Collection
cancerimagingarchive.net
dicom, n/a
Updated Jan 5, 2016
Cite
The Cancer Imaging Archive (2016). The Cancer Genome Atlas Colon Adenocarcinoma Collection [Dataset]. http://doi.org/10.7937/K9/TCIA.2016.HJJHBOXZ
Explore at:
Available download formats: dicom, n/a
Unique identifier
https://doi.org/10.7937/K9/TCIA.2016.HJJHBOXZ
Dataset updated
Jan 5, 2016
Dataset authored and provided by
The Cancer Imaging Archive
License
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
Time period covered
May 29, 2020
Dataset funded by
National Cancer Institute (http://www.cancer.gov/)
Description
The Cancer Genome Atlas Colon Adenocarcinoma (TCGA-COAD) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA). Clinical, genetic, and pathological data reside in the Genomic Data Commons (GDC) Data Portal, while the radiological data are stored on The Cancer Imaging Archive (TCIA).
Matched TCGA patient identifiers allow researchers to explore the TCGA/TCIA databases for correlations between tissue genotype, radiological phenotype, and patient outcomes. Tissues for TCGA were collected from many sites worldwide to reach the program's accrual targets, usually around 500 specimens per cancer type. For this reason, the image data sets are also extremely heterogeneous in scanner modalities, manufacturers, and acquisition protocols. In most cases, the images were acquired as part of routine care rather than as part of a controlled research study or clinical trial.
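As a rough illustration of how matched patient identifiers can link the two archives, the sketch below pairs genomic sample barcodes with imaging patient IDs. It assumes the usual TCGA barcode layout, in which the first three dash-separated fields identify the patient. The function names and the example IDs are made up for illustration and are not part of any GDC or TCIA API.

```python
def patient_id(barcode: str) -> str:
    """Return the patient-level prefix of a TCGA barcode.

    Assumption: the first three dash-separated fields
    (e.g. TCGA-XX-0001) identify the patient.
    """
    parts = barcode.strip().upper().split("-")
    if len(parts) < 3 or parts[0] != "TCGA":
        raise ValueError(f"not a TCGA barcode: {barcode!r}")
    return "-".join(parts[:3])


def match_records(genomic_barcodes, imaging_patient_ids):
    """Group genomic barcodes under each patient that also has imaging data."""
    imaging = {patient_id(p) for p in imaging_patient_ids}
    matched = {}
    for barcode in genomic_barcodes:
        pid = patient_id(barcode)
        if pid in imaging:
            matched.setdefault(pid, []).append(barcode)
    return matched


# Made-up identifiers, for illustration only.
genomic = ["TCGA-XX-0001-01A-11R-0000-07", "TCGA-XX-0002-01A"]
imaging = ["tcga-xx-0001"]
print(match_records(genomic, imaging))
```

Only the first genomic barcode is returned here, because only patient TCGA-XX-0001 appears in both lists. Real analyses would read the identifiers from the GDC and TCIA metadata rather than from literals.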
CIP TCGA Radiology Initiative
Imaging Source Site (ISS) Groups are being populated and governed by participants from institutions that have provided imaging data to the archive for a given cancer type. Modeled after TCGA analysis groups, ISS groups are given the opportunity to publish a marker paper for a given cancer type per the guidelines in the table above. This opportunity will generate increased participation in building these multi-institutional data sets as they become an open community resource. Learn more about the CIP TCGA Radiology Initiative.
ISB-CGC Cancer Gateway in the Cloud
console.cloud.google.com
Updated Mar 19, 2023
Cite
https://console.cloud.google.com/marketplace/browse?filter=partner:ISB%20Cancer%20Gateway&hl=de&inv=1&invt=Ab2xug (2023). ISB-CGC Cancer Gateway in the Cloud [Dataset]. https://console.cloud.google.com/marketplace/product/gcp-public-data-isb-cgc/isb-cgc-cancer-data?hl=de
Explore at:
Dataset updated
Mar 19, 2023
Dataset provided by
Google (http://google.com/)
Description
The ISB Cancer Gateway in the Cloud (ISB-CGC) is one of three National Cancer Institute (NCI) Cloud Resources tasked with bringing cancer data and computation power together through cloud platforms. It is a collaboration between the Institute for Systems Biology (ISB) and General Dynamics Information Technology Inc. (GDIT). Since starting in 2014 as part of NCI’s Cloud Pilot Resource initiative, ISB-CGC has provided access to increasing amounts of cancer data in the cloud. In Google BigQuery, ISB-CGC stores high-level clinical, biospecimen, genomic, and proteomic cancer research data obtained from the NCI Genomic Data Commons (GDC) and Proteomics Data Commons (PDC). It also stores a large amount of metadata about files held in GDC Google Cloud Storage, as well as genome reference sources (e.g., GENCODE, miRBase). Most of these datasets and tables are completely open access and available to the research community. ISB-CGC has consolidated the data by research program and data type (e.g., Clinical, DNA Methylation, RNAseq, Somatic Mutation) and transformed it into ISB-CGC Google BigQuery tables for ease of access and analysis, which allows users to quickly analyze information from thousands of patients. The ISB-CGC BigQuery Table Search UI is a discovery tool for exploring and searching ISB-CGC hosted BigQuery tables. This public dataset is hosted in Google BigQuery and is included in BigQuery's free tier: each user receives 1 TB of free BigQuery processing every month, which can be used to run queries on this public dataset.
Comparison of counts resulting from running our GDC RNA-seq workflow...
figshare.com
xlsx
Updated Mar 4, 2025
Cite
Ling-Hong Hung; Bryce Fukuda; Robert Schmitz; Varik Hoang; Wes Lloyd; Ka Yee Yeung (2025). Comparison of counts resulting from running our GDC RNA-seq workflow implementation (reprocessed counts) to GDC published counts. [Dataset]. http://doi.org/10.1371/journal.pone.0318676.s005
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.1371/journal.pone.0318676.s005
Dataset updated
Mar 4, 2025
Dataset provided by
PLOS ONE
Authors
Ling-Hong Hung; Bryce Fukuda; Robert Schmitz; Varik Hoang; Wes Lloyd; Ka Yee Yeung
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
There are three sheets in this spreadsheet file, corresponding to each of the three samples (TCGA-AB-2821, TCGA-AB-2828, TCGA-AB-2839). Correlation and RMSD between the reprocessed counts and published counts are included in each sheet. (XLSX)
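For readers unfamiliar with the two agreement measures named above, here is a minimal sketch of how a correlation and a root-mean-square deviation (RMSD) between two count vectors could be computed. The description does not say which correlation coefficient the authors used; Pearson correlation is assumed here, and the function names and toy counts are illustrative only.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def rmsd(x, y):
    """Root-mean-square deviation between two equal-length sequences."""
    if len(x) != len(y) or not x:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


# Toy per-gene counts, not taken from the spreadsheet.
published = [10, 200, 35, 0, 48]
reprocessed = [12, 198, 35, 1, 50]
print(pearson(published, reprocessed), rmsd(published, reprocessed))
```

A correlation near 1 together with a small RMSD would indicate that the reprocessed counts track the published ones closely; the two measures are complementary, since correlation ignores a constant offset that RMSD would expose.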
The Cancer Genome Atlas Lung Squamous Cell Carcinoma Collection
dev.cancerimagingarchive.net
cancerimagingarchive.net
dicom, n/a
Updated May 29, 2020
Cite
The Cancer Imaging Archive (2020). The Cancer Genome Atlas Lung Squamous Cell Carcinoma Collection [Dataset]. http://doi.org/10.7937/K9/TCIA.2016.TYGKKFMQ
Explore at:
Available download formats: dicom, n/a
Unique identifier
https://doi.org/10.7937/K9/TCIA.2016.TYGKKFMQ
Dataset updated
May 29, 2020
Dataset authored and provided by
The Cancer Imaging Archive
License
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
Time period covered
May 29, 2020
Dataset funded by
National Cancer Institute (http://www.cancer.gov/)
Description
The Cancer Genome Atlas Lung Squamous Cell Carcinoma (TCGA-LUSC) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA). Clinical, genetic, and pathological data reside in the Genomic Data Commons (GDC) Data Portal, while the radiological data are stored on The Cancer Imaging Archive (TCIA).
Matched TCGA patient identifiers allow researchers to explore the TCGA/TCIA databases for correlations between tissue genotype, radiological phenotype, and patient outcomes. Tissues for TCGA were collected from many sites worldwide to reach the program's accrual targets, usually around 500 specimens per cancer type. For this reason, the image data sets are also extremely heterogeneous in scanner modalities, manufacturers, and acquisition protocols. In most cases, the images were acquired as part of routine care rather than as part of a controlled research study or clinical trial.
CIP TCGA Radiology Initiative
Imaging Source Site (ISS) Groups are being populated and governed by participants from institutions that have provided imaging data to the archive for a given cancer type. Modeled after TCGA analysis groups, ISS groups are given the opportunity to publish a marker paper for a given cancer type per the guidelines in the table above. This opportunity will generate increased participation in building these multi-institutional data sets as they become an open community resource. Learn more about the TCGA Lung Phenotype Research Group.


Genomic Data Commons Data Portal (GDC Data Portal)

RRID:SCR_014514, Genomic Data Commons Data Portal (GDC Data Portal) (RRID:SCR_014514), Genomic Data Commons Data Portal, GDC Data Portal

Explore at:

70 scholarly articles cite this dataset (View in Google Scholar)


