100+ datasets found

The Cancer Genome Atlas Stomach Adenocarcinoma Collection
cancerimagingarchive.net
dicom, n/a
Updated Jan 5, 2016
Cite
The Cancer Imaging Archive (2016). The Cancer Genome Atlas Stomach Adenocarcinoma Collection [Dataset]. http://doi.org/10.7937/K9/TCIA.2016.GDHL9KIM
Available download formats: dicom, n/a
Unique identifier
https://doi.org/10.7937/K9/TCIA.2016.GDHL9KIM
Dataset updated
Jan 5, 2016
Dataset authored and provided by
The Cancer Imaging Archive
License
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
Time period covered
May 29, 2020
Dataset funded by
National Cancer Institute (http://www.cancer.gov/)
Description
The Cancer Genome Atlas Stomach Adenocarcinoma (TCGA-STAD) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA). Clinical, genetic, and pathological data resides in the Genomic Data Commons (GDC) Data Portal while the radiological data is stored on The Cancer Imaging Archive (TCIA).
Matched TCGA patient identifiers allow researchers to explore the TCGA/TCIA databases for correlations between tissue genotype, radiological phenotype and patient outcomes. Tissues for TCGA were collected from many sites all over the world in order to reach their accrual targets, usually around 500 specimens per cancer type. For this reason the image data sets are also extremely heterogeneous in terms of scanner modalities, manufacturers and acquisition protocols. In most cases the images were acquired as part of routine care and not as part of a controlled research study or clinical trial.
CIP TCGA Radiology Initiative
Imaging Source Site (ISS) Groups are being populated and governed by participants from institutions that have provided imaging data to the archive for a given cancer type. Modeled after TCGA analysis groups, ISS groups are given the opportunity to publish a marker paper for a given cancer type per the guidelines in the table above. This opportunity will generate increased participation in building these multi-institutional data sets as they become an open community resource. Learn more about the CIP TCGA Radiology Initiative.
TCGA-Cancer-Variant-and-Clinical-Data
huggingface.co
Updated Oct 10, 2024
Cite
Seq-to-Pheno (2024). TCGA-Cancer-Variant-and-Clinical-Data [Dataset]. https://huggingface.co/datasets/seq-to-pheno/TCGA-Cancer-Variant-and-Clinical-Data
Available format: Croissant, a format for machine-learning datasets (see mlcommons.org/croissant).
Dataset updated
Oct 10, 2024
Dataset authored and provided by
Seq-to-Pheno
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
TCGA Cancer Variant and Clinical Data

Dataset Description

This dataset combines genetic variant information at the protein level with clinical data from The Cancer Genome Atlas (TCGA) project, curated by the International Cancer Genome Consortium (ICGC). It provides a comprehensive view of protein-altering mutations and clinical characteristics across various cancer types.

Dataset Summary

The dataset includes:

Protein sequence data for both mutated and… See the full description on the dataset page: https://huggingface.co/datasets/seq-to-pheno/TCGA-Cancer-Variant-and-Clinical-Data.
The Cancer Genome Atlas (TCGA) RNA-seq meta-analysis
figshare.com
xlsx
Updated Feb 2, 2018
Cite
Namshik Han (2018). The Cancer Genome Atlas (TCGA) RNA-seq meta-analysis [Dataset]. http://doi.org/10.6084/m9.figshare.5851743.v1
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.5851743.v1
Dataset updated
Feb 2, 2018
Dataset provided by
figshare
Authors
Namshik Han
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
TCGA RNA-seq V2 Level3 data were downloaded from the TCGA Genomic Data Commons Data Portal (https://gdc-portal.nci.nih.gov), consisting of 11,303 samples in 34 cancer projects (33 cancer types). Nine cancer types that do not have corresponding non-tumour samples were filtered out, and the analysis focused on the tumour versus non-tumour comparison. 24 cancer types were used in this meta-analysis: BLCA, BRCA, CESC, CHOL, COAD, ESCA, GBM, HNSC, KICH, KIRC, KIRP, LIHC, LUAD, LUSC, PAAD, PCPG, PRAD, READ, SARC, SKCM, STAD, THCA, THYM, UCEC (https://gdc-portal.nci.nih.gov). The nine filtered cancer types were ACC, DLBC, LAML, LGG, MESO, OV, TGCT, UCS and UVM.
To extract expression values from TCGA RNA-seq data, we used genomic coordinates to retrieve UCSC Transcript IDs that correspond to the identifiers in TCGA RNA-seq V2 Level3 data (isoform level). The GAF (General Annotation Format) file was used to map the coordinates to UCSC Transcript IDs, and it was downloaded from https://tcga-data.nci.nih.gov/docs/GAF/GAF.hg19.June2011.bundle/outputs/TCGA.hg19.June2011.gaf. This file contains genomic annotations shared by all TCGA projects. More details of the GAF file format can be found at https://tcga-data.nci.nih.gov/docs/GAF/GAF3.0/GAF_v3_file_description.docx. We filtered out any coding exons overlapping UCSC Transcript IDs to eliminate the expression values of coding genes and evaluate lncRNA expression.
We could find the expression values of 443 pcRNAs and 203 tapRNAs in TCGA data, as many non-coding regions are not yet fully annotated in the TCGA RNA-seq V2 Level3 data. The expression values of pcRNAs and tapRNAs were extracted and clustered by the unsupervised Pearson correlation method (Supplementary Figure 18A).
The expression values of tapRNA-associated coding genes were also extracted and used to generate the heat-map (Supplementary Figure 18B), which shows a pattern of expression similar to that of the tapRNAs across the cancer types.
To show that tapRNAs and associated coding genes have similar expression profiles in cancers, we generated a Spearman's rank-order correlation heatmap (Figure 6A) between tapRNAs and their associated coding genes based on the TCGA RNA-seq data. We used the MATLAB function corr to calculate Spearman's rho. This function takes two matrices X (197-by-8,850 expression profiling matrix of tapRNAs) and Y (197-by-8,850 expression profiling matrix of tapRNA-associated coding genes) and returns an 8,850-by-8,850 matrix containing the pairwise correlation coefficient between each pair of 8,850 columns (TCGA cancer samples in Supplementary Figure 18A and B). Thus, the rank-order correlation matrix that we computed from the matrices of expression profiling data (Supplementary Figure 18A and B) allowed us to compare the correlation between two column vectors, i.e. cancer samples. This function also returns a matrix of p-values for testing the hypothesis of no correlation against the alternative that there is a nonzero correlation. Each element of the p-value matrix is the p-value for the corresponding element of Spearman's rho. The p-values for Spearman's rho are calculated using large-sample approximations. To check the significance level of the correlation between a tapRNA and its associated coding gene, the diagonal of the p-value matrix was extracted and used. The median is 1.31 × 10^-11 and the mean is 1.03 × 10^-4, with a standard deviation of 0.0029.
To identify cancer-specific tapRNAs, we considered not only the global expression pattern of a given tapRNA in each cancer type, but also the expression pattern of any specific sub-group that is significantly distinct, to take into account cancer sample heterogeneity.
Thus, two conditions were applied: (1) average expression level of a tapRNA in a given cancer type is in top 10% or bottom 10% and (2) a tapRNA has at least 10% of samples in a given cancer type that are significantly up-regulated (Z-score > 2) or down-regulated (Z-score < -2).
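The two conditions can be illustrated with a short Python sketch. This is not the authors' code: the function name, the input layout, the reading of "top 10% or bottom 10%" as a rank among the per-cancer-type means of the same tapRNA, and the use of all pooled samples as the Z-score reference are assumptions made only for this example.

```python
from statistics import mean, pstdev


def is_cancer_specific(values_in_type, all_type_means, pooled_values,
                       tail=0.10, min_fraction=0.10, z_cut=2.0):
    """Apply both conditions to one tapRNA in one cancer type.

    values_in_type: expression of the tapRNA in the samples of this cancer type
    all_type_means: mean expression of the same tapRNA in every cancer type
    pooled_values:  expression of the tapRNA in all samples (Z-score reference)
    """
    # Condition 1 (assumed reading): the mean for this cancer type lies in
    # the top or bottom `tail` fraction of the per-type means.
    m = mean(values_in_type)
    ranked = sorted(all_type_means)
    k = max(1, int(len(ranked) * tail))
    cond1 = m <= ranked[k - 1] or m >= ranked[-k]

    # Condition 2: at least `min_fraction` of the samples in this cancer type
    # are up-regulated (Z > z_cut) or down-regulated (Z < -z_cut).
    mu, sd = mean(pooled_values), pstdev(pooled_values)
    if sd == 0:
        return False  # no variation, so no sample can be an outlier
    z = [(v - mu) / sd for v in values_in_type]
    up = sum(s > z_cut for s in z) / len(z)
    down = sum(s < -z_cut for s in z) / len(z)
    cond2 = up >= min_fraction or down >= min_fraction

    return cond1 and cond2
```

Requiring both conditions means that a tapRNA with an extreme average but no clearly outlying samples, or the reverse, is not flagged.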
The Cancer Genome Atlas Prostate Adenocarcinoma Collection
cancerimagingarchive.net
dicom, n/a
Updated Feb 2, 2014
Cite
The Cancer Imaging Archive (2014). The Cancer Genome Atlas Prostate Adenocarcinoma Collection [Dataset]. http://doi.org/10.7937/K9/TCIA.2016.YXOGLM4Y
Available download formats: dicom, n/a
Unique identifier
https://doi.org/10.7937/K9/TCIA.2016.YXOGLM4Y
Dataset updated
Feb 2, 2014
Dataset authored and provided by
The Cancer Imaging Archive
License
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
Time period covered
May 29, 2020
Dataset funded by
National Cancer Institute (http://www.cancer.gov/)
Description
The Cancer Genome Atlas Prostate Adenocarcinoma (TCGA-PRAD) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA). Clinical, genetic, and pathological data resides in the Genomic Data Commons (GDC) Data Portal while the radiological data is stored on The Cancer Imaging Archive (TCIA).
Matched TCGA patient identifiers allow researchers to explore the TCGA/TCIA databases for correlations between tissue genotype, radiological phenotype and patient outcomes. Tissues for TCGA were collected from many sites all over the world in order to reach their accrual targets, usually around 500 specimens per cancer type. For this reason the image data sets are also extremely heterogeneous in terms of scanner modalities, manufacturers and acquisition protocols. In most cases the images were acquired as part of routine care and not as part of a controlled research study or clinical trial.
CIP TCGA Radiology Initiative
Imaging Source Site (ISS) Groups are being populated and governed by participants from institutions that have provided imaging data to the archive for a given cancer type. Modeled after TCGA analysis groups, ISS groups are given the opportunity to publish a marker paper for a given cancer type per the guidelines in the table above. This opportunity will generate increased participation in building these multi-institutional data sets as they become an open community resource. Learn more about the CIP TCGA Radiology Initiative.
Data from: TCGA
huggingface.co
Updated Aug 15, 2024
Cite
Lab-Rasool (2024). TCGA [Dataset]. https://huggingface.co/datasets/Lab-Rasool/TCGA
Dataset updated
Aug 15, 2024
Dataset authored and provided by
Lab-Rasool
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Description
Dataset Card for The Cancer Genome Atlas (TCGA) Multimodal Dataset

The Cancer Genome Atlas (TCGA) Multimodal Dataset is a comprehensive collection of clinical data, pathology reports, and slide images for cancer patients. This dataset aims to facilitate research in multimodal machine learning for oncology by providing embeddings generated using state-of-the-art models such as GatorTron and UNI.

Curated by: Lab Rasool
Language(s) (NLP): English

Uses

from… See the full description on the dataset page: https://huggingface.co/datasets/Lab-Rasool/TCGA.
TCGA Glioblastoma Data
zenodo.org
data.niaid.nih.gov
application/gzip
Updated Jan 24, 2020
Cite
Gregory Way (2020). TCGA Glioblastoma Data [Dataset]. http://doi.org/10.5281/zenodo.60304
Available download formats: application/gzip
Unique identifier
https://doi.org/10.5281/zenodo.60304
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Gregory Way
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
All data was downloaded from UCSC Xena on 16 August 2016. The compressed folder includes the following files:

GBM_clinicalMatrix - tab separated file with 629 samples measured by 139 variables. Refer to the source for more details.

SOURCE: https://genome-cancer.soe.ucsc.edu/proj/site/xena/datapages/?dataset=TCGA.GBM.sampleMap/GBM_clinicalMatrix&host=https://tcga.xenahubs.net

HT_HG-U133A - tab separated gene expression (Affy U133A microarray) file with 539 samples measured by 12,043 genes. Refer to the source for more details.

SOURCE: https://genome-cancer.soe.ucsc.edu/proj/site/xena/datapages/?dataset=TCGA.GBM.sampleMap/HT_HG-U133A&host=https://tcga.xenahubs.net
The Cancer Genome Atlas Ovarian Cancer Collection
cancerimagingarchive.net
dicom, n/a
Updated May 29, 2020
Cite
The Cancer Imaging Archive (2020). The Cancer Genome Atlas Ovarian Cancer Collection [Dataset]. http://doi.org/10.7937/K9/TCIA.2016.NDO1MDFQ
Available download formats: n/a, dicom
Unique identifier
https://doi.org/10.7937/K9/TCIA.2016.NDO1MDFQ
Dataset updated
May 29, 2020
Dataset authored and provided by
The Cancer Imaging Archive
License
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
Time period covered
May 29, 2020
Dataset funded by
National Cancer Institute (http://www.cancer.gov/)
Description
The Cancer Genome Atlas Ovarian Cancer (TCGA-OV) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA). Clinical, genetic, and pathological data resides in the Genomic Data Commons (GDC) Data Portal while the radiological data is stored on The Cancer Imaging Archive (TCIA).
Matched TCGA patient identifiers allow researchers to explore the TCGA/TCIA databases for correlations between tissue genotype, radiological phenotype and patient outcomes. Tissues for TCGA were collected from many sites all over the world in order to reach their accrual targets, usually around 500 specimens per cancer type. For this reason the image data sets are also extremely heterogeneous in terms of scanner modalities, manufacturers and acquisition protocols. In most cases the images were acquired as part of routine care and not as part of a controlled research study or clinical trial.
CIP TCGA Radiology Initiative
Imaging Source Site (ISS) Groups are being populated and governed by participants from institutions that have provided imaging data to the archive for a given cancer type. Modeled after TCGA analysis groups, ISS groups are given the opportunity to publish a marker paper for a given cancer type per the guidelines in the table above. This opportunity will generate increased participation in building these multi-institutional data sets as they become an open community resource. Learn more about the TCGA Ovarian Phenotype Research Group.
Multi-Omic Pan-Cancer data from TCGA.
narcis.nl
data.mendeley.com
Updated Jan 11, 2021
Cite
gonzalez reymundez, A (via Mendeley Data) (2021). Multi-Omic Pan-Cancer data from TCGA. [Dataset]. http://doi.org/10.17632/r8p67nfjc8.3
Unique identifier
https://doi.org/10.17632/r8p67nfjc8.3
Dataset updated
Jan 11, 2021
Dataset provided by
Data Archiving and Networked Services (DANS)
Authors
gonzalez reymundez, A (via Mendeley Data)
Description
Data consist of three omic blocks from The Cancer Genome Atlas (TCGA), containing whole-genome profiles of

- Gene expression (file GE.RData),
- DNA methylation (file METH.RData), and
- Copy number variants (file CNV.RData).

Omic profiles consist of information from 5,408 tumor samples across 33 cancer types (as matrix rows), and 60,112 features (expression of 20,319 genes, methylation of 28,241 CpG islands, and copy number variant intensity for 11,552 genes). GE profiles by sample correspond to the logarithm of RNA-Seq counts by gene (Illumina HiSeq RNA V2 platform). METH profiles correspond to CpG-site beta values from the Illumina HM450 platform, summarized at the CpG island level using the maximum connectivity approach from the WGCNA R package (Langfelder and Horvath 2008), and further transformed into M-values (M = log2(beta/(1-beta)); Du et al. 2010). Omic blocks were adjusted for batch and tissue-specific effects (see Gonzalez-Reymundez and Vazquez (2020) and references therein for further details on quality control and data editing).
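As an illustration of the beta-to-M transformation, here is a minimal Python sketch. It uses the M-value definition from Du et al. (2010), M = log2(beta/(1-beta)). The function names and the clamping constant `eps` are assumptions for this example and are not taken from the dataset's processing code.

```python
import math


def beta_to_m(beta, eps=1e-6):
    """Convert a methylation beta value in [0, 1] to an M-value."""
    # Clamp away from 0 and 1 so the ratio and the logarithm stay finite
    # (the clamping threshold is an assumption of this sketch).
    b = min(max(beta, eps), 1.0 - eps)
    return math.log2(b / (1.0 - b))


def m_to_beta(m):
    """Inverse transformation: recover the beta value from an M-value."""
    return 2.0 ** m / (2.0 ** m + 1.0)
```

A beta value of 0.5 maps to M = 0. Beta values near 0 or 1 map to large negative or positive M-values, which spreads out the extremes that are compressed on the beta scale.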
TCGA Kidney Renal Clear Cell Carcinoma (KIRC) Clinical Data
zenodo.org
data.niaid.nih.gov
csv
Updated Jul 29, 2023
Cite
Swati Baskiyar (2023). TCGA Kidney Renal Clear Cell Carcinoma (KIRC) Clinical Data [Dataset]. http://doi.org/10.5281/zenodo.8190146
Available download formats: csv
Unique identifier
https://doi.org/10.5281/zenodo.8190146
Dataset updated
Jul 29, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Swati Baskiyar
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Abstract:

The Cancer Genome Atlas (TCGA) was a large-scale collaborative project initiated by the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI). It aimed to comprehensively characterize the genomic and molecular landscape of various cancer types. This dataset includes curated survival data from the Pan-cancer Atlas paper titled "An Integrated TCGA Pan-Cancer Clinical Data Resource (TCGA-CDR) to drive high quality survival outcome analytics". The paper highlights four types of carefully curated survival endpoints, and recommends the use of the endpoints of OS, PFI, DFI, and DSS for each TCGA cancer type. The dataset also includes phenotypic information about KIRC. The Sample IDs are unique identifiers, which can be paired with the gene expression dataset.

Inspiration:

This dataset was uploaded to U-BRITE for the GTKB project.

Instruction:

The survival and phenotype data were merged into one file. Empty columns were removed. Columns with the same value for every sample were also removed.
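The preparation steps described here can be sketched as follows. This is a hedged illustration rather than the uploader's actual script: the function name, the join key `sample`, the inner-join behaviour, and the representation of each table as a list of dictionaries are assumptions.

```python
def merge_and_clean(survival_rows, phenotype_rows, key="sample"):
    """Join two tables on a sample ID, then drop empty and constant columns."""
    pheno_by_id = {row[key]: row for row in phenotype_rows}

    # Inner join (assumption): keep only samples present in both tables.
    merged = []
    for row in survival_rows:
        other = pheno_by_id.get(row[key])
        if other is not None:
            merged.append({**other, **row})
    if not merged:
        return []

    # Collect column names in first-seen order.
    columns = []
    for row in merged:
        for col in row:
            if col not in columns:
                columns.append(col)

    keep = []
    for col in columns:
        values = [row.get(col, "") for row in merged]
        if all(v in ("", None) for v in values):
            continue  # empty column
        if col != key and len(set(values)) == 1:
            continue  # same value for every sample
        keep.append(col)

    return [{col: row.get(col, "") for col in keep} for row in merged]
```

For example, a phenotype column that holds the same project code for every sample would be dropped, while the sample ID and the survival endpoints that vary between samples would be kept.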

Acknowledgments:

Goldman, M.J., Craft, B., Hastie, M. et al. Visualizing and interpreting cancer genomics data via the Xena platform. Nat Biotechnol (2020). https://doi.org/10.1038/s41587-020-0546-8

Liu, Jianfang, Caesar-Johnson, Samantha J. et al. An Integrated TCGA Pan-Cancer Clinical Data Resource to Drive High-Quality Survival Outcome Analytics. Cell, Volume 173, Issue 2, 400 - 416.e11. https://doi.org/10.1016/j.cell.2018.02.052

The Cancer Genome Atlas Research Network., Weinstein, J., Collisson, E. et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat Genet 45, 1113–1120 (2013). https://doi.org/10.1038/ng.2764

U-BRITE last update: 07/13/2023
Information on molecular subtypes for TCGA cancer studies as provided by the...
plos.figshare.com
xls
Updated Jun 1, 2023
Cite
Mohamed Mounir; Marta Lucchetta; Tiago C. Silva; Catharina Olsen; Gianluca Bontempi; Xi Chen; Houtan Noushmehr; Antonio Colaprico; Elena Papaleo (2023). Information on molecular subtypes for TCGA cancer studies as provided by the TCGA_MolecularSubtype function. [Dataset]. http://doi.org/10.1371/journal.pcbi.1006701.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pcbi.1006701.t001
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Mohamed Mounir; Marta Lucchetta; Tiago C. Silva; Catharina Olsen; Gianluca Bontempi; Xi Chen; Houtan Noushmehr; Antonio Colaprico; Elena Papaleo
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Information on molecular subtypes for TCGA cancer studies as provided by the TCGA_MolecularSubtype function.
TCGA Pan-Cancer sample, expression, and mutation data for Project Cognoma
figshare.com
txt
Updated May 31, 2023
Cite
Daniel Himmelstein; Gregory Way; Claire McLeod; Stephen Shank; Casey Greene (2023). TCGA Pan-Cancer sample, expression, and mutation data for Project Cognoma [Dataset]. http://doi.org/10.6084/m9.figshare.3487685.v7
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.3487685.v7
Dataset updated
May 31, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Daniel Himmelstein; Gregory Way; Claire McLeod; Stephen Shank; Casey Greene
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
The following datasets were created for Project Cognoma:
expression-matrix.tsv.bz2 is a sample × gene matrix indicating a gene's expression level for a given sample. This dataset will be the feature/x/predictor information for Project Cognoma.
expression-genes.tsv provides information and summary statistics for every gene in expression-matrix.tsv.bz2.
mutation-matrix.tsv.bz2 is a sample × gene matrix indicating whether a gene is mutated for a given sample. Select columns (or unions of several columns) in this dataset will be the status/y/outcome for Project Cognoma.
mutation-genes.tsv provides information and summary statistics for every gene in mutation-matrix.tsv.bz2.
samples.tsv is a sample × attribute matrix providing sample information and clinical measures for each sample.
covariates.tsv is a sample × attribute matrix for modeling that encodes categorical variables in samples.tsv using dummies.
All datasets contain the same samples as rows (in the same order). No two samples correspond to the same patient.
The data was retrieved from the UCSC Xena Browser.
These datasets were created by the GitHub repository commit below. See the download directory of the cancer-data repository for metadata files with the version info for the Xena downloads this release is based on.
See the data/subset directory of the cancer-data repository on GitHub to browse small subsets of the expression and mutation datasets.
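To make the x/y roles of the two matrices concrete, the following Python sketch assembles features and an outcome from already-loaded rows. It is only an illustration: the function name, the in-memory layout (a list of `(sample_id, {gene: value})` pairs), and the gene symbols used in the usage note are assumptions, not part of the Project Cognoma code.

```python
def build_xy(expression_rows, mutation_rows, genes):
    """Build features X and a binary outcome y from the two matrices.

    expression_rows, mutation_rows: lists of (sample_id, {gene: value}) pairs,
    expected to list the same samples in the same order.
    genes: mutation columns whose union defines the outcome.
    """
    ids_x = [sample_id for sample_id, _ in expression_rows]
    ids_y = [sample_id for sample_id, _ in mutation_rows]
    if ids_x != ids_y:
        raise ValueError("matrices must contain the same samples in the same order")

    # X: one row of expression values per sample.
    X = [list(values.values()) for _, values in expression_rows]
    # y: 1 if any of the selected genes is mutated in the sample, else 0.
    y = [int(any(values[g] == 1 for g in genes)) for _, values in mutation_rows]
    return X, y
```

Calling `build_xy(expression, mutation, ["TP53", "KRAS"])` with hypothetical column names would label a sample 1 when either gene is mutated, which corresponds to the "unions of several columns" case.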
TCGA Chemotherapy Response Dataset
zenodo.org
data.niaid.nih.gov
csv, zip
Updated Nov 1, 2021
Cite
Dalibor Hrg; Balthasar Huber; Lukas A. Huber (2021). TCGA Chemotherapy Response Dataset [Dataset]. http://doi.org/10.5281/zenodo.3719291
Available download formats: zip, csv
Unique identifier
https://doi.org/10.5281/zenodo.3719291
Dataset updated
Nov 1, 2021
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Dalibor Hrg; Balthasar Huber; Lukas A. Huber
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset on chemotherapeutic drug responses in TCGA cancer patients, cross-referenced for a hit in the TCIA.at database, consisting of clinical (TCGA), cancer tissue gene-expression (TCGA) and tumor-immunome (TCIA) features. The dataset covers 5 common chemotherapy agents: 3 CRC agents (FOLFOX, 5FU, Oxaliplatin) and 2 lung agents (Carboplatin, Cisplatin). FOLFOX, as a combination therapy or regimen, was compiled from the timings of monotherapies given to patients and as such is a novel dataset derived from TCGA data. The FOLFOX dataset is primarily first-line treatment, while the other drugs are not to be interpreted as first-line treatments. Each drug dataset is available in its own CSV file.

Citation
Dalibor Hrg, Balthasar Huber, Lukas A. Huber. (2020). TCGA Chemotherapy Response Dataset. Zenodo. http://doi.org/10.5281/zenodo.3719291

The results here are in whole or part based upon data generated by the TCGA Research Network: https://www.cancer.gov/tcga.

License
CC BY-SA 4.0 International https://creativecommons.org/licenses/by-sa/4.0. Authors take no liability for any use of this data.

Contributions
D. Hrg and B. Huber acknowledge major and equal work effort: data understanding, data science and dataset preparation (monotherapies and FOLFOX); L. A. Huber: help with the dictionary of drug names and curation/cleaning of FOLFOX entries, clinical validation.

Contact & Maintenance
dalibor.hrg@gmail.com
dalibor.hrg@i-med.ac.at
The Cancer Genome Atlas Rectum Adenocarcinoma Collection
cancerimagingarchive.net
dicom, n/a
Updated Jan 5, 2016
Cite
The Cancer Imaging Archive (2016). The Cancer Genome Atlas Rectum Adenocarcinoma Collection [Dataset]. http://doi.org/10.7937/K9/TCIA.2016.F7PPNPNU
Available download formats: dicom, n/a
Unique identifier
https://doi.org/10.7937/K9/TCIA.2016.F7PPNPNU
Dataset updated
Jan 5, 2016
Dataset authored and provided by
The Cancer Imaging Archive
License
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
Time period covered
May 29, 2020
Dataset funded by
National Cancer Institute (http://www.cancer.gov/)
Description
The Cancer Genome Atlas Rectum Adenocarcinoma (TCGA-READ) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA). Clinical, genetic, and pathological data resides in the Genomic Data Commons (GDC) Data Portal while the radiological data is stored on The Cancer Imaging Archive (TCIA).
Matched TCGA patient identifiers allow researchers to explore the TCGA/TCIA databases for correlations between tissue genotype, radiological phenotype and patient outcomes. Tissues for TCGA were collected from many sites all over the world in order to reach their accrual targets, usually around 500 specimens per cancer type. For this reason the image data sets are also extremely heterogeneous in terms of scanner modalities, manufacturers and acquisition protocols. In most cases the images were acquired as part of routine care and not as part of a controlled research study or clinical trial.
CIP TCGA Radiology Initiative
Imaging Source Site (ISS) Groups are being populated and governed by participants from institutions that have provided imaging data to the archive for a given cancer type. Modeled after TCGA analysis groups, ISS groups are given the opportunity to publish a marker paper for a given cancer type per the guidelines in the table above. This opportunity will generate increased participation in building these multi-institutional data sets as they become an open community resource. Learn more about the CIP TCGA Radiology Initiative.
TCGA Glioblastoma Multiforme (GBM) Clinical Data
data.niaid.nih.gov
Updated Jul 29, 2023
Cite
Swati Baskiyar (2023). TCGA Glioblastoma Multiforme (GBM) Clinical Data [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8190040
Dataset updated
Jul 29, 2023
Dataset authored and provided by
Swati Baskiyar
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Abstract:

The Cancer Genome Atlas (TCGA) was a large-scale collaborative project initiated by the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI). It aimed to comprehensively characterize the genomic and molecular landscape of various cancer types. This dataset includes curated survival data from the Pan-cancer Atlas paper titled "An Integrated TCGA Pan-Cancer Clinical Data Resource (TCGA-CDR) to drive high quality survival outcome analytics". The paper highlights four types of carefully curated survival endpoints, and recommends the use of the endpoints of OS, PFI, DFI, and DSS for each TCGA cancer type. The dataset also includes phenotypic information about GBM. The Sample IDs are unique identifiers, which can be paired with the gene expression dataset.

Inspiration:

This dataset was uploaded to U-BRITE for the GTKB project.

Instruction:

The survival and phenotype data were merged into one file. Empty columns were removed. Columns with the same value for every sample were also removed.

Acknowledgments:

Goldman, M.J., Craft, B., Hastie, M. et al. Visualizing and interpreting cancer genomics data via the Xena platform. Nat Biotechnol (2020). https://doi.org/10.1038/s41587-020-0546-8

Liu, Jianfang, Caesar-Johnson, Samantha J. et al. An Integrated TCGA Pan-Cancer Clinical Data Resource to Drive High-Quality Survival Outcome Analytics. Cell, Volume 173, Issue 2, 400 - 416.e11. https://doi.org/10.1016/j.cell.2018.02.052

The Cancer Genome Atlas Research Network., Weinstein, J., Collisson, E. et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat Genet 45, 1113–1120 (2013). https://doi.org/10.1038/ng.2764

U-BRITE last update: 07/13/2023
TCGA Lower Grade Glioma (LGG) Clinical Data
zenodo.org
data.niaid.nih.gov
csv
Updated Jul 29, 2023
Cite
Swati Baskiyar (2023). TCGA Lower Grade Glioma (LGG) Clinical Data [Dataset]. http://doi.org/10.5281/zenodo.8190154
Available download formats: csv
Unique identifier
https://doi.org/10.5281/zenodo.8190154
Dataset updated
Jul 29, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Swati Baskiyar
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Abstract:

The Cancer Genome Atlas (TCGA) was a large-scale collaborative project initiated by the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI). It aimed to comprehensively characterize the genomic and molecular landscape of various cancer types. This dataset includes curated survival data from the Pan-cancer Atlas paper titled "An Integrated TCGA Pan-Cancer Clinical Data Resource (TCGA-CDR) to drive high quality survival outcome analytics". The paper highlights four types of carefully curated survival endpoints, and recommends the use of the endpoints of OS, PFI, DFI, and DSS for each TCGA cancer type. The dataset also includes phenotypic information about LGG. The Sample IDs are unique identifiers, which can be paired with the gene expression dataset.

Inspiration:

This dataset was uploaded to U-BRITE for the GTKB project.

Instruction:

The survival and phenotype data were merged into one file. Empty columns were removed. Columns with the same value for every sample were also removed.

Acknowledgments:

Goldman, M.J., Craft, B., Hastie, M. et al. Visualizing and interpreting cancer genomics data via the Xena platform. Nat Biotechnol (2020). https://doi.org/10.1038/s41587-020-0546-8

Liu, Jianfang, Caesar-Johnson, Samantha J. et al. An Integrated TCGA Pan-Cancer Clinical Data Resource to Drive High-Quality Survival Outcome Analytics. Cell, Volume 173, Issue 2, 400 - 416.e11. https://doi.org/10.1016/j.cell.2018.02.052

The Cancer Genome Atlas Research Network., Weinstein, J., Collisson, E. et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat Genet 45, 1113–1120 (2013). https://doi.org/10.1038/ng.2764

U-BRITE last update: 07/13/2023
The Cancer Genome Atlas Liver Hepatocellular Carcinoma Collection
cancerimagingarchive.net
dicom, n/a
Updated May 29, 2020
Cite
The Cancer Imaging Archive (2020). The Cancer Genome Atlas Liver Hepatocellular Carcinoma Collection [Dataset]. http://doi.org/10.7937/K9/TCIA.2016.IMMQW8UQ
Available download formats: dicom, n/a
Unique identifier
https://doi.org/10.7937/K9/TCIA.2016.IMMQW8UQ
Dataset updated
May 29, 2020
Dataset authored and provided by
The Cancer Imaging Archive
License
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
Time period covered
May 29, 2020
Dataset funded by
National Cancer Institute (http://www.cancer.gov/)
Description
The Cancer Genome Atlas Liver Hepatocellular Carcinoma (TCGA-LIHC) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA). Clinical, genetic, and pathological data reside in the Genomic Data Commons (GDC) Data Portal, while the radiological data are stored on The Cancer Imaging Archive (TCIA).
Matched TCGA patient identifiers allow researchers to explore the TCGA/TCIA databases for correlations between tissue genotype, radiological phenotype and patient outcomes. Tissues for TCGA were collected from many sites all over the world in order to reach their accrual targets, usually around 500 specimens per cancer type. For this reason the image data sets are also extremely heterogeneous in terms of scanner modalities, manufacturers and acquisition protocols. In most cases the images were acquired as part of routine care and not as part of a controlled research study or clinical trial.
CIP TCGA Radiology Initiative
Imaging Source Site (ISS) Groups are being populated and governed by participants from institutions that have provided imaging data to the archive for a given cancer type. Modeled after TCGA analysis groups, ISS groups are given the opportunity to publish a marker paper for a given cancer type per the guidelines in the table above. This opportunity will generate increased participation in building these multi-institutional data sets as they become an open community resource. Learn more about the CIP TCGA Radiology Initiative.
TCGA BRCA cancer dataset
zenodo.org
explore.openaire.eu
bin
Updated Dec 11, 2020
Cite
Carlos Fernandez-Lozano (2020). TCGA BRCA cancer dataset [Dataset]. http://doi.org/10.5281/zenodo.4309168
Available download formats: bin
Unique identifier
https://doi.org/10.5281/zenodo.4309168
Dataset updated
Dec 11, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Carlos Fernandez-Lozano
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Following the same steps that we used in the previous course, we downloaded TCGA-BRCA using R and Bioconductor, in particular the TCGAbiolinks package. We downloaded transcriptome profiling data (gene expression quantification) where the experimental strategy is RNA-Seq and the workflow type is HTSeq-FPKM-UQ, restricted to primary solid tumor data of the Affymetrix GPL86 profile, together with clinical data.
TCGA Gene Expression Datasets
zenodo.org
data.niaid.nih.gov
csv
Updated Jul 29, 2023
Cite
Swati Baskiyar (2023). TCGA Gene Expression Datasets [Dataset]. http://doi.org/10.5281/zenodo.8192916
Available download formats: csv
Unique identifier
https://doi.org/10.5281/zenodo.8192916
Dataset updated
Jul 29, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Swati Baskiyar
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Abstract:

The Cancer Genome Atlas (TCGA) was a large-scale collaborative project initiated by the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI). It aimed to comprehensively characterize the genomic and molecular landscape of various cancer types. These datasets contain gene expression profiles of bladder urothelial carcinoma (BLCA), cervical squamous cell carcinoma and endocervical adenocarcinoma (CESC), glioblastoma multiforme (GBM), head & neck squamous cell carcinoma (HNSC), kidney renal clear cell carcinoma (KIRC), and lower grade glioma (LGG).

The gene expression profiles for BLCA, CESC, HNSC, KIRC, and LGG were measured experimentally using the Illumina HiSeq 2000 RNA Sequencing platform by the University of North Carolina TCGA genome characterization center. The gene expression profile of the GBM dataset was measured experimentally using the Affymetrix HT Human Genome U133a microarray platform by the Broad Institute of MIT and Harvard University cancer genomic characterization center.

Inspiration:

This dataset was uploaded to U-BRITE for the GTKB project.

Instruction:

The log₂(x+1) normalization was removed, and z-normalization was performed on the BLCA, CESC, HNSC, KIRC, and LGG datasets.

The log₂(x) normalization was removed, and z-normalization was performed on the GBM dataset.
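
As a rough illustration of the two steps above, the following Python sketch inverts a log₂ transform and then z-normalizes the result. It is not the uploader's code. The function names, the choice to normalize one gene's values across samples, and the use of the population standard deviation are assumptions, since the description does not say along which axis or with which estimator the z-scores were computed.

```python
from statistics import mean, pstdev


def undo_log2(values, pseudocount=1.0):
    """Invert y = log2(x + pseudocount); use pseudocount=0.0 for a plain log2(x)."""
    return [2.0 ** v - pseudocount for v in values]


def z_normalize(values):
    """Rescale values to mean 0 and standard deviation 1 (population std, an assumption)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant vector has no spread; return zeros instead of dividing by zero.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical log2(x+1) expression values for one gene across four samples.
logged = [0.0, 1.0, 2.0, 3.0]
raw = undo_log2(logged)                         # RNA-seq style: log2(x + 1)
z_scores = z_normalize(raw)
raw_gbm = undo_log2(logged, pseudocount=0.0)    # microarray style: log2(x)
```

The `pseudocount` parameter is the only difference between the two cases in the description: 1 for the RNA-seq datasets and 0 for the GBM microarray dataset.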

Acknowledgments:

Goldman, M.J., Craft, B., Hastie, M. et al. Visualizing and interpreting cancer genomics data via the Xena platform. Nat Biotechnol (2020). https://doi.org/10.1038/s41587-020-0546-8.

The Cancer Genome Atlas Research Network., Weinstein, J., Collisson, E. et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat Genet 45, 1113–1120 (2013). https://doi.org/10.1038/ng.2764.

U-BRITE last update: 07/13/2023
TCGA Pan-Cancer clinical Data
figshare.com
txt
Updated Apr 5, 2020
Cite
Yuchen LIU (2020). TCGA Pan-Cancer clinical Data [Dataset]. http://doi.org/10.6084/m9.figshare.12083910.v1
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.12083910.v1
Dataset updated
Apr 5, 2020
Dataset provided by
Figshare (http://figshare.com/)
Authors
Yuchen LIU
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
TCGA Pan-Cancer clinical Data, including PATIENT_ID, OS_MONTHS, OS_STATUS
clustering and survival analysis on multi-omics datasets
figshare.com
zip
Updated Nov 8, 2024
Cite
Shuting Lin (2024). clustering and survival analysis on multi-omics datasets [Dataset]. http://doi.org/10.6084/m9.figshare.27613242.v4
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.27613242.v4
Dataset updated
Nov 8, 2024
Dataset provided by
Figshare
Authors
Shuting Lin
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Multi-omics data: the input data for the analysis, including miRNA data, gene expression data, DNA methylation data, and survival outcome data. All the data were downloaded from TCGA.
Code: (1) data preprocessing; (2) clustering patients within each omics layer and performing Kaplan-Meier survival analysis to determine the association between patient clusters and survival outcomes; (3) differential expression analysis to identify features associated with patients who have consistent survival outcomes.
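
The Kaplan-Meier analysis mentioned in this description can be illustrated with a minimal product-limit estimator. This is a generic textbook sketch, not the code shipped in the archive. The function name and the toy follow-up times are hypothetical, and in practice one curve would be computed per patient cluster and the curves compared.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each time where an event occurred.

    times  -- follow-up time per patient
    events -- 1 if the event (e.g. death) was observed, 0 if the patient was censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei)
        leaving = sum(1 for ti in times if ti == t)  # events plus censored at time t
        if deaths:
            # Product-limit update: multiply by the fraction surviving past t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical cluster of four patients; the second one is censored at time 2.
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
```

Censored patients do not lower the survival estimate directly, but they shrink the at-risk count used at later event times.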

The Cancer Genome Atlas Stomach Adenocarcinoma Collection (TCGA-STAD)

12 scholarly articles cite this dataset.


