21 datasets found

Historical NCI Genomic Data Commons data (09-14-2017)
data.niaid.nih.gov
zenodo.org
Updated Jan 24, 2020
Cite
Seim, Inge (2020). Historical NCI Genomic Data Commons data (09-14-2017) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_1186944
Explore at:
Dataset updated
Jan 24, 2020
Dataset authored and provided by
Seim, Inge
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Historical NCI Genomic Data Commons data (v09-14-2017). Clinical ('phenotype') and gene expression (HTSeq FPKM-UQ).

TCGA-COAD.GDC_phenotype.tsv

dataset: phenotype - Phenotype

cohort: GDC TCGA Colon Cancer (COAD)
dataset ID: TCGA-COAD/Xena_Matrices/TCGA-COAD.GDC_phenotype.tsv
download: https://gdc.xenahubs.net/download/TCGA-COAD/Xena_Matrices/TCGA-COAD.GDC_phenotype.tsv.gz
samples: 570
version: 11-27-2017
hub: https://gdc.xenahubs.net
type of data: phenotype
author: Genomic Data Commons
raw data: https://docs.gdc.cancer.gov/Data/Release_Notes/Data_Release_Notes/#data-release-90
raw data: https://api.gdc.cancer.gov/data/
input data format: ROWs (samples) x COLUMNs (identifiers) (i.e. clinicalMatrix)
570 samples x 151 identifiers

TCGA-COAD.htseq_fpkm-uq.tsv

dataset: gene expression RNAseq - HTSeq - FPKM-UQ

cohort: GDC TCGA Colon Cancer (COAD)
dataset ID: TCGA-COAD/Xena_Matrices/TCGA-COAD.htseq_fpkm-uq.tsv
download: https://gdc.xenahubs.net/download/TCGA-COAD/Xena_Matrices/TCGA-COAD.htseq_fpkm-uq.tsv.gz
samples: 512
version: 09-14-2017
hub: https://gdc.xenahubs.net
type of data: gene expression RNAseq
unit: log2(fpkm-uq+1)
platform: Illumina
ID/Gene Mapping: https://gdc.xenahubs.net/download/probeMaps/gencode.v22.annotation.gene.probeMap.gz
author: Genomic Data Commons
raw data: https://docs.gdc.cancer.gov/Data/Release_Notes/Data_Release_Notes/#data-release-80
raw data: https://api.gdc.cancer.gov/data/
wrangling: Data from the same sample but from different vials/portions/analytes/aliquots is averaged; data from different samples is combined into genomicMatrix; all data is then log2(x+1) transformed.
input data format: ROWs (identifiers) x COLUMNs (samples) (i.e. genomicMatrix)
60,484 identifiers x 512 samples
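The wrangling described for the FPKM-UQ matrix (average measurements that belong to the same sample, combine samples into one matrix, then apply log2(x+1)) can be sketched as follows. This is a minimal illustration, not the hub's actual code: the function names, the input layout and the example values are assumptions, and how aliquot barcodes are mapped to sample IDs is left to the caller.

```python
import math
from collections import defaultdict


def wrangle(aliquots):
    """Average aliquots per sample, then log2(x + 1)-transform.

    aliquots: iterable of (sample_id, {gene_id: fpkm_uq}) pairs, where
    several pairs may share a sample_id (hypothetical input layout).
    Returns {gene_id: {sample_id: log2(mean_fpkm_uq + 1)}}, i.e. rows are
    identifiers and columns are samples, like a genomicMatrix.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for sample_id, values in aliquots:
        for gene_id, value in values.items():
            grouped[sample_id][gene_id].append(value)

    matrix = {}
    for sample_id, genes in grouped.items():
        for gene_id, values in genes.items():
            mean_value = sum(values) / len(values)
            matrix.setdefault(gene_id, {})[sample_id] = math.log2(mean_value + 1)
    return matrix


def to_fpkm_uq(log_value):
    """Invert the log2(x + 1) transform to recover the linear value."""
    return 2 ** log_value - 1


# Made-up values: sample S1 has two aliquots, S2 has one.
example = wrangle([
    ("S1", {"GENE_A": 1.0}),
    ("S1", {"GENE_A": 5.0}),
    ("S2", {"GENE_A": 0.0}),
])
```

Anyone comparing these values with linear-scale FPKM-UQ from another source would need to invert the transform first, which is what `to_fpkm_uq` does.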
Genomic Data Commons Data Portal (GDC Data Portal)
rrid.site
scicrunch.org
Updated May 24, 2025
Cite
(2025). Genomic Data Commons Data Portal (GDC Data Portal) [Dataset]. http://identifiers.org/RRID:SCR_014514
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_014514
Dataset updated
May 24, 2025
Description
A unified data repository of the National Cancer Institute (NCI)'s Genomic Data Commons (GDC) that enables data sharing across cancer genomic studies in support of precision medicine. The GDC supports several cancer genome programs at the NCI Center for Cancer Genomics (CCG), including The Cancer Genome Atlas (TCGA), Therapeutically Applicable Research to Generate Effective Treatments (TARGET), and the Cancer Genome Characterization Initiative (CGCI). The GDC Data Portal provides a platform for efficiently querying and downloading high quality and complete data. The GDC also provides a GDC Data Transfer Tool and a GDC API for programmatic access.
Metadata and data files supporting the published article: The therapeutic...
springernature.figshare.com
txt
Updated Jun 1, 2023
Cite
François BERTUCCI; Pascal Finetti; Anthony Goncalves; Daniel Birnbaum (2023). Metadata and data files supporting the published article: The therapeutic response of ER+/HER2- breast cancers differs according to the molecular Basal or Luminal subtype [Dataset]. http://doi.org/10.6084/m9.figshare.11558676.v1
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.11558676.v1
Dataset updated
Jun 1, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
François BERTUCCI; Pascal Finetti; Anthony Goncalves; Daniel Birnbaum
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Here, the authors performed an in-silico analysis on a meta-dataset including gene-expression data from 5,342 clinically defined estrogen receptor-positive/human epidermal growth factor receptor 2-negative (ER+/HER2-) breast cancers (BC), and DNA copy number/mutational and proteomic data, to determine whether the therapeutic response of ER+/HER2- breast cancers differs according to the molecular basal or luminal subtype.

Data access: The dataset Breast_cancer_classifications.csv supporting figure 1, table 1, and supplementary tables 1-3 is publicly available in the figshare repository as part of this data record. This study used and analysed 36 publicly available datasets that are all listed in supplementary table 8 and are cited in the data availability statement of the published article.

Study aims and methodology: To evaluate the response and/or potential vulnerability to hormone treatment (HT) and other systemic therapies of BC, and to assess the degree of difference between basal and luminal breast cancer subtypes, the authors performed an in-silico analysis of a meta-dataset including gene expression data from 8,982 non-redundant BCs and DNA copy number/mutational and proteomic data from TCGA. The aim was to compare the basal versus luminal samples. Out of the 8,982 samples in the database, 6,563 were defined as ER+ (5,342 according to immunohistochemistry (IHC) and 1,221 according to inferred status).

The authors analysed breast cancer gene expression data pooled from 36 public datasets (listed in supplementary table 8), comprising 8,982 invasive primary BCs. The pre-analytic data processing was done as described previously in https://doi.org/10.1038/s41416-018-0309-1. Please refer to the published article for more details on the methodology and statistical analysis.

Data supporting the figures, tables and supplementary tables in the published article:

Data supporting figure 1, table 1, and supplementary tables 1-3: Dataset Breast_cancer_classifications.csv is in .csv file format. The dataset includes histo-clinical and molecular data of the tumors analysed in the study, and is part of this data record.

Data supporting supplementary table 4: Dataset genome.wustl.edu_BRCA.IlluminaGA_DNASeq.Level_2.3.2.0.tar.gz.1 is a gzip-compressed tar archive of MAF format files. This dataset was accessed through the Genomic Data Commons (GDC) Data Portal and can be downloaded directly here: https://api.gdc.cancer.gov/data/afaf2790-04d4-453a-8c1b-75cf42ffd35f.

Data supporting supplementary table 5: Dataset gdc_manifest.txt consists of gz archives of txt format files. The file was accessed through the GDC Data Portal here: https://portal.gdc.cancer.gov/repository?facetTab=files&filters={"op":"and","content":[{"op":"in","content":{"field":"cases.project.project_id","value":["TCGA-BRCA"]}},{"op":"in","content":{"field":"files.access","value":["open"]}},{"op":"in","content":{"field":"files.analysis.workflow_type","value":["HTSeq - Counts"]}},{"op":"in","content":{"field":"files.experimental_strategy","value":["RNA-Seq"]}}]}&searchTableTab=files

Data supporting supplementary table 6: Dataset Table S5_Revised.xlsx is in .xlsx file format and is part of the supplementary information files of the published article.

Data supporting supplementary table 7: Dataset BRCA.RPPA.Level_3.tar is a tar archive of txt format files. The file was accessed through the GDC Data Portal and can be downloaded directly here: https://api.gdc.cancer.gov/data/85988e1b-4f7d-493e-96ae-9eee61ac2833.
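The portal link for supplementary table 5 carries its query as a nested JSON filter, and the direct downloads use the pattern https://api.gdc.cancer.gov/data/ followed by a file UUID. The sketch below rebuilds that same filter programmatically and URL-encodes it. The helper names are illustrative assumptions, no request is sent, and the workflow type "HTSeq - Counts" is copied from the record, so it may not match files in current GDC releases.

```python
import json
from urllib.parse import quote

GDC_DATA_ENDPOINT = "https://api.gdc.cancer.gov/data/"


def in_filter(field, values):
    """One 'in' clause in the filter syntax used by the link above."""
    return {"op": "in", "content": {"field": field, "value": list(values)}}


def build_filters():
    """Recreate the four-clause filter from the record's portal link."""
    return {
        "op": "and",
        "content": [
            in_filter("cases.project.project_id", ["TCGA-BRCA"]),
            in_filter("files.access", ["open"]),
            in_filter("files.analysis.workflow_type", ["HTSeq - Counts"]),
            in_filter("files.experimental_strategy", ["RNA-Seq"]),
        ],
    }


def encoded_filters(filters):
    """Serialize compactly and percent-encode for use in a query string."""
    return quote(json.dumps(filters, separators=(",", ":")))


def data_url(file_uuid):
    """Direct-download URL for a single file UUID."""
    return GDC_DATA_ENDPOINT + file_uuid


filters = build_filters()
maf_url = data_url("afaf2790-04d4-453a-8c1b-75cf42ffd35f")
```

Building the filter from small clauses keeps each field/value pair easy to change, for example when swapping the project ID for another TCGA cohort.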
Table 1_TCGADownloadHelper: simplifying TCGA data extraction and...
frontiersin.figshare.com
pdf
Updated May 2, 2025
Cite
Alexandra Anke Baumann; Olaf Wolkenhauer; Markus Wolfien (2025). Table 1_TCGADownloadHelper: simplifying TCGA data extraction and preprocessing.pdf [Dataset]. http://doi.org/10.3389/fgene.2025.1569290.s001
Explore at:
Available download formats: pdf
Unique identifier
https://doi.org/10.3389/fgene.2025.1569290.s001
Dataset updated
May 2, 2025
Dataset provided by
Frontiers
Authors
Alexandra Anke Baumann; Olaf Wolkenhauer; Markus Wolfien
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Cancer Genome Atlas (TCGA) provides comprehensive genomic data across various cancer types. However, complex file naming conventions and the necessity of linking disparate data types to individual case IDs can be challenging for first-time users. While other tools have been introduced to facilitate TCGA data handling, they lack a straightforward combination of all required steps. To address this, we developed a streamlined pipeline using the Genomic Data Commons (GDC) portal’s cart system for file selection and the GDC Data Transfer Tool for data downloads. We use the Sample Sheet provided by the GDC portal to replace the default 36-character opaque file IDs and filenames with human-readable case IDs. We developed a pipeline integrating customizable Python scripts in a Jupyter Notebook and a Snakemake pipeline for ID mapping along with automating data preprocessing tasks (https://github.com/alex-baumann-ur/TCGADownloadHelper). Our pipeline simplifies the data download process by modifying manifest files to focus on specific subsets, facilitating the handling of multimodal data sets related to single patients. The pipeline essentially reduced the effort required to preprocess data. Overall, this pipeline enables researchers to efficiently navigate the complexities of TCGA data extraction and preprocessing. By establishing a clear step-by-step approach, we provide a streamlined methodology that minimizes errors, enhances data usability, and supports the broader utilization of TCGA data in cancer research. It is particularly beneficial for researchers new to genomic data analysis, offering them a practical framework prior to conducting their TCGA studies.
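The ID-mapping idea in this description (use the GDC Sample Sheet to swap opaque file IDs for readable case IDs) can be illustrated with a small sketch. This is not the TCGADownloadHelper code: the column names "File ID", "File Name", "Case ID" and "Sample ID" are assumed to match the sheet exported by the portal, the naming scheme is hypothetical, and the example row is made up.

```python
import csv
import io

# Made-up, tab-separated sample sheet with a 36-character placeholder file ID.
SAMPLE_SHEET = "\n".join([
    "\t".join(["File ID", "File Name", "Case ID", "Sample ID"]),
    "\t".join([
        "00000000-0000-0000-0000-000000000001",
        "f3a1.rna_seq.counts.tsv",
        "TCGA-XX-0001",
        "TCGA-XX-0001-01A",
    ]),
])


def build_rename_plan(sample_sheet_text):
    """Map each file ID to a readable name built from case and sample IDs.

    The original extension (everything after the first dot of the file
    name) is kept so downstream tools still recognize the file type.
    """
    reader = csv.DictReader(io.StringIO(sample_sheet_text), delimiter="\t")
    plan = {}
    for row in reader:
        _, dot, extension = row["File Name"].partition(".")
        plan[row["File ID"]] = f'{row["Case ID"]}_{row["Sample ID"]}{dot}{extension}'
    return plan


plan = build_rename_plan(SAMPLE_SHEET)
```

Including the sample ID in the new name is a design choice made here to avoid collisions when one case contributes several files.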
Genomic Data Commons Data Portal
bioregistry.io
registry.identifiers.org
Updated Apr 23, 2021
Cite
(2021). Genomic Data Commons Data Portal [Dataset]. https://bioregistry.io/gdc
Explore at:
Dataset updated
Apr 23, 2021
Description
The GDC Data Portal is a robust data-driven platform that allows cancer researchers and bioinformaticians to search and download cancer data for analysis.
Pan-cancer Aberrant Pathway Activity Analysis (PAPAA)
zenodo.org
explore.openaire.eu
application/gzip, csv +1
Updated Dec 5, 2020
Cite
DANIEL BLANKENBERG; VIJAY NAGAMPALLI (2020). Pan-cancer Aberrant Pathway Activity Analysis (PAPAA) [Dataset]. http://doi.org/10.5281/zenodo.3629709
Explore at:
Available download formats: application/gzip, tsv, csv
Unique identifier
https://doi.org/10.5281/zenodo.3629709
Dataset updated
Dec 5, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
DANIEL BLANKENBERG; VIJAY NAGAMPALLI
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Information about the dataset files:

1) pancan_rnaseq_freeze.tsv.gz: Publicly available gene expression data for the TCGA Pan-cancer dataset. File: PanCanAtlas EBPlusPlusAdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.tsv was processed using script process_sample_freeze.py by Gregory Way et al as described in https://github.com/greenelab/pancancer/ data processing and initialization steps. [http://api.gdc.cancer.gov/data/3586c0da-64d0-4b74-a449-5ff4d9136611] [https://doi.org/10.1016/j.celrep.2018.03.046]

2) pancan_mutation_freeze.tsv.gz: Publicly available Mutational information for TCGA Pan-cancer dataset. File: mc3.v0.2.8.PUBLIC.maf.gz was processed using script process_sample_freeze.py by Gregory Way et al as described in https://github.com/greenelab/pancancer/ data processing and initialization steps. [http://api.gdc.cancer.gov/data/1c8cfe5f-e52d-41ba-94da-f15ea1337efc] [https://doi.org/10.1016/j.celrep.2018.03.046]

3) pancan_GISTIC_threshold.tsv.gz: Publicly available gene-level copy number information of the TCGA Pan-cancer dataset. This file is processed using script process_copynumber.py by Gregory Way et al as described in https://github.com/greenelab/pancancer/ data processing and initialization steps. The files copy_number_loss_status.tsv.gz and copy_number_gain_status.tsv.gz generated from this data are used as inputs in our Galaxy pipeline. [https://xenabrowser.net/datapages/?cohort=TCGA%20Pan-Cancer%20(PANCAN)&removeHub=https%3A%2F%2Fxena.treehouse.gi.ucsc.edu%3A443] [https://doi.org/10.1016/j.celrep.2018.03.046]

4) mutation_burden_freeze.tsv.gz: Publicly available mutational information for the TCGA Pan-cancer dataset. File: mc3.v0.2.8.PUBLIC.maf.gz was processed using script process_sample_freeze.py by Gregory Way et al as described in https://github.com/greenelab/pancancer/ data processing and initialization steps. [https://github.com/greenelab/pancancer/] [http://api.gdc.cancer.gov/data/1c8cfe5f-e52d-41ba-94da-f15ea1337efc] [https://doi.org/10.1016/j.celrep.2018.03.046]

5) sample_freeze.tsv or sample_freeze_version4_modify.tsv: The file lists the frozen samples as determined by the TCGA PanCancer Atlas consortium along with raw RNAseq and mutation data. These were previously determined and included for all downstream analysis. All other datasets were processed and subset according to the frozen samples. [https://github.com/greenelab/pancancer/]

6) vogelstein_cancergenes.tsv: compendium of OG and TSG used for the analysis. [https://github.com/greenelab/pancancer/]

7) CCLE_DepMap_18Q1_maf_20180207.txt.gz: Publicly available mutational data for CCLE cell lines from Broad Institute Cancer Cell Line Encyclopedia (CCLE) / DepMap Portal. [https://depmap.org/portal/download/api/download/external?file_name=ccle%2FCCLE_DepMap_18Q1_maf_20180207.txt]

8) ccle_rnaseq_genes_rpkm_20180929.gct.gz: Publicly available Expression data for 1019 cell lines (RPKM) from Broad Institute Cancer Cell Line Encyclopedia (CCLE) / DepMap Portal. [https://depmap.org/portal/download/api/download/external?file_name=ccle%2Fccle_2019%2FCCLE_RNAseq_genes_rpkm_20180929.gct.gz]

9) CCLE_MUT_CNA_AMP_DEL_binary_Revealer.gct: Publicly available merged Mutational and copy number alterations that include gene amplifications and deletions for the CCLE cell lines. This data is represented in the binary format and provided by the Broad Institute Cancer Cell Line Encyclopedia (CCLE) / DepMap Portal. [https://data.broadinstitute.org/ccle_legacy_data/binary_calls_for_copy_number_and_mutation_data/CCLE_MUT_CNA_AMP_DEL_binary_Revealer.gct]

10) GDSC_cell_lines_EXP_CCLE_names.csv.gz: Publicly available RMA normalized expression data for Genomics of Drug Sensitivity in Cancer (GDSC) cell lines. File gdsc_cell_line_RMA_proc_basalExp.csv was downloaded. This data was subset to the 389 cell lines that are common to CCLE and GDSC. All the GDSC cell line names were replaced with CCLE cell line names for further processing. [https://www.cancerrxgene.org/gdsc1000/GDSC1000_WebResources//Data/preprocessed/Cell_line_RMA_proc_basalExp.txt.zip]

11) GDSC_CCLE_common_mut_cnv_binary.csv.gz: A subset of merged Mutational and copy number alterations that include gene amplifications and deletions for common cell lines between GDSC and CCLE. This file is generated using CCLE_MUT_CNA_AMP_DEL_binary_Revealer.gct and a list of common cell lines.

12) gdsc1_ccle_pharm_fitted_dose_data.txt.gz: Pharmacological data for GDSC1 cell lines. [ftp://ftp.sanger.ac.uk/pub/project/cancerrxgene/releases/current_release/GDSC1_fitted_dose_response_15Oct19.xlsx]

13) gdsc2_ccle_pharm_fitted_dose_data.txt.gz: Pharmacological data for GDSC2 cell lines. [ftp://ftp.sanger.ac.uk/pub/project/cancerrxgene/releases/current_release/GDSC2_fitted_dose_response_15Oct19.xlsx]

14) compounds.csv: list of pharmacological compounds tested for our analysis

15) tcga_dictonary.tsv: list of cancer types used in the analysis.

16) seg_based_scores.tsv: Measurement of total copy number burden, Percent of genome altered by copy number alterations. This file was used as part of the Pancancer analysis by Gregory Way et al as described in https://github.com/greenelab/pancancer/ data processing and initialization steps. [https://github.com/greenelab/pancancer/]
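Items 10 and 11 in the list above describe restricting GDSC data to cell lines shared with CCLE and relabeling them with CCLE names. A minimal sketch of that step is given below. It is not the authors' script: the function name, the name-mapping table and the expression values are illustrative assumptions.

```python
def subset_to_common(gdsc_profiles, gdsc_to_ccle, ccle_lines):
    """Keep GDSC cell lines that also exist in CCLE, keyed by CCLE name.

    gdsc_profiles: {gdsc_name: {gene: value}}
    gdsc_to_ccle:  {gdsc_name: ccle_name} (hypothetical lookup table)
    ccle_lines:    set of CCLE cell line names
    """
    common = {}
    for gdsc_name, profile in gdsc_profiles.items():
        ccle_name = gdsc_to_ccle.get(gdsc_name)
        # Lines without a mapping, or mapped to a name CCLE lacks, are dropped.
        if ccle_name in ccle_lines:
            common[ccle_name] = profile
    return common


# Made-up expression values for three GDSC-style names.
gdsc_profiles = {
    "A549": {"TP53": 5.1},
    "MCF7": {"TP53": 6.3},
    "ONLY_IN_GDSC": {"TP53": 4.0},
}
gdsc_to_ccle = {"A549": "A549_LUNG", "MCF7": "MCF7_BREAST"}
ccle_lines = {"A549_LUNG", "MCF7_BREAST", "HELA_CERVIX"}

common_profiles = subset_to_common(gdsc_profiles, gdsc_to_ccle, ccle_lines)
```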
The Cancer Genome Atlas (TCGA) RNA-seq meta-analysis
figshare.com
xlsx
Updated Feb 2, 2018
Cite
Namshik Han (2018). The Cancer Genome Atlas (TCGA) RNA-seq meta-analysis [Dataset]. http://doi.org/10.6084/m9.figshare.5851743.v1
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.5851743.v1
Dataset updated
Feb 2, 2018
Dataset provided by
figshare
Authors
Namshik Han
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
TCGA RNA-seq V2 Level3 data were downloaded from the TCGA Genomic Data Commons Data Portal (https://gdc-portal.nci.nih.gov), consisting of 11,303 samples in 34 cancer projects (33 cancer types). Nine cancer types that do not have corresponding non-tumour samples were filtered out, and the analysis was focused on tumour versus non-tumour comparison. 24 cancer types were used in this meta-analysis: BLCA, BRCA, CESC, CHOL, COAD, ESCA, GBM, HNSC, KICH, KIRC, KIRP, LIHC, LUAD, LUSC, PAAD, PCPG, PRAD, READ, SARC, SKCM, STAD, THCA, THYM, UCEC (https://gdc-portal.nci.nih.gov). The nine filtered cancer types were ACC, DLBC, LAML, LGG, MESO, OV, TGCT, UCS and UVM. To extract expression values from TCGA RNA-seq data, we used genomic coordinates to retrieve UCSC Transcript IDs that correspond to the identifiers in TCGA RNA-seq V2 Level3 data (isoform level). The GAF (General Annotation Format) file was used to map the coordinates to UCSC Transcript IDs, and it was downloaded from https://tcga-data.nci.nih.gov/docs/GAF/GAF.hg19.June2011.bundle/outputs/TCGA.hg19.June2011.gaf. This file contains genomic annotations shared by all TCGA projects. More details of the GAF file format can be found at https://tcga-data.nci.nih.gov/docs/GAF/GAF3.0/GAF_v3_file_description.docx. We filtered out any coding exons overlapping UCSC Transcript IDs to eliminate expression values of coding genes and evaluate lncRNA expression.

We could find the expression values of 443 pcRNAs and 203 tapRNAs in TCGA data, as many non-coding regions are not yet fully annotated in the TCGA RNA-seq V2 Level3 data. The expression values of pcRNAs and tapRNAs were extracted and clustered by an unsupervised Pearson correlation method (Supplementary Figure 18A). The expression values of tapRNA-associated coding genes were also extracted and used to generate the heat-map (Supplementary Figure 18B), which shows a pattern of expression similar to that of tapRNAs across the cancer types.

To show that tapRNAs and associated coding genes have similar expression profiles in cancers, we generated a Spearman's rank-order correlation heatmap (Figure 6A) between tapRNAs and their associated coding genes based on the TCGA RNA-seq data. We used the MATLAB function corr to calculate Spearman's rho. This function takes two matrices X (197-by-8,850 expression profiling matrix of tapRNAs) and Y (197-by-8,850 expression profiling matrix of tapRNA-associated coding genes) and returns an 8,850-by-8,850 matrix containing the pairwise correlation coefficient between each pair of the 8,850 columns (TCGA cancer samples in Supplementary Figure 18A and B). Thus, the rank-order correlation matrix that we computed from the matrices of expression profiling data (Supplementary Figure 18A and B) allowed us to compare the correlation between two column vectors, i.e. cancer samples. This function also returns a matrix of p-values for testing the hypothesis of no correlation against the alternative that there is a nonzero correlation. Each element of the p-value matrix is the p-value for the corresponding element of Spearman's rho. The p-values for Spearman's rho are calculated using large-sample approximations. To check the significance level of the correlation between a tapRNA and its associated coding gene, the diagonal of the p-value matrix was extracted and used. The median is 1.31 × 10^-11 and the mean is 1.03 × 10^-4 with standard deviation 0.0029.

To identify cancer-specific tapRNAs, we considered not only the global expression pattern of a given tapRNA in each cancer type, but also the expression pattern of a specific sub-group that is significantly distinct, to take into account cancer sample heterogeneity. Thus, two conditions were applied: (1) the average expression level of a tapRNA in a given cancer type is in the top 10% or bottom 10%, and (2) a tapRNA has at least 10% of samples in a given cancer type that are significantly up-regulated (Z-score > 2) or down-regulated (Z-score < -2).
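The column-wise Spearman comparison described above can be illustrated with a small pure-Python sketch. It is a stand-in for the MATLAB corr call, not a reproduction of it: the function names are illustrative, only the matched-column (diagonal) coefficients are computed, p-values are omitted, and the tiny matrices are made-up numbers.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError for a constant input."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def matched_column_rho(X, Y):
    """Rho between column c of X and column c of Y (rows = genes)."""
    return [spearman(cx, cy) for cx, cy in zip(zip(*X), zip(*Y))]


# Made-up 3-gene x 2-sample matrices.
X = [[1, 3], [2, 2], [3, 1]]
Y = [[2, 1], [4, 2], [6, 3]]
diagonal_rho = matched_column_rho(X, Y)
```

Computing only matched columns avoids building the full 8,850-by-8,850 matrix when just the diagonal is needed.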
hCINAP expression in colorectal cancer
figshare.com
docx
Updated May 31, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Yapeng Ji; Zefang Zhang; Zemin Zhang; Xiaofeng Zheng (2023). hCINAP expression in colorectal cancer [Dataset]. http://doi.org/10.6084/m9.figshare.4737181.v3
Explore at:
Available download formats: docx
Unique identifier
https://doi.org/10.6084/m9.figshare.4737181.v3
Dataset updated
May 31, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Yapeng Ji; Zefang Zhang; Zemin Zhang; Xiaofeng Zheng
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The authors declare that the data analysis processes supporting the findings of this study are available within the article and its Supplementary Information files. The TCGA gene expression profile data, as recomputed based on gencode v23, were downloaded from UCSC Xena (http://xena.ucsc.edu/). The TCGA clinical data were downloaded from the GDC Data Portal (https://gdc-portal.nci.nih.gov/), with accession number phs000178.v9.p8 in dbGaP.

Supplementary Information: To analyze hCINAP expression in CRC, we downloaded the recomputed TCGA gene expression datasets for the COAD and READ cancer types from UCSC Xena (http://xena.ucsc.edu/). The gene model was based on gencode v23, and the expression unit is TPM (transcripts per million). The clinical data were downloaded from the GDC Data Portal (https://gdc-portal.nci.nih.gov/).

For differential expression analysis, we compiled a selected sample set of 367 tumor and 51 normal samples, in which each sample has information available for clinical variables such as gender, age and race (Supplementary Table 1). For expression analysis by pathological stage, we only used tumor samples with stage information (Supplementary Table 1). The dataset used for profiling gene expression by CRC subtype was compiled based on the consensus molecular subtypes (CMSs) described previously [PMID: 26457759], and contains 265 tumor samples (Supplementary Table 1).
Pan-cancer Aberrant Pathway Activity Analysis (PAPAA)
explore.openaire.eu
Updated Jan 22, 2020
Cite
The citation is currently not available for this dataset.
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.3625200
Dataset updated
Jan 22, 2020
Authors
DANIEL BLANKENBERG; VIJAY NAGAMPALLI
Description
Information about the dataset files:

1) pancan_rnaseq_freeze.tsv.gz: Publicly available gene expression data for the TCGA Pan-cancer dataset. File: PanCanAtlas EBPlusPlusAdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.tsv was processed using script process_sample_freeze.py by Gregory Way et al as described in https://github.com/greenelab/pancancer/ data processing and initialization steps. [http://api.gdc.cancer.gov/data/3586c0da-64d0-4b74-a449-5ff4d9136611] [https://doi.org/10.1016/j.celrep.2018.03.046]

2) pancan_mutation_freeze.tsv.gz: Publicly available mutational information for the TCGA Pan-cancer dataset. File: mc3.v0.2.8.PUBLIC.maf.gz was processed using script process_sample_freeze.py by Gregory Way et al as described in https://github.com/greenelab/pancancer/ data processing and initialization steps. [http://api.gdc.cancer.gov/data/1c8cfe5f-e52d-41ba-94da-f15ea1337efc] [https://doi.org/10.1016/j.celrep.2018.03.046]

3) pancan_GISTIC_threshold.tsv.gz: Publicly available gene-level copy number information of the TCGA Pan-cancer dataset. This file is processed using script process_copynumber.py by Gregory Way et al as described in https://github.com/greenelab/pancancer/ data processing and initialization steps. The files copy_number_loss_status.tsv.gz and copy_number_gain_status.tsv.gz generated from this data are used as inputs in our Galaxy pipeline. [https://xenabrowser.net/datapages/?cohort=TCGA%20Pan-Cancer%20(PANCAN)&removeHub=https%3A%2F%2Fxena.treehouse.gi.ucsc.edu%3A443] [https://doi.org/10.1016/j.celrep.2018.03.046]

4) mutation_burden_freeze.tsv.gz: Publicly available mutational information for the TCGA Pan-cancer dataset. File: mc3.v0.2.8.PUBLIC.maf.gz was processed using script process_sample_freeze.py by Gregory Way et al as described in https://github.com/greenelab/pancancer/ data processing and initialization steps. [https://github.com/greenelab/pancancer/] [http://api.gdc.cancer.gov/data/1c8cfe5f-e52d-41ba-94da-f15ea1337efc] [https://doi.org/10.1016/j.celrep.2018.03.046]

5) sample_freeze.tsv or sample_freeze_version4_modify.tsv: The file lists the frozen samples as determined by the TCGA PanCancer Atlas consortium along with raw RNAseq and mutation data. These were previously determined and included for all downstream analysis. All other datasets were processed and subset according to the frozen samples. [https://github.com/greenelab/pancancer/]

6) cosmic_cancer_classification.tsv: Compendium of OG and TSG used for the analysis. Added additional genes from the cosmic database to volgelstein_cancer_classification.tsv [https://github.com/greenelab/pancancer/]

7) CCLE_DepMap_18Q1_maf_20180207.txt.gz: Publicly available mutational data for CCLE cell lines from Broad Institute Cancer Cell Line Encyclopedia (CCLE) / DepMap Portal. [https://depmap.org/portal/download/api/download/external?file_name=ccle%2FCCLE_DepMap_18Q1_maf_20180207.txt]

8) ccle_rnaseq_genes_rpkm_20180929_mod.tsv.gz: Publicly available expression data for 1019 cell lines (RPKM) from Broad Institute Cancer Cell Line Encyclopedia (CCLE) / DepMap Portal. [https://depmap.org/portal/download/api/download/external?file_name=ccle%2Fccle_2019%2FCCLE_RNAseq_genes_rpkm_20180929.gct.gz]

9) CCLE_MUT_CNA_AMP_DEL_binary_Revealer.tsv: Publicly available merged mutational and copy number alterations that include gene amplifications and deletions for the CCLE cell lines. This data is represented in the binary format and provided by the Broad Institute Cancer Cell Line Encyclopedia (CCLE) / DepMap Portal. [https://data.broadinstitute.org/ccle_legacy_data/binary_calls_for_copy_number_and_mutation_data/CCLE_MUT_CNA_AMP_DEL_binary_Revealer.gct]

10) GDSC_cell_lines_EXP_CCLE_names.tsv.gz: Publicly available RMA normalized expression data for Genomics of Drug Sensitivity in Cancer (GDSC) cell lines. File gdsc_cell_line_RMA_proc_basalExp.csv was downloaded. This data was subset to the 389 cell lines that are common to CCLE and GDSC. All the GDSC cell line names were replaced with CCLE cell line names for further processing. [https://www.cancerrxgene.org/gdsc1000/GDSC1000_WebResources//Data/preprocessed/Cell_line_RMA_proc_basalExp.txt.zip]

11) GDSC_CCLE_common_mut_cnv_binary.tsv.gz: A subset of merged mutational and copy number alterations that include gene amplifications and deletions for common cell lines between GDSC and CCLE. This file is generated using CCLE_MUT_CNA_AMP_DEL_binary_Revealer.tsv and a list of common cell lines.

12) gdsc1_ccle_pharm_fitted_dose_data.txt.gz: Pharmacological data for GDSC1 cell lines. [ftp://ftp.sanger.ac.uk/pub/project/cancerrxgene/releases/current_release/GDSC1_fitted_dose_response_15Oct19.xlsx]

13) gdsc2_ccle_pharm_fitted_dose_data.txt.gz: Pharmacological data for GDSC2 cell lines. [ftp://ftp.sanger.ac.uk/pub/project/cancerrxgene/releases/current_release/GDSC2_fitted_dose_response_15Oct19.xlsx]

14) compounds_of_interest.txt: list of pharmacological compounds tested for our analysis, taken from ftp://ftp.sanger.ac.uk/pub4/cancerrxgen...
Multi-omic and survival datasets used for "DeepProg: an ensemble of...
figshare.com
application/x-gzip
Updated Jun 24, 2021
Cite
Olivier Poirion; Lana Garmire; Kumard Deep; Sijia Huang; Zheng Jing (2021). Multi-omic and survival datasets used for "DeepProg: an ensemble of deep-learning and machine-learning models for prognosis prediction using multi-omics data" [Dataset]. http://doi.org/10.6084/m9.figshare.14832813.v1
Explore at:
Available download formats: application/x-gzip
Unique identifier
https://doi.org/10.6084/m9.figshare.14832813.v1
Dataset updated
Jun 24, 2021
Dataset provided by
Figshare (http://figshare.com/)
Authors
Olivier Poirion; Lana Garmire; Kumard Deep; Sijia Huang; Zheng Jing
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We obtained the 32 cancer multi-omic datasets from NCBI using the TCGA portal (https://tcgadata.nci.nih.gov/tcga/). We used the package TCGA-Assembler (version 2.0.5) and wrote custom scripts to download RNA-Seq (UNC IlluminaHiSeq RNASeqV2), miRNA sequencing (BCGSC IlluminaHiSeq, Level 3), and DNA methylation (JHU-USC HumanMethylation450) data from the TCGA website on November 4-14, 2017. We also obtained the survival information from the portal: https://portal.gdc.cancer.gov/. We used the same preprocessing steps as detailed in our previous study. We first downloaded RNA-Seq, miRNA-Seq and methylation data using the functions DownloadRNASeqData, DownloadmiRNASeqData, and DownloadMethylationData from TCGA-Assembler, respectively. Then, we processed the data with the functions ProcessRNASeqData, ProcessmiRNASeqData, and ProcessMethylation450Data. In addition, we processed the methylation data with the function CalculateSingleValueMethylationData. Finally, for each omic data type, we created a gene-by-sample data matrix in tab-separated values (TSV) format using a custom script.
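The final step, writing a gene-by-sample matrix as TSV, could look like the sketch below. The authors' custom script is not part of this record, so the function name, the "gene" header label, the "NA" placeholder for missing values and the example numbers are all assumptions.

```python
import csv
import io


def write_gene_by_sample(matrix, handle):
    """Write {gene: {sample: value}} as a TSV with genes as rows.

    Samples become columns in sorted order; a gene with no value for a
    sample gets the placeholder "NA" (an assumption of this sketch).
    """
    samples = sorted({s for row in matrix.values() for s in row})
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["gene"] + samples)
    for gene in sorted(matrix):
        writer.writerow([gene] + [matrix[gene].get(s, "NA") for s in samples])


# Made-up values for two genes and two samples.
buffer = io.StringIO()
write_gene_by_sample(
    {"TP53": {"S1": 1.5, "S2": 2.0}, "BRCA1": {"S1": 0.0}},
    buffer,
)
tsv_text = buffer.getvalue()
```

Writing to a file handle rather than a path lets the same function target an in-memory buffer, a plain file, or a gzip stream.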
TCGA DATA
figshare.com
zip
Updated Jun 23, 2023
Cite
Songtao (2023). TCGA DATA [Dataset]. http://doi.org/10.6084/m9.figshare.23566341.v1
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.23566341.v1
Dataset updated
Jun 23, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Songtao
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
RNA-sequencing expression (level 3) profiles and corresponding clinical information for several tumors were downloaded from the TCGA dataset (https://portal.gdc.com).
Histological images for MSI vs. MSS classification in gastrointestinal...
zenodo.org
explore.openaire.eu
zip
Updated Jan 24, 2020
Cite
Jakob Nikolas Kather (2020). Histological images for MSI vs. MSS classification in gastrointestinal cancer, FFPE samples [Dataset]. http://doi.org/10.5281/zenodo.2530835
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.2530835
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Jakob Nikolas Kather
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This repository contains 411,890 unique image patches derived from histological images of colorectal cancer and gastric cancer patients in the TCGA cohort (original whole slide SVS images are freely available at https://portal.gdc.cancer.gov/). All images in this repository are derived from formalin-fixed paraffin-embedded (FFPE) diagnostic slides ("DX" at the GDC data portal). This is explained well in this blog: http://www.andrewjanowczyk.com/download-tcga-digital-pathology-images-ffpe/

Preprocessing

All SVS slides were preprocessed as follows:

1. automatic detection of tumor

2. resizing to 224 px x 224 px at a resolution of 0.5 µm/px

4. color normalization with the Macenko method (Macenko et al., 2009, http://wwwx.cs.unc.edu/~mn/sites/default/files/macenko2009.pdf)

5. assignment of patients to either "MSS" (microsatellite stable) or "MSIMUT" (microsatellite instable or highly mutated)

6. randomization of patients to training and testing sets (~70% and ~30%). Randomization was done on a patient level rather than on a slide or tile level

7. equilibration of training sets by undersampling (removing excess tiles in MSS class in a random way)
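The last two steps (patient-level randomization and undersampling of the training set) can be illustrated with a minimal sketch. It assumes each tile carries a patient ID and a class label; the function names, the seed handling, and the toy data are hypothetical and are not taken from the authors' pipeline. The sketch trims every class to the size of the smallest one, which matches removing excess MSS tiles only when MSS is the larger class.

```python
import random
from collections import defaultdict


def split_patients(patient_ids, test_fraction=0.3, seed=0):
    """Assign whole patients (not individual tiles) to a training or a test set."""
    ids = sorted(set(patient_ids))
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_test = round(len(ids) * test_fraction)
    return set(ids[n_test:]), set(ids[:n_test])


def undersample(tiles, seed=0):
    """Randomly drop tiles so that every class keeps as many tiles as the smallest class.

    tiles is a list of (tile_name, patient_id, label) tuples.
    """
    by_label = defaultdict(list)
    for tile in tiles:
        by_label[tile[2]].append(tile)
    n_keep = min(len(group) for group in by_label.values())
    rng = random.Random(seed)
    balanced = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], n_keep))
    return balanced


# Toy data: 10 hypothetical patients. Even-numbered patients are MSS with
# 5 tiles each; odd-numbered patients are MSIMUT with 2 tiles each.
tiles = []
for p in range(10):
    label = "MSS" if p % 2 == 0 else "MSIMUT"
    n_tiles = 5 if label == "MSS" else 2
    tiles += [(f"tile_{p}_{t}", f"P{p}", label) for t in range(n_tiles)]

train_ids, test_ids = split_patients([t[1] for t in tiles])
train_tiles = undersample([t for t in tiles if t[1] in train_ids])
test_tiles = [t for t in tiles if t[1] in test_ids]
```

Only the training tiles are balanced here; the test tiles keep their natural class distribution, which is consistent with the unequal MSS and MSIMUT counts reported for the test folders in the file description.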

File description

1. STAD_TRAIN_MSS - training images (~70% of all patients) for gastric (stomach) cancer TCGA patients with MSS (microsatellite stable) tumors, 50285 unique image patches; FFPE samples

2. STAD_TRAIN_MSIMUT - training images (~70% of all patients) for gastric (stomach) cancer TCGA patients with MSI (microsatellite instable) or highly mutated tumors, 50285 unique image patches; FFPE samples

3. STAD_TEST_MSS - test images (~30% of all patients) for gastric (stomach) cancer TCGA patients with MSS (microsatellite stable) tumors, 90104 unique image patches; FFPE samples

4. STAD_TEST_MSIMUT - test images (~30% of all patients) for gastric (stomach) cancer TCGA patients with MSI (microsatellite instable) or highly mutated tumors, 27904 unique image patches; FFPE samples

5. CRC_DX_TEST_MSIMUT - test images (~30% of all patients) for colorectal cancer TCGA patients with MSI (microsatellite instable) or highly mutated tumors, 29335 unique image patches; FFPE samples

6. CRC_DX_TEST_MSS - test images (~30% of all patients) for colorectal cancer TCGA patients with MSS (microsatellite stable) tumors, 70569 unique image patches; FFPE samples

7. CRC_DX_TRAIN_MSIMUT - training images (~70% of all patients) for colorectal cancer TCGA patients with MSI (microsatellite instable) or highly mutated tumors, 46704 unique image patches; FFPE samples

8. CRC_DX_TRAIN_MSS - training images (~70% of all patients) for colorectal cancer TCGA patients with MSS (microsatellite stable) tumors, 46704 unique image patches; FFPE samples
Identification of immune-related genes prognostic index for predicting...
figshare.com
xlsx
Updated Apr 7, 2022
Cite
Zhong-Qing Liang (2022). Identification of immune-related genes prognostic index for predicting survival and immunotherapy in colorectal carcinoma [Dataset]. http://doi.org/10.6084/m9.figshare.19534810.v4
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.19534810.v4
Dataset updated
Apr 7, 2022
Dataset provided by
figshare
Authors
Zhong-Qing Liang
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
1. Clinical Data

Transcript data and clinicopathological information were downloaded from The Cancer Genome Atlas (TCGA) database (https://portal.Gdc.cancer.gov/), including 41 cases of para-tumor, 473 cases of CRC tumor and 452 clinical cases.

The survival and transcriptional data of 250 CRC cases were downloaded from the GEO database (https://www.ncbi.nlm.nih.gov/geo/). The transcript dataset GSE161158, uploaded in November 2020 by Moffitt Cancer Research Center, University of Miami, was used (10). Lists of immune-related genes were downloaded from ImmPort (https://www.immport.org/home) and InnateDB (https://www.innatedb.ca/). KEGG (http://www.gsea-msigdb.org/gsea/index.jsp) gene sets and all Gene Ontology (GO) gene sets were used as Gene Symbols. Gene mutation information was downloaded from cBioPortal (http://www.cbioportal.org/).

2. Murine Data

2.1 IRGPI Genes Expression in CRC Murine Model

SPF Balb/c male mice, 6-8 weeks old, body mass (20 ± 5) g, were purchased from Huaxing Experimental Animal Farm of Huiji District (Zhengzhou City, China); the experimental animal license No. is SCXK (Yu) 2019-0002. All animal experiments were approved by the Experimental Mouse Ethics Committee of Nanjing University of Traditional Chinese Medicine (No. 202010A026).

The above mice were randomly divided into a Control group (C) and a CRC Model group (M), 10 per group. Five Balb/c male mice were taken as tumor-bearing mice: 1×10^7 CT26 cells were subcutaneously injected into the left axilla, and the mice were sacrificed one week later. The subcutaneous tumor was removed under sterile conditions, placed in sterile PBS, and disintegrated into several 1 mm³ masses. Under sterile conditions, the two groups of mice were dissected to expose the colon. A 1 mm³ tumor mass was fixed to the colon of each mouse in the CRC Model group with tissue glue, while nothing was fixed in the Control group, and the abdomen of the mice in both groups was then sutured. After 3 days of postoperative recovery, the mice were weighed. Micro-CT scans were performed on the 26th day (under isoflurane respiratory anesthesia), and on the 27th day the mice were sacrificed after anesthesia with 2% sodium pentobarbital.

Total RNA was extracted from the colon of the Control group and the tumor tissue of the CRC Model group with the FastPure Cell/Tissue Total RNA Isolation Kit (Vazyme, China, Cat#RC101-01) and reverse transcribed into cDNA using HiScript® Ⅲ RT SuperMix for qPCR (Vazyme, China, Cat#R323-01). Real-time PCR was then used to detect the expression of IRGPI genes in each group using Blastaq™ Green 2× qPCR MasterMix (abm, Canada, Cat#G891). The primer sequences are shown in Supplementary Table S4.

2.2 Immune Infiltration in CRC Murine Model

Paraffin-embedded liver, colon, tumor, and mesentery tissues from the mice were sectioned, stained with hematoxylin-eosin (HE staining), and photographed with an upright white light photographic microscope (Nikon, Japan, Eclipse Ci-L).

TIME immune cells were detected by flow cytometry (FCM). PBMCs were extracted using RBC lysate (FcMACS, China, Cat#FMS-RBC500). Cell suspensions of at least 5×10^6 cells (100 μL) were incubated with FC blocker at 4 ℃ for 10 min. The following antibodies were then used:

Macrophages: Anti-Human/Mouse CD11b FITC Antibody (PeproTech, USA, Cat#03221-50), PE-Cy™7 Rat Anti-Mouse CD86 Antibody (BD Pharmingen™, USA, Cat#560582) and Alexa Fluor® 488 Anti-Mouse CD206 Antibody (Biolegend, USA, Cat#141710).

B cells: Alexa Fluor® 488 anti-mouse CD19 Antibody (Invitrogen, USA, REF#11-0193-81) and PE/Cy7 anti-mouse/rat/human CD27 Antibody (Biolegend, USA, Cat#124216).

T cells: Anti-Mouse CD4 APC-Cyanine7 (PeproTech, USA, Cat#06122-87), Anti-Mouse CD8a FITC Antibody (PeproTech, USA, Cat#10122-50), Anti-Mouse CD25 APC Antibody (PeproTech, USA, Cat#07312-80) and Anti-Mouse/Rat FOXP3 PE Antibody (PeproTech, USA, Cat#83422-60).

Single-color PBMC tubes were prepared for each antibody. The cells were analyzed on the Amnis FlowSight flow cytometer (Merck Millipore, USA), and immunocyte subsets were analyzed using the IDEAS software (Merck Millipore, USA). Supplementary Fig. S7 shows the flow cytometry analysis strategies for the IRGPI immunocyte subsets.
WDR61 ablation triggers R-loops accumulation and suppresses breast cancer...
data.mendeley.com
Updated Apr 8, 2024
Cite
Yan Qin (2024). WDR61 ablation triggers R-loops accumulation and suppresses breast cancer progression [Dataset]. http://doi.org/10.17632/g8xs5tm4vj.1
Explore at:
Unique identifier
https://doi.org/10.17632/g8xs5tm4vj.1
Dataset updated
Apr 8, 2024
Authors
Yan Qin
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Supplementary Table 1: WDR61 TCGA sample sheet, downloaded from the Genomic Data Commons (GDC). Supplementary Table 2: WDR61_R2_survival data, downloaded from the R2 homepage.
ecDNA machine learning modeling
data.niaid.nih.gov
zenodo.org
Updated Jun 27, 2024
Cite
Qi Zhao (2024). ecDNA machine learning modeling [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7272630
Explore at:
Dataset updated
Jun 27, 2024
Dataset provided by
Qi Zhao
Shixiang Wang
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Today (2024-06-27), we discovered an issue with the labeling of sample groups in one of the supplementary figures (Supplementary Figure 14c) in our published article. We have corrected the figure and present it here, and we apologize to all readers for any confusion this may have caused (although no reports have been received).

The source data for Supplementary Figure 13 in the accompanying article were found to contain errors caused by improper handling in Excel. We have uploaded the corrected data table here.

ecDNA_cargo_gene_modeling_data.csv.gz

The dataset contains features from 386 TCGA tumors for modeling ecDNA cargo gene prediction. It was converted from R data format with the following code. NOTE: the columns 'sample' and 'gene_id' are not used for actual modeling; they are kept only for identification and sampling purposes.

library(data.table)

data = readRDS("~/../Downloads/ecDNA_cargo_gene_modeling_data.rds")

colnames(data)[3] = "total_cn"

data.table::fwrite(data, file = "~/../Downloads/ecDNA_cargo_gene_modeling_data.csv.gz", sep = ",")
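For readers working outside R, the note about the 'sample' and 'gene_id' columns can be illustrated with a short sketch that reads the gzipped CSV and separates those identifier columns from the modeling features. This is an assumption-laden illustration rather than part of the published workflow: the function name is hypothetical, and only the column names 'sample', 'gene_id', and 'total_cn' come from the description above.

```python
import csv
import gzip

# Identifier columns named in the dataset description; not used as model features.
ID_COLUMNS = ("sample", "gene_id")


def load_modeling_table(path, id_columns=ID_COLUMNS):
    """Read a gzipped CSV and split each row into identifier fields and feature fields."""
    ids, features = [], []
    with gzip.open(path, "rt", newline="") as handle:
        for row in csv.DictReader(handle):
            ids.append({name: row[name] for name in id_columns})
            features.append({k: v for k, v in row.items() if k not in id_columns})
    return ids, features
```

A call such as `ids, features = load_modeling_table("ecDNA_cargo_gene_modeling_data.csv.gz")` would return the values as strings; converting them to numeric types is left to the modeling code.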

gcap_pcawg_WGS_result.tar.gz

GCAP analysis results for PCAWG allele-specific copy number profiles derived from WGS.

gcap_tcga_snp6_result.tar.gz

GCAP analysis results for TCGA allele-specific copy number profiles derived from SNP6 array.

gcap_Changkang_WES_result.tar.gz

GCAP analysis results for SYSUCC Changkang allele-specific copy number profiles derived from tumor-normal paired WES.

tcga_overlap_gene_wgs.rds, tcga_overlap_gene_snp.rds and tcga_overlap_gene_wes.rds

These datasets contain TCGA gene-level copy number results in R data format from overlapping samples (dataset above). WGS from PCAWG, SNP array, and WES from GDC portal.

cellline-batch1.zip & cellline-batch1.zip

GCAP results of cell line batch 1 and batch 2.

AA_cellline_wgs.zip

AA software results for cell line batch 1.

Batch2_AA_summary.xlsx

AA software results for cell line batch 2.

FISH-for-supp-file.zip

Extended raw FISH images from 12 CRC samples.

SNU216.zip

Extended AA and GCAP analysis on SNU216.

aa_ffpe.zip and AA_summary_table_of_6_erbb2_ffpe_samples.xlsx

Extended AA running files (all results) and result summary data for 6 GCAP predicted ERBB2 amp clinical samples.

source data of fig.4

source data of supp fig.2 subplots

source data of supp fig.15

GCAP result data objects for three ICB cohorts. Both gene-level and sample-level data included.

PDX-P68: processed (AA and CNV) data of P68 from WGS and WES data.

source data of supp fig.13

updated supplementary figure 14
ecDNA cargo gene modeling
zenodo.org
application/gzip, bin +1
Updated Sep 25, 2023
Cite
Shixiang Wang; Qi Zhao (2023). ecDNA cargo gene modeling [Dataset]. http://doi.org/10.5281/zenodo.8139537
Explore at:
Available download formats: zip, application/gzip, bin
Unique identifier
https://doi.org/10.5281/zenodo.8139537
Dataset updated
Sep 25, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Shixiang Wang; Qi Zhao
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
1. ecDNA_cargo_gene_modeling_data.csv.gz

The dataset contains features from 386 TCGA tumors for modeling ecDNA cargo gene prediction. It was converted from R data format with the following code. NOTE: the columns 'sample' and 'gene_id' are not used for actual modeling; they are kept only for identification and sampling purposes.

library(data.table)

data = readRDS("~/../Downloads/ecDNA_cargo_gene_modeling_data.rds")

colnames(data)[3] = "total_cn"

data.table::fwrite(data, file = "~/../Downloads/ecDNA_cargo_gene_modeling_data.csv.gz", sep = ",")

2. gcap_pcawg_WGS_result.tar.gz

GCAP analysis results for PCAWG allele-specific copy number profiles derived from WGS.

3. gcap_tcga_snp6_result.tar.gz

GCAP analysis results for TCGA allele-specific copy number profiles derived from SNP6 array.

4. gcap_Changkang_WES_result.tar.gz

GCAP analysis results for SYSUCC Changkang allele-specific copy number profiles derived from tumor-normal paired WES.

5. tcga_overlap_gene_wgs.rds, tcga_overlap_gene_snp.rds and tcga_overlap_gene_wes.rds

These datasets contain TCGA gene-level copy number results in R data format from overlapping samples (dataset above). WGS from PCAWG, SNP array, and WES from GDC portal.

6. cellline-batch1.zip & cellline-batch1.zip

GCAP results of cell line batch 1 and batch 2.

7. AA_cellline_wgs.zip

AA software results for cell line batch 1.

8. Batch2_AA_summary.xlsx

AA software results for cell line batch 2.
The datasets stored in TICCom
figshare.com
bin
Updated May 6, 2023
Cite
Yunjin xie; Weiwei Zhou; Jingyi Shi; Mengjia Xu; Zijing Lin; Donghao Li; Jianing Li; Shujun Cheng; Tingting Shao; Juan Xu (2023). The datasets stored in TICCom [Dataset]. http://doi.org/10.6084/m9.figshare.22578031.v1
Explore at:
Available download formats: bin
Unique identifier
https://doi.org/10.6084/m9.figshare.22578031.v1
Dataset updated
May 6, 2023
Dataset provided by
figshare
Authors
Yunjin xie; Weiwei Zhou; Jingyi Shi; Mengjia Xu; Zijing Lin; Donghao Li; Jianing Li; Shujun Cheng; Tingting Shao; Juan Xu
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This database provided five datasets. The first dataset contained experimentally verified tumor-immune cell interactions which included interacting gene symbols, cell types, interaction types, cancer types, species and other detailed information. The second dataset consisted of integrated ligand-receptor interactions including ligands, receptors, functions and sources. The third dataset presented significant interacting experimentally verified tumor-immune cell interactions inferred via Interaction Intensity module based on cancer types from TCGA (https://portal.gdc.cancer.gov/), ICGC (https://dcc.icgc.org/) and EMBL-EBI Expression Atlas (https://www.ebi.ac.uk/gxa/home). The fourth dataset contained significant interacting integrated ligand-receptor interactions predicted by TItalk based on cancer types from TCGA, ICGC and EMBL-EBI Expression Atlas. The fifth dataset consisted of predicted tumor-immune cell interactions inferred by five algorithms based on 32 scRNA-seq datasets and union of ligand-receptor interactions. These datasets can be downloaded from the Download module in TICCom.
Raw and processed data for studying chaperon-client interactions in 12...
figshare.com
zip
Updated Aug 14, 2023
Cite
Shai Pilosof; Barak Rotblat; Geut Galai (2023). Raw and processed data for studying chaperon-client interactions in 12 cancer types. [Dataset]. http://doi.org/10.6084/m9.figshare.22779755.v2
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.22779755.v2
Dataset updated
Aug 14, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Shai Pilosof; Barak Rotblat; Geut Galai
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This data accompanies the paper "Ecological network analysis reveals cancer-dependent chaperone-client interaction structure and robustness", by Geut Galai, Xie He, Barak Rotblat, Shai Pilosof, published in Nature Communications. Please cite the paper when using the data.

All users must read the paper to understand how the data were obtained and processed, and their limitations. Data comes without warranty. Licence is CC BY-NC-SA (Attribution-NonCommercial-ShareAlike): This license lets you remix, tweak, and build upon this work non-commercially, as long as you credit the authors and license the new creations under the identical terms.

All the computational processes related to data derivation and analysis are in the GitHub repository that accompanies the paper.

Raw data (raw.zip)

Gene-level transcriptome profiling (RNA-Seq) data (in the form of HTSeq - FPKM) that were downloaded from The Cancer Genome Atlas (TCGA) using the Genomic Data Commons Data Portal (https://portal.gdc.cancer.gov).

Human protein expression data that were downloaded from the string-db.org database and from published papers, as follows.

File: 12192_2020_1080_MOESM4_ESM.xlsx. Source: Bie AS, Cömert C, Körner R, Corydon TJ, Palmfeldt J, Hipp MS, et al. An inventory of interactors of the human HSP60/HSP10 chaperonin in the mitochondrial matrix space. Cell Stress Chaperones. 2020;25: 407–416. doi:10.1007/s12192-020-01080-6

File: 41467_2013_BFncomms3139_MOESM481_ESM.xls. Source: Chae YC, Angelin A, Lisanti S, Kossenkov AV, Speicher KD, Wang H, et al. Landscape of the mitochondrial Hsp90 metabolome in tumours. Nat Commun. 2013;4: 2139. doi:10.1038/ncomms3139

File: 12915_2020_740_MOESM8_ESM.xlsx. Source: Joshi A, Dai L, Liu Y, Lee J, Ghahhari NM, Segala G, et al. The mitochondrial HSP90 paralog TRAP1 forms an OXPHOS-regulated tetramer and is involved in mitochondrial metabolic homeostasis. BMC Biol. 2020;18: 10. doi:10.1186/s12915-020-0740-7

File: mmc2.xlsx. Source: Ishizawa J, Zarabi SF, Davis RE, Halgas O, Nii T, Jitkova Y, et al. Mitochondrial ClpP-Mediated Proteolysis Induces Selective Cancer Cell Lethality. Cancer Cell. 2019;35: 721–737.e9. doi:10.1016/j.ccell.2019.03.014

Processed data (processed.zip)

The network data. Rows are chaperones, columns are clients.

Source data

File: Source Data for Figures and Tables.zip

This is the source data underlying the figures and tables, as requested by Nature Communications.
Additional file 3 of Enhanced identification of significant regulators of...
springernature.figshare.com
xlsx
Updated May 30, 2023
Cite
Rezvan Ehsani; Finn Drabløs (2023). Additional file 3 of Enhanced identification of significant regulators of gene expression [Dataset]. http://doi.org/10.6084/m9.figshare.12096555.v1
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.12096555.v1
Dataset updated
May 30, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Rezvan Ehsani; Finn Drabløs
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Additional file 3. Identifiers for TCGA datasets on prostate cancer as downloaded from the GDC data portal.
Metadata record for the manuscript: Ancestry-associated transcriptomic...
springernature.figshare.com
Updated Jun 1, 2023
Cite
Jessica Roelands; Raghvendra Mall; Hossam Almeer; Remy Thomas; Mahmoud G. Mohamed; Shahinaz Bedri; Salha Bujassoum Al Bader; Kulsoom Junejo; Elad Ziv; Rosalyn W. Sayaman; Peter J.K. Kuppen; Davide Bedognetti; Wouter Hendrickx; Julie Decock (2023). Metadata record for the manuscript: Ancestry-associated transcriptomic profiles of breast cancer in patients of African, Arab and European ancestry [Dataset]. http://doi.org/10.6084/m9.figshare.13379765.v1
Explore at:
Unique identifier
https://doi.org/10.6084/m9.figshare.13379765.v1
Dataset updated
Jun 1, 2023
Dataset provided by
figshare
Authors
Jessica Roelands; Raghvendra Mall; Hossam Almeer; Remy Thomas; Mahmoud G. Mohamed; Shahinaz Bedri; Salha Bujassoum Al Bader; Kulsoom Junejo; Elad Ziv; Rosalyn W. Sayaman; Peter J.K. Kuppen; Davide Bedognetti; Wouter Hendrickx; Julie Decock
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Summary

This metadata record provides details of the data supporting the claims of the related manuscript: “Ancestry-associated transcriptomic profiles of breast cancer in patients of African, Arab and European ancestry”.

The related study sought to identify molecular differences that could provide insight into the biology of ancestry-associated disparities in breast cancer clinical outcome.

Type of data: transcriptomic profiles

Subject of data: curated survival data and breast cancer subtype classification for European, Asian, African and Arab ancestry patients.

Sample size: No sample size calculation was performed. All breast cancer patients from the TCGA breast cancer dataset for which ancestry was determined were included. With regards to the RA-QA cohort, all female breast cancer patients with available tumour tissues that were newly diagnosed between 2004-2010 were enrolled.

Recruitment: Two different breast cancer cohorts were included in the study; the publicly available TCGA breast cancer dataset and a local cohort from Qatar. RNA sequencing data from the TCGA breast cancer cohort (n=1082 patients) was downloaded using R (v3.5.1) and TCGA Assembler (v2.0.3). The RA-QA patient cohort constitutes a breast cancer cohort from Qatar (n=24 of which 16 of Arab ancestry) with patients that were newly diagnosed with breast cancer between 2004-2010 at the National Centre for Cancer Care and Research in Doha.

Data access

The TCGA-BRCA cohort data are available through the GDC data portal (https://gdac.broadinstitute.org/runs/stddata_2016_01_28/data/BRCA/20160128/) or by using TCGA-Assembler as detailed in the method section. TCGA-Assembler is open-source and freely available at http://www.compgenome.org/TCGA-Assembler/. The downloaded data product name is “illuminahiseq_rnaseqv2-RSEM_genes_normalized”.

The RA-QA dataset RNA sequencing data are openly available in fastq file format in the European Nucleotide Archive via the following accession: https://identifiers.org/ena.embl:PRJEB41828.

Several data files are openly available in figshare at the following DOI: https://doi.org/10.6084/m9.figshare.12901928. These are as follows.
- The RNAseq Expression matrix is in the file ‘RNASeq_QN_LOG2_RA_QA.csv’.
- The clinical data for the RA-QA cohort are in the file ‘Clinical_data_RA_QA.csv’.
- The enrichment scores data are in the files ‘Enrichment_scores_tumor_related_pathways_RA_QA.csv’ and ‘Enrichment_scores_immune_deconvolution_Bindea_RA_QA.csv’.

Three data items used in the study are available from the supplementary materials of previously published articles. These are as follows:
- File ‘TCGA_CLINICAL_DATA_CELL_2018_S1.xlsx’ from Liu et al, 2018: https://doi.org/10.1016/j.cell.2018.02.052.
- File ‘Admixture and Ethnicity Calls.xlsx’ from Carrot-Zhang et al, 2020: https://doi.org/10.1016/j.ccell.2020.04.012.
- File ‘mmc4.xlsx’ from Rooney et al, 2015: https://doi.org/10.1016/j.cell.2014.12.033.

Scripts used in the study can be found on Zenodo/github: https://doi.org/10.5281/zenodo.3707660.

Corresponding author(s) for this study

Julie Decock, Cancer Research Center, Qatar Biomedical Research Institute (QBRI), Hamad Bin Khalifa University (HBKU), Qatar Foundation (QF), Doha, Qatar. Email: juliedecock80@gmail.com.

Wouter Hendrickx, Functional Cancer Omics Lab, Cancer group, Research Branch, Sidra Medicine, Doha, Qatar. Email: whendrickx@sidra.org.

Davide Bedognetti, Cancer Immunogenetics Lab, Cancer group, Research Branch, Sidra Medicine, Doha, Qatar. Email: dbedognetti@sidra.org.

Study approval

The study was approved by the local ethical committees of the Hamad Medical Corporation (study approval number #14027/14), the Qatar Biomedical Research Institute (study approval number #2016-002), and Sidra Medicine (study approval number #1711015664), and was performed in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards.
