57 datasets found
  1. Genomic Data Commons Data Portal (GDC Data Portal)

    • dknet.org
    • scicrunch.org
    Updated Jan 29, 2022
    Cite
    (2022). Genomic Data Commons Data Portal (GDC Data Portal) [Dataset]. http://identifiers.org/RRID:SCR_014514
    Dataset updated
    Jan 29, 2022
    Description

    A unified data repository of the National Cancer Institute (NCI)'s Genomic Data Commons (GDC) that enables data sharing across cancer genomic studies in support of precision medicine. The GDC supports several cancer genome programs at the NCI Center for Cancer Genomics (CCG), including The Cancer Genome Atlas (TCGA), Therapeutically Applicable Research to Generate Effective Treatments (TARGET), and the Cancer Genome Characterization Initiative (CGCI). The GDC Data Portal provides a platform for efficiently querying and downloading high quality and complete data. The GDC also provides a GDC Data Transfer Tool and a GDC API for programmatic access.

  2. Metadata and data files supporting the published article: The therapeutic...

    • springernature.figshare.com
    txt
    Updated Jun 1, 2023
    Cite
    François BERTUCCI; Pascal Finetti; Anthony Goncalves; Daniel Birnbaum (2023). Metadata and data files supporting the published article: The therapeutic response of ER+/HER2- breast cancers differs according to the molecular Basal or Luminal subtype [Dataset]. http://doi.org/10.6084/m9.figshare.11558676.v1
    Available download formats: txt
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    François BERTUCCI; Pascal Finetti; Anthony Goncalves; Daniel Birnbaum
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Here, the authors performed an in-silico analysis on a meta-dataset including gene-expression data from 5,342 clinically defined estrogen receptor-positive/human epidermal growth factor receptor 2-negative (ER+/HER2-) breast cancers (BC), and DNA copy number/mutational and proteomic data, to determine whether the therapeutic response of ER+/HER2- breast cancers differs according to the molecular basal or luminal subtype.

    Data access: The dataset Breast_cancer_classifications.csv supporting figure 1, table 1, and supplementary tables 1-3 is publicly available in the figshare repository as part of this data record. This study used and analysed 36 publicly available datasets that are all listed in Supplementary table 8 and are cited from the data availability statement of the published article.

    Study aims and methodology: To evaluate the response and/or potential vulnerability to hormone treatment (HT) and other systemic therapies of BC, and to assess the degree of difference between basal and luminal breast cancer subtypes, the authors performed an in-silico analysis of a meta-dataset including gene expression data from 8,982 non-redundant BCs and DNA copy number/mutational and proteomic data from TCGA. The aim was to compare the Basal versus Luminal samples. Out of the 8,982 samples of the database, 6,563 were defined as ER+ (5,342 according to immunohistochemistry (IHC) and 1,221 according to inferred status).

    The authors analysed breast cancer gene expression data pooled from 36 public datasets (the publicly available datasets are listed in supplementary table 8), comprising 8,982 invasive primary BCs. The pre-analytic data processing was done as described previously in https://doi.org/10.1038/s41416-018-0309-1. Please refer to the published article for more details on the methodology and statistical analysis.

    Data supporting the figures, tables and supplementary tables in the published article:

    Data supporting figure 1, table 1, and supplementary tables 1-3: Dataset Breast_cancer_classifications.csv is in .csv file format. The dataset includes histo-clinical and molecular data of the tumors analysed in the study, and is part of this data record.

    Data supporting supplementary table 4: Dataset genome.wustl.edu_BRCA.IlluminaGA_DNASeq.Level_2.3.2.0.tar.gz.1 is a gzip-compressed tar archive of MAF-format files. This dataset was accessed through the Genomic Data Commons (GDC) Data Portal and can be downloaded directly here: https://api.gdc.cancer.gov/data/afaf2790-04d4-453a-8c1b-75cf42ffd35f

    Data supporting supplementary table 5: Dataset gdc_manifest.txt consists of gz archives of txt format files. The file was accessed through the GDC Data Portal here: https://portal.gdc.cancer.gov/repository?facetTab=files&filters={"op":"and","content":[{"op":"in","content":{"field":"cases.project.project_id","value":["TCGA-BRCA"]}},{"op":"in","content":{"field":"files.access","value":["open"]}},{"op":"in","content":{"field":"files.analysis.workflow_type","value":["HTSeq - Counts"]}},{"op":"in","content":{"field":"files.experimental_strategy","value":["RNA-Seq"]}}]}&searchTableTab=files

    Data supporting supplementary table 6: Dataset Table S5_Revised.xlsx is in .xlsx file format and is part of the supplementary information files of the published article.

    Data supporting supplementary table 7: Dataset BRCA.RPPA.Level_3.tar is a tar archive of txt format files. The file was accessed through the GDC Data Portal and can be downloaded directly here: https://api.gdc.cancer.gov/data/85988e1b-4f7d-493e-96ae-9eee61ac2833
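
The portal link for supplementary table 5 embeds a JSON filter (project TCGA-BRCA, open access, HTSeq - Counts workflow, RNA-Seq strategy). As a rough sketch, the same filter can be assembled in Python and URL-encoded for the GDC API `files` endpoint. The helper names `in_clause`, `build_filters` and `build_query_url` are hypothetical, and it is an assumption that the endpoint accepts the portal's `files.`-prefixed field names unchanged and that the workflow name is still present in current GDC releases. No request is sent here; the code only builds the URL.

```python
import json
from urllib.parse import parse_qs, urlencode, urlparse

# GDC API endpoint for file metadata queries.
GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"


def in_clause(field, values):
    """Build one 'in' clause in the same shape as the portal filter."""
    return {"op": "in", "content": {"field": field, "value": list(values)}}


def build_filters():
    """Reproduce the four-clause filter quoted in the dataset description."""
    return {
        "op": "and",
        "content": [
            in_clause("cases.project.project_id", ["TCGA-BRCA"]),
            in_clause("files.access", ["open"]),
            in_clause("files.analysis.workflow_type", ["HTSeq - Counts"]),
            in_clause("files.experimental_strategy", ["RNA-Seq"]),
        ],
    }


def build_query_url(filters, size=10):
    """Serialise the filter as JSON and percent-encode it into a query URL."""
    params = {"filters": json.dumps(filters), "format": "JSON", "size": str(size)}
    return GDC_FILES_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_query_url(build_filters(), size=5))
```

Building the filter from small clauses keeps each criterion easy to swap, for example to target a different project or workflow.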

  3. Genomic Data Commons Data Portal

    • bioregistry.io
    Updated Apr 23, 2021
    Cite
    (2021). Genomic Data Commons Data Portal [Dataset]. https://bioregistry.io/gdc
    Dataset updated
    Apr 23, 2021
    Description

    The GDC Data Portal is a robust data-driven platform that allows cancer researchers and bioinformaticians to search and download cancer data for analysis.

  4. The Cancer Genome Atlas Rectum Adenocarcinoma Collection

    • cancerimagingarchive.net
    dicom, n/a
    Updated Jan 5, 2016
    Cite
    The Cancer Imaging Archive (2016). The Cancer Genome Atlas Rectum Adenocarcinoma Collection [Dataset]. http://doi.org/10.7937/K9/TCIA.2016.F7PPNPNU
    Available download formats: dicom, n/a
    Dataset updated
    Jan 5, 2016
    Dataset authored and provided by
    The Cancer Imaging Archive
    License

    https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/

    Time period covered
    May 29, 2020
    Dataset funded by
    National Cancer Institute (http://www.cancer.gov/)
    Description

    The Cancer Genome Atlas Rectum Adenocarcinoma (TCGA-READ) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA). Clinical, genetic, and pathological data resides in the Genomic Data Commons (GDC) Data Portal while the radiological data is stored on The Cancer Imaging Archive (TCIA).

    Matched TCGA patient identifiers allow researchers to explore the TCGA/TCIA databases for correlations between tissue genotype, radiological phenotype and patient outcomes. Tissues for TCGA were collected from many sites all over the world in order to reach their accrual targets, usually around 500 specimens per cancer type. For this reason the image data sets are also extremely heterogeneous in terms of scanner modalities, manufacturers and acquisition protocols. In most cases the images were acquired as part of routine care and not as part of a controlled research study or clinical trial.

    CIP TCGA Radiology Initiative

    Imaging Source Site (ISS) Groups are being populated and governed by participants from institutions that have provided imaging data to the archive for a given cancer type. Modeled after TCGA analysis groups, ISS groups are given the opportunity to publish a marker paper for a given cancer type per the guidelines in the table above. This opportunity will generate increased participation in building these multi-institutional data sets as they become an open community resource. Learn more about the CIP TCGA Radiology Initiative.

  5. The Cancer Genome Atlas Breast Invasive Carcinoma Collection

    • cancerimagingarchive.net
    dicom, n/a
    Updated May 29, 2020
    Cite
    The Cancer Imaging Archive (2020). The Cancer Genome Atlas Breast Invasive Carcinoma Collection [Dataset]. http://doi.org/10.7937/K9/TCIA.2016.AB2NAZRP
    Available download formats: n/a, dicom
    Dataset updated
    May 29, 2020
    Dataset authored and provided by
    The Cancer Imaging Archive
    License

    https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/

    Time period covered
    May 29, 2020
    Dataset funded by
    National Cancer Institute (http://www.cancer.gov/)
    Description

    The Cancer Genome Atlas Breast Invasive Carcinoma (TCGA-BRCA) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA). Clinical, genetic, and pathological data resides in the Genomic Data Commons (GDC) Data Portal while the radiological data is stored on The Cancer Imaging Archive (TCIA).

    Matched TCGA patient identifiers allow researchers to explore the TCGA/TCIA databases for correlations between tissue genotype, radiological phenotype and patient outcomes. Tissues for TCGA were collected from many sites all over the world in order to reach their accrual targets, usually around 500 specimens per cancer type. For this reason the image data sets are also extremely heterogeneous in terms of scanner modalities, manufacturers and acquisition protocols. In most cases the images were acquired as part of routine care and not as part of a controlled research study or clinical trial.

    CIP TCGA Radiology Initiative

    Imaging Source Site (ISS) Groups are being populated and governed by participants from institutions that have provided imaging data to the archive for a given cancer type. Modeled after TCGA analysis groups, ISS groups are given the opportunity to publish a marker paper for a given cancer type per the guidelines in the table above. This opportunity will generate increased participation in building these multi-institutional data sets as they become an open community resource. Learn more about the TCGA Breast Phenotype Research Group.

  6. Table 1_TCGADownloadHelper: simplifying TCGA data extraction and...

    • frontiersin.figshare.com
    pdf
    Updated May 2, 2025
    Cite
    Alexandra Anke Baumann; Olaf Wolkenhauer; Markus Wolfien (2025). Table 1_TCGADownloadHelper: simplifying TCGA data extraction and preprocessing.pdf [Dataset]. http://doi.org/10.3389/fgene.2025.1569290.s001
    Available download formats: pdf
    Dataset updated
    May 2, 2025
    Dataset provided by
    Frontiers
    Authors
    Alexandra Anke Baumann; Olaf Wolkenhauer; Markus Wolfien
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Cancer Genome Atlas (TCGA) provides comprehensive genomic data across various cancer types. However, complex file naming conventions and the necessity of linking disparate data types to individual case IDs can be challenging for first-time users. While other tools have been introduced to facilitate TCGA data handling, they lack a straightforward combination of all required steps. To address this, we developed a streamlined pipeline using the Genomic Data Commons (GDC) portal’s cart system for file selection and the GDC Data Transfer Tool for data downloads. We use the Sample Sheet provided by the GDC portal to replace the default 36-character opaque file IDs and filenames with human-readable case IDs. We developed a pipeline integrating customizable Python scripts in a Jupyter Notebook and a Snakemake pipeline for ID mapping along with automating data preprocessing tasks (https://github.com/alex-baumann-ur/TCGADownloadHelper). Our pipeline simplifies the data download process by modifying manifest files to focus on specific subsets, facilitating the handling of multimodal data sets related to single patients. The pipeline essentially reduced the effort required to preprocess data. Overall, this pipeline enables researchers to efficiently navigate the complexities of TCGA data extraction and preprocessing. By establishing a clear step-by-step approach, we provide a streamlined methodology that minimizes errors, enhances data usability, and supports the broader utilization of TCGA data in cancer research. It is particularly beneficial for researchers new to genomic data analysis, offering them a practical framework prior to conducting their TCGA studies.
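
The description above mentions two steps: restricting a manifest to a subset of files and replacing opaque file IDs with case IDs taken from the Sample Sheet. The following is only an illustrative sketch of that idea, not the TCGADownloadHelper code. It assumes a tab-separated manifest with `id` and `filename` columns and a Sample Sheet with `File ID`, `Data Type` and `Case ID` columns; the function names and all IDs in the example data are hypothetical.

```python
import csv
import io

# Hypothetical miniature Sample Sheet and manifest (tab-separated text).
SAMPLE_SHEET = (
    "File ID\tFile Name\tData Type\tCase ID\n"
    "file-0001\tabc.counts.tsv\tGene Expression Quantification\tTCGA-XX-0001\n"
    "file-0002\tdef.maf.gz\tMasked Somatic Mutation\tTCGA-XX-0001\n"
)
MANIFEST = (
    "id\tfilename\tmd5\tsize\tstate\n"
    "file-0001\tabc.counts.tsv\t0\t10\treleased\n"
    "file-0002\tdef.maf.gz\t0\t20\treleased\n"
)


def read_tsv(text):
    """Parse tab-separated text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def case_id_by_file_id(sample_sheet_rows):
    """Map each opaque file ID to its human-readable case ID."""
    return {row["File ID"]: row["Case ID"] for row in sample_sheet_rows}


def subset_manifest(manifest_rows, sample_sheet_rows, data_type):
    """Keep only manifest rows whose file has the requested data type."""
    wanted = {r["File ID"] for r in sample_sheet_rows if r["Data Type"] == data_type}
    return [r for r in manifest_rows if r["id"] in wanted]


def readable_name(manifest_row, mapping):
    """Prefix the original filename with the case ID for that file."""
    return f"{mapping[manifest_row['id']]}_{manifest_row['filename']}"


if __name__ == "__main__":
    sheet, manifest = read_tsv(SAMPLE_SHEET), read_tsv(MANIFEST)
    mapping = case_id_by_file_id(sheet)
    for row in subset_manifest(manifest, sheet, "Gene Expression Quantification"):
        print(row["id"], "->", readable_name(row, mapping))
```

Keeping the original filename as a suffix avoids collisions when one case contributes several files of the same data type.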

  7. The Cancer Genome Atlas Stomach Adenocarcinoma Collection

    • cancerimagingarchive.net
    dicom, n/a
    Updated Jan 5, 2016
    Cite
    The Cancer Imaging Archive (2016). The Cancer Genome Atlas Stomach Adenocarcinoma Collection [Dataset]. http://doi.org/10.7937/K9/TCIA.2016.GDHL9KIM
    Available download formats: dicom, n/a
    Dataset updated
    Jan 5, 2016
    Dataset authored and provided by
    The Cancer Imaging Archive
    License

    https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/

    Time period covered
    May 29, 2020
    Dataset funded by
    National Cancer Institute (http://www.cancer.gov/)
    Description

    The Cancer Genome Atlas Stomach Adenocarcinoma (TCGA-STAD) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA). Clinical, genetic, and pathological data resides in the Genomic Data Commons (GDC) Data Portal while the radiological data is stored on The Cancer Imaging Archive (TCIA).

    Matched TCGA patient identifiers allow researchers to explore the TCGA/TCIA databases for correlations between tissue genotype, radiological phenotype and patient outcomes. Tissues for TCGA were collected from many sites all over the world in order to reach their accrual targets, usually around 500 specimens per cancer type. For this reason the image data sets are also extremely heterogeneous in terms of scanner modalities, manufacturers and acquisition protocols. In most cases the images were acquired as part of routine care and not as part of a controlled research study or clinical trial.

    CIP TCGA Radiology Initiative

    Imaging Source Site (ISS) Groups are being populated and governed by participants from institutions that have provided imaging data to the archive for a given cancer type. Modeled after TCGA analysis groups, ISS groups are given the opportunity to publish a marker paper for a given cancer type per the guidelines in the table above. This opportunity will generate increased participation in building these multi-institutional data sets as they become an open community resource. Learn more about the CIP TCGA Radiology Initiative.

  8. hCINAP expression in colorectal cancer

    • figshare.com
    docx
    Updated May 31, 2023
    Cite
    Yapeng Ji; zefang Zhang; zemin Zhang; xiaofeng Zheng (2023). hCINAP expression in colorectal cancer [Dataset]. http://doi.org/10.6084/m9.figshare.4737181.v3
    Available download formats: docx
    Dataset updated
    May 31, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Yapeng Ji; zefang Zhang; zemin Zhang; xiaofeng Zheng
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The authors declare that the data analysis processes supporting the findings of this study are available within the article and its Supplementary Information files. The TCGA gene expression profile data, as recomputed based on GENCODE v23, were downloaded from UCSC Xena (http://xena.ucsc.edu/). The TCGA clinical data were downloaded from the GDC Data Portal (https://gdc-portal.nci.nih.gov/), with accession number phs000178.v9.p8 in dbGaP. Supplementary Information: To analyze hCINAP expression in CRC, we downloaded the recomputed TCGA gene expression datasets for the COAD and READ cancer types from UCSC Xena (http://xena.ucsc.edu/). The gene model was based on GENCODE v23, and the expression unit is TPM (transcripts per million). The clinical data were downloaded from the GDC Data Portal (https://gdc-portal.nci.nih.gov/).

    For differential expression analysis, we compiled a selected sample set of 367 tumor and 51 normal samples, in which each sample has information available for clinical variables such as gender, age and race (Supplementary Table 1). For expression analysis by pathological stage, we used only those tumor samples with stage information (Supplementary Table 1). The dataset used for profiling gene expression by CRC subtype was compiled based on the results of the consensus molecular subtypes (CMSs) described previously [PMID: 26457759], and contains 265 tumor samples (Supplementary Table 1).

  9. The Cancer Genome Atlas (TCGA) RNA-seq meta-analysis

    • figshare.com
    xlsx
    Updated Feb 2, 2018
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Namshik Han (2018). The Cancer Genome Atlas (TCGA) RNA-seq meta-analysis [Dataset]. http://doi.org/10.6084/m9.figshare.5851743.v1
    Available download formats: xlsx
    Dataset updated
    Feb 2, 2018
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Namshik Han
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    TCGA RNA-seq V2 Level3 data were downloaded from the TCGA Genomic Data Commons Data Portal (https://gdc-portal.nci.nih.gov), consisting of 11,303 samples in 34 cancer projects (33 cancer types). Nine cancer types that do not have corresponding non-tumour samples were filtered out, and the analysis was focused on the tumour versus non-tumour comparison. 24 cancer types were used in this meta-analysis: BLCA, BRCA, CESC, CHOL, COAD, ESCA, GBM, HNSC, KICH, KIRC, KIRP, LIHC, LUAD, LUSC, PAAD, PCPG, PRAD, READ, SARC, SKCM, STAD, THCA, THYM, UCEC (https://gdc-portal.nci.nih.gov). The nine filtered cancer types were ACC, DLBC, LAML, LGG, MESO, OV, TGCT, UCS and UVM.

    To extract expression values from TCGA RNA-seq data, we used genomic coordinates to retrieve UCSC Transcript IDs that correspond to the identifiers in TCGA RNA-seq V2 Level3 data (isoform level). The GAF (General Annotation Format) file was used to map the coordinates to UCSC Transcript IDs, and it was downloaded from https://tcga-data.nci.nih.gov/docs/GAF/GAF.hg19.June2011.bundle/outputs/TCGA.hg19.June2011.gaf. This file contains genomic annotations shared by all TCGA projects. More details of the GAF file format can be found at https://tcga-data.nci.nih.gov/docs/GAF/GAF3.0/GAF_v3_file_description.docx. We filtered out any coding exons overlapping UCSC Transcript IDs to eliminate expression values of coding genes and evaluate lncRNA expression.

    We could find the expression values of 443 pcRNAs and 203 tapRNAs in TCGA data, as many non-coding regions are not yet fully annotated in the TCGA RNA-seq V2 Level3 data. The expression values of pcRNAs and tapRNAs were extracted and clustered by an unsupervised Pearson correlation method (Supplementary Figure 18A). The expression values of tapRNA-associated coding genes were also extracted and used to generate the heat-map (Supplementary Figure 18B), which shows a similar pattern of expression to that of tapRNAs across the cancer types.

    To show that tapRNAs and associated coding genes have similar expression profiles in cancers, we generated a Spearman's Rank-Order Correlation heatmap (Figure 6A) between tapRNAs and their associated coding genes based on the TCGA RNA-seq data. We used the MATLAB function corr to calculate Spearman's rho. This function takes two matrices X (197-by-8,850 expression profiling matrix of tapRNAs) and Y (197-by-8,850 expression profiling matrix of tapRNA-associated coding genes) and returns an 8,850-by-8,850 matrix containing the pairwise correlation coefficient between each pair of 8,850 columns (TCGA cancer samples in Supplementary Figure 18A and B). Thus, the rank-order correlation matrix that we computed from the matrices of expression profiling data (Supplementary Figure S18A and B) allowed us to compare the correlation between two column vectors, i.e. cancer samples. This function also returns a matrix of p-values for testing the hypothesis of no correlation against the alternative that there is a nonzero correlation. Each element of the p-value matrix is the p-value for the corresponding element of Spearman's rho. The p-values for Spearman's rho are calculated using large-sample approximations. To check the significance level of the correlation between a tapRNA and its associated coding gene, the diagonal of the p-value matrix was extracted and used. The median is 1.31 × 10^-11 and the mean is 1.03 × 10^-4, with standard deviation 0.0029.

    To identify cancer-specific tapRNAs, we considered not only the global expression pattern of a given tapRNA in each cancer type, but also the expression pattern of a specific sub-group that is significantly distinct, to take into account cancer sample heterogeneity. Thus, two conditions were applied: (1) the average expression level of a tapRNA in a given cancer type is in the top 10% or bottom 10% and (2) a tapRNA has at least 10% of samples in a given cancer type that are significantly up-regulated (Z-score > 2) or down-regulated (Z-score < -2).
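
The correlation step described above (Spearman's rho between matching columns of two expression matrices, read off the diagonal of the full pairwise matrix) can be illustrated with a small standard-library sketch. This is not the authors' MATLAB code: it computes only the diagonal entries directly, omits the p-values, and uses tiny made-up matrices in place of the 197-by-8,850 data. The function names are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def spearman(a, b):
    """Spearman's rho is the Pearson correlation of the rank vectors."""
    return pearson(average_ranks(a), average_ranks(b))


def columnwise_spearman(X, Y):
    """Rho between column j of X and column j of Y, for every column j."""
    n_cols = len(X[0])
    return [
        spearman([row[j] for row in X], [row[j] for row in Y])
        for j in range(n_cols)
    ]


if __name__ == "__main__":
    X = [[1, 5], [2, 3], [3, 1]]     # rows: tapRNAs, columns: samples (toy data)
    Y = [[10, 1], [20, 2], [30, 3]]  # rows: associated coding genes
    print(columnwise_spearman(X, Y))
```

Computing only the matched columns avoids building the full sample-by-sample matrix when just its diagonal is needed.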

  10. TCGA-LUAD

    • kaggle.com
    • opendatalab.com
    zip
    Updated Jul 28, 2021
    Cite
    Nahin Kumar Dey (2021). TCGA-LUAD [Dataset]. https://www.kaggle.com/nahin333/tcgaluad
    Available download formats: zip (10283785426 bytes)
    Dataset updated
    Jul 28, 2021
    Authors
    Nahin Kumar Dey
    Description

    The Cancer Genome Atlas Lung Adenocarcinoma (TCGA-LUAD) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA). Clinical, genetic, and pathological data resides in the Genomic Data Commons (GDC) Data Portal while the radiological data is stored on The Cancer Imaging Archive (TCIA).

    Matched TCGA patient identifiers allow researchers to explore the TCGA/TCIA databases for correlations between tissue genotype, radiological phenotype and patient outcomes. Tissues for TCGA were collected from many sites all over the world in order to reach their accrual targets, usually around 500 specimens per cancer type. For this reason the image data sets are also extremely heterogeneous in terms of scanner modalities, manufacturers and acquisition protocols. In most cases the images were acquired as part of routine care and not as part of a controlled research study or clinical trial.

    https://wiki.cancerimagingarchive.net/display/Public/TCGA-LUAD

  11. Metadata record for the article: A subset of lung cancer cases shows robust...

    • springernature.figshare.com
    • datasetcatalog.nlm.nih.gov
    xlsx
    Updated Jun 3, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Miklos Diossy; Zsofia Sztupinszki; Judit Borcsok; Marcin Krzystanek; Viktoria Tisza; Sandor Spisak; Orsolya Rusz; Jozsef Timar; István Csabai; Janos Fillinger; Judit Moldvay; Anders Gorm Pedersen; David Szuts; Zoltan Szallasi (2023). Metadata record for the article: A subset of lung cancer cases shows robust signs of homologous recombination deficiency associated genomic mutational signatures [Dataset]. http://doi.org/10.6084/m9.figshare.14452854.v1
    Available download formats: xlsx
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    figshare
    Authors
    Miklos Diossy; Zsofia Sztupinszki; Judit Borcsok; Marcin Krzystanek; Viktoria Tisza; Sandor Spisak; Orsolya Rusz; Jozsef Timar; István Csabai; Janos Fillinger; Judit Moldvay; Anders Gorm Pedersen; David Szuts; Zoltan Szallasi
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Summary

    This metadata record provides details of the data supporting the claims of the related article: “A subset of lung cancer cases shows robust signs of homologous recombination deficiency associated genomic mutational signatures”.

    The related study analysed all available whole genome sequencing data from the TCGA lung adenocarcinoma (LUAD) and squamous lung cancer (LUSC) cohorts and determined which of a list of mutational signatures were present in these cases, analysing whole genome and whole exome data to estimate the frequency of potentially homologous recombination (HR) deficient lung cancer cases.

    Type of data: single nucleotide variation; binary alignment maps

    Subject of data: Eukaryotic cell lines; Homo sapiens

    Population characteristics: lung cancer cases

    Recruitment: Cancer cell lines were sourced from Cancer Cell Line Encyclopedia, Genomics of Drug Sensitivity in Cancer data portal. The exceptional responder was identified as part of a larger ongoing study to understand the determinants of treatment response to platinum based therapy.

    Data access

    The results shown here are in part based upon data generated by the TCGA Research Network: https://www.cancer.gov/tcga, and the LUAD and LUSC data are available at ICGC (https://dcc.icgc.org/) and GDC (https://portal.gdc.cancer.gov/) data portals. A comprehensive list of the file names underlying the figures and supplementary materials of the related article, along with direct links to the data in the above sources, is provided in the file ‘Diossy_et_al_2021_underlying_data_list.xlsx’, which is included with this data record.

    Sample single nucleotide variation analysis of a stage IVA lung squamous carcinoma case with a durable (> 20 months), symptom-free survival in response to platinum-based treatment (H75T) has been deposited in the European Variation Archive under accession https://identifiers.org/ebi/bioproject:PRJEB45238.

    Corresponding author(s) for this study

    Zoltan Szallasi, Computational Health Informatics Program (CHIP) Boston Children’s Hospital, Harvard Medical School, 300 Longwood Ave., Boston Massachusetts, USA, 02215, e-mail: Zoltan.szallasi@childrens.harvard.edu, +1-617-355-2179.

    Study approval

    The Hungarian Scientific and Research Ethics Committee of the Medical Research Council has approved the study (Nos. 2285-1/2019/EUIG and 2307-3/2020/EUIG).

  12. The Cancer Genome Atlas Prostate Adenocarcinoma Collection

    • cancerimagingarchive.net
    dicom, n/a
    Updated Feb 2, 2014
    Cite
    The Cancer Imaging Archive (2014). The Cancer Genome Atlas Prostate Adenocarcinoma Collection [Dataset]. http://doi.org/10.7937/K9/TCIA.2016.YXOGLM4Y
    Available download formats: dicom, n/a
    Dataset updated
    Feb 2, 2014
    Dataset authored and provided by
    The Cancer Imaging Archive
    License

    https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/

    Time period covered
    May 29, 2020
    Dataset funded by
    National Cancer Institute (http://www.cancer.gov/)
    Description

    The Cancer Genome Atlas Prostate Adenocarcinoma (TCGA-PRAD) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA). Clinical, genetic, and pathological data resides in the Genomic Data Commons (GDC) Data Portal while the radiological data is stored on The Cancer Imaging Archive (TCIA).

    Matched TCGA patient identifiers allow researchers to explore the TCGA/TCIA databases for correlations between tissue genotype, radiological phenotype and patient outcomes. Tissues for TCGA were collected from many sites all over the world in order to reach their accrual targets, usually around 500 specimens per cancer type. For this reason the image data sets are also extremely heterogeneous in terms of scanner modalities, manufacturers and acquisition protocols. In most cases the images were acquired as part of routine care and not as part of a controlled research study or clinical trial.

    CIP TCGA Radiology Initiative

    Imaging Source Site (ISS) Groups are being populated and governed by participants from institutions that have provided imaging data to the archive for a given cancer type. Modeled after TCGA analysis groups, ISS groups are given the opportunity to publish a marker paper for a given cancer type per the guidelines in the table above. This opportunity will generate increased participation in building these multi-institutional data sets as they become an open community resource. Learn more about the CIP TCGA Radiology Initiative.

  13. TCGA case study for ASTERICS - Dataset - B2FIND

    • b2find.eudat.eu
    Updated Aug 21, 2021
    Cite
    (2021). TCGA case study for ASTERICS - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/90b36891-8290-5d26-9206-cc2a74fc7c10
    Dataset updated
    Aug 21, 2021
    Description

    This dataset is derived from the public repository TCGA (https://portal.gdc.cancer.gov/) and contains several files, each corresponding to a given omic measured on the same individuals with breast cancer. Raw data were obtained from the mixOmics case study described in http://mixomics.org/mixdiablo/case-study-tcga/ [link accessed on August 18, 2021] and were made available by the package authors at http://mixomics.org/wp-content/uploads/2016/08/TCGA.normalised.mixDIABLO.RData_.zip (R data format). Data in the zip file had been normalised for technical biases by the package authors. Data from the train and test sets were exported as TXT/CSV files and complemented with miRNA expression on the same individuals and with toy datasets to handle missing-value cases and the like. They serve as a basis for the illustration of the web data analysis tool ASTERICS (Project 20008788 funded by Région Occitanie). R 4.0.4. The Cancer Genome Atlas (TCGA): https://portal.gdc.cancer.gov/. The data dictionary is available on the TCGA website: https://docs.gdc.cancer.gov/Data_Dictionary/viewer/. The source is a public repository from which the raw original data may be retrieved. Data were preprocessed (normalized) by the mixOmics package authors as described in Supplementary Section S2 of [Singh et al, 2019], where the origin of the dataset is also fully described.

  14. TCGA-KICH

    • opendatalab.com
    zip
    Updated Apr 20, 2023
    Cite
    GDC Data Portal (2023). TCGA-KICH [Dataset]. https://opendatalab.com/OpenDataLab/TCGA-KICH
    Available download formats: zip
    Dataset updated
    Apr 20, 2023
    Dataset provided by
    GDC Data Portal
    License

    https://portal.gdc.cancer.gov/projects/TCGA-KICH

    Description

    TCGA-KICH cancer CT image is a dataset related to adenoma and adenocarcinoma; it contains a total of 2325 data files from 113 people. The record also lists test results, prescriptions and treatments. This dataset is published by the GDC Data Portal.

  15. Metadata record for the manuscript: Ancestry-associated transcriptomic...

    • springernature.figshare.com
    Updated Jun 1, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Jessica Roelands; Raghvendra Mall; Hossam Almeer; Remy Thomas; Mahmoud G. Mohamed; Shahinaz Bedri; Salha Bujassoum Al Bader; Kulsoom Junejo; Elad Ziv; Rosalyn W. Sayaman; Peter J.K. Kuppen; Davide Bedognetti; Wouter Hendrickx; Julie Decock (2023). Metadata record for the manuscript: Ancestry-associated transcriptomic profiles of breast cancer in patients of African, Arab and European ancestry [Dataset]. http://doi.org/10.6084/m9.figshare.13379765.v1
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    figshare
    Authors
    Jessica Roelands; Raghvendra Mall; Hossam Almeer; Remy Thomas; Mahmoud G. Mohamed; Shahinaz Bedri; Salha Bujassoum Al Bader; Kulsoom Junejo; Elad Ziv; Rosalyn W. Sayaman; Peter J.K. Kuppen; Davide Bedognetti; Wouter Hendrickx; Julie Decock
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Summary

    This metadata record provides details of the data supporting the claims of the related manuscript: “Ancestry-associated transcriptomic profiles of breast cancer in patients of African, Arab and European ancestry”.

    The related study sought to identify molecular differences that could provide insight into the biology of ancestry-associated disparities in breast cancer clinical outcome.

    Type of data: transcriptomic profiles

    Subject of data: curated survival data and breast cancer subtype classification for European, Asian, African and Arab ancestry patients.

    Sample size: No sample size calculation was performed. All breast cancer patients from the TCGA breast cancer dataset for which ancestry was determined were included. With regards to the RA-QA cohort, all female breast cancer patients with available tumour tissues that were newly diagnosed between 2004-2010 were enrolled.

    Recruitment: Two different breast cancer cohorts were included in the study; the publicly available TCGA breast cancer dataset and a local cohort from Qatar. RNA sequencing data from the TCGA breast cancer cohort (n=1082 patients) was downloaded using R (v3.5.1) and TCGA Assembler (v2.0.3). The RA-QA patient cohort constitutes a breast cancer cohort from Qatar (n=24 of which 16 of Arab ancestry) with patients that were newly diagnosed with breast cancer between 2004-2010 at the National Centre for Cancer Care and Research in Doha.

    Data access

    The TCGA-BRCA cohort data are available through the GDC data portal (https://gdac.broadinstitute.org/runs/stddata_2016_01_28/data/BRCA/20160128/) or by using TCGA-Assembler as detailed in the method section. TCGA-Assembler is open-source and freely available at http://www.compgenome.org/TCGA-Assembler/. The downloaded data product name is “illuminahiseq_rnaseqv2-RSEM_genes_normalized”.

    The RA-QA dataset RNA sequencing data are openly available in fastq file format in the European Nucleotide Archive via the following accession: https://identifiers.org/ena.embl:PRJEB41828. Several data files are openly available in figshare at the following DOI: https://doi.org/10.6084/m9.figshare.12901928. These are as follows. The RNAseq Expression matrix is in the file ‘RNASeq_QN_LOG2_RA_QA.csv’. The clinical data for the RA-QA cohort are in the file ‘Clinical_data_RA_QA.csv’. The enrichment scores data are in the files ‘Enrichment_scores_tumor_related_pathways_RA_QA.csv’ and ‘Enrichment_scores_immune_deconvolution_Bindea_RA_QA.csv’.

    Three data items used in the study are available from the supplementary materials of previously published articles. These are as follows:

    - File ‘TCGA_CLINICAL_DATA_CELL_2018_S1.xlsx’ from Liu et al, 2018: https://doi.org/10.1016/j.cell.2018.02.052.
    - File ‘Admixture and Ethnicity Calls.xlsx’ from Carrot-Zhang et al, 2020: https://doi.org/10.1016/j.ccell.2020.04.012.
    - File ‘mmc4.xlsx’ from Rooney et al, 2015: https://doi.org/10.1016/j.cell.2014.12.033.

    Scripts used in the study can be found on Zenodo/github: https://doi.org/10.5281/zenodo.3707660.

    Corresponding author(s) for this study

    Julie Decock, Cancer Research Center, Qatar Biomedical Research Institute (QBRI), Hamad Bin Khalifa University (HBKU), Qatar Foundation (QF), Doha, Qatar. Email: juliedecock80@gmail.com.

    Wouter Hendrickx, Functional Cancer Omics Lab, Cancer group, Research Branch, Sidra Medicine, Doha, Qatar. Email: whendrickx@sidra.org.

    Davide Bedognetti, Cancer Immunogenetics Lab, Cancer group, Research Branch, Sidra Medicine, Doha, Qatar. Email: dbedognetti@sidra.org.

    Study approval

    The study was approved by the local ethical committees of the Hamad Medical Corporation (study approval number #14027/14), the Qatar Biomedical Research Institute (study approval number #2016-002), and Sidra Medicine (study approval number #1711015664), and was performed in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards.

  16. Small NBL Dataset for Analysis of AKT Signaling

    • zenodo.org
    csv
    Updated Jul 15, 2023
    Cite
    Greg Hyde (2023). Small NBL Dataset for Analysis of AKT Signaling [Dataset]. http://doi.org/10.5281/zenodo.8148323
    Available download formats: csv
    Dataset updated
    Jul 15, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Greg Hyde
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    ExpressionData.csv

    ----------------------------

    A small subset of transcriptomics data (30 genes) curated for learning Gene Regulatory Networks (GRNs) pertaining to signaling by the ALK pathway. Genes were selected by referencing the "signaling by ALK" pathway from Reactome (https://reactome.org/content/detail/R-HSA-201556). This subset of data belongs to the TARGET-NBL project (https://portal.gdc.cancer.gov/projects/TARGET-NBL), hosted via the Genomic Data Commons Data Portal (https://portal.gdc.cancer.gov/). Please refer to the GDC's data access policies (https://gdc.cancer.gov/about-gdc/gdc-policies) if planning to use the data.

    refNetwork.csv

    ----------------------

    Contains a reference network of known pairwise regulatory relationships among the genes for which transcriptomics data are available in "ExpressionData.csv". These relationships were again determined by referencing the "signaling by ALK" pathway from Reactome (https://reactome.org/content/detail/R-HSA-201556).
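
    A reference network of this kind is typically used to score edges inferred from the expression data. The following is a minimal sketch of that scoring step, not the dataset author's method. The column names `Gene1` and `Gene2` are hypothetical (the record does not state the header of refNetwork.csv), and the gene pairs in the toy input are illustrative placeholders rather than curated relationships.

```python
import csv
import io


def load_edges(handle, source_col="Gene1", target_col="Gene2"):
    """Read a CSV edge list into a set of (regulator, target) pairs.

    The default column names are hypothetical; adjust them to the real header.
    """
    reader = csv.DictReader(handle)
    return {(row[source_col], row[target_col]) for row in reader}


def precision_recall(predicted, reference):
    """Score predicted directed edges against the reference edges."""
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Toy stand-in for refNetwork.csv; the pairs below are placeholders only.
toy_reference = io.StringIO("Gene1,Gene2\nALK,STAT3\nALK,PIK3CA\nJAK3,STAT3\n")
reference_edges = load_edges(toy_reference)

# Edges that some hypothetical GRN inference method might have produced.
predicted_edges = {
    ("ALK", "STAT3"),
    ("STAT3", "ALK"),
    ("JAK3", "STAT3"),
    ("ALK", "MYCN"),
}
precision, recall = precision_recall(predicted_edges, reference_edges)
```

    To score against the real file, pass `open("refNetwork.csv", newline="")` to `load_edges` in place of the toy input. Edges are treated as directed here, so a reversed pair counts as a miss.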

  17. The Cancer Genome Atlas Colon Adenocarcinoma Collection

    • cancerimagingarchive.net
    dicom, n/a
    Updated Jan 5, 2016
    Cite
    The Cancer Imaging Archive (2016). The Cancer Genome Atlas Colon Adenocarcinoma Collection [Dataset]. http://doi.org/10.7937/K9/TCIA.2016.HJJHBOXZ
    Available download formats: dicom, n/a
    Dataset updated
    Jan 5, 2016
    Dataset authored and provided by
    The Cancer Imaging Archive
    License

    https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/

    Time period covered
    May 29, 2020
    Dataset funded by
    National Cancer Institute (http://www.cancer.gov/)
    Description

    The Cancer Genome Atlas Colon Adenocarcinoma (TCGA-COAD) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA). Clinical, genetic, and pathological data reside in the Genomic Data Commons (GDC) Data Portal, while the radiological data are stored on The Cancer Imaging Archive (TCIA).

    Matched TCGA patient identifiers allow researchers to explore the TCGA/TCIA databases for correlations between tissue genotype, radiological phenotype and patient outcomes. Tissues for TCGA were collected from many sites all over the world in order to reach their accrual targets, usually around 500 specimens per cancer type. For this reason the image data sets are also extremely heterogeneous in terms of scanner modalities, manufacturers and acquisition protocols. In most cases the images were acquired as part of routine care and not as part of a controlled research study or clinical trial.

    CIP TCGA Radiology Initiative

    Imaging Source Site (ISS) Groups are being populated and governed by participants from institutions that have provided imaging data to the archive for a given cancer type. Modeled after TCGA analysis groups, ISS groups are given the opportunity to publish a marker paper for a given cancer type per the guidelines in the table above. This opportunity will generate increased participation in building these multi-institutional data sets as they become an open community resource. Learn more about the CIP TCGA Radiology Initiative.

  18. Pan-cancer Aberrant Pathway Activity Analysis (PAPAA)

    • zenodo.org
    • explore.openaire.eu
    application/gzip, csv +1
    Updated Dec 5, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    DANIEL BLANKENBERG; VIJAY NAGAMPALLI (2020). Pan-cancer Aberrant Pathway Activity Analysis (PAPAA) [Dataset]. http://doi.org/10.5281/zenodo.3629709
    Available download formats: application/gzip, tsv, csv
    Dataset updated
    Dec 5, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    DANIEL BLANKENBERG; VIJAY NAGAMPALLI
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Information about the dataset files:

    1) pancan_rnaseq_freeze.tsv.gz: Publicly available gene expression data for the TCGA Pan-cancer dataset. File: PanCanAtlas EBPlusPlusAdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.tsv was processed using script process_sample_freeze.py by Gregory Way et al as described in https://github.com/greenelab/pancancer/ data processing and initialization steps. [http://api.gdc.cancer.gov/data/3586c0da-64d0-4b74-a449-5ff4d9136611] [https://doi.org/10.1016/j.celrep.2018.03.046]

    2) pancan_mutation_freeze.tsv.gz: Publicly available Mutational information for TCGA Pan-cancer dataset. File: mc3.v0.2.8.PUBLIC.maf.gz was processed using script process_sample_freeze.py by Gregory Way et al as described in https://github.com/greenelab/pancancer/ data processing and initialization steps. [http://api.gdc.cancer.gov/data/1c8cfe5f-e52d-41ba-94da-f15ea1337efc] [https://doi.org/10.1016/j.celrep.2018.03.046]

    3) pancan_GISTIC_threshold.tsv.gz: Publicly available gene-level copy number information of the TCGA Pan-cancer dataset. This file is processed using script process_copynumber.py by Gregory Way et al as described in https://github.com/greenelab/pancancer/ data processing and initialization steps. The files copy_number_loss_status.tsv.gz and copy_number_gain_status.tsv.gz generated from this data are used as inputs in our Galaxy pipeline. [https://xenabrowser.net/datapages/?cohort=TCGA%20Pan-Cancer%20(PANCAN)&removeHub=https%3A%2F%2Fxena.treehouse.gi.ucsc.edu%3A443] [https://doi.org/10.1016/j.celrep.2018.03.046]

    4) mutation_burden_freeze.tsv.gz: Publicly available Mutational information for TCGA Pan-cancer dataset. File: mc3.v0.2.8.PUBLIC.maf.gz was processed using script process_sample_freeze.py by Gregory Way et al as described in https://github.com/greenelab/pancancer/ data processing and initialization steps. [https://github.com/greenelab/pancancer/] [http://api.gdc.cancer.gov/data/1c8cfe5f-e52d-41ba-94da-f15ea1337efc] [https://doi.org/10.1016/j.celrep.2018.03.046]

    5) sample_freeze.tsv or sample_freeze_version4_modify.tsv: The file lists the frozen samples as determined by TCGA PanCancer Atlas consortium along with raw RNAseq and mutation data. These were previously determined and included for all downstream analysis. All other datasets were processed and subset according to the frozen samples. [https://github.com/greenelab/pancancer/]

    6) vogelstein_cancergenes.tsv: compendium of OG and TSG used for the analysis. [https://github.com/greenelab/pancancer/]

    7) CCLE_DepMap_18Q1_maf_20180207.txt.gz: Publicly available Mutational data for CCLE cell lines from Broad Institute Cancer Cell Line Encyclopedia (CCLE) / DepMap Portal. [https://depmap.org/portal/download/api/download/external?file_name=ccle%2FCCLE_DepMap_18Q1_maf_20180207.txt]

    8) ccle_rnaseq_genes_rpkm_20180929.gct.gz: Publicly available Expression data for 1019 cell lines (RPKM) from Broad Institute Cancer Cell Line Encyclopedia (CCLE) / DepMap Portal. [https://depmap.org/portal/download/api/download/external?file_name=ccle%2Fccle_2019%2FCCLE_RNAseq_genes_rpkm_20180929.gct.gz]

    9) CCLE_MUT_CNA_AMP_DEL_binary_Revealer.gct: Publicly available merged Mutational and copy number alterations that include gene amplifications and deletions for the CCLE cell lines. This data is represented in the binary format and provided by the Broad Institute Cancer Cell Line Encyclopedia (CCLE) / DepMap Portal. [https://data.broadinstitute.org/ccle_legacy_data/binary_calls_for_copy_number_and_mutation_data/CCLE_MUT_CNA_AMP_DEL_binary_Revealer.gct]

    10) GDSC_cell_lines_EXP_CCLE_names.csv.gz: Publicly available RMA normalized expression data for Genomics of Drug Sensitivity in Cancer (GDSC) cell lines. File gdsc_cell_line_RMA_proc_basalExp.csv was downloaded. This data was subset to the 389 cell lines that are common to CCLE and GDSC. All the GDSC cell line names were replaced with CCLE cell line names for further processing. [https://www.cancerrxgene.org/gdsc1000/GDSC1000_WebResources//Data/preprocessed/Cell_line_RMA_proc_basalExp.txt.zip]

    11) GDSC_CCLE_common_mut_cnv_binary.csv.gz: A subset of merged Mutational and copy number alterations that include gene amplifications and deletions for common cell lines between GDSC and CCLE. This file is generated using CCLE_MUT_CNA_AMP_DEL_binary_Revealer.gct and a list of common cell lines.

    12) gdsc1_ccle_pharm_fitted_dose_data.txt.gz: Pharmacological data for GDSC1 cell lines. [ftp://ftp.sanger.ac.uk/pub/project/cancerrxgene/releases/current_release/GDSC1_fitted_dose_response_15Oct19.xlsx]

    13) gdsc2_ccle_pharm_fitted_dose_data.txt.gz: Pharmacological data for GDSC2 cell lines. [ftp://ftp.sanger.ac.uk/pub/project/cancerrxgene/releases/current_release/GDSC2_fitted_dose_response_15Oct19.xlsx]

    14) compounds.csv: list of pharmacological compounds tested for our analysis

    15) tcga_dictonary.tsv: list of cancer types used in the analysis.

    16) seg_based_scores.tsv: Measurement of total copy number burden, Percent of genome altered by copy number alterations. This file was used as part of the Pancancer analysis by Gregory Way et al as described in https://github.com/greenelab/pancancer/ data processing and initialization steps. [https://github.com/greenelab/pancancer/]
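
    Several of the source links in the list above follow the pattern http://api.gdc.cancer.gov/data/ followed by a file UUID. The sketch below shows one way such links could be built and fetched with the Python standard library. It is an illustration under assumptions, not part of the PAPAA pipeline: the helper names are hypothetical, the `https` scheme is substituted for the `http` shown in the list, and the download helper assumes the file is open-access and that network access is available.

```python
import re
import shutil
import urllib.request

# Base of the links quoted in the list above, with https assumed.
GDC_DATA_ENDPOINT = "https://api.gdc.cancer.gov/data"
UUID_PATTERN = re.compile(r"^[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$")


def gdc_data_url(file_uuid):
    """Build the download URL for one file UUID, rejecting malformed IDs."""
    if not UUID_PATTERN.match(file_uuid):
        raise ValueError(f"not a UUID: {file_uuid!r}")
    return f"{GDC_DATA_ENDPOINT}/{file_uuid}"


def download(file_uuid, destination):
    """Stream one file to disk (needs network access; not called here)."""
    url = gdc_data_url(file_uuid)
    with urllib.request.urlopen(url) as response, open(destination, "wb") as out:
        shutil.copyfileobj(response, out)


# UUIDs copied from the links in items 1) and 2) of the list above.
RNASEQ_UUID = "3586c0da-64d0-4b74-a449-5ff4d9136611"
MUTATION_UUID = "1c8cfe5f-e52d-41ba-94da-f15ea1337efc"

rnaseq_url = gdc_data_url(RNASEQ_UUID)
mutation_url = gdc_data_url(MUTATION_UUID)
```

    Validating the UUID before building the URL keeps a typo from turning into a confusing HTTP error later. The files behind these links are large, so streaming with `shutil.copyfileobj` avoids holding them in memory.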

  19. Additional file 3 of Enhanced identification of significant regulators of...

    • springernature.figshare.com
    xlsx
    Updated May 30, 2023
    Cite
    Rezvan Ehsani; Finn Drabløs (2023). Additional file 3 of Enhanced identification of significant regulators of gene expression [Dataset]. http://doi.org/10.6084/m9.figshare.12096555.v1
    Available download formats: xlsx
    Dataset updated
    May 30, 2023
    Dataset provided by
    figshare
    Authors
    Rezvan Ehsani; Finn Drabløs
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Additional file 3. Identifiers for TCGA datasets on prostate cancer as downloaded from the GDC data portal.

  20. Histological images for MSI vs. MSS classification in gastrointestinal...

    • zenodo.org
    zip
    Updated Jan 24, 2020
    Cite
    Jakob Nikolas Kather (2020). Histological images for MSI vs. MSS classification in gastrointestinal cancer, snap-frozen samples [Dataset]. http://doi.org/10.5281/zenodo.2532612
    Available download formats: zip
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Jakob Nikolas Kather
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This repository contains 218,578 unique image patches derived from histological images of colorectal cancer patients in the TCGA cohort (original whole slide SVS images are freely available at https://portal.gdc.cancer.gov/). All images in this repository are derived from snap-frozen tissue slides ("TS" or "BS" at the GDC data portal).

    Preprocessing

    All SVS slides were preprocessed as follows:

    1. automatic detection of tumor

    2. resizing to 224 px x 224 px at a resolution of 0.5 µm/px

    4. color normalization with the Macenko method (Macenko et al., 2009, http://wwwx.cs.unc.edu/~mn/sites/default/files/macenko2009.pdf)

    5. assignment of patients to either "MSS" (microsatellite stable) or "MSIMUT" (microsatellite unstable or hypermutated)

    6. randomization of patients to training and testing sets (~70% and ~30%). Randomization was done on a patient level rather than on a slide or tile level

    7. equilibration of training sets by undersampling (removing excess tiles in MSS class in a random way)
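
    The last two steps, patient-level randomization and undersampling, can be sketched as follows. This is an illustrative sketch under assumptions, not the dataset author's code: the function names, the tuple layout and the toy patients are hypothetical, and the undersampling here balances every class down to the smallest one, which matches removing excess MSS tiles when MSS is the larger class.

```python
import random


def split_patients(patient_ids, train_fraction=0.7, seed=0):
    """Assign whole patients to train or test so no patient spans both sets."""
    patients = sorted(set(patient_ids))
    random.Random(seed).shuffle(patients)
    cut = round(len(patients) * train_fraction)
    return set(patients[:cut]), set(patients[cut:])


def undersample(tiles, seed=0):
    """Randomly drop tiles from larger classes until all classes match the smallest.

    `tiles` is a list of (tile_name, patient_id, label) tuples.
    """
    by_label = {}
    for tile in tiles:
        by_label.setdefault(tile[2], []).append(tile)
    target = min(len(group) for group in by_label.values())
    rng = random.Random(seed)
    balanced = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], target))
    return balanced


# Hypothetical toy data: 7 MSS patients with 4 tiles each, 3 MSIMUT with 2 each.
tiles = []
for p in range(10):
    label = "MSS" if p < 7 else "MSIMUT"
    for t in range(4 if label == "MSS" else 2):
        tiles.append((f"tile_{p}_{t}", f"patient_{p}", label))

train_patients, test_patients = split_patients([tile[1] for tile in tiles])
train_tiles = undersample([tile for tile in tiles if tile[1] in train_patients])
```

    Splitting on patient IDs before any tile-level operation is what prevents tiles from the same patient from leaking between training and testing sets. Only the training tiles are balanced in this sketch; the test tiles keep their natural class proportions.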

    File description

    1. STAD_TRAIN_MSS - training images (~70% of all patients) for gastric (stomach) cancer TCGA patients with MSS (microsatellite stable) tumors, 50285 unique image patches; FFPE samples

    2. STAD_TRAIN_MSIMUT - training images (~70% of all patients) for gastric (stomach) cancer TCGA patients with MSI (microsatellite instable) or highly mutated tumors, 50285 unique image patches; FFPE samples

    3. STAD_TEST_MSS - test images (~30% of all patients) for gastric (stomach) cancer TCGA patients with MSS (microsatellite stable) tumors, 90104 unique image patches; FFPE samples

    4. STAD_TEST_MSIMUT - test images (~30% of all patients) for gastric (stomach) cancer TCGA patients with MSI (microsatellite instable) or highly mutated tumors, 27904 unique image patches; FFPE samples

Cite
(2022). Genomic Data Commons Data Portal (GDC Data Portal) [Dataset]. http://identifiers.org/RRID:SCR_014514

Genomic Data Commons Data Portal (GDC Data Portal)

RRID:SCR_014514, Genomic Data Commons Data Portal (GDC Data Portal) (RRID:SCR_014514), Genomic Data Commons Data Portal, GDC Data Portal

Explore at:
74 scholarly articles cite this dataset (View in Google Scholar)
Dataset updated
Jan 29, 2022
