100+ datasets found
  1. Historical NCI Genomic Data Commons data (09-14-2017)

    • zenodo.org
    • data-staging.niaid.nih.gov
    tsv
    Updated Jan 24, 2020
    Cite
    Inge Seim (2020). Historical NCI Genomic Data Commons data (09-14-2017) [Dataset]. http://doi.org/10.5281/zenodo.1186945
    Explore at:
    Available download formats: tsv
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Inge Seim
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Historical NCI Genomic Data Commons data (v09-14-2017). Clinical ('phenotype') and gene expression (HTSeq FPKM-UQ).

    TCGA-COAD.GDC_phenotype.tsv

    dataset: phenotype - Phenotype

    cohort: GDC TCGA Colon Cancer (COAD)
    dataset ID: TCGA-COAD/Xena_Matrices/TCGA-COAD.GDC_phenotype.tsv
    download: https://gdc.xenahubs.net/download/TCGA-COAD/Xena_Matrices/TCGA-COAD.GDC_phenotype.tsv.gz
    samples: 570
    version: 11-27-2017
    hub: https://gdc.xenahubs.net
    type of data: phenotype
    author: Genomic Data Commons
    raw data: https://docs.gdc.cancer.gov/Data/Release_Notes/Data_Release_Notes/#data-release-90
    raw data: https://api.gdc.cancer.gov/data/
    input data format: ROWs (samples) x COLUMNs (identifiers) (i.e. clinicalMatrix)
    570 samples X 151 identifiers

    TCGA-COAD.htseq_fpkm-uq.tsv

    dataset: gene expression RNAseq - HTSeq - FPKM-UQ

    cohort: GDC TCGA Colon Cancer (COAD)
    dataset ID: TCGA-COAD/Xena_Matrices/TCGA-COAD.htseq_fpkm-uq.tsv
    download: https://gdc.xenahubs.net/download/TCGA-COAD/Xena_Matrices/TCGA-COAD.htseq_fpkm-uq.tsv.gz
    samples: 512
    version: 09-14-2017
    hub: https://gdc.xenahubs.net
    type of data: gene expression RNAseq
    unit: log2(fpkm-uq+1)
    platform: Illumina
    ID/Gene Mapping: https://gdc.xenahubs.net/download/probeMaps/gencode.v22.annotation.gene.probeMap.gz
    author: Genomic Data Commons
    raw data: https://docs.gdc.cancer.gov/Data/Release_Notes/Data_Release_Notes/#data-release-80
    raw data: https://api.gdc.cancer.gov/data/
    wrangling: Data from the same sample but from different vials/portions/analytes/aliquots is averaged; data from different samples is combined into genomicMatrix; all data is then log2(x+1) transformed.
    input data format: ROWs (identifiers) x COLUMNs (samples) (i.e. genomicMatrix)
    60,484 identifiers X 512 samples
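
    The wrangling step described above (average the aliquots of a sample, then apply log2(x+1)) can be sketched in a few lines of Python. This is only an illustration of the stated transformation, not the Xena pipeline itself; the function name `wrangle`, the sample IDs, and the values are all hypothetical.

```python
import math
from collections import defaultdict


def wrangle(aliquot_values):
    """Average values that share a sample ID, then apply log2(x + 1).

    aliquot_values: iterable of (sample_id, fpkm_uq) pairs for one gene.
    A sample ID may appear several times, once per vial/portion/aliquot.
    """
    grouped = defaultdict(list)
    for sample_id, value in aliquot_values:
        grouped[sample_id].append(value)
    # Mean over the aliquots of each sample, then the log2(x + 1) transform.
    return {
        sample_id: math.log2(sum(values) / len(values) + 1)
        for sample_id, values in grouped.items()
    }


# Hypothetical values for a single gene; the sample IDs are made up.
row = [("SAMPLE-A", 3.0), ("SAMPLE-A", 11.0), ("SAMPLE-B", 0.0)]
result = wrangle(row)
print(result)
```

    The +1 inside the logarithm keeps zero expression values defined: a value of 0 maps to log2(1) = 0.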

  2. Colorectal Adenocarcinoma (TCGA, PanCancer Atlas) data

    • datacatalog.mskcc.org
    Updated Nov 20, 2019
    Cite
    The Cancer Genome Atlas (TCGA) (2019). Colorectal Adenocarcinoma (TCGA, PanCancer Atlas) data [Dataset]. https://datacatalog.mskcc.org/dataset/10411
    Explore at:
    Dataset updated
    Nov 20, 2019
    Dataset provided by
    MSK Library
    The Cancer Genome Atlas (TCGA)
    Description

    This dataset contains summary data visualizations and clinical data from a broad sampling of 594 colorectal adenocarcinomas from 594 patients. The data was gathered as part of the PanCancer Atlas initiative, which aims to answer big, overarching questions about cancer by examining the full set of tumors characterized in the robust TCGA dataset. The clinical data includes mutation count, information about mutated genes, patient demographics, disease status, tumor typing, and chromosomal gain or loss. The data set also includes copy-number segment data downloadable as .seg files and viewable via the Integrative Genomics Viewer.

  3. Formatted TCGA clinical and RNA-Seq data for colon adenocarcinoma (COAD) and...

    • data.niaid.nih.gov
    Updated Nov 23, 2021
    Cite
    Liu, Tong; Wang, Zi-Jing; Qi, Shao-Chong; Xia, Bi-Han; Zhang, Xiao-Shuang; Yang, Jin-Lin (2021). Formatted TCGA clinical and RNA-Seq data for colon adenocarcinoma (COAD) and rectum adenocarcinoma (READ) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5717484
    Explore at:
    Dataset updated
    Nov 23, 2021
    Dataset provided by
    Department of Gastroenterology and Hepatology, Sichuan University-University of Oxford Huaxi Joint Centre for Gastrointestinal Cancer, West China Hospital, Sichuan University
    Authors
    Liu, Tong; Wang, Zi-Jing; Qi, Shao-Chong; Xia, Bi-Han; Zhang, Xiao-Shuang; Yang, Jin-Lin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    COAD/READ/COADREAD_rnaseq_fpkm.txt files contain TCGA RNA-Seq data in FPKM normalisation for colon adenocarcinoma (COAD), rectum adenocarcinoma (READ) or combined (COADREAD).

    COAD/READ/COADREAD_rnaseq_tpm.txt files contain TCGA RNA-Seq data in TPM normalisation for colon adenocarcinoma (COAD), rectum adenocarcinoma (READ) or combined (COADREAD).

    COAD/READ/COADREAD_clinical_raw.xlsx files contain TCGA clinical data for patients with colon adenocarcinoma (COAD), rectum adenocarcinoma (READ) or combined (COADREAD).

    COAD/READ/COADREAD_rnaseq_clinical_raw.xlsx files contain matched TCGA clinical and RNA-Seq data for patients with colon adenocarcinoma (COAD), rectum adenocarcinoma (READ) or combined (COADREAD).

  4. Colorectal Adenocarcinoma (TCGA, Firehose Legacy)

    • datacatalog.mskcc.org
    Updated Sep 15, 2020
    Cite
    Broad Institute (2020). Colorectal Adenocarcinoma (TCGA, Firehose Legacy) [Dataset]. https://datacatalog.mskcc.org/dataset/10467
    Explore at:
    Dataset updated
    Sep 15, 2020
    Dataset provided by
    Broad Institute
    MSK Library
    Description

    TCGA Colorectal Adenocarcinoma. Source data from GDAC Firehose. Previously known as TCGA Provisional.
    This dataset contains summary data visualizations and clinical data from a broad sampling of 640 carcinomas from 636 patients. The data was gathered as part of the Broad Institute of MIT and Harvard Firehose initiative, a cancer analysis pipeline. The clinical data includes mutation count, information about mutated genes, patient demographics, sample type, disease code, Adjuvant Postoperative Pharmaceutical Therapy Administered Indicator, American Joint Committee on Cancer Metastasis Stage Code, American Joint Committee on Cancer Publication Version Type, American Joint Committee on Cancer Tumor Stage Code, BRAF Gene Analysis Indicator, BRAF Gene Analysis Result, and Days to Sample Collection. The dataset includes Next-Generation Clustered Heat Maps (NG-CHM) viewable via an embedded NG-CHM Heat Map Viewer, provided by MD Anderson Cancer Center, which offers a graphical environment for exploration of clustered or non-clustered heat map data. The data set also includes copy-number segment data downloadable as .seg files and viewable via the Integrative Genomics Viewer.

  5. TCGA-Cancer-Variant-and-Clinical-Data

    • huggingface.co
    Updated Oct 10, 2024
    + more versions
    Cite
    Seq-to-Pheno (2024). TCGA-Cancer-Variant-and-Clinical-Data [Dataset]. https://huggingface.co/datasets/seq-to-pheno/TCGA-Cancer-Variant-and-Clinical-Data
    Explore at:
    Croissant: Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Oct 10, 2024
    Dataset authored and provided by
    Seq-to-Pheno
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    TCGA Cancer Variant and Clinical Data

      Dataset Description
    

    This dataset combines genetic variant information at the protein level with clinical data from The Cancer Genome Atlas (TCGA) project, curated by the International Cancer Genome Consortium (ICGC). It provides a comprehensive view of protein-altering mutations and clinical characteristics across various cancer types.

      Dataset Summary
    

    The dataset includes:

    Protein sequence data for both mutated and… See the full description on the dataset page: https://huggingface.co/datasets/seq-to-pheno/TCGA-Cancer-Variant-and-Clinical-Data.

  6. TCGA COAD MSI vs MSS Prediction (JPG)

    • kaggle.com
    zip
    Updated Aug 23, 2019
    Cite
    Joan Gibert (2019). TCGA COAD MSI vs MSS Prediction (JPG) [Dataset]. https://www.kaggle.com/joangibert/tcga_coad_msi_mss_jpg
    Explore at:
    Available download formats: zip (11756515042 bytes)
    Dataset updated
    Aug 23, 2019
    Authors
    Joan Gibert
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Context

    This dataset comes from here: Kather, Jakob Nikolas. (2019). Histological images for MSI vs. MSS classification in gastrointestinal cancer, FFPE samples [Data set]. Zenodo. http://doi.org/10.5281/zenodo.2530835

    Much of the information in this description comes either from the original dataset description or from the scientific article that uses it to predict MSI:

    Microsatellite instability determines whether patients with gastrointestinal cancer respond exceptionally well to immunotherapy. However, in clinical practice, not every patient is tested for MSI, because this requires additional genetic or immunohistochemical tests.

    Content

    This repository contains 192312 unique image patches derived from histological images of colorectal cancer and gastric cancer patients in the TCGA cohort (original whole slide SVS images are freely available at https://portal.gdc.cancer.gov/). All images in this repository are derived from formalin-fixed paraffin-embedded (FFPE) diagnostic slides ("DX" at the GDC data portal). This is explained well in this blog: http://www.andrewjanowczyk.com/download-tcga-digital-pathology-images-ffpe/

    Preprocessing: all SVS slides were preprocessed as follows:

    1. Automatic detection of tumor

    2. Resizing to 224 px x 224 px at a resolution of 0.5 µm/px

    3. Color normalization with the Macenko method (Macenko et al., 2009, http://wwwx.cs.unc.edu/~mn/sites/default/files/macenko2009.pdf)

    4. Assignment of patients to either "MSS" (microsatellite stable) or "MSIMUT" (microsatellite instable or highly mutated)

    5. Reformat the original images to JPG format (using bash command mogrify)

    Acknowledgements

    Thanks to Jakob Nikolas Kather for the paper and the github page

    Inspiration

    This dataset tries to analyze a feature that is actually impossible to identify using the human eye. Additional tests are needed to identify this set of patients, and these tests delay the start of treatment. High sensitivity on this kind of task could greatly improve patient diagnosis and treatment.

  7. Table1_ECM–Receptor Regulatory Network and Its Prognostic Role in Colorectal...

    • frontiersin.figshare.com
    xlsx
    Updated Jun 8, 2023
    + more versions
    Cite
    Stepan Nersisyan; Victor Novosad; Narek Engibaryan; Yuri Ushkaryov; Sergey Nikulin; Alexander Tonevitsky (2023). Table1_ECM–Receptor Regulatory Network and Its Prognostic Role in Colorectal Cancer.XLSX [Dataset]. http://doi.org/10.3389/fgene.2021.782699.s003
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jun 8, 2023
    Dataset provided by
    Frontiers Media (http://www.frontiersin.org/)
    Authors
    Stepan Nersisyan; Victor Novosad; Narek Engibaryan; Yuri Ushkaryov; Sergey Nikulin; Alexander Tonevitsky
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Interactions of the extracellular matrix (ECM) and cellular receptors constitute one of the crucial pathways involved in colorectal cancer progression and metastasis. With the use of bioinformatics analysis, we comprehensively evaluated the prognostic information concentrated in the genes from this pathway. First, we constructed an ECM–receptor regulatory network by integrating the transcription factor (TF) and 5’-isomiR interaction databases with mRNA/miRNA-seq data from The Cancer Genome Atlas Colon Adenocarcinoma (TCGA-COAD). Notably, one-third of the interactions mediated by 5’-isomiRs were represented by noncanonical isomiRs (isomiRs whose 5’-end sequence did not match the canonical miRBase version). Then, exhaustive search-based feature selection was used to fit prognostic signatures composed of nodes from the network for overall survival prediction. Two reliable prognostic signatures were identified and validated on the independent The Cancer Genome Atlas Rectum Adenocarcinoma (TCGA-READ) cohort. The first signature was made up of six genes directly involved in ECM–receptor interaction: AGRN, DAG1, FN1, ITGA5, THBS3, and TNC (concordance index 0.61, logrank test p = 0.0164, 3-year ROC AUC = 0.68). The second, hybrid signature was composed of three regulators: hsa-miR-32-5p, NR1H2, and SNAI1 (concordance index 0.64, logrank test p = 0.0229, 3-year ROC AUC = 0.71). While hsa-miR-32-5p exclusively regulated ECM-related genes (COL1A2 and ITGA5), NR1H2 and SNAI1 also targeted other pathways (adhesion, cell cycle, and cell division). Concordant distributions of the respective risk scores across four stages of colorectal cancer and adjacent normal mucosa additionally confirmed the reliability of the models.

  8. COAD samples somatic mutation data

    • figshare.com
    • search.datacite.org
    application/gzip
    Updated Jan 19, 2016
    Cite
    Endre Sebestyén (2016). COAD samples somatic mutation data [Dataset]. http://doi.org/10.6084/m9.figshare.1061910.v1
    Explore at:
    Available download formats: application/gzip
    Dataset updated
    Jan 19, 2016
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Endre Sebestyén
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    TCGA COAD samples somatic mutation data in BED format.

  9. DICOM converted Slide Microscopy images for the TCGA-COAD collection

    • zenodo.org
    bin
    Updated Aug 20, 2024
    Cite
    David Clunie; William Clifford; David Pot; Ulrike Wagner; Keyvan Farahani; Erika Kim; Andrey Fedorov (2024). DICOM converted Slide Microscopy images for the TCGA-COAD collection [Dataset]. http://doi.org/10.5281/zenodo.13346249
    Explore at:
    Available download formats: bin
    Dataset updated
    Aug 20, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    David Clunie; William Clifford; David Pot; Ulrike Wagner; Keyvan Farahani; Erika Kim; Andrey Fedorov
    License

    Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Description

    This dataset corresponds to a collection of images and/or image-derived data available from National Cancer Institute Imaging Data Commons (IDC) [1]. This dataset was converted into DICOM representation and ingested by the IDC team. You can explore and visualize the corresponding images using IDC Portal here: TCGA-COAD. You can use the manifests included in this Zenodo record to download the content of the collection following the Download instructions below.

    Collection description

    The Cancer Genome Atlas-Colon Adenocarcinoma (TCGA-COAD) data collection is part of a larger effort to enhance the TCGA http://cancergenome.nih.gov/ data set with characterized radiological images. The Cancer Imaging Program (CIP), with the cooperation of several of the TCGA tissue-contributing institutions, has archived a large portion of the radiological images of the COAD cases.

    Please see the TCGA-COAD page to learn more about the images and to obtain any supporting metadata for this collection.

    Files included

    A manifest file's name indicates the IDC data release in which a version of collection data was first introduced. For example, collection_id-idc_v8-aws.s5cmd corresponds to the contents of the collection_id collection introduced in IDC data release v8. If there is a subsequent version of this Zenodo page, it will indicate when a subsequent version of the corresponding collection was introduced.

    1. tcga_coad-idc_v18-aws.s5cmd: manifest of files available for download from public IDC Amazon Web Services buckets
    2. tcga_coad-idc_v18-gcs.s5cmd: manifest of files available for download from public IDC Google Cloud Storage buckets
    3. tcga_coad-idc_v18-dcf.dcf: Gen3 manifest (for details see https://learn.canceridc.dev/data/organization-of-data/guids-and-uuids)

    Note that manifest files that end in -aws.s5cmd reference files stored in Amazon Web Services (AWS) buckets, while -gcs.s5cmd reference files in Google Cloud Storage. The actual files are identical and are mirrored between AWS and GCP.
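
    The naming convention described above (collection ID, IDC release, storage target) can be unpacked with a small parser. This is a minimal sketch that assumes names always follow the `collection_id-idc_vN-target.ext` pattern shown in the list; the function name `parse_manifest_name` and the regular expression are illustrative and are not part of the idc-index package.

```python
import re

# Pattern inferred from the manifest names listed above (an assumption).
MANIFEST_RE = re.compile(
    r"^(?P<collection>.+)-idc_v(?P<release>\d+)-(?P<target>aws|gcs|dcf)"
    r"\.(?:s5cmd|dcf)$"
)

TARGETS = {
    "aws": "Amazon Web Services",
    "gcs": "Google Cloud Storage",
    "dcf": "Gen3 manifest",
}


def parse_manifest_name(name):
    """Split a manifest file name into collection, IDC release and target."""
    match = MANIFEST_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognised manifest name: {name!r}")
    return {
        "collection": match["collection"],
        "release": int(match["release"]),
        "target": TARGETS[match["target"]],
    }


info = parse_manifest_name("tcga_coad-idc_v18-aws.s5cmd")
print(info)
```

    Because the AWS and GCS manifests point at mirrored copies of the same files, the choice between them only affects which cloud the download is served from.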

    Download instructions

    Each of the manifests includes instructions in the header on how to download the included files.

    To download the files using .s5cmd manifests:

    1. install idc-index package: pip install --upgrade idc-index
    2. download the files referenced by manifests included in this dataset by passing the .s5cmd manifest file: idc download manifest.s5cmd.

    To download the files using .dcf manifest, see manifest header.

    Acknowledgments

    Imaging Data Commons team has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under Task Order No. HHSN26110071 under Contract No. HHSN261201500003l.

    References

    [1] Fedorov, A., Longabaugh, W. J. R., Pot, D., Clunie, D. A., Pieper, S. D., Gibbs, D. L., Bridge, C., Herrmann, M. D., Homeyer, A., Lewis, R., Aerts, H. J. W., Krishnaswamy, D., Thiriveedhi, V. K., Ciausu, C., Schacherer, D. P., Bontempi, D., Pihl, T., Wagner, U., Farahani, K., Kim, E. & Kikinis, R. National Cancer Institute Imaging Data Commons: Toward Transparency, Reproducibility, and Scalability in Imaging Artificial Intelligence. RadioGraphics (2023). https://doi.org/10.1148/rg.230180

  10. DICOM converted Slide Microscopy images for the CPTAC-COAD collection

    • zenodo.org
    bin
    Updated Aug 20, 2024
    + more versions
    Cite
    David Clunie; William Clifford; David Pot; Ulrike Wagner; Keyvan Farahani; Erika Kim; Andrey Fedorov (2024). DICOM converted Slide Microscopy images for the CPTAC-COAD collection [Dataset]. http://doi.org/10.5281/zenodo.12666785
    Explore at:
    Available download formats: bin
    Dataset updated
    Aug 20, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    David Clunie; William Clifford; David Pot; Ulrike Wagner; Keyvan Farahani; Erika Kim; Andrey Fedorov
    License

    Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Description

    This dataset corresponds to a collection of images and/or image-derived data available from National Cancer Institute Imaging Data Commons (IDC) [1]. This dataset was converted into DICOM representation and ingested by the IDC team. You can explore and visualize the corresponding images using IDC Portal here: CPTAC-COAD. You can use the manifests included in this Zenodo record to download the content of the collection following the Download instructions below.

    Collection description

    This collection contains subjects from the National Cancer Institute’s Clinical Proteomic Tumor Analysis Consortium (CPTAC) Colon Adenocarcinoma cohort. CPTAC is a national effort to accelerate the understanding of the molecular basis of cancer through the application of large-scale proteome and genome analysis, or proteogenomics.

    Please see the CPTAC-COAD wiki page to learn more about the images and to obtain any supporting metadata for this collection.

    Files included

    A manifest file's name indicates the IDC data release in which a version of collection data was first introduced. For example, collection_id-idc_v8-aws.s5cmd corresponds to the contents of the collection_id collection introduced in IDC data release v8. If there is a subsequent version of this Zenodo page, it will indicate when a subsequent version of the corresponding collection was introduced.

    1. cptac_coad-idc_v7-aws.s5cmd: manifest of files available for download from public IDC Amazon Web Services buckets
    2. cptac_coad-idc_v7-gcs.s5cmd: manifest of files available for download from public IDC Google Cloud Storage buckets
    3. cptac_coad-idc_v7-dcf.dcf: Gen3 manifest (for details see https://learn.canceridc.dev/data/organization-of-data/guids-and-uuids)

    Note that manifest files that end in -aws.s5cmd reference files stored in Amazon Web Services (AWS) buckets, while -gcs.s5cmd reference files in Google Cloud Storage. The actual files are identical and are mirrored between AWS and GCP.

    Download instructions

    Each of the manifests includes instructions in the header on how to download the included files.

    To download the files using .s5cmd manifests:

    1. install idc-index package: pip install --upgrade idc-index
    2. download the files referenced by manifests included in this dataset by passing the .s5cmd manifest file: idc download manifest.s5cmd.

    To download the files using .dcf manifest, see manifest header.

    Acknowledgments

    Imaging Data Commons team has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under Task Order No. HHSN26110071 under Contract No. HHSN261201500003l.

    References

    [1] Fedorov, A., Longabaugh, W. J. R., Pot, D., Clunie, D. A., Pieper, S. D., Gibbs, D. L., Bridge, C., Herrmann, M. D., Homeyer, A., Lewis, R., Aerts, H. J. W., Krishnaswamy, D., Thiriveedhi, V. K., Ciausu, C., Schacherer, D. P., Bontempi, D., Pihl, T., Wagner, U., Farahani, K., Kim, E. & Kikinis, R. National Cancer Institute Imaging Data Commons: Toward Transparency, Reproducibility, and Scalability in Imaging Artificial Intelligence. RadioGraphics (2023). https://doi.org/10.1148/rg.230180

  11. The Cancer Genome Atlas (TCGA) RNA-seq meta-analysis

    • figshare.com
    xlsx
    Updated Feb 2, 2018
    Cite
    Namshik Han (2018). The Cancer Genome Atlas (TCGA) RNA-seq meta-analysis [Dataset]. http://doi.org/10.6084/m9.figshare.5851743.v1
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Feb 2, 2018
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Namshik Han
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    TCGA RNA-seq V2 Level3 data were downloaded from TCGA Genomic Data Commons Data Portal (https://gdc-portal.nci.nih.gov), consisting of 11,303 samples in 34 cancer projects (33 cancer types). Nine cancer types that do not have corresponding non-tumour samples were filtered out, and the analysis was focused on tumour versus non-tumour comparison. 24 cancer types were used in this meta-analysis: BLCA, BRCA, CESC, CHOL, COAD, ESCA, GBM, HNSC, KICH, KIRC, KIRP, LIHC, LUAD, LUSC, PAAD, PCPG, PRAD, READ, SARC, SKCM, STAD, THCA, THYM, UCEC (https://gdc-portal.nci.nih.gov). The nine filtered cancer types were ACC, DLBC, LAML, LGG, MESO, OV, TGCT, UCS and UVM.

    To extract expression values from TCGA RNA-seq data, we used genomic coordinates to retrieve UCSC Transcript IDs that correspond to the identifiers in TCGA RNA-seq V2 Level3 data (isoform level). The GAF (General Annotation Format) file was used to map the coordinates to UCSC Transcript IDs, and it was downloaded from https://tcga-data.nci.nih.gov/docs/GAF/GAF.hg19.June2011.bundle/outputs/TCGA.hg19.June2011.gaf. This file contains genomic annotations shared by all TCGA projects. More details of the GAF file format can be found at https://tcga-data.nci.nih.gov/docs/GAF/GAF3.0/GAF_v3_file_description.docx. We filtered out any coding exons overlapping UCSC Transcript IDs to eliminate the expression values of coding genes and evaluate lncRNA expression. We could find the expression values of 443 pcRNAs and 203 tapRNAs in TCGA data, as many non-coding regions are not yet fully annotated in the TCGA RNA-seq V2 Level3 data. The expression values of pcRNAs and tapRNAs were extracted and clustered by the unsupervised Pearson correlation method (Supplementary Figure 18A). The expression values of tapRNA-associated coding genes were also extracted and used to generate the heat-map (Supplementary Figure 18B), which shows a similar pattern of expression to tapRNAs across the cancer types.

    To show that tapRNAs and associated coding genes have similar expression profiles in cancers, we generated a Spearman's Rank-Order Correlation heatmap (Figure 6A) between tapRNAs and their associated coding genes based on the TCGA RNA-seq data. We used the MatLab function corr to calculate the Spearman's rho. This function takes two matrices X (197-by-8,850 expression profiling matrix of tapRNA) and Y (197-by-8,850 expression profiling matrix of tapRNA-associated coding gene) and returns an 8,850-by-8,850 matrix containing the pairwise correlation coefficient between each pair of 8,850 columns (TCGA cancer samples in Supplementary Figure 18A and B). Thus, the rank-order correlation matrix that we computed from the matrices of expression profiling data (Supplementary Figure S18A and B) allowed us to compare the correlation between two column vectors, i.e. cancer samples. This function also returns a matrix of p-values for testing the hypothesis of no correlation against the alternative that there is a nonzero correlation. Each element of the matrix of p-values is the p-value for the corresponding element of Spearman's rho. The p-values for Spearman's rho are calculated using large-sample approximations. To check the significance level of the correlation between a tapRNA and its associated coding gene, the diagonal of the p-value matrix was extracted and used. The median is 1.31 × 10^-11 and the mean is 1.03 × 10^-4 with standard deviation 0.0029.

    To identify cancer-specific tapRNAs, we considered not only the global expression pattern of a given tapRNA in each cancer type, but also the expression pattern of a specific sub-group that is significantly distinct, to take into account cancer sample heterogeneity. Thus, two conditions were applied: (1) the average expression level of a tapRNA in a given cancer type is in the top 10% or bottom 10% and (2) a tapRNA has at least 10% of samples in a given cancer type that are significantly up-regulated (Z-score > 2) or down-regulated (Z-score < -2).
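
    The column-wise Spearman correlation described above can be illustrated with a small pure-Python sketch. It mirrors the shape convention in the description (rows are genes, columns are samples, output entry [i][j] compares column i of X with column j of Y) but is not the authors' MatLab code; the function names and the toy matrices are hypothetical, and p-values are not computed here.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (non-constant)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def spearman_columns(X, Y):
    """rho[i][j] = Spearman's rho between column i of X and column j of Y."""
    ranked_x = [average_ranks(col) for col in zip(*X)]
    ranked_y = [average_ranks(col) for col in zip(*Y)]
    return [[pearson(cx, cy) for cy in ranked_y] for cx in ranked_x]


# Toy matrices: 4 rows (genes) by 2 columns (samples); values are made up.
X = [[1, 10], [2, 8], [3, 9], [4, 7]]
Y = [[2, 1], [4, 3], [6, 2], [8, 4]]
rho = spearman_columns(X, Y)
diagonal = [rho[i][i] for i in range(len(rho))]
print(rho)
```

    Spearman's rho is computed here as the Pearson correlation of the ranks, which handles ties correctly. The diagonal pairs column i of X with column i of Y, the same pairing the description uses when it extracts the diagonal of the p-value matrix.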

  12. Table_2_Identification of Synergistic Drug Combinations to Target...

    • datasetcatalog.nlm.nih.gov
    • figshare.com
    Updated May 18, 2022
    + more versions
    Cite
    Mackeyev, Yuri; De Araujo Farias, Virginea; Singh, Pankaj K.; Krishnan, Sunil; Gupta, Kshama; Jones, Jeremy C.; Quiñones-Hinojosa, Alfredo (2022). Table_2_Identification of Synergistic Drug Combinations to Target KRAS-Driven Chemoradioresistant Cancers Utilizing Tumoroid Models of Colorectal Adenocarcinoma and Recurrent Glioblastoma.xlsx [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000367110
    Explore at:
    Dataset updated
    May 18, 2022
    Authors
    Mackeyev, Yuri; De Araujo Farias, Virginea; Singh, Pankaj K.; Krishnan, Sunil; Gupta, Kshama; Jones, Jeremy C.; Quiñones-Hinojosa, Alfredo
    Description

    Treatment resistance is observed in all advanced cancers. Colorectal cancer (CRC) presenting as colorectal adenocarcinoma (COAD) is the second leading cause of cancer deaths worldwide. Multimodality treatment includes surgery, chemotherapy, and targeted therapies with selective utilization of immunotherapy and radiation therapy. Despite the early success of anti-epidermal growth factor receptor (anti-EGFR) therapy, treatment resistance is common and often driven by mutations in APC, KRAS, RAF, and PI3K/mTOR and positive feedback between activated KRAS and WNT effectors. Challenges in the direct targeting of WNT regulators and KRAS have caused alternative actionable targets to gain recent attention. Utilizing an unbiased drug screen, we identified combinatorial targeting of DDR1/BCR-ABL signaling axis with small-molecule inhibitors of EGFR-ERBB2 to be potentially cytotoxic against multicellular spheroids obtained from WNT-activated and KRAS-mutant COAD lines (HCT116, DLD1, and SW480) independent of their KRAS mutation type. Based on the data-driven approach using available patient datasets (The Cancer Genome Atlas (TCGA)), we constructed transcriptomic correlations between gene DDR1, with an expression of genes for EGFR, ERBB2-4, mitogen-activated protein kinase (MAPK) pathway intermediates, BCR, and ABL and genes for cancer stem cell reactivation, cell polarity, and adhesion; we identified a positive association of DDR1 with EGFR, ERBB2, BRAF, SOX9, and VANGL2 in Pan-Cancer. The evaluation of the pathway network using the STRING database and Pathway Commons database revealed DDR1 protein to relay its signaling via adaptor proteins (SHC1, GRB2, and SOS1) and BCR axis to contribute to the KRAS-PI3K-AKT signaling cascade, which was confirmed by Western blotting. 
We further confirmed the cytotoxic potential of our lead combination involving EGFR/ERBB2 inhibitor (lapatinib) with DDR1/BCR-ABL inhibitor (nilotinib) in radioresistant spheroids of HCT116 (COAD) and, in an additional devastating primary cancer model, glioblastoma (GBM). GBMs overexpress DDR1 and share some common genomic features with COAD like EGFR amplification and WNT activation. Moreover, genetic alterations in genes like NF1 make GBMs have an intrinsically high KRAS activity. We show the combination of nilotinib plus lapatinib to exhibit more potent cytotoxic efficacy than either of the drugs administered alone in tumoroids of patient-derived recurrent GBMs. Collectively, our findings suggest that combinatorial targeting of DDR1/BCR-ABL with EGFR-ERBB2 signaling may offer a therapeutic strategy against stem-like KRAS-driven chemoradioresistant tumors of COAD and GBM, widening the window for its applications in mainstream cancer therapeutics.

  13. Dr (Colon Cancer)

    • search.dataone.org
    • datadryad.org
    Updated Jun 20, 2025
    QunGuang Jiang; Xiaorui Fu; Jinzhong Duanmu; Taiyuan Li (2025). Dr (Colon Cancer) [Dataset]. http://doi.org/10.5061/dryad.7pvmcvdpc
    Explore at:
    Dataset updated
    Jun 20, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    QunGuang Jiang; Xiaorui Fu; Jinzhong Duanmu; Taiyuan Li
    Time period covered
    Jan 1, 2019
    Description

    Colon adenocarcinoma (COAD) is the most common colon cancer and exhibits high mortality. Because of their association with cancer progression, long noncoding RNAs (lncRNAs) have become prognostic biomarkers. This study uses clinical information and lncRNA expression profiles from The Cancer Genome Atlas database to construct a prognostic lncRNA signature for estimating patient prognosis. In the training cohort, prognosis-related lncRNAs were selected from differentially expressed lncRNAs by univariate Cox analysis. Least absolute shrinkage and selection operator (LASSO) regression and multivariate Cox analysis were then employed to identify prognostic lncRNAs, from which the prognostic signature was constructed. The prognostic model calculated each COAD patient’s risk score and split the patients into low-risk and high-risk groups. Compared with the low-risk group, the high-risk group had a significantly poorer prognosis. Then, the prognostic signature was...

  14. DataSheet_1_Identification of necroptosis-related genes for predicting...

    • datasetcatalog.nlm.nih.gov
    Updated Nov 24, 2022
    Ying, Song-cheng; Meng, Lei; Xu, Aman; Wei, Zhi-jian; Wang, Ye; Chen, Zhang-ming; Lin, Ming-gui (2022). DataSheet_1_Identification of necroptosis-related genes for predicting prognosis and exploring immune infiltration landscape in colon adenocarcinoma.csv [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000244540
    Explore at:
    Dataset updated
    Nov 24, 2022
    Authors
    Ying, Song-cheng; Meng, Lei; Xu, Aman; Wei, Zhi-jian; Wang, Ye; Chen, Zhang-ming; Lin, Ming-gui
    Description

    Background: Necroptosis is a recently discovered form of cell death that plays an important role in the occurrence and development of colon adenocarcinoma (COAD). Our study aimed to construct a risk score model to predict the prognosis of patients with COAD based on necroptosis-related genes.

    Methods: The gene expression data of COAD and normal colon samples were obtained from The Cancer Genome Atlas (TCGA) and Genotype-Tissue Expression (GTEx). The least absolute shrinkage and selection operator (LASSO) Cox regression analysis was used to calculate the risk score based on prognostic necroptosis-related differentially expressed genes (DEGs). Based on the risk score, patients were classified into high- and low-risk groups. Then, nomogram models were built based on the risk score and clinicopathological features. In addition, the model was verified in the Gene Expression Omnibus (GEO) database. The tumor microenvironment (TME) and the level of immune infiltration were evaluated by “ESTIMATE” and single-sample gene set enrichment analysis (ssGSEA). Functional enrichment analysis was carried out to explore the potential mechanism of necroptosis in COAD. Finally, the effect of necroptosis on colon cancer cells was explored through CCK8 and transwell assays. The expression of necroptosis-related genes in colon tissues and cells treated with necroptotic inducers (TNFα) and inhibitors (NEC-1) was evaluated by quantitative real-time polymerase chain reaction (qRT-PCR).

    Results: The risk score was an independent prognostic risk factor in COAD. The predictive value of the nomogram based on the risk score and clinicopathological features was superior to TNM staging. The effectiveness of the model was well validated in GSE152430. Immune and stromal scores were significantly elevated in the high-risk group. Moreover, necroptosis may influence the prognosis of COAD by influencing the cancer immune response. In in-vitro experiments, the inhibition of necroptosis can promote proliferation and invasion ability. Finally, differential expression of necroptosis-related genes was found in 16 paired colon tissues and in colon cancer cells.

    Conclusion: A novel necroptosis-related gene signature for forecasting the prognosis of COAD has been constructed, which possesses favorable predictive ability and offers ideas for the necroptosis-associated development of COAD.
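    The risk-score step described in this abstract can be illustrated with a minimal sketch. It assumes the score is a linear combination of model coefficients and expression values and that patients are split at the cohort median; neither the cutoff nor the gene names and numbers below come from the dataset, they are hypothetical placeholders.

```python
from statistics import median


def risk_score(expression, coefficients):
    """Linear risk score: sum of coefficient * expression over signature genes."""
    return sum(coefficients[gene] * expression[gene] for gene in coefficients)


def split_by_median(scores):
    """Label each patient 'high' or 'low' relative to the cohort median score."""
    cutoff = median(scores.values())
    return {pid: ("high" if score > cutoff else "low") for pid, score in scores.items()}


# Hypothetical coefficients and expression values, for illustration only.
coefficients = {"GENE_A": 0.5, "GENE_B": -0.25}
cohort = {
    "P1": {"GENE_A": 2.0, "GENE_B": 4.0},
    "P2": {"GENE_A": 6.0, "GENE_B": 0.0},
    "P3": {"GENE_A": 1.0, "GENE_B": 8.0},
    "P4": {"GENE_A": 4.0, "GENE_B": 2.0},
}

scores = {pid: risk_score(expr, coefficients) for pid, expr in cohort.items()}
groups = split_by_median(scores)
print(scores)
print(groups)
```

    In practice the coefficients would come from a fitted LASSO Cox model, and the cutoff could be chosen differently (for example from the training cohort only).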

  15. COAD paired sample gene level read counts

    • commons.datacite.org
    Updated Jan 19, 2016
    Endre Sebestyén (2016). COAD paired sample gene level read counts [Dataset]. http://doi.org/10.6084/m9.figshare.1061501.v1
    Explore at:
    Dataset updated
    Jan 19, 2016
    Dataset provided by
    DataCite
    Figshare (http://figshare.com/)
    Authors
    Endre Sebestyén
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    TCGA COAD paired sample gene level read counts from Level 3 RNASeq-v2 data.

  16. DICOM converted Slide Microscopy images for the TCGA-LIHC collection

    • zenodo.org
    bin
    Updated Aug 20, 2024
    David Clunie; William Clifford; David Pot; Ulrike Wagner; Keyvan Farahani; Erika Kim; Andrey Fedorov (2024). DICOM converted Slide Microscopy images for the TCGA-LIHC collection [Dataset]. http://doi.org/10.5281/zenodo.12690003
    Explore at:
    Available download formats: bin
    Dataset updated
    Aug 20, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    David Clunie; William Clifford; David Pot; Ulrike Wagner; Keyvan Farahani; Erika Kim; Andrey Fedorov
    License

    Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Description

    This dataset corresponds to a collection of images and/or image-derived data available from National Cancer Institute Imaging Data Commons (IDC) [1]. This dataset was converted into DICOM representation and ingested by the IDC team. You can explore and visualize the corresponding images using IDC Portal here: TCGA-LIHC. You can use the manifests included in this Zenodo record to download the content of the collection following the Download instructions below.

    Collection description

    The Cancer Imaging Program (CIP) is working directly with primary investigators from institutes participating in TCGA to obtain and load images relating to the genomic, clinical, and pathological data being stored within the TCGA Data Portal. Currently this CT and MR multi-sequence image collection of liver hepatocellular carcinoma (LIHC) patients can be matched by each unique case identifier with the extensive gene and expression data of the same case from The Cancer Genome Atlas Data Portal to research the link between clinical phenome and tissue genome.


    See the TCGA-LIHC page to learn more about the images and to obtain any supporting metadata for this collection.

    Files included

    A manifest file's name indicates the IDC data release in which a version of collection data was first introduced. For example, collection_id-idc_v8-aws.s5cmd corresponds to the contents of the collection_id collection introduced in IDC data release v8. If there is a subsequent version of this Zenodo page, it will indicate when a subsequent version of the corresponding collection was introduced.

    1. tcga_lihc-idc_v8-aws.s5cmd: manifest of files available for download from public IDC Amazon Web Services buckets
    2. tcga_lihc-idc_v8-gcs.s5cmd: manifest of files available for download from public IDC Google Cloud Storage buckets
    3. tcga_lihc-idc_v8-dcf.dcf: Gen3 manifest (for details see https://learn.canceridc.dev/data/organization-of-data/guids-and-uuids)

    Note that manifest files that end in -aws.s5cmd reference files stored in Amazon Web Services (AWS) buckets, while -gcs.s5cmd reference files in Google Cloud Storage. The actual files are identical and are mirrored between AWS and GCP.

    Download instructions

    Each of the manifests includes instructions in its header on how to download the included files.

    To download the files using .s5cmd manifests:

    1. install idc-index package: pip install --upgrade idc-index
    2. download the files referenced by manifests included in this dataset by passing the .s5cmd manifest file: idc download manifest.s5cmd.

    To download the files using .dcf manifest, see manifest header.
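
    The two .s5cmd steps above can be scripted. The sketch below only wraps the two commands quoted in the instructions; the helper names, the use of `sys.executable -m pip` in place of a bare `pip`, and the dry-run default are assumptions made for illustration, not part of the IDC documentation.

```python
import shlex
import subprocess
import sys


def build_commands(manifest_path):
    """Return the install and download commands as argument lists."""
    # Assumption: invoking pip through the current interpreter instead of a bare `pip`.
    install = [sys.executable, "-m", "pip", "install", "--upgrade", "idc-index"]
    download = ["idc", "download", manifest_path]
    return install, download


def run(manifest_path, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_commands(manifest_path):
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


# Dry run: prints the commands and downloads nothing.
run("tcga_lihc-idc_v8-aws.s5cmd")
```

    Calling `run(..., dry_run=False)` would actually install the package and start the download, so it is worth checking available disk space first.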

    Acknowledgments

    Imaging Data Commons team has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under Task Order No. HHSN26110071 under Contract No. HHSN261201500003l.

    References

    [1] Fedorov, A., Longabaugh, W. J. R., Pot, D., Clunie, D. A., Pieper, S. D., Gibbs, D. L., Bridge, C., Herrmann, M. D., Homeyer, A., Lewis, R., Aerts, H. J. W., Krishnaswamy, D., Thiriveedhi, V. K., Ciausu, C., Schacherer, D. P., Bontempi, D., Pihl, T., Wagner, U., Farahani, K., Kim, E. & Kikinis, R. National Cancer Institute Imaging Data Commons: Toward Transparency, Reproducibility, and Scalability in Imaging Artificial Intelligence. RadioGraphics (2023). https://doi.org/10.1148/rg.230180

  17. PIVOT - COAD (light)

    • zenodo.org
    application/gzip
    Updated Jan 25, 2022
    Malvika Sudhakar; Raghunathan Rengaswamy; Karthik Raman (2022). PIVOT - COAD (light) [Dataset]. http://doi.org/10.5281/zenodo.5898163
    Explore at:
    Available download formats: application/gzip
    Dataset updated
    Jan 25, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Malvika Sudhakar; Raghunathan Rengaswamy; Karthik Raman
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Pre-processed TCGA COAD data used for PIVOT analysis.

  18. Esophageal Cancer Dataset

    • kaggle.com
    zip
    Updated Oct 14, 2024
    Abhinaba Biswas (2024). Esophageal Cancer Dataset [Dataset]. https://www.kaggle.com/datasets/abhinaba1biswas/esophageal-cancer-dataset/code
    Explore at:
    Available download formats: zip (407333 bytes)
    Dataset updated
    Oct 14, 2024
    Authors
    Abhinaba Biswas
    License

    Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Description

    Esophageal Cancer Dataset

    Introduction:

    Esophageal cancer remains one of the most aggressive cancers with a high mortality rate worldwide, presenting significant challenges for early detection and effective treatment. To support the global fight against this disease, we introduce a comprehensive clinical dataset on esophageal cancer, available on Kaggle. This dataset includes patient demographics, clinical data, and cancer-specific attributes that can be leveraged to develop AI models for detection, prognosis, and treatment planning.

    Scientific Overview:

    This dataset is a valuable resource for healthcare professionals and researchers working on cancer detection, personalized treatments, and prognosis models. It includes:

    • Patient demographics (e.g., age, gender)
    • Tumor histology and staging information
    • Treatment history
    • Lymph node examination results

    These real-world clinical attributes provide a robust foundation for AI-driven solutions in the diagnosis and treatment of esophageal cancer.

    Dataset Composition:

    1. Patient Demographics:

    • Patient Barcode: Unique patient identifier.
    • Tissue Source Site: Code indicating the site from which the tissue sample was sourced.
    • Age at Diagnosis: Facilitates age-based studies on incidence and outcomes.
    • Gender: Enables gender-specific analysis of disease progression.
    • Informed Consent Verified: Indicates whether informed consent was obtained.

    2. Medical and Clinical History:

    • ICD-10 and ICD-O-3 Codes: Provides International Classification of Diseases codes for the site and histology, essential for understanding tumor characteristics (e.g., squamous cell carcinoma, adenocarcinoma).
    • Comorbidities: Includes information on the presence of other chronic diseases like Gastroesophageal Reflux Disease (GERD) that could impact treatment outcomes.
    • Smoking Status: Critical for evaluating the impact of smoking on esophageal cancer risk and prognosis.

    3. Cancer-Specific Data:

    • Tumor Location: Identifies the part of the esophagus affected (e.g., upper, middle, or lower).
    • Histology: Details the type of cancer (e.g., squamous cell carcinoma, adenocarcinoma).
    • Cancer Stage: Describes the stage of cancer at diagnosis (Stages 0 to IV).
    • Residual Tumor Status: Indicates whether any tumors remained post-surgery (e.g., R0, R1).
    • Lymph Node Examination: Information such as the number of lymph nodes examined and those positive for metastasis.
    • Radiation Therapy and Postoperative Treatment: Indicates whether the patient received radiation therapy and additional postoperative treatments.

    4. Clinical Outcome Data:

    • Karnofsky Performance Score: Assesses the patient's ability to perform daily activities.
    • Eastern Cooperative Oncology Group (ECOG) Performance Status: Evaluates the functional status of cancer patients.

    Implementation Guide:

    1. Data Preprocessing:

    • Data Cleaning: Remove irrelevant or redundant entries and ensure consistency across the dataset (e.g., handling missing values in performance scores and treatment history).
    • Normalization: Standardize clinical data for model input, especially for numerical variables like age, lymph node count, and performance scores.

    2. Model Training:

    • Frameworks: Use machine learning or deep learning frameworks such as TensorFlow, PyTorch, or scikit-learn.
    • Model Selection: Depending on dataset complexity, models like Decision Trees, Random Forests, or Neural Networks can be used.
    • Evaluation: Measure model performance using metrics like accuracy, precision, recall, and F1-score.

    3. Deployment:

    • Clinical Decision Support: Integrate the trained model into tools for medical professionals, offering predictions or insights to support diagnosis and treatment planning for esophageal cancer.
    • Testing and Feedback: Test the model for accuracy and usability, incorporating a feedback loop to continuously improve model performance.
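
    The evaluation metrics named in the guide above (accuracy, precision, recall, F1-score) can be computed directly from binary predictions. This is a minimal sketch in plain Python; the function name and the toy labels are hypothetical and do not come from the dataset.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

    For an imbalanced clinical outcome, precision and recall are usually more informative than accuracy alone.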

    Potential Applications:

    1. Machine Learning Models:

    • Ideal for developing algorithms for early detection, personalized treatment plans, and prognosis prediction.

    2. Healthcare Insights:

    • Assists clinicians in optimizing patient care strategies and treatment protocols.

    3. Academic Research:

    • Facilitates studies on the pathophysiology of esophageal cancer, risk factor assessment, and the effectiveness of various treatments.

    Conclusion:

    The Esophageal Cancer Dataset provides high-quality, comprehensive clinical data, essential for advancing research in esophageal cancer detection, treatment, and prognosis. We encourage the research community to utilize this dataset to drive innovation and improve patient outcomes.

    Team:

    • Mr. Abhinaba Biswas, Student/Aspiring Data Analyst/ML Developer, JIS College of Engineering, Kalyani, Wes...
  19. Colon cancer

    • kaggle.com
    zip
    Updated May 19, 2023
    AngeValli (2023). Colon cancer [Dataset]. https://www.kaggle.com/datasets/angevalli/colon-cancer/code
    Explore at:
    Available download formats: zip (607690 bytes)
    Dataset updated
    May 19, 2023
    Authors
    AngeValli
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    High-dimensional colon cancer dataset with many null values, intended for the study of dimension reduction techniques. It is useful for random projection techniques and for comparing computation time on logistic regression. To be compared with the sector scale dataset.

  20. Gene Expression Cancer RNA-Seq

    • kaggle.com
    zip
    Updated May 27, 2025
    Alban NYANTUDRE (2025). Gene Expression Cancer RNA-Seq [Dataset]. https://www.kaggle.com/datasets/waalbannyantudre/gene-expression-cancer-rna-seq-donated-on-682016
    Explore at:
    Available download formats: zip (73984306 bytes)
    Dataset updated
    May 27, 2025
    Authors
    Alban NYANTUDRE
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This collection of data is part of the RNA-Seq (HiSeq) PANCAN dataset. It is a random extraction of gene expressions of patients having different types of tumor: BRCA, KIRC, COAD, LUAD, and PRAD. Each sample contains the expression of 20,531 genes for a patient diagnosed with one of the following cancers:

    Code    Tumor Name
    BRCA    Breast invasive carcinoma (breast cancer)
    KIRC    Kidney renal clear cell carcinoma (kidney)
    COAD    Colon adenocarcinoma (colon)
    LUAD    Lung adenocarcinoma (lung)
    PRAD    Prostate adenocarcinoma (prostate)

    Files:

    • data.csv: Gene expression matrix X (881 samples × 20,531 genes)
    • label.csv: True class label for each sample y (881 labels)
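
    A quick consistency check between the two files can be sketched as follows. The layout assumed here (a header row, a leading sample-ID column, and a two-column label file) is an assumption to be verified against the actual files; the in-memory strings are made-up stand-ins, not real data.

```python
import csv
import io


def read_matrix(handle):
    """Read a CSV assumed to have a header row and a leading sample-ID column."""
    reader = csv.reader(handle)
    genes = next(reader)[1:]
    ids, rows = [], []
    for row in reader:
        ids.append(row[0])
        rows.append([float(value) for value in row[1:]])
    return genes, ids, rows


def read_labels(handle):
    """Read a two-column CSV (sample ID, class label) with a header row."""
    reader = csv.reader(handle)
    next(reader)  # skip the header
    return {row[0]: row[1] for row in reader}


def summarize(genes, ids, rows, labels):
    """Return (n_samples, n_genes, sample IDs that have no label)."""
    if any(len(row) != len(genes) for row in rows):
        raise ValueError("ragged expression matrix")
    missing = [sample for sample in ids if sample not in labels]
    return len(rows), len(genes), missing


# Tiny stand-ins for data.csv and label.csv (values are made up).
data_csv = io.StringIO(",gene_0,gene_1,gene_2\nsample_0,0.0,2.5,3.1\nsample_1,1.2,0.0,4.4\n")
label_csv = io.StringIO(",Class\nsample_0,COAD\nsample_1,BRCA\n")

genes, ids, rows = read_matrix(data_csv)
labels = read_labels(label_csv)
n_samples, n_genes, missing = summarize(genes, ids, rows, labels)
print(n_samples, n_genes, missing)
```

    On the real files, the same check would be expected to report 881 samples and 20,531 genes if the listing above is accurate.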

    Source: UCI ML Repository – Gene Expression Cancer RNA-Seq Data

Inge Seim (2020). Historical NCI Genomic Data Commons data (09-14-2017) [Dataset]. http://doi.org/10.5281/zenodo.1186945

Historical NCI Genomic Data Commons data (09-14-2017)

Explore at:
Available download formats: tsv
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Inge Seim
License

Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically

Description

Historical NCI Genomic Data Commons data (v09-14-2017). Clinical ('phenotype') and gene expression (HTSeq FPKM-UQ).

TCGA-COAD.GDC_phenotype.tsv

dataset: phenotype - Phenotype

cohort: GDC TCGA Colon Cancer (COAD)
dataset ID: TCGA-COAD/Xena_Matrices/TCGA-COAD.GDC_phenotype.tsv
download: https://gdc.xenahubs.net/download/TCGA-COAD/Xena_Matrices/TCGA-COAD.GDC_phenotype.tsv.gz
samples: 570
version: 11-27-2017
hub: https://gdc.xenahubs.net
type of data: phenotype
author: Genomic Data Commons
raw data: https://docs.gdc.cancer.gov/Data/Release_Notes/Data_Release_Notes/#data-release-90
raw data: https://api.gdc.cancer.gov/data/
input data format: ROWs (samples) x COLUMNs (identifiers) (i.e. clinicalMatrix)
570 samples × 151 identifiers
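
The phenotype table is laid out with samples in rows, while the expression matrix in this record is laid out with samples in columns, and the two list different sample counts (570 and 512). A minimal sketch of finding the shared samples is shown below; it assumes tab-separated files with the sample ID in the first column (phenotype) or in the header row (expression), and the IDs and values are hypothetical.

```python
import csv
import io


def phenotype_sample_ids(handle):
    """clinicalMatrix: one row per sample, sample ID assumed in the first column."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # header row of identifier names
    return [row[0] for row in reader if row]


def expression_sample_ids(handle):
    """genomicMatrix: one column per sample, IDs assumed in the header row."""
    reader = csv.reader(handle, delimiter="\t")
    return next(reader)[1:]


def shared_samples(pheno_ids, expr_ids):
    """Samples present in both tables, in phenotype order."""
    expr_set = set(expr_ids)
    return [sample for sample in pheno_ids if sample in expr_set]


# Made-up miniature tables standing in for the two TSV files.
pheno_tsv = io.StringIO("sample_id\tage\nS1\t61\nS2\t47\nS3\t70\n")
expr_tsv = io.StringIO("gene_id\tS3\tS1\nG1\t2.0\t0.5\n")

shared = shared_samples(phenotype_sample_ids(pheno_tsv), expression_sample_ids(expr_tsv))
print(shared)
```

For the downloaded .tsv.gz files, the handles could be opened with `gzip.open(path, "rt")` from the standard library instead of `io.StringIO`.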

TCGA-COAD.htseq_fpkm-uq.tsv

dataset: gene expression RNAseq - HTSeq - FPKM-UQ

cohort: GDC TCGA Colon Cancer (COAD)
dataset ID: TCGA-COAD/Xena_Matrices/TCGA-COAD.htseq_fpkm-uq.tsv
download: https://gdc.xenahubs.net/download/TCGA-COAD/Xena_Matrices/TCGA-COAD.htseq_fpkm-uq.tsv.gz
samples: 512
version: 09-14-2017
hub: https://gdc.xenahubs.net
type of data: gene expression RNAseq
unit: log2(fpkm-uq+1)
platform: Illumina
ID/Gene Mapping: https://gdc.xenahubs.net/download/probeMaps/gencode.v22.annotation.gene.probeMap.gz
author: Genomic Data Commons
raw data: https://docs.gdc.cancer.gov/Data/Release_Notes/Data_Release_Notes/#data-release-80
raw data: https://api.gdc.cancer.gov/data/
wrangling: Data from the same sample but from different vials/portions/analytes/aliquots is averaged; data from different samples is combined into genomicMatrix; all data is then log2(x+1) transformed.
input data format: ROWs (identifiers) x COLUMNs (samples) (i.e. genomicMatrix)
60,484 identifiers × 512 samples
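
The wrangling described for this matrix (average values from the same sample, then apply log2(x+1)) can be sketched as below, together with the inverse transform for recovering linear-scale values. The function names and the example values are hypothetical and only illustrate the arithmetic.

```python
import math
from collections import defaultdict


def average_by_sample(aliquot_values):
    """Average values of aliquots that belong to the same sample.

    aliquot_values: iterable of (sample_id, value) pairs for one gene.
    """
    grouped = defaultdict(list)
    for sample_id, value in aliquot_values:
        grouped[sample_id].append(value)
    return {sample: sum(values) / len(values) for sample, values in grouped.items()}


def log2_plus_one(x):
    """Forward transform stated in the record: log2(x + 1)."""
    return math.log2(x + 1)


def inverse_log2_plus_one(y):
    """Recover the linear-scale value from a log2(x + 1) value."""
    return 2 ** y - 1


# Made-up FPKM-UQ values: sample S1 has two aliquots, S2 has one.
raw = [("S1", 3.0), ("S1", 11.0), ("S2", 0.0)]
averaged = average_by_sample(raw)
transformed = {sample: log2_plus_one(value) for sample, value in averaged.items()}
print(transformed)
```

Averaging happens before the log transform here, following the order given in the wrangling note.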
