100+ datasets found
  1. The Cancer Genome Atlas Colon Adenocarcinoma Collection

    • cancerimagingarchive.net
    dicom, n/a
    Updated Jan 5, 2016
    Cite
    The Cancer Imaging Archive (2016). The Cancer Genome Atlas Colon Adenocarcinoma Collection [Dataset]. http://doi.org/10.7937/K9/TCIA.2016.HJJHBOXZ
    Explore at:
    Available download formats
    dicom, n/a
    Dataset updated
    Jan 5, 2016
    Dataset authored and provided by
    The Cancer Imaging Archive
    License

    https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/

    Time period covered
    May 29, 2020
    Dataset funded by
    National Cancer Institute (http://www.cancer.gov/)
    Description

    The Cancer Genome Atlas Colon Adenocarcinoma (TCGA-COAD) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA). Clinical, genetic, and pathological data reside in the Genomic Data Commons (GDC) Data Portal, while the radiological data are stored on The Cancer Imaging Archive (TCIA).

    Matched TCGA patient identifiers allow researchers to explore the TCGA/TCIA databases for correlations between tissue genotype, radiological phenotype and patient outcomes. Tissues for TCGA were collected from many sites all over the world in order to reach their accrual targets, usually around 500 specimens per cancer type. For this reason the image data sets are also extremely heterogeneous in terms of scanner modalities, manufacturers and acquisition protocols. In most cases the images were acquired as part of routine care and not as part of a controlled research study or clinical trial.

    CIP TCGA Radiology Initiative

    Imaging Source Site (ISS) Groups are being populated and governed by participants from institutions that have provided imaging data to the archive for a given cancer type. Modeled after TCGA analysis groups, ISS groups are given the opportunity to publish a marker paper for a given cancer type per the guidelines in the table above. This opportunity will generate increased participation in building these multi-institutional data sets as they become an open community resource. Learn more about the CIP TCGA Radiology Initiative.

  2. Table1_ECM–Receptor Regulatory Network and Its Prognostic Role in Colorectal...

    • frontiersin.figshare.com
    xlsx
    Updated Jun 8, 2023
    + more versions
    Cite
    Stepan Nersisyan; Victor Novosad; Narek Engibaryan; Yuri Ushkaryov; Sergey Nikulin; Alexander Tonevitsky (2023). Table1_ECM–Receptor Regulatory Network and Its Prognostic Role in Colorectal Cancer.XLSX [Dataset]. http://doi.org/10.3389/fgene.2021.782699.s003
    Explore at:
    Available download formats
    xlsx
    Dataset updated
    Jun 8, 2023
    Dataset provided by
    Frontiers Media (http://www.frontiersin.org/)
    Authors
    Stepan Nersisyan; Victor Novosad; Narek Engibaryan; Yuri Ushkaryov; Sergey Nikulin; Alexander Tonevitsky
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Interactions of the extracellular matrix (ECM) and cellular receptors constitute one of the crucial pathways involved in colorectal cancer progression and metastasis. Using bioinformatics analysis, we comprehensively evaluated the prognostic information concentrated in the genes from this pathway. First, we constructed an ECM–receptor regulatory network by integrating the transcription factor (TF) and 5’-isomiR interaction databases with mRNA/miRNA-seq data from The Cancer Genome Atlas Colon Adenocarcinoma (TCGA-COAD). Notably, one-third of the interactions mediated by 5’-isomiRs involved noncanonical isomiRs (isomiRs whose 5’-end sequence did not match the canonical miRBase version). Then, exhaustive search-based feature selection was used to fit prognostic signatures composed of nodes from the network for overall survival prediction. Two reliable prognostic signatures were identified and validated on the independent The Cancer Genome Atlas Rectum Adenocarcinoma (TCGA-READ) cohort. The first signature was made up of six genes directly involved in ECM–receptor interaction: AGRN, DAG1, FN1, ITGA5, THBS3, and TNC (concordance index 0.61, logrank test p = 0.0164, 3-year ROC AUC = 0.68). The second, hybrid signature was composed of three regulators: hsa-miR-32-5p, NR1H2, and SNAI1 (concordance index 0.64, logrank test p = 0.0229, 3-year ROC AUC = 0.71). While hsa-miR-32-5p exclusively regulated ECM-related genes (COL1A2 and ITGA5), NR1H2 and SNAI1 also targeted other pathways (adhesion, cell cycle, and cell division). Concordant distributions of the respective risk scores across four stages of colorectal cancer and adjacent normal mucosa additionally confirmed the reliability of the models.
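    The description reports a concordance index for each signature. As a rough illustration of what that number measures, here is a minimal sketch of Harrell's C-index in plain Python. It is not the authors' code: the function name, the handling of tied follow-up times (skipped here for simplicity), and the toy data are all assumptions.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored survival data.

    times:  follow-up time per patient
    events: 1 if death was observed, 0 if the patient was censored
    risks:  model risk score (higher = worse predicted prognosis)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification (assumption): skip tied follow-up times
        # Order the pair so that `a` is the patient with the shorter follow-up.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter follow-up is censored, so the true order is unknown
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0  # higher risk died earlier: concordant pair
        elif risks[a] == risks[b]:
            concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable patient pairs")
    return concordant / comparable


# Toy, made-up data (not TCGA values).
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.6, 0.1]
c_index = concordance_index(times, events, risks)
```

    A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which puts the reported 0.61 and 0.64 in context.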

  3. Historical NCI Genomic Data Commons data (09-14-2017)

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jan 24, 2020
    Cite
    Seim, Inge (2020). Historical NCI Genomic Data Commons data (09-14-2017) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_1186944
    Explore at:
    Dataset updated
    Jan 24, 2020
    Dataset authored and provided by
    Seim, Inge
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Historical NCI Genomic Data Commons data (v09-14-2017). Clinical ('phenotype') and gene expression (HTSeq FPKM-UQ).

    TCGA-COAD.GDC_phenotype.tsv

    dataset: phenotype - Phenotype

    cohort: GDC TCGA Colon Cancer (COAD)
    dataset ID: TCGA-COAD/Xena_Matrices/TCGA-COAD.GDC_phenotype.tsv
    download: https://gdc.xenahubs.net/download/TCGA-COAD/Xena_Matrices/TCGA-COAD.GDC_phenotype.tsv.gz
    samples: 570
    version: 11-27-2017
    hub: https://gdc.xenahubs.net
    type of data: phenotype
    author: Genomic Data Commons
    raw data: https://docs.gdc.cancer.gov/Data/Release_Notes/Data_Release_Notes/#data-release-90
    raw data: https://api.gdc.cancer.gov/data/
    input data format: ROWs (samples) x COLUMNs (identifiers) (i.e. clinicalMatrix)
    570 samples X 151 identifiers

    TCGA-COAD.htseq_fpkm-uq.tsv

    dataset: gene expression RNAseq - HTSeq - FPKM-UQ

    cohort: GDC TCGA Colon Cancer (COAD)
    dataset ID: TCGA-COAD/Xena_Matrices/TCGA-COAD.htseq_fpkm-uq.tsv
    download: https://gdc.xenahubs.net/download/TCGA-COAD/Xena_Matrices/TCGA-COAD.htseq_fpkm-uq.tsv.gz
    samples: 512
    version: 09-14-2017
    hub: https://gdc.xenahubs.net
    type of data: gene expression RNAseq
    unit: log2(fpkm-uq+1)
    platform: Illumina
    ID/Gene Mapping: https://gdc.xenahubs.net/download/probeMaps/gencode.v22.annotation.gene.probeMap.gz
    author: Genomic Data Commons
    raw data: https://docs.gdc.cancer.gov/Data/Release_Notes/Data_Release_Notes/#data-release-80
    raw data: https://api.gdc.cancer.gov/data/
    wrangling: Data from the same sample but from different vials/portions/analytes/aliquots are averaged; data from different samples are combined into genomicMatrix; all data are then log2(x+1) transformed.
    input data format: ROWs (identifiers) x COLUMNs (samples) (i.e. genomicMatrix)
    60,484 identifiers X 512 samples
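    The wrangling step in the metadata above (average replicate measurements of one sample, then apply log2(x+1)) can be sketched for a single gene as follows. This is an illustrative sketch only: the function names, sample names, and values are hypothetical, and how replicates are grouped into a sample is left to the caller.

```python
import math


def combine_aliquots(fpkm_uq_by_sample):
    """Average replicate FPKM-UQ values per sample, then apply log2(x + 1)."""
    return {
        sample: math.log2(sum(values) / len(values) + 1)
        for sample, values in fpkm_uq_by_sample.items()
    }


def to_linear(log_value):
    """Invert the log2(x + 1) transform to get back to the FPKM-UQ scale."""
    return 2 ** log_value - 1


# Hypothetical sample names and values for one gene.
raw = {"sample_A": [1.0, 5.0], "sample_B": [7.0]}
transformed = combine_aliquots(raw)
```

    Note that the averaging happens on the linear scale before the log transform. Averaging already-transformed values would give a different result.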

  4. DICOM converted Slide Microscopy images for the TCGA-TGCT collection

    • zenodo.org
    bin
    Updated Aug 20, 2024
    + more versions
    Cite
    David Clunie; William Clifford; David Pot; Ulrike Wagner; Keyvan Farahani; Erika Kim; Andrey Fedorov (2024). DICOM converted Slide Microscopy images for the TCGA-TGCT collection [Dataset]. http://doi.org/10.5281/zenodo.12689996
    Explore at:
    Available download formats
    bin
    Dataset updated
    Aug 20, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    David Clunie; William Clifford; David Pot; Ulrike Wagner; Keyvan Farahani; Erika Kim; Andrey Fedorov
    License

    Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Description

    This dataset corresponds to a collection of images and/or image-derived data available from National Cancer Institute Imaging Data Commons (IDC) [1]. This dataset was converted into DICOM representation and ingested by the IDC team. You can explore and visualize the corresponding images using IDC Portal here: TCGA-TGCT. You can use the manifests included in this Zenodo record to download the content of the collection following the Download instructions below.

    Collection description

    More than 90% of testicular cancers start in the germ cells, the cells in the testicles that develop into sperm. This type of cancer is known as testicular germ cell cancer. Testicular germ cell cancers can be classified as either seminomas or nonseminomas, which may be distinguished by microscopy. Nonseminomas typically grow and spread more quickly than seminomas. A testicular germ cell tumor that contains a mix of both subtypes is classified as a nonseminoma. TCGA studied both seminomas and nonseminomas.

    Testicular germ cell cancer is rare, comprising 1-2% of all tumors in males. However, it is the most common cancer in men ages 15 to 35. The incidence of testicular germ cell cancer has been rising continuously in many regions, including Europe and the U.S. In 2013, about 8,000 American men were estimated to be diagnosed with the cancer, and 370 of them were predicted to die from the disease. Men who are Caucasian or who have an undescended testicle, abnormally developed testicles, or a family history of testicular cancer have a greater risk of developing testicular cancer. Fortunately, testicular germ cell cancer is highly treatable.

    Please see the TCGA-TGCT information page to learn more about the images and to obtain any supporting metadata for this collection.

    Citation guidelines can be found on the Citing TCGA in Publications and Presentations information page.

    Files included

    A manifest file's name indicates the IDC data release in which a version of collection data was first introduced. For example, collection_id-idc_v8-aws.s5cmd corresponds to the contents of the collection_id collection introduced in IDC data release v8. If there is a subsequent version of this Zenodo page, it will indicate when a subsequent version of the corresponding collection was introduced.

    1. tcga_tgct-idc_v8-aws.s5cmd: manifest of files available for download from public IDC Amazon Web Services buckets
    2. tcga_tgct-idc_v8-gcs.s5cmd: manifest of files available for download from public IDC Google Cloud Storage buckets
    3. tcga_tgct-idc_v8-dcf.dcf: Gen3 manifest (for details see https://learn.canceridc.dev/data/organization-of-data/guids-and-uuids)

    Note that manifest files that end in -aws.s5cmd reference files stored in Amazon Web Services (AWS) buckets, while -gcs.s5cmd reference files in Google Cloud Storage. The actual files are identical and are mirrored between AWS and GCP.
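    The naming convention described above can be checked programmatically. The following is a small sketch that splits a manifest file name into collection, IDC release, and storage target. The regular expression is an assumption inferred only from the three file names listed in this record, not an official IDC specification.

```python
import re

# Assumed pattern: <collection_id>-idc_v<release>-<target>.<extension>
MANIFEST_RE = re.compile(
    r"^(?P<collection>.+)-idc_v(?P<release>\d+)-(?P<target>aws|gcs|dcf)\.(?:s5cmd|dcf)$"
)


def parse_manifest_name(name):
    """Return the collection id, IDC release number, and target of a manifest name."""
    match = MANIFEST_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized manifest name: {name!r}")
    return {
        "collection": match["collection"],
        "release": int(match["release"]),
        "target": match["target"],
    }


info = parse_manifest_name("tcga_tgct-idc_v8-aws.s5cmd")
```

    Here `info["target"]` distinguishes the AWS manifest from the Google Cloud Storage one, which matters only for where the files are fetched from, since the mirrored files are identical.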

    Download instructions

    Each of the manifests includes instructions in the header on how to download the included files.

    To download the files using .s5cmd manifests:

    1. Install the idc-index package: pip install --upgrade idc-index
    2. Download the files referenced by a manifest included in this dataset by passing the .s5cmd manifest file: idc download manifest.s5cmd

    To download the files using .dcf manifest, see manifest header.
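    The two download steps above can also be scripted. The sketch below only wraps the two commands quoted in the instructions (`pip install --upgrade idc-index` and `idc download <manifest>`) with Python's `subprocess` module. The helper names are hypothetical, and running pip through `sys.executable -m pip` is an assumption about which environment should receive the package.

```python
import subprocess
import sys


def install_command():
    """Command that installs or upgrades the idc-index package."""
    return [sys.executable, "-m", "pip", "install", "--upgrade", "idc-index"]


def download_command(manifest_path):
    """Command that downloads every file referenced by an .s5cmd manifest."""
    return ["idc", "download", str(manifest_path)]


def run_download(manifest_path):
    """Install idc-index, then download the manifest contents (needs network access)."""
    subprocess.run(install_command(), check=True)
    subprocess.run(download_command(manifest_path), check=True)
```

    Calling `run_download("tcga_tgct-idc_v8-aws.s5cmd")` would then fetch the collection from the AWS buckets. With `check=True`, a failing command raises an exception instead of being silently ignored.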

    Acknowledgments

    Imaging Data Commons team has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under Task Order No. HHSN26110071 under Contract No. HHSN261201500003l.

    References

    [1] Fedorov, A., Longabaugh, W. J. R., Pot, D., Clunie, D. A., Pieper, S. D., Gibbs, D. L., Bridge, C., Herrmann, M. D., Homeyer, A., Lewis, R., Aerts, H. J. W., Krishnaswamy, D., Thiriveedhi, V. K., Ciausu, C., Schacherer, D. P., Bontempi, D., Pihl, T., Wagner, U., Farahani, K., Kim, E. & Kikinis, R. National Cancer Institute Imaging Data Commons: Toward Transparency, Reproducibility, and Scalability in Imaging Artificial Intelligence. RadioGraphics (2023). https://doi.org/10.1148/rg.230180

  5. Pan-Cancer-Nuclei-Seg-DICOM: DICOM converted Dataset of Segmented Nuclei in...

    • zenodo.org
    • explore.openaire.eu
    bin
    Updated Oct 1, 2024
    Cite
    Christopher Bridge; Markus Herrmann; David Clunie; Andrey Fedorov (2024). Pan-Cancer-Nuclei-Seg-DICOM: DICOM converted Dataset of Segmented Nuclei in Hematoxylin and Eosin Stained Histopathology Images [Dataset]. http://doi.org/10.5281/zenodo.11099005
    Explore at:
    Available download formats
    bin
    Dataset updated
    Oct 1, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Christopher Bridge; Markus Herrmann; David Clunie; Andrey Fedorov
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset corresponds to a collection of images and/or image-derived data available from National Cancer Institute Imaging Data Commons (IDC) [1]. This dataset was converted into DICOM representation and ingested by the IDC team. You can explore and visualize the corresponding images using IDC Portal here: Pan-Cancer-Nuclei-Seg-DICOM. You can use the manifests included in this Zenodo record to download the content of the collection following the Download instructions below.

    Collection description

    This collection contains automatic nucleus segmentation data of 5,060 whole slide tissue images of 10 cancer types earlier published in [2] (https://doi.org/10.7937/TCIA.2019.4A4DKP9U) stored in DICOM Bulk Annotation format. Nuclei annotations are stored as closed polygons along with the area of each nucleus. The annotations correspond to digital pathology images from the TCGA-BLCA, TCGA-BRCA, TCGA-CESC, TCGA-COAD, TCGA-GBM, TCGA-LUAD, TCGA-LUSC, TCGA-PAAD, TCGA-PRAD, TCGA-READ, TCGA-SKCM, TCGA-STAD, TCGA-UCEC, and TCGA-UVM collections available in NCI Imaging Data Commons.
    To learn how these files are organized and how to access the content programmatically, see this documentation page: https://highdicom.readthedocs.io/en/latest/ann.html.
    Conversion of the nuclei segmentations from the original CSV format into DICOM ANN format was done using the code available in 10.5281/zenodo.10632181.
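    Since each nucleus is stored as a closed polygon together with its area, the relationship between the two can be illustrated with the shoelace formula. This is a generic geometry sketch, not the conversion code referenced above. The function name and the toy coordinates are hypothetical, and the coordinate units of the real annotations are not stated here.

```python
def polygon_area(points):
    """Area of a simple closed polygon given as a list of (x, y) vertices."""
    if len(points) < 3:
        raise ValueError("a polygon needs at least three vertices")
    twice_area = 0.0
    # Pair each vertex with the next one, wrapping around to close the polygon.
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


# Toy 4 x 3 rectangle standing in for a nucleus outline.
outline = [(0, 0), (4, 0), (4, 3), (0, 3)]
area = polygon_area(outline)
```

    A check like this can be used to confirm that a stored area value matches the polygon it accompanies, up to unit conversion.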

    Files included

    A manifest file's name indicates the IDC data release in which a version of collection data was first introduced. For example, pan_cancer_nuclei_seg_dicom-collection_id-idc_v19-aws.s5cmd corresponds to the annotations for the images in the collection_id collection introduced in IDC data release v19. If there is a subsequent version of this Zenodo page, it will indicate when a subsequent version of the corresponding collection was introduced.

    For each of the collections, the following manifest files are provided:

    1. pan_cancer_nuclei_seg_dicom-: manifest of files available for download from public IDC Amazon Web Services buckets
    2. pan_cancer_nuclei_seg_dicom-: manifest of files available for download from public IDC Google Cloud Storage buckets
    3. pan_cancer_nuclei_seg_dicom-: Gen3 manifest (for details see https://learn.canceridc.dev/data/organization-of-data/guids-and-uuids)

    Note that manifest files that end in -aws.s5cmd reference files stored in Amazon Web Services (AWS) buckets, while -gcs.s5cmd reference files in Google Cloud Storage. The actual files are identical and are mirrored between AWS and GCP.

    Download instructions

    Each of the manifests includes instructions in the header on how to download the included files.

    To download the files using .s5cmd manifests:

    1. install idc-index package: pip install --upgrade idc-index
    2. download the files referenced by manifests included in this dataset by passing the .s5cmd manifest file: idc download manifest.s5cmd

    To download the files using .dcf manifest, see manifest header.

    Acknowledgments

    Imaging Data Commons team has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under Task Order No. HHSN26110071 under Contract No. HHSN261201500003l.

    References

    [1] Fedorov, A., Longabaugh, W. J. R., Pot, D., Clunie, D. A., Pieper, S. D., Gibbs, D. L., Bridge, C., Herrmann, M. D., Homeyer, A., Lewis, R., Aerts, H. J. W. L., Krishnaswamy, D., Thiriveedhi, V. K., Ciausu, C., Schacherer, D. P., Bontempi, D., Pihl, T., Wagner, U., Farahani, K., Kim, E. & Kikinis, R. National cancer institute imaging data commons: Toward transparency, reproducibility, and scalability in imaging artificial intelligence. Radiographics 43, (2023).
    [2] Hou, L., Gupta, R., Van Arnam, J. S., Zhang, Y., Sivalenka, K., Samaras, D., Kurc, T., & Saltz, J. H. (2019). Dataset of Segmented Nuclei in Hematoxylin and Eosin Stained Histopathology Images of 10 Cancer Types [Data set]. The Cancer Imaging Archive. https://doi.org/10.7937/TCIA.2019.4A4DKP9U
  6. PIVOT - COAD (light)

    • explore.openaire.eu
    Updated Jan 24, 2022
    Cite
    Malvika Sudhakar; Raghunathan Rengaswamy; Karthik Raman (2022). PIVOT - COAD (light) [Dataset]. http://doi.org/10.5281/zenodo.5898163
    Explore at:
    Dataset updated
    Jan 24, 2022
    Authors
    Malvika Sudhakar; Raghunathan Rengaswamy; Karthik Raman
    Description

    Pre-processed TCGA COAD data used for PIVOT analysis.

  7. Data_Sheet_1_Identification and Validation of a Novel DNA Damage and DNA...

    • datasetcatalog.nlm.nih.gov
    Updated Feb 24, 2021
    Cite
    Ye, Li-ping; Wang, Yi; Wu, Wei-dan; Mao, Xin-li; Piao, Song-zhe; Zhou, Xian-bin; Wang, Xue-quan; Li, Shao-wei; Wang, Wei; Xu, Shi-wen (2021). Data_Sheet_1_Identification and Validation of a Novel DNA Damage and DNA Repair Related Genes Based Signature for Colon Cancer Prognosis.docx [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000783288
    Explore at:
    Dataset updated
    Feb 24, 2021
    Authors
    Ye, Li-ping; Wang, Yi; Wu, Wei-dan; Mao, Xin-li; Piao, Song-zhe; Zhou, Xian-bin; Wang, Xue-quan; Li, Shao-wei; Wang, Wei; Xu, Shi-wen
    Description

    Background: Colorectal cancer (CRC) has a high incidence and the third highest mortality among tumors. DNA damage and repair influence a variety of tumors. However, the role of these genes in colon cancer prognosis has been less systematically investigated. Here, we aim to establish a corresponding prognostic signature providing new therapeutic opportunities for CRC.

    Method: After related genes were collected from GSEA, univariate Cox regression was performed to evaluate each gene’s prognostic relevance in the TCGA-COAD dataset. Stepwise Cox regression was used to establish a risk prediction model on training sets randomly separated from the TCGA cohort, which was validated in the remaining testing sets and two GEO datasets (GSE17538 and GSE38832). A 12-DNA-damage-and-repair-related gene-based signature able to classify COAD patients into high- and low-risk groups was developed. The predictive ability of the risk model and nomogram was evaluated by different bioinformatics methods. Gene functional enrichment analysis was performed to analyze the co-expressed genes of the risk-based genes.

    Result: A 12-gene prognostic signature established within 160 significant survival-related genes from DNA damage and repair related gene sets performed well, with a 5-year ROC AUC of 0.80 in the TCGA-COAD dataset. The signature includes CCNB3, ISY1, CDC25C, SMC1B, MC1R, LSP1P4, RIN2, TPM1, ELL3, POLG, CD36, and NEK4. Kaplan-Meier survival curves showed that prognosis differed more significantly by risk status than by the T, M, N, and stage prognostic parameters. A nomogram was constructed by LASSO regression analysis with T, M, N, age, and risk as prognostic parameters. ROC curve, C-index, calibration analysis, and Decision Curve Analysis showed the risk model and nomogram performed best in years 1, 3, and 5. KEGG, GO, and GSEA enrichment analyses suggest that the risk signature is involved in a variety of important biological processes and well-known cancer-related pathways. These differences may be the key factors affecting the final prognosis.

    Conclusion: The established gene signature for CRC prognosis provides a new molecular tool for clinical evaluation of prognosis, individualized diagnosis, and treatment. Therapies based on targeted DNA damage and repair mechanisms may formulate more sensitive and potential chemotherapy regimens, thereby expanding treatment options and potentially improving the clinical outcome of CRC patients.
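    A gene signature of this kind typically turns into a per-patient risk score that is then split into high- and low-risk groups. The sketch below shows one common way to do that: a weighted sum of expression values with a median cutoff. The coefficients, expression values, and the choice of the median as the threshold are all assumptions for illustration, and the description does not state the authors' actual weights or cutoff.

```python
from statistics import median


def risk_score(expression, coefficients):
    """Linear risk score: sum of coefficient * expression over the signature genes."""
    return sum(coefficients[gene] * expression[gene] for gene in coefficients)


def split_by_median(scores):
    """Label each patient 'high' if the score exceeds the cohort median, else 'low'."""
    cutoff = median(scores.values())
    return {pid: ("high" if score > cutoff else "low") for pid, score in scores.items()}


# Hypothetical coefficients for two of the twelve signature genes.
coefficients = {"CDC25C": 0.5, "TPM1": -0.25}
# Hypothetical expression values for three made-up patients.
patients = {
    "p1": {"CDC25C": 4.0, "TPM1": 2.0},
    "p2": {"CDC25C": 2.0, "TPM1": 4.0},
    "p3": {"CDC25C": 6.0, "TPM1": 0.0},
}
scores = {pid: risk_score(expr, coefficients) for pid, expr in patients.items()}
groups = split_by_median(scores)
```

    The resulting groups are what a Kaplan-Meier comparison such as the one described above would be run on.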

  8. COAD paired sample gene level read counts

    • figshare.com
    • commons.datacite.org
    application/gzip
    Updated Jan 19, 2016
    + more versions
    Cite
    Endre Sebestyén (2016). COAD paired sample gene level read counts [Dataset]. http://doi.org/10.6084/m9.figshare.1061501.v1
    Explore at:
    Available download formats
    application/gzip
    Dataset updated
    Jan 19, 2016
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Endre Sebestyén
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    TCGA COAD paired sample gene level read counts from Level 3 RNASeq-v2 data.

  9. TCGA-WSI-Dataset

    • kaggle.com
    zip
    Updated Jun 25, 2024
    Cite
    Mahmood Yousaf 2018 (2024). TCGA-WSI-Dataset [Dataset]. https://www.kaggle.com/datasets/mahmoodyousaf2018/tcga-wsi-svs
    Explore at:
    Available download formats
    zip (0 bytes)
    Dataset updated
    Jun 25, 2024
    Authors
    Mahmood Yousaf 2018
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Explore the TCGA Whole Slide Image (WSI) SVS files available on Kaggle, offering detailed visual representations of tissue samples from various cancer types. These high-resolution images provide valuable insights into tumor morphology and tissue architecture, facilitating cancer diagnosis, prognosis, and treatment research. Delve into the rich landscape of cancer biology, leveraging the wealth of information contained within these SVS files to drive innovative advancements in oncology. This is a dataset of WSI images downloaded from the TCGA portal.

  10. ROI Masks Defining Low-Grade Glioma Tumor Regions In the TCGA-LGG Image...

    • cancerimagingarchive.net
    • stage.cancerimagingarchive.net
    • +1more
    csv, matlab and zip +2
    Updated Mar 17, 2017
    Cite
    The Cancer Imaging Archive (2017). ROI Masks Defining Low-Grade Glioma Tumor Regions In the TCGA-LGG Image Collection [Dataset]. http://doi.org/10.7937/K9/TCIA.2017.BD7SGWCA
    Explore at:
    Available download formats
    pdf, n/a, matlab and zip, csv
    Dataset updated
    Mar 17, 2017
    Dataset authored and provided by
    The Cancer Imaging Archive
    License

    https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/

    Time period covered
    Mar 17, 2017
    Dataset funded by
    National Cancer Institute (http://www.cancer.gov/)
    Description

    This collection contains 406 ROI masks in MATLAB format defining the low grade glioma (LGG) tumour region on T1-weighted (T1W), T2-weighted (T2W), T1-weighted post-contrast (T1CE) and T2-flair (T2F) MR images of 108 different patients from the TCGA-LGG collection. From this subset of 108 patients, 81 patients have ROI masks drawn for the four MRI sequences (T1W, T2W, T1CE and T2F), and 27 patients have ROI masks drawn for three or fewer of the four MRI sequences. The ROI masks were used to extract texture features in order to develop radiomic-based multivariable models for the prediction of isocitrate dehydrogenase 1 (IDH1) mutation, 1p/19q codeletion status, histological grade and tumour progression. Clinical data (188 patients in total from the TCGA-LGG collection, some incomplete depending on the clinical attribute), VASARI scores (188 patients in total from the TCGA-LGG collection, 178 complete) with feature keys, and source code used in this study are also available with this collection. Please contact Martin Vallières (mart.vallieres@gmail.com) of the Medical Physics Unit of McGill University for any scientific inquiries about this dataset.

  11. A merged microarray meta-dataset for studying transcriptome-wide changes...

    • omicsdi.org
    xml
    Updated Feb 18, 2021
    Cite
    Michael W Rohr (2021). A merged microarray meta-dataset for studying transcriptome-wide changes during neoplastic progression of colorectal cancer [Dataset]. https://www.omicsdi.org/dataset/arrayexpress-repository/E-MTAB-10089
    Explore at:
    Available download formats
    xml
    Dataset updated
    Feb 18, 2021
    Authors
    Michael W Rohr
    Variables measured
    Transcriptomics
    Description

    Transcriptional profiling of pre-malignant and malignant colorectal cancer lesions provides a means for temporally monitoring key molecular events underlying neoplastic progression. Unfortunately, the most widely used central dataset for colorectal cancer samples, The Cancer Genome Atlas (TCGA), does not contain adenoma samples, so in silico analyses and pre-clinical modelling rely more heavily on a handful of independent microarray experiments. Due to differences in sample acquisition, preparation, downstream analysis and other parameters, results are often incongruent, hindering consensus building. Here, we developed a microarray meta-dataset consisting of 231 normal, 132 adenoma, and 342 colon cancer tissue samples (705 samples total) sourced from 12 independent microarray studies all using the Affymetrix HG U133 Plus 2.0 (GPL570) chip platform, including GSE4183, GSE8671, GSE9348, GSE15960, GSE20916, GSE21510, GSE22598, GSE23194, GSE23878, GSE32323, GSE33113, and GSE37364. Individual datasets were pre-processed and normalized by frozen robust multiarray averaging (fRMA) before merging by matching probe sets. Batch effects were subsequently identified by Principal Component Analysis (PCA) and removed using ComBat. In addition, low-variance probes were filtered from the meta-dataset before downstream analysis. Finally, biological signatures corresponding to cancer and adenoma samples were both quantitatively and functionally validated. Quantitative validation was performed by correlation analysis of LogFC values with the TCGA-COAD or other external GEO microarray datasets, respectively. Functional validation was carried out through predictive analyses using Ingenuity Pathway Analysis (IPA) and Gene Set Enrichment Analysis (GSEA). Overall, our meta-dataset provides a powerful tool for studying transcriptome-wide changes which occur during early dysplasia and malignant transformation of adenomas as well as colorectal cancer in general.
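    The quantitative validation described above compares LogFC values between datasets by correlation. A minimal sketch of that idea follows, assuming log2-scale expression values and a Pearson correlation. The description does not say which correlation measure the author used, and all function names and numbers here are made up.

```python
import math


def log_fold_change(case_log2, control_log2):
    """LogFC for one gene: mean log2 expression in cases minus mean in controls."""
    return sum(case_log2) / len(case_log2) - sum(control_log2) / len(control_log2)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long value lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Made-up per-gene LogFC values from two datasets, in the same gene order.
logfc_meta = [2.0, -1.0, 0.5, 3.0]
logfc_external = [1.8, -0.7, 0.4, 2.9]
agreement = pearson(logfc_meta, logfc_external)
```

    A correlation close to 1 would indicate that the two datasets agree on both the direction and the relative size of the expression changes.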

  12. FDR results.

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xlsx
    Updated Apr 9, 2024
    + more versions
    Cite
    Ilaria Cosentini; Daniele Filippo Condorelli; Giorgio Locicero; Alfredo Ferro; Alfredo Pulvirenti; Vincenza Barresi; Salvatore Alaimo (2024). FDR results. [Dataset]. http://doi.org/10.1371/journal.pone.0301591.s004
    Explore at:
    Available download formats
    xlsx
    Dataset updated
    Apr 9, 2024
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Ilaria Cosentini; Daniele Filippo Condorelli; Giorgio Locicero; Alfredo Ferro; Alfredo Pulvirenti; Vincenza Barresi; Salvatore Alaimo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Multi-layer Complex networks are commonly used for modeling and analysing biological entities. This paper presents the advantage of using COMBO (Combining Multi Bio Omics) to suggest a new role of the chromosomal aberration as a cancer driver factor. Exploiting the heterogeneous multi-layer networks, COMBO integrates gene expression and DNA-methylation data in order to identify complex bilateral relationships between transcriptome and epigenome. We evaluated the multi-layer networks generated by COMBO on different TCGA cancer datasets (COAD, BLCA, BRCA, CESC, STAD) focusing on the effect of a specific chromosomal numerical aberration, broad gain in chromosome 20, on different cancer histotypes. In addition, the effect of chromosome 8q amplification was tested in the same TCGA cancer dataset. The results demonstrate the ability of COMBO to identify the chromosome 20 amplification cancer driver force in the different TCGA Pan Cancer project datasets.

  13. DICOM converted Slide Microscopy images for the TCGA-UVM collection

    • zenodo.org
    bin
    Updated Aug 20, 2024
    + more versions
    Cite
    David Clunie; William Clifford; David Pot; Ulrike Wagner; Keyvan Farahani; Erika Kim; Andrey Fedorov (2024). DICOM converted Slide Microscopy images for the TCGA-UVM collection [Dataset]. http://doi.org/10.5281/zenodo.12690042
    Explore at:
    Available download formats
    bin
    Dataset updated
    Aug 20, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    David Clunie; William Clifford; David Pot; Ulrike Wagner; Keyvan Farahani; Erika Kim; Andrey Fedorov
    License

    Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Description

    This dataset corresponds to a collection of images and/or image-derived data available from National Cancer Institute Imaging Data Commons (IDC) [1]. This dataset was converted into DICOM representation and ingested by the IDC team. You can explore and visualize the corresponding images using IDC Portal here: TCGA-UVM. You can use the manifests included in this Zenodo record to download the content of the collection following the Download instructions below.

    Collection description

    Uveal (intraocular or eye) melanoma develops in the pigment cells of the uvea, which is the middle layer of the eye. The uvea consists of three main parts: the iris, ciliary body, and choroid. Compared to tumors of the iris, tumors of the ciliary body and choroid tend to be larger and more likely to spread to other parts of the body. TCGA studied tumors from all three parts of the uvea.

    Please see the TCGA-UVM information page to learn more about the images and to obtain any supporting metadata for this collection.

    Citation guidelines can be found on the Citing TCGA in Publications and Presentations information page.

    Files included

    A manifest file's name indicates the IDC data release in which a version of collection data was first introduced. For example, collection_id-idc_v8-aws.s5cmd corresponds to the contents of the collection_id collection introduced in IDC data release v8. If there is a subsequent version of this Zenodo page, it will indicate when a subsequent version of the corresponding collection was introduced.

    1. tcga_uvm-idc_v8-aws.s5cmd: manifest of files available for download from public IDC Amazon Web Services buckets
    2. tcga_uvm-idc_v8-gcs.s5cmd: manifest of files available for download from public IDC Google Cloud Storage buckets
    3. tcga_uvm-idc_v8-dcf.dcf: Gen3 manifest (for details see https://learn.canceridc.dev/data/organization-of-data/guids-and-uuids)

    Note that manifest files that end in -aws.s5cmd reference files stored in Amazon Web Services (AWS) buckets, while -gcs.s5cmd reference files in Google Cloud Storage. The actual files are identical and are mirrored between AWS and GCP.
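
    The naming convention described above can be unpacked programmatically. The following is a minimal sketch, not part of the IDC tooling: the helper name parse_manifest_name and the field names it returns are hypothetical, and the pattern assumes file names shaped like the three manifests listed above (collection id, then idc_v plus a release number, then a storage suffix and an extension).

```python
import re
from typing import NamedTuple


class ManifestName(NamedTuple):
    collection_id: str
    idc_release: int
    target: str      # "aws", "gcs", or "dcf"
    extension: str   # "s5cmd" or "dcf"


# Assumed pattern, inferred only from the file names listed in this record.
_PATTERN = re.compile(
    r"^(?P<collection>.+)-idc_v(?P<release>\d+)-(?P<target>aws|gcs|dcf)"
    r"\.(?P<ext>s5cmd|dcf)$"
)


def parse_manifest_name(filename: str) -> ManifestName:
    """Split a manifest file name into collection, release, and storage target."""
    match = _PATTERN.match(filename)
    if match is None:
        raise ValueError(f"Unrecognized manifest name: {filename!r}")
    return ManifestName(
        collection_id=match["collection"],
        idc_release=int(match["release"]),
        target=match["target"],
        extension=match["ext"],
    )


if __name__ == "__main__":
    for name in (
        "tcga_uvm-idc_v8-aws.s5cmd",
        "tcga_uvm-idc_v8-gcs.s5cmd",
        "tcga_uvm-idc_v8-dcf.dcf",
    ):
        print(parse_manifest_name(name))
```

    A helper like this can be handy for picking the AWS or GCS manifest automatically when several versions of a record have been downloaded side by side.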

    Download instructions

    Each of the manifests includes instructions in its header on how to download the included files.

    To download the files using .s5cmd manifests:

    1. install idc-index package: pip install --upgrade idc-index
    2. download the files referenced by manifests included in this dataset by passing the .s5cmd manifest file: idc download manifest.s5cmd.

    To download the files using .dcf manifest, see manifest header.

    Acknowledgments

    Imaging Data Commons team has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under Task Order No. HHSN26110071 under Contract No. HHSN261201500003l.

    References

    [1] Fedorov, A., Longabaugh, W. J. R., Pot, D., Clunie, D. A., Pieper, S. D., Gibbs, D. L., Bridge, C., Herrmann, M. D., Homeyer, A., Lewis, R., Aerts, H. J. W., Krishnaswamy, D., Thiriveedhi, V. K., Ciausu, C., Schacherer, D. P., Bontempi, D., Pihl, T., Wagner, U., Farahani, K., Kim, E. & Kikinis, R. National Cancer Institute Imaging Data Commons: Toward Transparency, Reproducibility, and Scalability in Imaging Artificial Intelligence. RadioGraphics (2023). https://doi.org/10.1148/rg.230180

  14. HISTOPANTUME: Histological Pan-cancer Tumor image dataset

    • zenodo.org
    • data.niaid.nih.gov
    application/gzip
    Updated Jun 3, 2025
    Neda Zamanitajeddin; Mostafa Jahanifar; Fouzia Siraj; Nasir Rajpoot (2025). HISTOPANTUME: Histological Pan-cancer Tumor image dataset [Dataset]. http://doi.org/10.5281/zenodo.14555794
    Explore at:
    Available download formats: application/gzip
    Dataset updated
    Jun 3, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Neda Zamanitajeddin; Mostafa Jahanifar; Fouzia Siraj; Nasir Rajpoot
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    HISTOPANTUM is a comprehensive pan-cancer dataset of histology images categorized into Tumor and Non-Tumor classes over 4 different cancer types (domains). This dataset is designed to facilitate domain generalization analysis for tumor detection tasks, serving as a benchmark for foundation models and domain generalization algorithms.

    Dataset Overview

    The dataset comprises histology images sourced from The Cancer Genome Atlas (TCGA), spanning the following four cancer types:

    • Colorectal Cancer
    • Ovarian Cancer
    • Stomach Cancer
    • Uterus Cancer

    Image Specifications

    • Original Resolution: 512 × 512 pixel images are extracted at a resolution of 0.5 microns per pixel.
    • Processed Size: Images are resized to 224 × 224 pixels and saved as JPEG files.
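
    As a quick check of what the resize implies for pixel spacing, here is a small worked calculation. It assumes each 512 × 512 patch is downscaled as a whole (no cropping), so the physical field of view is unchanged; the function name is illustrative only.

```python
def effective_mpp(original_px: int, original_mpp: float, resized_px: int) -> float:
    """Pixel spacing (microns per pixel) after resizing a square patch.

    The physical width covered by the patch stays constant, so the spacing
    scales by the ratio of original to resized side length.
    """
    field_of_view_um = original_px * original_mpp  # 512 * 0.5 = 256 microns
    return field_of_view_um / resized_px


if __name__ == "__main__":
    mpp = effective_mpp(original_px=512, original_mpp=0.5, resized_px=224)
    print(f"{mpp:.3f} microns per pixel")  # prints "1.143 microns per pixel"
```

    Under that assumption, the saved 224 × 224 images correspond to roughly 1.14 microns per pixel, which matters when comparing against models trained at a fixed magnification.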

    The dataset is provided in four zipped files, each corresponding to one cancer type. Within each zip file, images are organized into two subfolders:

    • tumour
    • non-tumour
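
    Because the class is carried by the subfolder name, a label can be derived from a file path alone. The sketch below assumes the two folder names listed above; the integer encoding (1 for tumour, 0 for non-tumour), the helper name, and the example paths are hypothetical choices made for illustration.

```python
from pathlib import PurePosixPath

# Assumed encoding; the dataset itself only defines the two folder names.
LABELS = {"tumour": 1, "non-tumour": 0}


def label_from_path(path: str) -> int:
    """Return the class label implied by the image's parent folder."""
    folder = PurePosixPath(path).parent.name
    if folder not in LABELS:
        raise ValueError(f"Unexpected class folder {folder!r} in {path!r}")
    return LABELS[folder]


if __name__ == "__main__":
    # Hypothetical paths; real file names follow the dataset's own convention.
    print(label_from_path("colorectal/tumour/example.jpg"))
    print(label_from_path("colorectal/non-tumour/example.jpg"))
```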

    Each image filename encodes the originating slide and the patch position within the slide, following this naming convention:

    Citation

    If you use this dataset in your research, please cite the following publication:

    @article{zamanitajeddin2024benchmarking,
     title={Benchmarking Domain Generalization Algorithms in Computational Pathology},
     author={Zamanitajeddin, Neda and Jahanifar, Mostafa and Xu, Kesi and Siraj, Fouzia and Rajpoot, Nasir},
     journal={arXiv preprint arXiv:2409.17063},
     year={2024}
    }
    

    For further details, please refer to the linked publication.

  15. Dataset for tumor infiltrating lymphocyte classification (304,097 image patches from TCGA)

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jun 10, 2022
    Kaczmarzyk, Jakub R (2022). Dataset for tumor infiltrating lymphocyte classification (304,097 image patches from TCGA) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6604093
    Explore at:
    Dataset updated
    Jun 10, 2022
    Dataset provided by
    Kurc, Tahsin
    Abousamra, Shahira
    Kaczmarzyk, Jakub R
    Gupta, Rajarsi
    Saltz, Joel
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is a dataset of images with or without tumor-infiltrating lymphocytes (TILs). The original images are from Abousamra et al. (2022) and Saltz et al. (2018), and the original whole slide images are from TCGA. This dataset is a subset of the data presented in Abousamra et al. (2022) (with new data partitions).

    If you use this dataset, please cite the following papers, as well as this Zenodo page.

    Abousamra, S., Gupta, M. D., Hou, L., Batiste, R., Zhao, T., Shankar, A., Rao, A., Chen, C., Samaras, D., Kurc, T., & Saltz, J. (2022). Deep Learning-Based Mapping of Tumor Infiltrating Lymphocytes in Whole Slide Images of 23 Types of Cancer. Frontiers in Oncology, 5971. https://doi.org/10.3389/fonc.2021.806603

    Saltz, J., Gupta, R., Hou, L., Kurc, T., Singh, P., Nguyen, V., Samaras, D., Shroyer, K. R., Zhao, T., Batiste, R., & Danilova, L. (2018). Spatial organization and molecular correlation of tumor-infiltrating lymphocytes using deep learning on pathology images. Cell Reports, 23(1), 181-193.

    The acknowledgements from the Frontiers in Oncology and Cell Reports papers are included below:

    This work was supported by the National Institutes of Health (NIH) and National Cancer Institute (NCI) grants UH3-CA22502103, U24-CA21510904, 1U24CA180924-01A1, 3U24CA215109-02, and 1UG3CA225021-01 as well as generous private support from Bob Beals and Betsy Barton. AR and AS were partially supported by NCI grant R37-CA214955 (to AR), the University of Michigan (U-M) institutional research funds and also supported by ACS grant RSG-16-005-01 (to AR). AS was supported by the Biomedical Informatics & Data Science Training Grant (T32GM141746). This work was enabled by computational resources supported by National Science Foundation grant number ACI-1548562, providing access to the Bridges system, which is supported by NSF award number ACI-1445606, at the Pittsburgh Supercomputing Center, and also a DOE INCITE award joint with the MENNDL team at the Oak Ridge National Laboratory, providing access to Summit high performance computing system. The funders were not involved in the study design, collection, analysis, interpretation of data, the writing of this article or the decision to submit it for publication.

    We are grateful to all the patients and families who contributed to this study. Funding from the Cancer Research Institute is gratefully acknowledged, as is support from National Cancer Institute (NCI) through U54 HG003273, U54 HG003067, U54 HG003079, U24 CA143799, U24 CA143835, U24 CA143840, U24 CA143843, U24 CA143845, U24 CA143848, U24 CA143858, U24 CA143866, U24 CA143867, U24 CA143882, U24 CA143883, U24 CA144025, P30 CA016672, U24CA180924, U24CA210950, U24CA215109, NCI Contract HHSN261201400007C, and Leidos Biomedical Contract 14X138. A.U.K.R. and P.S. were supported by CCSG Bioinformatics Shared Resource P30 CA01667, ITCR U24 Supplement 1U24CA199461-01, a gift from Agilent technologies, CPRIT RP150578, and a Research Scholar Grant from the American Cancer Society (RSG-16-005-01). This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation XSEDE Science Gateways program under grant ACI-1548562 allocation TG-ASC130023. The authors would like to thank Stony Brook Research Computing and Cyberinfrastructure and the Institute for Advanced Computational Science at Stony Brook University for access to the high-performance LIred and SeaWulf computing systems, the latter of which was supported by National Science Foundation grant (#1531492).

    This dataset includes 304,097 image patches. All images are 100 x 100 pixels at 0.5 micrometers per pixel. An image is TIL-positive if there are at least two TILs present.

    Refer to images-tcga-tils-metadata.csv for information about each image. That spreadsheet has the following columns:

    partition,study,barcode,label,path,md5

    Partition specifies which partition the image is part of (train, val, test). Study is the TCGA study the image is part of (e.g., acc for TCGA-ACC). Barcode is the TCGA participant barcode. This is used during partitioning, to ensure that images from the same participant are not present in different data partitions. Label is either til-negative or til-positive. An image is til-positive if there are at least two TILs in the image. Path is the path to the PNG image. All images are stored as PNG. Md5 is the md5 hash of the image. This can be used to ensure there are no duplicate images and to verify the integrity of images.
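
    The two uses of the metadata mentioned above (keeping participants within a single partition, and verifying files by md5) can be checked with the standard library. This is a minimal sketch under stated assumptions: the column names come from the header line above, while the helper names and the sample rows (barcodes, paths, hashes) are made up for illustration and are not taken from the dataset.

```python
import csv
import hashlib
import io
from collections import defaultdict


def leaking_barcodes(rows):
    """Return barcodes that appear in more than one partition."""
    partitions = defaultdict(set)
    for row in rows:
        partitions[row["barcode"]].add(row["partition"])
    return sorted(b for b, parts in partitions.items() if len(parts) > 1)


def md5_matches(data: bytes, expected_md5: str) -> bool:
    """Compare the md5 of raw file bytes with the value in the md5 column."""
    return hashlib.md5(data).hexdigest() == expected_md5.strip().lower()


if __name__ == "__main__":
    # Hypothetical rows; in practice, open images-tcga-tils-metadata.csv instead.
    sample = io.StringIO(
        "partition,study,barcode,label,path,md5\n"
        "train,acc,TCGA-XX-0001,til-positive,a.png,0\n"
        "val,acc,TCGA-XX-0002,til-negative,b.png,0\n"
        "test,acc,TCGA-XX-0003,til-negative,c.png,0\n"
    )
    rows = list(csv.DictReader(sample))
    print("Barcodes spanning partitions:", leaking_barcodes(rows))
```

    For a real check, each image would be read in binary mode and passed to md5_matches together with the md5 value from its row.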

    There are study-specific directories in the directory images-tcga-tils, and there is a directory named pancancer that includes images from all the included TCGA studies. That directory uses symlinks to avoid storing duplicate data.

  16. COAD non-paired sample isoform level read counts

    • figshare.com
    • datasetcatalog.nlm.nih.gov
    application/gzip
    Updated Jan 19, 2016
    Endre Sebestyén (2016). COAD non-paired sample isoform level read counts [Dataset]. http://doi.org/10.6084/m9.figshare.1059126.v1
    Explore at:
    Available download formats: application/gzip
    Dataset updated
    Jan 19, 2016
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Endre Sebestyén
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    TCGA COAD non-paired sample isoform level read counts from Level 3 RNASeq-v2 data.

  17. Mitosis Dataset for TCGA Diagnostic Slides

    • zenodo.org
    application/gzip
    Updated Dec 23, 2024
    Mostafa Jahanifar (2024). Mitosis Dataset for TCGA Diagnostic Slides [Dataset]. http://doi.org/10.5281/zenodo.14548480
    Explore at:
    Available download formats: application/gzip
    Dataset updated
    Dec 23, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Mostafa Jahanifar
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Mitosis Detections and Mitotic Network in TCGA

    This dataset contains mitosis detections, mitotic network structures, and social network analysis (SNA) measures derived from 11,161 diagnostic slides in The Cancer Genome Atlas (TCGA). Mitoses were automatically identified using the MDFS algorithm [1], and each detected mitosis was converted into a node within a mitotic network. The resulting graphs are provided in JSON format, with each file representing a single diagnostic slide.

    JSON Data Format

    Each JSON file contains four primary fields:

    1. edge_index
      Two parallel lists representing edges between nodes. The i-th element in the first list corresponds to the source node index, and the i-th element in the second list is the target node index.

    2. coordinates
      A list of [x, y] positions for each node (mitosis). The (x,y) coordinates can be used for spatial visualization or further spatial analyses.

    3. feats
      A list of feature vectors, with each row corresponding to a node. These features include:

      • type (an integer representing mitosis type. 1: typical mitosis, 2: atypical mitosis)
      • Node_Degree (the number of nodes connected to the node)
      • Clustering_Coeff (clustering coefficient of the node)
      • Harmonic_Cen (Harmonic centrality of the node)

    4. feat_names
      The names of the features in feats. The order matches the columns in each node’s feature vector.

    Example JSON Snippet

    {
     "edge_index": [[1, 2, 6, 10], [2, 4, 8, 11]],
     "coordinates": [[27689.0, 12005.0], [24517.0, 17809.0], ...],
     "feats": [[1.0, 0.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.115], ...],
     "feat_names": ["type", "Node_Degree", "Clustering_Coeff", "Harmonic_Cen"]
    }
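
    To make the relationship between edge_index and the Node_Degree feature concrete, the sketch below recomputes degrees from the two parallel lists and reports nodes whose stored value differs. It rests on assumptions that the description does not state: node indices are 0-based, edges are undirected, and each edge is listed once. The function names and the toy graph are hypothetical.

```python
from collections import Counter


def node_degrees(edge_index, num_nodes):
    """Count, for each node, the edges touching it (undirected, listed once)."""
    sources, targets = edge_index
    if len(sources) != len(targets):
        raise ValueError("edge_index lists must have equal length")
    counts = Counter()
    for src, dst in zip(sources, targets):
        counts[src] += 1
        counts[dst] += 1
    return [counts[i] for i in range(num_nodes)]


def degree_mismatches(graph):
    """Return node indices whose stored Node_Degree differs from the recount."""
    column = graph["feat_names"].index("Node_Degree")
    degrees = node_degrees(graph["edge_index"], len(graph["feats"]))
    return [
        i
        for i, (row, degree) in enumerate(zip(graph["feats"], degrees))
        if int(row[column]) != degree
    ]


if __name__ == "__main__":
    # Toy graph with three nodes in a chain: 0 - 1 - 2 (made-up values).
    toy = {
        "edge_index": [[0, 1], [1, 2]],
        "coordinates": [[0.0, 0.0], [10.0, 0.0], [20.0, 0.0]],
        "feats": [[1.0, 1.0, 0.0, 0.0], [1.0, 2.0, 0.0, 0.0], [2.0, 1.0, 0.0, 0.0]],
        "feat_names": ["type", "Node_Degree", "Clustering_Coeff", "Harmonic_Cen"],
    }
    print("Nodes with mismatched degree:", degree_mismatches(toy))
```

    A non-empty result on a real file would not necessarily indicate corrupt data; it may simply mean one of the assumptions above (for example, the indexing base) does not hold.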
    

    Loading Data into NumPy

    Below is a sample Python snippet to load one JSON file, extract node coordinates and the type feature, and combine them into a single NumPy array:

    import json
    import numpy as np
    
    # Path to your JSON file
    json_file_path = "example_graph.json"
    
    with open(json_file_path, 'r') as f:
      data = json.load(f)
    
    # Convert coordinates to NumPy
    coordinates = np.array(data["coordinates"])
    
    # Identify the "type" column
    feat_names = data["feat_names"]
    type_index = feat_names.index("type")
    
    # Extract features and isolate the "type" column
    feats = np.array(data["feats"])
    node_types = feats[:, type_index].reshape(-1, 1)
    
    # Combine x, y, and type into a single array (N x 3)
    combined_data = np.hstack([coordinates, node_types])
    
    print(combined_data)
    

    Building a NetworkX Graph

    To visualize or analyze the network structure, you can construct a NetworkX graph as follows:

    import json
    import networkx as nx
    import matplotlib.pyplot as plt
    
    json_file_path = "example_graph.json"
    
    with open(json_file_path, "r") as f:
      data = json.load(f)
    
    # Create a NetworkX Graph
    G = nx.Graph()
    
    # Add each node with position attributes
    for i, (x, y) in enumerate(data["coordinates"]):
      G.add_node(i, pos=(x, y))
    
    # Add edges using the parallel lists in edge_index
    # (Adjust for 1-based indexing if necessary)
    for src, dst in zip(data["edge_index"][0], data["edge_index"][1]):
      G.add_edge(src, dst)
    

    Visualizing mitotic network using TIAToolbox

    Having TIAToolbox installed, one can easily visualize the mitotic network on their respective whole slide images using the following command:

    tiatoolbox visualize --slides path/to/slides --overlays path/to/overlays

    The only thing to consider is that slides and overlays (provided graph json files) should have the same name. For more information, please refer to Visualization Interface Usage - TIA Toolbox 1.5.1 Documentation.

    If you use this dataset, please cite the following publication:

    @article{jahanifar2024mitosis,
     title={Mitosis detection, fast and slow: robust and efficient detection of mitotic figures},
     author={Jahanifar, Mostafa and Shephard, Adam and Zamanitajeddin, Neda and Graham, Simon and Raza, Shan E Ahmed and Minhas, Fayyaz and Rajpoot, Nasir},
     journal={Medical Image Analysis},
     volume={94},
     pages={103132},
     year={2024},
     publisher={Elsevier}
    }

  18. DataSheet_1_Comprehensive analysis of the prognosis, tumor microenvironment, and immunotherapy response of SDHs in colon adenocarcinoma.pdf

    • frontiersin.figshare.com
    • figshare.com
    pdf
    Updated Jun 21, 2023
    Han Nan; Pengkun Guo; Jianing Fan; Wen Zeng; Chonghan Hu; Can Zheng; Bujian Pan; Yu Cao; Yiwen Ge; Xiangyang Xue; Wenshu Li; Kezhi Lin (2023). DataSheet_1_Comprehensive analysis of the prognosis, tumor microenvironment, and immunotherapy response of SDHs in colon adenocarcinoma.pdf [Dataset]. http://doi.org/10.3389/fimmu.2023.1093974.s001
    Explore at:
    Available download formats: pdf
    Dataset updated
    Jun 21, 2023
    Dataset provided by
    Frontiers
    Authors
    Han Nan; Pengkun Guo; Jianing Fan; Wen Zeng; Chonghan Hu; Can Zheng; Bujian Pan; Yu Cao; Yiwen Ge; Xiangyang Xue; Wenshu Li; Kezhi Lin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: Succinate dehydrogenase (SDH), one of the key enzymes in the tricarboxylic acid cycle, is mainly found in the mitochondria. SDH consists of four subunits encoding SDHA, SDHB, SDHC, and SDHD. The biological function of SDH is significantly related to cancer progression. Colorectal cancer (CRC) is one of the most common malignant tumors globally, whose most common histological subtype is colon adenocarcinoma (COAD). However, the correlation between SDH factors and COAD remains unclear.

    Methods: The data on pan-cancer was obtained from The Cancer Genome Atlas (TCGA) database. Kaplan-Meier survival analysis showed the prognostic ability of SDHs. The cBioPortal database reflected genetic variations of SDHs. The correlation analysis was conducted between SDHs and mitochondrial energy metabolism genes (MMGs) and the protein-protein interaction (PPI) network was built. Consequently, Univariate and Multivariate Cox Regression Analysis on SDHs and other clinical characteristics were conducted. A nomogram was established. The ssGSEA analysis visualized the association between SDHs and immune infiltration. Immunophenoscore (IPS) explored the correlation between SDHs and immunotherapy, and the correlation between SDHs and targeted therapy was investigated through Genomics of Drug Sensitivity in Cancer. Finally, qPCR and immunohistochemistry detected SDHs’ expression.

    Results: After assessing SDHs differential expression in pan-cancer, we found that SDHB, SDHC, and SDHD benefit COAD patients. The cBioPortal database demonstrated that SDHA was the top gene in mutation frequency rank. Correlation analysis mirrored a strong link between SDHs and MMGs. We formulated a nomogram and found that SDHB, SDHC, SDHD, and clinical characteristics correlated with COAD patients’ survival. For T helper cells, Th2 cells, and Tem, SDHA, SDHB, SDHC, and SDHD were significantly enriched in the high expression group. Moreover, COAD patients with high SDHA expression were more suitable for immunotherapy. And COAD patients with different SDHs’ expression have different sensitivity to targeted drugs. Further verifying the gene and protein expression levels of SDHs, we found that the tissues were consistent with the bioinformatics analysis.

    Conclusions: Our study analyzed the expression and prognostic value of SDHs in COAD, explored the pathway mechanisms involved, and the immune cell correlations, indicating that SDHs might be biomarkers for COAD patients.

  19. TCGA-OV-AS

    • huggingface.co
    Benjamin, TCGA-OV-AS [Dataset]. https://huggingface.co/datasets/farrell236/TCGA-OV-AS
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Authors
    Benjamin
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    The Cancer Genome Atlas Ovarian Cancer for Ascites Segmentation (TCGA-OV-AS)

    This dataset was curated as part of the research 'Deep Learning Segmentation of Ascites on Abdominal CT Scans for Automatic Volume Quantification' (Paper, arXiv). To replicate TCGA-OV-AS, please download TCGA-OV from TCIA using the Descriptive Directory Name download option.

    Converting Images

    Convert the DICOMs to NIFTI format using dcm2niix and GNU parallel.

    Create the directory structure… See the full description on the dataset page: https://huggingface.co/datasets/farrell236/TCGA-OV-AS.

  20. Results of GSVA for TCGA-COAD.

    • figshare.com
    xls
    Updated Jul 18, 2025
    Yongling Wang; Zan Yuan; Yi Lao; Jiangtao He; Shufen Mo; Kangbiao Chen; Yanyan Ye; Lu Huang (2025). Results of GSVA for TCGA-COAD. [Dataset]. http://doi.org/10.1371/journal.pone.0328560.t005
    Explore at:
    Available download formats: xls
    Dataset updated
    Jul 18, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Yongling Wang; Zan Yuan; Yi Lao; Jiangtao He; Shufen Mo; Kangbiao Chen; Yanyan Ye; Lu Huang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: The exact mechanisms driving colorectal cancer (CRC) are yet to be fully elucidated. This study aims to confirm the reliability of a prognostic model for colon adenocarcinoma (COAD) by analyzing the varied expression levels of Glycolysis & Pyroptosis-Related Differentially Expressed Genes (G&PRDEGs) in COAD using bioinformatics tools.

    Methods: We retrieved gene expression data and clinical details for COAD patients from the Cancer Genome Atlas (TCGA) database. These data were analyzed to categorize the samples into pyroptosis-positive and pyroptosis-negative groups based on their expression of G&PRDEGs. A prognostic model for COAD was then developed using LASSO Cox regression analysis, focusing on these differentially expressed genes (DEGs). Kaplan-Meier curves were plotted to assess the differences in survival between the two groups. Furthermore, we conducted multivariate Cox regression analyses to evaluate the influence of clinical parameters and model-derived risk scores. Analyses of pathway enrichment were performed using R software, alongside single-sample gene-set enrichment analysis (ssGSEA) to explore the role of immune cells and functions associated with G&PRDEGs.

    Results: A predictive model was developed using 53 G&PRDEGs that were expressed differentially. An examination of survival rates revealed that the high-risk groups exhibited a noticeably diminished overall survival (OS) in comparison to the low-risk groups in the TCGA database (P

The Cancer Imaging Archive (2016). The Cancer Genome Atlas Colon Adenocarcinoma Collection [Dataset]. http://doi.org/10.7937/K9/TCIA.2016.HJJHBOXZ

The Cancer Genome Atlas Colon Adenocarcinoma Collection

TCGA-COAD

Explore at:
23 scholarly articles cite this dataset (View in Google Scholar)
