100+ datasets found
  1. The Cancer Genome Atlas Colon Adenocarcinoma Collection

    • cancerimagingarchive.net
    dicom, n/a
    Updated Jan 5, 2016
    Cite
    The Cancer Imaging Archive (2016). The Cancer Genome Atlas Colon Adenocarcinoma Collection [Dataset]. http://doi.org/10.7937/K9/TCIA.2016.HJJHBOXZ
    Explore at:
    Available download formats: dicom, n/a
    Dataset updated
    Jan 5, 2016
    Dataset authored and provided by
    The Cancer Imaging Archive
    License

    https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/

    Time period covered
    May 29, 2020
    Dataset funded by
    National Cancer Institute (http://www.cancer.gov/)
    Description

    The Cancer Genome Atlas Colon Adenocarcinoma (TCGA-COAD) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA). Clinical, genetic, and pathological data resides in the Genomic Data Commons (GDC) Data Portal while the radiological data is stored on The Cancer Imaging Archive (TCIA).

    Matched TCGA patient identifiers allow researchers to explore the TCGA/TCIA databases for correlations between tissue genotype, radiological phenotype and patient outcomes. Tissues for TCGA were collected from many sites all over the world in order to reach their accrual targets, usually around 500 specimens per cancer type. For this reason the image data sets are also extremely heterogeneous in terms of scanner modalities, manufacturers and acquisition protocols. In most cases the images were acquired as part of routine care and not as part of a controlled research study or clinical trial.

    CIP TCGA Radiology Initiative

    Imaging Source Site (ISS) Groups are being populated and governed by participants from institutions that have provided imaging data to the archive for a given cancer type. Modeled after TCGA analysis groups, ISS groups are given the opportunity to publish a marker paper for a given cancer type per the guidelines in the table above. This opportunity will generate increased participation in building these multi-institutional data sets as they become an open community resource. Learn more about the CIP TCGA Radiology Initiative.

  2. Table1_ECM–Receptor Regulatory Network and Its Prognostic Role in Colorectal...

    • frontiersin.figshare.com
    xlsx
    Updated Jun 8, 2023
    + more versions
    Cite
    Stepan Nersisyan; Victor Novosad; Narek Engibaryan; Yuri Ushkaryov; Sergey Nikulin; Alexander Tonevitsky (2023). Table1_ECM–Receptor Regulatory Network and Its Prognostic Role in Colorectal Cancer.XLSX [Dataset]. http://doi.org/10.3389/fgene.2021.782699.s003
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jun 8, 2023
    Dataset provided by
    Frontiers Media (http://www.frontiersin.org/)
    Authors
    Stepan Nersisyan; Victor Novosad; Narek Engibaryan; Yuri Ushkaryov; Sergey Nikulin; Alexander Tonevitsky
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Interactions between the extracellular matrix (ECM) and cellular receptors constitute one of the crucial pathways involved in colorectal cancer progression and metastasis. Using bioinformatics analysis, we comprehensively evaluated the prognostic information concentrated in the genes from this pathway. First, we constructed an ECM–receptor regulatory network by integrating the transcription factor (TF) and 5’-isomiR interaction databases with mRNA/miRNA-seq data from The Cancer Genome Atlas Colon Adenocarcinoma (TCGA-COAD). Notably, one-third of the interactions mediated by 5’-isomiRs were represented by noncanonical isomiRs (isomiRs whose 5’-end sequence did not match the canonical miRBase version). Then, exhaustive search-based feature selection was used to fit prognostic signatures composed of nodes from the network for overall survival prediction. Two reliable prognostic signatures were identified and validated on the independent Cancer Genome Atlas Rectum Adenocarcinoma (TCGA-READ) cohort. The first signature was made up of six genes directly involved in ECM–receptor interaction: AGRN, DAG1, FN1, ITGA5, THBS3, and TNC (concordance index 0.61, logrank test p = 0.0164, 3-year ROC AUC = 0.68). The second, hybrid signature was composed of three regulators: hsa-miR-32-5p, NR1H2, and SNAI1 (concordance index 0.64, logrank test p = 0.0229, 3-year ROC AUC = 0.71). While hsa-miR-32-5p exclusively regulated ECM-related genes (COL1A2 and ITGA5), NR1H2 and SNAI1 also targeted other pathways (adhesion, cell cycle, and cell division). Concordant distributions of the respective risk scores across four stages of colorectal cancer and adjacent normal mucosa additionally confirmed the reliability of the models.
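The description above reports a concordance index for each signature. As a rough illustration of what that metric measures, here is a minimal sketch of Harrell's concordance index in plain Python. It is not the authors' code; the function name and the toy inputs are hypothetical, and ties in survival time are simply skipped.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Fraction of comparable patient pairs that the risk score orders correctly.

    times: follow-up times; events: 1 if death was observed, 0 if censored;
    risk_scores: higher score means higher predicted risk.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` is the patient with the shorter follow-up.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Comparable only if the times differ and the earlier one is an observed event.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # earlier event had the higher risk: correct order
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which gives some context for the reported values of 0.61 and 0.64.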

  3. The Clinical Proteomic Tumor Analysis Consortium Colon Adenocarcinoma...

    • dev.cancerimagingarchive.net
    • cancerimagingarchive.net
    n/a, svs
    Updated Feb 2, 2021
    Cite
    The Clinical Proteomic Tumor Analysis Consortium Colon Adenocarcinoma Collection [Dataset]. https://dev.cancerimagingarchive.net/collection/cptac-coad/
    Explore at:
    Available download formats: svs, n/a
    Dataset updated
    Feb 2, 2021
    Dataset authored and provided by
    The Cancer Imaging Archive
    License

    https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/

    Time period covered
    Feb 2, 2021
    Dataset funded by
    National Cancer Institute (http://www.cancer.gov/)
    Description

    This collection contains subjects from the National Cancer Institute’s Clinical Proteomic Tumor Analysis Consortium CPTAC Colon Adenocarcinoma cohort. CPTAC is a national effort to accelerate the understanding of the molecular basis of cancer through the application of large-scale proteome and genome analysis, or proteogenomics. Radiology and pathology images from CPTAC patients are being collected and made publicly available by The Cancer Imaging Archive to enable researchers to investigate cancer phenotypes which may correlate to corresponding proteomic, genomic and clinical data.

    Imaging from each cancer type will be contained in its own TCIA Collection, with the collection name "CPTAC-cancertype". Radiology imaging is collected from standard of care imaging performed on patients immediately before the pathological diagnosis, and from follow-up scans where available. For this reason the radiology image data sets are heterogeneous in terms of scanner modalities, manufacturers and acquisition protocols. Pathology imaging is collected as part of the CPTAC qualification workflow.

    All CPTAC cohorts are released as either a single combined cohort, or split into Discovery and Confirmatory where applicable. There are two main types of proteomic studies: discovery proteomics and targeted proteomics. The term "discovery proteomics" is in reference to "untargeted" identification and quantification of a maximal number of proteins in a biological or clinical sample. The term “targeted proteomics” refers to quantitative measurements on a defined subset of total proteins in a biological or clinical sample, often following the completion of discovery proteomics studies to confirm interesting targets selected. Commonly used proteomic technologies and platforms are different types of mass spectrometry and protein microarrays depending on the needs, throughput and sample input requirement of an analysis, with further development on nanotechnologies and automation in the pipeline in order to improve the detection of low abundance proteins, increase throughput, and selectively reach a target protein in vivo. Once the protein targets of interest are identified, high-throughput targeted assays are developed for confirmatory studies: tests to affirm that the initial tests were accurate. A summary of CPTAC imaging efforts can be found on the CPTAC Imaging Proteomics page.

    CPTAC Imaging Special Interest Group

    You can join the CPTAC Imaging Special Interest Group to be notified of webinars & data releases, collaborate on common data wrangling tasks and seek out partners to explore research hypotheses! Artifacts from previous webinars such as slide decks and video recordings can be found on the CPTAC SIG Webinars page.

  4. Historical NCI Genomic Data Commons data (09-14-2017)

    • zenodo.org
    • data.niaid.nih.gov
    tsv
    Updated Jan 24, 2020
    Cite
    Inge Seim (2020). Historical NCI Genomic Data Commons data (09-14-2017) [Dataset]. http://doi.org/10.5281/zenodo.1186945
    Explore at:
    Available download formats: tsv
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Inge Seim
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Historical NCI Genomic Data Commons data (v09-14-2017). Clinical ('phenotype') and gene expression (HTSeq FPKM-UQ).

    TCGA-COAD.GDC_phenotype.tsv

    dataset: phenotype - Phenotype

    cohort: GDC TCGA Colon Cancer (COAD)
    dataset ID: TCGA-COAD/Xena_Matrices/TCGA-COAD.GDC_phenotype.tsv
    download: https://gdc.xenahubs.net/download/TCGA-COAD/Xena_Matrices/TCGA-COAD.GDC_phenotype.tsv.gz
    samples: 570
    version: 11-27-2017
    hub: https://gdc.xenahubs.net
    type of data: phenotype
    author: Genomic Data Commons
    raw data: https://docs.gdc.cancer.gov/Data/Release_Notes/Data_Release_Notes/#data-release-90
    raw data: https://api.gdc.cancer.gov/data/
    input data format: ROWs (samples) x COLUMNs (identifiers) (i.e. clinicalMatrix)
    570 samples X 151 identifiers

    TCGA-COAD.htseq_fpkm-uq.tsv

    dataset: gene expression RNAseq - HTSeq - FPKM-UQ

    cohort: GDC TCGA Colon Cancer (COAD)
    dataset ID: TCGA-COAD/Xena_Matrices/TCGA-COAD.htseq_fpkm-uq.tsv
    download: https://gdc.xenahubs.net/download/TCGA-COAD/Xena_Matrices/TCGA-COAD.htseq_fpkm-uq.tsv.gz
    samples: 512
    version: 09-14-2017
    hub: https://gdc.xenahubs.net
    type of data: gene expression RNAseq
    unit: log2(fpkm-uq+1)
    platform: Illumina
    ID/Gene Mapping: https://gdc.xenahubs.net/download/probeMaps/gencode.v22.annotation.gene.probeMap.gz
    author: Genomic Data Commons
    raw data: https://docs.gdc.cancer.gov/Data/Release_Notes/Data_Release_Notes/#data-release-80
    raw data: https://api.gdc.cancer.gov/data/
    wrangling: Data from the same sample but from different vials/portions/analytes/aliquots is averaged; data from different samples is combined into genomicMatrix; all data is then log2(x+1) transformed.
    input data format: ROWs (identifiers) x COLUMNs (samples) (i.e. genomicMatrix)
    60,484 identifiers X 512 samples
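The wrangling note above (average values from the same sample, combine samples into a matrix, then apply log2(x+1)) can be sketched in a few lines of Python. This is only an illustration of the described steps under stated assumptions, not the actual Xena pipeline: the function names, the input layout, and the sample and gene identifiers are all hypothetical.

```python
import math
from collections import defaultdict


def build_genomic_matrix(aliquot_values):
    """aliquot_values maps (sample_id, aliquot_id) -> {gene_id: raw FPKM-UQ value}.

    Returns {gene_id: {sample_id: log2(mean raw value + 1)}}, i.e. rows are
    identifiers and columns are samples, as in a genomicMatrix.
    """
    per_sample = defaultdict(lambda: defaultdict(list))
    for (sample_id, _aliquot_id), genes in aliquot_values.items():
        for gene_id, value in genes.items():
            per_sample[sample_id][gene_id].append(value)

    matrix = defaultdict(dict)
    for sample_id, genes in per_sample.items():
        for gene_id, values in genes.items():
            mean_value = sum(values) / len(values)  # average across aliquots
            matrix[gene_id][sample_id] = math.log2(mean_value + 1)
    return {gene_id: dict(columns) for gene_id, columns in matrix.items()}


def to_linear(log_value):
    """Invert the log2(x+1) transform to recover the averaged raw value."""
    return 2 ** log_value - 1
```

Because the published unit is log2(fpkm-uq+1), a helper like `to_linear` is needed before any analysis that expects values on the original scale.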

  5. COAD paired sample isoform level read counts

    • figshare.com
    application/gzip
    Updated Jan 19, 2016
    + more versions
    Cite
    Endre Sebestyén (2016). COAD paired sample isoform level read counts [Dataset]. http://doi.org/10.6084/m9.figshare.1059127.v1
    Explore at:
    Available download formats: application/gzip
    Dataset updated
    Jan 19, 2016
    Dataset provided by
    figshare
    Authors
    Endre Sebestyén
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    TCGA COAD paired sample isoform level read counts from Level 3 RNASeq-v2 data.

  6. TCGA-Cancer-Variant-and-Clinical-Data

    • huggingface.co
    Updated Oct 10, 2024
    Cite
    Seq-to-Pheno (2024). TCGA-Cancer-Variant-and-Clinical-Data [Dataset]. https://huggingface.co/datasets/seq-to-pheno/TCGA-Cancer-Variant-and-Clinical-Data
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Oct 10, 2024
    Dataset authored and provided by
    Seq-to-Pheno
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    TCGA Cancer Variant and Clinical Data

      Dataset Description
    

    This dataset combines genetic variant information at the protein level with clinical data from The Cancer Genome Atlas (TCGA) project, curated by the International Cancer Genome Consortium (ICGC). It provides a comprehensive view of protein-altering mutations and clinical characteristics across various cancer types.

      Dataset Summary
    

    The dataset includes:

    Protein sequence data for both mutated and… See the full description on the dataset page: https://huggingface.co/datasets/seq-to-pheno/TCGA-Cancer-Variant-and-Clinical-Data.

  7. Dr (Colon Cancer)

    • zenodo.org
    • search.dataone.org
    • +1more
    zip
    Updated Jul 2, 2022
    Cite
    QunGuang Jiang; Xiaorui Fu; Jinzhong Duanmu; Taiyuan Li (2022). Dr (Colon Cancer) [Dataset]. http://doi.org/10.5061/dryad.7pvmcvdpc
    Explore at:
    Available download formats: zip
    Dataset updated
    Jul 2, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    QunGuang Jiang; Xiaorui Fu; Jinzhong Duanmu; Taiyuan Li
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Colon adenocarcinoma (COAD) is the most common colon cancer and exhibits high mortality. Because of their association with cancer progression, long noncoding RNAs (lncRNAs) have become prognostic biomarkers. Using relevant clinical information and lncRNA expression profiles from The Cancer Genome Atlas database, this study aims to construct a prognostic lncRNA signature to estimate the prognosis of patients. In the training cohort, prognosis-related lncRNAs were selected from differentially expressed lncRNAs by univariate Cox analysis. Least absolute shrinkage and selection operator (LASSO) regression and multivariate Cox analysis were then employed to identify prognostic lncRNAs, from which the prognostic signature was constructed. The prognostic model calculated each COAD patient's risk score and split the patients into low-risk and high-risk groups. Compared with the low-risk group, the high-risk group had a significantly poorer prognosis. The prognostic signature was then validated in the validation cohort and across all cohorts. The receiver operating characteristic (ROC) curve and c-index were computed in all cohorts. Moreover, the prognostic lncRNA signature was combined with clinicopathological risk factors to construct a nomogram for predicting the prognosis of COAD in the clinic. Finally, 7 lncRNAs (CTC-273B12.10, AC009404.2, AC073283.7, RP11-167H9.4, AC007879.7, RP4-816N1.7, RP11-400N13.2) were identified and validated in different cohorts. The Kyoto Encyclopedia of Genes and Genomes analysis of the mRNAs co-expressed with the 7 prognostic lncRNAs suggested 4 significantly up-regulated pathways: AGE-RAGE signaling pathway, focal adhesion, ECM-receptor interaction and PI3K/Akt signaling pathway. In summary, our study verified that these 7 lncRNAs can serve as biomarkers to predict the prognosis of COAD patients and to design personalized treatment.
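The description above says the model computes a risk score per patient and splits patients into low-risk and high-risk groups. A common way to do this, sketched below, is a weighted sum of expression values using Cox coefficients followed by a median cutoff. The description does not state the cutoff or the coefficients, so the median split, the feature names, and all numbers here are assumptions for illustration only.

```python
from statistics import median


def risk_score(expression, coefficients):
    """Linear risk score: sum of coefficient * expression over signature features."""
    return sum(coef * expression[name] for name, coef in coefficients.items())


def split_by_median(scores):
    """Label each patient 'high' if the score exceeds the cohort median, else 'low'.

    The median cutoff is an assumption; the source does not say how the
    groups were separated.
    """
    cutoff = median(scores.values())
    return {pid: ("high" if score > cutoff else "low") for pid, score in scores.items()}


# Hypothetical coefficients and expression values, not taken from the study.
coefficients = {"lncA": 0.5, "lncB": -1.0}
cohort = {
    "P1": {"lncA": 2.0, "lncB": 1.0},
    "P2": {"lncA": 4.0, "lncB": 0.0},
    "P3": {"lncA": 0.0, "lncB": 2.0},
}
scores = {pid: risk_score(expr, coefficients) for pid, expr in cohort.items()}
groups = split_by_median(scores)
```

The resulting groups would then typically be compared with a survival analysis such as a log-rank test, which is outside the scope of this sketch.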

  8. COAD paired sample gene level read counts

    • figshare.com
    • commons.datacite.org
    application/gzip
    Updated Jan 19, 2016
    + more versions
    Cite
    Endre Sebestyén (2016). COAD paired sample gene level read counts [Dataset]. http://doi.org/10.6084/m9.figshare.1061501.v1
    Explore at:
    Available download formats: application/gzip
    Dataset updated
    Jan 19, 2016
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Endre Sebestyén
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    TCGA COAD paired sample gene level read counts from Level 3 RNASeq-v2 data.

  9. DICOM converted Slide Microscopy images for the TCGA-LUAD collection

    • zenodo.org
    bin
    Updated Aug 20, 2024
    + more versions
    Cite
    David Clunie; William Clifford; David Pot; Ulrike Wagner; Keyvan Farahani; Erika Kim; Andrey Fedorov (2024). DICOM converted Slide Microscopy images for the TCGA-LUAD collection [Dataset]. http://doi.org/10.5281/zenodo.12689916
    Explore at:
    Available download formats: bin
    Dataset updated
    Aug 20, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    David Clunie; William Clifford; David Pot; Ulrike Wagner; Keyvan Farahani; Erika Kim; Andrey Fedorov
    License

    Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Description

    This dataset corresponds to a collection of images and/or image-derived data available from National Cancer Institute Imaging Data Commons (IDC) [1]. This dataset was converted into DICOM representation and ingested by the IDC team. You can explore and visualize the corresponding images using IDC Portal here: TCGA-LUAD. You can use the manifests included in this Zenodo record to download the content of the collection following the Download instructions below.

    Collection description

    The Cancer Imaging Program (CIP) is working directly with primary investigators from institutes participating in TCGA to obtain and load images relating to the genomic, clinical, and pathological data being stored within the TCGA Data Portal. Currently, this large CT multi-sequence image collection of lung adenocarcinoma (LUAD) patients can be matched by each unique case identifier with the extensive gene and expression data of the same case from The Cancer Genome Atlas Data Portal to research the link between clinical phenome and tissue genome.


    Please see the TCGA-LUAD page to learn more about the images and to obtain any supporting metadata for this collection.

    Files included

    A manifest file's name indicates the IDC data release in which a version of collection data was first introduced. For example, collection_id-idc_v8-aws.s5cmd corresponds to the contents of the collection_id collection introduced in IDC data release v8. If there is a subsequent version of this Zenodo page, it will indicate when a subsequent version of the corresponding collection was introduced.

    1. tcga_luad-idc_v8-aws.s5cmd: manifest of files available for download from public IDC Amazon Web Services buckets
    2. tcga_luad-idc_v8-gcs.s5cmd: manifest of files available for download from public IDC Google Cloud Storage buckets
    3. tcga_luad-idc_v8-dcf.dcf: Gen3 manifest (for details see https://learn.canceridc.dev/data/organization-of-data/guids-and-uuids)

    Note that manifest files that end in -aws.s5cmd reference files stored in Amazon Web Services (AWS) buckets, while -gcs.s5cmd reference files in Google Cloud Storage. The actual files are identical and are mirrored between AWS and GCP.
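The naming convention described above (collection id, IDC release, then an -aws or -gcs suffix) can be decoded mechanically. The following is a minimal sketch based only on the file names listed in this record; the regular expression and function name are hypothetical, and other manifests may follow patterns this sketch does not cover.

```python
import re

# Pattern inferred from names such as "tcga_luad-idc_v8-aws.s5cmd" (an assumption).
MANIFEST_RE = re.compile(
    r"^(?P<collection>.+)-idc_v(?P<release>\d+)-(?P<store>aws|gcs)\.s5cmd$"
)


def parse_manifest_name(filename):
    """Split an .s5cmd manifest file name into collection, release and cloud."""
    match = MANIFEST_RE.match(filename)
    if match is None:
        raise ValueError(f"not an .s5cmd manifest name: {filename!r}")
    cloud = "AWS" if match["store"] == "aws" else "Google Cloud Storage"
    return {
        "collection_id": match["collection"],
        "idc_release": int(match["release"]),
        "cloud": cloud,
    }
```

For example, `parse_manifest_name("tcga_luad-idc_v8-gcs.s5cmd")` reports collection `tcga_luad`, release 8, hosted on Google Cloud Storage. The .dcf manifest is deliberately rejected because it does not carry a cloud suffix.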

    Download instructions

    Each of the manifests includes instructions in the header on how to download the included files.

    To download the files using .s5cmd manifests:

    1. install idc-index package: pip install --upgrade idc-index
    2. download the files referenced by manifests included in this dataset by passing the .s5cmd manifest file: idc download manifest.s5cmd.

    To download the files using .dcf manifest, see manifest header.
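Before running the download command above, it can be useful to preview what a manifest refers to. The sketch below assumes the manifest is a plain-text file whose header lines start with `#` and whose remaining lines are s5cmd-style `cp <source> <destination>` commands; that layout is an assumption and should be checked against the header of the actual manifest. The function name is hypothetical.

```python
def manifest_sources(manifest_text):
    """Return the source URLs of the `cp` lines in an s5cmd-style manifest.

    Assumes comment/header lines begin with '#' and data lines look like
    'cp <source> <destination>'; lines in any other shape are ignored.
    """
    sources = []
    for raw_line in manifest_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the instruction header
        parts = line.split()
        if len(parts) >= 3 and parts[0] == "cp":
            sources.append(parts[1])
    return sources
```

Counting the returned entries gives a quick estimate of how many series a manifest covers before any data is transferred.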

    Acknowledgments

    Imaging Data Commons team has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under Task Order No. HHSN26110071 under Contract No. HHSN261201500003l.

    References

    [1] Fedorov, A., Longabaugh, W. J. R., Pot, D., Clunie, D. A., Pieper, S. D., Gibbs, D. L., Bridge, C., Herrmann, M. D., Homeyer, A., Lewis, R., Aerts, H. J. W., Krishnaswamy, D., Thiriveedhi, V. K., Ciausu, C., Schacherer, D. P., Bontempi, D., Pihl, T., Wagner, U., Farahani, K., Kim, E. & Kikinis, R. National Cancer Institute Imaging Data Commons: Toward Transparency, Reproducibility, and Scalability in Imaging Artificial Intelligence. RadioGraphics (2023). https://doi.org/10.1148/rg.230180

  10. Identification of New Gene Labels for Colon Cancer Prognosis Based on Random...

    • scidb.cn
    Updated Dec 19, 2024
    Cite
    Identification of New Gene Labels for Colon Cancer Prognosis Based on Random Survival Forest Model [Dataset]. https://www.scidb.cn/en/detail?dataSetId=OA_579f89172a5a4ea887fab117c3ea81ce
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 19, 2024
    Dataset provided by
    Science Data Bank
    Authors
    Yi-ming.WANG; Yan-min.WANG; Bing.MA
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Objective: To investigate the mRNA gene expression profile in colon cancer and determine the optimal prognostic markers. Method: The colon cancer dataset TCGA-COAD was downloaded from The Cancer Genome Atlas (TCGA) database as the training cohort. The random survival forest (RSF) model was used to determine gene labels, and the obtained gene labels were analyzed using the Cox model to construct risk scores. The colon cancer dataset GSE17536 was downloaded from the Gene Expression Omnibus (GEO) as the validation cohort to validate the model, which was also compared with similar studies from the past year. The relationship between gene labels and immune cells was explored through immune cell infiltration analysis. Result: A total of 11 gene labels were screened, and the risk score constructed by the multivariate Cox model was an independent prognostic indicator for colon cancer patients. In this comparison, the model performed better than those of previous studies. Immune cell infiltration revealed a significant correlation (P<0.05) between monocytes and the gene labels used in this study. Conclusion: This study identified 11 gene markers with prognostic value for colon cancer, and monocytes may serve as potential therapeutic targets for colon cancer.

  11. Dataset of Segmented Nuclei in Hematoxylin and Eosin Stained Histopathology...

    • cancerimagingarchive.net
    • dev.cancerimagingarchive.net
    • +1more
    docx, n/a, svs, txt
    Cite
    The Cancer Imaging Archive, Dataset of Segmented Nuclei in Hematoxylin and Eosin Stained Histopathology Images [Dataset]. http://doi.org/10.7937/TCIA.2019.4A4DKP9U
    Explore at:
    Available download formats: txt, docx, n/a, svs
    Dataset authored and provided by
    The Cancer Imaging Archive
    License

    https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/

    Time period covered
    Feb 8, 2020
    Dataset funded by
    National Cancer Institute
    Description

    Detection, segmentation and classification of nuclei are fundamental analysis operations in digital pathology. Existing state-of-the-art approaches demand extensive amounts of supervised training data from pathologists and may still perform poorly in images from unseen tissue types. We propose an unsupervised approach for histopathology image segmentation that synthesizes heterogeneous sets of training image patches, of every tissue type. Although our synthetic patches are not always of high quality, we harness the motley crew of generated samples through a generally applicable importance sampling method. This proposed approach, for the first time, re-weighs the training loss over synthetic data so that the ideal (unbiased) generalization loss over the true data distribution is minimized. This enables us to use a random polygon generator to synthesize approximate cellular structures (i.e., nuclear masks) for which no real examples are given in many tissue types, and hence, GAN-based methods are not suited. In addition, we propose a hybrid synthesis pipeline that utilizes textures in real histopathology patches and GAN models, to tackle heterogeneity in tissue textures. Compared with existing state-of-the-art supervised models, our approach generalizes significantly better on cancer types without training data. Even in cancer types with training data, our approach achieves the same performance without supervision cost. In this dataset we release code and nucleus segmentations in whole slide tissue images with quality control results for Whole Slide Images (WSI) in The Cancer Genome Atlas (TCGA) repository from 5,204 subjects (6,142 slide images). Within this total, there are two subsets of data: (1) automatic nucleus segmentation data of 5,060 whole slide tissue images of 10 cancer types, with quality control results, and (2) manual nucleus segmentation data of 1,356 image patches from the same 10 cancer types plus additional 4 cancer types.

    5,060 Whole Slide Images (WSIs) are from the following 10 cancer types:

    BLCA: Bladder urothelial carcinoma
    BRCA: Breast invasive carcinoma
    CESC: Cervical squamous cell carcinoma and endocervical adenocarcinoma
    GBM: Glioblastoma Multiforme
    LUAD: Lung adenocarcinoma
    LUSC: Lung squamous cell carcinoma
    PAAD: Pancreatic adenocarcinoma
    PRAD: Prostate adenocarcinoma
    SKCM: Skin Cutaneous Melanoma
    UCEC: Uterine Corpus Endometrial Carcinoma

    Note that you can also download segmentation data of the following 4 cancer types, although they are not officially verified.

    COAD: Colon adenocarcinoma
    READ: Rectal adenocarcinoma
    STAD: Stomach adenocarcinoma
    UVM: Uveal Melanoma
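The description of this dataset mentions re-weighting the training loss over synthetic data so that it approximates the loss over the true data distribution. The textbook form of that idea is importance weighting: each synthetic sample's loss is multiplied by the ratio of its probability under the true distribution to its probability under the synthetic one. The sketch below shows only this generic principle on a toy discrete example; it is not the authors' method, and the function name, distributions and losses are made up.

```python
def reweighted_loss(samples, loss_fn, target_prob, proposal_prob):
    """Importance-weighted average loss.

    samples are drawn from the proposal (synthetic) distribution; each loss is
    scaled by target_prob[x] / proposal_prob[x] so that the average estimates
    the expected loss under the target (true) distribution.
    """
    total = 0.0
    for x in samples:
        weight = target_prob[x] / proposal_prob[x]
        total += weight * loss_fn(x)
    return total / len(samples)


# Toy setup: the synthetic generator over-produces patch type "a".
target = {"a": 0.5, "b": 0.5}      # hypothetical true distribution
proposal = {"a": 0.8, "b": 0.2}    # hypothetical synthetic distribution
losses = {"a": 1.0, "b": 3.0}      # hypothetical per-type loss
synthetic_batch = ["a"] * 8 + ["b"] * 2

naive = sum(losses[x] for x in synthetic_batch) / len(synthetic_batch)
corrected = reweighted_loss(synthetic_batch, losses.get, target, proposal)
```

In this toy case the unweighted average underestimates the loss on the true distribution, while the weighted average recovers the expected value of 2.0. In practice the density ratio is not known and has to be estimated, which is where the real difficulty lies.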

  12. PIVOT - COAD (light)

    • explore.openaire.eu
    Updated Jan 24, 2022
    Cite
    Malvika Sudhakar; Raghunathan Rengaswamy; Karthik Raman (2022). PIVOT - COAD (light) [Dataset]. http://doi.org/10.5281/zenodo.5898163
    Explore at:
    Dataset updated
    Jan 24, 2022
    Authors
    Malvika Sudhakar; Raghunathan Rengaswamy; Karthik Raman
    Description

    Pre-processed TCGA COAD data used for PIVOT analysis.

  13. COAD samples somatic mutation data

    • figshare.com
    • search.datacite.org
    application/gzip
    Updated Jan 19, 2016
    Cite
    Endre Sebestyén (2016). COAD samples somatic mutation data [Dataset]. http://doi.org/10.6084/m9.figshare.1061910.v1
    Explore at:
    Available download formats: application/gzip
    Dataset updated
    Jan 19, 2016
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Endre Sebestyén
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    TCGA COAD samples somatic mutation data in BED format.

  14. Increased Expression of Sorbitol Dehydrogenase in Colorectal Cancer Predicts...

    • data.niaid.nih.gov
    Updated Nov 23, 2021
    Cite
    Xia, Bi-Han (2021). Increased Expression of Sorbitol Dehydrogenase in Colorectal Cancer Predicts Better Prognosis (raw data) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5717526
    Explore at:
    Dataset updated
    Nov 23, 2021
    Dataset provided by
    Liu, Tong
    Qi, Shao-Chong
    Xia, Bi-Han
    Wang, Zi-Jing
    Zhang, Xiao-Shuang
    Yang, Jin-Lin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    COAD/READ/COADREAD_rnaseq_fpkm.txt files contain TCGA RNA-Seq data in FPKM-normalised form for colon adenocarcinoma (COAD), rectum adenocarcinoma (READ) or combined (COADREAD).

    COAD/READ/COADREAD_rnaseq_tpm.txt files contain TCGA RNA-Seq data in TPM-normalised form for colon adenocarcinoma (COAD), rectum adenocarcinoma (READ) or combined (COADREAD).

    COAD/READ/COADREAD_clinical_raw.xlsx files contain TCGA clinical data for patients with colon adenocarcinoma (COAD), rectum adenocarcinoma (READ) or combined (COADREAD).

    COAD/READ/COADREAD_rnaseq_clinical_raw.xlsx files contain the matched TCGA clinical data and RNA-Seq data for patients with colon adenocarcinoma (COAD), rectum adenocarcinoma (READ) or combined (COADREAD).

    Local_cohort_tumour/adenoma_qPCR_rawdata.xlsx files contain our experimental results of qPCR CT values for SORD and GAPDH (as internal ref), shown as separate values for duplicate wells and average values.

    Local_cohort_tumour_clinical_rawdata.xlsx contains clinical information and calculated SORD relative expression of our recruited patients.
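The files above hold qPCR CT values for SORD and GAPDH in duplicate wells together with a calculated SORD relative expression. The record does not say which formula was used; a common convention is the 2^(-ΔCt) method, where ΔCt is the target CT minus the reference CT. The sketch below illustrates that convention only, with a hypothetical function name and made-up CT values.

```python
def relative_expression(target_ct_wells, reference_ct_wells):
    """Relative expression by the 2^(-delta Ct) convention.

    Each argument is a list of CT values from replicate wells; replicates are
    averaged first, then delta Ct = mean target CT - mean reference CT.
    Using this formula for the dataset is an assumption.
    """
    target_ct = sum(target_ct_wells) / len(target_ct_wells)
    reference_ct = sum(reference_ct_wells) / len(reference_ct_wells)
    delta_ct = target_ct - reference_ct
    return 2 ** (-delta_ct)


# Made-up duplicate-well CT values for one sample.
sord_wells = [24.0, 26.0]    # target gene (SORD)
gapdh_wells = [20.0, 20.0]   # internal reference (GAPDH)
expression = relative_expression(sord_wells, gapdh_wells)
```

A higher CT means less template, so a target that crosses the threshold five cycles after the reference comes out at 2^-5 of the reference level.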

  15. lung-cancer

    • huggingface.co
    Updated Aug 13, 2024
    Cite
    Dorsa Rohani (2024). lung-cancer [Dataset]. https://huggingface.co/datasets/dorsar/lung-cancer
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 13, 2024
    Authors
    Dorsa Rohani
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Lung Cancer CT Scan Dataset

      Dataset Description
    

    This dataset contains CT scan images for lung cancer detection and classification. It includes images of four different categories: adenocarcinoma, large cell carcinoma, squamous cell carcinoma, and normal (non-cancerous) lung tissue.

      Classes
    

    Adenocarcinoma
    Large Cell Carcinoma
    Normal (non-cancerous)
    Squamous Cell Carcinoma

      Dataset Statistics
    

    Total number of images: 315
    Number of classes: 4
    Class… See the full description on the dataset page: https://huggingface.co/datasets/dorsar/lung-cancer.

  16. Pan-Cancer-Nuclei-Seg-DICOM: DICOM converted Dataset of Segmented Nuclei in...

    • zenodo.org
    • explore.openaire.eu
    bin
    Updated Oct 1, 2024
    Christopher Bridge; Markus Herrmann; David Clunie; Andrey Fedorov (2024). Pan-Cancer-Nuclei-Seg-DICOM: DICOM converted Dataset of Segmented Nuclei in Hematoxylin and Eosin Stained Histopathology Images [Dataset]. http://doi.org/10.5281/zenodo.11099005
    Explore at:
    Available download formats: bin
    Dataset updated
    Oct 1, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Christopher Bridge; Markus Herrmann; David Clunie; Andrey Fedorov
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset corresponds to a collection of images and/or image-derived data available from the National Cancer Institute Imaging Data Commons (IDC) [1]. This dataset was converted into DICOM representation and ingested by the IDC team. You can explore and visualize the corresponding images using the IDC Portal here: Pan-Cancer-Nuclei-Seg-DICOM. You can use the manifests included in this Zenodo record to download the content of the collection following the Download instructions below.

    Collection description

    This collection contains automatic nucleus segmentation data of 5,060 whole slide tissue images of 10 cancer types earlier published in [2] (https://doi.org/10.7937/TCIA.2019.4A4DKP9U), stored in DICOM Bulk Annotation format. Nuclei annotations are stored as closed polygons along with the area of each nucleus. The annotations correspond to digital pathology images from the TCGA-BLCA, TCGA-BRCA, TCGA-CESC, TCGA-COAD, TCGA-GBM, TCGA-LUAD, TCGA-LUSC, TCGA-PAAD, TCGA-PRAD, TCGA-READ, TCGA-SKCM, TCGA-STAD, TCGA-UCEC and TCGA-UVM collections available in NCI Imaging Data Commons.
    To learn how these files are organized and how to access the content programmatically, see this documentation page: https://highdicom.readthedocs.io/en/latest/ann.html.
    Conversion of the nuclei segmentations from the original CSV format into DICOM ANN format was done using the code available in 10.5281/zenodo.10632181.

    Files included

    A manifest file's name indicates the IDC data release in which a version of collection data was first introduced. For example, pan_cancer_nuclei_seg_dicom-collection_id-idc_v19-aws.s5cmd corresponds to the annotations for the images in the collection_id collection introduced in IDC data release v19. If there is a subsequent version of this Zenodo page, it will indicate when a subsequent version of the corresponding collection was introduced.

    For each of the collections, the following manifest files are provided:

    1. pan_cancer_nuclei_seg_dicom-: manifest of files available for download from public IDC Amazon Web Services buckets
    2. pan_cancer_nuclei_seg_dicom-: manifest of files available for download from public IDC Google Cloud Storage buckets
    3. pan_cancer_nuclei_seg_dicom-: Gen3 manifest (for details see https://learn.canceridc.dev/data/organization-of-data/guids-and-uuids)

    Note that manifest files ending in -aws.s5cmd reference files stored in Amazon Web Services (AWS) buckets, while those ending in -gcs.s5cmd reference files in Google Cloud Storage. The actual files are identical and are mirrored between AWS and GCP.

    Download instructions

    Each of the manifests includes instructions in its header on how to download the included files.

    To download the files using .s5cmd manifests:

    1. Install the idc-index package: pip install --upgrade idc-index
    2. Download the files referenced by the manifests included in this dataset by passing the .s5cmd manifest file: idc download manifest.s5cmd

    To download the files using the .dcf manifest, see the manifest header.
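
    Before running the download, it can be useful to check what a manifest actually references. The sketch below is a minimal illustration under stated assumptions: that header and instruction lines start with '#', and that each data line is an s5cmd copy command of the form `cp <source> <destination>`. The helper name and the sample manifest content are made up for this example.

```python
def read_s5cmd_manifest(text):
    """Return (source, destination) pairs for each `cp` line in a manifest.

    Assumptions: header/instruction lines start with '#', and each data
    line has the form `cp <source-url> <destination>`.
    """
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the commented header
        parts = line.split()
        if len(parts) == 3 and parts[0] == "cp":
            entries.append((parts[1], parts[2]))
    return entries


# Made-up manifest content, for illustration only.
sample = """\
# download instructions would appear here
cp s3://example-bucket/series-a/* .
cp s3://example-bucket/series-b/* .
"""
for source, destination in read_s5cmd_manifest(sample):
    print(source, "->", destination)
```

    Lines that do not match the assumed shape are skipped rather than raising an error, which keeps the sketch tolerant of header variations.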

    Acknowledgments

    The Imaging Data Commons team has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under Task Order No. HHSN26110071 under Contract No. HHSN261201500003l.

    References

    [1] Fedorov, A., Longabaugh, W. J. R., Pot, D., Clunie, D. A., Pieper, S. D., Gibbs, D. L., Bridge, C., Herrmann, M. D., Homeyer, A., Lewis, R., Aerts, H. J. W. L., Krishnaswamy, D., Thiriveedhi, V. K., Ciausu, C., Schacherer, D. P., Bontempi, D., Pihl, T., Wagner, U., Farahani, K., Kim, E. & Kikinis, R. National cancer institute imaging data commons: Toward transparency, reproducibility, and scalability in imaging artificial intelligence. Radiographics 43, (2023).
    [2] Hou, L., Gupta, R., Van Arnam, J. S., Zhang, Y., Sivalenka, K., Samaras, D., Kurc, T., & Saltz, J. H. (2019). Dataset of Segmented Nuclei in Hematoxylin and Eosin Stained Histopathology Images of 10 Cancer Types [Data set]. The Cancer Imaging Archive. https://doi.org/10.7937/TCIA.2019.4A4DKP9U
  17. Colon Cancer Dataset

    • universe.roboflow.com
    zip
    Updated Sep 4, 2023
    + more versions
    colon detection (2023). Colon Cancer Dataset [Dataset]. https://universe.roboflow.com/colon-detection/colon-cancer-mnhkv/model/1
    Explore at:
    Available download formats: zip
    Dataset updated
    Sep 4, 2023
    Dataset authored and provided by
    colon detection
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Colon Cancer
    Description

    Colon Cancer

    ## Overview
    
    Colon Cancer is a dataset for classification tasks - it contains Colon Cancer annotations for 618 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  18. HISTOPANTUME: Histological Pan-cancer Tumor image dataset

    • zenodo.org
    application/gzip
    Updated Jun 3, 2025
    HISTOPANTUME: Histological Pan-cancer Tumor image dataset [Dataset]. https://zenodo.org/records/14555794
    Explore at:
    Available download formats: application/gzip
    Dataset updated
    Jun 3, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Neda Zamanitajeddin; Mostafa Jahanifar; Fouzia Siraj; Nasir Rajpoot
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    HISTOPANTUM is a comprehensive pan-cancer dataset of histology images categorized into Tumor and Non-Tumor classes over 4 different cancer types (domains). This dataset is designed to facilitate domain generalization analysis for tumor detection tasks, serving as a benchmark for foundation models and domain generalization algorithms.

    Dataset Overview

    The dataset comprises histology images sourced from The Cancer Genome Atlas (TCGA), spanning the following four cancer types:

    • Colorectal Cancer
    • Ovarian Cancer
    • Stomach Cancer
    • Uterus Cancer

    Image Specifications

    • Original Resolution: 512 × 512 pixel images are extracted at a resolution of 0.5 microns per pixel.
    • Processed Size: Images are resized to 224 × 224 pixels and saved as JPEG files.
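
    The two specifications above imply a physical field of view and an effective resolution for the resized images. The arithmetic can be sketched as follows, assuming the resize is a plain uniform downscale of the whole patch with no cropping (the description does not say this explicitly). The function names are hypothetical.

```python
def field_of_view_microns(microns_per_pixel, pixels):
    """Physical width covered by a patch of `pixels` at the given resolution."""
    return microns_per_pixel * pixels


def resized_microns_per_pixel(source_mpp, source_px, target_px):
    """Effective resolution after uniformly resizing source_px down to target_px.

    Assumption: the whole patch is rescaled without cropping.
    """
    return source_mpp * source_px / target_px


fov = field_of_view_microns(0.5, 512)
mpp = resized_microns_per_pixel(0.5, 512, 224)
print(f"field of view: {fov} microns per side")
print(f"effective resolution after resizing: {mpp:.3f} microns per pixel")
```

    Under that assumption each patch covers 256 microns per side, and the 224 × 224 version corresponds to roughly 1.143 microns per pixel.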

    The dataset is provided in four zipped files, each corresponding to one cancer type. Within each zip file, images are organized into two subfolders:

    • tumour
    • non-tumour

    Each image filename encodes the originating slide and the patch position within the slide, following this naming convention:

    Citation

    If you use this dataset in your research, please cite the following publication:

    @article{zamanitajeddin2024benchmarking,
     title={Benchmarking Domain Generalization Algorithms in Computational Pathology},
     author={Zamanitajeddin, Neda and Jahanifar, Mostafa and Xu, Kesi and Siraj, Fouzia and Rajpoot, Nasir},
     journal={arXiv preprint arXiv:2409.17063},
     year={2024}
    }
    

    For further details, please refer to the linked publication.

  19. Lung Cancer Dataset Dataset

    • universe.roboflow.com
    zip
    Updated Sep 30, 2024
    + more versions
    mendozajpd (2024). Lung Cancer Dataset Dataset [Dataset]. https://universe.roboflow.com/mendozajpd/lung-cancer-dataset-chgjz
    Explore at:
    Available download formats: zip
    Dataset updated
    Sep 30, 2024
    Dataset authored and provided by
    mendozajpd
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Test Bounding Boxes
    Description

    Lung Cancer Dataset

    ## Overview
    
    Lung Cancer Dataset is a dataset for object detection tasks - it contains Test annotations for 8,590 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  20. DICOM converted Slide Microscopy images for the CPTAC-COAD collection

    • zenodo.org
    bin
    Updated Aug 20, 2024
    David Clunie; William Clifford; David Pot; Ulrike Wagner; Keyvan Farahani; Erika Kim; Andrey Fedorov (2024). DICOM converted Slide Microscopy images for the CPTAC-COAD collection [Dataset]. http://doi.org/10.5281/zenodo.13351625
    Explore at:
    Available download formats: bin
    Dataset updated
    Aug 20, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    David Clunie; William Clifford; David Pot; Ulrike Wagner; Keyvan Farahani; Erika Kim; Andrey Fedorov
    License

    Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Description

    This dataset corresponds to a collection of images and/or image-derived data available from the National Cancer Institute Imaging Data Commons (IDC) [1]. This dataset was converted into DICOM representation and ingested by the IDC team. You can explore and visualize the corresponding images using the IDC Portal here: CPTAC-COAD. You can use the manifests included in this Zenodo record to download the content of the collection following the Download instructions below.

    Collection description

    This collection contains subjects from the National Cancer Institute’s Clinical Proteomic Tumor Analysis Consortium (CPTAC) Colon Adenocarcinoma cohort. CPTAC is a national effort to accelerate the understanding of the molecular basis of cancer through the application of large-scale proteome and genome analysis, or proteogenomics.

    Please see the CPTAC-COAD wiki page to learn more about the images and to obtain any supporting metadata for this collection.

    Files included

    A manifest file's name indicates the IDC data release in which a version of collection data was first introduced. For example, collection_id-idc_v8-aws.s5cmd corresponds to the contents of the collection_id collection introduced in IDC data release v8. If there is a subsequent version of this Zenodo page, it will indicate when a subsequent version of the corresponding collection was introduced.

    1. cptac_coad-idc_v10-aws.s5cmd: manifest of files available for download from public IDC Amazon Web Services buckets
    2. cptac_coad-idc_v10-gcs.s5cmd: manifest of files available for download from public IDC Google Cloud Storage buckets
    3. cptac_coad-idc_v10-dcf.dcf: Gen3 manifest (for details see https://learn.canceridc.dev/data/organization-of-data/guids-and-uuids)

    Note that manifest files ending in -aws.s5cmd reference files stored in Amazon Web Services (AWS) buckets, while those ending in -gcs.s5cmd reference files in Google Cloud Storage. The actual files are identical and are mirrored between AWS and GCP.
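
    The naming convention described above (collection, IDC release, storage target) can be unpacked programmatically. The following is a minimal sketch that assumes every manifest name follows the pattern of the three files listed for this record; the regular expression and the helper name are illustrative, not part of any IDC tooling.

```python
import re

# Assumed pattern, inferred from the three file names listed for this record:
# <collection>-idc_v<release>-<target>.<extension>
MANIFEST_NAME = re.compile(
    r"^(?P<collection>.+)-idc_v(?P<release>\d+)-(?P<target>aws|gcs|dcf)\.(?:s5cmd|dcf)$"
)


def parse_manifest_name(filename):
    """Split a manifest file name into collection, release and target.

    Returns None when the name does not follow the assumed pattern.
    """
    match = MANIFEST_NAME.match(filename)
    if match is None:
        return None
    return {
        "collection": match.group("collection"),
        "release": int(match.group("release")),
        "target": match.group("target"),
    }


for name in (
    "cptac_coad-idc_v10-aws.s5cmd",
    "cptac_coad-idc_v10-gcs.s5cmd",
    "cptac_coad-idc_v10-dcf.dcf",
):
    print(name, parse_manifest_name(name))
```

    Here the target is aws for the Amazon Web Services manifest, gcs for the Google Cloud Storage manifest, and dcf for the Gen3 manifest.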

    Download instructions

    Each of the manifests includes instructions in its header on how to download the included files.

    To download the files using .s5cmd manifests:

    1. Install the idc-index package: pip install --upgrade idc-index
    2. Download the files referenced by the manifests included in this dataset by passing the .s5cmd manifest file: idc download manifest.s5cmd

    To download the files using the .dcf manifest, see the manifest header.

    Acknowledgments

    The Imaging Data Commons team has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under Task Order No. HHSN26110071 under Contract No. HHSN261201500003l.

    References

    [1] Fedorov, A., Longabaugh, W. J. R., Pot, D., Clunie, D. A., Pieper, S. D., Gibbs, D. L., Bridge, C., Herrmann, M. D., Homeyer, A., Lewis, R., Aerts, H. J. W., Krishnaswamy, D., Thiriveedhi, V. K., Ciausu, C., Schacherer, D. P., Bontempi, D., Pihl, T., Wagner, U., Farahani, K., Kim, E. & Kikinis, R. National Cancer Institute Imaging Data Commons: Toward Transparency, Reproducibility, and Scalability in Imaging Artificial Intelligence. RadioGraphics (2023). https://doi.org/10.1148/rg.230180
