97 datasets found

r
Genomic Data Commons Data Portal (GDC Data Portal)
rrid.site
scicrunch.org
+2more
Updated Aug 30, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2025). Genomic Data Commons Data Portal (GDC Data Portal) [Dataset]. http://identifiers.org/RRID:SCR_014514
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_014514
Dataset updated
Aug 30, 2025
Description
A unified data repository of the National Cancer Institute (NCI)'s Genomic Data Commons (GDC) that enables data sharing across cancer genomic studies in support of precision medicine. The GDC supports several cancer genome programs at the NCI Center for Cancer Genomics (CCG), including The Cancer Genome Atlas (TCGA), Therapeutically Applicable Research to Generate Effective Treatments (TARGET), and the Cancer Genome Characterization Initiative (CGCI). The GDC Data Portal provides a platform for efficiently querying and downloading high quality and complete data. The GDC also provides a GDC Data Transfer Tool and a GDC API for programmatic access.
c
The Cancer Genome Atlas Rectum Adenocarcinoma Collection
stage.cancerimagingarchive.net
cancerimagingarchive.net
dicom, n/a
Updated May 29, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
The Cancer Imaging Archive (2020). The Cancer Genome Atlas Rectum Adenocarcinoma Collection [Dataset]. http://doi.org/10.7937/K9/TCIA.2016.F7PPNPNU
Explore at:
n/a, dicomAvailable download formats
Unique identifier
https://doi.org/10.7937/K9/TCIA.2016.F7PPNPNU
Dataset updated
May 29, 2020
Dataset authored and provided by
The Cancer Imaging Archive
License
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
Time period covered
May 29, 2020
Dataset funded by
National Cancer Institutehttp://www.cancer.gov/
Description
The Cancer Genome Atlas Rectum Adenocarcinoma (TCGA-READ) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA). Clinical, genetic, and pathological data resides in the Genomic Data Commons (GDC) Data Portal while the radiological data is stored on The Cancer Imaging Archive (TCIA).
Matched TCGA patient identifiers allow researchers to explore the TCGA/TCIA databases for correlations between tissue genotype, radiological phenotype and patient outcomes. Tissues for TCGA were collected from many sites all over the world in order to reach their accrual targets, usually around 500 specimens per cancer type. For this reason the image data sets are also extremely heterogeneous in terms of scanner modalities, manufacturers and acquisition protocols. In most cases the images were acquired as part of routine care and not as part of a controlled research study or clinical trial.
CIP TCGA Radiology Initiative
Imaging Source Site (ISS) Groups are being populated and governed by participants from institutions that have provided imaging data to the archive for a given cancer type. Modeled after TCGA analysis groups, ISS groups are given the opportunity to publish a marker paper for a given cancer type per the guidelines in the table above. This opportunity will generate increased participation in building these multi-institutional data sets as they become an open community resource. Learn more about the CIP TCGA Radiology Initiative.
c
The Cancer Genome Atlas Breast Invasive Carcinoma Collection
cancerimagingarchive.net
dicom, n/a
Updated Feb 2, 2014
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
The Cancer Imaging Archive (2014). The Cancer Genome Atlas Breast Invasive Carcinoma Collection [Dataset]. http://doi.org/10.7937/K9/TCIA.2016.AB2NAZRP
Explore at:
n/a, dicomAvailable download formats
Unique identifier
https://doi.org/10.7937/K9/TCIA.2016.AB2NAZRP
Dataset updated
Feb 2, 2014
Dataset authored and provided by
The Cancer Imaging Archive
License
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
Time period covered
May 29, 2020
Dataset funded by
National Cancer Institutehttp://www.cancer.gov/
Description
The Cancer Genome Atlas Breast Invasive Carcinoma (TCGA-BRCA) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA). Clinical, genetic, and pathological data resides in the Genomic Data Commons (GDC) Data Portal while the radiological data is stored on The Cancer Imaging Archive (TCIA).
Matched TCGA patient identifiers allow researchers to explore the TCGA/TCIA databases for correlations between tissue genotype, radiological phenotype and patient outcomes. Tissues for TCGA were collected from many sites all over the world in order to reach their accrual targets, usually around 500 specimens per cancer type. For this reason the image data sets are also extremely heterogeneous in terms of scanner modalities, manufacturers and acquisition protocols. In most cases the images were acquired as part of routine care and not as part of a controlled research study or clinical trial.
CIP TCGA Radiology Initiative
Imaging Source Site (ISS) Groups are being populated and governed by participants from institutions that have provided imaging data to the archive for a given cancer type. Modeled after TCGA analysis groups, ISS groups are given the opportunity to publish a marker paper for a given cancer type per the guidelines in the table above. This opportunity will generate increased participation in building these multi-institutional data sets as they become an open community resource. Learn more about the TCGA Breast Phenotype Research Group.
c
The Cancer Genome Atlas Stomach Adenocarcinoma Collection
cancerimagingarchive.net
dicom, n/a
Updated Feb 2, 2014
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
The Cancer Imaging Archive (2014). The Cancer Genome Atlas Stomach Adenocarcinoma Collection [Dataset]. http://doi.org/10.7937/K9/TCIA.2016.GDHL9KIM
Explore at:
dicom, n/aAvailable download formats
Unique identifier
https://doi.org/10.7937/K9/TCIA.2016.GDHL9KIM
Dataset updated
Feb 2, 2014
Dataset authored and provided by
The Cancer Imaging Archive
License
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
Time period covered
May 29, 2020
Dataset funded by
National Cancer Institutehttp://www.cancer.gov/
Description
The Cancer Genome Atlas Stomach Adenocarcinoma (TCGA-STAD) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA). Clinical, genetic, and pathological data resides in the Genomic Data Commons (GDC) Data Portal while the radiological data is stored on The Cancer Imaging Archive (TCIA).
Matched TCGA patient identifiers allow researchers to explore the TCGA/TCIA databases for correlations between tissue genotype, radiological phenotype and patient outcomes. Tissues for TCGA were collected from many sites all over the world in order to reach their accrual targets, usually around 500 specimens per cancer type. For this reason the image data sets are also extremely heterogeneous in terms of scanner modalities, manufacturers and acquisition protocols. In most cases the images were acquired as part of routine care and not as part of a controlled research study or clinical trial.
CIP TCGA Radiology Initiative
Imaging Source Site (ISS) Groups are being populated and governed by participants from institutions that have provided imaging data to the archive for a given cancer type. Modeled after TCGA analysis groups, ISS groups are given the opportunity to publish a marker paper for a given cancer type per the guidelines in the table above. This opportunity will generate increased participation in building these multi-institutional data sets as they become an open community resource. Learn more about the CIP TCGA Radiology Initiative.
f
The Cancer Genome Atlas (TCGA) RNA-seq meta-analysis
figshare.com
xlsx
Updated Feb 2, 2018
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Namshik Han (2018). The Cancer Genome Atlas (TCGA) RNA-seq meta-analysis [Dataset]. http://doi.org/10.6084/m9.figshare.5851743.v1
Explore at:
xlsxAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.5851743.v1
Dataset updated
Feb 2, 2018
Dataset provided by
figshare
Authors
Namshik Han
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
TCGA RNA-seq V2 Level3 data were downloaded from TCGA Genomic Data Commons Data Portal (https://gdc-portal.nci.nih.gov), consisting of 11,303 samples in 34 cancer projects (33 cancer types). Nine cancer types that do not have corresponding non-tumour samples were filtered out, and the analysis was focused on tumour versus non-tumour comparison. 24 cancer types were used in this meta-analysis: BLCA, BRCA, CESC, CHOL, COAD, ESCA, GBM, HNSC, KICH, KIRC, KIRP, LIHC, LUAD, LUSC, PAAD, PCPG, PRAD, READ, SARC, SKCM, STAD, THCA, THYM, UCEC (https://gdc-portal.nci.nih.gov). The nine filtered cancer types were ACC, DLBC, LAML, LGG, MESO, OV, TGCT, UCS and UVM. To extract expression values from TCGA RNA-seq data, we used genomic coordinates to retrieve UCSC Transcript IDs that correspond to the identifiers in TCGA RNA-seq V2 Level3 data (isoform level). The GAF (General Annotation Format) file was used to map the coordinate to UCSC Transcript ID, and it was downloaded form https://tcga-data.nci.nih.gov/docs/GAF/GAF.hg19.June2011.bundle/outputs/TCGA.hg19.June2011.gaf. This file contains genomic annotations shared by all TCGA projects. More details of the GAF file format can be found at https://tcga-data.nci.nih.gov/docs/GAF/GAF3.0/GAF_v3_file_description.docx. We filtered out any coding exons overlapping UCSC Transcript IDs to eliminate expression value of coding genes and evaluate lncRNA expression.We could find the expression values of 443 pcRNAs and 203 tapRNAs in TCGA data, as many of non-coding regions are not yet fully annotated in the TCGA RNA-seq V2 Level3 data. The expression value of pcRNAs and tapRNAs were extracted and clustered by un-supervised Pearson correlation method (Supplementary Figure 18A). The expression values of tapRNA-associated coding genes were also extracted and used to generate the heat-map (Supplementary Figure 18B), which shows the similar pattern of expression with tapRNAs across the cancer types.To show that tapRNAs and associated coding genes have similar expression profiles in cancers we generated a Spearman's Rank-Order Correlation heatmap (Figure 6A) between tapRNAs and their associated coding genes based on the TCGA RNA-seq data. We used the MatLab function corr to calculate the Spearman's rho. This function takes two matrices X (197-by-8,850 expression profiling matrix of tapRNA) and Y (197-by-8,850 expression profiling matrix of tapRNA-assocated coding gene) and returns an 8,850-by-8,850 matrix containing the pairwise correlation coefficient between each pair of 8,850 columns (TCGA cancer samples in Supplementary Figure 18A and B). Thus, the rank-order correlation matrix that we computed from the matrices of expression profiling data (Supplementary Figure S18A and B) allowed us to compare the correlation between two column vectors i.e. cancer samples. This function also returns a matrix of p-values for testing the hypothesis of no correlation against the alternative that there is a nonzero correlation. Each element of a matrix of p-values is the p value for the corresponding element of Spearman's rho. The p-values for Spearman's rho are calculated using large-sample approximations. To check significance level of correlation between tapRNA and its associated coding gene, the diagonal of the p-value matrix was extracted and used. The median is 1.31x10-11 and the mean is 1.03x10-4 with standard deviation 0.0029.To identify cancer-specific tapRNAs, we considered not only the global expression pattern of a given tapRNA in each cancer type, but also expression pattern of specific sub-group that is significantly distinct, to take into account cancer sample heterogeneity. Thus, two conditions were applied: (1) average expression level of a tapRNA in a given cancer type is in top 10% or bottom 10% and (2) a tapRNA has at least 10% of samples in a given cancer type that are significantly up-regulated (Z-score > 2) or down-regulated (Z-score < -2).
DICOM converted Slide Microscopy images for the TCGA-SKCM collection
zenodo.org
bin
Updated Aug 20, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
David Clunie; David Clunie; William Clifford; David Pot; Ulrike Wagner; Keyvan Farahani; Erika Kim; Andrey Fedorov; Andrey Fedorov; William Clifford; David Pot; Ulrike Wagner; Keyvan Farahani; Erika Kim (2024). DICOM converted Slide Microscopy images for the TCGA-SKCM collection [Dataset]. http://doi.org/10.5281/zenodo.12690040
Explore at:
binAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.12690040
Dataset updated
Aug 20, 2024
Dataset provided by
Zenodohttp://zenodo.org/
Authors
David Clunie; David Clunie; William Clifford; David Pot; Ulrike Wagner; Keyvan Farahani; Erika Kim; Andrey Fedorov; Andrey Fedorov; William Clifford; David Pot; Ulrike Wagner; Keyvan Farahani; Erika Kim
License
Attribution 3.0 (CC BY 3.0)https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
Description
This dataset corresponds to a collection of images and/or image-derived data available from National Cancer Institute Imaging Data Commons (IDC) [1]. This dataset was converted into DICOM representation and ingested by the IDC team. You can explore and visualize the corresponding images using IDC Portal here: TCGA-SKCM. You can use the manifests included in this Zenodo record to download the content of the collection following the Download instructions below.

Collection description

Melanoma is a cancer in a type of skin cells called melanocytes. Melanocyes are the cells that produce melanin, which colors the skin. When exposed to sun, these cells make more melanin, causing the skin to darken or tan. Melanoma can occur anywhere on the body and risk factors include fair complexion, family history of melanoma, and being exposed to natural or artificial sunlight over long periods of time. Melanoma is most often discovered because it has metastasized, or spread, to another organ, such as the lymph nodes. In many cases, the primary skin melanoma site is never found. Because of this challenge, TCGA is studying primarily metastatic cases (in contrast to other cancers selected for study, where metastatic cases are excluded). For 2011, it was estimated that there were 70,230 new cases of melanoma and 8,790 deaths from the disease.

Please see the TCGA-SKCM information page to learn more about the images and to obtain any supporting metadata for this collection.

Citation guidelines can be found on the Citing TCGA in Publications and Presentations information page.

Files included

A manifest file's name indicates the IDC data release in which a version of collection data was first introduced. For example, collection_id-idc_v8-aws.s5cmd corresponds to the contents of the collection_id collection introduced in IDC data release v8. If there is a subsequent version of this Zenodo page, it will indicate when a subsequent version of the corresponding collection was introduced.

tcga_skcm-idc_v8-aws.s5cmd: manifest of files available for download from public IDC Amazon Web Services buckets

tcga_skcm-idc_v8-gcs.s5cmd: manifest of files available for download from public IDC Google Cloud Storage buckets

tcga_skcm-idc_v8-dcf.dcf: Gen3 manifest (for details see https://learn.canceridc.dev/data/organization-of-data/guids-and-uuids)

Note that manifest files that end in -aws.s5cmd reference files stored in Amazon Web Services (AWS) buckets, while -gcs.s5cmd reference files in Google Cloud Storage. The actual files are identical and are mirrored between AWS and GCP.

Download instructions

Each of the manifests include instructions in the header on how to download the included files.

To download the files using .s5cmd manifests:

install idc-index package: pip install --upgrade idc-index

download the files referenced by manifests included in this dataset by passing the .s5cmd manifest file: idc download manifest.s5cmd.

To download the files using .dcf manifest, see manifest header.

Acknowledgments

Imaging Data Commons team has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under Task Order No. HHSN26110071 under Contract No. HHSN261201500003l.

References

[1] Fedorov, A., Longabaugh, W. J. R., Pot, D., Clunie, D. A., Pieper, S. D., Gibbs, D. L., Bridge, C., Herrmann, M. D., Homeyer, A., Lewis, R., Aerts, H. J. W., Krishnaswamy, D., Thiriveedhi, V. K., Ciausu, C., Schacherer, D. P., Bontempi, D., Pihl, T., Wagner, U., Farahani, K., Kim, E. & Kikinis, R. National Cancer Institute Imaging Data Commons: Toward Transparency, Reproducibility, and Scalability in Imaging Artificial Intelligence. RadioGraphics (2023). https://doi.org/10.1148/rg.230180
c
The Cancer Genome Atlas Prostate Adenocarcinoma Collection
cancerimagingarchive.net
dicom, n/a
Updated Feb 2, 2014
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
The Cancer Imaging Archive (2014). The Cancer Genome Atlas Prostate Adenocarcinoma Collection [Dataset]. http://doi.org/10.7937/K9/TCIA.2016.YXOGLM4Y
Explore at:
dicom, n/aAvailable download formats
Unique identifier
https://doi.org/10.7937/K9/TCIA.2016.YXOGLM4Y
Dataset updated
Feb 2, 2014
Dataset authored and provided by
The Cancer Imaging Archive
License
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
Time period covered
May 29, 2020
Dataset funded by
National Cancer Institutehttp://www.cancer.gov/
Description
The Cancer Genome Atlas Prostate Adenocarcinoma (TCGA-PRAD) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA). Clinical, genetic, and pathological data resides in the Genomic Data Commons (GDC) Data Portal while the radiological data is stored on The Cancer Imaging Archive (TCIA).
Matched TCGA patient identifiers allow researchers to explore the TCGA/TCIA databases for correlations between tissue genotype, radiological phenotype and patient outcomes. Tissues for TCGA were collected from many sites all over the world in order to reach their accrual targets, usually around 500 specimens per cancer type. For this reason the image data sets are also extremely heterogeneous in terms of scanner modalities, manufacturers and acquisition protocols. In most cases the images were acquired as part of routine care and not as part of a controlled research study or clinical trial.
CIP TCGA Radiology Initiative
Imaging Source Site (ISS) Groups are being populated and governed by participants from institutions that have provided imaging data to the archive for a given cancer type. Modeled after TCGA analysis groups, ISS groups are given the opportunity to publish a marker paper for a given cancer type per the guidelines in the table above. This opportunity will generate increased participation in building these multi-institutional data sets as they become an open community resource. Learn more about the CIP TCGA Radiology Initiative.
Overview of all the TCGA cases used in this study.
plos.figshare.com
datasetcatalog.nlm.nih.gov
doc
Updated Jun 1, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Florence L. M. de Groen; Lisette M. Timmer; Renee X. Menezes; Begona Diosdado; Erik Hooijberg; Gerrit A. Meijer; Renske D. M. Steenbergen; Beatriz Carvalho (2023). Overview of all the TCGA cases used in this study. [Dataset]. http://doi.org/10.1371/journal.pone.0132495.s001
Explore at:
docAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0132495.s001
Dataset updated
Jun 1, 2023
Dataset provided by
PLOShttp://plos.org/
Authors
Florence L. M. de Groen; Lisette M. Timmer; Renee X. Menezes; Begona Diosdado; Erik Hooijberg; Gerrit A. Meijer; Renske D. M. Steenbergen; Beatriz Carvalho
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Additional information on the data of 125 cases used from the TCGA. For all cases the sample ID, type of data (platform type), platform used, data level and the corresponding file names are listed. More information on the data used can be found at the TCGA Data Portal (https://tcga-data.nci.nih.gov/tcga/tcgaDataType.jsp). (DOC)
Clinicopathological parameters in relation to NDRG2 mRNA expression of the...
plos.figshare.com
xls
Updated May 31, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Vera Kloten; Martin Schlensog; Julian Eschenbruch; Janina Gasthaus; Janina Tiedemann; Jolein Mijnes; Timon Heide; Till Braunschweig; Ruth Knüchel; Edgar Dahl (2023). Clinicopathological parameters in relation to NDRG2 mRNA expression of the TCGA data portal. [Dataset]. http://doi.org/10.1371/journal.pone.0159073.t001
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0159073.t001
Dataset updated
May 31, 2023
Dataset provided by
PLOShttp://plos.org/
Authors
Vera Kloten; Martin Schlensog; Julian Eschenbruch; Janina Gasthaus; Janina Tiedemann; Jolein Mijnes; Timon Heide; Till Braunschweig; Ruth Knüchel; Edgar Dahl
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Clinicopathological parameters in relation to NDRG2 mRNA expression of the TCGA data portal.
f
Data types supported by RTCGAToolbox.
plos.figshare.com
figshare.com
xls
Updated Jun 1, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Mehmet Kemal Samur (2023). Data types supported by RTCGAToolbox. [Dataset]. http://doi.org/10.1371/journal.pone.0106397.t002
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0106397.t002
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS ONE
Authors
Mehmet Kemal Samur
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Data types supported by RTCGAToolbox.
TCGA-LUAD
kaggle.com
opendatalab.com
zip
Updated Jul 28, 2021
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Nahin Kumar Dey (2021). TCGA-LUAD [Dataset]. https://www.kaggle.com/nahin333/tcgaluad
Explore at:
zip(10283785426 bytes)Available download formats
Dataset updated
Jul 28, 2021
Authors
Nahin Kumar Dey
Description
The Cancer Genome Atlas Lung Adenocarcinoma (TCGA-LUAD) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA). Clinical, genetic, and pathological data resides in the Genomic Data Commons (GDC) Data Portal while the radiological data is stored on The Cancer Imaging Archive (TCIA).

Matched TCGA patient identifiers allow researchers to explore the TCGA/TCIA databases for correlations between tissue genotype, radiological phenotype and patient outcomes. Tissues for TCGA were collected from many sites all over the world in order to reach their accrual targets, usually around 500 specimens per cancer type. For this reason the image data sets are also extremely heterogeneous in terms of scanner modalities, manufacturers and acquisition protocols. In most cases the images were acquired as part of routine care and not as part of a controlled research study or clinical trial.

https://wiki.cancerimagingarchive.net/display/Public/TCGA-LUAD
c
TCGA Breast Phenotype Research Group Data sets
cancerimagingarchive.net
stage.cancerimagingarchive.net
n/a, xls, zip
Updated Sep 4, 2018
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
The Cancer Imaging Archive (2018). TCGA Breast Phenotype Research Group Data sets [Dataset]. http://doi.org/10.7937/K9/TCIA.2014.8SIPIY6G
Explore at:
xls, n/a, zipAvailable download formats
Unique identifier
https://doi.org/10.7937/K9/TCIA.2014.8SIPIY6G
Dataset updated
Sep 4, 2018
Dataset authored and provided by
The Cancer Imaging Archive
License
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
Time period covered
Sep 4, 2018
Dataset funded by
National Cancer Institutehttp://www.cancer.gov/
Description
At the time of our study, 108 cases with breast MRI data were available in the The Cancer Genome Atlas Breast Invasive Carcinoma Collection (TCGA-BRCA) collection. In order to minimize variations in image quality across the multi-institutional cases we included only breast MRI studies acquired on GE 1.5 Tesla magnet strength scanners (GE Medical Systems, Milwaukee, Wisconsin, USA) scanners, yielding a total of 93 cases. We then excluded cases that had missing images in the dynamic sequence (1 patient), or at the time did not have gene expression analysis available in the TCGA Data Portal (8 patients). After these criteria, a dataset of 84 breast cancer patients resulted, with MRIs from four institutions: Memorial Sloan Kettering Cancer Center, the Mayo Clinic, the University of Pittsburgh Medical Center, and the Roswell Park Cancer Institute. The resulting cases contributed by each institution were 9 (date range 1999-2002), 5 (1999-2003), 46 (1999-2004), and 24 (1999-2002), respectively. The dataset of biopsy proven invasive breast cancers included 74 (88%) ductal, 8 (10%) lobular, and 2 (2%) mixed. Of these, 73 (87%) were ER+, 67 (80%) were PR+, and 19 (23%) were HER2+. Various types of analyses were conducted using the combined imaging, genomic, and clinical data. Those analyses are described within several manuscripts created by the group (cited below). Additional information about the methodology for how the Radiologist Annotations file can be found on the TCGA Breast Image Feature Scoring Project page.
c
The Cancer Genome Atlas Ovarian Cancer Collection
cancerimagingarchive.net
dicom, n/a
Updated May 29, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
The Cancer Imaging Archive (2020). The Cancer Genome Atlas Ovarian Cancer Collection [Dataset]. http://doi.org/10.7937/K9/TCIA.2016.NDO1MDFQ
Explore at:
n/a, dicomAvailable download formats
Unique identifier
https://doi.org/10.7937/K9/TCIA.2016.NDO1MDFQ
Dataset updated
May 29, 2020
Dataset authored and provided by
The Cancer Imaging Archive
License
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
Time period covered
May 29, 2020
Dataset funded by
National Cancer Institutehttp://www.cancer.gov/
Description
The Cancer Genome Atlas Ovarian Cancer (TCGA-OV) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA). Clinical, genetic, and pathological data resides in the Genomic Data Commons (GDC) Data Portal while the radiological data is stored on The Cancer Imaging Archive (TCIA).
Matched TCGA patient identifiers allow researchers to explore the TCGA/TCIA databases for correlations between tissue genotype, radiological phenotype and patient outcomes. Tissues for TCGA were collected from many sites all over the world in order to reach their accrual targets, usually around 500 specimens per cancer type. For this reason the image data sets are also extremely heterogeneous in terms of scanner modalities, manufacturers and acquisition protocols. In most cases the images were acquired as part of routine care and not as part of a controlled research study or clinical trial.
CIP TCGA Radiology Initiative
Imaging Source Site (ISS) Groups are being populated and governed by participants from institutions that have provided imaging data to the archive for a given cancer type. Modeled after TCGA analysis groups, ISS groups are given the opportunity to publish a marker paper for a given cancer type per the guidelines in the table above. This opportunity will generate increased participation in building these multi-institutional data sets as they become an open community resource. Learn more about the TCGA Ovarian Phenotype Research Group.
h
MLOmics
huggingface.co
Updated Apr 5, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
AI for Bio Informatics and Care (2025). MLOmics [Dataset]. https://huggingface.co/datasets/AIBIC/MLOmics
Explore at:
Dataset updated
Apr 5, 2025
Dataset authored and provided by
AI for Bio Informatics and Care
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
MLOmics: Cancer Multi-Omics Database for Machine Learning

Abstract

Framing the investigation of diverse cancers as a machine learning problem has recently shown significant potential in multi-omics analysis and cancer research. Empowering these successful machine learning models are the high-quality training datasets with sufficient data volume and adequate preprocessing. However, while there exist several public data portals including The Cancer Genome Atlas (TCGA)… See the full description on the dataset page: https://huggingface.co/datasets/AIBIC/MLOmics.
f
Current Firehose data content (Some of these data may not be accessible due...
plos.figshare.com
xls
Updated Jun 6, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Mehmet Kemal Samur (2023). Current Firehose data content (Some of these data may not be accessible due to TCGA data restrictions, full data table can be accessible via http://gdac.broadinstitute.org/runs/stddata_2014_03_16/ingested_data.html). [Dataset]. http://doi.org/10.1371/journal.pone.0106397.t001
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0106397.t001
Dataset updated
Jun 6, 2023
Dataset provided by
PLOS ONE
Authors
Mehmet Kemal Samur
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Current Firehose data content (Some of these data may not be accessible due to TCGA data restrictions, full data table can be accessible via http://gdac.broadinstitute.org/runs/stddata_2014_03_16/ingested_data.html).
R
TCGA case study for ASTERICS
entrepot.recherche.data.gouv.fr
csv +4
Updated Sep 26, 2022
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Nathalie Vialaneix; Nathalie Vialaneix (2022). TCGA case study for ASTERICS [Dataset]. http://doi.org/10.15454/YNMQUY
Explore at:
text/x-r-source(1088), csv(2148636), type/x-r-syntax(864), csv(1003176), csv(2752164), csv(1003170), csv(33405040), csv(812120), txt(8901), text/comma-separated-values(808595)Available download formats
Unique identifier
https://doi.org/10.15454/YNMQUY
Dataset updated
Sep 26, 2022
Dataset provided by
Recherche Data Gouv
Authors
Nathalie Vialaneix; Nathalie Vialaneix
License
https://entrepot.recherche.data.gouv.fr/api/datasets/:persistentId/versions/3.0/customlicense?persistentId=doi:10.15454/YNMQUYhttps://entrepot.recherche.data.gouv.fr/api/datasets/:persistentId/versions/3.0/customlicense?persistentId=doi:10.15454/YNMQUY
Time period covered
Sep 15, 2020 - Aug 18, 2021
Dataset funded by
Région Occitanie
Description
This dataset is issued from the public repository TCGA (https://portal.gdc.cancer.gov/) and contain several files, each corresponding to a given omic on the same individuals with breast cancer. Raw data have been obtained from the mixOmics case study described in http://mixomics.org/mixdiablo/case-study-tcga/ [link accessed on August 18, 2021] and were made available by the package authors at http://mixomics.org/wp-content/uploads/2016/08/TCGA.normalised.mixDIABLO.RData_.zip (R data format). Data in the zip file had been normalised for technical biases by the package authors. Data from the train and test sets were exported as TXT/CSV files and completed with miRNA expression on the smae individuals and toy datasets to handle missing value cases and alike. They serve as a basis for the illustration of the web data analysis tool ASTERICS (Project 20008788 funded by Région Occitanie).
O
TCGA-KICH
opendatalab.com
zip
Updated Apr 20, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
GDC Data Portal (2023). TCGA-KICH [Dataset]. https://opendatalab.com/OpenDataLab/TCGA-KICH
Explore at:
zipAvailable download formats
Dataset updated
Apr 20, 2023
Dataset provided by
GDC Data Portal
License
https://portal.gdc.cancer.gov/projects/TCGA-KICHhttps://portal.gdc.cancer.gov/projects/TCGA-KICH
Description
TCGA - KICH cancer CT image is a dataset related to adenoma and adenocarcinoma, which contains a total of 2325 data files from 113 people. Test results, prescriptions and treatments. This dataset is published by GDC Data Portal.
f
Metadata record for the article: A subset of lung cancer cases shows robust...
springernature.figshare.com
datasetcatalog.nlm.nih.gov
xlsx
Updated Jun 3, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Miklos Diossy; Zsofia Sztupinszki; Judit Borcsok; Marcin Krzystanek; Viktoria Tisza; Sandor Spisak; Orsolya Rusz; Jozsef Timar; István Csabai; Janos Fillinger; Judit Moldvay; Anders Gorm Pedersen; David Szuts; Zoltan Szallasi (2023). Metadata record for the article: A subset of lung cancer cases shows robust signs of homologous recombination deficiency associated genomic mutational signatures [Dataset]. http://doi.org/10.6084/m9.figshare.14452854.v1
Explore at:
xlsxAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.14452854.v1
Dataset updated
Jun 3, 2023
Dataset provided by
figshare
Authors
Miklos Diossy; Zsofia Sztupinszki; Judit Borcsok; Marcin Krzystanek; Viktoria Tisza; Sandor Spisak; Orsolya Rusz; Jozsef Timar; István Csabai; Janos Fillinger; Judit Moldvay; Anders Gorm Pedersen; David Szuts; Zoltan Szallasi
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Summary

This metadata record provides details of the data supporting the claims of the related article: “A subset of lung cancer cases shows robust signs of homologous recombination deficiency associated genomic mutational signatures”.

The related study analysed all available whole genome sequencing data from the TCGA lung adenocarcinoma (LUAD) and squamous lung cancer (LUSC) cohorts and determined which of a list of mutational signatures were present in these cases, analysing whole genome and whole exome data to estimate the frequency of potentially homologous recombination (HR) deficient lung cancer cases.

Type of data: single nucleotide variation; binary alignment maps

Subject of data: Eukaryotic cell lines; Homo sapiens

Population characteristics: lung cancer cases

Recruitment: Cancer cell lines were sourced from Cancer Cell Line Encyclopedia, Genomics of Drug Sensitivity in Cancer data portal. The exceptional responder was identified as part of a larger ongoing study to understand the determinants of treatment response to platinum based therapy.

Data access

The results shown here are in part based upon data generated by the TCGA Research Network: https://www.cancer.gov/tcga, and the LUAD and LUSC data are available at ICGC (https://dcc.icgc.org/) and GDC (https://portal.gdc.cancer.gov/) data portals. A comprehensive list of the file names underlying the figures and supplementary materials of the related article, along with direct links to the data in the above sources, is provided in the file ‘Diossy_et_al_2021_underlying_data_list.xlsx’, which is included with this data record.

Sample single nucleotide variation analysis of a stage IVA lung squamous carcinoma case with a durable (> 20 months), symptom-free survival in response to platinum-based treatment (H75T) has been deposited in the European Variation Archive under accession https://identifiers.org/ebi/bioproject:PRJEB45238.

Corresponding author(s) for this study

Zoltan Szallasi, Computational Health Informatics Program (CHIP) Boston Children’s Hospital, Harvard Medical School, 300 Longwood Ave., Boston Massachusetts, USA, 02215, e-mail: Zoltan.szallasi@childrens.harvard.edu, +1-617-355-2179.

Study approval

The Hungarian Scientific and Research Ethics Committee of the Medical Research Council, No 2285-1/2019/EUIG és 2307-3/2020/EUIG has approved the study.
tcnibmg
figshare.com
application/gzip
Updated May 30, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Samanwoy Mukhopadhyay; Saroj Mohapatra; Himanshu Tripathi (2023). tcnibmg [Dataset]. http://doi.org/10.6084/m9.figshare.8118413.v3
Explore at:
application/gzipAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.8118413.v3
Dataset updated
May 30, 2023
Dataset provided by
Figsharehttp://figshare.com/
Authors
Samanwoy Mukhopadhyay; Saroj Mohapatra; Himanshu Tripathi
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is an R data package. This contains transcriptome data from TCGA (https://portal.gdc.cancer.gov/) for 17 cancers.
Multi-omic and survival datasets used for "DeepProg: an ensemble of...
figshare.com
application/x-gzip
Updated Jun 24, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Olivier Poirion; Lana Garmire; Kumard Deep; Sijia Huang; Zheng Jing (2021). Multi-omic and survival datasets used for "DeepProg: an ensemble of deep-learning and machine-learning models for prognosis prediction using multi-omics data" [Dataset]. http://doi.org/10.6084/m9.figshare.14832813.v1
Explore at:
application/x-gzipAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.14832813.v1
Dataset updated
Jun 24, 2021
Dataset provided by
Figsharehttp://figshare.com/
Authors
Olivier Poirion; Lana Garmire; Kumard Deep; Sijia Huang; Zheng Jing
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We obtained the 32 cancer multi-omic datasets from NCBI using TCGA portal (https://tcgadata.nci.nih.gov/tcga/). We used the package TCGA-Assembler (versions 2.0.5) and wrote custom scripts to download RNA-Seq (UNC IlluminaHiSeq RNASeqV2), miRNA Sequencing (BCGSC IlluminaHiSeq, Level 3), and DNA methylation (JHU-USC HumanMethylation450) data from the TCGA website on November 4-14th, 2017. We also obtained the survival information from the portal: https://portal.gdc.cancer.gov/. We used the same preprocessing steps as detailed in our previous study. We first downloaded RNA-Seq, miRNA-Seq and methylation data using the functions DownloadRNASeqData, DownloadmiRNASeqData, and DownloadMethylationData from TCGAAssembler, respectively. Then, we processed the data with the functions ProcessRNASeqData, ProcessmiRNASeqData, and ProcessMethylation450Data. In addition, we processed the methylation data with the function CalculateSingleValueMethylationData. Finally, for each omic data type, we created a gene-by-sample data matrix in the Tabular Separated Value (TSV) format using a custom script.

Facebook

Twitter

Click to copy link

Link copied

Cite

(2025). Genomic Data Commons Data Portal (GDC Data Portal) [Dataset]. http://identifiers.org/RRID:SCR_014514

Genomic Data Commons Data Portal (GDC Data Portal)

RRID:SCR_014514, Genomic Data Commons Data Portal (GDC Data Portal) (RRID:SCR_014514), Genomic Data Commons Data Portal, GDC Data Portal

Explore at:

81 scholarly articles cite this dataset (View in Google Scholar)

Unique identifier

https://identifiers.org/RRID:SCR_014514

Dataset updated

Aug 30, 2025

Description

A unified data repository of the National Cancer Institute (NCI)'s Genomic Data Commons (GDC) that enables data sharing across cancer genomic studies in support of precision medicine. The GDC supports several cancer genome programs at the NCI Center for Cancer Genomics (CCG), including The Cancer Genome Atlas (TCGA), Therapeutically Applicable Research to Generate Effective Treatments (TARGET), and the Cancer Genome Characterization Initiative (CGCI). The GDC Data Portal provides a platform for efficiently querying and downloading high quality and complete data. The GDC also provides a GDC Data Transfer Tool and a GDC API for programmatic access.

Clear search

Close search

Google apps

Main menu

Genomic Data Commons Data Portal (GDC Data Portal)

The Cancer Genome Atlas Rectum Adenocarcinoma Collection

CIP TCGA Radiology Initiative

The Cancer Genome Atlas Breast Invasive Carcinoma Collection

CIP TCGA Radiology Initiative

The Cancer Genome Atlas Stomach Adenocarcinoma Collection

CIP TCGA Radiology Initiative

The Cancer Genome Atlas (TCGA) RNA-seq meta-analysis

DICOM converted Slide Microscopy images for the TCGA-SKCM collection

Collection description

Files included

Download instructions

Acknowledgments

References

The Cancer Genome Atlas Prostate Adenocarcinoma Collection

CIP TCGA Radiology Initiative

Overview of all the TCGA cases used in this study.

Clinicopathological parameters in relation to NDRG2 mRNA expression of the...

Data types supported by RTCGAToolbox.

TCGA-LUAD

TCGA Breast Phenotype Research Group Data sets

The Cancer Genome Atlas Ovarian Cancer Collection

CIP TCGA Radiology Initiative

MLOmics

Current Firehose data content (Some of these data may not be accessible due...

TCGA case study for ASTERICS

TCGA-KICH

Metadata record for the article: A subset of lung cancer cases shows robust...

tcnibmg

Multi-omic and survival datasets used for "DeepProg: an ensemble of...

Genomic Data Commons Data Portal (GDC Data Portal)

RRID:SCR_014514, Genomic Data Commons Data Portal (GDC Data Portal) (RRID:SCR_014514), Genomic Data Commons Data Portal, GDC Data Portal