https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
The Cancer Genome Atlas Breast Invasive Carcinoma (TCGA-BRCA) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA). Clinical, genetic, and pathological data resides in the Genomic Data Commons (GDC) Data Portal while the radiological data is stored on The Cancer Imaging Archive (TCIA).
Matched TCGA patient identifiers allow researchers to explore the TCGA/TCIA databases for correlations between tissue genotype, radiological phenotype and patient outcomes. Tissues for TCGA were collected from many sites all over the world in order to reach their accrual targets, usually around 500 specimens per cancer type. For this reason the image data sets are also extremely heterogeneous in terms of scanner modalities, manufacturers and acquisition protocols. In most cases the images were acquired as part of routine care and not as part of a controlled research study or clinical trial.
Imaging Source Site (ISS) Groups are being populated and governed by participants from institutions that have provided imaging data to the archive for a given cancer type. Modeled after TCGA analysis groups, ISS groups are given the opportunity to publish a marker paper for a given cancer type per the guidelines in the table above. This opportunity will generate increased participation in building these multi-institutional data sets as they become an open community resource. Learn more about the TCGA Breast Phenotype Research Group.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Following the same steps that we used in the previous course, we downloaded the TCGA-BRCA data using R and Bioconductor, in particular the TCGAbiolinks package. We downloaded transcriptome profiling data (gene expression quantification) where the experimental strategy is RNA-Seq and the workflow type is HTSeq-FPKM-UQ. We kept only primary solid tumor data of the Affymetrix GPL86 profile, along with the clinical data.
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
At the time of our study, 108 cases with breast MRI data were available in The Cancer Genome Atlas Breast Invasive Carcinoma (TCGA-BRCA) collection. To minimize variations in image quality across the multi-institutional cases, we included only breast MRI studies acquired on GE 1.5 Tesla scanners (GE Medical Systems, Milwaukee, Wisconsin, USA), yielding a total of 93 cases. We then excluded cases that had missing images in the dynamic sequence (1 patient) or that did not, at the time, have gene expression analysis available in the TCGA Data Portal (8 patients). Applying these criteria resulted in a dataset of 84 breast cancer patients, with MRIs from four institutions: Memorial Sloan Kettering Cancer Center, the Mayo Clinic, the University of Pittsburgh Medical Center, and the Roswell Park Cancer Institute. The resulting cases contributed by each institution were 9 (date range 1999-2002), 5 (1999-2003), 46 (1999-2004), and 24 (1999-2002), respectively. The dataset of biopsy-proven invasive breast cancers included 74 (88%) ductal, 8 (10%) lobular, and 2 (2%) mixed. Of these, 73 (87%) were ER+, 67 (80%) were PR+, and 19 (23%) were HER2+. Various types of analyses were conducted using the combined imaging, genomic, and clinical data. Those analyses are described within several manuscripts created by the group (cited below). Additional information about the methodology behind the Radiologist Annotations file can be found on the TCGA Breast Image Feature Scoring Project page.
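The cohort arithmetic in this description can be re-derived in a few lines. The following is only a sketch that restates the counts quoted above; the variable names and the `percent` helper are illustrative and not part of the original study.

```python
# Counts quoted in the dataset description above.
ge_1_5t_cases = 93            # cases left after keeping only GE 1.5 T studies
missing_dynamic_images = 1    # excluded: missing images in the dynamic sequence
no_gene_expression = 8        # excluded: no gene expression analysis at the time

final_cohort = ge_1_5t_cases - missing_dynamic_images - no_gene_expression

cases_per_site = {
    "Memorial Sloan Kettering Cancer Center": 9,
    "Mayo Clinic": 5,
    "University of Pittsburgh Medical Center": 46,
    "Roswell Park Cancer Institute": 24,
}
histology = {"ductal": 74, "lobular": 8, "mixed": 2}
receptor_positive = {"ER+": 73, "PR+": 67, "HER2+": 19}


def percent(count, total):
    """Return count/total as a whole-number percentage."""
    return round(100 * count / total)


histology_percent = {k: percent(v, final_cohort) for k, v in histology.items()}
receptor_percent = {k: percent(v, final_cohort) for k, v in receptor_positive.items()}
```

The per-site counts and the histology counts each sum to the final cohort size, and the rounded percentages agree with the values reported in the text.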
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
This dataset corresponds to a collection of images and/or image-derived data available from National Cancer Institute Imaging Data Commons (IDC) [1]. This dataset was converted into DICOM representation and ingested by the IDC team. You can explore and visualize the corresponding images using IDC Portal here: TCGA-BRCA. You can use the manifests included in this Zenodo record to download the content of the collection following the Download instructions below.
Please see the TCGA-BRCA page to learn more about the images and to obtain any supporting metadata for this collection.
A manifest file's name indicates the IDC data release in which a version of collection data was first introduced. For example, collection_id-idc_v8-aws.s5cmd corresponds to the contents of the collection_id collection introduced in IDC data release v8. If there is a subsequent version of this Zenodo page, it will indicate when a subsequent version of the corresponding collection was introduced.
tcga_brca-idc_v8-aws.s5cmd: manifest of files available for download from public IDC Amazon Web Services buckets
tcga_brca-idc_v8-gcs.s5cmd: manifest of files available for download from public IDC Google Cloud Storage buckets
tcga_brca-idc_v8-dcf.dcf: Gen3 manifest (for details see https://learn.canceridc.dev/data/organization-of-data/guids-and-uuids)
Note that manifest files that end in -aws.s5cmd reference files stored in Amazon Web Services (AWS) buckets, while -gcs.s5cmd reference files in Google Cloud Storage. The actual files are identical and are mirrored between AWS and GCP.
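The naming convention described above can be unpacked programmatically. This is a minimal sketch that assumes every manifest name follows the pattern `<collection_id>-idc_v<release>-<aws|gcs|dcf>.<s5cmd|dcf>` seen in the three file names listed here; the `ManifestName` type and `parse_manifest_name` function are hypothetical helpers, not part of any IDC tool.

```python
import re
from typing import NamedTuple


class ManifestName(NamedTuple):
    collection_id: str   # e.g. "tcga_brca"
    idc_release: int     # IDC data release in which this version first appeared
    target: str          # "aws", "gcs", or "dcf"


# Assumed pattern, inferred only from the file names listed in this record.
_MANIFEST_PATTERN = re.compile(
    r"^(?P<collection>.+)-idc_v(?P<release>\d+)-(?P<target>aws|gcs|dcf)\.(?:s5cmd|dcf)$"
)


def parse_manifest_name(filename: str) -> ManifestName:
    """Split a manifest file name into collection, release, and storage target."""
    match = _MANIFEST_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"not a recognized manifest name: {filename!r}")
    return ManifestName(match["collection"], int(match["release"]), match["target"])


aws_manifest = parse_manifest_name("tcga_brca-idc_v8-aws.s5cmd")
gen3_manifest = parse_manifest_name("tcga_brca-idc_v8-dcf.dcf")
```

A parser like this could help pick the AWS or GCS manifest automatically when several are downloaded into the same folder.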
Each of the manifests includes instructions in the header on how to download the included files.
To download the files using .s5cmd manifests:
1. Install the idc-index package: pip install --upgrade idc-index
2. Download the files by passing the .s5cmd manifest file: idc download manifest.s5cmd
To download the files using the .dcf manifest, see the manifest header.
Imaging Data Commons team has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under Task Order No. HHSN26110071 under Contract No. HHSN261201500003l.
[1] Fedorov, A., Longabaugh, W. J. R., Pot, D., Clunie, D. A., Pieper, S. D., Gibbs, D. L., Bridge, C., Herrmann, M. D., Homeyer, A., Lewis, R., Aerts, H. J. W., Krishnaswamy, D., Thiriveedhi, V. K., Ciausu, C., Schacherer, D. P., Bontempi, D., Pihl, T., Wagner, U., Farahani, K., Kim, E. & Kikinis, R. National Cancer Institute Imaging Data Commons: Toward Transparency, Reproducibility, and Scalability in Imaging Artificial Intelligence. RadioGraphics (2023). https://doi.org/10.1148/rg.230180
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The subgroup data (BPS-LumA and WPS-LumA) of the 415 TCGA luminal-A breast cancer samples.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Raw data downloaded from TCGA.
This dataset was created by RAJIB BAG_1
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Arya Z.E.
Released under CC0: Public Domain
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
File legends:
Supplemental file 1: raw data downloaded from the TCGA database for the TCGA-BRCA dataset
Supplemental file 2: merged raw data downloaded from the GEO database for the BRCA dataset
Supplemental file 3: the intersection of the anoikis-related genes from MSigDB and GeneCards
data1.R: R code for the LASSO prognostic model
GEOcombine_all.R: R code for combining GSE42568, GSE20685, and GSE102484
GSEA.GSVA.R: R code for the GSEA and GSVA analysis
m.R: R code for the grouping information of TCGA-BRCA
boxplot.R: R code for the boxplot of TCGA-BRCA
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Pre-processed TCGA BRCA data used for PIVOT analysis.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by Arya Z.E.
Released under MIT
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the submission accompanying raw result files for multiple-layer SOM (ml-SOM) analysis for the paper "Transcriptome patterns of BRCA1- and BRCA2- mutated breast and ovarian cancers".
The dataset contains the results of the ml-SOM analysis of RNA-sequencing data from TCGA-OV (ovarian cancer) and TCGA-BRCA (breast cancer) projects.
The dataset is organized as follows:
Folder "12.BC.40 - Results" - ml-SOM analysis of TCGA-BRCA (breast cancer) dataset
Folder "12.OV.40 - Results" - ml-SOM analysis of TCGA-OV (ovarian cancer) dataset
File "12.BC.40.RData" - R data file that contains ml-SOM environment for breast cancer
File "12.OV.40.RData" - R data file that contains ml-SOM environment for ovarian cancer
For detailed instructions on browsing the results and their interpretation please refer to the oposSOM package manual [1], as well as original publications [2-4].
References
[1] Henry Loeffler-Wirth, Hoang Thanh Le and Martin Kalcher. oposSOM: Comprehensive analysis of transcriptome data. DOI: 10.18129/B9.bioc.oposSOM
[2] Löffler-Wirth H, Kalcher M, Binder H. oposSOM: R-package for high-dimensional portraying of genome-wide expression landscapes on Bioconductor. Bioinformatics. 2015 Oct 1;31(19):3225-7. DOI: 10.1093/bioinformatics/btv342. Epub 2015 Jun 10.
[3] Wirth H, von Bergen M, Binder H. Mining SOM expression portraits: feature selection and integrating concepts of molecular function. BioData Min. 2012 Oct 8;5(1):18. DOI: 10.1186/1756-0381-5-18.
[4] Wirth H, Löffler M, von Bergen M, Binder H. Expression cartography of human tissues using self-organizing maps. BMC Bioinformatics. 2011 Jul 27;12:306. DOI: 10.1186/1471-2105-12-306.
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
The current standard of care for many patients with HER2-positive breast cancer is neoadjuvant chemotherapy in combination with anti-HER2 agents, based on HER2 amplification as detected by in situ hybridization (ISH) or protein immunohistochemistry (IHC). However, hematoxylin & eosin (H&E) tumor stains are more commonly available, and accurate prediction of HER2 status and anti-HER2 treatment response from H&E would reduce costs and increase the speed of treatment selection. Computational algorithms for H&E have been effective in predicting a variety of cancer features and clinical outcomes, including moderate success in predicting HER2 status. We trained a CNN classifier on 188 H&E whole slide images (WSIs) manually annotated for tumor regions of interest (ROIs) by our pathology team. Our classifier achieved an area under the curve (AUC) of 0.90 in cross-validation of slide-level HER2 status and 0.81 on an independent TCGA test set. Moreover, we trained our classifier on pre-treatment samples from 187 HER2+ patients that subsequently received trastuzumab therapy. Our classifier achieved an AUC of 0.80 in a five-fold cross validation. Our work provides an H&E-based algorithm that can predict HER2 status and trastuzumab response in breast cancer at an accuracy that may benefit clinical evaluations. Here, we are providing the datasets used in the study to facilitate development of other HER2+ diagnosis and trastuzumab response applications.
Digital slides were annotated by circling areas of invasive carcinoma (Regions of Interest, ROIs). The manual annotation of ROIs significantly enhances the prediction accuracy and reduces the need for extensively large datasets. Regions of necrosis, in situ carcinoma, or benign stroma and epithelium were excluded. The images were annotated with ROIs associated with HER2+/- tumor area (TA) by a senior breast pathologist. The annotations marked tumor boundaries and were drawn using Aperio ImageScope software. The annotations were exported from the Aperio software in Extensible Markup Language (XML) format, including X and Y coordinates corresponding to the annotated regions. We used these coordinates for each slide image to tile these regions separately from the rest of the image, labeled as HER2+ or HER2- class.
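To illustrate how exported ROI coordinates can drive tiling, here is a minimal sketch. It assumes the XML follows the usual Aperio ImageScope layout, with `Region` elements containing `Vertex` elements that carry `X` and `Y` attributes. The sample XML, the 256-pixel tile size, and the bounding-box tiling rule are all illustrative assumptions. The authors' actual tiling procedure is not described in enough detail to reproduce here.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation with one square ROI (coordinates in slide pixels).
SAMPLE_XML = """<Annotations>
  <Annotation Id="1">
    <Regions>
      <Region Id="1">
        <Vertices>
          <Vertex X="100" Y="200"/>
          <Vertex X="612" Y="200"/>
          <Vertex X="612" Y="712"/>
          <Vertex X="100" Y="712"/>
        </Vertices>
      </Region>
    </Regions>
  </Annotation>
</Annotations>"""


def read_regions(xml_text):
    """Return one list of (x, y) vertices per annotated Region."""
    root = ET.fromstring(xml_text)
    regions = []
    for region in root.iter("Region"):
        vertices = [(float(v.get("X")), float(v.get("Y"))) for v in region.iter("Vertex")]
        if vertices:
            regions.append(vertices)
    return regions


def bounding_box(vertices):
    """Axis-aligned bounding box (x_min, y_min, x_max, y_max) of a region."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return min(xs), min(ys), max(xs), max(ys)


def tile_origins(vertices, tile_size=256):
    """Top-left corners of tiles that fit fully inside the region's bounding box.

    Simplification: this tiles the bounding box, not the exact polygon.
    """
    x_min, y_min, x_max, y_max = bounding_box(vertices)
    origins = []
    y = int(y_min)
    while y + tile_size <= y_max:
        x = int(x_min)
        while x + tile_size <= x_max:
            origins.append((x, y))
            x += tile_size
        y += tile_size
    return origins


regions = read_regions(SAMPLE_XML)
origins = tile_origins(regions[0])
```

Each origin could then be passed to a whole-slide image reader to extract a tile and label it with the slide's HER2 class. A stricter version would also test each tile against the polygon itself.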
This dataset presents 192 cases of HER2 positive and negative invasive breast carcinoma H&E slides from the Yale Pathology electronic database. All tissues and data were retrieved under permission from the Yale Human Investigation Committee protocol #9505008219 to DLR. HER2 positive cases were defined as those with a 3+ score by immunohistochemistry (IHC) or an equivocal (2+) IHC score with subsequent amplification by fluorescence in situ hybridization (FISH), as defined by the American Society of Clinical Oncology/College of American Pathologists (ASCO/CAP) clinical practice guidelines. H&E slides generated at Yale School of Medicine include 93 HER2+ and 99 HER2- slides. The slides were scanned at Yale Pathology Tissue Services and underwent a slide quality check before they went into the scanner. The tissue samples were scanned on a Vectra Polaris scanner (Perkin-Elmer) using bright-field whole-slide scanning at 20× magnification at Brady Memorial Laboratory, Rimm’s lab.
The 85 response cohort cases were also identified by a retrospective search of the Yale Pathology electronic database. Cases included those patients with a pre-treatment breast core biopsy with HER2 positive invasive breast carcinoma who then received neoadjuvant targeted therapy with trastuzumab +/- pertuzumab prior to definitive surgery. HER2 positivity was defined as previously described for the HER2 negative/positive cohort. The response to targeted therapy was obtained from the pathology reports of the surgical resection specimens and dichotomized into responders or non-responders. Those with a complete pathologic response, defined as no residual invasive carcinoma, lymphovascular invasion, or metastatic carcinoma, were designated as responders (n=36). Cases with only residual in situ carcinoma were included in the responder category. Those cases with any amount of residual invasive carcinoma, lymphovascular invasion, or metastatic carcinoma were categorized as non-responders (n=49).
A total of 668 TCGA-BRCA HER2+/- samples with available HER2 status were downloaded from the GDC portal (see "Additional Resources for this Dataset" below). Slides were visually inspected by our pathology team to exclude low-quality samples with tissue folding or those that appeared to be from frozen tissue. A total of 182 samples (90 HER2- and 92 HER2+) were retained for use as an independent test set. Information about which specific samples were retained can be found in the TCGA_BRCA Filtered folder of the dataset.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This is the SNP data downloaded from the Xena public database.
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
The Data Integration & Imaging Informatics (DI-Cubed) project explored the issue of lack of standardized data capture at the point of data creation, as reflected in the non-image data accompanying 4 TCIA breast cancer collections (Multi-center breast DCE-MRI data and segmentations from patients in the I-SPY 1/ACRIN 6657 trials (ISPY1), BREAST-DIAGNOSIS, Single site breast DCE-MRI data and segmentations from patients undergoing neoadjuvant chemotherapy (Breast-MRI-NACT-Pilot), The Cancer Genome Atlas Breast Invasive Carcinoma Collection (TCGA-BRCA)) and the Ivy Glioblastoma Atlas Project (IvyGAP) brain cancer collection. The work addressed the desire for semantic interoperability between various NCI initiatives by aligning on common clinical metadata elements and supporting use cases that connect clinical, imaging, and genomics data. Accordingly, clinical and measurement data imported into I2B2 were cross-mapped to industry standard concepts for names and values including those derived from BRIDG, CDISC SDTM, DICOM Structured Reporting models and using NCI Thesaurus, SNOMED CT and LOINC controlled terminology. A subset of the standardized data was then exported from I2B2 in SDTM compliant SAS transport files. The SDTM data was derived from data taken from both the curated TCIA spreadsheets as well as tumor measurements and dates from the TCIA Restful API. Due to the nature of the available data not all SDTM conformance rules were applicable or adhered to. These Study Data Tabulation Model format (SDTM) datasets were validated using Pinnacle 21 CDISC validation software. The validation software reviews datasets according to their degree of conformance to rules developed for the purposes of FDA submissions of electronic data. Iterative refinements were made to the datasets based upon group discussions and feedback from the validation tool. Export datasets for the following SDTM domains were generated:
Attribution 2.0 (CC BY 2.0) https://creativecommons.org/licenses/by/2.0/
License information was derived automatically
F. The Cancer Genome Atlas Breast Invasive Carcinoma (TCGA-BRCA) RNA-seq dataset; Kaplan-Meier curves of overall survival (OS) in ER- breast cancer. Red: high ESRP1 expression; black: low ESRP1 expression. A log-rank test was used to calculate P=0.19 (n=100, number of events=17). List of tagged entities: ESRP1 (ncbigene:54845), human (taxonomy:9606), gene expression assay (bao:BAO_0002785), RNA sequencing (obi:OBI_0001177), survival curve (obi:OBI_0000889), BRCA, Breast Invasive Carcinoma, ER- breast cancer
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background
Patients with systemic lupus erythematosus (SLE) have a lower risk of breast cancer (BRCA) than the general population. In this study, we explored the underlying molecular mechanism that is dysregulated in both diseases.
Methods
Weighted gene coexpression network analysis (WGCNA) was executed with the SLE and BRCA datasets from the Gene Expression Omnibus (GEO) website and identified the potential role of membrane metalloendopeptidase (MME) in both diseases. Then, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses of related proteins and miRNAs were performed to investigate the potential molecular pathways.
Results
WGCNA revealed that MME was positively related to SLE but negatively related to BRCA. In BRCA, MME expression was significantly decreased in tumor tissues, especially in luminal B and infiltrating ductal carcinoma subtypes. Receiver operating characteristic (ROC) analysis identified MME as a valuable diagnostic biomarker of BRCA, with an area under the curve (AUC) value equal to 0.984 (95% confidence interval = 0.976–0.992). KEGG enrichment analysis suggested that MME-related proteins and targeted miRNAs may reduce the incidence of BRCA in SLE patients via the PI3K/AKT/FOXO signaling pathway. Low MME expression was associated with favorable relapse-free survival (RFS) but no other clinical outcomes and may contribute to resistance to chemotherapy in BRCA, with an AUC equal to 0.527 (P value < 0.05).
Conclusions
In summary, MME expression was significantly decreased in BRCA but positively correlated with SLE, and it might reduce the incidence of BRCA in SLE patients via the PI3K/AKT/FOXO signaling pathway.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Summary
This metadata record provides details of the data supporting the claims of the related article: “Signaling of MK2 Sustains Robust AP1 Activity for Triple Negative Breast Cancer Tumorigenesis through Direct Phosphorylation of JAB1”.
The related study showed that p38MAPK signalling pathway regulation of activator protein 1 (AP1) activity involves both MAPKAPK2 (MK2) and JAB1, a known JUN binding protein.
Type of data: signalling pathway activity
Subject of data: antibodies; Eukaryotic cell lines (ATCC); Mus musculus (Foxn1nu, female, Catalog# 007850, The Jackson laboratory)
Data access
The following files underlying the figures in the related manuscript are openly available with this data record:
Fig.1 data.xlsx (Fig.1a,1b of the related article)
Fig.2 data.xlsx (Fig.2c,2d and 2e)
Fig.3 data.xlsx (Fig.3e and 3f)
Fig.4 data.xlsx (Fig.4f,4g and 4f)
Fig.5 data.xlsx (Fig.5b,5c and 5d)
Fig.6 data.xlsx
Fig.7 data.xlsx (Fig. 7e and 7f)
Fig.8 data.xlsx (Fig.8a and 8c)
All other data supporting the related study can be found in the supplementary information file of the related article, and the corresponding author can make any materials available upon request. Un-cropped gels and western blots for Fig. to Fig.5 were included in Supplementary Materials (Fig.S11).
JAB1 expression data for different breast cancer subtypes were downloaded from https://tcga.xenahubs.net/download/TCGA.BRCA.sampleMap/HiSeqV2.gz and https://tcga.xenahubs.net/download/TCGA.BRCA.sampleMap/BRCA_clinicalMatrix.gz. For analysis of p38MAPK activity in breast cancer, Reverse Phase Protein Array (RPPA) z-scores and corresponding clinical data from TCGA Breast Cancer Invasive Carcinoma, PanCancer Atlas were first downloaded through cBioPortal (https://www.cbioportal.org/).
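As a rough illustration of pulling one gene's values out of such a gzipped expression matrix, the sketch below assumes a tab-separated layout with gene identifiers in the first column and one column per sample. It also assumes the gene is listed under COPS5, the official symbol for JAB1. The demo matrix, its sample IDs, and the `read_gene_row` helper are hypothetical, and nothing is downloaded.

```python
import csv
import gzip
import io


def read_gene_row(gz_bytes, gene):
    """Return {sample_id: value} for one gene from a gzipped, tab-separated matrix.

    Assumes the first column holds gene identifiers, the header row holds
    sample IDs, and all values in the requested row are numeric.
    """
    with gzip.open(io.BytesIO(gz_bytes), mode="rt", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)
        samples = header[1:]
        for row in reader:
            if row and row[0] == gene:
                return {sample: float(value) for sample, value in zip(samples, row[1:])}
    raise KeyError(gene)


# Tiny made-up matrix standing in for the real download.
DEMO_MATRIX = (
    "sample\tTCGA-AA-0001-01\tTCGA-AA-0002-01\n"
    "COPS5\t10.5\t11.25\n"
    "ESR1\t3.0\t12.0\n"
)
DEMO_GZ = gzip.compress(DEMO_MATRIX.encode("utf-8"))

jab1_expression = read_gene_row(DEMO_GZ, "COPS5")
```

The resulting per-sample dictionary could then be joined to the clinical matrix by sample ID to compare expression across subtypes.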
Corresponding author(s) for this study
Shuang Huang, Department of Anatomy and Cell Biology, University of Florida College of Medicine, Gainesville, FL 32610. E-mail: shuanghuang@ufl.edu
Study approval
University of Florida Institutional Animal Care and Use Committee.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset corresponds to a collection of images and/or image-derived data available from National Cancer Institute Imaging Data Commons (IDC) [1]. This dataset was converted into DICOM representation and ingested by the IDC team. You can explore and visualize the corresponding images using IDC Portal here: Pan-Cancer-Nuclei-Seg-DICOM. You can use the manifests included in this Zenodo record to download the content of the collection following the Download instructions below.
A manifest file's name indicates the IDC data release in which a version of collection data was first introduced. For example, pan_cancer_nuclei_seg_dicom-collection_id-idc_v19-aws.s5cmd corresponds to the annotations for the images in the collection_id collection introduced in IDC data release v19. If there is a subsequent version of this Zenodo page, it will indicate when a subsequent version of the corresponding collection was introduced.
For each of the collections, the following manifest files are provided:
pan_cancer_nuclei_seg_dicom-: manifest of files available for download from public IDC Amazon Web Services buckets
pan_cancer_nuclei_seg_dicom-: manifest of files available for download from public IDC Google Cloud Storage buckets
pan_cancer_nuclei_seg_dicom-: Gen3 manifest (for details see https://learn.canceridc.dev/data/organization-of-data/guids-and-uuids)
Note that manifest files that end in -aws.s5cmd reference files stored in Amazon Web Services (AWS) buckets, while -gcs.s5cmd reference files in Google Cloud Storage. The actual files are identical and are mirrored between AWS and GCP.
Each of the manifests includes instructions in the header on how to download the included files.
To download the files using .s5cmd manifests:
1. Install the idc-index package: pip install --upgrade idc-index
2. Download the files by passing the .s5cmd manifest file: idc download manifest.s5cmd
To download the files using the .dcf manifest, see the manifest header.
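For scripted use, the `idc download` step quoted above can be wrapped in a small helper. This is only a sketch: `build_download_command` and `download_manifest` are hypothetical names, and the wrapper assumes the `idc` command installed by `pip install --upgrade idc-index` is on the PATH.

```python
import shutil
import subprocess
from pathlib import Path


def build_download_command(manifest_path):
    """Build the `idc download <manifest>` command for an .s5cmd manifest."""
    path = Path(manifest_path)
    if path.suffix != ".s5cmd":
        raise ValueError(f"expected an .s5cmd manifest, got {path.name!r}")
    return ["idc", "download", str(path)]


def download_manifest(manifest_path):
    """Run the download; raises if the `idc` command is not installed."""
    command = build_download_command(manifest_path)
    if shutil.which(command[0]) is None:
        raise RuntimeError("idc not found; try: pip install --upgrade idc-index")
    subprocess.run(command, check=True)


command = build_download_command("manifest.s5cmd")
```

Calling `download_manifest("manifest.s5cmd")` would then start the transfer. The .dcf manifests are deliberately rejected here because their download procedure is described only in the manifest header.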
Imaging Data Commons team has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under Task Order No. HHSN26110071 under Contract No. HHSN261201500003l.