Dataset Card for "metabric"
Metabric dataset from the pycox package.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The 47 underlined genes are also among the top 100 genes of the mitotic CIN attractor (Table 2).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Analysis of ‘Breast Cancer (METABRIC)’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/gunesevitan/breast-cancer-metabric on 12 November 2021.
--- Original source retains full ownership of the source dataset ---
This dataset was created by Damilare Akin-Oladejo
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
The list of top 5,000 genes with the highest variability in the METABRIC dataset.
Survival analysis (disease-specific survival) in the METABRIC dataset (univariate and multivariate analysis); whole population and ER-negative population.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Data utilised in "Survival outcomes are associated with genomic instability in luminal breast cancers".
https://ega-archive.org/dacs/EGAC00001000484
Metabric breast cancer samples (Expression raw data)
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Refined subtype labels and intrinsic probes. The refined breast cancer subtype labels defined for each sample in the METABRIC dataset are listed in Table S1. Table S2 shows the annotated probes selected in the CM1 list and the average occurrence of each probe. (XLSX 58 kb)
Description from EGA:
"Metabric breast cancer samples (Genotype raw data)"
Part of the METABRIC data access committee study, accession number EGAC00001000484, as well as the "METABRIC" study, accession number EGAS00000000098
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
This dataset was created by Wassila Rezig
Released under Apache 2.0
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Proportion of mutations in the TCGA and METABRIC databases.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
This dataset consists of the processed multi-omics data and clinical labels used for training and evaluating Moanna (https://github.com/rlupat/moanna). The data were processed from raw files downloaded from cBioPortal.
Attribution 2.0 (CC BY 2.0) https://creativecommons.org/licenses/by/2.0/
A. Primary human breast cancers of the METABRIC dataset were stratified according to high or low YAP activity signature [47] and by TP53 mutational status, and then the levels of the cyclin signature score were determined in the four groups. Cyclin activity is significantly higher in mut-p53 tumors with high levels of the YAP signature, as visualized by box-plot. Signature scores were obtained by summarizing the standardized expression levels of signature genes into a combined score with zero mean [7]. The values shown in the graphs are thus dimensionless. The bottom and top of the box are the first and third quartiles, and the band inside the box is the median; whiskers represent the 1st and 99th percentiles; values below and above these are shown as circles (p<0.0001, n=701). List of tagged entities: TP53 (ncbigene:7157), YAP1 (ncbigene:10413), computational analysis
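The caption describes the signature score only in outline, so the following is a minimal sketch under stated assumptions rather than the method of reference [7]: each signature gene is standardized across samples (z-score), and the per-sample score is assumed to be the plain average of those z-scores. The gene names and values are made up for illustration.

```python
from statistics import mean, pstdev

def signature_scores(expression):
    """Return one score per sample from a dict of gene -> expression values.

    Assumption: the combined score is the mean of per-gene z-scores.
    The exact summarization used in the cited work may differ.
    """
    z_by_gene = []
    for values in expression.values():
        mu = mean(values)
        sd = pstdev(values)
        if sd == 0:
            continue  # a constant gene cannot be standardized; skip it
        z_by_gene.append([(v - mu) / sd for v in values])
    if not z_by_gene:
        raise ValueError("no gene with non-zero variance")
    n_samples = len(z_by_gene[0])
    return [mean(z[i] for z in z_by_gene) for i in range(n_samples)]

# Hypothetical toy data: three genes measured in four samples.
toy_expression = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [10.0, 10.0, 30.0, 30.0],
    "GENE_C": [5.0, 7.0, 6.0, 8.0],
}
scores = signature_scores(toy_expression)
```

Because every standardized gene has zero mean across samples, the averaged score also has zero mean, which matches the "combined score with zero mean" wording and explains why the plotted values carry no unit.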
Introduction: HER2-positive breast cancer (BC) is a heterogeneous group of aggressive breast cancers, the prognosis of which has greatly improved since the introduction of treatments targeting HER2. However, these tumors may display intrinsic or acquired resistance to treatment, and classifiers of HER2-positive tumors are required to improve the prediction of prognosis and to develop novel therapeutic interventions.
Methods: We analyzed 2893 primary human breast cancer samples from 21 publicly available datasets and developed a six-metagene signature on a training set of 448 HER2-positive BC. We then used external public datasets to assess the ability of these metagenes to predict the response to chemotherapy (Ignatiadis dataset), and prognosis (METABRIC dataset).
Results: We identified a six-metagene signature (138 genes) containing metagenes enriched in different gene ontologies. The gene clusters were named as follows: Immunity, Tumor suppressors/proliferation, Interferon, Signal transduction, Hormone/survival and Matrix clusters. In all datasets, the Immunity metagene was less strongly expressed in ER-positive than in ER-negative tumors, and was inversely correlated with the Hormonal/survival metagene. Within the signature, multivariate analyses showed that strong expression of the “Immunity” metagene was associated with higher pCR rates after NAC (OR = 3.71 [1.28–11.91], p = 0.019) than weak expression, and with a better prognosis in HER2-positive/ER-negative breast cancers (HR = 0.58 [0.36–0.94], p = 0.026). Immunity metagene expression was associated with the presence of tumor-infiltrating lymphocytes (TILs).
Conclusion: The identification of a predictive and prognostic immune module in HER2-positive BC confirms the need for clinical testing for immune checkpoint modulators and vaccines for this specific subtype. The inverse correlation between Immunity and hormone pathways opens research perspectives and deserves further investigation.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Data and code are provided in one directory. This annotation document is divided into notes for those who wish to reuse data, and those who wish to rerun analysis code.
Data for reuse:
The data comprise three types:
All full stacks and masks are tiff images. Each IMC acquisition (image) is associated with six images in total: the full stack image itself and five image masks (whole cell, nucleus, cytoplasm, tumour and vessel). The naming convention for these images is MB####_###_ImageType.tiff, where:
Notes:
The order of image layers in full stack images corresponds to the markerStackOrder.csv file, which identifies each image layer with its corresponding isotope and epitope.
Masks are grayscale images where each discrete region is identified by a set of contiguous pixels associated with a single integer value. The values tend to increase sequentially from the top to the bottom of the image (this is why a mask appears as a gradation of gray and white when opened in an image viewer). The ‘ObjectNumber’ column of the processed single cell data corresponds to whole cell masks: the integer value of each cell in the mask maps to its ‘ObjectNumber’, allowing marker values and other features to be mapped to images.
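The mapping between mask integers and ‘ObjectNumber’ can be sketched as follows. This is an illustrative example on a tiny hand-made array, not code from the repository: it assumes that 0 marks background pixels (the notes above do not state this), and the per-cell values and the helper name `paint_values` are hypothetical.

```python
import numpy as np

# Toy whole-cell mask. Assumption: 0 is background, positive integers label cells.
mask = np.array([
    [0, 1, 1, 0],
    [0, 1, 2, 2],
    [3, 3, 2, 2],
])

# Hypothetical per-cell marker values keyed by ObjectNumber,
# as they might be read from SingleCells.csv for one image.
marker_by_object = {1: 0.8, 2: 0.1, 3: 0.5}

def paint_values(mask, value_by_object, background=np.nan):
    """Return an image where each cell's pixels hold that cell's value."""
    out = np.full(mask.shape, background, dtype=float)
    for object_number, value in value_by_object.items():
        out[mask == object_number] = value
    return out

painted = paint_values(mask, marker_by_object)

# Pixel area of each cell, counted directly from the mask labels.
labels, counts = np.unique(mask[mask > 0], return_counts=True)
cell_areas = {int(label): int(count) for label, count in zip(labels, counts)}
```

With real data the mask would be read from the corresponding whole cell mask tiff, and the values would be restricted to the rows of SingleCells.csv that belong to the same image.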
Two processed data files:
SingleCells.csv where each row represents a cell, and columns are data associated with each cell. Each observation is uniquely identified by the combination of ImageNumber and ObjectNumber. These data have already been spillover corrected.
CellNeighbours.csv where each row represents a cell-cell interaction. The data are in graph format, with columns labelled ‘from’ and ‘to’ meaning from an index cell to a neighbouring cell (despite this convention, the data are undirected); the integers within these columns map to ObjectNumber in SingleCells.csv.
Note: The convention for `is_` variables in processed data files is that 0 is FALSE and 1 TRUE.
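Since the ‘from’ and ‘to’ columns describe an undirected graph, a reader will usually want a symmetric adjacency structure. The sketch below builds one from a small inline table. It assumes that CellNeighbours.csv also carries an ImageNumber column so that object numbers from different images are not confused; that column name is an assumption here, so check CellNeighboursAnnotation.xlsx for the actual layout.

```python
import csv
import io
from collections import defaultdict

# Hypothetical excerpt in the shape of CellNeighbours.csv.
toy_csv = """ImageNumber,from,to
1,1,2
1,2,3
2,1,2
"""

def build_adjacency(rows):
    """Map each (ImageNumber, ObjectNumber) cell to the set of its neighbours."""
    adjacency = defaultdict(set)
    for row in rows:
        image = int(row["ImageNumber"])
        a = (image, int(row["from"]))
        b = (image, int(row["to"]))
        # Store both directions because the interactions are undirected.
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency

adjacency = build_adjacency(csv.DictReader(io.StringIO(toy_csv)))
```

Keying cells by the (ImageNumber, ObjectNumber) pair mirrors how SingleCells.csv identifies observations, so the neighbour sets can be joined back to per-cell features.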
Two column annotation files:
Two corresponding annotation files that contain details on the content of each column in the processed tabular files are also provided: SingleCellsAnnotation.xlsx and CellNeighboursAnnotation.xlsx.
Other files are annotation and processed data files required by the code in the Code directory; they can be ignored unless you plan to rerun analyses.
Code and reproducibility
Analysis code and corresponding processed data are also provided in the directory. The code was run within a conda environment, details of which are provided in the file CondaEnv.yml. Processed metadata from the METABRIC study are among the files provided. It is, however, recommended that additional analyses relying on METABRIC metadata use data downloaded from the original publications or a public repository, as these data are subject to updates and the user may wish to process them differently.
Code is separated by figures. The code must be run in the order figures appear in the paper, as later code relies on derived files created earlier. The code must also be run within the directory as relative paths rely on its structure.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Note: METABRIC D and METABRIC V are the discovery and validation datasets from the METABRIC study [91]; the other datasets, identified by GSE ID, are available from the NCBI GEO database. There are eight published signatures in the study: BRsig70 [3], BRsig76 [2], ONCO (Oncotype DX) [4], [5], TAMR13 [6], PAM50 [9], Genius [7], PIK3 (PIK3CAGS278) [10], and GGI [8].
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Clinicopathological parameters of MMP-9 from our data and METABRIC data.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Supplementary Table S1: CDK12 expression in relation to clinicopathological parameters for the unselected TMA series.
Supplementary Table S2: CDK12 expression in relation to clinicopathological parameters for the HER2-positive Herceptin treated series.
Supplementary Table S3: CDK12 expression in relation to clinicopathological parameters for the METABRIC TMA series.
Supplementary Table S4: Univariate and multivariate analysis of CDK12 in the TMA cohorts.
Supplementary Table S5: CDK12 mutations in breast cancer. Taken from cBioPortal (42,43).
Supplementary Table S6: Correlations of CDK12 mutations, methylation, gene expression and ERBB2 copy number in primary breast cancers from TCGA.
Supplementary Table S7: Correlations of CDK12 mutations and gene expression of DNA repair genes in primary tumors from METABRIC. P values from heteroscedastic, 2-tailed t-test.
Supplementary Table S8: Correlations of CDK12 protein expression and miRNA expression in primary tumors from METABRIC. Wilcoxon rank P values are corrected for multiple testing.
Supplementary Table S9: Correlations of CDK12 protein expression and gene expression of DNA repair genes in primary tumors from METABRIC. Limma analysis corrected for multiple testing.
Supplementary Table S10: Association of CDK12 absent and intermediate (0, 2-6) versus high (7-8) expression with DNA repair proteins in unselected and TNBC. P values from Fisher's exact test.