COSMIC, the Catalogue Of Somatic Mutations In Cancer, is the world's largest and most comprehensive resource for exploring the impact of somatic mutations in human cancer.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Lists of genes to be included in the tensor probability model. The Cancer Gene Census list is taken from COSMIC.
Cancer is sometimes depicted as a reversion to single-cell behavior in cells adapted to live in a multicellular assembly. If this is the case, one would expect mutations in cancer to disrupt functional mechanisms that suppress cell-level traits detrimental to multicellularity. Such mechanisms should have evolved with or after the emergence of multicellularity. This leads to two related but distinct hypotheses: 1) somatic mutations in cancer will occur in genes that are younger than the emergence of multicellularity (1000 million years [MY]); and 2) genes that are frequently mutated in cancer, and whose mutations are functionally important for the emergence of the cancer phenotype, evolved within the past 1000 MY and thus would exhibit an age distribution skewed toward younger genes. To investigate these hypotheses, we estimated the evolutionary ages of all human genes and then studied their probability of mutation and their biological function in relation to their age and genomic location, in both normal germline and cancer contexts. We observed that, under a model of uniform random mutation across the genome controlled for gene size, genes younger than 500 MY were more frequently mutated in both contexts. Paradoxically, causal genes, as defined in the COSMIC Cancer Gene Census, were depleted in this age group. When we used functional enrichment analysis to explain this unexpected result, we discovered that COSMIC genes with recessive disease phenotypes were enriched for DNA repair and cell cycle control. The non-mutated genes in these pathways are orthologous to those underlying stress-induced mutation in bacteria, which results in the clustering of single nucleotide variations. COSMIC genes were less common in regions where the probability of observing mutational clusters is high, although they were approximately 2-fold more likely to harbor mutational clusters than other human genes.
Our results suggest this ancient mutational response to stress that evolved among prokaryotes was co-opted to maintain diversity in the germline and immune system, while the original phenotype is restored in cancer. Reversion to a stress-induced mutational response is a hallmark of cancer that allows for effectively searching “protected” genome space where genes causally implicated in cancer are located and underlies the high adaptive potential and concomitant therapeutic resistance that is characteristic of cancer.
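A minimal sketch of the kind of observed-versus-expected comparison described above, assuming mutations fall uniformly at random across the combined length of the genes considered, so that a gene's expected mutation count is proportional to its length. The gene records, field names (`length`, `age_my`, `observed`) and toy counts are hypothetical; only the 500 MY threshold comes from the text, and the study's genome-wide model is not reproduced here.

```python
from collections import defaultdict

def expected_counts(genes, total_mutations):
    """Expected mutations per gene if mutations fall uniformly at random
    across the combined length of all genes (i.e. proportional to gene size)."""
    total_length = sum(g["length"] for g in genes)
    return {g["name"]: total_mutations * g["length"] / total_length for g in genes}

def obs_exp_by_age(genes, age_threshold_my=500):
    """Observed/expected mutation ratio for genes younger vs. older than the threshold."""
    total_obs = sum(g["observed"] for g in genes)
    expected = expected_counts(genes, total_obs)
    obs = defaultdict(float)
    exp = defaultdict(float)
    for g in genes:
        group = "younger" if g["age_my"] < age_threshold_my else "older"
        obs[group] += g["observed"]
        exp[group] += expected[g["name"]]
    return {group: obs[group] / exp[group] for group in obs}

# Hypothetical toy data, not taken from the study.
genes = [
    {"name": "G1", "length": 1000, "age_my": 300, "observed": 30},
    {"name": "G2", "length": 1000, "age_my": 1500, "observed": 10},
    {"name": "G3", "length": 2000, "age_my": 2000, "observed": 20},
]
print(obs_exp_by_age(genes))
```

A ratio above 1 means that an age group carries more mutations than its share of total gene length would predict.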
Mutational call table for genes included in the COSMIC Cancer Gene Census, the FoundationOne® gene list, the Memorial Sloan Kettering IMPACT platform, and the list of reported breast cancer (BC) driver genes. "SNV.table" reports the mutations in these genes for each sample. "SNV.LOGICAL" reports filtered mutations (i.e., high- and moderate-impact plus missense mutations) for each sample (logical: 0, wild-type; 1, mutated). "SNV.CATEGORICAL.GAIN.LOSS" reports pairwise mutational changes between recurrent and primary tumors (categorical: -1, loss; 0, no change; +1, gain).
This dataset was used for Figure 3 in the following manuscript: "Proteogenomics decodes the evolution of human ipsilateral breast cancer". De Marchi T, Pyl PT, Sjöström M, Reinsbach SE, DiLorenzo S, Nystedt B, Tran L, Pekar G, Wärnberg F, Fredriksson I, Malmström P, Fernö M, Malmström L, Malmström J, Nimèus E. Accepted for publication.
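The relationship between the logical and categorical tables can be illustrated with a short sketch, assuming that the gain/loss value for a gene is simply its recurrent-tumor status minus its primary-tumor status in "SNV.LOGICAL". The function name and the gene values below are hypothetical and do not come from the dataset.

```python
def gain_loss(primary, recurrent):
    """Per-gene mutational change between a primary tumor and its recurrence.

    primary, recurrent: dicts mapping gene -> 0 (wild-type) or 1 (mutated),
    i.e. one sample pair from an SNV.LOGICAL-style table.
    Returns gene -> -1 (loss), 0 (no change) or +1 (gain).
    """
    shared = primary.keys() & recurrent.keys()
    return {gene: recurrent[gene] - primary[gene] for gene in sorted(shared)}

# Hypothetical sample pair for illustration only.
primary = {"TP53": 1, "PIK3CA": 1, "GATA3": 0}
recurrent = {"TP53": 1, "PIK3CA": 0, "GATA3": 1}
print(gain_loss(primary, recurrent))  # {'GATA3': 1, 'PIK3CA': -1, 'TP53': 0}
```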
Significance of the association between Moonlight’s gene sets and genes from the Cancer Gene Census (CGC), evaluated using Fisher’s exact test in three cancer (sub)types: basal-like breast cancer, lung adenocarcinoma, and thyroid carcinoma. The gene sets were identified using Moonlight’s primary layer and Moonlight’s secondary layer through the Gene Methylation Analysis (GMA) functionality. P-values and odds ratios from Fisher’s exact test are included.
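For readers who want to recompute such an association, the sketch below builds the 2x2 table of gene-set membership versus CGC membership over an assumed gene universe and applies a two-sided Fisher's exact test. It is plain Python for illustration only; the function names and toy genes are hypothetical, and it is not Moonlight's own implementation (Moonlight is distributed as an R package).

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns (sample odds ratio, p-value). The p-value sums the hypergeometric
    probabilities of all tables with the same margins that are no more likely
    than the observed table (with a small relative tolerance for ties).
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Probability of x in the top-left cell given the fixed margins.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    p_value = sum(p for p in probs if p <= p_obs * (1 + 1e-7))
    odds_ratio = (a * d) / (b * c) if b * c > 0 else float("inf")
    return odds_ratio, min(1.0, p_value)

def cgc_overlap_table(gene_set, cgc, universe):
    """2x2 contingency table of gene-set membership vs. CGC membership."""
    gene_set, cgc = gene_set & universe, cgc & universe
    a = len(gene_set & cgc)        # in the gene set and in the CGC
    b = len(gene_set - cgc)        # in the gene set only
    c = len(cgc - gene_set)        # in the CGC only
    d = len(universe) - a - b - c  # in neither
    return a, b, c, d

# Hypothetical genes for illustration only.
universe = {f"G{i}" for i in range(8)}
gene_set = {"G0", "G1", "G2", "G3"}
cgc = {"G0", "G1", "G2", "G4"}
table = cgc_overlap_table(gene_set, cgc, universe)
print(table, fisher_exact_2x2(*table))
```

The odds ratio returned here is the sample odds ratio (ad)/(bc), as reported by `scipy.stats.fisher_exact`; R's `fisher.test` instead reports a conditional maximum-likelihood estimate, so the two can differ slightly.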
The current multistep carcinogenesis models of colon cancer do not fully capture the genetic heterogeneity of the disease, which is further complicated by the presence of passenger and driver genetic alterations. The aim of the present study was to identify, in the context of this significant heterogeneity, additional genes functionally related to colon cancer development.

Methods: High-throughput copy number and gene expression data of 36 microsatellite-stable sporadic colon cancers, resected from patients at a single institution and characterized for mutations in APC, KRAS and TP53 and for loss of 18q, were analyzed. Genes whose expression correlated with the underlying copy number pattern were selected, and their association with the above mutations and with overall survival was evaluated.

Results: Gain of 20q was strongly associated with TP53 mutation, and overall survival was associated with alterations on 7p, 8p, 13q, 18q and 20q. An association with 18q loss and gain of 8q24 was also observed. New candidate genes with a potential role in colon cancer are PLCG1 on 20q, DBC1 on 8q21 and NDRG1 on 8q24. In addition, an unexpected pattern of loss and mutability was found in the region upstream of the KRAS gene.

Conclusions: By integrating copy number alterations with gene expression and mutations in colon cancer-associated genes, we have developed a strategy that identifies previously known molecular features and additional players in the molecular landscape of colon cancer.

Overall design: A total of 48 sporadic colon cancer samples were analyzed with Affymetrix Mapping 250K Nsp SNP Arrays, and 36 of them were also analyzed with the Affymetrix Human Exon 1.0 ST Array [transcript (gene) version].

Short summary: Expression data were correlated with copy number data to identify genes whose expression was induced by copy number changes. Gene dosage candidates were then evaluated for their association with the mutation status of APC, KRAS and TP53, loss of heterozygosity of 18q, and overall survival.
Long summary: Raw intensity .CEL files of the SNP arrays were processed with the Chromosome Copy Number Analysis Tool (v.1.5.6; Affymetrix, Santa Clara, CA) to identify chromosomal gains and losses. Forty-eight normal samples from the HapMap project supplied by Affymetrix were used as an unpaired reference set [http://www.affymetrix.com/support/technical/sample_data/500k_data.affx]. All genomic coordinates of the SNP array probes were mapped to the March 2006 human assembly (hg18) of the UCSC Genome Browser. Raw intensity .CEL files of the exon arrays were processed with the Robust Multi-array Average (RMA) implementation of Affymetrix Power Tools (v.1.8.6) using the core set of features (22011 probesets). All plots and analysis steps following this processing were done using R version 2.9.0 and Bioconductor packages. Statistically significant segments of aberration in the copy number data were identified using the KC-SMART algorithm with default parameters and 1000 permutations (KCsmart v.2.2.0). Genes within each aberrant segment were identified and annotated using biomaRt v.2.0.0. The list of genes was further annotated using the cancerGenes resource, the Cancer Gene Census, and the list of breast and colon CAN genes in Wood et al., "The genomic landscapes of human breast and colorectal cancers", Science 2007;318:1108-1113. Gene dosage effects across the 36 samples for which both gene expression and copy number data were available were assessed by computing the Spearman correlation between raw continuous copy number (log-ratios) and expression (log-intensity) for the genomic region surrounding each gene falling within the segments identified by KC-SMART.
Prior to this step, to reduce the total number of correlation tests, we filtered the gene expression dataset by removing all entries without a Gene Symbol annotation (leaving 17291 probesets) and then removing the half of the remaining probesets with the lowest variance (leaving 8645 probesets). The Category package (version 2.10.0) was used to apply a linear model-based test to detect enrichment of systematically high correlation in specific chromosomal bands, taking into account the hierarchical structure of the bands. Gene expression class comparisons and survival analysis of the selected gene dosage candidates were performed using two-sample t-tests and Cox proportional hazards regression, respectively. P-values of the correlation tests and of the gene expression associations were adjusted using the step-up false discovery rate (FDR) controlling procedure of Benjamini and Hochberg. Raw data, supplementary methods and figures, supplied as reproducible documentation (Sweave file), are available from the Web Link. Clinical information for the samples contains: gender (F=female, M=male), age (years), dimension (cm), stage (Dukes' I-IV), status (0=alive, 1=dead), survival (months), apc (0=wt, 1=mut), kras (0=wt, 1=mut), tp53 (0=wt, 1=mut), and chr18qloh (0=no LOH, 1=LOH).
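The probeset filter, the per-gene Spearman correlation and the Benjamini-Hochberg adjustment described above can be sketched in plain Python. This is an illustration under stated assumptions, not the Sweave analysis itself: the data structures and function names are hypothetical, p-values are assumed to come from a separate correlation test, and neither KC-SMART nor the Category band-enrichment test is reproduced.

```python
import statistics

def filter_probesets(expr, symbols):
    """Drop probesets lacking a gene symbol, then keep the more variable half.

    expr: probeset -> list of log-intensities (one per sample)
    symbols: probeset -> gene symbol (empty or missing if unannotated)
    """
    annotated = {p: v for p, v in expr.items() if symbols.get(p)}
    by_variance = sorted(annotated, key=lambda p: statistics.variance(annotated[p]), reverse=True)
    return {p: annotated[p] for p in by_variance[: len(by_variance) // 2]}

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation (undefined if either input is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def benjamini_hochberg(pvalues):
    """Step-up Benjamini-Hochberg adjusted p-values, in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for pos in range(m - 1, -1, -1):  # from the largest p-value downwards
        i = order[pos]
        running_min = min(running_min, pvalues[i] * m / (pos + 1))
        adjusted[i] = running_min
    return adjusted

# Hypothetical gene dosage values for one gene across five samples.
copy_number = [-0.4, -0.1, 0.0, 0.3, 0.8]   # log-ratios
expression = [6.1, 6.5, 6.4, 7.2, 8.0]      # log-intensities
print(spearman(copy_number, expression))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```

Keeping `len(...) // 2` of 17291 annotated probesets retains 8645, consistent with the counts reported above.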
Driver mutations are the genetic variants responsible for oncogenesis, but how specific somatic mutational events arise in cells remains poorly understood. Mutational signatures derive from the frequency of mutated trinucleotides in a given cancer sample, and they provide an avenue for investigating the underlying mutational processes that operate in cancer. Here we analyse somatic mutations from 7,815 cancer exomes from The Cancer Genome Atlas (TCGA) across 26 cancer types. We curate a list of 50 known cancer driver mutations by analysing recurrence in our cohort and annotations of known cancer-associated genes from the Cancer Gene Census, the IntOGen database and the Cancer Genome Interpreter. We then use these datasets to perform binary univariate logistic regression and establish the statistical relationship between individual driver mutations and known mutational signatures across different cancer types. Our analysis led to the identification of 39 significant associations between driver mutations and mutational signatures (P < 0.004, with a false discovery rate of < 5%). We first validate our methodology by establishing statistical links for known and novel associations between driver mutations and the mutational signature arising from Polymerase Epsilon proofreading deficiency. We then examine associations between driver mutations and mutational signatures for AID/APOBEC enzyme activity and deficient mismatch repair. We also identify negative associations (odds ratio < 1) between mutational signatures and driver mutations, and here we examine the role of aging and cigarette smoke mutagenesis in the generation of driver mutations in IDH1 and KRAS in brain cancers and lung adenocarcinomas, respectively. Our study provides statistical foundations for hypothesised links between otherwise independent biological processes, and we uncover previously unexplored relationships between driver mutations and mutagenic processes during cancer development.
These associations give insights into how cancers acquire advantageous mutations and can provide direction to guide further mechanistic studies into cancer pathogenesis.
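A minimal sketch of a binary univariate logistic regression between one mutational signature and one driver mutation, assuming that the driver mutation status (1 = present) is the outcome and the signature exposure is the single predictor; the study's exact model specification is not given here. The function names and toy data are hypothetical. An odds ratio below 1 would correspond to the negative associations mentioned above.

```python
import math

def _score_information(b0, b1, x, y):
    """Log-likelihood gradient and Fisher information entries at (b0, b1)."""
    g0 = g1 = i00 = i01 = i11 = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        w = p * (1.0 - p)
        g0 += yi - p
        g1 += (yi - p) * xi
        i00 += w
        i01 += w * xi
        i11 += w * xi * xi
    return g0, g1, i00, i01, i11

def univariate_logistic(x, y, iterations=25):
    """Fit logit P(y=1) = b0 + b1*x by Newton-Raphson.

    x: signature exposure (or presence) per sample; y: 1 if the driver is mutated.
    Returns the odds ratio exp(b1) and a two-sided Wald p-value for b1.
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0, g1, i00, i01, i11 = _score_information(b0, b1, x, y)
        det = i00 * i11 - i01 * i01
        # Newton step: beta <- beta + I^-1 * gradient (2x2 inverse written out).
        b0 += (i11 * g0 - i01 * g1) / det
        b1 += (i00 * g1 - i01 * g0) / det
    _, _, i00, i01, i11 = _score_information(b0, b1, x, y)
    se_b1 = math.sqrt(i00 / (i00 * i11 - i01 * i01))
    z = b1 / se_b1
    return {"odds_ratio": math.exp(b1), "p_value": math.erfc(abs(z) / math.sqrt(2))}

# Hypothetical toy data: signature absent (0) or present (1) vs. driver status.
signature = [0, 0, 0, 0, 1, 1, 1, 1]
driver = [0, 0, 0, 1, 0, 1, 1, 1]
print(univariate_logistic(signature, driver))
```

A multiple-testing threshold such as the P < 0.004 reported above would then be applied across all signature and driver pairs.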
Supplementary Note 1:
S1 Text: Oncogenomic comparisons between SB candidate Trunk driver genes and their direct orthologs in human Cancer Gene Census; Pyrosequencing analysis of SB-driven keratinocyte cancer models; References.
Supplementary Figures 1-11:
S1 Fig: Overview of genetic crosses to generate SB|Trp53|Onc3 mouse model.
S2 Fig: SB insertion patterns in activated and inactivated drivers.
S3 Fig: Evaluating the reproducibility of SBCapSeq results from bulk cuSCC and normal skin specimens.
S4 Fig: Hierarchical two-dimensional clustering of recurrent events in cuKA and cuSCC.
S5 Fig: Curated biological pathways and processes enriched within SB-induced cuSCC.
S7 Fig: ZMIZ1 metagene within the TCGA Head & Neck Squamous Cell Carcinoma (hnSCC) RNA-seq dataset.
S8 Fig: Clonally selected SB insertions affect trunk driver proto-oncogene expression in SB-cuSCC genomes.
S9 Fig: Clonally selected SB insertions affect trunk driver genes by inactivating expre...
Association of gene age and COSMIC gene status with evolutionarily important regions for genome rearrangement.
Prediction of the effects of mutations at the protein-protein interface in BRAF using mCSM protein-protein.
Overlap of COSMIC genes with cluster hotspots (i.e., clustering of clusters) in both normal peripheral blood and tumors, based on clusters that overlap genes.
Article: Implementing a Functional Precision Medicine Tumor Board for Acute Myeloid Leukemia
Cancer Discovery, DOI: 10.1158/2159-8290.CD-21-0410
Data Types:
1. Clinical summary
2. Drug response data
3. Exome-sequencing data
4. RNA-sequencing data
1. Clinical summary
File_0: Common sample annotation including patient and sample IDs, stage of the disease, tissue type and availability of different data types.
File_1.1: Clinical data for 186 AML patients including clinical diagnosis, disease classification, gender, age at diagnosis, treatments, cytogenetic and molecular details. The description of the variables/column titles is given below the clinical data.
File_1.2: Description of the clinical variables in File_1.1.
2. Drug response data for 164 AML patient samples and 17 healthy samples
File_2: Drug library details for 515 chemical compounds. The compound collection includes drug names, drug class defined by molecular target or mode of action, concentration range used for drug testing, supplier information, solvent information and vendor information.
File_3: Drug response data including selective drug sensitivity scores (sDSS) for 515 compounds across 181 samples (164 AML patient samples and 17 healthy control samples). The drug sensitivity score (DSS) is a modified area-under-the-curve value, calculated as described in Yadav et al. (1). The selective DSS (sDSS) is the DSS normalized to healthy controls and gives an estimate of the cancer-selective drug response. Higher sDSS values indicate greater drug sensitivity, and negative sDSS values represent drug resistance.
Note: We recommend using selective DSS values instead of raw values (% inhibition, IC50, DSS given in the online manuscript supplementary data).
Note: If the value is missing, the drug was not tested for that given sample.
File_4: Drug sensitivity and resistance testing (DSRT) assay details for 181 samples (164 AML patient samples and 17 healthy control samples). The information includes the medium (MCM or CM) used for drug testing, % cell viability after 72 h without drug treatment, and the blast cell percentage of each sample.
Note: Column E is the ratio of luminescence values at 72 h and 0 h. This fold change in cell viability without drug treatment is expressed as % cell viability, which is why the value can exceed 100%: for example, 70% cell viability means that 30% of cells died during the 72 h incubation, and 300% cell viability means that the cells grew 3-fold over the 72 h incubation period.
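The derived quantities in Files 3 and 4 can be sketched as follows. The sDSS part assumes that the selective score is the patient DSS minus the mean DSS of the healthy controls for the same drug; see Yadav et al. (1) for the authoritative definition. DSS computation itself (curve fitting and integration) is not shown, and all drug names and luminescence values are hypothetical.

```python
from statistics import mean

def selective_dss(patient_dss, control_dss):
    """Sketch of selective DSS for one sample (assumed definition, see above).

    patient_dss: drug -> DSS, or None if the drug was not tested for the sample
    control_dss: drug -> list of DSS values from healthy control samples
    """
    result = {}
    for drug, dss in patient_dss.items():
        controls = [v for v in control_dss.get(drug, []) if v is not None]
        if dss is None or not controls:
            result[drug] = None  # not tested: keep missing rather than impute
        else:
            result[drug] = dss - mean(controls)
    return result

def percent_viability(lum_72h, lum_0h):
    """% cell viability without drug: luminescence ratio (72 h / 0 h) x 100."""
    if lum_0h <= 0:
        raise ValueError("0 h luminescence must be positive")
    return 100.0 * lum_72h / lum_0h

# Hypothetical values for illustration only.
patient = {"drugA": 25.0, "drugB": 5.0, "drugC": None}
controls = {"drugA": [4.0, 6.0], "drugB": [10.0, 12.0], "drugC": [3.0]}
print(selective_dss(patient, controls))  # {'drugA': 20.0, 'drugB': -6.0, 'drugC': None}
print(percent_viability(7000, 10000))    # 70.0  (30% of cells lost in 72 h)
print(percent_viability(30000, 10000))   # 300.0 (3-fold growth in 72 h)
```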
3. Exome-sequencing data for 225 AML patient samples
Note: The number of samples in the manuscript is 226. The correct number used in the analyses is 225.
Mutation data: The cancer-specific gene list was prepared by combining AML-related genes from TCGA (2) (n=23), IntOGen (3) (n=32), Papaemmanuil et al. (4) (n=111) and the Cancer Gene Census database (5) (n=616). Of these genes, 340 were found to be mutated across the 225 AML patient samples. Mutations were called at P-values less than 0.05.
File_5: VAF (variant allele frequency) of 340 cancer-specific genes across 225 AML patient samples. The VAF was calculated using paired skin samples from the same AML patients as controls.
File_6: Binary data for 57 cancer-specific genes that are frequently mutated (a given mutation detected in 5 or more samples) across 225 AML patient samples.
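A sketch of how a binary table like File_6 could be derived from a VAF table like File_5, assuming that any nonzero VAF counts as a mutation call and that the 5-sample threshold is applied per gene. The function name and example values are hypothetical, and the exact criterion used to build File_6 may differ.

```python
def binary_frequent_mutations(vaf, min_samples=5):
    """Convert a VAF table to a binary table of frequently mutated genes.

    vaf: gene -> {sample: VAF}, where 0 (or a missing entry) means no call.
    Keeps genes mutated in at least `min_samples` samples; 1 = mutated, 0 = not.
    """
    samples = sorted({s for calls in vaf.values() for s in calls})
    binary = {}
    for gene, calls in vaf.items():
        row = {s: int(calls.get(s, 0) > 0) for s in samples}
        if sum(row.values()) >= min_samples:
            binary[gene] = row
    return binary

# Hypothetical VAF values for illustration only.
vaf = {
    "FLT3": {f"S{i}": 0.35 for i in range(5)},
    "TP53": {"S0": 0.48, "S1": 0.0, "S2": 0.12},
}
result = binary_frequent_mutations(vaf)
print(sorted(result))  # ['FLT3']
```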
4. RNA-sequencing data for 163 AML patient samples and 4 healthy CD34+ samples
CPM (counts per million) data: The CPM values are batch-corrected values for direct comparison of gene expression.
File_7: Log2CPM values for 18,202 protein coding genes across 167 samples (163 AML patient samples and 4 healthy CD34+ samples).
File_8: Raw read count data for all 60,619 genes across 167 samples (163 AML patient samples and 4 healthy CD34+ samples). The raw read counts were used for differential gene expression analysis.
File_9: RNA-seq library information including RNA extraction method and sequencing library preparation information for 167 samples (163 AML patient samples and 4 healthy CD34+ samples).
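For reference, counts per million rescale each sample's raw counts by its library size. The sketch below computes log2 CPM for a single sample from raw counts like those in File_8, using a hypothetical pseudocount of 1 to avoid taking log2(0). It applies none of the batch correction or normalization used for File_7, so it will not reproduce those values.

```python
import math

def log2_cpm(counts, prior_count=1.0):
    """log2 counts-per-million for one sample.

    counts: gene -> raw read count. prior_count is an assumed pseudocount;
    tools such as edgeR use their own pseudocount scheme.
    """
    library_size = sum(counts.values())
    return {
        gene: math.log2((c + prior_count) * 1e6 / library_size)
        for gene, c in counts.items()
    }

# Hypothetical counts for one sample (library size of exactly one million).
print(log2_cpm({"A": 1023, "B": 0, "C": 998977}))
```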
References
1. Yadav B, Pemovska T, Szwajda A, Kulesskiy E, Kontro M, Karjalainen R, et al. Quantitative scoring of differential drug sensitivity for individually optimized anticancer therapies. Sci Rep 2014;4:5193.
2. Ley TJ, Miller C, Ding L, Raphael BJ, Mungall AJ, Robertson A, et al. Genomic and epigenomic landscapes of adult de novo acute myeloid leukemia. N Engl J Med 2013;368(22):2059-74.
3. Gonzalez-Perez A, Perez-Llamas C, Deu-Pons J, Tamborero D, Schroeder MP, Jene-Sanz A, et al. IntOGen-mutations identifies cancer drivers across tumor types. Nat Methods 2013;10(11):1081-2.
4. Papaemmanuil E, Gerstung M, Bullinger L, Gaidzik VI, Paschka P, Roberts ND, et al. Genomic classification and prognosis in acute myeloid leukemia. N Engl J Med 2016;374(23):2209-21.
5. Tate JG, Bamford S, Jubb HC, Sondka Z, Beare DM, Bindal N, et al. COSMIC: the Catalogue Of Somatic Mutations In Cancer. Nucleic Acids Res 2019;47(D1):D941-D947.
Identification of driver genes, whose mutations cause the development of tumors, is crucial for advancing cancer research and precision medicine. Because traditional frequency-based methods cannot detect driver genes with low mutation recurrence, researchers have focused on the functional impact of gene mutations and proposed function-based methods. However, most function-based methods estimate the null distribution non-parametrically, which is sensitive to sample size, and such methods may under- or over-select driver genes. In this study, we proposed a method to identify driver genes using a functional impact prediction neural network (FI-net). An artificial neural network, as a parametric model, was constructed to estimate functional impact scores for genes, using multi-omics features as multivariate inputs. The background distribution was then estimated, and driver genes identified, within each cluster obtained by hierarchical clustering. We applied FI-net and 22 other state-of-the-art methods to 31 datasets from The Cancer Genome Atlas project. According to the comprehensive evaluation criterion, FI-net performed well across datasets and outperformed the other methods in terms of the overlap fraction with the Cancer Gene Census and the Network of Cancer Genes database, and the consensus in predictions among methods. Furthermore, the results illustrated that FI-net can identify known and potentially novel driver genes.
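As an illustration of one of the evaluation measures named above, the overlap fraction with a reference gene set can be sketched as the share of predicted driver genes found in that set. The function name, the optional `top_k` cut-off and the gene lists are hypothetical; FI-net's comprehensive evaluation criterion combines further measures that are not shown here.

```python
def overlap_fraction(predicted, reference, top_k=None):
    """Fraction of predicted driver genes that appear in a reference set (e.g. CGC).

    predicted: genes ordered from most to least confident prediction.
    reference: set of reference genes.
    top_k: if given, only the top_k predictions are evaluated.
    """
    ranked = list(dict.fromkeys(predicted))  # drop duplicates, keep order
    if top_k is not None:
        ranked = ranked[:top_k]
    if not ranked:
        return 0.0
    return sum(gene in reference for gene in ranked) / len(ranked)

# Hypothetical predictions and reference set for illustration only.
predicted = ["TP53", "KRAS", "GENE_X", "PIK3CA"]
cgc = {"TP53", "KRAS", "PIK3CA"}
print(overlap_fraction(predicted, cgc), overlap_fraction(predicted, cgc, top_k=2))
```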
Cancer types, abbreviations and cohort sizes for the whole-exome sequenced samples analysed in this study.
Co-localization of cluster hotspots with evolutionarily important regions for genome rearrangement in cancer genomes.
Additional file 1: A list of 616 cancer genes from Cancer Gene Census (CGC, 09/26/2016).
Copy number alterations on chromosomes 1, 14 and 22 with the related genes of the Cancer Gene Census.
Although recurrent somatic mutations in the splicing factor U2AF1 (also known as U2AF35) have been identified in multiple cancer types, the effects of these mutations on the cancer transcriptome have yet to be fully elucidated. Here, we identified splicing alterations associated with U2AF1 mutations across distinct cancers using DNA and RNA sequencing data from The Cancer Genome Atlas (TCGA). Using RNA-Seq data from 182 lung adenocarcinomas and 167 acute myeloid leukemias (AML), in which U2AF1 is somatically mutated in 3–4% of cases, we identified 131 and 369 splicing alterations, respectively, that were significantly associated with U2AF1 mutation. Of these, 30 splicing alterations were statistically significant in both lung adenocarcinoma and AML, including three genes in the Cancer Gene Census, CTNNB1, CHCHD7, and PICALM. Cell line experiments expressing U2AF1 S34F in HeLa cells and in 293T cells provide further support that these altered splicing events are caused by U2AF1 mutation. Consistent with the function of U2AF1 in 3′ splice site recognition, we found that S34F/Y mutations cause preferences for CAG over UAG 3′ splice site sequences. This report demonstrates consistent effects of U2AF1 mutation on splicing in distinct cancer cell types.
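Because canonical 3′ splice sites end in AG, the CAG versus UAG preference can be tallied from the last three nucleotides of intron sequences. The sketch below assumes hypothetical intron sequences given 5′ to 3′ (DNA input is converted to RNA); a real analysis would compare the competing 3′ splice sites within each altered splicing event rather than pooling sequences as done here.

```python
from collections import Counter

def three_prime_ss_counts(intron_seqs):
    """Tally the last three nucleotides (positions -3..-1) of each intron.

    Sequences are read 5'->3'; only introns ending in the canonical AG are
    counted, so the tally separates e.g. CAG from UAG (and AAG) acceptors.
    """
    counts = Counter()
    for seq in intron_seqs:
        seq = seq.upper().replace("T", "U")
        if len(seq) >= 3 and seq.endswith("AG"):
            counts[seq[-3:]] += 1
    return counts

def cag_fraction(counts):
    """Fraction of CAG among CAG + UAG acceptors (None if neither is present)."""
    total = counts["CAG"] + counts["UAG"]
    return counts["CAG"] / total if total else None

# Hypothetical intron sequences for illustration only.
counts = three_prime_ss_counts(["GUAAGUCUUUCAG", "gtgagtttccag", "GUAAGUUUUAG"])
print(counts, cag_fraction(counts))
```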
*Zebrafish genes with T2/OncZ insertion in tumor tissue in the present study.
#Human Gene: BRD2, bromodomain containing 2; CBFB, core-binding factor, beta subunit; CBL, Cas-Br-M (murine) ecotropic retroviral transforming sequence; EXT1, exostosin 1; FGF8, fibroblast growth factor 8 (androgen-induced); GATA2, GATA binding protein 2; HEXIM1, hexamethylene bis-acetamide inducible 1; MAP2K5, mitogen-activated protein kinase kinase 5; MMP14, matrix metallopeptidase 14 (membrane-inserted); MSI2, musashi homolog 2 (Drosophila); NCOA2, nuclear receptor coactivator 2; NCOA4, nuclear receptor coactivator 4; PBX1, pre-B-cell leukemia homeobox 1; PRDX5, peroxiredoxin 5; SLC30A5, solute carrier family 30 (zinc transporter), member 5; SMARCB1, SWI/SNF related, matrix associated, actin dependent regulator of chromatin, subfamily b, member 1; SNAPC3, small nuclear RNA activating complex, polypeptide 3, 50 kDa; SOX4, SRY (sex determining region Y)-box 4; SOX5, SRY (sex determining region Y)-box 5; TSHR, thyroid stimulating hormone receptor.
%Gene identified as a human cancer gene in the Cancer Gene Census list.
∧Gene identified as a CIS in mouse tumor tissues by retroviral or T2/Onc transposon insertion.
X, present in the human Cancer Gene Census or mouse RTCGD; -, absent.
$PBX1 is mutated by translocation in human pre-B-ALL and myoepithelioma.
Co-localization of cluster hotspots with evolutionarily important regions for genome rearrangement in normal peripheral blood.