Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Aim: To predict base-resolution DNA methylation in cancerous and paracancerous tissues. Material & methods: We collected six cancer DNA methylation datasets from The Cancer Genome Atlas and five cancer datasets from Gene Expression Omnibus and established machine learning models using paired cancerous and paracancerous tissues. Tenfold cross-validation and independent validation were performed to demonstrate the effectiveness of the proposed method. Results: The developed cross-tissue prediction models can substantially increase the accuracy at more than 68% of CpG sites and contribute to enhancing the statistical power of differential methylation analyses. An XGBoost model leveraging multiple correlating CpGs may elevate the prediction accuracy. Conclusion: This study provides a powerful tool for DNA methylation analysis and has the potential to gain new insights into cancer research from epigenetics. The authors employed machine learning models to predict genome-wide DNA methylation (DNAm) levels in cancerous tissues (CTs) and paracancerous tissues (PTs) when one of them is difficult to obtain. The proposed model based on a single CpG site achieves an improvement of mean absolute error at more than 68% of CpGs. A multiple-CpG-based XGBoost model can further improve the predictive performance when there is considerable variability between individuals. The detected CpG sites in differential methylation analysis are statistically more significant by combining the measured and predicted PTs to enlarge the sample size. When using CTs as predictors instead of PTs, the prediction models have better performance. The aggressiveness of cancers and patient outcome may be predictable using well-predicted DNAm profiles in CT/PT. Functional enrichment analysis based on highly correlated CpG sites identified important pathways involved in cancer progression. 
The cross-tumor DNAm prediction model has the potential to be applied to an external cancer dataset for a subset of probes with high correlation in both cancers.
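The single-CpG idea described above can be illustrated with a minimal sketch. It assumes a simple per-site linear regression that predicts the paracancerous (PT) beta value from the paired cancerous (CT) beta value, evaluated with tenfold cross-validation against a baseline that uses the CT value directly as a proxy. The regression form, the baseline, the function names, and the synthetic data are all assumptions for illustration; the abstract does not specify them.

```python
import random
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; falls back to the mean if x is constant."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        return my, 0.0
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b


def clip01(v):
    """Keep a predicted beta value inside the valid [0, 1] range."""
    return min(1.0, max(0.0, v))


def cv_mae(ct, pt, k=10, seed=0):
    """Return (model MAE, proxy MAE) for one CpG site using k-fold cross-validation."""
    idx = list(range(len(ct)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    model_err, proxy_err = [], []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        a, b = fit_line([ct[i] for i in train], [pt[i] for i in train])
        for i in fold:
            model_err.append(abs(clip01(a + b * ct[i]) - pt[i]))
            # Assumed baseline: use the measured CT value as a stand-in for PT.
            proxy_err.append(abs(ct[i] - pt[i]))
    return mean(model_err), mean(proxy_err)


# Synthetic paired beta values for one hypothetical CpG site (not real data).
rng = random.Random(1)
ct = [rng.uniform(0.2, 0.9) for _ in range(60)]
pt = [clip01(0.1 + 0.5 * c + rng.gauss(0, 0.02)) for c in ct]

model_mae, proxy_mae = cv_mae(ct, pt)
print(f"model MAE={model_mae:.4f}  proxy MAE={proxy_mae:.4f}")
```

Repeating this per site and counting how often the model MAE is lower than the baseline MAE would give a figure comparable in spirit to the "more than 68% of CpGs" statistic, although the published models may differ in every detail.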
Co-occurrence and mutual exclusivity (COME) of DNA methylation refers to two or more genes whose DNA methylation tends to be positively or negatively correlated across samples. Although COME of gene mutations in pan-cancer has been well explored, little is known about the COME of DNA methylation in pan-cancer. Here, we systematically explored the COME of DNA methylation profiles in diverse human cancers. A total of 5,128,332 COME events were identified in 14 main cancer types in The Cancer Genome Atlas (TCGA). We also identified functional epigenetic modules of the zinc finger gene family in six cancer types by integrating the gene expression and DNA methylation data and the frequently occurring COME network. Interestingly, most of the genes in those functional epigenetic modules are epigenetically repressed. Strikingly, those frequently occurring COME events could be used to classify the patients into several subtypes with significantly different clinical outcomes in six cancers as well as pan-cancer (p-value ≤ 0.05). Moreover, we observed significant associations between different COME subtypes and clinical features (e.g., age, gender, histological type, neoplasm histologic grade, and pathologic stage) in distinct cancers. Taken together, we identified millions of COME events of DNA methylation in pan-cancer and detected functional epigenetic COME events that could separate tumor patients into different subtypes, which may benefit the diagnosis and prognosis of pan-cancer.
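The definition of a COME event given above (positive or negative correlation of methylation between genes across samples) can be sketched as a pairwise correlation scan. This is only an illustration: the Pearson statistic, the 0.8 cutoff, the function names, and the toy gene-level beta values are assumptions, and the study itself may use a different statistic or significance test.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def come_events(meth, threshold=0.8):
    """Label gene pairs as co-occurring or mutually exclusive by correlation sign."""
    events = []
    for g1, g2 in combinations(sorted(meth), 2):
        r = pearson(meth[g1], meth[g2])
        if r >= threshold:
            events.append((g1, g2, "co-occurrence", r))
        elif r <= -threshold:
            events.append((g1, g2, "mutual exclusivity", r))
    return events


# Hypothetical gene-level methylation across five samples (not real data).
meth = {
    "GENE_A": [0.10, 0.20, 0.40, 0.70, 0.90],
    "GENE_B": [0.15, 0.25, 0.35, 0.75, 0.85],
    "GENE_C": [0.90, 0.80, 0.60, 0.30, 0.10],
    "GENE_D": [0.50, 0.10, 0.60, 0.20, 0.50],
}

events = come_events(meth)
for g1, g2, kind, r in events:
    print(f"{g1} / {g2}: {kind} (r={r:.2f})")
```

On real data the number of gene pairs grows quadratically, so a vectorized correlation matrix and multiple-testing correction would normally replace this explicit loop.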
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Cancer is an aging-associated disease, but the underlying molecular links between these processes are still largely unknown. Gene promoters that become hypermethylated in aging and cancer share a common chromatin signature in ES cells. In addition, there is also global DNA hypomethylation in both processes. However, any similarities between the regions where this loss of DNA methylation occurs are currently not well characterized, nor is it known whether such regions also share a common chromatin signature in aging and cancer. To address this issue we analysed TCGA DNA methylation data from a total of 2,311 samples, including control and cancer cases from patients with breast, kidney, thyroid, skin, brain and lung tumors and healthy blood, and integrated the results with histone, chromatin state and transcription factor binding site data from the NIH Roadmap Epigenomics and ENCODE projects. We identified 98,857 CpG sites differentially methylated in aging, and 286,746 in cancer. Hyper- and hypomethylated changes in both processes each had a similar genomic distribution across tissues and displayed tissue-independent alterations. The identified hypermethylated regions in aging and cancer shared a similar bivalent chromatin signature. In contrast, hypomethylated DNA sequences occurred in very different chromatin contexts. DNA hypomethylated sequences were enriched at genomic regions marked with the activating histone posttranslational modification H3K4me1 in aging, whilst in cancer, loss of DNA methylation was primarily associated with the repressive H3K9me3 mark.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The metastatic cancer of unknown primary (CUP) sites remains a leading cause of cancer death with few therapeutic options. The aberrant DNA methylation (DNAm) is the most important risk factor for cancer, which has certain tissue specificity. However, how DNAm alterations in tumors differ among the regulatory network of multi-omics remains largely unexplored. Therefore, there is room for improvement in our accuracy in the prediction of tumor origin sites and a need for better understanding of the underlying mechanisms. In our study, an integrative analysis based on multi-omics data and molecular regulatory network uncovered genome-wide methylation mechanism and identified 23 epi-driver genes. Apart from the promoter region, we also found that the aberrant methylation within the gene body or intergenic region was significantly associated with gene expression. Significant enrichment analysis of the epi-driver genes indicated that these genes were highly related to cellular mechanisms of tumorigenesis, including T-cell differentiation, cell proliferation, and signal transduction. Based on the ensemble algorithm, six CpG sites located in five epi-driver genes were selected to construct a tissue-specific classifier with a better accuracy (>95%) using TCGA datasets. In the independent datasets and the metastatic cancer datasets from GEO, the accuracy of distinguishing tumor subtypes or original sites was more than 90%, showing better robustness and stability. In summary, the integration analysis of large-scale omics data revealed complex regulation of DNAm across various cancer types and identified the epi-driver genes participating in tumorigenesis. Based on the aberrant methylation status located in epi-driver genes, a classifier that provided the highest accuracy in tracing back to the primary sites of metastatic cancer was established. 
Our study provides a comprehensive and multi-omics view of DNAm-associated changes across cancer types and has potential for clinical application.
Database to study the interplay of DNA methylation, gene expression and cancer. It hosts both highly integrated data on DNA methylation, cancer-related genes, mutations and cancer information from public resources, and the CpG Island (CGI) clones derived from our large-scale sequencing. Interconnections between different data types were analyzed and presented. A search tool and the graphical MethyView were developed to help users access all the data and data connections and view DNA methylation in the context of genomics and genetics data. As part of the Cancer Epigenomics Project in China, MethyCancer serves as a platform for sharing data and analytical results from the Cancer Genome/Epigenome Project in China with colleagues all over the world.
https://www.scilifelab.se/data/restricted-access/
This dataset contains genome-wide DNA methylation data generated from 142 pediatric acute myeloid leukemia (AML) samples originating from bone marrow or peripheral blood samples taken at AML diagnosis (N=123) or relapse (N=19). Further details regarding the samples are available in Supplementary Table S1 from Krali and Palle et al., 2021 (https://doi.org/10.3390/genes12060895). Genome-wide DNA methylation was analyzed at the SNP&SEQ Technology Platform, SciLifeLab, National Genomics Infrastructure Uppsala, Sweden. 200 ng of bisulfite-converted DNA was amplified, fragmented and hybridised to the Illumina Infinium HumanMethylation450k BeadChip using the standard protocol from Illumina (iScan SQ instrument). This metadata record contains information about the raw idat files generated from the Infinium DNA methylation arrays. The Methylprep Python library was used to generate and normalize the beta-value matrix (https://pypi.org/project/methylprep/1.3.3/). The raw idat files along with a samplesheet, processed beta-value matrix, annotation file for CpG annotation, and signal intensities matrix will be made available upon request. Limited phenotype information is available in Supplementary Table S1 of the manuscript. All scripts that give a walk-through from preprocessing of the raw idat files to the modelling process with machine learning can be found in the following GitHub repository: https://github.com/Molmed/Krali-Palle_2021.
Terms for access: The DNA methylation dataset is only to be used for research that is seeking to advance the understanding of the influence of epigenetic factors on leukemia etiology and biology. The data should not be used for other purposes, i.e. investigating the epigenetic signatures that may lead to identification of a person. For retrieving the data used for the scope of this publication, please contact datacentre@scilifelab.se.
https://www.scilifelab.se/data/restricted-access/
This dataset contains genome-wide DNA methylation data generated from 384 pediatric acute lymphoblastic leukemia (ALL) samples originating from bone marrow or peripheral blood samples taken at ALL diagnosis (n = 384). Further details regarding the samples are available in Supplementary Table S2 from Krali et al., 2023 (https://doi.org/10.1038/s41698-023-00479-5). Genome-wide DNA methylation was analyzed at the SNP&SEQ Technology Platform, SciLifeLab, National Genomics Infrastructure Uppsala, Sweden. 250 ng of bisulfite-converted DNA was amplified, fragmented and hybridised to the Illumina Infinium HumanMethylation450k BeadChip using the standard protocol from Illumina (iScan SQ instrument). This metadata record contains information about the raw idat files generated from the Infinium DNA methylation arrays. The raw idat files were processed with Methylation Module (1.8.5) software in Genome Studio (V2010.3). Peak-based correction was used to normalize the beta-value matrix. The raw idat files along with a samplesheet, processed beta-value matrix, and annotation file for CpG annotation will be made available upon request. Limited phenotype information is available in Supplementary Table S2 of the manuscript. All scripts that give a walk-through of our project, including the modelling process with machine learning, can be found in our GitHub repository.
Terms for access: The DNA methylation dataset is only to be used for research that is seeking to advance the understanding of the influence of epigenetic factors on leukemia etiology and biology. The data should not be used for other purposes, i.e. investigating the epigenetic signatures that may lead to identification of a person. For retrieving the data used for the scope of this publication, please contact datacentre@scilifelab.se.
Additional file 1. Hazard ratios (95% CIs) for lymphatic-hematopoietic, solid and overall cancers in the Strong Heart Study and the Framingham Heart Study.
Background: Smoking was strongly associated with breast cancer in previous studies. Whether smoking promotes breast cancer through DNA methylation remains unknown. Methods: Two-sample Mendelian randomization (MR) analyses were conducted to assess the causal effect of smoking-related DNA methylation on breast cancer risk. We used 436 smoking-related CpG sites extracted from 846 middle-aged women in the ARIES project as exposure data. We collected summary data of breast cancer from one of the largest meta-analyses, including 69,501 cases for ER+ breast cancer and 21,468 cases for ER− breast cancer. A total of 485 single-nucleotide polymorphisms (SNPs) were selected as instrumental variables (IVs) for smoking-related DNA methylation. We further performed an MR Steiger test to estimate the likely direction of causal estimate between DNA methylation and breast cancer. We also conducted colocalization analysis to evaluate whether smoking-related CpG sites shared a common genetic causal SNP with breast cancer in a given region. Results: We established four significant associations after multiple testing correction: the CpG sites of cg2583948 [OR = 0.94, 95% CI (0.91–0.97)], cg0760265 [OR = 1.07, 95% CI (1.03–1.11)], cg0420946 [OR = 0.95, 95% CI (0.93–0.98)], and cg2037583 [OR = 1.09, 95% CI (1.04–1.15)] were associated with the risk of ER+ breast cancer. All the four smoking-related CpG sites had a larger variance than that in ER+ breast cancer (all p < 1.83 × 10^−11) in the MR Steiger test. Further colocalization analysis showed that there was strong evidence (based on PPH4 > 0.8) supporting a common genetic causal SNP between the CpG site of cg2583948 [with IMP3 expression (PPH4 = 0.958)] and ER+ breast cancer. There were no causal associations between smoking-related DNA methylation and ER− breast cancer. Conclusions: These findings highlight potential targets for the prevention of ER+ breast cancer. Tissue-specific epigenetic data are required to confirm these results.
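As a rough illustration of how a two-sample MR estimate becomes an odds ratio with a confidence interval like those reported above, the sketch below uses the single-instrument Wald ratio. The abstract does not state which MR estimator was used, so the Wald ratio, the first-order standard error, the function names, and the input numbers are all assumptions, and the numbers are invented for illustration only.

```python
from math import exp


def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Single-SNP Wald ratio: log-odds change in outcome per unit change in exposure.

    beta_exposure: SNP effect on CpG methylation (exposure).
    beta_outcome:  SNP effect on breast cancer risk (log odds).
    The standard error uses the first-order approximation se_outcome / |beta_exposure|.
    """
    if beta_exposure == 0:
        raise ValueError("instrument has no effect on the exposure")
    beta = beta_outcome / beta_exposure
    se = se_outcome / abs(beta_exposure)
    return beta, se


def odds_ratio_ci(beta, se, z=1.96):
    """Convert a log-odds estimate to an odds ratio with a 95% confidence interval."""
    return exp(beta), exp(beta - z * se), exp(beta + z * se)


# Hypothetical summary statistics for one SNP-CpG pair (not taken from the study).
beta, se = wald_ratio(beta_exposure=0.50, beta_outcome=-0.03, se_outcome=0.008)
odds_ratio, ci_low, ci_high = odds_ratio_ci(beta, se)
print(f"OR = {odds_ratio:.2f}, 95% CI ({ci_low:.2f}-{ci_high:.2f})")
```

With several instruments per CpG site, the per-SNP ratios would typically be combined, for example by inverse-variance weighting, rather than reported individually.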
Additional file 2. Network nodes and network edges for the protein-protein interaction network.
Gastric cancer (GC) is one of the leading types of fatal cancer worldwide. Epigenetic manipulation of cancer cells is a useful tool to better understand gene expression regulatory mechanisms and contributes to the discovery of novel biomarkers. Our research group recently reported a list of 83 genes that are potentially modulated by DNA methylation in GC cell lines. Herein, we further explored the regulation of one of these genes, LRRC37A2, in clinical samples. LRRC37A2 expression was evaluated by RT-qPCR, and DNA methylation was studied using next-generation bisulphite sequencing in 36 GC and paired adjacent nonneoplastic tissue samples. We showed that both reduced LRRC37A2 mRNA levels and increased LRRC37A2 exon methylation were associated with undifferentiated and poorly differentiated tumours. Moreover, LRRC37A2 gene expression and methylation levels were inversely correlated at the +45 exon CpG site. We suggest that DNA hypermethylation may contribute to reducing LRRC37A2 expression in undifferentiated and poorly differentiated GC. Therefore, our results show how some genes may be useful to stratify patients who are more likely to benefit from epigenetic therapy.
Abbreviations: AR: androgen receptor; 5-AZAdC: 5-aza-2'-deoxycytidine; B2M: beta-2-microglobulin; GAPDH: glyceraldehyde-3-phosphate dehydrogenase; GC: gastric cancer; GLM: general linear model; LRRC37A2: leucine-rich repeat containing 37 member A2; SD: standard deviation; TFII-I: general transcription factor II-I; TSS: transcription start site; XBP1: X-box binding protein 1
DNA methylation is a vital epigenetic change that regulates gene transcription and helps to keep the genome stable. Aberrant DNA methylation is a hallmark of human cancer; it is critical for tumor formation and controls the expression of several tumor-associated genes. In various cancers, methylation changes such as tumor suppressor gene hypermethylation and oncogene hypomethylation are critical in tumor occurrence, especially in breast cancer. Detecting DNA methylation-driven genes and understanding the molecular features of such genes could thus help to enhance our understanding of the pathogenesis and molecular mechanisms of breast cancer, facilitating the development of precision medicine and drug discovery. In the present study, we retrospectively analyzed over one thousand breast cancer patients and established a robust prognostic signature based on DNA methylation-driven genes. We then estimated immune cell abundance in each patient and found lower immune activity in high-risk patients. The expression of human leukocyte antigen (HLA) family genes and immune checkpoint genes was consistent with these results. In addition, more mutated genes were observed in the high-risk group. Furthermore, an in silico screening of druggable targets and compounds from the CTRP and PRISM databases was performed, resulting in the identification of five target genes (HMMR, CCNB1, CDC25C, AURKA, and CENPE) and five agents (oligomycin A, panobinostat, (+)-JQ1, voxtalisib, and arcyriaflavin A), which might have therapeutic potential in treating high-risk breast cancer patients. Further in vitro evaluation confirmed that (+)-JQ1 had the best cancer cell selectivity and exerted its anti-breast cancer activity through CENPE. In conclusion, our study provided new insights into personalized prognostication and may inspire the integration of risk stratification and precision therapy.
Breast cancer (BC) is the most diagnosed cancer and the leading cause of cancer-related deaths in women. The purpose of this study was to develop a prognostic model based on BC-related DNA methylation pattern. A total of 361 BC incidence-related probes (BCIPs) were differentially methylated in blood samples from women at high risk of BC and BC tissues. Twenty-nine of the 361 BCIPs that significantly correlated with BC outcomes were selected to establish the BCIP score. BCIP scores based on BC-related DNA methylation pattern were developed to evaluate the mortality risk of BC. The correlation between overall survival and BCIP scores was assessed using Kaplan–Meier, univariate, and multivariate analyses. In BC, the BCIP score was significantly correlated with malignant BC characteristics and poor outcomes. Furthermore, we assessed the BCIP score-related gene expression profile and observed that genes with expressions associated with the BCIP score were involved in the process of cancer immunity according to GO and KEGG analyses. Using the ESTIMATE and CIBERSORT algorithms, we discovered that BCIP scores were negatively correlated with both T cell infiltration and immune checkpoint inhibitor response markers in BC tissues. Finally, a nomogram comprising the BCIP score and BC prognostic factors was used to establish a prognostic model for patients with BC, while C-index and calibration curves were used to evaluate the effectiveness of the nomogram. A nomogram comprising the BCIP score, tumor size, lymph node status, and molecular subtype was developed to quantify the survival probability of patients with BC. Collectively, our study developed the BCIP score, which correlated with poor outcomes in BC, to portray the variation in DNA methylation pattern related to BC incidence.
Objective: To identify DNA methylation-related biomarkers in patients with breast cancer (BC). Materials and Methods: A total of seven BC methylation studies including 1,438 BC patients or breast tissues were included in this study. An elastic net regularized Cox proportional hazards regression (CPH) model was used to build a multi-CpG (5′-C-phosphate-G-3′) methylation panel. The diagnostic and prognostic power of the panel was evaluated and validated using a Kaplan–Meier curve, univariate and multivariable CPH, and subgroup analysis. A nomogram containing the panel was developed. The relationships between the panel-based methylation risk and the immune landscape and genomic metrics were investigated. Results: Sixty-eight CpG sites were significantly correlated with the overall survival (OS) of BC patients, and based on the result of penalized CPH, a 28-CpG-site methylation panel was derived. The prognostic and diagnostic role of the panel was validated in the discovery set, validation set, and six independent cohorts, which indicated that higher methylation risk was associated with poor OS, and the panel outperformed currently available biomarkers and remained an independent factor after adjusting for other clinical features. The methylation risk was negatively correlated with innate and adaptive immune cells, and positively correlated with total mutation load, SCNA, and MATH. Conclusions: We validated a multi-CpG methylation panel that could independently predict the OS of BC patients. The Th2-mediated tumor promotion effect (suppression of innate and adaptive immunity) participated in the progression of high-risk BC. Patients with high methylation risk were associated with tumor heterogeneity and poor survival.
Additional file 7: Table S5. Correlations among DNA methylation-related enzymes in blood and leukemia. The RNA-Seq gene expression data of 7 DNA methylation-related enzymes were obtained from the GTEx and TCGA datasets. The correlations among the expression levels of the 7 enzymes are analyzed and shown.
Additional file 4: Supplementary Table 3. Differentially methylated regions (DMRs) between the PCa T cell DNA and healthy control T cell DNA. The data were obtained using the DMR function in the ChAMP pipeline and loaded into the IGV browser to manually identify nearby genes and the location of each DMR. Reference genome hg19 was used for alignment. TSS, transcription start site.
Supplementary Table 3 from Identification of Breast Cancer DNA Methylation Markers Optimized for Fine-Needle Aspiration Samples
Baseline characteristics for the study population, overall and by cancer types, for Black participants in ARIC
Additional file 3 (XLSX 435 KB)
Summary of 44 retained introns