Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Because of its ability to generate biological hypotheses, metabolomics offers an innovative and promising approach in many fields, including clinical research. However, collecting specimens in this setting can be difficult to standardize, especially when groups of patients with different degrees of disease severity are considered. In addition, despite major technological advances, it remains challenging to measure all the compounds defining the metabolic network of a biological system. In this context, the characterization of samples based on several analytical setups is now recognized as an efficient strategy to improve the coverage of metabolic complexity. For this purpose, chemometrics proposes efficient methods to reduce the dimensionality of these complex datasets spread over several matrices, allowing the integration of different sources or structures of metabolic information. Bioinformatics databases and query tools designed to describe and explore metabolic network models offer extremely useful solutions for the contextualization of potential biomarker subsets, enabling mechanistic hypotheses to be considered rather than simple associations. In this study, network principal component analysis was used to investigate samples collected from three cohorts of patients including multiple stages of chronic kidney disease. Metabolic profiles were measured using a combination of four analytical setups involving different separation modes in liquid chromatography coupled to high resolution mass spectrometry. Based on the chemometric model, specific patterns of metabolites, such as N-acetyl amino acids, could be associated with the different subgroups of patients. Further investigation of the metabolic signatures carried out using genome-scale network modeling confirmed both tryptophan metabolism and nucleotide interconversion as relevant pathways potentially associated with disease severity. 
Metabolic modules composed of chemically adjacent or close compounds of biological relevance were further investigated using carbon transfer reaction paths. Overall, the proposed integrative data analysis strategy provided deeper insights into the metabolic routes associated with different groups of patients. Because of their complementary roles in the knowledge discovery process, combining chemometrics and bioinformatics in a common workflow is therefore shown to be an efficient methodology for gaining meaningful insights in a clinical context.

Intracellular bacterial pathogens are metabolically adapted to grow within mammalian cells. While these adaptations are fundamental to the ability to cause disease, we know little about the relationship between the pathogen's metabolism and virulence. Here we used an integrative Metabolic Analysis Tool that combines transcriptome data with genome-scale metabolic models to define the metabolic requirements of Listeria monocytogenes during infection. Twelve metabolic pathways were identified as differentially active during L. monocytogenes growth in macrophage cells. Intracellular replication requires de novo synthesis of histidine, arginine, purine, and branched-chain amino acids (BCAAs), as well as catabolism of L-rhamnose and glycerol. The importance of each metabolic pathway during infection was confirmed by generation of gene knockout mutants in the respective pathways. Next, we investigated the association of these metabolic requirements with the regulation of L. monocytogenes virulence. Here we show that limiting BCAA concentrations, primarily isoleucine, results in robust induction of the master virulence activator gene, prfA, and the PrfA-regulated genes. This response was specific and required the nutrient-responsive regulator CodY, which is known to bind isoleucine. Further analysis demonstrated that CodY is involved in prfA regulation, playing a role in prfA activation under limiting conditions of BCAAs. This study provides evidence for an additional regulatory mechanism underlying L. monocytogenes virulence, placing CodY at the crossroads of metabolism and virulence.

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Technologies for profiling samples using different omics platforms have been at the forefront since the human genome project. Large-scale multi-omics data hold the promise of deciphering different regulatory layers. Yet, while there is a myriad of bioinformatics tools, each multi-omics analysis appears to start from scratch with an arbitrary decision over which tools to use and how to combine them. Therefore, there is an unmet need to conceptualize how to integrate such data and to implement and validate pipelines in different cases. We have designed a conceptual framework (STATegra), aiming to be as generic as possible for multi-omics analysis, combining available multi-omics analysis tools (machine learning component analysis, non-parametric data combination, and a multi-omics exploratory analysis) in a step-wise manner. While we have previously combined those integrative tools in several studies, here we provide a systematic description of the STATegra framework and its validation using two case studies from The Cancer Genome Atlas (TCGA). For both the Glioblastoma and the Skin Cutaneous Melanoma (SKCM) cases, we demonstrate an enhanced capacity of the framework (and beyond the individual tools) to identify features and pathways compared to single-omics analysis. Such an integrative multi-omics analysis framework for identifying features and components facilitates the discovery of new biology. Finally, we provide several options for applying the STATegra framework when parametric assumptions are fulfilled and for the case when not all the samples are profiled for all omics. The STATegra framework is built using several tools, which are being integrated step-by-step as open source in the STATegRa Bioconductor package.

Evolutionary studies are often limited by missing data that are critical to understanding the history of selection. Selection experiments, which reproduce rapid evolution under controlled conditions, are excellent tools to study how genomes evolve under selection. Here we present a genomic dissection of the Longshanks selection experiment, in which mice were selectively bred over 20 generations for longer tibiae relative to body mass, resulting in 13% longer tibiae in two replicates. We synthesized evolutionary theory, genome sequences and molecular genetics to understand the selection response and found that it involved both polygenic adaptation and discrete loci of major effect, with the strongest loci tending to be selected in parallel between replicates. We show that selection may favor de-repression of bone growth through inactivating two limb enhancers of an inhibitor, Nkx3-2. Our integrative genomic analyses thus show that it is possible to connect individual base-pair changes to the overall selection response.

Applying differentially expressed genes (DEGs) to identify feasible biomarkers in diseases can be a hard task when working with heterogeneous datasets. Expression data are strongly influenced by technology, sample preparation processes, and/or labeling methods. The proliferation of different microarray platforms for measuring gene expression increases the need to develop models able to compare their results, especially when different technologies can lead to signal values that vary greatly. Integrative meta-analysis can significantly improve the reliability and robustness of DEG detection. The objective of this work was to develop an integrative approach for identifying potential cancer biomarkers by integrating gene expression data from two different platforms. Pancreatic ductal adenocarcinoma (PDAC), where there is an urgent need to find new biomarkers due to its late diagnosis, is an ideal candidate for testing this technology. Expression data from two different datasets, namely Affymetrix and Illumina (18 and 36 PDAC patients, respectively), as well as from 18 healthy controls, were used for this study. A meta-analysis based on an empirical Bayesian methodology (ComBat) was then proposed to integrate these datasets. DEGs were finally identified from the integrated data by using the statistical programming language R. After our integrative meta-analysis, 5 genes were commonly identified within the individual analyses of the independent datasets. In addition, 28 novel genes that were not reported by the individual analyses (‘gained’ genes) were discovered. Several of these gained genes have already been related to other gastroenterological tumors. The proposed integrative meta-analysis has revealed novel DEGs that may play an important role in PDAC and could be potential biomarkers for diagnosing the disease.

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains p-values and statistical significance data derived from analyzing various metabolic and dietary states in mice. The data supports research investigating the effects of diet and metabolic conditions on localized variables in specific regions of mice. The files included are:
Data Collection Methods
The data was collected by analyzing correlations between variables within localized regions of the mice. These variables were consistent within individuals but showed variation dependent on dietary or metabolic states. Data collection involved the following steps:
1. Selection of experimental groups based on dietary and metabolic conditions.
2. Quantitative measurement of specific variables in localized regions of mice.
3. Statistical analysis to determine the significance of correlations across the groups.
Data Generation and Processing
1. Generation: Measurements were obtained through laboratory analysis using standardized protocols for each dietary/metabolic condition.
2. Processing:
- Statistical tests were performed to identify significant correlations (e.g., t-tests, ANOVA).
- P-values were computed to quantify the significance of the relationships observed.
- Data was compiled into Excel sheets for organization and clarity.
Technical and Non-Technical Information
- Technical Details: Each file contains tabular data with headers indicating the variable pairs analyzed, their respective p-values, and the significance level (e.g., p<0.05, p<0.01).
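The significance levels mentioned in the technical details can be derived from a raw p-value with a simple threshold lookup. The following Python sketch is only an illustration: the function name, the "n.s." label for non-significant results, and the example variable pairs are assumptions and do not come from the dataset. Only the p<0.05 and p<0.01 levels are taken from the description above.

```python
def significance_label(p_value, thresholds=(0.01, 0.05)):
    """Return the strictest threshold label met by p_value, or 'n.s.'.

    The thresholds default to the two levels named in the dataset
    description; the 'n.s.' label is an assumption for illustration.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    # Check the strictest (smallest) threshold first.
    for threshold in sorted(thresholds):
        if p_value < threshold:
            return f"p<{threshold}"
    return "n.s."


# Hypothetical rows: (variable pair, p-value). Not real data.
rows = [("var_a vs var_b", 0.003), ("var_a vs var_c", 0.03), ("var_b vs var_c", 0.2)]
labelled = [(pair, p, significance_label(p)) for pair, p in rows]
```

A strict comparison is used here, so a p-value of exactly 0.05 is labelled as not significant; whether the dataset authors used the same convention is not stated.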

Introduction
Neuroimaging technology has experienced explosive growth and transformed the study of neural mechanisms across health and disease. However, given the diversity of sophisticated tools for handling neuroimaging data, the field faces challenges in method integration, particularly across multiple modalities and species. Specifically, researchers often have to rely on siloed approaches which limit reproducibility, with idiosyncratic data organization and limited software interoperability.
Methods
To address these challenges, we have developed Quantitative Neuroimaging Environment & Toolbox (QuNex), a platform for consistent end-to-end processing and analytics. QuNex provides several novel functionalities for neuroimaging analyses, including a “turnkey” command for the reproducible deployment of custom workflows, from onboarding raw data to generating analytic features.
Results
The platform enables interoperable integration of multi-modal, community-developed neuroimaging software through an extension framework with a software development kit (SDK) for seamless integration of community tools. Critically, it supports high-throughput, parallel processing in high-performance compute environments, either locally or in the cloud. Notably, QuNex has successfully processed over 10,000 scans across neuroimaging consortia, including multiple clinical datasets. Moreover, QuNex enables integration of human and non-human workflows via a cohesive translational platform.
Discussion
Collectively, this effort stands to significantly impact neuroimaging method integration across acquisition approaches, pipelines, datasets, computational environments, and species. Building on this platform will enable more rapid, scalable, and reproducible impact of neuroimaging technology across health and disease.

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
General description:
Supplementary information belonging to the study "Deep Integrated Network Analysis – a tool to discover and characterize disease pathways in the liver".
Files:
1) Supplementary Figure 1 _ TLN .pdf
Contains the Tree-and-Leaf Network (TLN), on which the leaves have been classified according to Gene Ontology Biological Processes.
2) Supplementary Table 1 _ Datasets.xlsx
Contains the list of datasets included in Liver DINA Resource.
For each dataset the GEO series, title, taxonomy, and liver sample count are shown, as well as the classification of dataset condition.
3) Supplementary Table 2 _ Top1000 subset _gene interaction networks.xlsx
Contains the results from the analysis of the 1,000 gene-gene interactions with the highest statistical weight in the Liver DINA Resource.
4) Supplementary Table 3_ TLN modules.xlsx
Contains the classification of the leaves in the Liver DINA Resource Tree-and-Leaf Network (TLN).

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Patients with cardiovascular disease show a panel of differentially regulated serum biomarkers indicative of modulation of several pathways from disease onset to progression. A few of these biomarkers have been proposed for multimarker risk prediction methods. However, the underlying mechanism of the expression changes and modulation of the pathways has not yet been fully addressed. Our present work focuses on understanding the regulatory mechanisms at the transcriptional level by identifying the core and specific transcription factors that regulate the coronary artery disease associated pathways. Using the principles of systems biology, we integrated the genomics and proteomics data with computational tools. We selected biomarkers from 7 different pathways based on their association with the disease, assayed 24 biomarkers along with gene expression studies, and built network modules which are highly regulated by 5 core regulators: PPARG, EGR1, ETV1, KLF7 and ESRRA. These network modules in turn comprise biomarkers from different pathways, showing that the core regulatory transcription factors may work together in differential regulation of several pathways potentially leading to the disease. This kind of analysis can enhance the elucidation of mechanisms in the disease and give better strategies for developing multimarker module-based risk predictions.

Objectives: Genome-wide association studies (GWASs) have revealed many candidate SNPs, but the mechanisms by which these SNPs influence diseases are largely unknown. In order to decipher the underlying mechanisms, several methods have been developed to predict disease-associated genes based on the integration of GWAS and eQTL data (e.g., Sherlock and COLOC). A number of studies have also incorporated information from gene networks into GWAS analysis to reprioritize candidate genes. Methods: Motivated by these two different approaches, we have developed a statistical framework to integrate information from GWAS, eQTL, and protein-protein interaction (PPI) data to predict disease-associated genes. Our approach is based on a hidden Markov random field (HMRF) model, and we called the resulting computational algorithm GeP-HMRF (a GWAS-eQTL-PPI-based HMRF). Results: We compared the performance of GeP-HMRF with the Sherlock, COLOC, and NetWAS methods on 9 GWAS datasets, using the disease-related genes in the MalaCards database as the standard, and found that GeP-HMRF significantly improves the prediction accuracy. We also applied GeP-HMRF to an age-related macular degeneration (AMD) dataset. Among the top 50 genes predicted by GeP-HMRF, 7 are reported by the MalaCards database to be AMD-related with an enrichment p value of 3.61 × 10–119. Among the top 20 genes predicted by GeP-HMRF, CFHR1, CGHR3, HTRA1, and CFH are AMD-related in the MalaCards database, and another 9 genes are supported by the literature. Conclusions: We built a unified statistical model to predict disease-related genes by integrating GWAS, eQTL, and PPI data. Our approach outperforms Sherlock, COLOC, and NetWAS in simulation studies and 9 GWAS datasets. Our approach can be generalized to incorporate other molecular trait data beyond eQTL and other interaction data beyond PPI.

LegumeIP is an integrative database and bioinformatics platform for comparative genomics and transcriptomics to facilitate the study of gene function and genome evolution in legumes, and ultimately to generate molecular-based breeding tools to improve the quality of crop legumes. LegumeIP currently hosts large-scale genomics and transcriptomics data, including:
* Genomic sequences of three model legumes, i.e. Medicago truncatula, Glycine max (soybean) and Lotus japonicus, and two reference plant species, Arabidopsis thaliana and Populus trichocarpa, with the annotation based on UniProt TrEMBL, InterProScan, Gene Ontology and KEGG databases. LegumeIP covers a total of 222,217 protein-coding gene sequences.
* Large-scale gene expression data compiled from 104 array hybridizations from L. japonicus, 156 array hybridizations from the M. truncatula gene atlas database, and 14 RNA-Seq-based gene expression profiles from G. max on different tissues including four common tissues: Nodule, Flower, Root and Leaf.
* Systematic synteny analysis among M. truncatula, G. max, L. japonicus and A. thaliana.
* Reconstruction of gene family and gene family-wide phylogenetic analysis across the five hosted species.
LegumeIP features comprehensive search and visualization tools to enable flexible queries on gene annotation, gene family, synteny, and relative abundance of gene expression.

An increasing number of studies involve integrative analysis of gene and protein expression data taking advantage of new technologies such as next-generation transcriptome sequencing (RNA-Seq) and highly sensitive mass spectrometry (MS) instrumentation. Thus, it becomes interesting to revisit the correlative analysis of gene and protein expression data using more recently generated data sets. Furthermore, within the proteomics community there is a substantial interest in comparing the performance of different label-free quantitative proteomic strategies. Gene expression data can be used as an indirect benchmark for such protein-level comparisons. In this work we use publicly available mouse data to perform a joint analysis of genomic and proteomic data obtained on the same organism. First, we perform a comparative analysis of different label-free protein quantification methods (intensity based and spectral count based and using various associated data normalization steps) using several software tools on the proteomic side. Similarly, we perform correlative analysis of gene expression data derived using microarray and RNA-Seq methods on the genomic side. We also investigate the correlation between gene and protein expression data, and various factors affecting the accuracy of quantitation at both levels. It is observed that spectral count based protein abundance metrics, which are easy to extract from any published data, are comparable to intensity based measures with respect to correlation with gene expression data. The results of this work should be useful for designing robust computational pipelines for extraction and joint analysis of gene and protein expression data in the context of integrative studies.

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Pathway Multi-Omics Simulated Data
These are synthetic variations of the TCGA COADREAD data set (original data available at http://linkedomics.org/data_download/TCGA-COADREAD/). This data set is used as a comprehensive benchmark data set to compare multi-omics tools in the manuscript "pathwayMultiomics: An R package for efficient integrative analysis of multi-omics datasets with matched or un-matched samples".
There are 100 sets (stored as 100 sub-folders, the first 50 in "pt1" and the second 50 in "pt2") of random modifications to centred and scaled copy number, gene expression, and proteomics data saved as compressed data files for the R programming language. These data sets are stored in subfolders labelled "sim001", "sim002", ..., "sim100". Each folder contains the following contents: (1) "indicatorMatricesXXX_ls.RDS" is a list of simple triplet matrices showing which genes (in which pathways) and which samples received the synthetic treatment (where XXX is the simulation run label: 001, 002, ...), (2) "CNV_partitionA_deltaB.RDS" is the synthetically modified copy number variation data (where A represents the proportion of genes in each gene set to receive the synthetic treatment [partition 1 is 20%, 2 is 40%, 3 is 60% and 4 is 80%] and B is the signal strength in units of standard deviations), (3) "RNAseq_partitionA_deltaB.RDS" is the synthetically modified gene expression data (same parameter legend as CNV), and (4) "Prot_partitionA_deltaB.RDS" is the synthetically modified protein expression data (same parameter legend as CNV).
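The folder and file naming scheme described above can be sketched as a few small helpers that build the expected relative paths. This is a hedged illustration in Python, not part of the data release: the helper names are hypothetical, the sketch assumes the "simNNN" folders sit directly inside "pt1" or "pt2", and because the exact formatting of the signal strength B in the file names is not stated, it is passed through as a caller-supplied string.

```python
from pathlib import PurePosixPath

# Proportion of genes per gene set that received the synthetic treatment,
# keyed by partition label, as given in the data set description.
PARTITION_PROPORTION = {1: 0.2, 2: 0.4, 3: 0.6, 4: 0.8}
OMICS_PREFIXES = ("CNV", "RNAseq", "Prot")


def sim_folder(run):
    """Relative folder for a simulation run (assumed layout: ptN/simNNN)."""
    if not 1 <= run <= 100:
        raise ValueError("simulation runs are numbered 1 to 100")
    part = "pt1" if run <= 50 else "pt2"
    return PurePosixPath(part) / f"sim{run:03d}"


def indicator_file(run):
    """Path of the indicator-matrix list for a simulation run."""
    return sim_folder(run) / f"indicatorMatrices{run:03d}_ls.RDS"


def data_file(run, omics, partition, delta):
    """Path of one modified data matrix.

    `delta` is inserted verbatim because its formatting in the real
    file names is an assumption left to the caller.
    """
    if omics not in OMICS_PREFIXES:
        raise ValueError(f"omics must be one of {OMICS_PREFIXES}")
    if partition not in PARTITION_PROPORTION:
        raise ValueError("partition must be 1, 2, 3 or 4")
    return sim_folder(run) / f"{omics}_partition{partition}_delta{delta}.RDS"
```

For example, `indicator_file(7)` yields `pt1/sim007/indicatorMatrices007_ls.RDS` under these assumptions. The files themselves are R serialized objects, so they would still need to be read with R's `readRDS` or an equivalent reader.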
Supplemental Files
The file "cluster_pathway_collection_20201117.gmt" is the collection of gene sets used for the simulation study in Gene Matrix Transpose format. Scripts to create and analyze these data sets are available at: https://github.com/TransBioInfoLab/pathwayMultiomics_manuscript_supplement
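In the Gene Matrix Transpose (GMT) format, each line describes one gene set as tab-separated fields: the set name, a description, and then the member genes. A minimal Python reader might look like the sketch below. The function name and the example gene sets are hypothetical, and the sketch has not been run against the actual collection file.

```python
def parse_gmt(lines):
    """Parse GMT-formatted lines into a dict of gene set name -> gene list.

    Each non-empty line is expected to hold tab-separated fields:
    name, description, gene 1, gene 2, ...
    """
    gene_sets = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(f"malformed GMT line: {line!r}")
        name, _description, *genes = fields
        # Drop empty fields left by trailing tabs.
        gene_sets[name] = [gene for gene in genes if gene]
    return gene_sets


# Hypothetical example content, not taken from the real collection.
example = [
    "SET_A\tillustrative description\tGENE1\tGENE2\tGENE3\n",
    "SET_B\tna\tGENE2\tGENE4\n",
]
parsed = parse_gmt(example)
```

To read a file on disk, the same function can be given an open file handle, for example `with open(path) as handle: parse_gmt(handle)`.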

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
CMO is a gene-level association test that can identify many significant and novel genes ignored by many benchmark methods. Specifically, CMO integrates genetically regulated DNAm in enhancers, promoters, and the gene body to identify additional disease-associated genes. This repo contains the necessary models for the CMO test.
The corresponding software: https://github.com/ChongWuLab/CMO
Thank you for using this software! Let me (cwu3@fsu.edu) know if you have any questions!

CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
All generated networks and analysis for the paper with the same title.

Objective
Metabolic signatures have emerged as valuable signaling molecules in the biochemical process of type 2 diabetes (T2D). To summarize and identify metabolic biomarkers in T2D, we performed a systematic review and meta-analysis of the associations between metabolites and T2D using high-throughput metabolomics techniques.
Methods
We searched for relevant studies in MEDLINE (PubMed), Embase, Web of Science, and the Cochrane Library, as well as Chinese databases (Wanfang, Vip, and CNKI), from inception through 31 December 2018. Meta-analysis was conducted using STATA 14.0 under a random-effects model. In addition, bioinformatic analysis was performed with MetaboAnalyst and R 3.5.2 to explore molecular mechanisms.
Results
In total, 46 articles were included in this review, covering metabolites that included amino acids, acylcarnitines, lipids, carbohydrates, organic acids, and others. Results of meta-analysis in prospective studies indicated that isoleucine, leucine, valine, tyrosine, phenylalanine, glutamate, alanine, v...

The primary aim of this study is to evaluate the effect of transient knockdown of P53 as a tool to increase the efficiency of a non-integrative methodology for reprogramming adult human normal dermal fibroblasts. This study demonstrates that transient knockdown of P53 is an efficient way to produce iPSC containing minimal genomic alterations, which meets the increased demand for iPSC in personalized drug screening campaigns. Total RNA was isolated from 3 iPS cell lines generated without P53 knockdown and 3 generated with P53 knockdown. In addition, total RNA was isolated from the parental normal human dermal fibroblasts and from a reference human iPS cell line from Systembio (SBI).

Additional file 2: Table S2. Running time and memory comparison between MAESTRO and other tools for scATAC-seq analysis.

https://rightsstatements.org/vocab/UND/1.0/
Purpose: To investigate the sex-dependence of the liver transcriptome in Diversity Outbred (DO)-F1 mice. Methods: Total RNA was extracted from snap-frozen liver using the miRVana total RNA isolation kit (Thermo Fisher Scientific, Waltham, MA, USA) according to the manufacturer’s protocol. The quality and amount of liver RNA were evaluated using a Bioanalyzer (Agilent, Inc., Santa Clara, CA). The average RNA-integrity score for 162 DO-F1 liver samples was 9.01 ± 0.4. RNA samples from 85 females and 77 males were submitted to the UC Davis DNA Technologies Core at the Genome Center. The RNA-seq libraries were constructed from 1 µg total RNA after poly-A library preparation. To minimize technical variability, all samples were assigned to each lane and the pooled libraries were sequenced on two lanes of the Illumina NovaSeq 6000 (Illumina Inc., San Diego, CA, USA) to achieve at least 25 million 150 bp paired-end reads. Only R1 was used in the analysis and only R1 was submitted. Results: Our results demonstrate the tremendous effects of sex on hepatic gene expression. In support of this, genetic loci associated with the transcripts frequently showed sex specificity. We revealed sex-specific candidate genes that were mapped to the quantitative trait loci for aortic lesion area and whose expression was regulated locally via global liver transcriptome. Conclusions: Our study provides a valuable data resource to the research community and shows that liver transcriptomic analysis identified diet- or strain-specific pathways to pathogenesis of metabolic syndrome. Overall design: Liver mRNA profiles of 24-week-old Diversity Outbred-F1 mice

Background/Aims: Chronic kidney disease (CKD) is a worldwide public health problem. Regardless of the underlying primary disease, CKD tends to progress to end-stage kidney disease, resulting in unsatisfactory and costly treatment. Its common pathogenesis, however, remains unclear. The aim of this study was to provide an unbiased catalog of common gene-expression changes of CKD and reveal the underlying molecular mechanism using an integrative bioinformatics approach. Methods: We systematically collected over 250 Affymetrix microarray datasets from the glomerular and tubulointerstitial compartments of healthy renal tissues and those with various types of established CKD (diabetic kidney disease, hypertensive nephropathy, and glomerular nephropathy). Then, using stringent bioinformatics analysis, shared differentially expressed genes (DEGs) of CKD were obtained. These shared DEGs were further analyzed by gene ontology (GO) and pathway enrichment analysis. Finally, protein-protein interaction networks (PINs) were constructed to further refine our results. Results: Our analysis identified 176 and 50 shared DEGs in diseased glomeruli and tubules, respectively, including many transcripts that have not been previously reported to be involved in kidney disease. Enrichment analysis also showed that the glomerular and tubulointerstitial compartments underwent a wide range of unique pathological changes during chronic injury. As revealed by the GO enrichment analysis, shared DEGs in glomeruli were significantly enriched in exosomes. By constructing PINs, we identified several hub genes (e.g. OAS1, JUN, and FOS) and clusters that might play key roles in regulating the development of CKD. Conclusion: Our study not only further reveals the unifying molecular mechanism of CKD pathogenesis but also provides a valuable resource of potential biomarkers and therapeutic targets.