Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Metabolomics data analysis depends on bioinformatics tools. To meet the evolving needs of metabolomics research, several integrated platforms have been developed. Our group previously developed IP4M (integrated Platform for Metabolomics Data Analysis), a desktop platform that allows users to perform a nearly complete metabolomics data analysis in one stop. As IP4M became widely used, users worldwide increasingly requested a web version and a more customizable workflow. We therefore developed iMAP (integrated Metabolomics Analysis Platform) with extended functions, improved performance, and a redesigned structure. Compared with existing platforms, iMAP offers more methods and usage modes. One new module provides an automatic pipeline for train-test set separation, feature selection, and predictive model construction and validation. Another new module offers extensive editable parameters for network construction, visualization, and analysis. In addition, many plotting tools have been upgraded to produce highly customizable, publication-ready figures. Overall, iMAP is a good alternative to existing metabolomics data analysis platforms, with complementary functions. iMAP is freely available for academic use at https://imap.metaboprofile.cloud/ (License MPL 2.0).
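The automatic pipeline described above chains a train-test split, feature selection, and predictive model construction and validation. The dependency-free sketch below illustrates that sequence. It is not iMAP's implementation. The stratified split, the scaled mean-difference filter, and the nearest-centroid classifier are stand-ins chosen for brevity, and all function names are hypothetical. The design point it shows is that feature selection is fitted on the training rows only, so the held-out accuracy is not inflated by information from the test set.

```python
import random
import statistics


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Hold out roughly `test_fraction` of the samples of each class for testing."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_fraction))
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]
    return train_idx, test_idx


def select_features(samples, labels, idx, k):
    """Two-class filter: keep the k features with the largest absolute class-mean
    difference, scaled by the feature's overall SD, using training rows only."""
    cls_a, cls_b = sorted(set(labels[i] for i in idx))
    scores = []
    for j in range(len(samples[0])):
        a = [samples[i][j] for i in idx if labels[i] == cls_a]
        b = [samples[i][j] for i in idx if labels[i] == cls_b]
        sd = statistics.pstdev(a + b) or 1e-12  # avoid division by zero
        scores.append((abs(statistics.mean(a) - statistics.mean(b)) / sd, j))
    return [j for _, j in sorted(scores, reverse=True)[:k]]


def fit_centroids(samples, labels, idx, features):
    """Nearest-centroid model: one mean profile per class over the selected features."""
    centroids = {}
    for cls in sorted(set(labels[i] for i in idx)):
        rows = [samples[i] for i in idx if labels[i] == cls]
        centroids[cls] = [statistics.mean(r[j] for r in rows) for j in features]
    return centroids


def predict(centroids, row, features):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    x = [row[j] for j in features]
    return min(centroids, key=lambda c: sum((xi - ci) ** 2 for xi, ci in zip(x, centroids[c])))


def run_pipeline(samples, labels, k=2, test_fraction=0.3, seed=0):
    """Split, select features on training data, fit, then report test accuracy."""
    train_idx, test_idx = stratified_split(labels, test_fraction, seed)
    features = select_features(samples, labels, train_idx, k)
    centroids = fit_centroids(samples, labels, train_idx, features)
    correct = sum(predict(centroids, samples[i], features) == labels[i] for i in test_idx)
    return features, correct / len(test_idx)
```

A real metabolomics workflow would usually add cross-validation and a stronger classifier. The ordering of the steps is the part this sketch is meant to convey.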
The National Center for Integrative Biomedical Informatics (NCIBI) was founded in October 2005 and is one of seven National Centers for Biomedical Computing (NCBC) in the NIH Roadmap. The Center develops conceptual models, computational infrastructure, an integrated knowledge repository, and query and analysis tools that enable scientists to effectively access and integrate the wealth of biological data. NCIBI is based at the University of Michigan as part of the Center for Computational Medicine and Biology (CCMB). NCIBI is composed of biomedical researchers, computational biologists, computer scientists, developers, and human-computer interaction specialists organized into seven major core functions. They work in interdisciplinary teams to collectively develop tools that are not only computationally powerful but also biologically relevant and meaningful. The four initial Driving Biological Projects (prostate cancer progression, type 1 and type 2 diabetes, and bipolar disorder) provide the nucleation point from which tool development is informed, launched, and tested. In addition to testing tools for function, a separate team is dedicated to testing usability and user interaction, a unique feature of this Center. Once tools are developed and validated, the Center's goal is to share and disseminate data and software throughout the research community, both internally and externally. This is achieved through various mechanisms such as training videos, tutorials, demonstrations, and presentations at national and international scientific conferences. NCIBI is supported by NIH Grant # U54-DA021519.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Comparison with existing tools for analysis and interpretation of metabolomic data.
Using multimodal mass spectrometry imaging (MSI) combined with machine learning, this study investigates the molecular heterogeneity of amyloid-β (Aβ) plaques and their associated lipid profiles in post-mortem brain samples from individuals with Alzheimer’s disease (AD) and from amyloid-positive cognitively unaffected (AP-CU) individuals. Our analytical approach enabled investigation at the level of single plaques and revealed distinct plaque populations characterized by differential Aβ and lipid compositions. Notably, integrating MSI data with machine-learning-based feature extraction identified Aβ38 and ganglioside GM1 as significant molecular markers differentiating AD from AP-CU pathology. These findings suggest that heterogeneity in Aβ metabolism and lipid homeostasis, as revealed by this precise single-plaque analysis, is a key factor in AD pathogenesis. They also imply that total amyloid burden alone is an insufficient marker of the disease. Applying MSI and machine-learning-based feature extraction in this context exemplifies an advanced analytical strategy for unraveling complex biochemical phenomena. It offers potential routes to refined diagnostic tools and a deeper understanding of neurodegenerative diseases from an analytical chemistry perspective.
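The abstract does not say which machine learning method separated the plaque populations. As a purely illustrative sketch, k-means clustering of per-plaque intensity profiles shows how discrete plaque subtypes could be recovered from MSI-derived feature vectors. The choice of k-means and the features are assumptions; the features could be, for example, relative signals of Aβ species and lipids. The farthest-point seeding is a hypothetical choice that keeps the result deterministic.

```python
import math


def _farthest_point_init(points, k):
    """Deterministic seeding: start from the first point, then repeatedly add
    the point farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    centroids = _farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:  # assignments stable, so the algorithm has converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

Each point here would be one plaque. For example, a two-feature profile such as (hypothetical Aβ38 signal, hypothetical GM1 signal) would yield a cluster label per plaque.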
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Technologies for profiling samples using different omics platforms have been at the forefront since the Human Genome Project. Large-scale multi-omics data hold the promise of deciphering different regulatory layers. Yet, while there is a myriad of bioinformatics tools, each multi-omics analysis appears to start from scratch, with an arbitrary decision over which tools to use and how to combine them. There is therefore an unmet need to conceptualize how to integrate such data and to implement and validate pipelines for different cases. We have designed a conceptual framework (STATegra), intended to be as generic as possible for multi-omics analysis, that combines available multi-omics analysis tools in a step-wise manner. These tools are machine learning component analysis, non-parametric data combination, and a multi-omics exploratory analysis. Although we have previously combined these integrative tools in several studies, here we provide a systematic description of the STATegra framework and its validation using two case studies from The Cancer Genome Atlas (TCGA). For both the glioblastoma and the skin cutaneous melanoma (SKCM) cases, we demonstrate that the framework, beyond the individual tools, has an enhanced capacity to identify features and pathways compared with single-omics analysis. Such an integrative multi-omics analysis framework for identifying features and components facilitates the discovery of new biology. Finally, we provide several options for applying the STATegra framework when parametric assumptions are fulfilled and when not all samples are profiled for all omics. The STATegra framework is built from several tools, which are being integrated step by step as open source in the STATegRa Bioconductor package.1
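Non-parametric data combination merges per-omics evidence for the same feature into a single test. Below is a minimal sketch of Fisher's combining function, one rule commonly used in this setting. It is an assumption that STATegRa uses this exact rule. The sketch also takes its null distribution from the chi-squared approximation, which is valid for independent p-values, instead of from permutations as a fully non-parametric combination would.

```python
import math


def fisher_combine(pvalues):
    """Fisher's combining function X = -2 * sum(ln p_i).

    Under independence, X follows a chi-squared distribution with 2k degrees
    of freedom. Returns (X, combined p-value).
    """
    if not pvalues or any(not 0.0 < p <= 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(pvalues)
    x = -2.0 * sum(math.log(p) for p in pvalues)
    # Closed-form chi-squared survival function for even degrees of freedom (2k):
    # P(X > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return x, math.exp(-half) * total
```

For instance, one hypothetical feature with p = 0.05 in both transcriptomics and methylation combines to a p-value of roughly 0.017.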
The salt acclimation process of the euryhaline model cyanobacterium Synechocystis sp. PCC 6803 was analyzed by combining transcriptomic, proteomic and metabolomic methods. The comparison of salt-induced proteome and transcriptome changes revealed that most stably up-regulated proteins also showed elevated mRNA levels. The Pearson correlation coefficient for salt-induced abundance changes of 1749 transcript/protein pairs was r = 0.58. In addition to the rapid and stable upregulation of compatible solute biochemistry, a dynamic reorganization of the transcriptome occurred during the first hours after salt shock, which probably involves the action of small regulatory RNAs. Based on these data, an extended salt stimulon can be defined comprising many proteins directly or indirectly related to compatible solute metabolism, ion and water movements as well as a defined set of small regulatory RNAs. Our comprehensive data set provides the basis for future attempts to engineer cyanobacterial salt tolerance and to search for processes regulating this important environmental acclimation process.
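The reported r = 0.58 is a Pearson correlation between salt-induced abundance changes of transcripts and their corresponding proteins. A minimal sketch of that computation follows. The inputs are assumed to be paired fold changes for each transcript/protein pair, for example log2 ratios of salt versus control. The toy values in the test are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```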
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Pathway Multi-Omics Simulated Data
These are synthetic variations of the TCGA COADREAD data set (original data available at http://linkedomics.org/data_download/TCGA-COADREAD/). This data set is used as a comprehensive benchmark data set to compare multi-omics tools in the manuscript "pathwayMultiomics: An R package for efficient integrative analysis of multi-omics datasets with matched or un-matched samples".
There are 100 sets of random modifications to centred and scaled copy number, gene expression, and proteomics data, saved as compressed data files for the R programming language. The sets are stored as 100 sub-folders labelled "sim001", "sim002", ..., "sim100"; the first 50 are in "pt1" and the second 50 in "pt2". Each folder contains the following files:
1) "indicatorMatricesXXX_ls.RDS" is a list of simple triplet matrices showing which genes (in which pathways) and which samples received the synthetic treatment, where XXX is the simulation run label (001, 002, ...).
2) "CNV_partitionA_deltaB.RDS" is the synthetically modified copy number variation data. A represents the proportion of genes in each gene set that received the synthetic treatment (partition 1 is 20%, 2 is 40%, 3 is 60%, and 4 is 80%), and B is the signal strength in units of standard deviations.
3) "RNAseq_partitionA_deltaB.RDS" is the synthetically modified gene expression data (same parameter legend as CNV).
4) "Prot_partitionA_deltaB.RDS" is the synthetically modified protein expression data (same parameter legend as CNV).
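The naming scheme above is regular enough to generate file paths programmatically. The sketch below assumes two details that are not spelled out above. First, each pt folder directly contains its simulation sub-folders (pt1/sim001, ..., pt2/sim100). Second, the signal strength B is supplied as the exact string used in the file names. The delta value in the test is hypothetical. The .RDS files themselves would typically be read in R with readRDS().

```python
from pathlib import PurePosixPath

# Partition code A -> proportion of genes per gene set that received the treatment.
PARTITION_PROPORTION = {1: 0.20, 2: 0.40, 3: 0.60, 4: 0.80}


def sim_folder(run: int) -> PurePosixPath:
    """Runs 1-50 live under pt1/, runs 51-100 under pt2/ (assumed nesting)."""
    if not 1 <= run <= 100:
        raise ValueError("run must be between 1 and 100")
    part = "pt1" if run <= 50 else "pt2"
    return PurePosixPath(part) / f"sim{run:03d}"


def sim_files(run: int, partition: int, delta: str) -> dict:
    """Expected file paths for one run; `delta` is used verbatim as it appears in file names."""
    if partition not in PARTITION_PROPORTION:
        raise ValueError("partition must be 1, 2, 3 or 4")
    folder = sim_folder(run)
    suffix = f"partition{partition}_delta{delta}.RDS"
    return {
        "indicator": folder / f"indicatorMatrices{run:03d}_ls.RDS",
        "CNV": folder / f"CNV_{suffix}",
        "RNAseq": folder / f"RNAseq_{suffix}",
        "Prot": folder / f"Prot_{suffix}",
    }
```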
Supplemental Files
The file "cluster_pathway_collection_20201117.gmt" is the collection of gene sets used for the simulation study, in Gene Matrix Transpose format. Scripts to create and analyze these data sets are available at: https://github.com/TransBioInfoLab/pathwayMultiomics_manuscript_supplement
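In the Gene Matrix Transpose format, each tab-separated line holds a gene-set name, a description, and then the member genes. Below is a minimal sketch of a parser for such a gene-set collection. The gene sets in the test are hypothetical. Blank fields, for example from trailing tabs, are simply dropped.

```python
def parse_gmt(lines):
    """Parse GMT lines into {gene_set_name: [genes, ...]}.

    Each non-blank line: name <TAB> description <TAB> gene1 <TAB> gene2 ...
    """
    gene_sets = {}
    for line_no, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(f"line {line_no}: expected name, description and at least one gene")
        name, _description, *genes = fields
        gene_sets[name] = [g for g in genes if g]  # drop empty fields from trailing tabs
    return gene_sets
```

For example, `with open("cluster_pathway_collection_20201117.gmt") as fh: gene_sets = parse_gmt(fh)` would map each gene set in the collection to its genes.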
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This dataset contains p-values and statistical significance data derived from analyzing various metabolic and dietary states in mice. The data supports research investigating the effects of diet and metabolic conditions on localized variables in specific regions of mice. The files included are:
Data Collection Methods
The data were collected by analyzing correlations between variables within localized regions of the mice. These variables were consistent within individuals but varied with dietary or metabolic state. Data collection involved the following steps:
1. Selection of experimental groups based on dietary and metabolic conditions.
2. Quantitative measurement of specific variables in localized regions of mice.
3. Statistical analysis to determine the significance of correlations across the groups.
Data Generation and Processing
1. Generation: Measurements were obtained through laboratory analysis using standardized protocols for each dietary/metabolic condition.
2. Processing:
- Statistical tests (e.g., t-tests, ANOVA) were performed to identify significant correlations.
- P-values were computed to quantify the significance of the observed relationships.
- Data were compiled into Excel sheets for organization and clarity.
Technical and Non-Technical Information
- Technical Details: Each file contains tabular data with headers indicating the variable pairs analyzed, their respective p-values, and the significance level (e.g., p<0.05, p<0.01).
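The significance-level column described above can be derived from the p-values with a simple tiering rule. This sketch assumes the two cut-offs quoted (0.05 and 0.01). It labels non-significant results "n.s."; that label is an assumption and not necessarily what the files contain.

```python
def significance_label(p, thresholds=(0.01, 0.05)):
    """Return the strictest threshold the p-value passes (e.g. 'p<0.01'), else 'n.s.'."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    for t in sorted(thresholds):  # check the strictest cut-off first
        if p < t:
            return f"p<{t}"
    return "n.s."
```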
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Modern DNA sequencing technologies enable geneticists to rapidly identify genetic variation among many human genomes. However, isolating the minority of variants underlying disease remains an important yet formidable challenge for medical genetics. We have developed GEMINI (GEnome MINIng), a flexible software package for exploring all forms of human genetic variation. Unlike existing tools, GEMINI integrates genetic variation with a diverse and adaptable set of genome annotations (e.g., dbSNP, ENCODE, UCSC, ClinVar, KEGG) into a unified database to facilitate interpretation and data exploration. Whereas other methods provide an inflexible set of variant filters or prioritization methods, GEMINI allows researchers to compose complex queries based on sample genotypes, inheritance patterns, and both pre-installed and custom genome annotations. GEMINI also provides methods for ad hoc queries and data exploration, a simple programming interface for custom analyses that leverage the underlying database, and both command-line and graphical tools for common analyses. We demonstrate GEMINI's utility for exploring variation in personal genomes and family-based genetic studies, and illustrate its ability to scale to studies involving thousands of human samples. GEMINI is designed for reproducibility and flexibility, and our goal is to provide researchers with a standard framework for medical genomics.
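GEMINI loads variants and annotations into a database, so filtering becomes a composable query rather than a fixed set of options. The sketch below illustrates that idea with Python's built-in sqlite3 module. The table layout, column names, and rows are hypothetical. They do not reproduce GEMINI's actual schema or its command-line interface.

```python
import sqlite3


def rare_high_impact(rows, max_aaf=0.01):
    """Load toy variant rows into an in-memory SQLite table and filter them with SQL.

    Hypothetical columns: chrom, start, gene, impact, aaf (alternate allele frequency).
    """
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE variants (chrom TEXT, start INTEGER, gene TEXT, impact TEXT, aaf REAL)"
        )
        con.executemany("INSERT INTO variants VALUES (?, ?, ?, ?, ?)", rows)
        cur = con.execute(
            "SELECT chrom, start, gene FROM variants "
            "WHERE impact = 'HIGH' AND aaf < ? ORDER BY chrom, start",
            (max_aaf,),
        )
        return cur.fetchall()
    finally:
        con.close()
```

Adding a genotype or inheritance condition would mean adding tables and joins, which is the flexibility the database design buys over hard-coded filters.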
https://spdx.org/licenses/CC0-1.0.html
Following an emerging understanding of the diversification process, many recent workers have considered infraspecific taxa valuable for formally recognizing incompletely speciated entities. The distinction between a species and an infraspecific taxon represents a fundamentally subjective weighting of evidence, which further underscores the need for an evidential basis for these decisions. We explore these concepts in Heuchera longiflora (Saxifragaceae), which is morphologically variable and has a disjunct range across several physiographic provinces in the eastern U.S. Our approach combines tiered sampling with Sanger and next-generation sequencing (NGS) techniques. We surveyed 56 populations with seven markers to investigate population structure. For a subset of 12 representative populations, we sequenced 277 nuclear markers to characterize gene tree discord. Using a variety of methods to overcome high levels of gene tree conflict, we find evidence for the traditionally recognized taxa H. longiflora var. longiflora and H. longiflora var. aceroides in both molecular and morphological data partitions. Our results provide a case study in the use of multiple sources of data and analytical methods to delimit infraspecific taxa.
Castro et al - Pedigreed Longshanks Phenotypic Data
Phenotypic data from the Longshanks selection experiment, consisting of the Ctrl and Longshanks replicates 1 and 2.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This repository contains the input and output files necessary to reproduce the case studies reported in the manuscript "MUUMI: an R package for statistical and network-based meta-analysis for MUlti-omics data Integration". MUUMI is an R package implementing network-based data integration and statistical meta-analysis within a single analytical framework. MUUMI allows the identification of robust molecular signatures through multiple meta-analytic methods, inference and analysis of molecular interactomes and the integration of multiple omics layers. The functionalities of MUUMI are showcased in two case studies in which we analysed 17 transcriptomic datasets on idiopathic pulmonary fibrosis (IPF) from both microarray and RNA-Seq platforms and multi-omics data of THP-1 macrophages exposed to different polarising stimuli. Part of the data reported in this repository derive from the Zenodo entry https://doi.org/10.5281/zenodo.10692129 (Curated and harmonised transcriptomics datasets of interstitial lung disease patients). Other data derive from the following publication: Migliaccio G, Morikka J, del Giudice G, Vaani M, Möbus L, Serra A, Federico A, Greco D. Methylation and transcriptomic profiling reveals short term and long term regulatory responses in polarized macrophages, Comp and Struct Biotech J, 2024(25), 143-152. doi: 10.1016/j.csbj.2024.08.018.
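Rank-based meta-analysis is one simple way to find signatures that are robust across many datasets, such as the 17 IPF transcriptomic datasets. The sketch below computes a rank product, the geometric mean of a gene's within-dataset ranks. It illustrates the general idea only; the source does not state that MUUMI uses this exact statistic. The gene names and scores in the test are hypothetical.

```python
import math


def rank_product(datasets):
    """datasets: list of {gene: score} dicts, where a higher score means stronger up-regulation.

    Returns (gene, rank product) pairs, most consistently top-ranked genes first.
    """
    genes = set(datasets[0])
    for d in datasets[1:]:
        genes &= set(d)  # keep genes measured in every dataset
    ranks = {g: [] for g in genes}
    for d in datasets:
        # Rank 1 = highest score; ties are broken alphabetically for determinism.
        for r, g in enumerate(sorted(genes, key=lambda name: (-d[name], name)), start=1):
            ranks[g].append(r)
    rp = {g: math.exp(sum(math.log(r) for r in rs) / len(rs)) for g, rs in ranks.items()}
    return sorted(rp.items(), key=lambda kv: (kv[1], kv[0]))
```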
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
CMO is a gene-level association test that can identify significant and novel genes missed by many benchmark methods. Specifically, CMO integrates genetically regulated DNA methylation (DNAm) in enhancers, promoters, and the gene body to identify additional disease-associated genes. This repository contains the models needed to run the CMO test.
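A gene-level test that draws on several regulatory regions must combine region-level evidence into a single p-value. The description above does not say how CMO does this. The following is therefore only a sketch under an assumption: it shows one widely used option, the Cauchy combination test (ACAT), which is designed to remain approximately valid when the combined p-values are correlated. Tests on enhancer, promoter, and gene-body signals of the same gene often are correlated.

```python
import math


def cauchy_combination(pvalues, weights=None):
    """Combine p-values with the Cauchy combination test (ACAT)."""
    if not pvalues or any(not 0.0 < p < 1.0 for p in pvalues):
        raise ValueError("p-values must lie strictly between 0 and 1")
    if weights is None:
        weights = [1.0] * len(pvalues)
    total = sum(weights)
    # Map each p-value to a standard Cauchy quantile and take the weighted average
    # (weights normalized to sum to 1).
    t = sum(w / total * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, pvalues))
    # Such an average is standard Cauchy under independence (approximately so in the
    # tail under dependence), so convert it back with the Cauchy survival function.
    return 0.5 - math.atan(t) / math.pi
```

The combined p-value is dominated by the smallest input. Three hypothetical region-level p-values of 0.001, 0.5, and 0.9 combine to about 0.003.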
The corresponding software: https://github.com/ChongWuLab/CMO
Thank you for using this software! Please contact me (cwu3@fsu.edu) if you have any questions!
The rapid increase in new nanomaterial products poses new challenges for their risk assessment. Current traditional methods for estimating the potential adverse health effects of nanomaterials (NMs) are complex, time-consuming, and expensive. To develop new prediction tests for nanotoxicity evaluation, a systems biology approach and data from high-throughput omics experiments can be used. We present a computational approach that combines reverse engineering techniques, network analysis, and pathway enrichment analysis to infer the transcriptional regulation landscape and interpret it functionally. To illustrate this approach, we used published transcriptomic data derived from mouse lung tissue exposed to carbon nanotubes (NM-401 and NRCWE-26). Because fibrosis is the most common adverse effect of these NMs, we also included data for bleomycin (BLM) treatment, a well-known fibrosis inducer. We inferred gene regulatory networks for each NM and BLM to captur...
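Pathway enrichment analysis, one component of the approach described above, is often based on a one-sided hypergeometric test of the overlap between a gene list and a pathway. The sketch below computes that tail probability exactly. Whether the authors used this particular test is an assumption.

```python
from math import comb


def hypergeom_enrichment_p(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric: N genes in the background, K of them in the
    pathway, n genes in the list of interest, k of those in the pathway."""
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("require 0 <= K <= N and 0 <= n <= N")
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), upper + 1))
    return hits / comb(N, n)
```

In practice, the resulting p-values for many pathways would then be adjusted for multiple testing.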
Objective
Metabolic signatures have emerged as valuable signaling molecules in the biochemical process of type 2 diabetes (T2D). To summarize and identify metabolic biomarkers in T2D, we performed a systematic review and meta-analysis of the associations between metabolites and T2D using high-throughput metabolomics techniques.
Methods
We searched MEDLINE (PubMed), Embase, Web of Science, and the Cochrane Library, as well as Chinese databases (Wanfang, Vip, and CNKI), for relevant studies from inception through 31 December 2018. Meta-analysis was conducted in STATA 14.0 using a random-effects model. In addition, bioinformatic analyses were performed with MetaboAnalyst and R 3.5.2 to explore the underlying molecular mechanisms.
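The methods specify a random-effects model but not the estimator. The DerSimonian-Laird method is a common default for random-effects pooling, so the sketch below uses it as an assumption. Study-level effects and their variances are pooled with weights 1/(v_i + tau^2), where tau^2 is the estimated between-study variance.

```python
import math


def dersimonian_laird(effects, variances):
    """Random-effects pooling with the DerSimonian-Laird estimate of tau^2.

    Returns (pooled effect, its standard error, tau^2).
    """
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two studies with matching variances")
    w = [1.0 / v for v in variances]  # fixed-effect (inverse-variance) weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)  # method-of-moments estimate, truncated at zero
    w_star = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2
```

When the studies agree (Q no larger than its degrees of freedom), tau^2 is zero and the result reduces to the fixed-effect estimate.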
Results
In total, 46 articles were included in this review, covering metabolites such as amino acids, acylcarnitines, lipids, carbohydrates, organic acids, and others. Meta-analysis of prospective studies indicated that isoleucine, leucine, valine, tyrosine, phenylalanine, glutamate, alanine, v...
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
General description:
Supplementary information belonging to the study "Deep Integrated Network Analysis – a tool to discover and characterize disease pathways in the liver".
Files:
1) Supplementary Figure 1 _ TLN .pdf
Contains the Tree-and-Leaf Network (TLN), in which the leaves have been classified according to Gene Ontology Biological Processes.
2) Supplementary Table 1 _ Datasets.xlsx
Contains the list of datasets included in Liver DINA Resource.
For each dataset the GEO series, title, taxonomy, and liver sample count are shown, as well as the classification of dataset condition.
3) Supplementary Table 2 _ Top1000 subset _gene interaction networks.xlsx
Contains the results from the analysis of the 1,000 gene-gene interactions with the highest statistical weight in the Liver DINA Resource.
4) Supplementary Table 3_ TLN modules.xlsx
Contains the classification of the leaves in the Liver DINA Resource Tree-and-Leaf Network (TLN).
Integrative taxonomy has emerged as a methodologically sound approach for discovering new taxa. By coupling different sources of information (e.g. molecular and morphological data) and by combining analytical approaches (e.g. multivariate analyses, phylogenetics), systematists have the tools to uncover cryptic diversity and draw more reliable species hypotheses. Integratively delimited taxa may be more likely to be formally described than taxa delimited with a single source of information, because more detailed species hypotheses can be drawn. Integrative taxonomy therefore has the potential to reduce the so-called “taxonomic impediment” (i.e. the gap between the number of taxa that remain to be described and the number of taxa that are formally named). However, bridging fields such as molecular systematics and morphological taxonomy may remain challenging, as it requires either training versatile scientists or establishing collaborations among scientists from these fie...
https://www.statsndata.org/how-to-order
The Integrative Health Coaching market has emerged as a transformative force in the realm of wellness and health management, combining traditional health practices with modern coaching techniques to foster holistic well-being. This approach emphasizes personalized care, addressing not just physical health but also e
The role of microRNAs (miRNAs) in multiple myeloma (MM) has yet to be fully elucidated. To identify miRNAs that are potentially deregulated in MM, we investigated those mapping within transcription units, based on evidence that intronic miRNAs are frequently coexpressed with their host genes. To this end, we monitored host transcript expression values in a panel of 20 human MM cell lines (HMCLs) profiled on Affymetrix U133A oligonucleotide microarrays and focused on transcripts whose expression varied significantly across the dataset. We identified transcripts specific to six miRNA host genes (CCPG1, GULP1, EVL, TACSTD1, MEST, and TNIK) whose average changes in expression varied at least 2-fold from the mean of the examined dataset. We evaluated the expression levels of the corresponding intronic miRNAs by quantitative real-time RT-PCR. There was a significant correlation between the expression levels of MEST, EVL, and GULP1 and those of the corresponding miRNAs miR-335, miR-342-3p, and miR-561, respectively. Genome-wide profiling of the 20 HMCLs with the Affymetrix GeneChip human mapping 250K array set indicated that the increased expression of the three host genes and their corresponding intronic miRNAs was not correlated with local copy number variations. Notably, these miRNAs and their host genes were overexpressed in a fraction of primary tumors relative to normal plasma cells; however, this overexpression was not correlated with known molecular myeloma groups. The predicted putative miRNA targets, as well as the transcriptional profiles associated with the primary tumors, suggest that MEST/miR-335 and EVL/miR-342-3p may play a role in plasma cell homing and/or interactions with the bone marrow microenvironment. Our data support the idea that intronic miRNAs and their host genes are co-regulated, and these findings may contribute to understanding their biological roles in cancer.
To our knowledge, this is the first evidence of deregulated miRNA expression in MM, providing insights that may lead to the identification of new biomarkers and altered molecular pathways of the disease.
Keywords: integrative genomic analysis of miR-335, miR-342, and miR-561 in multiple myeloma
This series of microarray experiments contains the genome-wide profiles of 20 HMCLs. 250 nanograms of genomic DNA were processed and, in accordance with the manufacturer's protocols, 90 micrograms of fragmented biotin-labelled DNA were hybridized on GeneChip® Human Mapping 250K NspI Arrays (Affymetrix Inc.). The arrays were scanned using the GeneChip® Scanner 3000 7G. The images were acquired using Affymetrix GeneChip® Operating Software (GCOS version 1.4). Copy number values for individual SNPs were extracted and converted from CEL files into signal intensities using GTYPE 4.1 and Affymetrix Copy Number Analysis Tool (CNAT 4.0.1) software. Raw data were extracted using the Hidden Markov Model, with the Genomic Smoothing window set to 0. After preprocessing, piecewise constant estimates of the underlying local DNA copy number (CN) variation were calculated using the DNAcopy Bioconductor package, which searches for optimal breakpoints using circular binary segmentation (CBS).
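Circular binary segmentation is too involved for a short example, but its core idea can be shown with plain, non-circular binary segmentation. That idea is to recursively split a copy number profile where the difference in segment means is most significant. This sketch is a simplification, not the DNAcopy algorithm. It assumes a known noise level `sigma` and a fixed `threshold` in place of a permutation-based reference distribution. The default threshold and minimum segment length are hypothetical.

```python
import math


def binary_segmentation(x, sigma=1.0, threshold=5.0, min_size=2):
    """Return piecewise-constant segments (start, end, mean) of a 1-D profile."""
    prefix = [0.0]
    for v in x:  # prefix sums give O(1) segment means
        prefix.append(prefix[-1] + v)

    def seg_mean(a, b):
        return (prefix[b] - prefix[a]) / (b - a)

    breakpoints = []

    def recurse(lo, hi):
        best_stat, best_s = 0.0, None
        # Try every split leaving at least min_size points on each side.
        for s in range(lo + min_size, hi - min_size + 1):
            nl, nr = s - lo, hi - s
            # Two-sample z-statistic for a change in mean at position s.
            stat = abs(seg_mean(lo, s) - seg_mean(s, hi)) / (sigma * math.sqrt(1 / nl + 1 / nr))
            if stat > best_stat:
                best_stat, best_s = stat, s
        if best_s is not None and best_stat > threshold:
            breakpoints.append(best_s)
            recurse(lo, best_s)
            recurse(best_s, hi)

    recurse(0, len(x))
    bounds = [0] + sorted(breakpoints) + [len(x)]
    return [(a, b, seg_mean(a, b)) for a, b in zip(bounds, bounds[1:])]
```

The limitation of this simplification shows up with a short raised segment in the middle of an otherwise flat profile. No single breakpoint separates such a segment well, so the first split may fail. For example, `[0.0] * 8 + [2.0] * 8 + [0.0] * 8` with `sigma=0.5` returns one segment. Testing for an internal segment directly is what the circular variant in CBS adds.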