Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Pfam domains and biological process GO categories for the four rhizobia strains. Predicted proteins related to multiple GO biological process categories are joined together with the pipe character. (XLSX 639 kb)
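Because proteins with several GO biological process categories are stored as one pipe-joined value, a reader of the table will usually need to split that value before counting categories. The following is a minimal sketch under stated assumptions: the row layout (protein identifier, category cell), the example identifiers, and the category names are hypothetical and are not taken from the XLSX file.

```python
from collections import Counter


def split_go_categories(cell: str) -> list[str]:
    """Split a pipe-joined GO category cell into individual category names."""
    return [part.strip() for part in cell.split("|") if part.strip()]


def count_categories(rows):
    """Count how many proteins are annotated with each category.

    rows is an iterable of (protein_id, category_cell) pairs (assumed layout).
    Each category is counted at most once per protein.
    """
    counts = Counter()
    for _protein_id, cell in rows:
        counts.update(set(split_go_categories(cell)))
    return counts


# Hypothetical example rows, for illustration only.
rows = [
    ("prot_1", "nitrogen fixation|transport"),
    ("prot_2", "transport"),
    ("prot_3", ""),
]
category_counts = count_categories(rows)
```

Empty cells yield no categories, so proteins without a GO annotation do not contribute to the counts.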
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Non-duplicated AMP precursor protein libraries from 7 databases of Cnidaria:
Db1 – 6 proteomes derived from sequenced genomes of Anthozoa
Db2 – 2 proteomes derived from sequenced genomes of Medusozoa
Db3 – 46 whole body/non-specific transcriptomes of Anthozoa
Db4 – 24 whole body/non-specific transcriptomes of Medusozoa
Db5 – 25 transcriptomes specific to the tentacles of Anthozoa
Db6 – 7 transcriptomes specific to the tentacles of Medusozoa
Db7 – 2 transcriptomes specific to the nematocysts of Anthozoa
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We present a tool for multi-omics data analysis that enables simultaneous visualization of up to four types of omics data on organism-scale metabolic network diagrams. The tool’s interactive web-based metabolic charts depict the metabolic reactions, pathways, and metabolites of a single organism as described in a metabolic pathway database for that organism; the charts are constructed using automated graphical layout algorithms. The multi-omics visualization facility paints each individual omics dataset onto a different “visual channel” of the metabolic-network diagram. For example, a transcriptomics dataset might be displayed by coloring the reaction arrows within the metabolic chart, while a companion proteomics dataset is displayed as reaction arrow thicknesses, and a complementary metabolomics dataset is displayed as metabolite node colors. Once the network diagrams are painted with omics data, semantic zooming provides more details within the diagram as the user zooms in. Datasets containing multiple time points can be displayed in an animated fashion. The tool will also graph data values for individual reactions or metabolites designated by the user. The user can interactively adjust the mapping from data value ranges to the displayed colors and thicknesses to provide more informative diagrams.
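The idea of painting each omics dataset onto its own visual channel, with an adjustable mapping from value ranges to colors and thicknesses, can be illustrated with a short sketch. This is not the tool's implementation: the function names, cutoffs, color names, widths, and reaction identifiers below are all hypothetical, chosen only to show how a binned mapping might work.

```python
from bisect import bisect_right


def make_binner(cutoffs, outputs):
    """Return a function that maps a data value to a display attribute.

    cutoffs must be sorted ascending; outputs needs one more entry than
    cutoffs. A value equal to a cutoff falls into the bin above it.
    """
    if list(cutoffs) != sorted(cutoffs):
        raise ValueError("cutoffs must be sorted in ascending order")
    if len(outputs) != len(cutoffs) + 1:
        raise ValueError("need exactly len(cutoffs) + 1 outputs")

    def binner(value):
        return outputs[bisect_right(cutoffs, value)]

    return binner


# Hypothetical channel mappings: transcript log-ratios -> arrow color,
# protein abundances -> arrow width.
color_for = make_binner([-1.0, 1.0], ["blue", "gray", "red"])
width_for = make_binner([10.0, 100.0], [1, 3, 6])


def paint_reactions(transcript_values, protein_values):
    """Combine two datasets into one style record per reaction."""
    shared = transcript_values.keys() & protein_values.keys()
    return {
        rxn: {
            "color": color_for(transcript_values[rxn]),
            "width": width_for(protein_values[rxn]),
        }
        for rxn in shared
    }


styles = paint_reactions(
    {"RXN-A": 2.3, "RXN-B": -0.2},
    {"RXN-A": 50.0, "RXN-B": 5.0},
)
```

Adjusting the mapping then amounts to calling `make_binner` again with different cutoffs or outputs, which mirrors the interactive adjustment described above.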
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Mangroves are the dominant flora of intertidal zones along tropical and subtropical coastlines around the world and offer important ecological and economic value. Recently, the genomes of mangroves have been decoded, and massive omics data have been generated and deposited in public databases. Reanalysis of multi-omics data can provide new biological insights not covered in the original studies. However, the computational resources required and the lack of bioinformatics skills among experimental researchers limit the effective use of the original data. To fill this gap, we uniformly processed 942 transcriptome datasets and 386 whole-genome sequencing datasets, and provided 13 reference genomes and 40 reference transcriptomes for 53 mangroves. Finally, we built an interactive web-based database platform, MangroveDB (https://github.com/Jasonxu0109/MangroveDB), which was designed to provide comprehensive gene expression datasets to facilitate their exploration and is equipped with several online analysis tools, including principal component analysis, differential gene expression analysis, tissue-specific gene expression analysis, and GO and KEGG enrichment analysis. MangroveDB not only provides query functions for gene annotation but also supports useful visualization functions for analysis results, such as volcano plots, heatmaps, dot plots, PCA plots, bubble plots, and population structure. In conclusion, MangroveDB is a valuable resource that helps the mangrove research community use the massive public omics datasets efficiently.
A panel of 60 human cancer cell lines used for screening anticancer drugs.
As an economically important crop, apple is one of the most cultivated fruit trees in temperate regions worldwide. Recently, a large number of high-quality transcriptomic and epigenomic datasets for apple were made available to the public, which could be helpful in inferring gene regulatory relationships and thus predicting gene function at the genome level. Through integration of the available apple genomic, transcriptomic, and epigenomic datasets, we constructed co-expression networks, identified functional modules, and predicted chromatin states. A total of 112 RNA-seq datasets were integrated to construct a global network and a conditional network (tissue-preferential network). Furthermore, a total of 1,076 functional modules with closely related gene sets were identified to assess the modularity of biological networks and further subjected to functional enrichment analysis. The results showed that the function of many modules was related to development, secondary metabolism, hormone response, and transcriptional regulation. Transcriptional regulation is closely related to epigenetic marks on chromatin. A total of 20 epigenomic datasets, which included ChIP-seq, DNase-seq, and DNA methylation analysis datasets, were integrated and used to classify chromatin states. Based on the ChromHMM algorithm, the genome was divided into 620,122 fragments, which were classified into 24 states according to the combination of epigenetic marks and enriched-feature regions. Finally, through the collaborative analysis of different omics datasets, the online database AppleMDO (http://bioinformatics.cau.edu.cn/AppleMDO/) was established for cross-referencing and the exploration of possible novel functions of apple genes. In addition, gene annotation information and functional support toolkits were also provided.
Our database might be convenient for researchers to develop insights into the function of genes related to important agronomic traits and might serve as a reference for other fruit trees.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the Multi-omics for Understanding Climate Change (MUCC) database, version 2.0.0. The current version is based on amplicon and metagenomic sequencing of wetland soils from Old Woman Creek (OWC), Prairie Pothole Region (PPR7 and PPR8), Jean Lafitte National Historical Park and Preserve (JLA), AmeriFlux site US-LA2 (LA2), Stordalen Mire (STM-fen and STM-bog), AmeriFlux site US-Twt (TWI), and Spruce and Peatland Responses Under Changing Environments (SPRUCE). Additionally, it includes metatranscriptome sequencing from OWC. In the future, it will be expanded to include more data from these sites and from additional wetlands.
OWC, PPR, JLA and LA2 data are deposited in NCBI Bioproject PRJNA1007388
Stordalen Mire MAGs are deposited in BioProject PRJNA386538
AmeriFlux site-ID US-Twt data are deposited in SRA under SRP003022, SRP010671, SRP010730, SRP010738, SRP010741, SRP010747, SRP010748, SRP010751, SRP010862, SRP010870, and SRP011309.
SPRUCE data are deposited in PRJNA638786 and PRJNA638601
Files and datasets included here:
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Through the developments of Omics technologies and dissemination of large-scale datasets, such as those from The Cancer Genome Atlas, Alzheimer’s Disease Neuroimaging Initiative, and Genotype-Tissue Expression, it is becoming increasingly possible to study complex biological processes and disease mechanisms more holistically. However, to obtain a comprehensive view of these complex systems, it is crucial to integrate data across various Omics modalities, and also leverage external knowledge available in biological databases. This review aims to provide an overview of multi-Omics data integration methods with different statistical approaches, focusing on unsupervised learning tasks, including disease onset prediction, biomarker discovery, disease subtyping, module discovery, and network/pathway analysis. We also briefly review feature selection methods, multi-Omics data sets, and resources/tools that constitute critical components for carrying out the integration.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
MLOmics: Cancer Multi-Omics Database for Machine Learning
Abstract
Framing the investigation of diverse cancers as a machine learning problem has recently shown significant potential in multi-omics analysis and cancer research. Empowering these successful machine learning models are the high-quality training datasets with sufficient data volume and adequate preprocessing. However, while there exist several public data portals including The Cancer Genome Atlas (TCGA)… See the full description on the dataset page: https://huggingface.co/datasets/AIBIC/MLOmics.
https://enanomapper.adma.ai/about/omics
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
omics metadata project data: Nanosafety-relevant omics data - a database covering metadata for transcriptomics, proteomics and microRNA expression data relevant to safety assessment analyses of nanomaterials
Newt-omics is a database that enables researchers to locate, retrieve and store data sets dedicated to the molecular characterization of newts. Newt-omics is a transcript-centered database based on an Expressed Sequence Tag (EST) data set from the newt, covering ~50,000 Sanger-sequenced transcripts and a set of high-density microarray data generated from regenerating hearts. Newt-omics also contains a large set of peptides identified by mass spectrometry, which was used to validate 13,810 ESTs as true protein coding. Newt-omics can incorporate additional high-throughput data sets without changes to the database structure. Via a user-friendly interface, Newt-omics allows access to a huge set of molecular data without the need for prior bioinformatics expertise. The newt Notophthalmus viridescens is the master of regeneration. This organism has been known for more than 200 years for its exceptional regenerative capabilities. Newts can completely replace lost appendages such as limbs and tail, lens and retina, and parts of the central nervous system. Moreover, after cardiac injury newts can rebuild functional myocardium with no scar formation. To date, only very limited information is available from public databases. Newt-omics aims to provide a comprehensive platform of genes expressed during tissue regeneration, including extensive annotations, expression data and experimentally verified peptide sequences that as yet have no homology to other publicly available gene sequences. The goal is to obtain a detailed understanding of the molecular processes underlying tissue regeneration in the newt, which may lead to the development of approaches that efficiently stimulate regenerative pathways in mammals.
* Number of contigs: 26594
* Number of ESTs in contigs: 48537
* Number of transcripts with verified peptide: 5291
* Number of peptides: 15169
Biodiversity changes due to human activities highlight the need for efficient biodiversity monitoring approaches. Environmental DNA (eDNA) metabarcoding offers a non-invasive method for biodiversity monitoring and ecosystem assessment, but its accuracy depends on comprehensive DNA reference databases. Natural history collections often contain rare or difficult-to-obtain samples that can serve as a valuable resource to fill gaps in eDNA reference databases. Here, we discuss the utility of specimens from natural history collections in supporting future eDNA applications. Museomics, the application of -omics techniques to museum specimens, offers a promising avenue for improving eDNA reference databases by increasing species coverage. Furthermore, museomics can provide transferable methodological advancements for extracting genetic material from samples with low and degraded DNA. The integration of natural history collections, museomics, and eDNA approaches has the potential to signific...
Dataset for analyzing the potential of museum specimens to improve the DNA reference database
To examine the cumulative number of species sequenced for a given DNA barcode/mitochondrial genome (also referred to as mitogenome) over the years, we retrieved all data available from NCBI using the R package rentrez v1.2.3 (Winter 2017). We searched the nucleotide database for the rRNA 12S, rRNA 16S, rRNA 18S, cytochrome B (cytB), and cytochrome oxidase I (COI) barcodes, as well as for the complete mitogenomes, for all fish orders. In addition, we retrieved all the fish species with available data in the Sequence Read Archive (SRA) using Entrez Direct (Kans 2024), which provides access to the NCBI databases from a Unix terminal window.
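The cumulative count described above can be sketched as follows. This is only an illustration of the counting step, written in Python rather than the R used by the authors, and it does not query NCBI: the input format (one species name and submission year per sequence record) and the example records are assumptions made for this sketch.

```python
from collections import Counter


def cumulative_species_by_year(records):
    """Return {year: number of distinct species sequenced up to that year}.

    records is an iterable of (species, year) pairs, one per sequence record
    (assumed format). A species is counted from the first year it appears.
    """
    first_year = {}
    for species, year in records:
        if species not in first_year or year < first_year[species]:
            first_year[species] = year
    if not first_year:
        return {}

    new_per_year = Counter(first_year.values())
    cumulative = {}
    total = 0
    # Fill in every year of the range so that plots have no gaps.
    for year in range(min(new_per_year), max(new_per_year) + 1):
        total += new_per_year.get(year, 0)
        cumulative[year] = total
    return cumulative


# Hypothetical records for one barcode, for illustration only.
records = [
    ("Salmo trutta", 2003),
    ("Salmo trutta", 2010),
    ("Gadus morhua", 2005),
    ("Esox lucius", 2003),
]
coverage = cumulative_species_by_year(records)
```

Repeated records for the same species do not inflate the total, since only the earliest year per species is kept.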
To highlight the potential of museum specimens for increasing the number of species with an available barcode/mitogenome sequence, we first downloaded all available datasets on the Global Biodiversity Information Facility (GBIF) listing fish specimens store...
Unlocking natural history collections to improve eDNA reference databases and biodiversity monitoring
The dataset consists of a main folder, data.zip.
Various
barcodes_data
output from the cumul_barcodes_plot.R script.
occurence_data
contains different types of species lists (museum, 12S availability, etc.)
museum_potential/1_process_gbif_datasets.R. Contains all the species of fish found in the main natural ...,
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the Multi-omics for Understanding Climate Change (MUCC) database, version 1.0.0. The current version is based on metagenomic and metatranscriptomic sequencing of Old Woman Creek wetland soils, but will be expanded in the future to include data from additional wetlands. Files and datasets included here:
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Additional file 3:
Populus trichocarpa (P. trichocarpa) is a model tree for the investigation of wood formation. In recent years, researchers have generated a large amount of high-throughput sequencing data in P. trichocarpa. However, no comprehensive database that provides multi-omics associations for the investigation of secondary growth in response to diverse stresses has been reported. Therefore, we developed a public repository that presents comprehensive measurements of gene expression and post-transcriptional regulation by integrating 144 RNA-Seq, 33 ChIP-seq, and six single-molecule real-time (SMRT) isoform sequencing (Iso-seq) libraries prepared from tissues subjected to different stresses. All the samples from different studies were analyzed using unified parameters to obtain gene expression, co-expression networks, and differentially expressed genes (DEGs), which allowed comparison of results from different studies and treatments. In addition to gene expression, we also identified and deposited pre-processed data on alternative splicing (AS), alternative polyadenylation (APA) and alternative transcription initiation (ATI). The post-transcriptional regulation, differential expression, and co-expression network datasets were integrated into a new P. trichocarpa Stem Differentiating Xylem (PSDX) database, which further highlights gene families of RNA-binding proteins and stress-related genes. PSDX also provides tools for data query and visualization, a genome browser, and a BLAST option for sequence-based queries. Much of the data is also available for bulk download. PSDX supports research on secondary growth in response to stresses in P. trichocarpa and may provide new insights useful for improving stress tolerance in woody plants.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Mining cancer-omics databases deepens our understanding of cancer biology and can lead to potential breakthroughs in cancer treatment. Here, we propose an integrative analytical approach to reveal across-cancer expression patterns and identify potential clinical impacts for genes of interest from five representative public databases. Using ribonucleotide reductase (RR), a key enzyme in DNA synthesis and cancer-therapeutic targeting, as an example, we characterized the mRNA expression profiles and inter-component associations of three RR subunit genes and assessed their differing pathological and prognostic significance across more than 30 types of cancer and their related subtypes. Findings were validated by immunohistochemistry with clinical tissue samples (n = 211) collected from multiple cancer centers in China and with clinical follow-up. Underlying mechanisms were further explored and discussed using co-expression gene network analyses. This framework represents a simple, efficient, accurate, and comprehensive approach for cancer-omics resource analysis and underlines the necessity of separating tumors by their histological or pathological subtypes during the clinical evaluation of molecular biomarkers.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
A comprehensive study of the active molecular landscape of human cells can be undertaken by integrating two different but complementary perspectives: transcriptomics and proteomics. After the genome era, proteomics has emerged as a powerful tool to simultaneously identify and characterize the compendium of thousands of different proteins active in a cell. Thus, the Chromosome-centric Human Proteome Project (C-HPP) is promoting a full characterization of the human proteome by combining high-throughput proteomics with data derived from genome-wide expression profiling of protein-coding genes. Here we present a full proteomic profiling of a human lymphoma B-cell line (Ramos) performed using a nanoUPLC-LTQ-Orbitrap Velos proteomic platform, combined with an in-depth transcriptomic profiling of the same cell type. Data are available via ProteomeXchange with identifier PXD001933. Integration of the proteomic and transcriptomic data sets revealed a 94% overlap in the proteins identified by both -omics approaches. Moreover, functional enrichment analysis of the proteomic profiles showed an enrichment of several functions directly related to the biological and morphological characteristics of B-cells. In turn, about 30% of all protein-coding genes present in the whole human genome were identified as being expressed by the Ramos cells (a stable average of 30% of genes across all chromosomes), revealing the size of the protein expression set present in one specific human cell type. Additionally, the identification of missing proteins in our data sets has been reported, highlighting the power of the approach. Also, a comparison between neXtProt and UniProt database searches has been performed. In summary, our transcriptomic and proteomic experimental profiling provided a high-coverage report of the expressed proteome from a human lymphoma B-cell type with a clear insight into the biological processes that characterize these cells.
In this way, we demonstrated the usefulness of combining -omics for a comprehensive characterization of specific biological systems.
A collection of Pathway/Genome Databases, each of which describes the genome and metabolic pathways of a single organism. The BioCyc collection of Pathway/Genome Databases (PGDBs) provides an electronic reference source on the genomes and metabolic pathways of sequenced organisms. BioCyc PGDBs are generated by software that predicts the metabolic pathway complements of completely sequenced organisms from their genome sequences. They also include the results of a number of other computational inference procedures applied to these genomes, including predictions of which genes code for missing enzymes in metabolic pathways, and predicted operons. The BioCyc Web site provides a suite of software tools for database searching and visualization, for omics data analysis, and for comparative genomics and comparative pathway questions. The databases within the BioCyc collection are organized into tiers according to the amount of manual review and updating they have received. Tier 1 PGDBs have been created through intensive manual efforts, and receive continuous updating. Tier 2 PGDBs were computationally generated by the PathoLogic program, and have undergone moderate amounts of review and updating. Tier 3 PGDBs were computationally generated by the PathoLogic program, and have undergone no review and updating. There are 967 DBs in Tier 3. The downloadable version of BioCyc that includes the Pathway Tools software provides more speed and power than the BioCyc Web site.
DiabetesOmic is a multi-omics database designed to collect and analyze transcriptional regulatory information across five high-throughput sequencing modalities: ChIP-seq, RNA-seq, ATAC-seq, scATAC-seq, and scRNA-seq. Each identifier in the database represents a single sample. The database contains clinical complication annotations, including diabetic nephropathy, retinopathy, and atherosclerosis, to enhance translational relevance. It enables the identification of disease-associated regulatory elements, epigenetic modifications, and cell type-specific molecular signatures, providing valuable insights into the molecular mechanisms of diabetes and its complications.
CLIM4OMICS Analytics and Database is an improved database of the G2F data repository that contains omics (genetic and phenotypic) and environmental data for maize yield predictability across 84 experimental fields in the U.S. and the province of Ontario, Canada, between 2014 and 2021. The goal of this pipeline is to aggregate, improve, and synthesize multi-dimensional G2F data, including genotype, phenotype and environmental data, for GxE modeling. This dataset contains 79,122 phenotype measurements, 378 genotypes of maize lines, environmental data for 178 locations, and Python scripts for the quality control (QC) and consistency control (CC) steps and for ML models of GxE interactions. The environmental data are extracted from the NWS, DayMet and NSRDB databases and processed for QC and CC. The environmental dataset contains the minimum temperature (Tmin), average temperature (Tmean), maximum temperature (Tmax), minimum dew point (DPmin), average dew point (DPmean), maximum dew point (DPmax), minimum relative humidity (RHmin), average relative humidity (RHmean), maximum relative humidity (RHmax), minimum solar radiation (SRmin), average solar radiation (SRmean), maximum solar radiation (SRmax), accumulated rainfall (Racc), average wind speed (WSmean), and average wind direction (WDmean). This package also contains the raw G2F data and the preprocessing pipeline.