40 datasets found
  1. Data from: Literature consistency of bioinformatics sequence databases is...

    • zenodo.org
    application/gzip
    Updated Jan 24, 2020
    Cite
    Mohamed Reda Bouadjenek; Karin Verspoor; Justin Zobel (2020). Literature consistency of bioinformatics sequence databases is effective for assessing record quality [Dataset]. http://doi.org/10.5281/zenodo.1238858
    Available download formats
    application/gzip
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Mohamed Reda Bouadjenek; Karin Verspoor; Justin Zobel
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Bioinformatics sequence databases such as GenBank or UniProt contain hundreds of millions of records of genomic data. These records are derived from direct submissions from individual laboratories, as well as from bulk submissions from large-scale sequencing centres; their diversity and scale mean that they suffer from a range of data quality issues including errors, discrepancies, redundancies, ambiguities, incompleteness and inconsistencies with the published literature. In this work, we seek to investigate and analyze the data quality of sequence databases from the perspective of a curator, who must detect anomalous and suspicious records. Specifically, we emphasize the detection of inconsistent records with respect to the literature. Focusing on GenBank, we propose a set of 24 quality indicators, which are based on treating a record as a query into the published literature, and then use query quality predictors. We then carry out an analysis that shows that the proposed quality indicators and the quality of the records have a mutual relationship, in which one depends on the other. We propose to represent record literature consistency as a vector of these quality indicators. By reducing the dimensionality of this representation for visualization purposes using principal component analysis, we show that records which have been reported as inconsistent with the literature fall roughly in the same area, and therefore share similar characteristics. By manually analyzing records not previously known to be erroneous that fall in the same area as records known to be inconsistent, we show that one record out of four is inconsistent with respect to the literature. This high density of inconsistent records opens the way towards the development of automatic methods for the detection of faulty records. We conclude that literature inconsistency is a meaningful strategy for identifying suspicious records.
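
    The dimensionality-reduction step described above can be illustrated with a minimal sketch. It assumes each record is already represented as a row of 24 numeric quality indicators; the function name `pca_project` and the random stand-in data are hypothetical and are not taken from the dataset or its authors' code.

```python
import numpy as np


def pca_project(indicators, n_components=2):
    """Project records (rows) onto their top principal components."""
    X = np.asarray(indicators, dtype=float)
    centered = X - X.mean(axis=0)  # PCA works on mean-centred columns
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


# Synthetic stand-in data (assumption): 100 records x 24 indicators.
rng = np.random.default_rng(0)
indicator_matrix = rng.normal(size=(100, 24))
coords = pca_project(indicator_matrix)  # one 2-D point per record
```

    Plotting `coords` as a scatter plot and colouring the records already reported as inconsistent would show whether they cluster in one region, which is the kind of inspection the description refers to.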

  2. Bioinformatics Links Directory

    • rrid.site
    • scicrunch.org
    • +3 more
    Updated Oct 21, 2025
    + more versions
    Cite
    (2025). Bioinformatics Links Directory [Dataset]. http://identifiers.org/RRID:SCR_008018/resolver?q=*&i=rrid
    Dataset updated
    Oct 21, 2025
    Description

    Database of curated links to molecular resources, tools and databases selected on the basis of recommendations from bioinformatics experts in the field. This resource relies on input from its community of bioinformatics users for suggestions. Since 2003, it has also listed all links contained in the NAR Webserver issue. The following types of information are available in this portal:
    * Computer Related: This category contains links to resources relating to programming languages often used in bioinformatics. Other tools of the trade, such as web development and database resources, are also included here.
    * Sequence Comparison: Tools and resources for the comparison of sequences including sequence similarity searching, alignment tools, and general comparative genomics resources.
    * DNA: This category contains links to useful resources for DNA sequence analyses such as tools for comparative sequence analysis and sequence assembly. Links to programs for sequence manipulation, primer design, and sequence retrieval and submission are also listed here.
    * Education: Links to information about the techniques, materials, people, places, and events of the greater bioinformatics community. Included are current news headlines, literature sources, educational material and links to bioinformatics courses and workshops.
    * Expression: Links to tools for predicting the expression, alternative splicing, and regulation of a gene sequence are found here. This section also contains links to databases, methods, and analysis tools for protein expression, SAGE, EST, and microarray data.
    * Human Genome: This section contains links to draft annotations of the human genome in addition to resources for sequence polymorphisms and genomics. Also included are links related to ethical discussions surrounding the study of the human genome.
    * Literature: Links to resources related to published literature, including tools to search for articles and through literature abstracts. Additional text mining resources, open access resources, and literature goldmines are also listed.
    * Model Organisms: Included in this category are links to resources for various model organisms ranging from mammals to microbes. These include databases and tools for genome scale analyses.
    * Other Molecules: Bioinformatics tools related to molecules other than DNA, RNA, and protein. This category will include resources for the bioinformatics of small molecules as well as for other biopolymers including carbohydrates and metabolites.
    * Protein: This category contains links to useful resources for protein sequence and structure analyses. Resources for phylogenetic analyses, prediction of protein features, and analyses of interactions are also found here.
    * RNA: Resources include links to sequence retrieval programs, structure prediction and visualization tools, motif search programs, and information on various functional RNAs.

  3. Data_Sheet_1_The COMPARE Database: A Public Resource for Allergen...

    • figshare.com
    • datasetcatalog.nlm.nih.gov
    pdf
    Updated Jun 3, 2023
    Cite
    Ronald van Ree; Dexter Sapiter Ballerda; M. Cecilia Berin; Laurent Beuf; Alexander Chang; Gabriele Gadermaier; Paul A. Guevera; Karin Hoffmann-Sommergruber; Emir Islamovic; Liisa Koski; John Kough; Gregory S. Ladics; Scott McClain; Kyle A. McKillop; Shermaine Mitchell-Ryan; Clare A. Narrod; Lucilia Pereira Mouriès; Syril Pettit; Lars K. Poulsen; Andre Silvanovich; Ping Song; Suzanne S. Teuber; Christal Bowman (2023). Data_Sheet_1_The COMPARE Database: A Public Resource for Allergen Identification, Adapted for Continuous Improvement.pdf [Dataset]. http://doi.org/10.3389/falgy.2021.700533.s001
    Available download formats
    pdf
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    Frontiers
    Authors
    Ronald van Ree; Dexter Sapiter Ballerda; M. Cecilia Berin; Laurent Beuf; Alexander Chang; Gabriele Gadermaier; Paul A. Guevera; Karin Hoffmann-Sommergruber; Emir Islamovic; Liisa Koski; John Kough; Gregory S. Ladics; Scott McClain; Kyle A. McKillop; Shermaine Mitchell-Ryan; Clare A. Narrod; Lucilia Pereira Mouriès; Syril Pettit; Lars K. Poulsen; Andre Silvanovich; Ping Song; Suzanne S. Teuber; Christal Bowman
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Motivation: The availability of databases identifying allergenic proteins via a transparent and consensus-based scientific approach is of prime importance to support the safety review of genetically-modified foods and feeds, and public safety in general. Over recent years, screening for potential new allergen sequences has become more complex due to the exponential increase of genomic sequence information. To address these challenges, an international collaborative scientific group coordinated by the Health and Environmental Sciences Institute (HESI) was tasked to develop a contemporary, adaptable, high-throughput process to build the COMprehensive Protein Allergen REsource (COMPARE) database, a publicly accessible allergen sequence data resource along with bioinformatics analytical tools following guidelines of FAO/WHO and CODEX Alimentarius Commission. Results: The COMPARE process is novel in that it involves the identification of candidate sequences via an automated keyword-based sorting algorithm and manual curation of the annotated sequence entries retrieved from public protein sequence databases on a yearly basis; its process is meant for continuous improvement, with updates being transparently documented with each version; as a complementary approach, a yearly keyword-based search of literature databases is added to identify new allergen sequences that were not (yet) submitted to protein databases; in addition, comments from the independent peer-review panel are posted on the website to increase transparency of decision making; finally, sequence comparison capabilities associated with the COMPARE database were developed to evaluate the potential allergenicity of proteins, based on internationally recognized guidelines, FAO/WHO and CODEX Alimentarius Commission

  4. Content of the Bioinformatics for Dentistry, with its respective primary...

    • plos.figshare.com
    xls
    Updated Jun 6, 2024
    Cite
    Ava K. Chow; Rachel Low; Jerald Yuan; Karen K. Yee; Jaskaranjit Kaur Dhaliwal; Shanice Govia; Nazlee Sharmin (2024). Content of the Bioinformatics for Dentistry, with its respective primary sources. [Dataset]. http://doi.org/10.1371/journal.pone.0303628.t002
    Available download formats
    xls
    Dataset updated
    Jun 6, 2024
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Ava K. Chow; Rachel Low; Jerald Yuan; Karen K. Yee; Jaskaranjit Kaur Dhaliwal; Shanice Govia; Nazlee Sharmin
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Content of the Bioinformatics for Dentistry, with its respective primary sources.

  5. BioCreative

    • rrid.site
    Updated May 2, 2006
    Cite
    (2006). BioCreative [Dataset]. http://identifiers.org/RRID:SCR_006311
    Dataset updated
    May 2, 2006
    Description

    Community-wide effort (Challenge) for evaluating text mining and information extraction systems applied to the biological domain. It focuses on the comparison of methods and the community assessment of scientific progress, rather than on the purely competitive aspects. There is considerable difficulty in constructing suitable gold standard data for training and testing new information extraction systems that handle life science literature. Thus the data sets derived from the BioCreAtIvE challenge, because they have been examined by biological database curators and domain experts, serve as useful resources for the development of new applications as well as helping to improve existing ones. Two main issues are addressed at BioCreAtIvE, both concerned with the extraction of biologically relevant and useful information from the literature. The first is the detection of biologically significant entities (names) such as gene and protein names and their association to existing database entries. The second is the detection of entity-fact associations (e.g. protein-functional term associations).

  6. ODB - Operon database

    • dknet.org
    Updated Sep 10, 2024
    Cite
    (2024). ODB - Operon database [Dataset]. http://identifiers.org/RRID:SCR_007827
    Dataset updated
    Sep 10, 2024
    Description

    ODB (Operon DataBase) aims to collect known operons in multiple species and to offer a system to predict operons by user definitions. All the known operons are derived from the literature and from publicly available databases that include operon information. The system provides operon candidates based on conditions that users choose and also reports its prediction accuracy. This database integrates both known literature-based operons and operon prediction to provide a useful system for bioinformatics researchers and experimental biologists.

  7. DrugSig

    • figshare.com
    txt
    Updated May 5, 2017
    Cite
    Hongyu Wu (2017). DrugSig [Dataset]. http://doi.org/10.6084/m9.figshare.4893320.v1
    Available download formats
    txt
    Dataset updated
    May 5, 2017
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Hongyu Wu
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    We collected drug response microarray data and annotated related drug and target information from public databases and scientific literature. By selecting the top 500 up-regulated and down-regulated genes as drug signatures, we manually established the DrugSig database. Currently DrugSig contains more than 1300 drugs, 7000 microarrays and 800 targets. Database URL: http://biotechlab.fudan.edu.cn/database/drugsig/.
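
    The signature-selection step described above can be sketched as follows. This is a minimal illustration, assuming each drug profile is a mapping from gene name to a log fold change; the ranking metric, the function name `drug_signature` and the toy values are assumptions, since the description does not say how genes were ranked.

```python
def drug_signature(log_fold_changes, top_n=500):
    """Return (up, down): the top_n most up- and down-regulated genes."""
    # Rank genes from most up-regulated to most down-regulated.
    ranked = sorted(log_fold_changes, key=log_fold_changes.get, reverse=True)
    # Keep only genes that actually move in the stated direction.
    up = [g for g in ranked[:top_n] if log_fold_changes[g] > 0]
    down = [g for g in ranked[::-1][:top_n] if log_fold_changes[g] < 0]
    return up, down


# Toy profile with made-up values and hypothetical gene names.
profile = {"GENE_A": 2.1, "GENE_B": -1.7, "GENE_C": 0.4, "GENE_D": -0.3}
up, down = drug_signature(profile, top_n=1)
```

    With real data `top_n` would stay at its default of 500, matching the signature size stated in the description.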

  8. Data from: Saccharomyces genome database informs human biology

    • ckan.grassroots.tools
    pdf
    Updated Aug 7, 2019
    Cite
    European Bioinformatics Institute (2019). Saccharomyces genome database informs human biology [Dataset]. https://ckan.grassroots.tools/ar/dataset/caad40fb-3205-466e-b09f-731e61510c8c
    Available download formats
    pdf
    Dataset updated
    Aug 7, 2019
    Dataset provided by
    European Bioinformatics Institute (http://www.ebi.ac.uk/)
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org) is an expertly curated database of literature-derived functional information for the model organism budding yeast, Saccharomyces cerevisiae. SGD constantly strives to synergize new types of experimental data and bioinformatics predictions with existing data, and to organize them into a comprehensive and up-to-date information resource. The primary mission of SGD is to facilitate research into the biology of yeast and to provide this wealth of information to advance, in many ways, research on other organisms, even those as evolutionarily distant as humans. To build such a bridge between biological kingdoms, SGD is curating data regarding yeast-human complementation, in which a human gene can successfully replace the function of a yeast gene, and/or vice versa. These data are manually curated from published literature, made available for download, and incorporated into a variety of analysis tools provided by SGD.

  9. European Nucleotide Archive (ENA)

    • neuinfo.org
    • rrid.site
    • +1 more
    Cite
    European Nucleotide Archive (ENA) [Dataset]. http://identifiers.org/RRID:SCR_006515
    Description

    Public archive providing a comprehensive record of the world's nucleotide sequencing information, covering raw sequencing data, sequence assembly information and functional annotation. All submitted data, once public, will be exchanged with the NCBI and DDBJ as part of the INSDC data exchange agreement. The European Nucleotide Archive (ENA) captures and presents information relating to experimental workflows that are based around nucleotide sequencing. A typical workflow includes the isolation and preparation of material for sequencing, a run of a sequencing machine in which sequencing data are produced and a subsequent bioinformatic analysis pipeline. ENA records this information in a data model that covers input information (sample, experimental setup, machine configuration), output machine data (sequence traces, reads and quality scores) and interpreted information (assembly, mapping, functional annotation). Data arrive at ENA from a variety of sources including submissions of raw data, assembled sequences and annotation from small-scale sequencing efforts, data provision from the major European sequencing centers and routine and comprehensive exchange with their partners in the International Nucleotide Sequence Database Collaboration (INSDC). Provision of nucleotide sequence data to ENA or its INSDC partners has become a central and mandatory step in the dissemination of research findings to the scientific community. ENA works with publishers of scientific literature and funding bodies to ensure compliance with these principles and to provide optimal submission systems and data access tools that work seamlessly with the published literature. ENA is made up of a number of distinct databases that include the EMBL Nucleotide Sequence Database (EMBL-Bank), the newly established Sequence Read Archive (SRA) and the Trace Archive. The main tool for downloading ENA data is the ENA Browser, which is available through REST URLs for easy programmatic use. All ENA data are available through the ENA Browser. Note: EMBL Nucleotide Sequence Database (EMBL-Bank) is entirely included within this resource.

  10. Data_Sheet_1_Embracing the Dark Side: Computational Approaches to Unveil the...

    • frontiersin.figshare.com
    pdf
    Updated May 31, 2023
    + more versions
    Cite
    Terezinha Souza; Panuwat Trairatphisan; Janet Piñero; Laura I. Furlong; Julio Saez-Rodriguez; Jos Kleinjans; Danyel Jennen (2023). Data_Sheet_1_Embracing the Dark Side: Computational Approaches to Unveil the Functionality of Genes Lacking Biological Annotation in Drug-Induced Liver Injury.PDF [Dataset]. http://doi.org/10.3389/fgene.2018.00527.s001
    Available download formats
    pdf
    Dataset updated
    May 31, 2023
    Dataset provided by
    Frontiers Media (http://www.frontiersin.org/)
    Authors
    Terezinha Souza; Panuwat Trairatphisan; Janet Piñero; Laura I. Furlong; Julio Saez-Rodriguez; Jos Kleinjans; Danyel Jennen
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In toxicogenomics, functional annotation is an important step to gain additional insights into genes with aberrant expression that drive pathophysiological mechanisms. Nevertheless, there is a gap in the annotation of these genes, which often hampers the interpretation of results and limits their applicability in translational medicine. In this study, we evaluated the coverage of functional annotations of differentially expressed genes (DEGs) induced by 10 selected compounds from the TG-GATEs database identified as high- or no-risk in causing drug-induced liver injury (most-DILI or no-DILI, respectively) using in vitro human data. Functional roles of DEGs not present in the most common biological annotation databases (termed “dark genes”) were unveiled via literature mining and via the identification of shared regulatory transcription factors or signaling pathways. Our results demonstrated that approximately 13% of the genes induced by these compounds in vitro were dark genes, and we were able to obtain additional relevant information for up to 76% of those. Using interactome data from several sources, we have uncovered genes such as LRBA and WDR26 as highly connected in the protein network that play roles in drug response. Genes such as MALAT1, H19, and MIR29C, whose links to hepatotoxicity have been confirmed, were identified as markers for the most-DILI group and appeared as top hits across all literature-based mining methods. Furthermore, we investigated the potential impact of dark genes on liver toxicity by identifying their rat orthologs in combination with their correlation to drug-induced liver pathologies observed in vivo following chemical exposure. We identified a set of important regulatory transcription factors of dark genes for all most-DILI compounds, including E2F1 and JUND, with supporting evidence in the literature, and we found Magee1 correlated with chemically induced bile duct hyperplasia and adverse responses at 29 days in rats in vivo. In conclusion, in this study we show the potential role of these poorly annotated genes in mechanisms underlying hepatotoxicity and offer a number of computational approaches that may help to minimize current gaps in gene annotation and highlight their value as potential biomarkers in toxicological studies.

  11. Knowledge Discovery in Biological Databases for Revealing Candidate Genes...

    • ckan.grassroots.tools
    html, pdf
    Updated Aug 7, 2019
    Cite
    Rothamsted Research (2019). Knowledge Discovery in Biological Databases for Revealing Candidate Genes Linked to Complex Phenotypes [Dataset]. https://ckan.grassroots.tools/bg/dataset/bf47bbcd-d26b-40a1-a86b-144f37570967
    Available download formats
    pdf, html
    Dataset updated
    Aug 7, 2019
    Dataset provided by
    Rothamsted Research
    License

    Attribution-NonCommercial-NoDerivs 3.0 (CC BY-NC-ND 3.0), https://creativecommons.org/licenses/by-nc-nd/3.0/
    License information was derived automatically

    Description

    Abstract: Genetics and “omics” studies designed to uncover genotype to phenotype relationships often identify large numbers of potential candidate genes, among which the causal genes are hidden. Scientists generally lack the time and technical expertise to review all relevant information available from the literature, from key model species and from a potentially wide range of related biological databases in a variety of data formats with variable quality and coverage. Computational tools are needed for the integration and evaluation of heterogeneous information in order to prioritise candidate genes and components of interaction networks that, if perturbed through potential interventions, have a positive impact on the biological outcome in the whole organism without producing negative side effects. Here we review several bioinformatics tools and databases that play an important role in biological knowledge discovery and candidate gene prioritization. We conclude with several key challenges that need to be addressed in order to facilitate biological knowledge discovery in the future.

  12. Arabidopsis Hormone Database

    • rrid.site
    • neuinfo.org
    • +1 more
    Updated May 30, 2013
    Cite
    (2013). Arabidopsis Hormone Database [Dataset]. http://identifiers.org/RRID:SCR_001792
    Dataset updated
    May 30, 2013
    Description

    Database providing a systematic and comprehensive view of morphological phenotypes regulated by plant hormones, as well as regulatory genes participating in numerous plant hormone responses. By integrating the data from mutant studies, transgenic analysis and gene ontology annotation, genes related to the stimulus of eight plant hormones were identified, including abscisic acid, auxin, brassinosteroid, cytokinin, ethylene, gibberellin, jasmonic acid and salicylic acid. Another notable characteristic of this database is that a phenotype ontology was developed to precisely describe all kinds of morphological processes regulated by plant hormones with standardized vocabularies. To increase the coverage of phytohormone related genes, the database has been updated from AHD to AHD2.0, adding and integrating several notable features: (1) added 291 newly published Arabidopsis hormone related genes as well as corrected information (e.g. the arguable ABA receptors) based on the recent 2-year literature; (2) integrated orthologues of sequenced plants in OrthoMCLDB into each gene in the database; (3) integrated predicted miRNA splicing sites in each gene in the database; (4) provided genetic relationships among these phytohormone related genes mined from the literature, which represents the first effort to construct a relatively comprehensive and complex network of hormone related genes, as shown on the home page of the database; (5) for convenient and timely bioinformatics analysis, provided links to Weblab, a powerful online analysis platform that the authors recently developed, which allows users to readily perform various sequence analyses on the phytohormone related genes retrieved from AHD2.0; (6) provided links to other protein databases as well as more expression profiling information that would facilitate more systematic analysis related to phytohormone research. Please help to improve the database with your contributions.

  13. Crop trait regulating-genes knowledge graph dataset

    • scidb.cn
    Updated Jan 3, 2025
    Cite
    zhang dan dan (2025). Crop trait regulating-genes knowledge graph dataset [Dataset]. http://doi.org/10.57760/sciencedb.agriculture.00175
    Available download formats
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Jan 3, 2025
    Dataset provided by
    Science Data Bank
    Authors
    zhang dan dan
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0), https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    In crop breeding research, breeding new crop varieties with various excellent traits has always been the goal of breeders. With the accelerating application of information technology in crop breeding, the multi-dimensional scientific data related to crop breeding have grown exponentially. These semi-structured and structured scientific data are distributed across scientific databases in different fields and lack association and fusion of multi-dimensional scientific data across species. This hinders the transfer and reuse of existing crop breeding knowledge and prevents the value of crop breeding scientific data from being maximized, which brings challenges to the knowledge discovery of crop trait regulation genes. Therefore, more and more crop breeding research is based on the reorganization, correlation, analysis and utilization of existing breeding scientific data, so as to discover knowledge about crop trait regulation genes.
    The crop trait regulatory gene knowledge graph dataset uses the following data sources: the PubMed literature database, Phytozome (genomic information of 4 species), Ensembl Plants from the European Molecular Biology Laboratory's European Bioinformatics Institute (genome information of 4 species), UniProt (Universal Protein; protein annotation information of 4 species), the Rice Genome Annotation Project (RGAP), STRING (protein interaction information for 4 species), Pfam (protein family analysis and modeling; protein family information for 4 species), KEGG (Kyoto Encyclopedia of Genes and Genomes; pathway annotation information of the 4 species) and the GO (Gene Ontology) domain scientific database. The entities and relationships of these multi-source scientific data with different data formats were extracted as follows. For structured data, mapping-based knowledge extraction is adopted. For XML semi-structured data, knowledge extraction based on Kettle data analysis is adopted. For FASTA semi-structured data, knowledge extraction based on the BLAST model is adopted. For unstructured text data, knowledge extraction based on a large language model is adopted. On the basis of this entity and relationship extraction, the association and fusion of multi-source crop breeding knowledge was realized based on entity mapping and specific attribute association. Finally, the crop trait regulatory gene knowledge graph dataset was formed, which consists of 13 entity datasets and 16 entity relationship datasets.
    The crop trait-regulating gene knowledge graph dataset provides a key semantic model and important data basis for crop breeding knowledge discovery, such as excellent pleiotropic gene discovery, cross-species gene function prediction and potential discovery of pathway gene networks.

  14. Neuro.OralCard Database

    • data.niaid.nih.gov
    Updated Nov 11, 2023
    Cite
    Martins, Jorge Emanuel; Simões, Joana (2023). Neuro.OralCard Database [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10102337
    Dataset updated
    Nov 11, 2023
    Dataset provided by
    University of Lisbon, Faculty of Medicine
    Authors
    Martins, Jorge Emanuel; Simões, Joana
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    INTRODUCTION: This study focuses (Martins, 2021) on a database literature review (Sousa, Martins et al., 2016) of the neuronal proteome found both in neurotypical subjects and in the ICD-11 chapters "Diseases of the Nervous System" and "Mental and Behavior Disorders", the Neuro.OralCard. This database offers an update of the OralCard (Arrais et al., 2013), revising 2416 neuronal proteins identified in neuroproteomics, of which 765 are present, detected, and identifiable in the saliva proteome. The saliva proteome database, published as the bioinformatics tool OralCard (Arrais et al., 2013), was obtained from SalivaTec, CIS, UCP (Rosa et al., 2012). This database, as a set of UniprotKB codes, was cross-mapped to the set of UniprotKB obtaining the Neuro.OralCard, the neuronal proteome database.
    METHODOLOGY: A systematic bibliographic search was performed using the search tool at PubMed's portal, complemented by Google Scholar searching utilities (Falagas et al., 2008). The following keywords were used as filters: "MeSH (Medical Subject Headings)", "neurological diseases", "psychiatric diseases", and "salivary diagnosis." In order to perform an ordered review based on the publications on molecular biology and the nervous system, additional keyword correlations and associations were searched. Only publications from 2000 to 2021 were selected. Likewise, in order to perform an ordered review based on the correlation between the salivary proteins/nervous system proteins and physiological and pathologic conditions of neuropsychiatric conditions, the following search filters were also applied: "Humans" and "Clinical trial", in order to include exclusively articles of an experimental nature concerning the species Homo sapiens. Based on the extensive analysis (Neuro.OralCard References (2020). https://tinyurl.com/Neuro-OralCardReferences) of the 79 filtered and compiled articles, a survey of a representative sample of neuronal and salivary proteins identified in subjects with (i) neuropsychiatric conditions and (ii) healthy mental functioning was carried out. With the results obtained from this analysis, it was possible to create a neuronal proteome database named Neuro.OralCard. The full database has not yet been deposited in the oral proteome databank of the OralCard (Arrais, 2013). This specific analysis enables the updating of the OralCard, which is allusive to the human Oralia in different pathologic states. Meanwhile, the full neuroproteome (UniprotKB) was annotated in a non-restricted access server in the LIMMIT Lab, Faculty of Medicine, University of Lisbon, and is accessible via: https://www.limmit.org/uploads/2/6/8/4/26841837/neuro.oralcard.xlsx (retrieved August 01, 2023).
    RESULTS: The final neuronal database, named Neuro.OralCard, revised not only neuronal produced but also peripheral proteins. These neuronal proteins ultimately aim to represent not only epigenome, transcriptome, proteome, and metabolome analysis but also the main findings of functional cellular assays. The results of this neurobiological approach (Martins et al., 2023) imply that alterations in neurotransmission, hormonal regulation, metabolism, the cell cycle, and the immune system may be partially responsible for neuronal and mental pathophysiology. The Neuro.OralCard database comprises 2416 proteins concerning ICD-10 neuropsychiatric conditions and mental health functioning.
    CONCLUSION: The Neuro.OralCard database aims to be used as a molecular data reference and support for biomarker discovery in neuropsychiatric illnesses and nonpathologic conditions of the nervous system.

  15. ODB - Operon database

    • neuinfo.org
    • dknet.org
    • +1 more
    Updated Oct 17, 2019
    Cite
    (2019). ODB - Operon database [Dataset]. http://identifiers.org/RRID:SCR_007827/resolver/mentions?q=&i=rrid
    Explore at:
    Dataset updated
    Oct 17, 2019
    Description

    ODB (Operon DataBase) aims to collect known operons in multiple species and to offer a system that predicts operons according to user-defined conditions. All the known operons are derived from the literature and from publicly available databases that include operon information. The system proposes candidate operons based on the conditions that users choose and also reports the prediction accuracy. By integrating known, literature-based operons with operon prediction, the database provides a useful system for bioinformatics researchers and experimental biologists.

  16. Top-ranked candidate genes.

    • figshare.com
    • plos.figshare.com
    xls
    Updated Jun 8, 2023
    Cite
    Matteo Accetturo; Teresa M. Creanza; Claudia Santoro; Giancarlo Tria; Antonio Giordano; Simone Battagliero; Antonella Vaccina; Gaetano Scioscia; Pietro Leo (2023). Top-ranked candidate genes. [Dataset]. http://doi.org/10.1371/journal.pone.0012742.t003
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 8, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Matteo Accetturo; Teresa M. Creanza; Claudia Santoro; Giancarlo Tria; Antonio Giordano; Simone Battagliero; Antonella Vaccina; Gaetano Scioscia; Pietro Leo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Gene expression information is taken from (a) NCBI Unigene [98], (b) UniProtKB [99], (c) HPRD database [100], (d) Morton Cochlear EST database [101], NCBI GEO [102], (e) the table of gene expression in the developing ear from the Institute of Hearing Research [103], (g) Bgee dataBase for Gene Expression Evolution [104] and literature. Gene function information has been inferred from (f) NCBI Gene [39] and literature.

  17. Toxins and Toxin Target Database

    • kaggle.com
    zip
    Updated Dec 10, 2022
    Cite
    Ahmed Eltom (2022). Toxins and Toxin Target Database [Dataset]. https://www.kaggle.com/ahmedeltom/toxins-and-toxin-target-database
    Explore at:
    Available download formats: zip (8476298 bytes)
    Dataset updated
    Dec 10, 2022
    Authors
    Ahmed Eltom
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    The Toxin and Toxin Target Database (T3DB), soon to be referred to as the Toxic Exposome Database, is a unique bioinformatics resource that combines detailed toxin data with comprehensive toxin target information. The database currently houses 3,678 toxins described by 41,602 synonyms, including pollutants, pesticides, drugs, and food toxins, which are linked to 2,073 corresponding toxin target records. Altogether there are 42,374 toxin-toxin target associations. Each toxin record (ToxCard) contains over 90 data fields and holds information such as chemical properties and descriptors, toxicity values, molecular and cellular interactions, and medical information. This information has been extracted from over 18,143 sources, which include other databases, government documents, books, and scientific literature.

    Source Link T3DB

    The focus of the T3DB is on providing mechanisms of toxicity and target proteins for each toxin. This dual nature of the T3DB, in which toxin and toxin target records are interactively linked in both directions, distinguishes it from existing databases. It is also fully searchable and supports extensive text, sequence, chemical structure, and relational query searches. It is both modelled after and closely linked to the Human Metabolome Database (HMDB) and DrugBank. Potential applications of T3DB include toxin metabolism prediction, toxin/drug interaction prediction, and general toxin hazard awareness by the public, making it applicable to various fields. Overall, the variety and accessibility of the T3DB make it a valuable resource for both the casual user and the advanced researcher.

  18. Data from: Meta-analysis reveals challenges and gaps for genome-to-phenome...

    • verso.uidaho.edu
    • data.niaid.nih.gov
    Updated Apr 26, 2023
    + more versions
    Cite
    Anthony Melton; Stephanie Galla; Carlos Dumaguit; John Wojahn; Stephen Novak; Marcelo Serpe; Peggy Martinez; Sven Buerki (2023). Meta-analysis reveals challenges and gaps for genome-to-phenome research underpinning plant drought response [Dataset]. https://verso.uidaho.edu/esploro/outputs/dataset/Meta-analysis-reveals-challenges-and-gaps-for/996765630801851
    Explore at:
    Dataset updated
    Apr 26, 2023
    Dataset provided by
    Boise State University, Idaho EPSCoR, EPSCoR GEM3
    Authors
    Anthony Melton; Stephanie Galla; Carlos Dumaguit; John Wojahn; Stephen Novak; Marcelo Serpe; Peggy Martinez; Sven Buerki
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Apr 26, 2023
    Description

    Data used to identify species occurring in hyperarid environments for analyses described in "Meta-analysis reveals challenges and gaps for genome-to-phenome research underpinning plant drought response." The "PlantsLackingHumanUse_PrelimQCd_Data.csv" contains data for all plants queried, while "HyperArid_Occurrences.csv" contains the subset of data corresponding to plants occurring in hyperarid environments.

    Associated Manuscript Abstract
    Severe drought conditions and extreme weather events are increasing worldwide with climate change, threatening the persistence of native plant communities and ecosystems. Many studies have investigated the genomic basis of plant responses to drought. However, the extent of this research throughout the plant kingdom is unclear, particularly among species critical for the sustainability of natural ecosystems. This study aimed to broaden our understanding of genome-to-phenome (G2P) connections in drought-stressed plants and identify focal taxa for future research. Bioinformatics pipelines were developed to mine and link information from databases and abstracts from 7730 publications. This approach identified 1634 genes involved in drought responses among 497 plant taxa. Most (83.30%) of these species have been classified for human use, and most G2P interactions have been described within model organisms or crop species. Our analysis identifies several gaps in G2P research literature and database connectivity, with 21% of abstracts being linked to gene and taxonomy data in NCBI. Abstract text mining was more successful at identifying potential G2P pathways, with 34% of abstracts containing gene, taxa, and phenotype information. Expanding G2P studies to include non-model plants, especially those that are adapted to drought stress, will help advance our understanding of drought responsive G2P pathways.

    Data Use
    License
    Creative Commons Attribution 4.0 International (CC-BY 4.0)
    Recommended Citation
    Melton AE, Galla SJ, Dumaguit CDC, Wojahn JMA, Novak S, Serpe M, Martinez P, Buerki S. 2022. Meta-analysis reveals challenges and gaps for genome-to-phenome research underpinning plant drought response (Version 1.0) [Dataset]. Zenodo. https://doi.org/10.5281/zenodo.7153278

    Funding
    US National Science Foundation and Idaho EPSCoR: OIA-1757324

  19. Comparison of adult intact naive C57Bl/6 Nogo-A KO versus WT spinal cord

    • datamed.org
    Updated Nov 1, 2010
    Cite
    (2010). Comparison of adult intact naive C57Bl/6 Nogo-A KO versus WT spinal cord [Dataset]. https://datamed.org/display-item.php?repository=0008&idName=ID&id=5914e0aa5152c67771b3b6f7
    Explore at:
    Dataset updated
    Nov 1, 2010
    Description

    Nogo-A, localized on the myelin adaxonal membrane in the adult CNS, is well known for its role as a neurite outgrowth inhibitor following a lesion. Nogo-A KO mice show enhanced regenerative/compensatory fiber growth following CNS lesion. However, changes occurring in their intact CNS have not been studied. Moreover, Nogo-A in the intact adult CNS is also expressed in some neuronal subpopulations, e.g. in the hippocampus, olfactory bulbs and dorsal root ganglia. We compared the intact adult CNS (spinal cord) of Nogo-A KO mice with that of wild-type mice in order to identify: potential compensating molecules which could be interesting new inhibitory neurite outgrowth candidates; possible molecules involved in the as yet unclarified downstream signalling pathway of Nogo-A; and additional new functions for myelin or neuronal Nogo-A in the intact adult CNS.

    Keywords: gene expression, Nogo-A KO, spinal cord, adult, naive, unlesioned

    Overall design: Spinal cords from 3 adult C57Bl/6 wild type and Nogo-A KO mice were explanted. Total RNA was extracted and processed for hybridization on Mouse 430 2.0 Affymetrix GeneChips. Following scanning and first analysis with MAS 5.0, further analysis was performed with GeneSpring 7.2 (Silicon Genetics, Redwood City, CA). A present call filter (2 out of 3 present calls in at least one of the studied conditions) was applied. Normalization was run per chip as well as per gene to the median of the control replicates. Data were statistically restricted through a 1-way ANOVA (p=0.05). A final threshold of ≥1.2-fold increase or decrease in the expression level of each single transcript was applied. Regulated transcripts were assigned to functional categories according to GeneOntology as well as literature and database mining (Pubmed and Bioinformatics Harvester EMBL Heidelberg).
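
    The filtering steps in this description can be illustrated with a minimal sketch. It is not the authors' GeneSpring workflow: the function names, the "P" flag for a present call, the use of the group median as the summary statistic, and the symmetric treatment of decreases (ratio at or below 1/1.2) are all assumptions made for illustration, and the ANOVA step is omitted.

```python
from statistics import median

# Threshold taken from the description; applying it symmetrically to
# decreases (ratio <= 1 / 1.2) is an assumption of this sketch.
FOLD_THRESHOLD = 1.2


def has_present_calls(calls_by_condition, min_present=2):
    """True if at least one condition has `min_present` or more 'P' calls.

    `calls_by_condition` maps a condition name to its per-replicate
    detection calls (hypothetical encoding: 'P' = present, anything else
    counts as not present).
    """
    return any(
        sum(call == "P" for call in calls) >= min_present
        for calls in calls_by_condition.values()
    )


def fold_change(ko_signal, wt_signal):
    """KO signal relative to the median of the control (WT) replicates."""
    return median(ko_signal) / median(wt_signal)


def is_regulated(ko_signal, wt_signal, calls_by_condition):
    """Apply the present-call filter, then the fold-change threshold."""
    if not has_present_calls(calls_by_condition):
        return False
    ratio = fold_change(ko_signal, wt_signal)
    return ratio >= FOLD_THRESHOLD or ratio <= 1 / FOLD_THRESHOLD


# Made-up signal values for a single transcript, for illustration only.
example = is_regulated(
    ko_signal=[130.0, 125.0, 140.0],
    wt_signal=[100.0, 95.0, 105.0],
    calls_by_condition={"KO": ["P", "P", "A"], "WT": ["P", "A", "A"]},
)
```

    With these made-up values the transcript passes both filters, because two of the three KO replicates are called present and the KO/WT ratio is 1.3.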

  20. DataSheet_1_TCMIO: A Comprehensive Database of Traditional Chinese Medicine...

    • datasetcatalog.nlm.nih.gov
    • frontiersin.figshare.com
    Updated Apr 15, 2020
    Cite
    Wu, Qihui; Cai, Chuipu; Cui, Lu; Fang, Jiansong; Fan, Xiude; Du, Jiewen; Liu, Bingdong; Xie, Liwei; Liu, Zhihong (2020). DataSheet_1_TCMIO: A Comprehensive Database of Traditional Chinese Medicine on Immuno-Oncology.docx [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000539255
    Explore at:
    Dataset updated
    Apr 15, 2020
    Authors
    Wu, Qihui; Cai, Chuipu; Cui, Lu; Fang, Jiansong; Fan, Xiude; Du, Jiewen; Liu, Bingdong; Xie, Liwei; Liu, Zhihong
    Description

    Advances in immuno-oncology (IO) are making immunotherapy a powerful tool for cancer treatment. With the discovery of an increasing number of IO targets, many herbs or ingredients from traditional Chinese medicine (TCM) have shown immunomodulatory function and antitumor effects via targeting the immune system. However, knowledge of underlying mechanisms is limited due to the complexity of TCM, which has multiple ingredients acting on multiple targets. To address this issue, we present TCMIO, a comprehensive database of Traditional Chinese Medicine on Immuno-Oncology, which can be used to explore the molecular mechanisms of TCM in modulating the cancer immune microenvironment. Over 120,000 small molecules against 400 IO targets were extracted from public databases and the literature. These ligands were further mapped to the chemical ingredients of TCM to identify herbs that interact with the IO targets. Furthermore, we applied a network inference-based approach to identify the potential IO targets of natural products in TCM. All of these data, along with cheminformatics and bioinformatics tools, were integrated into the publicly accessible database. Chemical structure mining tools are provided to explore the chemical ingredients and ligands against IO targets. Herb–ingredient–target networks can be generated online, and pathway enrichment analysis for TCM or prescription is available. This database is functional for chemical ingredient structure mining and network analysis for TCM. We believe that this database provides a comprehensive resource for further research on the exploration of the mechanisms of TCM in cancer immunity and TCM-inspired identification of novel drug leads for cancer immunotherapy. TCMIO can be publicly accessed at http://tcmio.xielab.net.

