52 datasets found
  1. Semantic Similarity Score Calculation and Reproducibility

    • figshare.com
    txt
    Updated May 14, 2021
    Cite
    Gaston Mazandu; Kenneth B. Opap; Funmilayo Makinde; Victoria Nembaware; Francis Agamah; Christian Bope; Emile R. Chimusa; Ambroise Wonkam; Nicola Mulder (2021). Semantic Similarity Score Calculation and Reproducibility [Dataset]. http://doi.org/10.6084/m9.figshare.14599992.v2
    Explore at:
    Available download formats: txt
    Dataset updated
    May 14, 2021
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Gaston Mazandu; Kenneth B. Opap; Funmilayo Makinde; Victoria Nembaware; Francis Agamah; Christian Bope; Emile R. Chimusa; Ambroise Wonkam; Nicola Mulder
    License

    https://www.gnu.org/licenses/gpl-3.0.html

    Description

    This dataset builds the annotation file, which consists of a protein (entity) to Gene Ontology process map extracted from the GOA UniProt dataset at ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/UNIPROT/goa_uniprot_all.gaf.gz. The protein-process map file is used to generate the protein pairs used for testing the PySML library. The semantic similarity scores produced are also included.
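
    The extraction step described above can be sketched as follows. This is a minimal illustration, not the authors' actual script: it assumes the standard tab-separated GAF column layout (column 2 = DB Object ID, column 4 = Qualifier, column 5 = GO ID, column 9 = Aspect), and the function name and the sample accessions are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def build_protein_process_map(gaf_lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each protein ID to its set of GO biological process terms.

    Assumes GAF columns: 2 = DB Object ID, 4 = Qualifier,
    5 = GO ID, 9 = Aspect ("P" marks biological process).
    """
    protein_to_terms = defaultdict(set)
    for line in gaf_lines:
        # Skip blank lines and "!" header/comment lines.
        if not line.strip() or line.startswith("!"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # malformed row
        protein_id, qualifier, go_id, aspect = cols[1], cols[3], cols[4], cols[8]
        if aspect != "P":
            continue  # keep biological process annotations only
        if "NOT" in qualifier.split("|"):
            continue  # skip negated annotations (an assumption of this sketch)
        protein_to_terms[protein_id].add(go_id)
    return dict(protein_to_terms)


# Made-up rows for illustration; real input would be the GOA UniProt GAF file.
sample_gaf = [
    "!gaf-version: 2.2",
    "\t".join(["UniProtKB", "P00001", "GENE1", "involved_in", "GO:0006412", "PMID:0", "IDA", "", "P"]),
    "\t".join(["UniProtKB", "P00001", "GENE1", "enables", "GO:0003735", "PMID:0", "IDA", "", "F"]),
    "\t".join(["UniProtKB", "P00002", "GENE2", "NOT|involved_in", "GO:0006915", "PMID:0", "IDA", "", "P"]),
    "\t".join(["UniProtKB", "P00002", "GENE2", "involved_in", "GO:0008150", "PMID:0", "ND", "", "P"]),
]
protein_map = build_protein_process_map(sample_gaf)
print(protein_map)
```

    For the full compressed file, the same function could be fed a text-mode handle from gzip.open(path, "rt"), which streams the lines without decompressing to disk first.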

  2. Data from: The new bioinformatics: integrating ecological data from the gene...

    • search.dataone.org
    • data.niaid.nih.gov
    • +2 more
    Updated Jul 3, 2025
    Cite
    Matthew B. Jones; Mark P. Schildhauer; O. J. Reichman; Shawn Bowers (2025). The new bioinformatics: integrating ecological data from the gene to the biosphere [Dataset]. http://doi.org/10.5061/dryad.qb0d6
    Explore at:
    Dataset updated
    Jul 3, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Matthew B. Jones; Mark P. Schildhauer; O. J. Reichman; Shawn Bowers
    Time period covered
    Jan 1, 2012
    Description

    Bioinformatics, the application of computational tools to the management and analysis of biological data, has stimulated rapid research advances in genomics through the development of data archives such as GenBank, and similar progress is just beginning within ecology. One reason for the belated adoption of informatics approaches in ecology is the breadth of ecologically pertinent data (from genes to the biosphere) and its highly heterogeneous nature. The variety of formats, logical structures, and sampling methods in ecology create significant challenges. Cultural barriers further impede progress, especially for the creation and adoption of data standards. Here we describe informatics frameworks for ecology, from subject-specific data warehouses, to generic data collections that use detailed metadata descriptions and formal ontologies to catalog and cross-reference information. Combining these approaches with automated data integration techniques and scientific workflow systems will ma...

  3. Extracted Schemas from the Life Sciences Linked Open Data Cloud

    • figshare.com
    txt
    Updated Jun 1, 2023
    Cite
    Maulik Kamdar (2023). Extracted Schemas from the Life Sciences Linked Open Data Cloud [Dataset]. http://doi.org/10.6084/m9.figshare.12402425.v2
    Explore at:
    Available download formats: txt
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Maulik Kamdar
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is related to the manuscript "An empirical meta-analysis of the life sciences linked open data on the web", published in Nature Scientific Data. If you use the dataset, please cite the manuscript as follows:

    Kamdar, M.R., Musen, M.A. An empirical meta-analysis of the life sciences linked open data on the web. Sci Data 8, 24 (2021). https://doi.org/10.1038/s41597-021-00797-y

    We have extracted schemas from more than 80 publicly available biomedical linked data graphs in the Life Sciences Linked Open Data (LSLOD) cloud into an LSLOD schema graph and conducted an empirical meta-analysis to evaluate the extent of semantic heterogeneity across the LSLOD cloud. The dataset published here contains the following files:

    - The set of Linked Data Graphs from the LSLOD cloud from which schemas are extracted.
    - Refined sets of extracted classes, object properties, data properties, and datatypes shared across the Linked Data Graphs on the LSLOD cloud. Where a schema element is reused from a Linked Open Vocabulary or an ontology, it is explicitly indicated.
    - The LSLOD Schema Graph, which contains all the above extracted schema elements interlinked with each other based on the underlying content. Sample instances and sample assertions are also provided, along with broad-level characteristics of the modeled content.

    The LSLOD Schema Graph is saved as a JSON Pickle file. To read the JSON object in this Pickle file, use the following Python commands:

        import pickle

        with open('LSLOD-Schema-Graph.json.pickle', 'rb') as infile:
            x = pickle.load(infile, encoding='iso-8859-1')

    Check the Referenced Link for more details on this research, raw data files, and code references.

  4. The Molecular Entities in Linked Data Dataset

    • data.mendeley.com
    Updated Apr 4, 2020
    Cite
    Dominik Tomaszuk (2020). The Molecular Entities in Linked Data Dataset [Dataset]. http://doi.org/10.17632/fp4phyrbkz.1
    Explore at:
    Dataset updated
    Apr 4, 2020
    Authors
    Dominik Tomaszuk
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Molecular Entities in Linked Data (MEiLD) dataset comprises data on distinct atoms, molecules, ions, ion pairs, radicals, radical ions, and other entities that are identifiable as separately distinguishable chemical entities. The dataset is provided in JSON-LD format and was generated by SDFEater, a tool that parses atoms, bonds, and other molecule data. MEiLD contains 349,960 'small' chemical entities.

  5. G6GFINDR

    • rrid.site
    Updated Jan 29, 2022
    Cite
    (2022). G6GFINDR [Dataset]. http://identifiers.org/RRID:SCR_015821
    Explore at:
    Dataset updated
    Jan 29, 2022
    Description

    Query-based web application that helps users find bioinformatics and artificial intelligence (AI) software. G6GFINDR is powered by "semantic annotation" rather than keyword search, and takes advantage of semantic web graph technology.

  6. Data from: An Ontology-Based System for Querying Life in a Post-Taxonomic...

    • figshare.com
    pdf
    Updated Jan 19, 2016
    Cite
    Nico Cellinese; Hilmar Lapp (2016). An Ontology-Based System for Querying Life in a Post-Taxonomic Age [Dataset]. http://doi.org/10.6084/m9.figshare.1401984.v1
    Explore at:
    Available download formats: pdf
    Dataset updated
    Jan 19, 2016
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Nico Cellinese; Hilmar Lapp
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Grant proposal (project description and references cited) to the US National Science Foundation, Advances in Biological Informatics (ABI) program as Collaborative Research. Funded in 2015. Files include public abstract as submitted to NSF.

  7. RDF/Jena : an extension for XSLT/Xalan. Testing with NCBI gene and the...

    • figshare.com
    application/gzip
    Updated Jun 1, 2023
    Cite
    Pierre Lindenbaum (2023). RDF/Jena : an extension for XSLT/Xalan. Testing with NCBI gene and the disease ontology. [Dataset]. http://doi.org/10.6084/m9.figshare.105167.v3
    Explore at:
    Available download formats: application/gzip
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    figshare
    Authors
    Pierre Lindenbaum
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The current code contains an extension for the Apache Xalan XSLT processor: it allows searching for and injecting RDF statements during an XSLT transformation. As an example, the makefile transforms an NCBI Gene record to HTML and annotates it with the Disease Ontology.

  8. Additional file 2: of NeuroRDF: semantic integration of highly curated data...

    • datasetcatalog.nlm.nih.gov
    Updated Dec 14, 2016
    Cite
    Kawalia, Shweta; Raschka, Tamara; Senger, Philipp; Iyappan, Anandhi; Hofmann-Apitius, Martin (2016). Additional file 2: of NeuroRDF: semantic integration of highly curated data to prioritize biomarker candidates in Alzheimer's disease [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001526461
    Explore at:
    Dataset updated
    Dec 14, 2016
    Authors
    Kawalia, Shweta; Raschka, Tamara; Senger, Philipp; Iyappan, Anandhi; Hofmann-Apitius, Martin
    Description

    The developed RDF models and the SPARQL queries used are made available at: http://www.scai.fraunhofer.de/en/business-research-areas/bioinformatics/downloads/neurordf.html . (ZIP 178 kb)

  9. umnsrs

    • huggingface.co
    Updated Oct 20, 2023
    Cite
    BigScience Biomedical Datasets (2023). umnsrs [Dataset]. https://huggingface.co/datasets/bigbio/umnsrs
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Oct 20, 2023
    Dataset authored and provided by
    BigScience Biomedical Datasets
    License

    https://choosealicense.com/licenses/cc0-1.0/

    Description

    UMNSRS, developed by Pakhomov et al., consists of 725 clinical term pairs manually rated for semantic similarity and relatedness. The similarity and relatedness of each term pair were annotated on a continuous scale by having a resident touch a bar on a touch-sensitive computer screen to indicate the degree of similarity or relatedness. The following subsets are available:

    - similarity: A set of 566 UMLS concept pairs manually rated for semantic similarity (e.g. whale-dolphin) using a continuous response scale.
    - relatedness: A set of 588 UMLS concept pairs manually rated for semantic relatedness (e.g. needle-thread) using a continuous response scale.
    - similarity_mod: Modification of the UMNSRS-Similarity dataset to exclude control samples and those pairs that did not match text in clinical, biomedical and general English corpora. Exact modifications are detailed in the paper (Corpus Domain Effects on Distributional Semantic Modeling of Medical Terms. Serguei V.S. Pakhomov, Greg Finley, Reed McEwan, Yan Wang, and Genevieve B. Melton. Bioinformatics. 2016; 32(23):3635-3644). The resulting dataset contains 449 pairs.
    - relatedness_mod: Modification of the UMNSRS-Relatedness dataset to exclude control samples and those pairs that did not match text in clinical, biomedical and general English corpora. Exact modifications are detailed in the same paper. The resulting dataset contains 458 pairs.

  10. WormBase

    • bioregistry.io
    • integbio.jp
    Updated Apr 27, 2021
    + more versions
    Cite
    (2021). WormBase [Dataset]. http://identifiers.org/re3data:r3d100010424
    Explore at:
    Dataset updated
    Apr 27, 2021
    License

    https://bioregistry.io/spdx:CC0-1.0

    Description

    WormBase is an online bioinformatics database of the biology and genome of the model organism Caenorhabditis elegans and other nematodes. It is used by the C. elegans research community both as an information resource and as a mode to publish and distribute their results. This collection references WormBase-accessioned entities.

  11. GO term (Biological Process) similarity

    • figshare.com
    • datasetcatalog.nlm.nih.gov
    txt
    Updated Mar 9, 2020
    + more versions
    Cite
    Vafaee Lab (2020). GO term (Biological Process) similarity [Dataset]. http://doi.org/10.6084/m9.figshare.11955177.v2
    Explore at:
    Available download formats: txt
    Dataset updated
    Mar 9, 2020
    Dataset provided by
    Figshare (http://figshare.com/)
    figshare
    Authors
    Vafaee Lab
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Pairwise similarities between GO term sets (Biological Process) induced by drug pairs and their PPI partners (degree = 2) among all small-molecule drugs, modeled by semantic similarity.

  12. Human Disease Ontology 2018 update: classification, content and workflow...

    • zenodo.org
    • data.niaid.nih.gov
    csv
    Updated Jun 29, 2023
    Cite
    Quentin St.Charles (2023). Human Disease Ontology 2018 update: classification, content and workflow expansion [Dataset]. http://doi.org/10.1093/nar/gky1032
    Explore at:
    Available download formats: csv
    Dataset updated
    Jun 29, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Quentin St.Charles
    License

    Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Description

    ABSTRACT:

    The Human Disease Ontology (DO) (http://www.disease-ontology.org), database has undergone significant expansion in the past three years. The DO disease classification includes specific formal semantic rules to express meaningful disease models and has expanded from a single asserted classification to include multiple-inferred mechanistic disease classifications, thus providing novel perspectives on related diseases. Expansion of disease terms, alternative anatomy, cell type and genetic disease classifications and workflow automation highlight the updates for the DO since 2015. The enhanced breadth and depth of the DO's knowledgebase has expanded the DO's utility for exploring the multi-etiology of human disease, thus improving the capture and communication of health-related data across biomedical databases, bioinformatics tools, genomic and cancer resources and demonstrated by a 6.6× growth in DO's user community since 2015. The DO's continual integration of human disease knowledge, evidenced by the more than 200 SVN/GitHub releases/revisions, since previously reported in our DO 2015 NAR paper, includes the addition of 2650 new disease terms, a 30% increase of textual definitions, and an expanding suite of disease classification hierarchies constructed through defined logical axioms.

    Instructions:

    The data was cleaned. Duplicates and unnecessary columns were removed. Column titles were changed.

    Inspiration:

    This dataset was uploaded to U-BRITE for the "DRG_DEPOT" summer 2023 team project.

    Acknowledgements:

    Schriml, L. M., Mitraka, E., Munro, J., Tauber, B., Schor, M., Nickle, L., Felix, V., Jeng, L., Bearer, C., Lichenstein, R., Bisordi, K., Campion, N., Hyman, B., Kurland, D., Oates, C. P., Kibbey, S., Sreekumar, P., Le, C., Giglio, M., & Greene, C.

    Human Disease Ontology 2018 update: classification, content and workflow expansion

    Nucleic Acids Research 2019; 47(D1), D955–D962; PMID: 30407550; DOI: https://doi.org/10.1093/nar/gky1032

    U-BRITE last update data: 06/28/2023

  13. EDAM Ontology

    • bioregistry.io
    Updated Apr 24, 2021
    Cite
    (2021). EDAM Ontology [Dataset]. https://bioregistry.io/edam
    Explore at:
    Dataset updated
    Apr 24, 2021
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    EDAM is an ontology of general bioinformatics concepts, including topics, data types, formats, identifiers and operations. EDAM provides a controlled vocabulary for the description, in semantic terms, of things such as: web services (e.g. WSDL files), applications, tool collections and packages, work-benches and workflow software, databases and ontologies, XSD data schema and data objects, data syntax and file formats, web portals and pages, resource catalogues and documents (such as scientific publications).

  14. Open PHACTS

    • dknet.org
    Updated Nov 9, 2024
    Cite
    (2024). Open PHACTS [Dataset]. http://identifiers.org/RRID:SCR_005050
    Explore at:
    Dataset updated
    Nov 9, 2024
    Description

    Project that developed an open access discovery platform, called Open Pharmacological Space (OPS), via a semantic web approach, integrating pharmacological data from a variety of information resources and tools and services to question this integrated data to support pharmacological research. The project is based upon the assimilation of data already stored as triples, in the form subject-predicate-object. The software and data are available for download and local installation, under an open source and open access model. Tools and services are provided to query and visualize this data, and a sustainability plan will be in place, continuing the operation of the Open PHACTS Discovery Platform after the project funding ends. Throughout the project, a series of recommendations will be developed in conjunction with the community, building on open standards, to ensure wide applicability of the approaches used for integration of data.
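
    The subject-predicate-object form mentioned above can be illustrated with a small sketch. It is not Open PHACTS code: the Triple type, the match helper, and the "ex:" identifiers are all hypothetical, chosen only to show how a pattern with unbound positions selects triples.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    """One statement in subject-predicate-object form."""
    subject: str
    predicate: str
    obj: str


def match(triples: List[Triple],
          subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None) -> List[Triple]:
    """Return the triples matching a pattern; None leaves a position unbound."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# Illustrative statements with made-up identifiers.
triples = [
    Triple("ex:aspirin", "ex:inhibits", "ex:COX1"),
    Triple("ex:aspirin", "ex:inhibits", "ex:COX2"),
    Triple("ex:ibuprofen", "ex:inhibits", "ex:COX2"),
    Triple("ex:COX2", "ex:participatesIn", "ex:inflammation"),
]

# "Which subjects inhibit ex:COX2?" leaves only the subject unbound.
cox2_inhibitors = [t.subject for t in match(triples, predicate="ex:inhibits", obj="ex:COX2")]
print(cox2_inhibitors)
```

    A real triple store would answer the same kind of pattern through a query language such as SPARQL rather than a linear scan; the list comprehension here only shows the matching idea.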

  15. Data from: Getting the best of Linked Data and Property Graphs: rdf2neo and...

    • swat4hcls.figshare.com
    png
    Updated Dec 5, 2018
    Cite
    Marco Brandizi; Ajit Singh; Christopher Rawlings; Keywan Hassani-Pak (2018). Getting the best of Linked Data and Property Graphs: rdf2neo and the KnetMiner Use Case [Dataset]. http://doi.org/10.6084/m9.figshare.7314323.v1
    Explore at:
    Available download formats: png
    Dataset updated
    Dec 5, 2018
    Dataset provided by
    Semantic Web Applications and Tools for Healthcare and Life Sciences
    Authors
    Marco Brandizi; Ajit Singh; Christopher Rawlings; Keywan Hassani-Pak
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Paper submitted to SWAT4LS 2018. We introduce rdf2neo, a tool to populate Neo4j databases starting from RDF data sets, based on a configurable mapping between the two. By employing agrigenomics-related real use cases, we show how such mapping can allow for a hybrid approach to the management of networked knowledge, based on taking advantage of the best of both RDF and property graphs.

  16. LISC 2013 - Results: Discussion Groups on Semantic Web and Reproducibility

    • commons.datacite.org
    • figshare.com
    Updated Jan 18, 2016
    Cite
    Paul Groth; Peter Ansell; Kjetil Kjernsmo; Jacco Van Ossenbruggen; Guillermo Palma; Carol Goble; Cameron McLean; Richard Hosking; Steve Cassidy; Jun Zhao; Prashant Gupta; Niels Ockeloen; Graham Klyne (2016). LISC 2013 - Results: Discussion Groups on Semantic Web and Reproducibility [Dataset]. http://doi.org/10.6084/m9.figshare.828798.v2
    Explore at:
    Dataset updated
    Jan 18, 2016
    Dataset provided by
    DataCite (https://www.datacite.org/)
    Figshare (http://figshare.com/)
    figshare
    Authors
    Paul Groth; Peter Ansell; Kjetil Kjernsmo; Jacco Van Ossenbruggen; Guillermo Palma; Carol Goble; Cameron McLean; Richard Hosking; Steve Cassidy; Jun Zhao; Prashant Gupta; Niels Ockeloen; Graham Klyne
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Results of discussion groups at the Linked Science Workshop 2013 held at the International Semantic Web Conference (http://linkedscience.org/events/lisc2013/). Participants were asked to develop matrices about how semantic web/linked data solutions can help address reproducibility/re* problems. The results are documented in the spreadsheets above and described in videos (to be posted). The participants also developed a set of challenges for the Linked Science and broader semantic web community to help address these re* problems. See below or lisc2013-challenges.txt.

    Linked Science Community Challenges

    The Linked Science 2013 workshop discussion participants identified several challenges to the Linked Data/Semantic Web community in order to help reproducibility (and other re* problems, i.e. repurposing, reuse, etc.) in science.

    1) Promote the basics of linked data for reproducibility. Many basic linked data technologies (e.g. content negotiation or the use of dereferenceable URLs) could be used for scientific reproducibility. The goal here would be to develop a set of how-to documents that guide e-scientists on how to use these technologies to support scientific re* problems. An important point would be to tie these solutions directly to domain scientist problems.
    2) Integrate Semantic Web technologies and the publishing process. Publishing is central to the scientific process and the issues of reusing scientific work. Semantic Web technologies should be integrated into the publishing process to enable reuse.
    3) Make it easier to publish data and then work with it than to work directly on your own data. Publishing data should enable a scientist to do more. Can we make publishing data so useful to the scientists themselves that it would be their first option?
    4) Provide an integrated view of the how, what, when, where, and why of the scientific process. Linked data technologies are designed for integration and aggregation. Can we use these technologies to provide an integrated view over all the questions one might have with respect to a scientific experiment?
    5) Provide mechanisms for dealing with copyright on data from both a technical and a social perspective. Dealing with copyright is not always straightforward. Can we eliminate the barriers to reuse by helping scientists with these copyright issues in an automatic fashion?
    6) Get an altmetric-based award into one of our own venues. Part of supporting re* problems is promoting sharing. We should "eat our own dogfood" by promoting and rewarding sharing in the major semantic web venues. We suggest an award based on some sort of altmetric.
    7) Make sure the EBI RDF platform does not get shut down in two years. The European Bioinformatics Institute has released RDF versions with SPARQL endpoints for many of their core data sets. They are making it available for two years and checking whether it is used to determine if it continues in the long term. This is a key data resource for using Linked Data for reproducibility, so let's make sure it keeps going.
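
    Content negotiation, named in challenge 1 as one of the basic linked data technologies, can be sketched in a few lines. This is only an illustration under assumptions: the URL is a placeholder, the helper name is hypothetical, and whether a given server honors the requested media type depends on that server.

```python
from urllib.request import Request


def rdf_request(url: str, media_type: str = "text/turtle") -> Request:
    """Build an HTTP request asking for an RDF serialization of a resource.

    Content negotiation works through the Accept header: the same
    dereferenceable URL can return HTML to a browser and RDF to a client
    that asks for it, if the server supports that.
    """
    return Request(url, headers={"Accept": media_type})


# Placeholder URL; a real call would pass the request to
# urllib.request.urlopen(req) and read the response body.
req = rdf_request("http://example.org/resource/experiment-42")
print(req.full_url, req.get_header("Accept"))
```

    Building the request separately from sending it keeps the sketch runnable offline and makes the negotiated media type easy to inspect.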

  17. Allen Institute Neurowiki

    • neuinfo.org
    • scicrunch.org
    • +2 more
    Updated Oct 26, 2019
    + more versions
    Cite
    (2019). Allen Institute Neurowiki [Dataset]. http://identifiers.org/RRID:SCR_005042
    Explore at:
    Dataset updated
    Oct 26, 2019
    Description

    THIS RESOURCE IS NO LONGER IN SERVICE, documented September 6, 2016. The Allen Institute Neurowiki is a joint project between Vulcan Inc. and the Allen Institute to build a Semantic Wiki mapping genetic instances. It is a finished prototype testing the import pipelines and display components for combining 5 major RDF datasets from 4 different sources. Current planning includes mapping complete datasets, curating a better ontology, and creating multiple ontology management for a user class.

    Biological Linked Data Map:
    * Open, public online access
    * Data from multiple RDF data stores
    * Complete import pipeline using LDIF framework
    * Outlines of each imported instance embedding inline wiki properties and providing views of imported properties from original RDF datasets
    * Charting tools that "pivot" SPARQL queries, providing several views of each query
    * Navigation and composition tools for accessing and mining the data

    Where did we get the data?
    * KEGG: Kyoto Encyclopedia of Genes and Genomes: KEGG GENES is a collection of gene catalogs for all complete genomes generated from publicly available resources, mostly NCBI RefSeq
    * Diseasome: The Diseasome website is a disease / disorder relationships explorer and a sample of an innovative map-oriented scientific work. Built by a team of researchers and engineers, it uses the Human Disease Network dataset.
    * DrugBank: The DrugBank database is a unique bioinformatics and cheminformatics resource that combines detailed drug data with comprehensive drug target information.
    * Sider: Sider contains information on marketed medicines and their recorded adverse drug reactions. The information is extracted from public documents and package inserts.

    Every piece of content on every instance page is generated by Semantic Result Formatters interpreting SPARQL results.

  18. Provenance RDF Models

    • figshare.com
    zip
    Updated Jan 19, 2016
    Cite
    Gang Fu (2016). Provenance RDF Models [Dataset]. http://doi.org/10.6084/m9.figshare.1399197.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    Jan 19, 2016
    Dataset provided by
    Figshare (http://figshare.com/)
    figshare
    Authors
    Gang Fu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A reference data set crowdsourced from multiple data sources. Code to generate multiple provenance RDF models is available. Sample queries for comparative analysis are also included.

  19. Data from: Whole Brain Catalog

    • scicrunch.org
    Updated Oct 17, 2019
    Cite
    (2019). Whole Brain Catalog [Dataset]. http://identifiers.org/RRID:SCR_007011
    Explore at:
    Dataset updated
    Oct 17, 2019
    Description

    THIS RESOURCE IS NO LONGER IN SERVICE, documented May 26, 2016. An open source, downloadable, 3D atlas of the mouse brain and its cellular constituents that allows multi-scale data to be visualized in a seamless way, similar to Google Earth. Data within the Catalog is marked up with annotations and can link out to additional data sources via a semantic framework. This next generation open environment has been developed to connect members of the neuroscience community to facilitate solutions for today's intractable challenges in brain research through cooperation and crowd sourcing. The client-server platform provides rich 3-D views for researchers to zoom in, out, and around structures deep in a multi-scale spatial framework of the mouse brain. An open-source, 3-D graphics engine used in graphics-intensive computer gaming generates high-resolution visualizations that bring data to life through biological simulations and animations.

    Within the Catalog, researchers can view and contribute a wide range of data including:
    * 3D meshes of subcellular scenes or brain region territories
    * Large 2D image datasets from both electron and light level microscopy
    * NeuroML and Neurolucida neuronal reconstructions
    * Protein Database molecular structures

    Users of the Whole Brain Catalog can:
    * Fit data of any scale into the international standard atlas coordinate system for spatial brain mapping, the Waxholm Space.
    * View brain slices, neurons and their animation, neuropil reconstructions, and molecules in appropriate locations
    * View data up close and at a high resolution
    * View their own data in the Whole Brain Catalog environment
    * View data within a semantic environment supported by vocabularies from the Neuroscience Information Framework (NIF) at http://www.neuinfo.org.
    * Contribute code and connect personal tools to the environment
    * Make new connections with related research and researchers

    5 Easy Ways to Explore:
    * Explore the datasets across multiple scales.
    * View data closely at high resolution.
    * Observe accurately simulated neurons.
    * Readily search for content.
    * Contribute your own research.

  20. Crop trait regulating-genes knowledge graph dataset

    • scidb.cn
    Updated Jan 3, 2025
    Cite
    zhang dan dan (2025). Crop trait regulating-genes knowledge graph dataset [Dataset]. http://doi.org/10.57760/sciencedb.agriculture.00175
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jan 3, 2025
    Dataset provided by
    Science Data Bank
    Authors
    zhang dan dan
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    In crop breeding research, breeders have always aimed to develop new crop varieties with a range of excellent traits. With the accelerating application of information technology in crop breeding, the multi-dimensional scientific data related to crop breeding has grown exponentially. These structured and semi-structured data are distributed across scientific databases in different fields and lack association and fusion across species. This hinders the transfer and reuse of existing crop breeding knowledge and the maximization of the value of crop breeding scientific data, which poses challenges for the knowledge discovery of crop trait regulation genes. More and more crop breeding research is therefore based on reorganizing, correlating, analyzing and using existing breeding scientific data in order to discover knowledge about crop trait regulation genes.

    The crop trait regulatory gene knowledge graph dataset uses the following scientific databases as data sources: the PubMed literature database, Phytozome (genomic information of 4 species), Ensembl Plants (European Molecular Biology Laboratory's European Bioinformatics Institute) (genome information of 4 species), UniProt (Universal Protein) (protein annotation information of 4 species), the Rice Genome Annotation Project (RGAP), STRING (protein interaction information for 4 species), Pfam (protein family analysis and modeling) (protein family information for 4 species), KEGG (Kyoto Encyclopedia of Genes and Genomes) (pathway annotation information of the 4 species), and GO (Gene Ontology). The entities and relationships of this multi-source scientific data, which comes in different data formats, were extracted as follows. For structured data, mapping-based knowledge extraction is adopted. For XML semi-structured data, knowledge extraction based on Kettle data analysis is adopted. For FASTA semi-structured data, knowledge extraction based on the BLAST model is adopted. For unstructured text data, knowledge extraction based on a large language model is adopted.

    On the basis of the above entity and relationship extraction, the association and fusion of multi-source crop breeding knowledge was realized based on entity mapping and specific attribute association. Finally, the crop trait regulatory gene knowledge graph dataset was formed, consisting of 13 entity datasets and 16 entity relationship datasets. The crop trait regulating-gene knowledge graph dataset provides a key semantic model and an important data basis for crop breeding knowledge discovery, such as the discovery of excellent pleiotropic genes, cross-species gene function prediction and the discovery of potential pathway gene networks.
