https://www.gnu.org/licenses/gpl-3.0.html
This dataset builds an annotation file that maps proteins (entities) to Gene Ontology biological processes, extracted from the GOA UniProt dataset at ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/UNIPROT/goa_uniprot_all.gaf.gz. This protein-process map file is used to generate the protein pairs used for testing the PySML library. The semantic similarity scores produced are also included.
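A protein-to-process map like the one described above can be derived from a GAF file roughly as follows. This is a minimal sketch, not the dataset authors' script. It assumes the standard GAF 2.x column layout (DB Object ID in column 2, Qualifier in column 4, GO ID in column 5, Aspect in column 9, with "P" meaning Biological Process), and it assumes that annotations carrying a NOT qualifier should be dropped. The helper names `build_process_map` and `protein_pairs` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Iterator, Set, Tuple


def build_process_map(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each protein accession to the set of GO Biological Process terms it is annotated with."""
    process_map: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        if line.startswith("!") or not line.strip():
            continue  # skip GAF header comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # skip malformed rows
        protein_id, qualifier, go_id, aspect = cols[1], cols[3], cols[4], cols[8]
        if aspect != "P":
            continue  # keep Biological Process annotations only
        if "NOT" in qualifier.split("|"):
            continue  # assumption: negated annotations are excluded
        process_map[protein_id].add(go_id)
    return dict(process_map)


def protein_pairs(process_map: Dict[str, Set[str]]) -> Iterator[Tuple[str, str]]:
    """Yield every unordered pair of annotated proteins, in sorted order."""
    return combinations(sorted(process_map), 2)
```

The full file can be streamed with `gzip.open("goa_uniprot_all.gaf.gz", "rt")` and passed straight to `build_process_map`. Because the number of pairs grows quadratically, a test set would in practice be sampled from `protein_pairs` rather than enumerated in full.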
Bioinformatics, the application of computational tools to the management and analysis of biological data, has stimulated rapid research advances in genomics through the development of data archives such as GenBank, and similar progress is just beginning within ecology. One reason for the belated adoption of informatics approaches in ecology is the breadth of ecologically pertinent data (from genes to the biosphere) and its highly heterogeneous nature. The variety of formats, logical structures, and sampling methods in ecology creates significant challenges. Cultural barriers further impede progress, especially for the creation and adoption of data standards. Here we describe informatics frameworks for ecology, from subject-specific data warehouses, to generic data collections that use detailed metadata descriptions and formal ontologies to catalog and cross-reference information. Combining these approaches with automated data integration techniques and scientific workflow systems will ma...
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is related to the manuscript "An empirical meta-analysis of the life sciences linked open data on the web", published in Nature Scientific Data. If you use the dataset, please cite the manuscript as follows:

Kamdar, M.R., Musen, M.A. An empirical meta-analysis of the life sciences linked open data on the web. Sci Data 8, 24 (2021). https://doi.org/10.1038/s41597-021-00797-y

We extracted schemas from more than 80 publicly available biomedical linked data graphs in the Life Sciences Linked Open Data (LSLOD) cloud into an LSLOD schema graph and conducted an empirical meta-analysis to evaluate the extent of semantic heterogeneity across the LSLOD cloud. The dataset published here contains the following files:
- The set of Linked Data Graphs from the LSLOD cloud from which schemas are extracted.
- Refined sets of extracted classes, object properties, data properties, and datatypes shared across the Linked Data Graphs in the LSLOD cloud. Where a schema element is reused from a Linked Open Vocabulary or an ontology, this is explicitly indicated.
- The LSLOD Schema Graph, which contains all the above extracted schema elements interlinked with each other based on the underlying content. Sample instances and sample assertions are also provided, along with broad-level characteristics of the modeled content.

The LSLOD Schema Graph is saved as a JSON Pickle file. To read the JSON object in this Pickle file, use the following Python commands:

    import pickle

    with open('LSLOD-Schema-Graph.json.pickle', 'rb') as infile:
        x = pickle.load(infile, encoding='iso-8859-1')

Check the referenced link for more details on this research, raw data files, and code references.
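The loading command above can be wrapped in a small reusable helper. This is a sketch, and the function name `load_schema_graph` is hypothetical. The `encoding='iso-8859-1'` argument is the one given in the description; in Python's `pickle` module it controls how 8-bit string instances pickled by Python 2 are decoded. That suggests, though the source does not say so, that the file may have been written under Python 2. For pickles written by Python 3, the argument is harmless. Only unpickle files from a source you trust, because unpickling can execute arbitrary code.

```python
import pickle
from pathlib import Path
from typing import Any


def load_schema_graph(path: str = "LSLOD-Schema-Graph.json.pickle") -> Any:
    """Load the pickled LSLOD Schema Graph object from disk.

    encoding='iso-8859-1' decodes any Python 2 byte strings in the pickle
    as Latin-1 text; it does not affect pickles written by Python 3.
    """
    with Path(path).open("rb") as infile:
        return pickle.load(infile, encoding="iso-8859-1")
```

After loading, `type(x)` and, if it is a dict, `list(x)[:10]` are a quick way to discover the object's structure, which the description does not document.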
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Molecular Entities in Linked Data (MEiLD) dataset comprises data on distinct atoms, molecules, ions, ion pairs, radicals, radical ions, and other separately distinguishable chemical entities. The dataset is provided in JSON-LD format and was generated by SDFEater, a tool for parsing atoms, bonds, and other molecule data. MEiLD contains 349,960 ‘small’ chemical entities.
Query-based web application that helps users find bioinformatics and artificial intelligence (AI) software. G6GFINDR is powered by "semantic annotation" rather than keyword search, taking advantage of semantic web graph technology.
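The general difference between semantic-annotation search and keyword search can be shown with a toy example. The sketch below is not G6GFINDR's implementation. The mini-ontology, the tool names, and the helpers `semantic_search` and `keyword_search` are all hypothetical. Each tool is annotated with concepts, and a query for a concept also matches tools annotated with any of its subconcepts through the is-a hierarchy. A keyword search over tool names finds none of them.

```python
from typing import Dict, List, Set

# Hypothetical toy ontology: child concept -> parent concept (is-a).
PARENT: Dict[str, str] = {
    "sequence_alignment": "sequence_analysis",
    "variant_calling": "sequence_analysis",
    "deep_learning": "machine_learning",
}

# Hypothetical tool catalogue: tool name -> annotated concepts.
TOOLS: Dict[str, Set[str]] = {
    "align-tool": {"sequence_alignment"},
    "variant-net": {"variant_calling", "deep_learning"},
    "ml-toolkit": {"machine_learning"},
}


def ancestors(concept: str) -> Set[str]:
    """Return the concept together with all of its is-a ancestors."""
    found = {concept}
    while concept in PARENT:
        concept = PARENT[concept]
        found.add(concept)
    return found


def semantic_search(query_concept: str) -> List[str]:
    """Tools annotated with the query concept or any of its subconcepts."""
    return sorted(
        name for name, concepts in TOOLS.items()
        if any(query_concept in ancestors(c) for c in concepts)
    )


def keyword_search(keyword: str) -> List[str]:
    """Plain substring match on tool names, for comparison."""
    return sorted(name for name in TOOLS if keyword.lower() in name.lower())
```

A query for `"machine_learning"` returns both `ml-toolkit` and `variant-net`, because the second is annotated with the subconcept `deep_learning`. The keyword `"machine learning"` matches no tool name.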
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Grant proposal (project description and references cited) to the US National Science Foundation, Advances in Biological Informatics (ABI) program as Collaborative Research. Funded in 2015. Files include public abstract as submitted to NSF.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The current code contains an extension for the Apache Xalan XSLT processor that allows RDF statements to be searched and injected during an XSLT transformation. As an example, the makefile transforms an NCBI Gene record to HTML and annotates it with the Disease Ontology.
The developed RDF models and the SPARQL queries used are available at http://www.scai.fraunhofer.de/en/business-research-areas/bioinformatics/downloads/neurordf.html (ZIP, 178 kb).
https://choosealicense.com/licenses/cc0-1.0/
UMNSRS, developed by Pakhomov et al., consists of 725 clinical term pairs whose semantic similarity and relatedness were rated by hand. The similarity and relatedness of each term pair were annotated on a continuous scale: the resident touched a bar on a touch-sensitive computer screen to indicate the degree of similarity or relatedness. The following subsets are available:
- similarity: A set of 566 UMLS concept pairs manually rated for semantic similarity (e.g. whale-dolphin) using a continuous response scale.
- relatedness: A set of 588 UMLS concept pairs manually rated for semantic relatedness (e.g. needle-thread) using a continuous response scale.
- similarity_mod: A modification of the UMNSRS-Similarity dataset that excludes control samples and pairs that did not match text in clinical, biomedical, and general English corpora. The exact modifications are detailed in the paper (Corpus Domain Effects on Distributional Semantic Modeling of Medical Terms. Serguei V.S. Pakhomov, Greg Finley, Reed McEwan, Yan Wang, and Genevieve B. Melton. Bioinformatics. 2016; 32(23):3635-3644). The resulting dataset contains 449 pairs.
- relatedness_mod: A modification of the UMNSRS-Relatedness dataset that excludes control samples and pairs that did not match text in clinical, biomedical, and general English corpora. The exact modifications are detailed in the paper (Corpus Domain Effects on Distributional Semantic Modeling of Medical Terms. Serguei V.S. Pakhomov, Greg Finley, Reed McEwan, Yan Wang, and Genevieve B. Melton. Bioinformatics. 2016; 32(23):3635-3644). The resulting dataset contains 458 pairs.
https://bioregistry.io/spdx:CC0-1.0
WormBase is an online bioinformatics database of the biology and genome of the model organism Caenorhabditis elegans and other nematodes. It is used by the C. elegans research community both as an information resource and as a means to publish and distribute their results. This collection references WormBase-accessioned entities.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Pairwise semantic similarities between the Gene Ontology (Biological Process) term sets induced by drug pairs and their PPI partners (degree = 2), computed for all small-molecule drugs.
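Comparing two drugs then reduces to comparing two sets of GO terms. The description does not say which set-level measure was used. The sketch below shows one common choice, the best-match average (BMA), on top of a simple term-level similarity: the Jaccard overlap of the terms' ancestor sets. The toy DAG, the term IDs, and all function names are hypothetical. In the dataset, each drug's term set would be the GO Biological Process terms of its targets and their PPI partners.

```python
from typing import Callable, Dict, Set

TermSim = Callable[[str, str], float]

# Hypothetical toy is-a DAG: term -> parent terms.
PARENTS: Dict[str, Set[str]] = {
    "GO:A": set(), "GO:B": {"GO:A"}, "GO:C": {"GO:A"}, "GO:D": {"GO:B"},
}


def ancestors(term: str) -> Set[str]:
    """The term plus every term reachable by following parent links."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def ancestor_jaccard(t1: str, t2: str) -> float:
    """Term-level similarity: overlap of the two terms' ancestor sets."""
    s1, s2 = ancestors(t1), ancestors(t2)
    return len(s1 & s2) / len(s1 | s2)


def best_match_average(a: Set[str], b: Set[str], sim: TermSim) -> float:
    """Average of each term's best match in the other set, symmetrised over both directions."""
    if not a or not b:
        return 0.0
    a_to_b = sum(max(sim(x, y) for y in b) for x in a) / len(a)
    b_to_a = sum(max(sim(y, x) for x in a) for y in b) / len(b)
    return (a_to_b + b_to_a) / 2
```

Because `sim` is passed in as a parameter, the Jaccard stand-in can be swapped for an information-content measure such as Resnik or Lin without changing the set-level logic.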
Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
ABSTRACT:
The Human Disease Ontology (DO) database (http://www.disease-ontology.org) has undergone significant expansion in the past three years. The DO disease classification includes specific formal semantic rules to express meaningful disease models and has expanded from a single asserted classification to include multiple inferred mechanistic disease classifications, thus providing novel perspectives on related diseases. Expansion of disease terms, alternative anatomy, cell type and genetic disease classifications, and workflow automation highlight the updates to the DO since 2015. The enhanced breadth and depth of the DO's knowledgebase has expanded its utility for exploring the multi-etiology of human disease, improving the capture and communication of health-related data across biomedical databases, bioinformatics tools, and genomic and cancer resources, as demonstrated by a 6.6× growth in the DO's user community since 2015. The DO's continual integration of human disease knowledge, evidenced by more than 200 SVN/GitHub releases/revisions since our DO 2015 NAR paper, includes the addition of 2650 new disease terms, a 30% increase in textual definitions, and an expanding suite of disease classification hierarchies constructed through defined logical axioms.
Instructions:
Data was cleaned: duplicates and unnecessary columns were removed, and column titles were changed.
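A cleaning pass of this kind can be sketched with Python's standard `csv` module. The actual columns dropped and renamed are not documented, so the column names in the usage below and the helper `clean_rows` are hypothetical. The sketch also assumes that duplicates are judged after the unnecessary columns are removed, so rows that differ only in a dropped column count as duplicates.

```python
import csv
import io
from typing import Dict, List, Sequence


def clean_rows(text: str, drop: Sequence[str], rename: Dict[str, str]) -> List[Dict[str, str]]:
    """Drop unwanted columns, rename headers, and remove exact duplicate rows (first one wins)."""
    reader = csv.DictReader(io.StringIO(text))
    seen = set()
    cleaned: List[Dict[str, str]] = []
    for row in reader:
        kept = {rename.get(k, k): v for k, v in row.items() if k not in drop}
        key = tuple(kept.items())
        if key in seen:
            continue  # duplicate once the dropped columns are ignored
        seen.add(key)
        cleaned.append(kept)
    return cleaned
```

For example, `clean_rows(text, drop=["note"], rename={"id": "DO ID", "name": "Disease Name"})` keeps one row per distinct (ID, name) combination.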
Inspiration:
This dataset was uploaded to U-BRITE for the "DRG_DEPOT" summer 2023 team project.
Acknowledgements:
Schriml, L. M., Mitraka, E., Munro, J., Tauber, B., Schor, M., Nickle, L., Felix, V., Jeng, L., Bearer, C., Lichenstein, R., Bisordi, K., Campion, N., Hyman, B., Kurland, D., Oates, C. P., Kibbey, S., Sreekumar, P., Le, C., Giglio, M., & Greene, C.
Human Disease Ontology 2018 update: classification, content and workflow expansion
Nucleic Acids Research 2019; 47(D1), D955–D962;PMID:30407550;DOI:https://doi.org/10.1093/nar/gky1032
U-BRITE last update date: 06/28/2023
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
EDAM is an ontology of general bioinformatics concepts, including topics, data types, formats, identifiers and operations. EDAM provides a controlled vocabulary for the description, in semantic terms, of things such as: web services (e.g. WSDL files), applications, tool collections and packages, work-benches and workflow software, databases and ontologies, XSD data schema and data objects, data syntax and file formats, web portals and pages, resource catalogues and documents (such as scientific publications).
A project that developed an open access discovery platform, called Open Pharmacological Space (OPS), using a semantic web approach. The platform integrates pharmacological data from a variety of information resources and provides tools and services to query this integrated data in support of pharmacological research. The project is based on the assimilation of data already stored as triples, in subject-predicate-object form. The software and data are available for download and local installation under an open source and open access model. Tools and services are provided to query and visualize the data, and a sustainability plan will be in place to continue operating the Open PHACTS Discovery Platform after the project funding ends. Throughout the project, a series of recommendations will be developed in conjunction with the community, building on open standards, to ensure wide applicability of the approaches used for data integration.
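Data stored as subject-predicate-object triples is queried by matching patterns in which some positions are fixed and others are wildcards. This is the idea behind a SPARQL triple pattern. The sketch below shows that idea over an in-memory list. It is not the Open PHACTS API, and the identifiers `ex:drug1`, `ex:targets`, and so on are hypothetical. A two-pattern join answers "what are the labels of this drug's targets?".

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical example triples.
TRIPLES: List[Triple] = [
    ("ex:drug1", "ex:targets", "ex:protein1"),
    ("ex:drug1", "rdfs:label", "Drug One"),
    ("ex:protein1", "rdfs:label", "Protein One"),
]


def match(triples: Iterable[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> Iterator[Triple]:
    """Yield triples matching the pattern; None acts as a wildcard."""
    for t in triples:
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o):
            yield t


def target_labels(triples: List[Triple], drug: str) -> List[str]:
    """Join two patterns: (drug, targets, ?protein) then (?protein, label, ?name)."""
    return [
        label
        for _, _, protein in match(triples, s=drug, p="ex:targets")
        for _, _, label in match(triples, s=protein, p="rdfs:label")
    ]
```

The roughly equivalent SPARQL would be `SELECT ?name WHERE { ex:drug1 ex:targets ?protein . ?protein rdfs:label ?name }`.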
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Paper submitted to SWAT4LS 2018. We introduce rdf2neo, a tool to populate Neo4j databases starting from RDF data sets, based on a configurable mapping between the two. By employing agrigenomics-related real use cases, we show how such mapping can allow for a hybrid approach to the management of networked knowledge, based on taking advantage of the best of both RDF and property graphs.
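The core of any RDF-to-property-graph mapping is deciding which triples become node properties and which become relationships. The sketch below shows that split in its simplest form. It is not rdf2neo's API or configuration format, which is richer. Here the "mapping" is just a hypothetical set of predicates whose objects are literal values, and the gene and protein identifiers are made up. If a node has several values for the same literal predicate, the last one wins.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)
Nodes = Dict[str, Dict[str, str]]  # node id -> property map


def to_property_graph(triples: List[Triple], literal_props: Set[str]) -> Tuple[Nodes, List[Triple]]:
    """Turn literal-valued triples into node properties and the rest into edges."""
    nodes: Nodes = {}
    edges: List[Triple] = []
    for s, p, o in triples:
        nodes.setdefault(s, {})
        if p in literal_props:
            nodes[s][p] = o  # literal value becomes a node property (last value wins)
        else:
            nodes.setdefault(o, {})
            edges.append((s, p, o))  # resource-to-resource triple becomes a relationship
    return nodes, edges
```

The resulting `nodes` and `edges` correspond to what would be written to Neo4j as nodes with properties and typed relationships.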
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Results of discussion groups at the Linked Science Workshop 2013, held at the International Semantic Web Conference (http://linkedscience.org/events/lisc2013/). Participants were asked to develop matrices on how semantic web/linked data solutions can help address reproducibility/re* problems. The results are documented in the spreadsheets above and described in videos (to be posted). The participants also developed a set of challenges for the Linked Science and broader semantic web community to help address these re* problems, listed below and in lisc2013-challenges.txt.
Linked Science Community Challenges
The Linked Science 2013 workshop discussion participants identified several challenges for the Linked Data/Semantic Web community to help with reproducibility and other re* problems (e.g. repurposing, reuse) in science.
1) Promote the basics of linked data for reproducibility. Many basic linked data technologies (e.g. content negotiation or the use of dereferenceable URLs) could be used for scientific reproducibility. The goal here would be to develop a set of how-to documents that guide e-scientists in using these technologies to support scientific re* problems. An important point would be to tie these solutions directly to domain scientists' problems.
2) Integrate Semantic Web technologies and the publishing process. Publishing is central to the scientific process and to the reuse of scientific work. Semantic Web technologies should be integrated into the publishing process to enable reuse.
3) Make it easier to publish data and then work with it than to work directly on your own data. Publishing data should enable a scientist to do more. Can we make publishing data so useful to scientists themselves that it becomes their first option?
4) Provide an integrated view of the how, what, when, where, and why of the scientific process. Linked data technologies are designed for integration and aggregation. Can we use these technologies to provide an integrated view over all the questions one might have about a scientific experiment?
5) Provide mechanisms for dealing with copyright on data from both a technical and a social perspective. Dealing with copyright is not always straightforward. Can we eliminate the barriers to reuse by helping scientists with these copyright issues automatically?
6) Get an altmetric-based award into one of our own venues. Part of supporting re* problems is promoting sharing. We should "eat our own dogfood" by promoting and rewarding sharing in the major semantic web venues. We suggest an award based on some sort of altmetric.
7) Make sure the EBI RDF platform does not get shut down in two years. The European Bioinformatics Institute has released RDF versions, with SPARQL endpoints, of many of its core data sets. It is making them available for two years and monitoring their use to decide whether to continue them in the long term. This is a key data resource for using Linked Data for reproducibility; let's make sure it keeps going.
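Content negotiation on a dereferenceable URI, mentioned in challenge 1, means that the same URI can return HTML to a browser and RDF to a program, depending on the HTTP `Accept` header. The sketch below uses only Python's standard `urllib.request` to build such a request without sending it. The URI is a placeholder, and the particular preference order of RDF media types is an assumption.

```python
import urllib.request

# Assumed preference order: Turtle first, then RDF/XML, then JSON-LD.
RDF_ACCEPT = "text/turtle, application/rdf+xml;q=0.9, application/ld+json;q=0.8"


def rdf_request(uri: str, accept: str = RDF_ACCEPT) -> urllib.request.Request:
    """Build a GET request asking a dereferenceable URI for an RDF serialization."""
    return urllib.request.Request(uri, headers={"Accept": accept}, method="GET")
```

Sending it with `urllib.request.urlopen(rdf_request(uri))` and checking `resp.headers.get_content_type()` shows which serialization the server actually chose. A server that does not support content negotiation will typically just return HTML.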
THIS RESOURCE IS NO LONGER IN SERVICE, documented September 6, 2016. The Allen Institute Neurowiki is a joint project between Vulcan Inc. and the Allen Institute to build a Semantic Wiki mapping genetic instances. It is a finished prototype testing the import pipelines and display components for combining 5 major RDF datasets from 4 different sources. Current planning includes mapping complete datasets, curating a better ontology, and creating multiple ontology management for a user class.
Biological Linked Data Map:
* Open, public online access
* Data from multiple RDF data stores
* Complete import pipeline using the LDIF framework
* Outlines of each imported instance, embedding inline wiki properties and providing views of imported properties from the original RDF datasets
* Charting tools that "pivot" SPARQL queries, providing several views of each query
* Navigation and composition tools for accessing and mining the data
Where did we get the data?
* KEGG: Kyoto Encyclopedia of Genes and Genomes. KEGG GENES is a collection of gene catalogs for all complete genomes, generated from publicly available resources, mostly NCBI RefSeq.
* Diseasome: The Diseasome website is a disease/disorder relationships explorer and a sample of an innovative map-oriented scientific work. Built by a team of researchers and engineers, it uses the Human Disease Network dataset.
* DrugBank: The DrugBank database is a unique bioinformatics and cheminformatics resource that combines detailed drug data with comprehensive drug target information.
* Sider: Sider contains information on marketed medicines and their recorded adverse drug reactions. The information is extracted from public documents and package inserts.
Every piece of content on every instance page is generated by Semantic Result Formatters interpreting SPARQL results.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A reference data set crowdsourced from multiple data sources. Code to generate multiple provenance RDF models is available, and sample queries for comparative analysis are also included.
THIS RESOURCE IS NO LONGER IN SERVICE, documented May 26, 2016. An open source, downloadable 3D atlas of the mouse brain and its cellular constituents that allows multi-scale data to be visualized seamlessly, similar to Google Earth. Data within the Catalog is marked up with annotations and can link out to additional data sources via a semantic framework. This next-generation open environment has been developed to connect members of the neuroscience community and to facilitate solutions for today's intractable challenges in brain research through cooperation and crowdsourcing. The client-server platform provides rich 3D views that let researchers zoom in, out, and around structures deep in a multi-scale spatial framework of the mouse brain. An open-source 3D graphics engine used in graphics-intensive computer gaming generates high-resolution visualizations that bring data to life through biological simulations and animations.
Within the Catalog, researchers can view and contribute a wide range of data, including:
* 3D meshes of subcellular scenes or brain region territories
* Large 2D image datasets from both electron and light level microscopy
* NeuroML and Neurolucida neuronal reconstructions
* Protein Database molecular structures
Users of the Whole Brain Catalog can:
* Fit data of any scale into the international standard atlas coordinate system for spatial brain mapping, the Waxholm Space.
* View brain slices, neurons and their animation, neuropil reconstructions, and molecules in appropriate locations.
* View data up close and at high resolution.
* View their own data in the Whole Brain Catalog environment.
* View data within a semantic environment supported by vocabularies from the Neuroscience Information Framework (NIF) at http://www.neuinfo.org.
* Contribute code and connect personal tools to the environment.
* Make new connections with related research and researchers.
5 Easy Ways to Explore:
* Explore the datasets across multiple scales.
* View data closely at high resolution.
* Observe accurately simulated neurons.
* Readily search for content.
* Contribute your own research.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
In crop breeding research, breeding new crop varieties with a range of excellent traits has always been the goal of breeders. As information technology is applied ever faster in crop breeding, the multi-dimensional scientific data related to crop breeding has grown exponentially. These semi-structured and structured scientific data are distributed across scientific databases in different fields and lack cross-species association and fusion. This hinders the transfer and reuse of existing crop breeding knowledge, prevents the value of crop breeding scientific data from being fully realized, and makes knowledge discovery of crop trait regulatory genes challenging. Consequently, more and more crop breeding research is based on the reorganization, correlation, analysis, and utilization of existing breeding scientific data in order to discover knowledge about crop trait regulatory genes.
The data for the crop trait regulatory gene knowledge map dataset were drawn from the following sources: the PubMed literature database; Phytozome (genomic information for 4 species); Ensembl Plants from the European Molecular Biology Laboratory's European Bioinformatics Institute (genome information for 4 species); UniProt (Universal Protein; protein annotation information for 4 species); the Rice Genome Annotation Project (RGAP); STRING (protein interaction information for 4 species); Pfam (protein family analysis and modeling; protein family information for 4 species); KEGG (Kyoto Encyclopedia of Genes and Genomes; pathway annotation information for the 4 species); and the GO (Gene Ontology) domain scientific database. The entities and relationships of this multi-source scientific data, which comes in different formats, were extracted as follows: mapping-based knowledge extraction for structured data; knowledge extraction based on Kettle data analysis for XML semi-structured data; knowledge extraction based on a BLAST model for FASTA semi-structured data; and knowledge extraction based on a large language model for unstructured text. Building on this entity and relationship extraction, multi-source crop breeding knowledge was fused through entity mapping and association of specific attributes. The resulting crop trait regulatory gene knowledge map dataset consists of 13 entity datasets and 16 entity relationship datasets. The crop trait-regulating gene knowledge graph dataset provides a key semantic model and an important data basis for crop breeding knowledge discovery, such as discovering excellent pleiotropic genes, predicting gene function across species, and uncovering potential pathway gene networks.
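Fusion through entity mapping, as described above, means records about the same entity in different sources are linked through an identifier mapping and merged into one node. The sketch below illustrates only that step and is not the authors' pipeline. The mapping table, the source records, and the identifiers `P-001`, `LOC-001`, and `gene-001` are hypothetical. Attributes are prefixed with their source name so provenance survives the merge.

```python
from typing import Dict

Records = Dict[str, Dict[str, str]]  # local id -> attributes


def fuse_entities(sources: Dict[str, Records], id_map: Dict[str, str]) -> Records:
    """Merge per-source entity records into canonical entities.

    sources: source name -> {local id -> attributes}
    id_map:  local id -> canonical id (hypothetical mapping table);
             ids missing from the map are kept as their own entities.
    """
    fused: Records = {}
    for source, records in sources.items():
        for local_id, attrs in records.items():
            canonical = id_map.get(local_id, local_id)
            entity = fused.setdefault(canonical, {})
            for key, value in attrs.items():
                entity[f"{source}:{key}"] = value  # prefix keeps provenance
    return fused
```

For example, a UniProt record and an RGAP locus that both map to `gene-001` end up as a single entity carrying attributes from both sources.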