Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Structured data characterizing selected avian conservation aspects of North Carolina's Wildlife Action Plans were already encoded in a Semantic MediaWiki database (http://wiki.ncpif.org/). That database was created and is maintained by the North Carolina Partners in Flight (NC PIF) program, a program of the North Carolina Wildlife Resources Commission. The NC PIF wiki database was ported into a Neo4j labeled property graph database for an experiment in linking avian species, organizations, geographies, and management plans. This JSON file is an export from that Neo4j database.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Adverse Outcome Pathways (AOPs) have been proposed to facilitate mechanistic understanding of interactions of chemicals/materials with biological systems. Each AOP starts with a molecular initiating event (MIE) and possibly ends with adverse outcome(s) (AOs) via a series of key events (KEs). So far, the interaction of engineered nanomaterials (ENMs) with biomolecules, biomembranes, cells, and biological structures, in general, is not yet fully elucidated. There is also a huge lack of information on which AOPs are ENMs-relevant or -specific, despite numerous published data on toxicological endpoints they trigger, such as oxidative stress and inflammation. We propose to integrate related data and knowledge recently collected. Our approach combines the annotation of nanomaterials and their MIEs with ontology annotation to demonstrate how we can then query AOPs and biological pathway information for these materials. We conclude that a FAIR (Findable, Accessible, Interoperable, Reusable) representation of the ENM-MIE knowledge simplifies integration with other knowledge.
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
YAGO 3 combines the information from the Wikipedias in multiple languages with WordNet, GeoNames, and other data sources. YAGO 3 taps into multilingual resources of Wikipedia, getting to know more local entities and facts. This version has been extracted from 10 different Wikipedia versions (English, German, French, Dutch, Italian, Spanish, Polish, Romanian, Persian, and Arabic). YAGO 3 is special in several ways:

* YAGO 3 combines the clean taxonomy of WordNet with the richness of the Wikipedia category system, assigning the entities to more than 350,000 classes.
* YAGO 3 is anchored in time and space. YAGO attaches a temporal dimension and a spatial dimension to many of its facts and entities.
* In addition to taxonomy, YAGO has thematic domains such as “music” or “science” from WordNet Domains.
* YAGO 3 extracts and combines entities and facts from 10 Wikipedias in different languages.
* YAGO 3 contains canonical representations of entities appearing in different Wikipedia language editions.
* YAGO 3 integrates all non-English entities into the rich type taxonomy of YAGO.
* YAGO 3 provides a mapping between non-English infobox attributes and YAGO relations.

YAGO 3 knows more than 17 million entities (like persons, organizations, cities, etc.) and contains more than 150 million facts about these entities. As with all major releases, the accuracy of YAGO 3 has been manually evaluated, confirming an accuracy of 95%. Every relation is annotated with its confidence value.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
RELEASE V2.1.0 KNOWLEDGE GRAPH: ORIGINAL DATA SOURCES
Release: v2.1.0
The goal of this build was to create a knowledge graph that represented human disease mechanisms and included the central dogma. The data sources utilized in this release include many of the sources used in the initial release, as well as some new data made available by the Comparative Toxicogenomics Database and experimental data from the Human Protein Atlas.
Data sources are listed by type (Ontology and Data not represented in an ontology [Database Sources]). Additional details are provided for each data source below. Please see documentation on the primary release (https://github.com/callahantiff/PheKnowLator/wiki/v2-Data-Sources) for additional details on each data source as well as citation information.
Data Access:
ONTOLOGIES
Cell Ontology
Cell Line Ontology
Chemical Entities of Biological Interest (ChEBI) Ontology
Gene Ontology
Human Phenotype Ontology
Mondo Disease Ontology
Pathway Ontology
Protein Ontology
Relations Ontology
Sequence Ontology
Uber-Anatomy Ontology
Vaccine Ontology
Cell Ontology (CL)
Homepage: GitHub
Citation:
Bard J, Rhee SY, Ashburner M. An ontology for cell types. Genome Biology. 2005;6(2):R21
Usage: Utilized to connect transcripts and proteins to cells. Additionally, the edges between this ontology and its dependencies are utilized:
ChEBI
GO
PATO
PRO
RO
UBERON
Cell Line Ontology (CLO)
Homepage: http://www.clo-ontology.org/
Citation:
Sarntivijai S, Lin Y, Xiang Z, Meehan TF, Diehl AD, Vempati UD, Schürer SC, Pang C, Malone J, Parkinson H, Liu Y. CLO: the cell line ontology. Journal of Biomedical Semantics. 2014;5(1):37
Usage: Utilized this ontology to map cell lines to transcripts and proteins. Additionally, the edges between this ontology and its dependencies are utilized:
CL
DOID
NCBITaxon
UBERON
Chemical Entities of Biological Interest (ChEBI)
Homepage: https://www.ebi.ac.uk/chebi/
Citation:
Hastings J, Owen G, Dekker A, Ennis M, Kale N, Muthukrishnan V, Turner S, Swainston N, Mendes P, Steinbeck C. ChEBI in 2016: Improved services and an expanding collection of metabolites. Nucleic Acids Research. 2015;44(D1):D1214-9
Usage: Utilized to connect chemicals to complexes, diseases, genes, GO biological processes, GO cellular components, GO molecular functions, pathways, phenotypes, reactions, and transcripts.
Gene Ontology (GO)
Homepage: http://geneontology.org/
Citations:
Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA. Gene ontology: tool for the unification of biology. Nature Genetics. 2000;25(1):25
The Gene Ontology Consortium. The Gene Ontology Resource: 20 years and still GOing strong. Nucleic Acids Research. 2018;47(D1):D330-8
Usage: Utilized to connect biological processes, cellular components, and molecular functions to chemicals, pathways, and proteins. Additionally, the edges between this ontology and its dependencies are utilized:
CL
NCBITaxon
RO
UBERON
Other Gene Ontology Data Used: goa_human.gaf.gz
Human Phenotype Ontology (HPO)
Homepage: https://hpo.jax.org/
Citation:
Köhler S, Carmody L, Vasilevsky N, Jacobsen JO, Danis D, Gourdine JP, Gargano M, Harris NL, Matentzoglu N, McMurry JA, Osumi-Sutherland D. Expansion of the Human Phenotype Ontology (HPO) knowledge base and resources. Nucleic Acids Research. 2018;47(D1):D1018-27
Usage: Utilized to connect phenotypes to chemicals, diseases, genes, and variants. Additionally, the edges between this ontology and its dependencies are utilized:
CL
ChEBI
GO
UBERON
Files
Other Human Phenotype Ontology Data Used: phenotype.hpoa
Mondo Disease Ontology (Mondo)
Homepage: https://mondo.monarchinitiative.org/
Citation:
Mungall CJ, McMurry JA, Köhler S, Balhoff JP, Borromeo C, Brush M, Carbon S, Conlin T, Dunn N, Engelstad M, Foster E. The Monarch Initiative: an integrative data and analytic platform connecting phenotypes to genotypes across species. Nucleic Acids Research. 2017;45(D1):D712-22
Usage: Utilized to connect diseases to chemicals, phenotypes, genes, and variants. Additionally, the edges between this ontology and its dependencies are utilized:
CL
NCBITaxon
GO
HPO
UBERON
Pathway Ontology (PW)
Homepage: rgd.mcw.edu
Citation:
Petri V, Jayaraman P, Tutaj M, Hayman GT, Smith JR, De Pons J, Laulederkind SJ, Lowry TF, Nigam R, Wang SJ, Shimoyama M. The pathway ontology–updates and applications. Journal of Biomedical Semantics. 2014;5(1):7.
Usage: Utilized to connect pathways to GO biological processes, GO cellular components, GO molecular functions, and Reactome pathways. Several steps are taken to connect Pathway Ontology identifiers to Reactome pathways and GO biological processes. To connect Pathway Ontology identifiers to Reactome pathways, we use the ComPath Pathway Database Mappings developed by Daniel Domingo-Fernández (PMID:30564458).
Files
Downloaded Mapping Data
curated_mappings.txt
kegg_reactome.csv
Generated Mapping Data
REACTOME_PW_GO_MAPPINGS.txt
Protein Ontology (PRO)
Homepage: https://proconsortium.org/
Citation:
Natale DA, Arighi CN, Barker WC, Blake JA, Bult CJ, Caudy M, Drabkin HJ, D’Eustachio P, Evsikov AV, Huang H, Nchoutmboube J. The Protein Ontology: a structured representation of protein forms and complexes. Nucleic Acids Research. 2010;39(suppl_1):D539-45
Usage: Utilized to connect proteins to chemicals, genes, anatomy, catalysts, cell lines, cofactors, complexes, GO biological processes, GO cellular components, GO molecular functions, pathways, proteins, reactions, and transcripts. Additionally, the edges between this ontology and its dependencies are utilized:
ChEBI
DOID
GO
Notes: A partial, human-only version of this ontology was used. Details on how this version of the ontology was generated can be found under the Protein Ontology section of the Data_Preparation.ipynb Jupyter Notebook.
Files
Generated Human Version Protein Ontology (PRO)
human_pro.owl (closed with the HermiT reasoner)
Other PRO Data Used: promapping.txt
Generated Mapping Data
Merged Gene, RNA, Protein Map: Merged_gene_rna_protein_identifiers.pkl
Ensembl Transcript-PRO Identifier Mapping: ENSEMBL_TRANSCRIPT_PROTEIN_ONTOLOGY_MAP.txt
Entrez Gene-PRO Identifier Mapping: ENTREZ_GENE_PRO_ONTOLOGY_MAP.txt
UniProt Accession-PRO Identifier Mapping: UNIPROT_ACCESSION_PRO_ONTOLOGY_MAP.txt
STRING-PRO Identifier Mapping: STRING_PRO_ONTOLOGY_MAP.txt
Relations Ontology (RO)
Homepage: GitHub
Citation:
Smith B, Ceusters W, Klagges B, Köhler J, Kumar A, Lomax J, Mungall C, Neuhaus F, Rector AL, Rosse C. Relations in biomedical ontologies. Genome Biology. 2005;6(5):R46.
Usage: Utilized to connect all data sources in the knowledge graph. Additionally, the ontology is queried prior to building the knowledge graph to identify all relations, their inverse properties, and their labels.
Files
Generated RO Data
INVERSE_RELATIONS.txt
RELATIONS_LABELS.txt
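The pre-build query described above can be pictured as a scan over the ontology's triples. The following is a minimal sketch, assuming the ontology has already been parsed into (subject, predicate, object) tuples. The sample triples and the function names `inverse_relations` and `relation_labels` are hypothetical, and the actual build code may work differently. Only `owl:inverseOf` and `rdfs:label` are standard vocabulary terms.

```python
OWL_INVERSE_OF = "http://www.w3.org/2002/07/owl#inverseOf"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def inverse_relations(triples):
    """Map each relation to its declared inverse, in both directions."""
    inverses = {}
    for subj, pred, obj in triples:
        if pred == OWL_INVERSE_OF:
            # owl:inverseOf is symmetric, so record both lookups.
            inverses[subj] = obj
            inverses[obj] = subj
    return inverses


def relation_labels(triples):
    """Map each relation identifier to its human-readable label."""
    return {subj: obj for subj, pred, obj in triples if pred == RDFS_LABEL}


# Hypothetical placeholder triples, invented for illustration only.
triples = [
    ("ex:rel_1", OWL_INVERSE_OF, "ex:rel_2"),
    ("ex:rel_1", RDFS_LABEL, "relation one"),
    ("ex:rel_2", RDFS_LABEL, "relation two"),
]

inverses = inverse_relations(triples)
labels = relation_labels(triples)
```

The two dictionaries correspond in spirit to the kind of content one would expect in INVERSE_RELATIONS.txt and RELATIONS_LABELS.txt, although the exact file layout is not specified here.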
Sequence Ontology (SO)
Homepage: GitHub
Citation:
Eilbeck K, Lewis SE, Mungall CJ, Yandell M, Stein L, Durbin R, Ashburner M. The Sequence Ontology: a tool for the unification of genome annotations. Genome Biology. 2005;6(5):R44
Usage: Utilized to connect transcripts and other genomic material like genes and variants.
Files
Generated Mapping Data
genomic_sequence_ontology_mappings.xlsx
SO_GENE_TRANSCRIPT_VARIANT_TYPE_MAPPING.txt
Uber-Anatomy Ontology (Uberon)
Homepage: GitHub
Citation:
Mungall CJ, Torniai C, Gkoutos GV, Lewis SE, Haendel MA. Uberon, an integrative multi-species anatomy ontology. Genome Biology. 2012;13(1):R5
Usage: Utilized to connect tissues, fluids, and cells to proteins and transcripts. Additionally, the edges between this ontology and its dependencies are utilized:
ChEBI
CL
GO
PRO
Vaccine Ontology (VO)
Homepage: http://www.violinet.org/vaccineontology/
Citations:
He Y, Racz R, Sayers S, Lin Y, Todd T, Hur J, Li X, Patel M, Zhao B, Chung M, Ostrow J. Updates on the web-based VIOLIN vaccine database and analysis system. Nucleic Acids Research. 2013;42(D1):D1124-32
Xiang Z, Todd T, Ku KP, Kovacic BL, Larson CB, Chen F, Hodges AP, Tian Y, Olenzek EA, Zhao B, Colby LA. VIOLIN: vaccine investigation and online information network. Nucleic Acids Research. 2007;36(suppl_1):D923-8
Usage: Utilized the edges between this ontology and its dependencies:
ChEBI
DOID
GO
PRO
UBERON
DATABASE SOURCES
BioPortal
ClinVar
Comparative Toxicogenomics Database
DisGeNET
Ensembl
GeneMANIA
Genotype-Tissue Expression Project
Human Genome Organisation Gene Nomenclature Committee
Human Protein Atlas
National Center for Biotechnology Information Gene
Reactome Pathway Database
Search Tool for Recurring Instances of Neighbouring Genes Database
Universal Protein Resource Knowledgebase
BioPortal
Homepage: BioPortal
Citation:
BioPortal. Lexical OWL Ontology Matcher (LOOM)
Ghazvinian A, Noy NF, Musen MA. Creating mappings for ontologies in biomedicine: simple methods work. In AMIA Annual Symposium Proceedings 2009 (Vol. 2009, p. 198). American Medical Informatics Association
Usage: BioPortal was utilized to obtain mappings between MeSH identifiers and ChEBI identifiers for chemicals-diseases, chemicals-genes, chemicals-GO biological processes, chemicals-GO cellular components, chemicals-GO molecular functions, chemicals-phenotypes, chemicals-proteins, and chemicals-transcripts. Additional information on how this data was processed can be obtained
Attribution-ShareAlike 3.0 (CC BY-SA 3.0) https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
YAGO 4 is a version of the YAGO knowledge base that is based on Wikidata, the largest public general-purpose knowledge base. YAGO refines the data as follows:

* All entity identifiers and property identifiers are human-readable.
* The top-level classes come from schema.org (a standard repertoire of classes and properties maintained by Google and others), combined with bioschemas.org. The lower level classes are a selection of Wikidata classes.
* The properties come from schema.org.
* YAGO 4 contains semantic constraints in the form of SHACL. These constraints keep the data clean, and allow for logical reasoning on YAGO.

YAGO contains more than 50 million entities and 2 billion facts.
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
YAGO 2 is an improved version of the original YAGO knowledge base:

* YAGO 2 is anchored in time and space. YAGO 2 attaches a temporal dimension and a spatial dimension to many of its facts and entities.
* YAGO 2 is particularly suited for disambiguation purposes, as it contains a large number of names for entities. It also knows the gender of people.
* As with all major releases, the accuracy of YAGO 2 has been manually evaluated, showing an accuracy of 95% with respect to Wikipedia. Every relation is annotated with its confidence value.
https://www.gnu.org/copyleft/fdl.html
This is the 2008 version of YAGO. It knows more than 2 million entities (like persons, organizations, cities, etc.). It knows 20 million facts about these entities. This version of YAGO includes the data extracted from the categories and infoboxes of Wikipedia, combined with the taxonomy of WordNet. YAGO 1 was manually evaluated, and found to have an accuracy of 95% with respect to the extraction source.
https://www.reddit.com/wiki/api
This dataset aims to build a graph of subreddit links based on how they reference each other. The original database dump can be found here.
name (str): name of the subreddit.
type (str): type of the subreddit.
title (str): title of the subreddit.
description (str): short description of the subreddit.
subscribers (int?): number of subscribers at the moment.
nsfw (bool?): indicator of whether it is flagged as not safe for work 🔞.
quarantined (bool?): indicator of whether it has been quarantined 😷.
color (str): key color of the subreddit.
img_banner (str?): url of the image used as the banner.
img_icon (str?): url of the image used as the icon (snoo).
created_at (datetime): utc timestamp of when the subreddit was created.
updated_at (datetime): utc timestamp of when the information of the subreddit was last updated.
note: the '?' indicates that the value can be null under certain conditions.
| TYPE | AMOUNT |
|---|---|
| TOTAL | 127800 |
| public | 59227 |
| banned | 31473 |
| restricted | 14601 |
| public [nsfw] | 14244 |
| private | 5139 |
| restricted [nsfw] | 3014 |
| public [quarantined] | 29 |
| restricted [quarantined] | 21 |
| archived | 17 |
| premium | 12 |
| public [nsfw] [quarantined] | 11 |
| user [nsfw] | 6 |
| user | 4 |
| restricted [nsfw] [quarantined] | 1 |
| employees | 1 |
source (str): name of the subreddit where the link was found.
target (str): name of the linked subreddit.
type (str): place where the reference from source to target was found.
updated_at (datetime): utc timestamp of when the information of the link was last updated.

| TYPE | AMOUNT |
|---|---|
| TOTAL | 349744 |
| wiki | 214206 |
| sidebar | 123650 |
| topbar | 7291 |
| description | 4597 |
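
The node and link fields described above map naturally onto a small directed graph. The following is a minimal sketch, assuming the records have already been loaded into Python objects. The sample rows, the `Subreddit` and `Link` class names, and the `build_adjacency` helper are hypothetical and not part of the dataset; only the field names come from the description.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Subreddit:
    """Node record; only a subset of the documented fields is modeled."""
    name: str
    type: str
    subscribers: Optional[int] = None  # fields marked '?' may be null
    nsfw: Optional[bool] = None


@dataclass(frozen=True)
class Link:
    """Edge record: `source` references `target` in the place given by `type`."""
    source: str
    target: str
    type: str  # e.g. "wiki", "sidebar", "topbar", "description"


def build_adjacency(links):
    """Group (target, link type) pairs by source subreddit."""
    adjacency = defaultdict(list)
    for link in links:
        adjacency[link.source].append((link.target, link.type))
    return dict(adjacency)


# Hypothetical sample rows, invented for illustration only.
nodes = [
    Subreddit(name="example_a", type="public", subscribers=10, nsfw=False),
    Subreddit(name="example_b", type="restricted"),
]
links = [
    Link(source="example_a", target="example_b", type="sidebar"),
    Link(source="example_a", target="example_b", type="wiki"),
    Link(source="example_b", target="example_a", type="wiki"),
]

adjacency = build_adjacency(links)
```

Keeping the link type on each edge preserves the distinction between wiki, sidebar, topbar, and description references, so the same pair of subreddits can be connected more than once.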
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
YAGO 2s is an improved version of YAGO 2, with the following main characteristics:

* YAGO2s is stored natively in Turtle, making it completely RDF/OWL compliant while still maintaining the fact identifiers that are unique to YAGO.
* The YAGO2s architecture enables cooperation among several contributors and facilitates debugging and maintenance. The data is divided into themes, so that users can download only particular pieces of YAGO (“YAGO à la carte”).
* YAGO2s contains thematic domains such as “music” or “science” from WordNet Domains, which gives a topic structure to YAGO.

As with all major releases, the accuracy of YAGO2s has been manually evaluated, showing an accuracy of 95% with respect to Wikipedia.