3 datasets found

Wikidata Graph Pattern Benchmark (WGPB) for RDF/SPARQL
zenodo.org
bz2, zip
Updated Jan 11, 2021
Aidan Hogan; Cristian Riveros; Carlos Rojas; Adrián Soto (2021). Wikidata Graph Pattern Benchmark (WGPB) for RDF/SPARQL [Dataset]. http://doi.org/10.5281/zenodo.4035223
Available download formats: zip, bz2
Unique identifier
https://doi.org/10.5281/zenodo.4035223
Dataset updated
Jan 11, 2021
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Aidan Hogan; Cristian Riveros; Carlos Rojas; Adrián Soto
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Wikidata Graph Pattern Benchmark (WGPB) is a benchmark consisting of 50 instances of 17 different abstract query patterns giving a total of 850 SPARQL queries. The goal of the benchmark is to test the performance of query engines for more complex basic graph patterns. The benchmark was designed for evaluating worst-case optimal join algorithms but also serves as a general-purpose benchmark for evaluating (basic) graph patterns. The queries are provided in SPARQL syntax and all return at least one solution. We limit the number of results returned to a maximum of 1,000.

Queries

We provide an example of a "square" basic graph pattern (comments are added here for readability):

SELECT * WHERE {
  ?x1 <http://www.wikidata.org/prop/direct/P149> ?x2 . # architectural style
  ?x2 <http://www.wikidata.org/prop/direct/P1269> ?x3 . # facet of
  ?x3 <http://www.wikidata.org/prop/direct/P156> ?x4 . # followed by
  ?x1 <http://www.wikidata.org/prop/direct/P135> ?x4 . # movement
} LIMIT 1000

There are 49 other queries similar to this one in the dataset (replacing the predicates with other predicates), and 50 queries for 16 other abstract query patterns. For more details on these patterns, we refer to the publication mentioned below.
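
As a rough illustration of how instances of one abstract pattern differ only in their predicates, the following Python sketch rebuilds the square query above from four property IDs. The helper name square_query is hypothetical and is not part of the benchmark's code; how the authors actually chose predicate combinations is described in the publication, not here.

```python
# Hypothetical helper (not from the WGPB code base): builds one instance
# of the "square" pattern by plugging four Wikidata property IDs into a
# fixed arrangement of triple patterns.
WDT = "http://www.wikidata.org/prop/direct/"


def square_query(p1, p2, p3, p4, limit=1000):
    """Return a SPARQL query whose four triple patterns form a square
    over the variables ?x1, ?x2, ?x3 and ?x4."""
    triples = [
        ("?x1", p1, "?x2"),
        ("?x2", p2, "?x3"),
        ("?x3", p3, "?x4"),
        ("?x1", p4, "?x4"),
    ]
    body = "\n".join(f"  {s} <{WDT}{p}> {o} ." for s, p, o in triples)
    return f"SELECT * WHERE {{\n{body}\n}} LIMIT {limit}"


# The example from the description: P149, P1269, P156, P135.
query = square_query("P149", "P1269", "P156", "P135")
print(query)
```

Swapping in other property IDs yields another query of the same shape, which is the sense in which the dataset contains 50 instances per pattern.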

Note that you can try the queries on the public Wikidata Query Service, though some might give a timeout.
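
A minimal sketch of sending one of the queries to the public service from Python, using only the standard library. It assumes the SPARQL endpoint at https://query.wikidata.org/sparql; the function names and the User-Agent string are placeholders chosen for this example, and a query that exceeds the service's time limit would be expected to surface as an exception from urlopen.

```python
import json
import urllib.parse
import urllib.request

# Assumed public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_request(query, user_agent="wgpb-example/0.1 (placeholder contact)"):
    """Build a GET request asking for SPARQL JSON results."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    headers = {
        "Accept": "application/sparql-results+json",
        "User-Agent": user_agent,  # placeholder; identify your own client
    }
    return urllib.request.Request(url, headers=headers)


def run_query(query, timeout=60):
    """Send the query and return the list of result bindings.

    Network errors and server-side timeouts propagate as
    urllib.error.URLError (or its subclass HTTPError).
    """
    with urllib.request.urlopen(build_request(query), timeout=timeout) as resp:
        return json.load(resp)["results"]["bindings"]


# Usage (performs a network call, so it is left commented out):
# rows = run_query("SELECT * WHERE { ?s ?p ?o } LIMIT 1")
# print(len(rows))
```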

Generation

The queries were generated over a reduced version of the Wikidata truthy dump from November 15, 2018 that we call the Wikidata Core Graph (WCG). Specifically, in order to reduce the data volume, multilingual labels, comments, etc., were removed as they have limited use for evaluating joins (English labels were kept under schema:name). Thereafter, in order to facilitate the generation of the queries, triples with rare predicates appearing in fewer than 1,000 triples, and very common predicates appearing in more than 1,000,000 triples, were removed. The queries provided will generate the same results over both graphs.
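
The predicate-frequency filter described above can be sketched as follows. This is an illustrative reading of the text, not the authors' code: it keeps a triple only if its predicate occurs in at least 1,000 and at most 1,000,000 triples, and it holds all lines in memory, which would not be practical for the full graph (a two-pass scan over the file would be needed there). The function names are hypothetical.

```python
from collections import Counter


def predicate_of(line):
    """Return the predicate term of an N-Triples line.

    Subjects and predicates contain no whitespace, so splitting on
    whitespace is safe for the first two terms.
    """
    return line.split(None, 2)[1]


def filter_by_predicate_frequency(lines, min_count=1000, max_count=1_000_000):
    """Drop triples whose predicate is rare (< min_count triples)
    or very common (> max_count triples)."""
    triples = [l for l in lines if l.strip() and not l.startswith("#")]
    counts = Counter(predicate_of(l) for l in triples)
    return [l for l in triples
            if min_count <= counts[predicate_of(l)] <= max_count]


# Tiny demonstration with thresholds scaled down to 2 and 3.
sample = (
    ['<s0> <rare> <o0> .']
    + [f'<s{i}> <mid> <o{i}> .' for i in range(2)]
    + [f'<s{i}> <common> <o{i}> .' for i in range(4)]
)
kept = filter_by_predicate_frequency(sample, min_count=2, max_count=3)
print(len(kept))
```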

Files

In this dataset, we then include three files:

wgpb-queries.zip The list of 850 queries

wikidata-wcg.nt.gz Wikidata truthy graph with English labels

wikidata-wcg-filtered.nt.bz2 Wikidata truthy graph with English labels filtering triples with rare (<1000 triples) and very common (>1000000) predicates
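
Since the filtered graph is distributed as bz2-compressed N-Triples, it can be streamed line by line without decompressing it to disk first. The sketch below assumes the file has already been downloaded to a local path; the function names are hypothetical.

```python
import bz2


def iter_ntriples(path):
    """Yield non-empty, non-comment lines from a .nt.bz2 file."""
    with bz2.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line and not line.startswith("#"):
                yield line


def count_triples(path):
    """Count triples by streaming the compressed file once."""
    return sum(1 for _ in iter_ntriples(path))


# Usage, assuming the file sits in the working directory:
# print(count_triples("wikidata-wcg-filtered.nt.bz2"))
```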

Code

We provide the code for generating the datasets, queries, etc., along with scripts and instructions on how to run these queries in a variety of SPARQL engines (Blazegraph, Jena, Virtuoso and our worst-case optimal variant of Jena).

Publication

The benchmark is proposed, described and used in the following paper. You can find more details about how it was generated, the 17 abstract patterns that were used, as well as results for prominent SPARQL engines.

Aidan Hogan, Cristian Riveros, Carlos Rojas and Adrián Soto. "A Worst-Case Optimal Join Algorithm for SPARQL". In the Proceedings of the 18th International Semantic Web Conference (ISWC), Auckland, New Zealand, October 26–30, 2019.
JSON export from a Neo4j Graph database experimental data for bird...
figshare.com
json
Updated Mar 11, 2021
Scott Anderson; Brian Wee (2021). JSON export from a Neo4j Graph database experimental data for bird conservation planning [Dataset]. http://doi.org/10.6084/m9.figshare.14200058.v1
Available download formats: json
Unique identifier
https://doi.org/10.6084/m9.figshare.14200058.v1
Dataset updated
Mar 11, 2021
Dataset provided by
Figshare (http://figshare.com/)
Authors
Scott Anderson; Brian Wee
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Structured data characterizing selected avian conservation aspects of North Carolina's Wildlife Action Plans had already been encoded in a Semantic MediaWiki database (http://wiki.ncpif.org/). That database was created and is maintained by the North Carolina Partners in Flight (NC PIF) program, a program of the North Carolina Wildlife Resources Commission. The NC PIF wiki database was ported into a Neo4j labeled property graph database for an experiment in linking avian species, organizations, geographies, and management plans. This JSON file is an export from that Neo4j database.
PheKnowLator Human Disease Knowledge Graphs - Build Data (Processed)
data.subak.org
data.niaid.nih.gov
csv
Updated Feb 16, 2023
University of Colorado Anschutz Medical Campus (2023). PheKnowLator Human Disease Knowledge Graphs - Build Data (Processed) [Dataset]. https://data.subak.org/dataset/pheknowlator-human-disease-knowledge-graphs-build-data-processed
Available download formats: csv
Dataset updated
Feb 16, 2023
Dataset provided by
University of Colorado Anschutz Medical Campus
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
RELEASE V2.1.0 KNOWLEDGE GRAPH: PROCESSED DATA SOURCES

Release: v2.1.0

The goal of this build was to create a knowledge graph that represented human disease mechanisms and included the central dogma. The data sources utilized in this release include many of the sources used in the initial release, as well as some new data made available by the Comparative Toxicogenomics Database and experimental data from the Human Protein Atlas.

Data sources are listed by type (Ontology and Data not represented in an ontology [Database Sources]). Additional details are provided for each data source below. Please see documentation on the primary release (https://github.com/callahantiff/PheKnowLator/wiki/v2-Data-Sources) for additional details on each data source as well as citation information.

Data Access:

https://console.cloud.google.com/storage/browser/pheknowlator/archived_builds/release_v2.1.0/build_01MAY2021

ONTOLOGIES

Cell Ontology

Cell Line Ontology

Chemical Entities of Biological Interest (ChEBI) Ontology

Gene Ontology

Human Phenotype Ontology

Mondo Disease Ontology

Pathway Ontology

Protein Ontology

Relations Ontology

Sequence Ontology

Uber-Anatomy Ontology

Vaccine Ontology

Cell Ontology (CL)

Homepage: GitHub

Citation:

Bard J, Rhee SY, Ashburner M. An ontology for cell types. Genome Biology. 2005;6(2):R21

Usage: Utilized to connect transcripts and proteins to cells. Additionally, the edges between this ontology and its dependencies are utilized:

ChEBI

GO

PATO

PRO

RO

UBERON

Cell Line Ontology (CLO)

Homepage: http://www.clo-ontology.org/

Citation:

Sarntivijai S, Lin Y, Xiang Z, Meehan TF, Diehl AD, Vempati UD, Schürer SC, Pang C, Malone J, Parkinson H, Liu Y. CLO: the cell line ontology. Journal of Biomedical Semantics. 2014;5(1):37

Usage: Utilized to map cell lines to transcripts and proteins. Additionally, the edges between this ontology and its dependencies are utilized:

CL

DOID

NCBITaxon

UBERON

Chemical Entities of Biological Interest (ChEBI)

Homepage: https://www.ebi.ac.uk/chebi/

Citation:

Hastings J, Owen G, Dekker A, Ennis M, Kale N, Muthukrishnan V, Turner S, Swainston N, Mendes P, Steinbeck C. ChEBI in 2016: Improved services and an expanding collection of metabolites. Nucleic Acids Research. 2015;44(D1):D1214-9

Usage: Utilized to connect chemicals to complexes, diseases, genes, GO biological processes, GO cellular components, GO molecular functions, pathways, phenotypes, reactions, and transcripts.

Gene Ontology (GO)

Homepage: http://geneontology.org/

Citations:

Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA. Gene ontology: tool for the unification of biology. Nature Genetics. 2000;25(1):25

The Gene Ontology Consortium. The Gene Ontology Resource: 20 years and still GOing strong. Nucleic Acids Research. 2018;47(D1):D330-8

Usage: Utilized to connect biological processes, cellular components, and molecular functions to chemicals, pathways, and proteins. Additionally, the edges between this ontology and its dependencies are utilized:

CL

NCBITaxon

RO

UBERON

Other Gene Ontology Data Used: goa_human.gaf.gz

Human Phenotype Ontology (HPO)

Homepage: https://hpo.jax.org/

Citation:

Köhler S, Carmody L, Vasilevsky N, Jacobsen JO, Danis D, Gourdine JP, Gargano M, Harris NL, Matentzoglu N, McMurry JA, Osumi-Sutherland D. Expansion of the Human Phenotype Ontology (HPO) knowledge base and resources. Nucleic Acids Research. 2018;47(D1):D1018-27

Usage: Utilized to connect phenotypes to chemicals, diseases, genes, and variants. Additionally, the edges between this ontology and its dependencies are utilized:

CL

ChEBI

GO

UBERON

Files

Other Human Phenotype Ontology Data Used: phenotype.hpoa

Mondo Disease Ontology (Mondo)

Homepage: https://mondo.monarchinitiative.org/

Citation:

Mungall CJ, McMurry JA, Köhler S, Balhoff JP, Borromeo C, Brush M, Carbon S, Conlin T, Dunn N, Engelstad M, Foster E. The Monarch Initiative: an integrative data and analytic platform connecting phenotypes to genotypes across species. Nucleic Acids Research. 2017;45(D1):D712-22

Usage: Utilized to connect diseases to chemicals, phenotypes, genes, and variants. Additionally, the edges between this ontology and its dependencies are utilized:

CL

NCBITaxon

GO

HPO

UBERON

Pathway Ontology (PW)

Homepage: rgd.mcw.edu

Citation:

Petri V, Jayaraman P, Tutaj M, Hayman GT, Smith JR, De Pons J, Laulederkind SJ, Lowry TF, Nigam R, Wang SJ, Shimoyama M. The pathway ontology–updates and applications. Journal of Biomedical Semantics. 2014;5(1):7.

Usage: Utilized to connect pathways to GO biological processes, GO cellular components, GO molecular functions, and Reactome pathways. Several steps are taken in order to connect Pathway Ontology identifiers to Reactome pathways and GO biological processes. To connect Pathway Ontology identifiers to Reactome pathways, we use ComPath Pathway Database Mappings developed by Daniel Domingo-Fernández (PMID:30564458).

Files

Downloaded Mapping Data

curated_mappings.txt

kegg_reactome.csv

Generated Mapping Data

REACTOME_PW_GO_MAPPINGS.txt

Protein Ontology (PRO)

Homepage: https://proconsortium.org/

Citation:

Natale DA, Arighi CN, Barker WC, Blake JA, Bult CJ, Caudy M, Drabkin HJ, D’Eustachio P, Evsikov AV, Huang H, Nchoutmboube J. The Protein Ontology: a structured representation of protein forms and complexes. Nucleic Acids Research. 2010;39(suppl_1):D539-45

Usage: Utilized to connect proteins to chemicals, genes, anatomy, catalysts, cell lines, cofactors, complexes, GO biological processes, GO cellular components, GO molecular functions, pathways, proteins, reactions, and transcripts. Additionally, the edges between this ontology and its dependencies are utilized:

ChEBI

DOID

GO

Notes: A partial, human-only version of this ontology was used. Details on how this version of the ontology was generated can be found under the Protein Ontology section of the Data_Preparation.ipynb Jupyter Notebook.

Files

Generated Human Version Protein Ontology (PRO)

human_pro.owl (closed with the HermiT reasoner)

Other PRO Data Used: promapping.txt

Generated Mapping Data

Merged Gene, RNA, Protein Map: Merged_gene_rna_protein_identifiers.pkl

Ensembl
