100+ datasets found
  1. Wikidata Graph Pattern Benchmark (WGPB) for RDF/SPARQL

    • zenodo.org
    bz2, zip
    Updated Oct 30, 2019
  2. Wikidata

    • ustudy.herokuapp.com
    • flashack.com
    • +11 more
    full json dump +3
    Updated Mar 15, 2021
  3. Wikidata

    • triplydb.com
    application/n-quads +3
    Updated Feb 17, 2021
  4. Wikidata dump extension (sitelinks)

    • zenodo.org
    gz
    Updated May 22, 2020
  5. Wikidata

    • registry.identifiers.org
    Updated Aug 23, 2019
  6. Wikidata item quality labels

    • figshare.com
    txt
    Updated Dec 17, 2019
  7. Wembedder wikidata-20170613-truthy-BETA-cbow-size=100-window=1-min_count=20

    • zenodo.org
    • search.datacite.org
    zip
    Updated Jul 5, 2017
  8. Data from: 10 steps to integrate CIViCdb with other public data in Wikidata

    • figshare.com
    zip
    Updated Mar 26, 2017
  9. ML-You-Can-Use Wikidata Employers labeled

    • www.kaggle.com
    zip
    Updated Jun 1, 2020
  10. Wikidata-Disamb Dataset

    • paperswithcode.com
    Updated Jan 27, 2021
  11. ML-You-Can-Use Wikidata Occupations labeled

    • www.kaggle.com
    zip
    Updated Apr 23, 2020
  12. Wikidata Human Settlements

    • www.kaggle.com
    zip
    Updated May 22, 2020
  13. 20 GB in 10 minutes: Data linking across major biodiversity databases: Data...

    • zenodo.org
    • figshare.com
    gz
    Updated Apr 6, 2018
  14. Wikidata-14M Dataset

    • paperswithcode.com
    Updated Jul 13, 2021
  15. Wikidata Property Ranking

    • www.kaggle.com
    zip
    Updated Aug 22, 2017
  16. Data from: Wikidata's linked data for cultural heritage digital resources:...

    • figshare.com
    • zenodo.org
    • +1 more
    zip
    Updated Jan 12, 2020
  17. Statistics GTAA & Wikidata

    • figshare.com
    zip
    Updated Jul 15, 2018
  18. Wikidata PageRank

    • danker.s3.amazonaws.com
    application/vnd.hdt +2
    Updated Jan 18, 2021
  19. Wikidata Constraint Violations - July 2018 - extended

    • figshare.com
    txt
    Updated Dec 7, 2020
  20. csisc/WikidataCOVID19SPARQL: Data about Wikidata coverage of COVID-19

    • zenodo.org
    zip
    Updated Sep 14, 2020
Cite
Aidan Hogan; Cristian Riveros; Carlos Rojas; Adrián Soto (2019). Wikidata Graph Pattern Benchmark (WGPB) for RDF/SPARQL [Dataset]. http://doi.org/10.5281/zenodo.4035223

Wikidata Graph Pattern Benchmark (WGPB) for RDF/SPARQL

2 scholarly articles cite this dataset
Available download formats: bz2, zip
Dataset updated Oct 30, 2019
Dataset provided by
DCC, Universidad de Chile; IMFD
DCC, Pontificia Universidad Católica de Chile; IMFD
Authors
Aidan Hogan; Cristian Riveros; Carlos Rojas; Adrián Soto
License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

The Wikidata Graph Pattern Benchmark (WGPB) is a benchmark consisting of 50 instances of each of 17 different abstract query patterns, for a total of 850 SPARQL queries. The goal of the benchmark is to test the performance of query engines on more complex basic graph patterns. The benchmark was designed for evaluating worst-case optimal join algorithms, but it also serves as a general-purpose benchmark for evaluating (basic) graph patterns. The queries are provided in SPARQL syntax, and each returns at least one solution. The number of results returned by each query is limited to a maximum of 1,000.
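Worst-case optimal join algorithms (for example Generic Join or Leapfrog Triejoin) evaluate a basic graph pattern one variable at a time rather than one triple pattern at a time: for each variable, they intersect the candidate values allowed by every triple pattern that mentions it. The Python sketch below illustrates that idea on a "square" pattern over a tiny in-memory graph. It is not the implementation evaluated in the paper and, as written, does not guarantee worst-case optimal running time; the toy triples, the predicate names `p1` to `p4`, the edge directions of the square, the fixed variable order and the `limit` parameter (mirroring the 1,000-result cap) are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import islice

# Toy graph (illustrative only): one 4-cycle a -> b -> c -> d -> a plus dead ends.
TOY_TRIPLES = [
    ("a", "p1", "b"), ("b", "p2", "c"), ("c", "p3", "d"), ("d", "p4", "a"),
    ("a", "p1", "e"), ("e", "p2", "f"), ("c", "p3", "g"),
]

# A "square" pattern as (subject variable, predicate, object variable) triples.
# The edge directions are an assumption made for this example.
SQUARE = [("?x1", "p1", "?x2"), ("?x2", "p2", "?x3"),
          ("?x3", "p3", "?x4"), ("?x4", "p4", "?x1")]


def generic_join(triples, patterns, var_order, limit=1000):
    """Return an iterator over at most `limit` solutions, binding one variable at a time.

    Assumes every pattern has a constant predicate and two distinct variables,
    and that every variable in var_order occurs in some pattern.
    """
    # Two indexes per predicate: subject -> objects and object -> subjects.
    by_subj = defaultdict(lambda: defaultdict(set))
    by_obj = defaultdict(lambda: defaultdict(set))
    for s, p, o in triples:
        by_subj[p][s].add(o)
        by_obj[p][o].add(s)

    def candidates(var, binding):
        sets = []
        for s_var, p, o_var in patterns:
            if s_var == var:
                # Subjects linked via p to the bound object, else all subjects of p.
                sets.append(by_obj[p].get(binding[o_var], set())
                            if o_var in binding else set(by_subj[p]))
            elif o_var == var:
                # Objects reached via p from the bound subject, else all objects of p.
                sets.append(by_subj[p].get(binding[s_var], set())
                            if s_var in binding else set(by_obj[p]))
        # Intersect, starting from the smallest candidate set.
        sets.sort(key=len)
        result = set(sets[0])
        for other in sets[1:]:
            result &= other
        return result

    def extend(depth, binding):
        if depth == len(var_order):
            yield dict(binding)
            return
        var = var_order[depth]
        for value in sorted(candidates(var, binding)):
            binding[var] = value
            yield from extend(depth + 1, binding)
            del binding[var]

    return islice(extend(0, {}), limit)


for solution in generic_join(TOY_TRIPLES, SQUARE, ["?x1", "?x2", "?x3", "?x4"]):
    print(solution)
```

The branch through `e` is pruned as soon as `?x3` has no candidates, without first materialising any intermediate join of two triple patterns; that early pruning is what distinguishes variable-at-a-time evaluation from pairwise joins.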

Queries

We provide an example of a "square" basic graph pattern (comments are added here for readability):

SELECT * WHERE { 
 ?x1 

There are 49 other queries like this one in the dataset (obtained by replacing the predicates with other predicates), and 50 queries for each of the 16 other abstract query patterns. For more details on these patterns, see the publication listed below.
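Because each abstract pattern is instantiated 50 times by swapping in different predicates, producing the concrete queries amounts to filling a template. A minimal Python sketch, assuming the square is a directed 4-cycle, that the predicates are Wikidata direct ("truthy") properties in the `http://www.wikidata.org/prop/direct/` namespace, and that the 1,000-result cap is expressed as a `LIMIT` clause; the function name and the property IDs `P_a` to `P_h` are hypothetical placeholders, not predicates from the benchmark.

```python
WDT = "http://www.wikidata.org/prop/direct/"


def square_query(p1, p2, p3, p4, limit=1000):
    """Build a SPARQL 'square' query (a directed 4-cycle) from four property IDs."""
    preds = [f"<{WDT}{p}>" for p in (p1, p2, p3, p4)]
    lines = [
        "SELECT * WHERE {",
        f"  ?x1 {preds[0]} ?x2 .",
        f"  ?x2 {preds[1]} ?x3 .",
        f"  ?x3 {preds[2]} ?x4 .",
        f"  ?x4 {preds[3]} ?x1 .",
        f"}} LIMIT {limit}",
    ]
    return "\n".join(lines)


# Hypothetical predicate tuples; each tuple yields one instance of the pattern.
instances = [("P_a", "P_b", "P_c", "P_d"), ("P_e", "P_f", "P_g", "P_h")]
queries = [square_query(*preds) for preds in instances]
print(queries[0])
```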

Note that you can try the queries on the public Wikidata Query Service, though some may time out.
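To run a query against the public Wikidata Query Service from a script, one option is an HTTP GET to its SPARQL endpoint, `https://query.wikidata.org/sparql`, asking for JSON results. The sketch below uses only the Python standard library; the function names, the placeholder User-Agent string and the 60-second client-side timeout are illustrative choices, and the service's own server-side timeout still applies.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"


def build_request(query, user_agent="wgpb-example/0.1 (replace with contact info)"):
    """Prepare (but do not send) a GET request for a SPARQL SELECT query."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={
        "Accept": "application/sparql-results+json",
        # Wikimedia asks clients to send a descriptive User-Agent.
        "User-Agent": user_agent,
    })


def run_query(query, timeout=60):
    """Send the query and return the result bindings; raises on HTTP errors or timeouts."""
    with urllib.request.urlopen(build_request(query), timeout=timeout) as response:
        return json.load(response)["results"]["bindings"]
```

`run_query(q)` returns a list of dictionaries, one per solution, following the W3C SPARQL JSON results format; a query that exceeds the service's limit surfaces as an exception rather than a partial result.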

Generation

The queries were generated over a reduced version of the Wikidata truthy dump from November 15, 2018 that we call the Wikidata Core Graph (WCG). Specifically, in order to reduce the data volume, multilingual labels, comments, etc., were removed as they have limited use for evaluating joins (English labels were kept under schema:name). Thereafter, in order to facilitate the generation of the queries, triples with rare predicates appearing in fewer than 1,000 triples, and very common predicates appearing in more than 1,000,000 triples, were removed. The queries provided will generate the same results over both graphs.
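The frequency-based filtering step can be sketched as two passes over an N-Triples file: first count triples per predicate, then keep only triples whose predicate occurs in at least 1,000 and at most 1,000,000 triples. This is a minimal Python illustration, not the authors' generation code. It assumes one triple per line with whitespace-separated subject and predicate (true of N-Triples, where IRIs and blank-node labels contain no spaces), and it does not reproduce the label-removal step; the function names are hypothetical.

```python
from collections import Counter


def predicate_of(line):
    """Return the predicate term of an N-Triples line, or None for blank/comment lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    # Subject and predicate never contain whitespace in N-Triples.
    return line.split(None, 2)[1]


def filter_by_predicate_frequency(lines, low=1_000, high=1_000_000):
    """Keep triples whose predicate appears in at least `low` and at most `high` triples."""
    # Two passes are needed; for a full dump, re-read the file instead of holding a list.
    lines = list(lines)
    counts = Counter(p for p in map(predicate_of, lines) if p is not None)
    keep = {p for p, n in counts.items() if low <= n <= high}
    return [line for line in lines if predicate_of(line) in keep]
```

With small thresholds such as `low=2, high=5`, a predicate used only once is dropped while one used twice is kept; the defaults match the cut-offs stated above.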

Files

This dataset includes three files:

  • wgpb-queries.zip: the 850 queries
  • wikidata-wcg.nt.gz: the Wikidata truthy graph with English labels
  • wikidata-wcg-filtered.nt.bz2: the Wikidata truthy graph with English labels, with triples for rare (<1,000 triples) and very common (>1,000,000 triples) predicates filtered out
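The compressed files can be streamed line by line without first decompressing them to disk, using Python's standard `gzip`, `bz2` and `zipfile` modules. The helper names below are hypothetical, and since the internal layout of `wgpb-queries.zip` is not described here, the sketch simply yields the text of every file in the archive.

```python
import bz2
import gzip
import zipfile


def iter_ntriples(path):
    """Stream text lines from a .nt.gz, .nt.bz2 or plain .nt file."""
    if path.endswith(".gz"):
        opener = gzip.open
    elif path.endswith(".bz2"):
        opener = bz2.open
    else:
        opener = open
    with opener(path, "rt", encoding="utf-8") as f:
        for line in f:
            yield line.rstrip("\n")


def iter_zip_members(path):
    """Yield (member name, decoded text) for every file in a zip archive."""
    with zipfile.ZipFile(path) as archive:
        for info in archive.infolist():
            if not info.is_dir():
                yield info.filename, archive.read(info).decode("utf-8")
```

For example, `sum(1 for _ in iter_ntriples("wikidata-wcg-filtered.nt.bz2"))` counts the lines of the filtered graph without holding it in memory.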

Code

We provide the code for generating the datasets, queries, etc., along with scripts and instructions on how to run these queries in a variety of SPARQL engines (Blazegraph, Jena, Virtuoso and our worst-case optimal variant of Jena).

Publication

The benchmark is proposed, described and used in the following paper, which gives more details on how it was generated and on the 17 abstract patterns used, as well as results for prominent SPARQL engines.

  • Aidan Hogan, Cristian Riveros, Carlos Rojas and Adrián Soto. "A Worst-Case Optimal Join Algorithm for SPARQL". In the Proceedings of the 18th International Semantic Web Conference (ISWC), Auckland, New Zealand, October 26–30, 2019.