WikiWebQuestions: a high-quality question answering benchmark for Wikidata.
./training_data/best.json
For more detail, see https://github.com/stanford-oval/wikidata-emnlp23
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset accompanies the paper "When Do LLMs Admit Their Mistakes? Understanding the Role of Model Belief in Retraction". It includes the original Wikidata questions used in our experiments, with a train/test split. For a detailed explanation of the dataset construction and usage, please refer to the paper. Code: https://github.com/ayyyq/llm-retraction
Citation
@misc{yang2025llmsadmitmistakesunderstanding, title={When Do LLMs Admit Their Mistakes? Understanding the Role of… See the full description on the dataset page: https://huggingface.co/datasets/ayyyq/Wikidata.
Attribution 1.0 (CC BY 1.0): https://creativecommons.org/licenses/by/1.0/
License information was derived automatically
This data dump of Wikidata is published to allow fair and replicable evaluation of KGQA systems with the QALD-10 benchmark. QALD-10 is newly released and was used in the QALD-10 Challenge. Anyone interested in evaluating their KGQA system with QALD-10 can download this dump and set up a local Wikidata endpoint on their server.
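As an illustration of that setup, the sketch below queries such a local endpoint with Python's SPARQLWrapper. The endpoint URL is an assumption and depends on the triple store used to serve the dump; the example query itself is our own, not a QALD-10 item.

```python
# Minimal sketch: querying a locally hosted Wikidata endpoint.
from SPARQLWrapper import SPARQLWrapper, JSON

# Hypothetical endpoint URL; adjust to wherever your triple store
# (e.g. Blazegraph, Virtuoso, QLever) serves SPARQL after loading the dump.
ENDPOINT = "http://localhost:9999/bigdata/namespace/wdq/sparql"

def run_query(query: str) -> list:
    sparql = SPARQLWrapper(ENDPOINT)
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    return sparql.query().convert()["results"]["bindings"]

# Example: occupations of Douglas Adams (Q42). The prefixes are declared
# explicitly, since a local endpoint may not predefine them.
QUERY = """
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?answer WHERE { wd:Q42 wdt:P106 ?answer }
"""
for binding in run_query(QUERY):
    print(binding["answer"]["value"])
```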
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
We present RuBQ (pronounced [`rubik]): Russian Knowledge Base Questions, a KBQA dataset that consists of 1,500 Russian questions of varying complexity along with their English machine translations, corresponding SPARQL queries, and answers, as well as a subset of Wikidata covering entities with Russian labels. 300 RuBQ questions are unanswerable, which poses a new challenge for KBQA systems and makes the task more realistic. The dataset is based on a collection of quiz questions. The data generation pipeline combines automatic processing with crowdsourced and in-house verification; see the paper for details. To the best of our knowledge, this is the first Russian KBQA and semantic parsing dataset.
ISWC 2020 paper (newest)
arXiv paper
RuWikidata sample
The dataset is also published on Zenodo.
The dataset is intended to be used as development and test sets in cross-lingual transfer, few-shot learning, or learning-with-synthetic-data scenarios.
Dataset files are in JSON format, presented as an array of dictionary entries. See the full specification here.
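As a quick illustration of that format, the snippet below loads one file and inspects its entries. The filename is hypothetical and the entry schema should be taken from the linked specification, not from this sketch.

```python
import json

# Load one of the RuBQ JSON files: an array of dictionary entries.
with open("RuBQ_test.json", encoding="utf-8") as f:  # hypothetical filename
    entries = json.load(f)

# Inspect the keys of the first few entries to discover the actual schema.
for entry in entries[:3]:
    print(sorted(entry.keys()))
```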
| Question | Query | Answers | Tags |
|---|---|---|---|
| Rus: Кто написал роман «Хижина дяди Тома»? Eng: Who wrote the novel "Uncle Tom's Cabin"? | SELECT ?answer | wd:Q102513 (Harriet Beecher Stowe) | 1-hop |
| Rus: Кто сыграл князя Андрея Болконского в фильме С. Ф. Бондарчука «Война и мир»? Eng: Who played Prince Andrei Bolkonsky in S. F. Bondarchuk's film "War and Peace"? | SELECT ?answer | wd:Q312483 (Vyacheslav Tikhonov) | qualifier-constraint |
| Rus: Кто на работе пользуется теодолитом? Eng: Who uses a theodolite for work? | SELECT ?answer | wd:Q1734662 (cartographer), wd:Q11699606 (geodesist), wd:Q294126 (land surveyor) | multi-hop |
| Rus: Какой океан самый маленький? Eng: Which ocean is the smallest? | SELECT ?answer | wd:Q788 (Arctic Ocean) | multi-constraint reverse ranking |
We provide a Wikidata sample containing all the entities with Russian labels. It consists of about 212M triples with 8.1M unique entities. This snapshot mitigates the problem of Wikidata's dynamics: a reference answer may change over time as the knowledge base evolves. The sample guarantees the correctness of the queries and answers. In addition, the smaller dump makes it easier to conduct experiments with our dataset.
We strongly recommend using this sample for evaluation.
The sample is a collection of several RDF files in Turtle format:

- wdt_all.ttl contains all the truthy statements.
- names.ttl contains Russian and English labels and aliases for all entities; names in other languages are also provided when needed.
- onto.ttl contains all Wikidata triples with the relation wdt:P279 (subclass of). It represents a class hierarchy of sorts, but remember that Wikidata has no formal class or instance concepts.
- pch_{0,6}.ttl contain all statement nodes and their data for all entities.

Some questions in our dataset require using rdfs:label or skos:altLabel to retrieve an answer that is a literal. In cases where the answer language does not have to be inferred from the question, our evaluation script takes only Russian literals into account.
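As a minimal sketch of local evaluation against this sample, the snippet below loads two of the files with rdflib and runs a 1-hop query in the spirit of the first table row. Resolving the book via its English label is our own illustrative choice (the table gives only the answer entity), and for the full 212M triples a dedicated triple store is preferable to in-memory rdflib.

```python
# Sketch: a 1-hop query against the RuBQ Wikidata sample with rdflib.
from rdflib import Graph

g = Graph()
g.parse("wdt_all.ttl", format="turtle")   # truthy statements
g.parse("names.ttl", format="turtle")     # labels and aliases

# P50 = author; the work is resolved via its English label here.
QUERY = """
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?answer WHERE {
  ?book rdfs:label "Uncle Tom's Cabin"@en .
  ?book wdt:P50 ?answer .
}
"""
for row in g.query(QUERY):
    print(row.answer)  # expected: wd:Q102513 (Harriet Beecher Stowe)
```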
If you use RuBQ dataset in your work, please cite:
@inproceedings{RuBQ2020,
title={{RuBQ}: A {Russian} Dataset for Question Answering over {Wikidata}},
author={Vladislav Korablinov and Pavel Braslavski},
booktitle={ISWC},
year={2020},
pages={97--110}
}
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (http://creativecommons.org/licenses/by-sa/4.0/).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
QALD-9-Plus is a dataset for Knowledge Graph Question Answering (KGQA) based on the well-known QALD-9. QALD-9-Plus enables training and testing KGQA systems over DBpedia and Wikidata using questions in 8 different languages. Some of the questions have several alternative formulations in particular languages, which enables evaluating the robustness of KGQA systems and training paraphrasing models. As the questions' translations were provided by native speakers, they are considered a "gold standard"; therefore, machine translation tools can also be trained and evaluated on the dataset. Please also see the GitHub repository: https://github.com/Perevalov/qald_9_plus
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a snapshot of QAWiki from 2025-09-09: a dataset for knowledge graph question answering (KGQA) and/or SPARQL query generation over Wikidata.
The dataset is presented in two formats:
The dataset contains 518 question/query pairs in English and Spanish with SPARQL queries (and 8 additional ambiguous questions without queries). Some questions also feature Italian and Danish translations provided by the community.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
WD50K dataset: a hyper-relational dataset derived from Wikidata statements. The dataset is constructed by the following procedure, based on the [Wikidata RDF dump](https://dumps.wikimedia.org/wikidatawiki/20190801/) of August 2019:

- A set of seed nodes corresponding to entities from FB15K-237 having a direct mapping in Wikidata (P646 "Freebase ID") is extracted from the dump.
- For each seed node, all statements whose main object and qualifier values correspond to wikibase:Item are extracted from the dump.
- All literals are filtered out from the qualifiers of the statements obtained above.
- All entities in the dataset with fewer than two mentions are dropped; the statements corresponding to the dropped entities are also dropped.
- The remaining statements are randomly split into train, test, and validation sets.
- All statements from the train and validation sets that share the same main triple (s, p, o) with test statements are removed.
- WD50K (33), WD50K (66), and WD50K (100) are then sampled from the above statements, where 33, 66, and 100 denote the percentage of hyper-relational facts (statements with qualifiers) in the dataset.

The table below provides some basic statistics of our dataset and its three further variations:

| Dataset | Statements | w/Quals (%) | Entities | Relations | E only in Quals | R only in Quals | Train | Valid | Test |
|---|---|---|---|---|---|---|---|---|---|
| WD50K | 236,507 | 32,167 (13.6%) | 47,156 | 532 | 5,460 | 45 | 166,435 | 23,913 | 46,159 |
| WD50K (33) | 102,107 | 31,866 (31.2%) | 38,124 | 475 | 6,463 | 47 | 73,406 | 10,668 | 18,133 |
| WD50K (66) | 49,167 | 31,696 (64.5%) | 27,347 | 494 | 7,167 | 53 | 35,968 | 5,154 | 8,045 |
| WD50K (100) | 31,314 | 31,314 (100%) | 18,792 | 279 | 7,862 | 75 | 22,738 | 3,279 | 5,297 |
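To make the notion of a hyper-relational fact concrete, the sketch below shows one possible in-memory representation: a main triple plus qualifier (relation, value) pairs, as in Wikidata statements. The structure is our own illustration, not the serialization used by WD50K, which is documented in the repository.

```python
from dataclasses import dataclass, field

@dataclass
class HyperRelationalFact:
    """A main triple (s, p, o) plus an arbitrary number of
    qualifier (relation, value) pairs, as in Wikidata statements."""
    subject: str
    predicate: str
    obj: str
    qualifiers: list[tuple[str, str]] = field(default_factory=list)

# Illustrative example: "Einstein -- educated at -- ETH Zurich",
# qualified by academic major (the degree value below is an assumption).
fact = HyperRelationalFact(
    subject="Q937",        # Albert Einstein
    predicate="P69",       # educated at
    obj="Q11942",          # ETH Zurich
    qualifiers=[
        ("P812", "Q413"),      # academic major: physics
        ("P512", "Q849697"),   # academic degree: hypothetical value
    ],
)
print(fact)
```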
When using the dataset please cite:
@inproceedings{StarE,
title={Message Passing for Hyper-Relational Knowledge Graphs},
author={Galkin, Mikhail and Trivedi, Priyansh and Maheshwari, Gaurav and Usbeck, Ricardo and Lehmann, Jens},
booktitle={EMNLP},
year={2020}
}
For any further questions, please contact: mikhail.galkin@iais.fraunhofer.de
The first Russian knowledge base question answering (KBQA) dataset. The high-quality dataset consists of 1,500 Russian questions of varying complexity, their English machine translations, SPARQL queries to Wikidata, reference answers, as well as a Wikidata sample of triples containing entities with Russian labels. The dataset creation started with a large collection of question-answer pairs from online quizzes. The data underwent automatic filtering, crowd-assisted entity linking, automatic generation of SPARQL queries, and their subsequent in-house verification.
Question Answering (QA) over Knowledge Graphs (KG) aims to develop systems capable of answering users' questions using information from one or more Knowledge Graphs, such as DBpedia or Wikidata. A QA system needs to translate the user's natural-language question into a query formulated in a data query language compliant with the underlying KG. This translation is already non-trivial for simple questions involving a single triple pattern, and becomes even more troublesome for questions whose final query requires modifiers, i.e. aggregate functions, query forms, and so on. Attention to this last aspect is growing but has never been thoroughly addressed by the existing literature. Starting from the latest advances in this field, we want to take a further step in this direction. The aim of this work is to provide a publicly available dataset designed for evaluating the performance of a QA system in translating articulated questions into a specific data query language. This dataset has also been used to evaluate three state-of-the-art QA systems.
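To make the notion of a modifier concrete, the sketch below contrasts a single-triple-pattern question with one whose final query needs an aggregate function. Both queries are our own illustrations over Wikidata, not items from the dataset.

```python
# Our own illustrative contrast between a simple question and a
# "modifier" question over Wikidata. Q9439 = Queen Victoria, P40 = child;
# PREFIX declarations are omitted because the public Wikidata endpoint
# predefines wd:/wdt:.

SIMPLE = "SELECT ?child WHERE { wd:Q9439 wdt:P40 ?child }"
# "Who were Queen Victoria's children?" -- one triple pattern suffices.

WITH_AGGREGATE = "SELECT (COUNT(?child) AS ?n) WHERE { wd:Q9439 wdt:P40 ?child }"
# "How many children did Queen Victoria have?" -- the final query needs
# an aggregate function (COUNT), one of the modifiers discussed above.
```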
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
ConvQuestions is the first realistic benchmark for conversational question answering over knowledge graphs. It contains 11,200 conversations which can be evaluated over Wikidata. The questions feature a variety of complex question phenomena like comparisons, aggregations, compositionality, and temporal reasoning.
Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
For questions or use cases calling for large, multi-use aggregate data files, please visit the EOL Services forum at http://discuss.eol.org/c/eol-services

KGConv, a Conversational Corpus grounded in Wikidata
Dataset Summary
KGConv is a large corpus of 71k English conversations where each question-answer pair is grounded in a Wikidata fact. The conversations were generated automatically: in particular, questions were created using a collection of 10,355 templates; subsequently, the naturalness of conversations was improved by inserting ellipses and coreference into questions, via both handcrafted rules and a generative… See the full description on the dataset page: https://huggingface.co/datasets/Orange/KGConv.
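A minimal sketch of the template idea, in the spirit of KGConv: the template, placeholder names, and fact below are our own illustration, while the actual 10,355 templates ship with the corpus.

```python
# Sketch of template-based question generation from a Wikidata fact.
FACT = {"subject": "Marie Curie", "relation": "place of birth", "object": "Warsaw"}

TEMPLATES = {
    "place of birth": "Where was {subject} born?",  # hypothetical template
}

def generate_qa(fact: dict) -> tuple[str, str]:
    """Instantiate a relation-specific template to get a QA pair."""
    question = TEMPLATES[fact["relation"]].format(subject=fact["subject"])
    return question, fact["object"]

print(generate_qa(FACT))  # ('Where was Marie Curie born?', 'Warsaw')
```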
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Mintaka is a complex, natural, and multilingual question answering (QA) dataset composed of 20,000 question-answer pairs elicited from MTurk workers and annotated with Wikidata question and answer entities. Full details on the Mintaka dataset can be found in the paper: https://aclanthology.org/2022.coling-1.138/
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Mintaka is a complex, natural, and multilingual question answering (QA) dataset composed of 20,000 question-answer pairs elicited from MTurk workers and annotated with Wikidata question and answer entities.
Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
LC-QuAD 2.0 is a large question answering dataset with 30,000 pairs of questions and their corresponding SPARQL queries. The target knowledge bases are Wikidata and DBpedia, specifically the 2018 versions. Please see our paper for details about the dataset creation process and framework.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
A dataset of large-scale alignments between Wikipedia abstracts and Wikidata triples. T-REx consists of 11 million triples aligned with 3.09 million Wikipedia abstracts (6.2 million sentences).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
QALD-9-plus Dataset Description
QALD-9-plus is a dataset for Knowledge Graph Question Answering (KGQA) based on the well-known QALD-9. QALD-9-plus enables training and testing KGQA systems over DBpedia and Wikidata using questions in 9 different languages: English, German, Russian, French, Armenian, Belarusian, Lithuanian, Bashkir, and Ukrainian. Some of the questions have several alternative formulations in particular languages, which enables evaluating the robustness of KGQA systems… See the full description on the dataset page: https://huggingface.co/datasets/casey-martin/qald_9_plus.
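Since the dataset is hosted on the Hugging Face Hub, it can presumably be loaded with the `datasets` library. The snippet below is a sketch under that assumption; the dataset page above is authoritative for the available configurations and splits.

```python
# Assumed usage via the Hugging Face `datasets` library; the repo id is
# taken from the page above, but configs/splits are not verified here.
from datasets import load_dataset

qald9_plus = load_dataset("casey-martin/qald_9_plus")
print(qald9_plus)  # inspect the available splits and fields
```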
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
EntityQuestions is a dataset of simple, entity-rich questions based on facts from Wikidata (e.g., "Where was Arve Furset born?").
This dataset contains a set of event subtypes for Motions, Questions and Answers.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Question Answering (QA) over Knowledge Graphs (KG) aims to develop systems capable of answering users' questions using information from one or more Knowledge Graphs, such as DBpedia or Wikidata.
Such a system needs to translate the user's natural-language question into a query formulated in a data query language compliant with the underlying KG.
This translation is already non-trivial for simple questions involving a single triple pattern, and becomes troublesome for questions whose final query requires modifiers, i.e. aggregate functions, query forms, and so on.
Attention to this aspect is growing but has never been thoroughly addressed by the existing literature.
Starting from the latest advances in this field, we want to take a further step in this direction by giving a comprehensive description of the topic and the main issues revolving around it, and by making publicly available a dataset designed to evaluate the performance of a QA system in translating such articulated questions into a specific data query language.
This dataset has also been used to evaluate the best state-of-the-art QA systems.