WikiWebQuestions: a high-quality question answering benchmark for Wikidata.
./training_data/best.json
For more detail, see https://github.com/stanford-oval/wikidata-emnlp23
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset accompanies the paper "When Do LLMs Admit Their Mistakes? Understanding the Role of Model Belief in Retraction". It includes the original Wikidata questions used in our experiments, with a train/test split. For a detailed explanation of the dataset construction and usage, please refer to the paper. Code: https://github.com/ayyyq/llm-retraction
Citation
@misc{yang2025llmsadmitmistakesunderstanding, title={When Do LLMs Admit Their Mistakes? Understanding the Role of… See the full description on the dataset page: https://huggingface.co/datasets/ayyyq/Wikidata.
Attribution 1.0 (CC BY 1.0): https://creativecommons.org/licenses/by/1.0/
License information was derived automatically
This data dump of Wikidata is published to allow fair and replicable evaluation of KGQA systems with the QALD-10 benchmark. QALD-10 is newly released and was used in the QALD-10 Challenge. Anyone interested in evaluating their KGQA system with QALD-10 can download this dump and set up a local Wikidata endpoint on their server.
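As an illustration of that setup, the sketch below queries such a local endpoint with Python's SPARQLWrapper. The endpoint URL is an assumption and depends on the triple store used to serve the dump; the example query itself is our own, not a QALD-10 item.

```python
# Minimal sketch: querying a locally hosted Wikidata endpoint.
from SPARQLWrapper import SPARQLWrapper, JSON

# Hypothetical endpoint URL; adjust to wherever your triple store
# (e.g. Blazegraph, Virtuoso, QLever) serves SPARQL after loading the dump.
ENDPOINT = "http://localhost:9999/bigdata/namespace/wdq/sparql"

def run_query(query: str) -> list:
    sparql = SPARQLWrapper(ENDPOINT)
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    return sparql.query().convert()["results"]["bindings"]

# Example: occupations of Douglas Adams (Q42). The prefixes are declared
# explicitly, since a local endpoint may not predefine them.
QUERY = """
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?answer WHERE { wd:Q42 wdt:P106 ?answer }
"""
for binding in run_query(QUERY):
    print(binding["answer"]["value"])
```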
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
We present RuBQ (pronounced [`rubik]): Russian Knowledge Base Questions, a KBQA dataset that consists of 1,500 Russian questions of varying complexity along with their English machine translations, corresponding SPARQL queries, and answers, as well as a subset of Wikidata covering entities with Russian labels. 300 RuBQ questions are unanswerable, which poses a new challenge for KBQA systems and makes the task more realistic. The dataset is based on a collection of quiz questions. The data generation pipeline combines automatic processing with crowdsourced and in-house verification; see the paper for details. To the best of our knowledge, this is the first Russian KBQA and semantic parsing dataset.
ISWC 2020 paper (newest)
arXiv paper
RuWikidata sample
The dataset is also published on Zenodo.
The dataset is intended to be used as development and test sets in cross-lingual transfer, few-shot learning, or learning-with-synthetic-data scenarios.
Dataset files are in JSON format, presented as an array of dictionary entries. See the full specification here.
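As a quick illustration of that format, the snippet below loads one file and inspects its entries. The filename is hypothetical and the entry schema should be taken from the linked specification, not from this sketch.

```python
import json

# Load one of the RuBQ JSON files: an array of dictionary entries.
with open("RuBQ_test.json", encoding="utf-8") as f:  # hypothetical filename
    entries = json.load(f)

# Inspect the keys of the first few entries to discover the actual schema.
for entry in entries[:3]:
    print(sorted(entry.keys()))
```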
| Question | Query | Answers | Tags |
|---|---|---|---|
| Rus: Кто написал роман «Хижина дяди Тома»? Eng: Who wrote the novel "Uncle Tom's Cabin"? | SELECT ?answer | wd:Q102513 (Harriet Beecher Stowe) | 1-hop |
| Rus: Кто сыграл князя Андрея Болконского в фильме С. Ф. Бондарчука «Война и мир»? Eng: Who played Prince Andrei Bolkonsky in S. F. Bondarchuk's film "War and Peace"? | SELECT ?answer | wd:Q312483 (Vyacheslav Tikhonov) | qualifier-constraint |
| Rus: Кто на работе пользуется теодолитом? Eng: Who uses a theodolite for work? | SELECT ?answer | wd:Q1734662 (cartographer), wd:Q11699606 (geodesist), wd:Q294126 (land surveyor) | multi-hop |
| Rus: Какой океан самый маленький? Eng: Which ocean is the smallest? | SELECT ?answer | wd:Q788 (Arctic Ocean) | multi-constraint reverse ranking |
We provide a Wikidata sample containing all the entities with Russian labels. It consists of about 212M triples with 8.1M unique entities. This snapshot mitigates the problem of Wikidata's dynamics: a reference answer may change over time as the knowledge base evolves. The sample guarantees the correctness of the queries and answers. In addition, the smaller dump makes it easier to conduct experiments with our dataset.
We strongly recommend using this sample for evaluation.
The sample is a collection of several RDF files in Turtle format:

- wdt_all.ttl contains all the truthy statements.
- names.ttl contains Russian and English labels and aliases for all entities; names in other languages are also provided when needed.
- onto.ttl contains all Wikidata triples with the relation wdt:P279 (subclass of). It represents a class hierarchy of sorts, but remember that Wikidata has no formal class or instance concepts.
- pch_{0,6}.ttl contain all statement nodes and their data for all entities.

Some questions in our dataset require using rdfs:label or skos:altLabel to retrieve an answer that is a literal. In cases where the answer language does not have to be inferred from the question, our evaluation script takes only Russian literals into account.
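As a minimal sketch of local evaluation against this sample, the snippet below loads two of the files with rdflib and runs a 1-hop query in the spirit of the first table row. Resolving the book via its English label is our own illustrative choice (the table gives only the answer entity), and for the full 212M triples a dedicated triple store is preferable to in-memory rdflib.

```python
# Sketch: a 1-hop query against the RuBQ Wikidata sample with rdflib.
from rdflib import Graph

g = Graph()
g.parse("wdt_all.ttl", format="turtle")   # truthy statements
g.parse("names.ttl", format="turtle")     # labels and aliases

# P50 = author; the work is resolved via its English label here.
QUERY = """
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?answer WHERE {
  ?book rdfs:label "Uncle Tom's Cabin"@en .
  ?book wdt:P50 ?answer .
}
"""
for row in g.query(QUERY):
    print(row.answer)  # expected: wd:Q102513 (Harriet Beecher Stowe)
```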
If you use RuBQ dataset in your work, please cite:
@inproceedings{RuBQ2020,
title={{RuBQ}: A {Russian} Dataset for Question Answering over {Wikidata}},
author={Vladislav Korablinov and Pavel Braslavski},
booktitle={ISWC},
year={2020},
pages={97--110}
}
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (http://creativecommons.org/licenses/by-sa/4.0/).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
QALD-9-Plus is a dataset for Knowledge Graph Question Answering (KGQA) based on the well-known QALD-9. QALD-9-Plus enables training and testing KGQA systems over DBpedia and Wikidata using questions in 8 different languages. Some of the questions have several alternative formulations in particular languages, which enables evaluating the robustness of KGQA systems and training paraphrasing models. As the questions' translations were provided by native speakers, they are considered a "gold standard"; therefore, machine translation tools can also be trained and evaluated on the dataset. Please also see the GitHub repository: https://github.com/Perevalov/qald_9_plus
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a snapshot of QAWiki from 2025-09-09: a dataset for knowledge graph question answering (KGQA) and/or SPARQL query generation over Wikidata.
The dataset is presented in two formats:
The dataset contains 518 question/query pairs in English and Spanish with SPARQL queries (and 8 additional ambiguous questions without queries). Some questions also feature Italian and Danish translations provided by the community.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
WD50K dataset: a hyper-relational dataset derived from Wikidata statements. The dataset is constructed by the following procedure, based on the [Wikidata RDF dump](https://dumps.wikimedia.org/wikidatawiki/20190801/) of August 2019:

- A set of seed nodes corresponding to entities from FB15K-237 having a direct mapping in Wikidata (P646 "Freebase ID") is extracted from the dump.
- For each seed node, all statements whose main object and qualifier values correspond to wikibase:Item are extracted from the dump.
- All literals are filtered out from the qualifiers of the statements obtained above.
- All entities in the dataset with fewer than two mentions are dropped; the statements corresponding to the dropped entities are also dropped.
- The remaining statements are randomly split into train, test, and validation sets.
- All statements from the train and validation sets that share the same main triple (s, p, o) with test statements are removed.
- WD50K (33), WD50K (66), and WD50K (100) are then sampled from the above statements, where 33, 66, and 100 denote the percentage of hyper-relational facts (statements with qualifiers) in the dataset.

The table below provides some basic statistics of our dataset and its three further variations:

| Dataset | Statements | w/Quals (%) | Entities | Relations | E only in Quals | R only in Quals | Train | Valid | Test |
|---|---|---|---|---|---|---|---|---|---|
| WD50K | 236,507 | 32,167 (13.6%) | 47,156 | 532 | 5,460 | 45 | 166,435 | 23,913 | 46,159 |
| WD50K (33) | 102,107 | 31,866 (31.2%) | 38,124 | 475 | 6,463 | 47 | 73,406 | 10,668 | 18,133 |
| WD50K (66) | 49,167 | 31,696 (64.5%) | 27,347 | 494 | 7,167 | 53 | 35,968 | 5,154 | 8,045 |
| WD50K (100) | 31,314 | 31,314 (100%) | 18,792 | 279 | 7,862 | 75 | 22,738 | 3,279 | 5,297 |
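To make the notion of a hyper-relational fact concrete, the sketch below shows one possible in-memory representation: a main triple plus qualifier (relation, value) pairs, as in Wikidata statements. The structure is our own illustration, not the serialization used by WD50K, which is documented in the repository.

```python
from dataclasses import dataclass, field

@dataclass
class HyperRelationalFact:
    """A main triple (s, p, o) plus an arbitrary number of
    qualifier (relation, value) pairs, as in Wikidata statements."""
    subject: str
    predicate: str
    obj: str
    qualifiers: list[tuple[str, str]] = field(default_factory=list)

# Illustrative example: "Einstein -- educated at -- ETH Zurich",
# qualified by academic major (the degree value below is an assumption).
fact = HyperRelationalFact(
    subject="Q937",        # Albert Einstein
    predicate="P69",       # educated at
    obj="Q11942",          # ETH Zurich
    qualifiers=[
        ("P812", "Q413"),      # academic major: physics
        ("P512", "Q849697"),   # academic degree: hypothetical value
    ],
)
print(fact)
```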
When using the dataset please cite:
@inproceedings{StarE,
title={Message Passing for Hyper-Relational Knowledge Graphs},
author={Galkin, Mikhail and Trivedi, Priyansh and Maheshwari, Gaurav and Usbeck, Ricardo and Lehmann, Jens},
booktitle={EMNLP},
year={2020}
}
For any further questions, please contact: mikhail.galkin@iais.fraunhofer.de
The first Russian knowledge base question answering (KBQA) dataset. The high-quality dataset consists of 1,500 Russian questions of varying complexity, their English machine translations, SPARQL queries to Wikidata, reference answers, as well as a Wikidata sample of triples containing entities with Russian labels. The dataset creation started with a large collection of question-answer pairs from online quizzes. The data underwent automatic filtering, crowd-assisted entity linking, automatic generation of SPARQL queries, and their subsequent in-house verification.
Question Answering (QA) over Knowledge Graphs (KG) aims to develop systems capable of answering users' questions using information from one or more Knowledge Graphs, such as DBpedia or Wikidata. A QA system needs to translate the user's natural-language question into a query formulated in a data query language compliant with the underlying KG. This translation is already non-trivial for simple questions involving a single triple pattern, and becomes even more troublesome for questions whose final query requires modifiers, i.e. aggregate functions, query forms, and so on. Attention to this last aspect is growing but has never been thoroughly addressed by the existing literature. Starting from the latest advances in this field, we want to take a further step in this direction. The aim of this work is to provide a publicly available dataset designed for evaluating the performance of a QA system in translating articulated questions into a specific data query language. This dataset has also been used to evaluate three state-of-the-art QA systems.
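To make the notion of a modifier concrete, the sketch below contrasts a single-triple-pattern question with one whose final query needs an aggregate function. Both queries are our own illustrations over Wikidata, not items from the dataset.

```python
# Our own illustrative contrast between a simple question and a
# "modifier" question over Wikidata. Q9439 = Queen Victoria, P40 = child;
# PREFIX declarations are omitted because the public Wikidata endpoint
# predefines wd:/wdt:.

SIMPLE = "SELECT ?child WHERE { wd:Q9439 wdt:P40 ?child }"
# "Who were Queen Victoria's children?" -- one triple pattern suffices.

WITH_AGGREGATE = "SELECT (COUNT(?child) AS ?n) WHERE { wd:Q9439 wdt:P40 ?child }"
# "How many children did Queen Victoria have?" -- the final query needs
# an aggregate function (COUNT), one of the modifiers discussed above.
```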
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
ConvQuestions is the first realistic benchmark for conversational question answering over knowledge graphs. It contains 11,200 conversations which can be evaluated over Wikidata. The questions feature a variety of complex question phenomena like comparisons, aggregations, compositionality, and temporal reasoning.
Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
For questions or use cases calling for large, multi-use aggregate data files, please visit the EOL Services forum at http://discuss.eol.org/c/eol-services

KGConv, a Conversational Corpus grounded in Wikidata
Dataset Summary
KGConv is a large corpus of 71k English conversations where each question-answer pair is grounded in a Wikidata fact. The conversations were generated automatically: in particular, questions were created using a collection of 10,355 templates; subsequently, the naturalness of conversations was improved by inserting ellipses and coreference into questions, via both handcrafted rules and a generative… See the full description on the dataset page: https://huggingface.co/datasets/Orange/KGConv.
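A minimal sketch of the template idea, in the spirit of KGConv: the template, placeholder names, and fact below are our own illustration, while the actual 10,355 templates ship with the corpus.

```python
# Sketch of template-based question generation from a Wikidata fact.
FACT = {"subject": "Marie Curie", "relation": "place of birth", "object": "Warsaw"}

TEMPLATES = {
    "place of birth": "Where was {subject} born?",  # hypothetical template
}

def generate_qa(fact: dict) -> tuple[str, str]:
    """Instantiate a relation-specific template to get a QA pair."""
    question = TEMPLATES[fact["relation"]].format(subject=fact["subject"])
    return question, fact["object"]

print(generate_qa(FACT))  # ('Where was Marie Curie born?', 'Warsaw')
```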
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Mintaka is a complex, natural, and multilingual question answering (QA) dataset composed of 20,000 question-answer pairs elicited from MTurk workers and annotated with Wikidata question and answer entities. Full details on the Mintaka dataset can be found in the paper: https://aclanthology.org/2022.coling-1.138/
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Mintaka is a complex, natural, and multilingual question answering (QA) dataset composed of 20,000 question-answer pairs elicited from MTurk workers and annotated with Wikidata question and answer entities.
Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
LC-QuAD 2.0 is a large question answering dataset with 30,000 pairs of questions and their corresponding SPARQL queries. The target knowledge bases are Wikidata and DBpedia, specifically the 2018 versions. Please see our paper for details about the dataset creation process and framework.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
A dataset of large-scale alignments between Wikipedia abstracts and Wikidata triples. T-REx consists of 11 million triples aligned with 3.09 million Wikipedia abstracts (6.2 million sentences).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
QALD-9-plus Dataset Description
QALD-9-plus is a dataset for Knowledge Graph Question Answering (KGQA) based on the well-known QALD-9. QALD-9-plus enables training and testing KGQA systems over DBpedia and Wikidata using questions in 9 different languages: English, German, Russian, French, Armenian, Belarusian, Lithuanian, Bashkir, and Ukrainian. Some of the questions have several alternative formulations in particular languages, which enables evaluating the robustness of KGQA systems… See the full description on the dataset page: https://huggingface.co/datasets/casey-martin/qald_9_plus.
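Since the dataset is hosted on the Hugging Face Hub, it can presumably be loaded with the `datasets` library. The snippet below is a sketch under that assumption; the dataset page above is authoritative for the available configurations and splits.

```python
# Assumed usage via the Hugging Face `datasets` library; the repo id is
# taken from the page above, but configs/splits are not verified here.
from datasets import load_dataset

qald9_plus = load_dataset("casey-martin/qald_9_plus")
print(qald9_plus)  # inspect the available splits and fields
```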
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
EntityQuestions is a dataset of simple, entity-rich questions based on facts from Wikidata (e.g., "Where was Arve Furset born?").
This dataset contains a set of event subtypes for Motions, Questions and Answers.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Question Answering (QA) over Knowledge Graphs (KG) aims to develop systems capable of answering users' questions using information from one or more Knowledge Graphs, such as DBpedia or Wikidata.
Such a system needs to translate the user's natural-language question into a query formulated in a data query language compliant with the underlying KG.
This translation is already non-trivial for simple questions involving a single triple pattern, and becomes troublesome for questions whose final query requires modifiers, i.e. aggregate functions, query forms, and so on.
Attention to this aspect is growing but has never been thoroughly addressed by the existing literature.
Starting from the latest advances in this field, we want to take a further step in this direction by giving a comprehensive description of the topic and the main issues revolving around it, and by making publicly available a dataset designed to evaluate the performance of a QA system in translating such articulated questions into a specific data query language.
This dataset has also been used to evaluate the best state-of-the-art QA systems.