84 datasets found
  1. Stanford OVAL WikiWeb Questions (WikiData)

    • kaggle.com
    Updated May 21, 2025
    Cite
    Paul Mooney (2025). Stanford OVAL WikiWeb Questions (WikiData) [Dataset]. https://www.kaggle.com/datasets/paultimothymooney/stanford-oval-wikiweb-questions/discussion
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    May 21, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Paul Mooney
    Description

    WikiWebQuestions: a high-quality question answering benchmark for Wikidata.

    ./training_data/best.json
    

    For more detail, see https://github.com/stanford-oval/wikidata-emnlp23.
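    The card points to ./training_data/best.json. A minimal loading sketch (the file's schema is not documented above, so field names are inspected rather than assumed):

    import json

    # Minimal sketch (not from the dataset card): load the training file and
    # inspect its structure instead of assuming any particular fields.
    with open("./training_data/best.json", encoding="utf-8") as f:
        data = json.load(f)

    print(type(data).__name__, len(data))
    first = data[0] if isinstance(data, list) else next(iter(data.values()))
    print(list(first.keys()) if isinstance(first, dict) else first)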

  2. Wikidata

    • huggingface.co
    Updated May 23, 2025
    Cite
    Yuqing Yang (2025). Wikidata [Dataset]. https://huggingface.co/datasets/ayyyq/Wikidata
    Explore at:
    Dataset updated
    May 23, 2025
    Authors
    Yuqing Yang
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    This dataset accompanies the paper "When Do LLMs Admit Their Mistakes? Understanding the Role of Model Belief in Retraction". It includes the original Wikidata questions used in our experiments, with a train/test split. For a detailed explanation of the dataset construction and usage, please refer to the paper. Code: https://github.com/ayyyq/llm-retraction

      Citation
    

    @misc{yang2025llmsadmitmistakesunderstanding, title={When Do LLMs Admit Their Mistakes? Understanding the Role of… See the full description on the dataset page: https://huggingface.co/datasets/ayyyq/Wikidata.

  3. QALD-10 Wikidata Dump

    • zenodo.org
    • data.niaid.nih.gov
    application/gzip
    Updated Jan 3, 2023
    Cite
    Ricardo Usbeck; Xi Yan; Aleksandr Perevalov; Longquan Jiang; Julius Schulz; Angelie Kraft; Cedric Möller; Junbo Huang; Jan Reineke; Axel-Cyrille Ngonga Ngomo; Muhammad Saleem; Andreas Both (2023). QALD-10 Wikidata Dump [Dataset]. http://doi.org/10.5281/zenodo.7496690
    Explore at:
    Available download formats: application/gzip
    Dataset updated
    Jan 3, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Ricardo Usbeck; Xi Yan; Aleksandr Perevalov; Longquan Jiang; Julius Schulz; Angelie Kraft; Cedric Möller; Junbo Huang; Jan Reineke; Axel-Cyrille Ngonga Ngomo; Muhammad Saleem; Andreas Both
    License

    Attribution 1.0 (CC BY 1.0): https://creativecommons.org/licenses/by/1.0/
    License information was derived automatically

    Description

    This data dump of Wikidata is published to allow fair and replicable evaluation of KGQA systems with the QALD-10 benchmark. QALD-10 is newly released and was used in the QALD-10 Challenge. Anyone interested in evaluating their KGQA system with QALD-10 can download this dump and set up a local Wikidata endpoint on their own server.
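    Once the dump has been loaded into a local triple store, a quick sanity check is to count triples over SPARQL. A minimal sketch, assuming a local endpoint at http://localhost:9999/sparql (the URL is illustrative and depends on the triple store used):

    import requests

    ENDPOINT = "http://localhost:9999/sparql"   # hypothetical local endpoint

    query = "SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }"

    # Standard SPARQL-over-HTTP request accepted by most triple stores.
    resp = requests.get(
        ENDPOINT,
        params={"query": query},
        headers={"Accept": "application/sparql-results+json"},
    )
    resp.raise_for_status()
    print(resp.json()["results"]["bindings"][0]["n"]["value"], "triples loaded")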

  4. RuBQ 1.0

    • kaggle.com
    Updated Aug 9, 2021
    + more versions
    Cite
    Valentin Biryukov (2021). RuBQ 1.0 [Dataset]. https://www.kaggle.com/valentinbiryukov/rubq-10/discussion
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 9, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Valentin Biryukov
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    RuBQ 1.0: A Russian Knowledge Base Question Answering Data Set

    Introduction

    We present RuBQ (pronounced [`rubik]) -- Russian Knowledge Base Questions, a KBQA dataset that consists of 1,500 Russian questions of varying complexity, their English machine translations, corresponding SPARQL queries, answers, and a subset of Wikidata covering entities with Russian labels. 300 RuBQ questions are unanswerable, which poses a new challenge for KBQA systems and makes the task more realistic. The dataset is based on a collection of quiz questions. The data generation pipeline combines automatic processing with crowdsourced and in-house verification; see the paper for details. To the best of our knowledge, this is the first Russian KBQA and semantic parsing dataset.

    Links

    ISWC 2020 paper (newest)

    arXiv paper

    Test and Dev subsets

    RuWikidata sample

    Dataset is also published on Zenodo

    Usage

    The dataset is intended to be used as development and test sets in cross-lingual transfer, few-shot learning, or learning-with-synthetic-data scenarios.

    Format

    Dataset files are presented in JSON format as an array of dictionary entries. See the full specification here.
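    A minimal loading sketch (the file name is an assumption; per-entry field names are printed rather than assumed):

    import json

    # Hypothetical file name for one of the JSON splits described above.
    with open("RuBQ_1.0_test.json", encoding="utf-8") as f:
        entries = json.load(f)        # an array of dictionary entries

    print(len(entries), "entries")
    print(sorted(entries[0].keys()))  # inspect the per-entry fields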

    Examples

    Example 1 (tag: 1-hop)
    Rus: Кто написал роман «Хижина дяди Тома»?
    Eng: Who wrote the novel "Uncle Tom's Cabin"?
    Query:
    SELECT ?answer
    WHERE {
      wd:Q2222 wdt:P50 ?answer .
    }
    Answers: wd:Q102513 (Harriet Beecher Stowe)

    Example 2 (tag: qualifier-constraint)
    Rus: Кто сыграл князя Андрея Болконского в фильме С. Ф. Бондарчука «Война и мир»?
    Eng: Who played Prince Andrei Bolkonsky in S. F. Bondarchuk's film "War and Peace"?
    Query:
    SELECT ?answer
    WHERE {
      wd:Q845176 p:P161 [
        ps:P161 ?answer;
        pq:P453 wd:Q2737140
      ] .
    }
    Answers: wd:Q312483 (Vyacheslav Tikhonov)

    Example 3 (tag: multi-hop)
    Rus: Кто на работе пользуется теодолитом?
    Eng: Who uses a theodolite for work?
    Query:
    SELECT ?answer
    WHERE {
      wd:Q181517 wdt:P366 [
        wdt:P3095 ?answer
      ] .
    }
    Answers: wd:Q1734662 (cartographer), wd:Q11699606 (geodesist), wd:Q294126 (land surveyor)

    Example 4 (tags: multi-constraint, reverse, ranking)
    Rus: Какой океан самый маленький?
    Eng: Which ocean is the smallest?
    Query:
    SELECT ?answer
    WHERE {
      ?answer p:P2046/psn:P2046/wikibase:quantityAmount ?sq .
      ?answer wdt:P31 wd:Q9430 .
    }
    ORDER BY ASC(?sq)
    LIMIT 1
    Answers: wd:Q788 (Arctic Ocean)
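    The queries above can be run unchanged against any Wikidata-compatible SPARQL endpoint. A minimal sketch using the public Wikidata Query Service for illustration (for stable results the authors recommend the RuWikidata sample described below):

    import requests

    WDQS = "https://query.wikidata.org/sparql"   # public endpoint, illustration only

    # Example 1 from above: who wrote "Uncle Tom's Cabin"?
    query = "SELECT ?answer WHERE { wd:Q2222 wdt:P50 ?answer . }"

    resp = requests.get(
        WDQS,
        params={"query": query},
        headers={"Accept": "application/sparql-results+json",
                 "User-Agent": "rubq-example-sketch/0.1"},
    )
    resp.raise_for_status()
    answers = {b["answer"]["value"] for b in resp.json()["results"]["bindings"]}
    # The reference answer is wd:Q102513 (Harriet Beecher Stowe).
    print("http://www.wikidata.org/entity/Q102513" in answers)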

    RuWikidata8M Sample

    We provide a Wikidata sample containing all the entities with Russian labels. It consists of about 212M triples with 8.1M unique entities. This snapshot mitigates the problem of Wikidata’s dynamics – a reference answer may change with time as the knowledge base evolves. The sample guarantees the correctness of the queries and answers. In addition, the smaller dump makes it easier to conduct experiments with our dataset.

    We strongly recommend using this sample for evaluation.

    Details

    The sample is a collection of several RDF files in Turtle format; a minimal loading sketch follows the list below.

    • wdt_all.ttl contains all the truthy statements.
    • names.ttl contains Russian and English labels and aliases for all entities. Names in other languages are also provided where needed.
    • onto.ttl contains all Wikidata triples with the relation wdt:P279 (subclass of). It represents a partial class hierarchy, but note that Wikidata has no formal class or instance concepts.
    • pch_{0,6}.ttl contain all statement nodes and their data for all entities.
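    A minimal sketch for loading one of the listed Turtle files with rdflib (any RDF library would work; rdflib is an assumption, not a requirement of the dataset):

    from rdflib import Graph

    # Parse the subclass-of file (wdt:P279 triples) from the RuWikidata sample.
    g = Graph()
    g.parse("onto.ttl", format="turtle")

    print(len(g), "triples")
    for s, p, o in list(g)[:5]:   # show a few triples
        print(s, p, o)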

    Evaluation

    rdfs:label and skos:altLabel predicates convention

    Some questions in our dataset require using rdfs:label or skos:altLabel to retrieve an answer that is a literal. In cases where the answer language does not have to be inferred from the question, our evaluation script takes only Russian literals into account.
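    For illustration, a hand-written label lookup restricted to Russian literals, in line with the convention above (not a query taken from the dataset):

    # Sketch only: retrieve the Russian label of an answer entity as a literal.
    LABEL_QUERY = """
    SELECT ?label WHERE {
      wd:Q102513 rdfs:label ?label .
      FILTER(LANG(?label) = "ru")   # Russian literals only
    }
    """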

    Reference

    If you use RuBQ dataset in your work, please cite:

    @inproceedings{RuBQ2020,
     title={{RuBQ}: A {Russian} Dataset for Question Answering over {Wikidata}},
     author={Vladislav Korablinov and Pavel Braslavski},
     booktitle={ISWC},
     year={2020},
     pages={97--110}
    }
    

    This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (http://creativecommons.org/licenses/by-sa/4.0/).

    CC BY-SA 4.0

  5. QALD-9-Plus

    • figshare.com
    • opendatalab.com
    txt
    Updated Dec 21, 2021
    Cite
    Aleksandr Perevalov; Andreas Both; Dennis Diefenbach; Ricardo Usbeck (2021). QALD-9-Plus [Dataset]. http://doi.org/10.6084/m9.figshare.16864273.v7
    Explore at:
    Available download formats: txt
    Dataset updated
    Dec 21, 2021
    Dataset provided by
    figshare
    Authors
    Aleksandr Perevalov; Andreas Both; Dennis Diefenbach; Ricardo Usbeck
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    QALD-9-Plus is a dataset for Knowledge Graph Question Answering (KGQA) based on the well-known QALD-9. QALD-9-Plus makes it possible to train and test KGQA systems over DBpedia and Wikidata using questions in 8 different languages. Some of the questions have several alternative phrasings in particular languages, which makes it possible to evaluate the robustness of KGQA systems and to train paraphrasing models. As the questions' translations were provided by native speakers, they are considered a "gold standard"; therefore, machine translation tools can also be trained and evaluated on the dataset. Please see also the GitHub repository: https://github.com/Perevalov/qald_9_plus

  6. QAWiki v1: Knowledge Graph Question Answering (KGQA) / SPARQL Query...

    • zenodo.org
    bin, tsv
    Updated Aug 10, 2025
    Cite
    Alberto Moya Loustaunau; Aidan Hogan (2025). QAWiki v1: Knowledge Graph Question Answering (KGQA) / SPARQL Query Generation Dataset for Wikidata [Dataset]. http://doi.org/10.5281/zenodo.16787599
    Explore at:
    Available download formats: bin, tsv
    Dataset updated
    Aug 10, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Alberto Moya Loustaunau; Aidan Hogan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is a snapshot of QAWiki from 2025-09-09: a dataset for knowledge graph question answering (KGQA) and/or SPARQL query generation over Wikidata.

    The dataset is presented in two formats:

    • The simple format is a TSV file, and contains language-tagged questions and paraphrased questions with SPARQL queries.
    • The full format is a TTL file, and contains a full RDF dump of QAWiki featuring also entity mentions, relation mentions, question relations, quality tags, etc.

    The dataset contains 518 question/query pairs in English and Spanish with SPARQL queries (and 8 additional ambiguous questions without queries). Some questions also feature Italian and Danish translations provided by the community.
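    A minimal sketch for reading the simple (TSV) format with pandas (the file name is an assumption, and column names are printed rather than assumed):

    import pandas as pd

    # Hypothetical file name for the TSV snapshot described above.
    df = pd.read_csv("qawiki_simple.tsv", sep="\t")

    print(df.shape)           # roughly 518 question/query pairs are expected
    print(list(df.columns))   # inspect the columns rather than assume their names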

  7. WD50K

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Jan 7, 2021
    Cite
    Galkin Mikhail; Trivedi Priyansh; Maheshwari Gaurav; Usbeck Ricardo; Lehmann Jens (2021). WD50K [Dataset]. http://doi.org/10.5281/zenodo.4036498
    Explore at:
    Available download formats: zip
    Dataset updated
    Jan 7, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Galkin Mikhail; Trivedi Priyansh; Maheshwari Gaurav; Usbeck Ricardo; Lehmann Jens
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description
    WD50K dataset: a hyper-relational dataset derived from Wikidata statements.
    
    The dataset is constructed by the following procedure based on the [Wikidata RDF dump](https://dumps.wikimedia.org/wikidatawiki/20190801/) of August 2019:
    
    - A set of seed nodes corresponding to entities from FB15K-237 that have a direct mapping in Wikidata (P646 "Freebase ID") is extracted from the dump.
    - For each seed node, all statements whose main object and qualifier values correspond to wikibase:Item are extracted from the dump.
    - All literals are filtered out from the qualifiers of the statements obtained above.
    - All entities in the dataset that have fewer than two mentions are dropped, along with the statements referring to them.
    - The remaining statements are randomly split into train, test, and validation sets.
    - All statements in the train and validation sets that share the same main triple (s, p, o) with a test statement are removed.
    - WD50k_33, WD50k_66, and WD50k_100 are then sampled from the above statements; 33, 66, and 100 denote the share (%) of hyper-relational facts (statements with qualifiers) in the dataset.
    
    
    The table below provides some basic statistics of our dataset and its three further variations:
    
    | Dataset   | Statements | w/Quals (%)  | Entities | Relations | E only in Quals | R only in Quals | Train  | Valid | Test  |
    |-------------|------------|----------------|----------|-----------|-----------------|-----------------|---------|--------|--------|
    | WD50K    | 236,507  | 32,167 (13.6%) | 47,156  | 532    | 5460      | 45       | 166,435 | 23,913 | 46,159 |
    | WD50K (33) | 102,107  | 31,866 (31.2%) | 38,124  | 475    | 6463      | 47       | 73,406 | 10,668 | 18,133 |
    | WD50K (66) | 49,167  | 31,696 (64.5%) | 27,347  | 494    | 7167      | 53       | 35,968 | 5,154 | 8,045 |
    | WD50K (100) | 31,314  | 31,314 (100%) | 18,792  | 279    | 7862      | 75       | 22,738 | 3,279 | 5,297 |
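    To make the notion of a hyper-relational fact concrete, the sketch below models a statement as a main (s, p, o) triple plus qualifier pairs; this illustrates the concept only and is not the serialization used in the WD50K files:

    from dataclasses import dataclass, field

    @dataclass
    class HyperRelationalFact:
        """A main (subject, predicate, object) triple plus qualifier pairs."""
        subject: str
        predicate: str
        obj: str
        qualifiers: list = field(default_factory=list)   # list of (property, value)

    # Illustrative statement with one qualifier (Wikidata IDs chosen for readability).
    fact = HyperRelationalFact(
        subject="Q845176", predicate="P161", obj="Q312483",
        qualifiers=[("P453", "Q2737140")],
    )
    print(fact)
    print("hyper-relational:", bool(fact.qualifiers))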
    

    When using the dataset please cite:
    
    
    @inproceedings{StarE,
     title={Message Passing for Hyper-Relational Knowledge Graphs},
     author={Galkin, Mikhail and Trivedi, Priyansh and Maheshwari, Gaurav and Usbeck, Ricardo and Lehmann, Jens},
     booktitle={EMNLP},
     year={2020}
    }
    
    For any further questions, please contact: mikhail.galkin@iais.fraunhofer.de

  8. RuBQ Dataset

    • library.toponeai.link
    Cite
    Vladislav Korablinov; Pavel Braslavski, RuBQ Dataset [Dataset]. https://library.toponeai.link/dataset/rubq
    Explore at:
    Authors
    Vladislav Korablinov; Pavel Braslavski
    Description

    The first Russian knowledge base question answering (KBQA) dataset. The high-quality dataset consists of 1,500 Russian questions of varying complexity, their English machine translations, SPARQL queries to Wikidata, reference answers, as well as a Wikidata sample of triples containing entities with Russian labels. The dataset creation started with a large collection of question-answer pairs from online quizzes. The data underwent automatic filtering, crowd-assisted entity linking, automatic generation of SPARQL queries, and their subsequent in-house verification.

  9. MQALD

    • data.niaid.nih.gov
    Updated Apr 2, 2021
    Cite
    Pierpaolo Basile (2021). MQALD [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3746634
    Explore at:
    Dataset updated
    Apr 2, 2021
    Dataset provided by
    Pierpaolo Basile
    Lucia Siciliani
    Pasquale Lops
    Description

    Question Answering (QA) over Knowledge Graphs (KG) aims to develop systems capable of answering users' questions using information from one or more Knowledge Graphs, such as DBpedia or Wikidata. Question Answering systems need to translate the user's question, written in natural language, into a query formulated in a data query language compliant with the underlying KG. This translation process is already non-trivial when answering simple questions that involve a single triple pattern and becomes even more troublesome for questions that require modifiers in the final query, e.g. aggregate functions, query forms, and so on. Attention to this last aspect is growing but has never been thoroughly addressed in the existing literature. Starting from the latest advances in this field, we want to take a further step in this direction. The aim of this work is to provide a publicly available dataset designed for evaluating the performance of a QA system in translating articulated questions into a specific data query language. This dataset has also been used to evaluate three state-of-the-art QA systems.

  10. conv_questions

    • huggingface.co
    • opendatalab.com
    Updated Oct 9, 2019
    Cite
    Philipp Christmann (2019). conv_questions [Dataset]. https://huggingface.co/datasets/pchristm/conv_questions
    Explore at:
    Dataset updated
    Oct 9, 2019
    Authors
    Philipp Christmann
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    ConvQuestions is the first realistic benchmark for conversational question answering over knowledge graphs. It contains 11,200 conversations which can be evaluated over Wikidata. The questions feature a variety of complex question phenomena like comparisons, aggregations, compositionality, and temporal reasoning.
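    The corpus is hosted on the Hugging Face Hub, so it can be loaded with the datasets library; a minimal sketch (recent library versions may additionally require trust_remote_code=True for script-based datasets):

    from datasets import load_dataset

    # Repository id taken from the citation above.
    ds = load_dataset("pchristm/conv_questions")

    print(ds)                        # available splits and their sizes
    first_split = next(iter(ds.values()))
    print(first_split[0])            # inspect one conversation record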

  11. WikiData: 2017 03 06 wikidata dump stats

    • zenodo.org
    bin, txt
    Updated Nov 24, 2024
    Cite
    Encyclopedia of Life (2024). WikiData: 2017 03 06 wikidata dump stats [Dataset]. http://doi.org/10.5281/zenodo.13283255
    Explore at:
    Available download formats: bin, txt
    Dataset updated
    Nov 24, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Encyclopedia of Life
    License

    Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
    License information was derived automatically

    Time period covered
    Mar 22, 2017
    Description

    For questions or use cases calling for large, multi-use aggregate data files, please visit the EOL Services forum at

    http://discuss.eol.org/c/eol-services

  12. KGConv

    • huggingface.co
    Updated May 7, 2024
    Cite
    Orange (2024). KGConv [Dataset]. https://huggingface.co/datasets/Orange/KGConv
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    May 7, 2024
    Dataset authored and provided by
    Orange
    Description

    KGConv, a Conversational Corpus grounded in Wikidata

      Dataset Summary
    

    KGConv is a large corpus of 71k English conversations where each question-answer pair is grounded in a Wikidata fact. The conversations were generated automatically: in particular, questions were created using a collection of 10,355 templates; subsequently, the naturalness of conversations was improved by inserting ellipses and coreference into questions, via both handcrafted rules and a generative… See the full description on the dataset page: https://huggingface.co/datasets/Orange/KGConv.

  13. Amazon Mintaka

    • kaggle.com
    Updated Oct 16, 2022
    Cite
    Anshuman Mishra (2022). Amazon Mintaka [Dataset]. https://www.kaggle.com/datasets/shivanshuman/amazon-research-mintaka-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Oct 16, 2022
    Dataset provided by
    Kaggle
    Authors
    Anshuman Mishra
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Mintaka is a complex, natural, and multilingual question answering (QA) dataset composed of 20,000 question-answer pairs elicited from MTurk workers and annotated with Wikidata question and answer entities. Full details on the Mintaka dataset can be found in the paper: https://aclanthology.org/2022.coling-1.138/

  14. Data from: mintaka

    • opendatalab.com
    • huggingface.co
    zip
    Updated Feb 7, 2023
    Cite
    Amazon Alexa AI (2023). mintaka [Dataset]. https://opendatalab.com/OpenDataLab/mintaka
    Explore at:
    Available download formats: zip
    Dataset updated
    Feb 7, 2023
    Dataset provided by
    Amazon (http://amazon.com/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Mintaka is a complex, natural, and multilingual question answering (QA) dataset composed of 20,000 question-answer pairs elicited from MTurk workers and annotated with Wikidata question and answer entities.

  15. lc_quad

    • huggingface.co
    Updated May 17, 2024
    Cite
    Mohnish Dubey (2024). lc_quad [Dataset]. https://huggingface.co/datasets/mohnish/lc_quad
    Explore at:
    Dataset updated
    May 17, 2024
    Authors
    Mohnish Dubey
    License

    Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Description

    LC-QuAD 2.0 is a large question answering dataset with 30,000 pairs of questions and their corresponding SPARQL queries. The target knowledge bases are Wikidata and DBpedia, specifically the 2018 version. Please see our paper for details about the dataset creation process and framework.

  16. T-REx

    • opendatalab.com
    zip
    Updated Aug 31, 2022
    Cite
    University of Southampton (2022). T-REx [Dataset]. https://opendatalab.com/OpenDataLab/T-REx
    Explore at:
    Available download formats: zip (54879867845 bytes)
    Dataset updated
    Aug 31, 2022
    Dataset provided by
    University of Lyon (http://www.universite-lyon.fr/)
    University of Southampton
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    A dataset of large-scale alignments between Wikipedia abstracts and Wikidata triples. T-REx consists of 11 million triples aligned with 3.09 million Wikipedia abstracts (6.2 million sentences).

  17. qald_9_plus

    • huggingface.co
    Updated Dec 11, 2024
    + more versions
    Cite
    Casey (2024). qald_9_plus [Dataset]. https://huggingface.co/datasets/casey-martin/qald_9_plus
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 11, 2024
    Authors
    Casey
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    QALD-9-plus Dataset Description

    QALD-9-plus is a dataset for Knowledge Graph Question Answering (KGQA) based on the well-known QALD-9. QALD-9-plus makes it possible to train and test KGQA systems over DBpedia and Wikidata using questions in 9 different languages: English, German, Russian, French, Armenian, Belarusian, Lithuanian, Bashkir, and Ukrainian. Some of the questions have several alternative phrasings in particular languages, which makes it possible to evaluate the robustness of KGQA systems… See the full description on the dataset page: https://huggingface.co/datasets/casey-martin/qald_9_plus.

  18. EntityQuestions

    • opendatalab.com
    zip
    Updated May 1, 2023
    Cite
    Princeton University (2023). EntityQuestions [Dataset]. https://opendatalab.com/OpenDataLab/EntityQuestions
    Explore at:
    Available download formats: zip (38007059 bytes)
    Dataset updated
    May 1, 2023
    Dataset provided by
    Princeton University
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    EntityQuestions is a dataset of simple, entity-rich questions based on facts from Wikidata (e.g., "Where was Arve Furset born? ").

  19. Scottish Parliament - Motions, Questions and Answers: Question and Motion...

    • dtechtive.com
    • find.data.gov.scot
    json, xml
    Updated Sep 3, 2023
    Cite
    The Scottish Parliament (2023). Scottish Parliament - Motions, Questions and Answers: Question and Motion subtypes [Dataset]. https://dtechtive.com/datasets/25229
    Explore at:
    Available download formats: json (0.0015 MB), xml (0.0031 MB)
    Dataset updated
    Sep 3, 2023
    Dataset provided by
    Scottish Parliament (http://parliament.scot/)
    Area covered
    Scotland
    Description

    This dataset contains a set of event subtypes for Motions, Questions and Answers.

  20. MQALD

    • zenodo.org
    application/gzip
    Updated Apr 1, 2021
    + more versions
    Cite
    Lucia Siciliani; Pierpaolo Basile; Pasquale Lops; Giovanni Semeraro (2021). MQALD [Dataset]. http://doi.org/10.5281/zenodo.3746635
    Explore at:
    Available download formats: application/gzip
    Dataset updated
    Apr 1, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Lucia Siciliani; Pierpaolo Basile; Pasquale Lops; Giovanni Semeraro
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Question Answering (QA) over Knowledge Graphs (KG) aims to develop systems capable of answering users' questions using information from one or more Knowledge Graphs, such as DBpedia or Wikidata.
    This kind of system needs to translate the user's question, written in natural language, into a query formulated in a data query language compliant with the underlying KG.
    The translation process is already non-trivial when answering simple questions that involve a single triple pattern, and becomes troublesome for questions that require modifiers in the final query, e.g. aggregate functions, query forms, and so on.
    Attention to this aspect is growing but has never been thoroughly addressed in the existing literature.
    Starting from the latest advances in this field, we want to take a further step in this direction by giving a comprehensive description of this topic and the main issues revolving around it, and by making publicly available a dataset designed to evaluate the performance of a QA system in translating such articulated questions into a specific data query language.
    This dataset has also been used to evaluate the best state-of-the-art QA systems.
